Replying to Kis Sean

https://www.hpc-ai.tech/blog/colossal-ai-chatgpt

There's still hope for open-source large models. The giants probably still have huge influence over these open-source projects, but it's better than nothing.

On the other hand, these models are humongous, and the compute needed to run inference on them is too centralized. Neuromorphic and analog chips, plus better quantization (model shrinking), seem to be the only way to democratize them.
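
To make the "model shrinking" point concrete, here is a minimal sketch (my own illustration, not from the linked article) of symmetric per-tensor int8 quantization with NumPy: each 2-byte fp16 weight becomes a 1-byte int8 value plus one shared scale, halving the weight memory at some accuracy cost.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: 2 bytes/param -> 1 byte/param + one scale."""
    scale = np.abs(weights).max() / 127.0                      # map the largest magnitude to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale                        # approximate reconstruction

# Toy example: a 4096x4096 fp16 weight matrix (32MiB) shrinks to 16MiB as int8.
w = np.random.randn(4096, 4096).astype(np.float16)
q, s = quantize_int8(w.astype(np.float32))
print(w.nbytes / 2**20, "MiB fp16 ->", q.nbytes / 2**20, "MiB int8")
```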

#AI #ArtificialIntelligence #Future #Prediction #Hardware #Neuromorphic #Analog

Some references on the minimum hardware requirements:

1. GPT-NeoX 20B, with 20 billion fp16 (2-byte) parameters, requires about 42GB of VRAM for near real-time inference. Simple math: 20 billion * 2 bytes = 40GB. That is too much for a single non-high-end GPU, so the model has to be split across devices, which introduces its own parallelism problems (see the sketch after this list).

2. Qualcomm deployed a roughly 1-billion-parameter int8 (1-byte) model to a Snapdragon 8 Gen 2 platform. That works out to about 1GB of RAM (not sure what kind) to generate a 512*512 pixel image in around 15 seconds.

It runs on the NPU/APU to accelerate INT8 inference. Considering the limited memory and the low processor frequency, this is still quite impressive.

3. ChatGPT reportedly has 175 billion parameters, roughly 8-9 times GPT-NeoX 20B. Even after quantization (if possible), it would still need on the order of 42 * 8 / 2 = 168GB of fast memory.
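
Here is a back-of-the-envelope sketch of the weight-memory math behind all three cases (my own illustration: the parameter counts and byte widths come from the list above, while the ~5% overhead factor is an assumption derived from the 40GB vs 42GB gap for GPT-NeoX; it gives ~184GB for an int8 175B model, the same ballpark as the 168GB figure above).

```python
# Rough weight-memory estimate: parameters * bytes per parameter, plus a small
# assumed overhead for activations, KV cache, and framework buffers.
def weight_memory_gb(params_billion: float, bytes_per_param: int, overhead: float = 1.05) -> float:
    return params_billion * bytes_per_param * overhead

models = {
    "GPT-NeoX 20B (fp16)": (20, 2),               # ~42GB: needs multiple consumer GPUs or A100-class cards
    "Stable Diffusion ~1B (int8)": (1, 1),        # ~1GB: fits on a Snapdragon 8 Gen 2 NPU
    "GPT-3 175B (int8, hypothetical)": (175, 1),  # still far beyond any consumer device
}

for name, (params, width) in models.items():
    print(f"{name}: ~{weight_memory_gb(params, width):.0f} GB")
```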

All of this is just the memory side; there is much more going on with compute and I/O bottlenecks. My conclusion: without real innovation in AI inference hardware, current consumer hardware won't be able to run LLMs of this size in real time. But smaller 1-10 billion parameter models on personal devices are still very promising, although a lot of work needs to be done on the NPU and memory hierarchy architecture.

https://www.qualcomm.com/news/onq/2023/02/worlds-first-on-device-demonstration-of-stable-diffusion-on-android

https://nlpcloud.com/deploying-gpt-neox-20-production-focus-deepspeed.html#:~:text=Basically%20GPT%2DNeoX%20requires%20at,%2C%20A40%2C%20and%20RTX%20A6000.
