Starting to run LLMs on my own hardware. These model files are often 4-40GB in size.

Where are the torrents? #asknostr

Discussion

Have you tried running smaller models through Ollama for specific tasks?

StarCoder is nice for testing and research.

Thanks! I’ll check it out πŸ™

Yeah, I’ve got Llama 3 running via Ollama. Downloading the bigger version now. Should be able to run it once my new RAM arrives.

But mostly I’m thinking about going forward: I plan to download and try out a whole bunch of different models, all of which are extraordinarily large data files. This seems like a perfect job for BitTorrent.
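For reference, here’s roughly what I’m picturing, as a minimal sketch using the python-libtorrent bindings. The magnet link and save path are placeholders, and the exact API differs a little across libtorrent versions:

```python
import time
import libtorrent as lt  # python-libtorrent bindings (libtorrent-rasterbar)

# Placeholder magnet link -- you'd swap in one pointing at the model weights.
MAGNET = "magnet:?xt=urn:btih:..."

ses = lt.session()
params = lt.parse_magnet_uri(MAGNET)   # available in libtorrent >= 1.2
params.save_path = "./models"
handle = ses.add_torrent(params)

# Poll until the download finishes, then keep seeding so others can grab it too.
while not handle.status().is_seeding:
    s = handle.status()
    print(f"{s.progress * 100:5.1f}%  down: {s.download_rate / 1e6:.1f} MB/s  peers: {s.num_peers}")
    time.sleep(5)

print("Done -- now seeding.")
```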

Is that what I’d need to make a chatbot of myself?

I believe you could do this with Ollama, yes. It supports saving and loading models, so you could feed it a bunch of info about yourself and then serve the bot from that point.
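Something like this, as a rough sketch assuming Ollama is running on its default local port (the persona prompt here is made up, you’d distill your own notes or posts into it):

```python
import requests

# Hypothetical persona text -- in practice you'd distill your bio, notes, posts, etc.
PERSONA = "You are a chatbot that answers in the style and with the opinions of <me>."

resp = requests.post(
    "http://localhost:11434/api/chat",   # Ollama's default local endpoint
    json={
        "model": "llama3",
        "stream": False,
        "messages": [
            {"role": "system", "content": PERSONA},
            {"role": "user", "content": "What do you think about self-hosting LLMs?"},
        ],
    },
    timeout=300,
)
print(resp.json()["message"]["content"])
```

You can also bake that same system prompt into a named model with a Modelfile and `ollama create`, so it loads automatically. Actually teaching it your writing style would mean fine-tuning, which is a much bigger job.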

Which LLM are you running? Mine is ridiculously slow

Right now, I’m running Mistral through Ollama. Takes a while to get responses back. My machine is a 6-core Intel-based, 2018-era Alienware gaming PC with 16GB RAM.

I’ve got 64GB RAM arriving today. I don’t expect it to help with performance, but it should enable me to run bigger models.
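Back-of-envelope, RAM mostly caps which models fit rather than how fast they run on CPU. Something like this, where the 1.2 overhead factor is just a guess for the KV cache and runtime buffers:

```python
def approx_model_ram_gb(params_billion: float, bits_per_weight: int = 4, overhead: float = 1.2) -> float:
    """Very rough RAM estimate for running a quantized model on CPU.

    params_billion: model size, e.g. 7 for a 7B model
    bits_per_weight: 4 for Q4 quantization, 16 for fp16, etc.
    overhead: fudge factor for KV cache, buffers, and the runtime itself
    """
    return params_billion * (bits_per_weight / 8) * overhead

for size in (7, 13, 70):
    print(f"{size}B @ Q4  ~= {approx_model_ram_gb(size):.0f} GB RAM")

# 7B  ~=  4 GB -> fits in 16 GB
# 13B ~=  8 GB -> fits in 16 GB
# 70B ~= 42 GB -> needs the 64 GB upgrade
```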

Are you using Start9?

No, this is just a repurposed, 2018-era Alienware PC.

What kind of GPU, and how much VRAM does it have? AFAIK running inference is constrained by GPU VRAM.

8B models run very fast on my 8GB-VRAM GPU.

Good question! I’m not sure, I’ll have to check.
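If PyTorch with CUDA happens to be installed, this is a quick way to check from Python; otherwise `nvidia-smi` or Task Manager shows the same thing:

```python
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU:  {props.name}")
    print(f"VRAM: {props.total_memory / 1024**3:.1f} GB")
else:
    print("No CUDA GPU visible -- inference will fall back to CPU.")
```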

You’re the one who is slow. πŸ˜†

πŸ˜‚πŸ˜‚