Nostr Web Client

I'm trying to understand how to replace my not-very-private but useful ChatGPT 4.0 subscription with Llama 3.1.

ChatGPT translated the system requirements for https://llama.meta.com/ into slightly less confusing versions of "your beefy desktop is nowhere near enough".

So if I still need a compute cluster that would sit idle 99% of the time if I ran it just for myself, I'm kind of back at square one. I'd have to find a way to share these resources efficiently and privately.

Where can I use a powerful AI in a privacy-preserving way? I want to pay with eCash and use it via Tor without any email or other accounts attached.

Alan Siefert 1y ago

What are the system requirements for the 405B and 70B versions? Having trouble parsing the website myself.

Discussion

Leo Wandersleb 1y ago

It's weird they don't prominently put these models side by side to help pick one.

I want the best and I want privacy. Meta itself gave me some numbers here:

nostr:nevent1qvzqqqqqqypzq3huhccxt6h34eupz3jeynjgjgek8lel2f4adaea0svyk94a3njdqy88wumn8ghj7mn0wvhxcmmv9uq3uamnwvaz7tmwdaehgu3dwp6kytnhv4kxcmmjv3jhytnwv46z7qpqaad092282x5ucgxw8gpl9a4c4lwjxvu0fgq55zs5j9zgqdg8p9csdyerw9

Alan Siefert 1y ago

With 64GB RAM I’d guess you could locally run the 70B version just fine with a 4000-series Nvidia GPU. Not sure about the 405B version. Supposedly the 5000-series GPUs are releasing at the beginning of 2025. I wonder if they will have LLMs in mind in their design.
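
A rough way to sanity-check guesses like this is to estimate the memory that the weights alone occupy: parameter count times bytes per parameter. The sketch below is a back-of-the-envelope calculation, not Meta's official requirements. The function name is made up for illustration, and it ignores the KV cache and runtime overhead, so real usage will be higher.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Ignores KV cache, activations and runtime overhead (real usage is higher).

def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Memory needed to hold the weights alone, in decimal GB."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


# Llama 3.1 sizes mentioned in this thread, at common precisions.
for name, params in [("8B", 8), ("70B", 70), ("405B", 405)]:
    for bits in (16, 8, 4):
        print(f"Llama 3.1 {name} @ {bits}-bit: ~{weight_memory_gb(params, bits):.1f} GB")
```

By this estimate the 70B model needs about 140 GB at 16-bit but about 35 GB with 4-bit quantization, which fits in 64GB of RAM. The 405B model still needs roughly 200 GB even at 4-bit.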

Leo Wandersleb 1y ago

I assume the GPUs are needed for performance but not for quality of the results, right? So if for my own private use I don't have any GPU, I could still get the same results but would have to wait a minute instead of a second.

Vision 1y ago

With the 405B model? I don't think you can run this with only 64GB and no GPU, not with acceptable wait times.
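
On the speed question: to a first approximation, the hardware changes how fast tokens come out, not what the model computes, as long as the same weights and quantization are used. Generation speed is typically limited by memory bandwidth, because producing each new token means reading roughly all of the weights once. Here is a hedged sketch of that rule of thumb. The bandwidth figures are illustrative assumptions, not measurements, and the function name is hypothetical.

```python
# Upper-bound estimate: tokens/s <= memory bandwidth / size of the weights.
# Real throughput is lower; this only shows the order of magnitude.

def max_tokens_per_second(weights_gb: float, bandwidth_gb_s: float) -> float:
    """Bandwidth-bound ceiling on generation speed for a dense model."""
    return bandwidth_gb_s / weights_gb


# ASSUMED round numbers for illustration only.
ASSUMED_BANDWIDTH_GB_S = {
    "desktop CPU, dual-channel RAM": 80,
    "unified-memory laptop": 400,
    "high-end consumer GPU VRAM": 1000,
}

# Weight sizes at 4-bit quantization (parameters x 0.5 bytes).
WEIGHTS_GB = {"70B @ 4-bit": 35.0, "405B @ 4-bit": 202.5}

for model, size in WEIGHTS_GB.items():
    for device, bw in ASSUMED_BANDWIDTH_GB_S.items():
        print(f"{model} on {device}: <= {max_tokens_per_second(size, bw):.1f} tokens/s")
```

With these assumed numbers, a 4-bit 70B model tops out at a bit over 2 tokens per second from ordinary desktop RAM, and a 4-bit 405B model stays well under one token per second, assuming the machine has enough memory to hold it at all.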

Martin 1y ago

There is no way you will ever run the 405B-parameter model on consumer hardware.

For all the smaller models, an M3 MacBook Pro with maxed-out RAM works incredibly well. The M3 chip's unified memory architecture makes all the RAM available to the GPU as well, so the GPU can work with up to 96GB.

I can run all of those models with ollama at crazy speed.
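
For anyone who wants to script this, here is a minimal sketch of querying a locally running ollama server over its HTTP API, so prompts never leave the machine. It assumes ollama is listening on its default port 11434 and that a model has already been pulled. The model tag `llama3.1:70b` and the helper names are assumptions for illustration, so substitute whatever `ollama list` shows.

```python
import json
import urllib.request

# Assumed default address of a local ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3.1:70b") -> urllib.request.Request:
    """Build a non-streaming generate request (nothing is sent yet)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "llama3.1:70b") -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]


# Usage (requires a running ollama server with the model pulled):
# print(ask("Explain unified memory in one sentence."))
```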

Alan Siefert 1y ago

I wouldn’t rule out consumer LLM-focused cards with tons of VRAM.

Martin 1y ago

I feel like these cards would always be outdated half a year later as bigger and bigger models come along.

Alan Siefert 1y ago

They probably would, but being able to run the most capable models locally every couple of years would still be huge.

Ľḭṿḙśƫṟãɖãṁṹṧ💫#RunCoreV30 1y ago

Hugging Face?

Leo Wandersleb 1y ago

I see there are many more models on Hugging Face, but I can't find good data on what to expect depending on which model you run and on which hardware. I assume the 405B model produces qualitatively different results than the 8B model, so I probably want the largest one possible for my programming tasks.