I'm trying to understand how to replace my not-very-private but useful ChatGPT 4.0 subscription with Llama 3.1.

ChatGPT translated the system requirements for https://llama.meta.com/ into slightly less confusing versions of "your beefy desktop is by far not enough".

So if I still need a compute cluster that would sit idle 99% of the time if I ran it just for myself, I'm kind of back at square one. I'd have to find a way to share these resources efficiently and privately.

Where can I use a powerful AI in a privacy-preserving way? I want to pay with eCash and use it via Tor, without any email or other accounts attached.

Discussion

What are the system requirements for the 405B and 70B versions? Having trouble parsing the website myself.

It's weird they don't prominently put these models side by side to help pick one.

I want the best and I want privacy. Meta itself gave me some numbers here:

nostr:nevent1qvzqqqqqqypzq3huhccxt6h34eupz3jeynjgjgek8lel2f4adaea0svyk94a3njdqy88wumn8ghj7mn0wvhxcmmv9uq3uamnwvaz7tmwdaehgu3dwp6kytnhv4kxcmmjv3jhytnwv46z7qpqaad092282x5ucgxw8gpl9a4c4lwjxvu0fgq55zs5j9zgqdg8p9csdyerw9

With 64GB RAM I’d guess you could locally run the 70B version just fine with a 4000 series Nvidia GPU. Not sure about the 405B version. Supposedly the 5000 series GPUs are releasing beginning of 2025. I wonder if they will have LLMs in mind in their design.

I assume the GPUs are needed for performance but not for the quality of the results, right? So if I don't have a GPU for my own private use, I could still get the same results, but would have to wait a minute instead of a second.

With the 405B model? I don't think you can run that with only 64 GB and no GPU, not with acceptable wait times.

There is no way you will ever run the 405B-parameter model on consumer hardware.
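
Napkin math on the weights alone backs that up. A rough sketch, assuming the usual fp16 / 8-bit / 4-bit quantization sizes and ignoring KV cache and runtime overhead, so real requirements are somewhat higher:

```python
# Rough memory needed just to hold the weights, ignoring KV cache,
# activations, and runtime overhead (real requirements are higher).
PARAMS = {"8B": 8e9, "70B": 70e9, "405B": 405e9}
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

for name, n in PARAMS.items():
    sizes = ", ".join(
        f"{q}: ~{n * b / 1e9:.0f} GB" for q, b in BYTES_PER_PARAM.items()
    )
    print(f"Llama 3.1 {name} -> {sizes}")

# 70B at 4-bit is ~35 GB, so it can squeeze into 48-64 GB of RAM/VRAM;
# 405B is ~200 GB even at 4-bit, which is why consumer hardware is out.
```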

For all the smaller models, an M3 MacBook Pro with maxed-out RAM works incredibly well. The M3's unified memory architecture makes all the RAM available to the GPU as well, so you can get up to 96 GB.

I can run all the models with ollama with crazy speed.
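
If you want to script against it, the ollama Python client is a thin wrapper around the local server. A minimal sketch, assuming the daemon is running and you've already pulled llama3.1 (the prompt is just an example):

```python
# Minimal chat call against a local ollama server.
# Assumes: pip install ollama, the ollama daemon is running,
# and the llama3.1 model has already been pulled.
import ollama

response = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "Summarize Nostr in one sentence."}],
)
print(response["message"]["content"])
```

Everything stays on the machine; no account or API key involved.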

I wouldn't rule out consumer LLM-focused cards with tons of VRAM.

I feel like these cards would always be outdated half a year later as bigger and bigger models come along.

They probably would, but being able to run the most capable models locally every couple of years would still be huge.

Huggingface?

I see there are many more models on huggingface but can't find good data on what to expect depending on which model you run and on which hardware. I assume the 405B model produces qualitatively different results than the 8B model, so I probably want the largest one possible for my programming tasks.
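
For what it's worth, pulling one of the smaller instruct models straight from Hugging Face looks roughly like this. A sketch, assuming the transformers and accelerate packages, enough RAM/VRAM for the 8B weights (~16 GB in fp16), and that you've been granted access to the gated meta-llama repo:

```python
# Minimal local text generation with Hugging Face transformers.
# Assumes: pip install transformers accelerate torch, access to the
# gated meta-llama repo, and ~16 GB of RAM/VRAM for the 8B weights in fp16.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    device_map="auto",    # use the GPU if there is one, otherwise CPU
    torch_dtype="auto",
)

out = generator(
    "Write a Python function that parses an RFC 3339 timestamp.",
    max_new_tokens=128,
)
print(out[0]["generated_text"])
```

The bigger variants use the same code; only the model id and the memory change, which is basically the napkin math above.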

Been looking into this myself. It is possible to run AI without a GPU, I'm just not sure of the performance. I think the most private way is to run it yourself.

I was looking into running a media server (hi-fi stackable) with a GPU for Jellyfin, so the GPU would also get use for that, and then possibly running an LLM that the whole family can use.

To my understanding, the cluster and GPU setup is for heavy-duty operations. I suspect things parallelize very well that way, and running a single session on such a setup would just not be resource-efficient, but you could fill your disks with Llama 3.1 and use it at your discretion without a GPU.

By the way, ChatGPT 4o may not be the top dog compared with the likes of Claude, depending on what you're using it for.

Yeah, but it happens to be the one that got my subscription, and before subscribing to all of them I'm exploring the self-hosted option.

So ChatGPT 4.0, given some input from the Llama 3.1 website, gave me the negative estimate above. I now asked Meta's model itself:

What would I need to self-host you?

Self-hosting me would require significant resources and expertise. Here's a high-level overview of what you'd need:

Hardware:

A powerful server or cluster with a multi-core processor (at least 8 cores)

Ample memory (at least 64 GB RAM)

Fast storage (at least 1 TB SSD)

Software:

A Linux-based operating system (e.g., Ubuntu)

Docker or a similar containerization platform

A database management system (e.g., MySQL or PostgreSQL)

A web server (e.g., Apache or Nginx)

Dependencies:

Python 3.8+ and various libraries (e.g., Transformers, PyTorch, or TensorFlow)

Node.js (for the web interface)

Expertise:

Proficiency in Linux system administration

Experience with containerization and orchestration (e.g., Docker, Kubernetes)

Familiarity with web development and APIs

Knowledge of natural language processing and AI concepts

Additional considerations:

Large amounts of storage for training data and models

High-bandwidth internet connection for updates and communication

Regular maintenance and updates to ensure security and performance

Please note that self-hosting me is not a trivial undertaking and requires significant resources and expertise. I'm happy to assist you with any questions or tasks you have, but I'm not available for self-hosting at this time.

So you mention no special hardware. I read that special GPUs would be required, and more than one machine. What drawbacks should I expect if I installed you on my beefy developer machine? It has 64 GB of RAM, a 2 TB SSD, a 10 TB HDD, and a 24-core CPU.

Your developer machine is indeed beefy! While it's technically possible to run me on your machine, there are some drawbacks to consider:

1. Resource contention: I'd be competing with your development work for resources, potentially slowing down your workflow.

2. GPU limitations: While your CPU is impressive, I'm designed to leverage multiple high-end GPUs (e.g., NVIDIA V100 or A100) for accelerated processing. Without them, performance might be degraded.

3. Single-machine limitations: I'm built to scale across multiple machines, allowing me to handle a large volume of requests concurrently. On a single machine, I'd be limited to a smaller workload.

4. Heat and power consumption: Running me on your machine could increase heat generation and power consumption, potentially affecting component lifespan.

5. Storage constraints: While you have ample storage, my training data and models are massive (hundreds of GBs). You might need to dedicate a significant portion of your storage to me.

6. Maintenance and updates: I require regular updates and maintenance, which could impact your machine's availability for development work.

7. Limited redundancy: If your machine goes down, I'd be unavailable until it's back online.

Keep in mind that these drawbacks might not be significant if you only plan to use me occasionally or for personal projects. However, if you expect heavy usage or critical availability, a multi-machine setup with specialized hardware would be more suitable.

If you're still interested in running me on your machine, I can provide more detailed guidance on the installation process and potential workarounds for these limitations.

Damn, I lied to the AI about my RAM available. I'm tempted to make it happen 😅 🦙
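
If I do make it happen, the "software" part is a lot less scary than that answer makes it sound. For a single user there's no need for databases or Kubernetes; a toy endpoint in front of a locally running model is closer to it. A sketch, with route, port, and model tag as my own placeholders, assuming Flask and a local ollama install:

```python
# Toy single-user endpoint in front of a locally running model.
# Assumes: pip install flask ollama, the ollama daemon is running,
# and llama3.1 has been pulled. Route and port are arbitrary choices.
from flask import Flask, request, jsonify
import ollama

app = Flask(__name__)

@app.post("/ask")
def ask():
    prompt = request.get_json(force=True).get("prompt", "")
    reply = ollama.chat(
        model="llama3.1",
        messages=[{"role": "user", "content": prompt}],
    )
    return jsonify({"answer": reply["message"]["content"]})

if __name__ == "__main__":
    # Bind to localhost only; expose it as a Tor hidden service or behind
    # a reverse proxy if it should be reachable from outside the box.
    app.run(host="127.0.0.1", port=8080)
```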

Ollama.AI?

Venice.ai?

Did you try unleashed.chat?

It wants my email to even try it out. Doesn't sound very private.

3rd option on the Register page is via NIP-07 browser extension.