Daniel Wigton
Catholic stay at home father of 6. Interested in spaceflight, decentralized communication, salvation, math, twin primes, and everything else.

This note contains no sarcasm.

Llama3.3 70B is the most useful model that I have tried. Anything smaller and you have to stick to very general knowledge. Asking for niche knowledge from a small model is a recipe for hallucinations.

I use it both for conversational AI, to help me narrow search terms, and for coding help via the Continue AI VS Code plugin. I use, I think, DeepSeek Coder for autocomplete.

The main drawback is that it is somewhat slow. I get 3.3 tokens per second, which is equivalent to talking to someone who types 150 words per minute.

That is actually helpful because it is not so slow as to be intolerable but not so instant that I don't try to figure things out on my own first.

It does require some decent hardware though. I've got a 4090, a 13900K, and 64 gigabytes of RAM running at 6400 MT/s. That last number is key. The 4-bit quantization of llama3.3 is 42 GB. With 24 GB of VRAM, that leaves 18 GB that have to be processed by the CPU for each token.

The result is that the GPU is actually not doing much. You probably don't need a 4090, just as much VRAM as you can get. A 5090 with 32 GB of VRAM should be able to do about 6 tokens per second simply for having only 10 GB to process on the CPU.
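If you want to play with the numbers yourself, here is a rough back-of-the-envelope sketch in Python. It assumes decoding is bottlenecked on streaming the CPU-resident weights from system RAM once per token, and it infers the effective bandwidth from my observed 3.3 tokens per second; the model size and VRAM figures are the ones above, everything else is an estimate.

```python
# Rough estimate of token rate when a quantized model spills out of VRAM.
# Assumption: decoding is bound by streaming the CPU-resident weights
# from system RAM once per token; GPU time is ignored.

MODEL_GB = 42.0      # 4-bit quant of llama3.3 70B
OBSERVED_TPS = 3.3   # measured with 24 GB of VRAM (4090)

def cpu_resident_gb(vram_gb: float) -> float:
    """Weights that don't fit in VRAM and must stream from system RAM."""
    return max(MODEL_GB - vram_gb, 0.0)

# Infer effective CPU-side bandwidth from the observed rate:
# 18 GB per token at 3.3 tokens/s is roughly 59 GB/s.
effective_bw_gbps = cpu_resident_gb(24.0) * OBSERVED_TPS

def estimated_tps(vram_gb: float) -> float:
    """Predicted tokens/s if CPU-side streaming stays the bottleneck."""
    spill = cpu_resident_gb(vram_gb)
    return float("inf") if spill == 0.0 else effective_bw_gbps / spill

# Sanity check on the typing analogy: 3.3 tok/s * ~0.75 words/token * 60 s
# comes out to about 150 words per minute.
print(f"effective bandwidth: {effective_bw_gbps:.0f} GB/s")
print(f"5090 with 32 GB VRAM: {estimated_tps(32.0):.1f} tokens/s")  # ~5.9
```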

I don't have to venture out. My wife works at a hospital so all I have to do is go to the gift shop's website, select something nice, add to cart, checkout, enter details, get notified that I can't checkout as guest since I apparently already have an account, check password manager, come up empty, use the reset password feature, check for reset email, check junk folder for reset email, start all over using a different email.

Such convenient process. If only I could just do some sort of digital signature to identify myself.

Didn't know what to do for Valentine's Day but luckily the flower shop sells these fancy Belgian style ales.

Peace comes from knowing that you are not the source and are not the end. I have no internal knowledge or being that is not a gift from above. I can accept that and trust that the giver loves me. Agency comes from trusting the giver to know what to give me, enough that I can use it freely.

Does anyone else feel like USAID is just being used as a fall-guy for the CIA/FBI/NSA/(whoever is tasked with logging my notes)?

Wordle 1,335 5/6

β¬›πŸŸ¨πŸŸ¨β¬›β¬›

β¬›πŸŸ¨β¬›β¬›πŸŸ¨

πŸŸ¨β¬›β¬›πŸŸ¨β¬›

πŸŸ¨πŸŸ©πŸŸ¨πŸŸ¨β¬›

🟩🟩🟩🟩🟩

Yes. It was the Zend Framework and Magento for me. I got so sick of having to learn new incantations that I created my own PHP MVC book of spells that, instead of doing "convention over configuration," did convention by configuration. I.e., the framework would do magic however I want (roughly the idea sketched below).

Still need to go back and finish modernizing that one.
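A minimal sketch of what I mean, in Python rather than the original PHP, with made-up names: handlers are still resolved by naming convention, but the convention itself comes from configuration instead of being hard-coded into the framework.

```python
# Minimal sketch of "convention by configuration": handlers are still
# resolved by a naming convention, but the convention itself is read
# from configuration instead of being hard-coded into the framework.
# All names here are hypothetical, not from the original PHP framework.

config = {
    "controller_pattern": "{name}_controller",  # URL segment -> factory name
    "action_pattern": "handle_{name}",          # URL segment -> method name
}

def blog_controller():
    class Blog:
        def handle_show(self, post_id):
            return f"showing blog post {post_id}"
    return Blog()

def resolve(path: str):
    """Map a path like /blog/show/42 to blog_controller().handle_show('42')."""
    segments = path.strip("/").split("/")
    factory = globals()[config["controller_pattern"].format(name=segments[0])]
    action = config["action_pattern"].format(name=segments[1])
    return getattr(factory(), action)(*segments[2:])

print(resolve("/blog/show/42"))  # showing blog post 42
```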

Perhaps the full model is cheaper than o1 etc., but the distills are pointless. They have the exact same per-token cost to run as the models they are based on, but perform worse.

Yup. Give 3.1 or 3.3 instructions to show their work step by step and correct any mistakes, and they do light years better. 3.3 is better even without a fancy prompt.
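The exact wording varies; something like this hypothetical system prompt (my own phrasing, nothing official) is enough:

```python
# One way to phrase the step-by-step instruction; the wording is my own.
SYSTEM_PROMPT = (
    "Show your work step by step. After each step, re-check the earlier "
    "steps and correct any mistakes before continuing. Only state the "
    "final answer once the reasoning is complete."
)
```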

I really don't get the DeepSeek love. I haven't tried the full model, but the 70B parameter distill is trash. It isn't actually a reasoning model; it merely apes being one. It is really good at sounding like it is reasoning, but it hallucinates far more than the llama3.3 model on which it is based.

I suspect the full model has similar features. It is reassuring to users to see it attempting a rationalization, but the actual output isn't that great.

Wordle 1,334 3/6

πŸŸ¨β¬›πŸŸ¨β¬›β¬›

β¬›πŸŸ©πŸŸ¨πŸŸ¨β¬›

🟩🟩🟩🟩🟩

Team nostr!

It is nice how a really good furnace will just work and work until the coldest night of the year.

Replying to YODL

Inspiring. And at the center of dust?

You do only live once, so you might as well do whatever you do correctly. You can't fix your life in post-production.

This is why I refuse to use millennial acronyms. Not going to have that blot on my life.