Glyph
he/him. You’ve probably heard of me because I’m the founder of the Twisted Python networking engine open source project. But I’m also the author and maintainer of several other smaller projects, a writer and public speaker about software and the things software affects (i.e., everything), and a productivity nerd due to my ADHD. I also post a lot about politics; I’d personally prefer to be apolitical, but unfortunately the global rising tide of revanchist fascism is kind of dangerous to ignore.
Replying to 3ac0182c...

nostr:npub1pfe56vzppw077dd04ycr8mx72dqdk0m95ccdfu2j9ak3n7m89nrsf9e2dm I agree that it's rude and bad to do this, but GPT-4 has a high enough hit rate IME that this part seems like a stretch:

> These tools can’t answer questions; they mash words around, and will make up nonsense.

They definitely can answer questions. With RLHF, that is specifically what they're designed/trained to do, and they're pretty good at it in many domains. But, posting the answer without checking it is, as you say, either lying or bullshit.

nostr:npub1ve7g5q4lsth9z6n39mt9sctj8708sxcn465ucm8m9fancgg02l3ql8ydyh also while the marketing claims are that it’s more factual and reliable, academic literature does not seem to bear that out as far as I’ve seen. For a recent example, https://arxiv.org/pdf/2307.09009.pdf

I’ve seen it do okay at *parsing* tasks, where it’s only responsible for interpreting input rather than producing output. Still not 100% reliable, but if you can check its work it doesn’t seem too bad. A “calculator for words”, if you can structure your expectations appropriately.
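A minimal sketch of that “check its work” pattern, assuming a hypothetical `ask_llm()` helper as a stand-in for whatever client library you use (nothing here is a real API): the model only *interprets* the input, and its output is verified mechanically before anything trusts it.

```python
# Sketch of the "check its work" pattern. ask_llm() is hypothetical,
# a stand-in for whatever LLM client you actually use.
from datetime import datetime


def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM client call."""
    raise NotImplementedError


def parse_date_with_llm(freeform: str) -> datetime | None:
    """Use the model only to *interpret* input; verify before trusting."""
    answer = ask_llm(
        f"Rewrite this date as ISO 8601 (YYYY-MM-DD), nothing else: {freeform!r}"
    )
    try:
        # Independent check: reject anything that isn't a real ISO date.
        return datetime.strptime(answer.strip(), "%Y-%m-%d")
    except ValueError:
        return None  # The model made something up; don't use it.
```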

Replying to Glyph

nostr:npub1ve7g5q4lsth9z6n39mt9sctj8708sxcn465ucm8m9fancgg02l3ql8ydyh nostr:npub1nq52crat03tppdmz5xnczxzuyx9qv3xjtrj5af2r5ug5r0tp3teqnvlyhk my own direct experience of using GPT for this is the same as the top of the thread though: its output is rarely even syntactically valid unless I’m asking it to regurgitate a tutorial’s examples directly. It has gotten me into the vicinity of a term that I could then look up in reference docs a few times, but asking it to _write code_ just doesn’t work unless a human being already wrote a very close analogue of the code I need and it’s in the training data

Replying to 3ac0182c...

nostr:npub1nq52crat03tppdmz5xnczxzuyx9qv3xjtrj5af2r5ug5r0tp3teqnvlyhk nostr:npub1pfe56vzppw077dd04ycr8mx72dqdk0m95ccdfu2j9ak3n7m89nrsf9e2dm it's a balancing act, but they can save a lot of time for simple tasks where lightly editing stackoverflow posts is the right answer. Lots of stuff is like that: bash commands with sed/xargs/etc., graphing data with matplotlib... Unless you specialize and do it all the time, it's faster to have the computer make up the code and manually touch it up if needed than to reread the flags section of the sed man page for the 50th time and forget it before the next time you use it.
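For a sense of the kind of task being described, here is a sketch of the routine matplotlib boilerplate in question, which is easy to verify at a glance once it runs; the data and labels are invented for illustration.

```python
# Routine plotting boilerplate: trivial to touch up by hand and to
# verify by eye. Data and labels are invented for illustration.
import matplotlib.pyplot as plt

xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

fig, ax = plt.subplots()
ax.plot(xs, ys, marker="o")
ax.set_xlabel("trial")
ax.set_ylabel("measurement")
ax.set_title("quick sanity-check plot")
fig.savefig("plot.png")
```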

nostr:npub1ve7g5q4lsth9z6n39mt9sctj8708sxcn465ucm8m9fancgg02l3ql8ydyh nostr:npub1nq52crat03tppdmz5xnczxzuyx9qv3xjtrj5af2r5ug5r0tp3teqnvlyhk this is an unrelated use of the technology, but my personal experience is that they don’t function very well for this. When senior engineers use it we discover that the output is riddled with errors and it’d be faster to configure our editors to just generate any boilerplate that we need frequently. When junior engineers use it they produce code riddled with errors they’re not catching. If we’re lucky they’re just bugs and not vulnerabilities. https://gist.github.com/0xabad1dea/be18e11beb2e12433d93475d72016902

Replying to b52f447e...

nostr:npub1pfe56vzppw077dd04ycr8mx72dqdk0m95ccdfu2j9ak3n7m89nrsf9e2dm

It's amazing how many programmers and colleagues tell me they're using it to write software (even bigger programs), because every bit of code I've gotten out of them, if it isn't just buggy and wrong, is clearly mashed together from stack overflow and blog posts.

Which, tbf, these are the kinds of people who would just mash together SO answers themselves. But by getting the LLM to do it, you also get little hallucinated bugs and extras! And you still learn less than nothing!

nostr:npub1nq52crat03tppdmz5xnczxzuyx9qv3xjtrj5af2r5ug5r0tp3teqnvlyhk we are definitely learning some interesting things about the human mind from LLMs but not because they are like the human mind

Replying to 64a2ba94...

nostr:npub1pfe56vzppw077dd04ycr8mx72dqdk0m95ccdfu2j9ak3n7m89nrsf9e2dm I think "automatic free association" is one of the best descriptions of LLMs I've read. Thanks for that.

nostr:npub1nrjyfk7lemfmfqy6c4z2tpcdcfj6ry80745rhcedrt6ft8q7jycsec40hs thanks, this is a phrase I’d love to popularize :-). The difference between “brainstorming” and “bullshit” is largely a matter of context and intent rather than content. More than half of the problems with these things are dishonest marketing and the attendant incorrect user expectations of what the tools are doing.

Replying to Lenny

nostr:npub1pfe56vzppw077dd04ycr8mx72dqdk0m95ccdfu2j9ak3n7m89nrsf9e2dm somehow we've discovered the only response more annoying than "let me google that for you"

nostr:npub1hfyktnpxdkxc4n298fzl6fdwsm26hg2yhzrp5w2fq4uqunyg4mmqw9aw7q LMGTFY is at least understandable when people are clogging up a support channel with requests for easily-discovered information that more or less proves they’re just asking volunteers to do their homework for them. Still rude, still a bad idea, and it’s definitely metastasized into being an obnoxious quip from people who have no idea what the results are or whether they answer the question. But in some cases it’s an understandable frustration.

“I asked an LLM” is always wrong.

nostr:npub17sgpd5lp7xg7s53lgvr5tutnv7v8d566e6auv3kkzke0jsjt3lkqs4p3zu as with many such forms of nostalgia, I suspect that those people don’t actually miss the dock, they miss their twenties.

We. We miss our twenties.

Using an LLM to discover search terms or get inspiration for research strategy which you can *independently verify* is like using a frying pan to cook a delicious frittata of facts. Posting the answers that it gives you as useful information that is true is like heating up the frying pan and then sticking your tongue directly on the pan in an attempt to eat it. The LLM output is the heat, not the food. Do not eat it.

If you want, you can ask an LLM and then do your *own* fact checking of its answer, using clues you find in its output. This can be helpful, as sometimes the phrasing you think of for a search won’t be the right magic words, and an LLM can help you find those. What these tools are doing is more like “automatic free association” than “answering questions”. They can be a useful *tool* for answering questions, but the way you use them is critically important.

Please, please do not ever respond to a post with an answer like “ChatGPT says…” or “Claude told me…”. It is very rude.

It is wrong. These tools can’t answer questions; they mash words around, and will make up nonsense. When the machine does it, it’s just gibberish, but by posting it, you’re turning it into a lie, and anyone who posts or repeats it without attribution will turn it into disinformation.

It wastes time. Now everyone has to fact-check you instead of researching the question.

nostr:npub1p93hvunhq2nsl4hxt40zc3gmwphw74dv5p9eycn39yzmxeshz3kquwrd82 There are apps that act more like paper. Playing with "Penbook" was great fun. But when I'm at my desk, I'll either use my actual paper notebook for journaling or just type for any other activity. I gather that there may be ways to do after-the-fact handwriting recognition, but I haven't figured that out.