Imagine an empty room with 1000 coins on the floor. The room sits above a subway line, and when trains go by they jostle the coins and flip some over.
The room is locked and you have the only key to this room.
You come into this room for the first time on Tuesday and you count 300 heads. On Wednesday you count 400 heads. What do you believe the count was on Monday and why?
And what does this have to do with the universe at large?

Source: x.com/unclebobmartin/status/1824068375640678769
It's been sobering to talk with engineers who use these tools, day-in, day-out.
I sense that AI tooling is not yet quite where engineers want it to be (nor where some vendors claim their tools are), but it’s headed in the right direction of making us a lot more productive.

Source: x.com/GergelyOrosz/status/1824014434261418185
Uuuu maybe it’s time to bring my viking followers to X … haven’t touched fb since 2020 when their stupid ai messed up my ads account

Source: x.com/nfkmobile/status/1824021380913942893
The latest Windows exploit is so bad that they got the social team out here trying to make people update lmao

Source: x.com/t3dotgg/status/1824011521346982200
Neo also made a good list.
Check it out!

Source: x.com/denicmarko/status/1823826901301125413
“[Agile] does not attempt to forecast, it attempts to be as productive as possible while not knowing the future. It focuses on resilience as a risk management strategy, not anticipation.”
Chris Morris (@the_chrismo by way of @tottinge)

Source: x.com/allenholub/status/1823957536141349284
The advertisers left on X are… scam projects impersonating the owner of the site.
And X lets this fly: not even vetting obvious crypto scams.
This is an actual ad (and a scam) that is allowed to run here, while X is busy suing advertisers to come back.

Source: x.com/GergelyOrosz/status/1823955061266948249

What if fetch was, like, better? I made a video discussing the possibility and the utility of libraries like ky (and why they might not be worth it)

Source: x.com/t3dotgg/status/1823961237874729068
I talk about a lot of things (e.g. working without estimates, not using up-front "requirements," the user's story, &c.) that can be challenging. I've a class coming up in three weeks [Practical Agility: From Stories to Code -- https://holub.com/classes] that covers all these topics, and more. Check it out!

Source: x.com/allenholub/status/1823780904457920627
There are a few words/phrases that describe pretty-good things but immediately set off a warning klaxon in my mind:
innovative
disruptive
here's a great idea!
the customers will love this
team player
Steve Jobs would have...
Care to add to the list?

Source: x.com/allenholub/status/1823788075161718942
ooo well ... lol Grok being GROK rofl :D WHO ... GFY lol

Source: x.com/nfkmobile/status/1823896766963405096
If this happened at an Apple event they would have murdered the engineers responsible

Source: x.com/t3dotgg/status/1823938397469335971
I'll start:
@natmiletic
@Shefali__J
@RitikaAgrawal08
@webdevluc
@RaulJuncoV
@alexxubyte
@milan_milanovic
@csaba_kissi
@TreciaKS
and many more.
Give them a follow!

Source: x.com/denicmarko/status/1823677936962003418
Share your favorite content creators.

Source: x.com/denicmarko/status/1823675457016893861
For those of you who want a more in-depth look at how I work, I've scheduled a class. This class covers pretty much the whole process of creating software with agility, using the user's story as a driver. Hope to see you there!

Source: x.com/allenholub/status/1823185179013497267
Love curry

Source: x.com/wesbos/status/1823172515910631778
Attempting to migrate from Styled Components → Panda CSS, but keeping the styled.div`` API so I can have server component support

Source: x.com/wesbos/status/1823356244633047398
SQL injection-like attack on LLMs with special tokens
The decision by LLM tokenizers to parse special tokens in the input string (<|endoftext|>, etc.), while convenient-looking, leads to footguns at best and LLM security vulnerabilities at worst, equivalent to SQL injection attacks.
!!! User input strings are untrusted data !!!
In SQL injection you can pwn bad code with e.g. the DROP TABLE attack. In LLMs we'll get the same issue, where bad code (very easy to get wrong with current Tokenizer APIs and their defaults) will parse the input string's special-token descriptors as actual special tokens, mess up the input representations, and drive the LLM out of the distribution of its chat templates.
Example with the current huggingface Llama 3 tokenizer defaults:
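The original post shows this as a screenshot; a minimal Python sketch of the same default behavior (the checkpoint name is an assumption, and the ids follow the post's description):

    from transformers import AutoTokenizer

    # Hugging Face Llama 3 tokenizer with default settings
    tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

    # A user-supplied string that merely *mentions* a special token
    ids = tok.encode("<|end_of_text|>")
    print(ids)  # [128000, 128001] -- BOS silently prepended, and the
                # string parsed into the real <|end_of_text|> special token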
Two unintuitive things are happening at the same time:
1. The <|begin_of_text|> token (128000) was added to the front of the sequence.
2. The <|end_of_text|> token (128001) was parsed out of our string and the special token was inserted. Our text (which could have come from a user) is now possibly messing with the token protocol and taking the LLM out of distribution with undefined outcomes.
I recommend always tokenizing with two additional flags: disable (1) with add_special_tokens=False and (2) with split_special_tokens=True, and add the special tokens yourself in code. Both of these options are, I think, a bit confusingly named. For chat models, I think you can also use the Chat Templates API's apply_chat_template.
With this we get something that looks more correct, and we see that <|end_of_text|> is now treated as any other string sequence, and is broken up by the underlying BPE tokenizer as any other string would be:
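Reconstructing that second screenshot as code, continuing the sketch above (the exact BPE split of the lookalike string will vary):

    ids = tok.encode(
        "<|end_of_text|>",
        add_special_tokens=False,   # (1) don't silently prepend <|begin_of_text|>
        split_special_tokens=True,  # (2) treat special-token lookalikes as plain text
    )
    print(ids)  # ordinary BPE pieces; the special id 128001 no longer appears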
TLDR: imo, calls to encode/decode should never handle special tokens by parsing strings; I would deprecate this functionality entirely and forever. Special tokens should only be added explicitly and programmatically by separate code paths. In tiktoken, e.g., always use encode_ordinary. In huggingface, be safer with the flags above. At the very least, be aware of the issue, and always visualize your tokens and test your code. I feel like this stuff is so subtle and poorly documented that I'd expect somewhere around 50% of the code out there to have bugs related to this issue right now.
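For tiktoken, the contrast looks roughly like this (ids are for the cl100k_base encoding):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    # encode_ordinary never parses special tokens out of the input string:
    print(enc.encode_ordinary("<|endoftext|>"))  # plain BPE pieces

    # encode() refuses special-token strings by default (raises ValueError);
    # it only emits the real special id if you explicitly allow it:
    print(enc.encode("<|endoftext|>", allowed_special={"<|endoftext|>"}))  # [100257]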
Even ChatGPT does something weird here. At best it just deletes the tokens; at worst it confuses the LLM in an undefined way. I don't really know what happens under the hood, but ChatGPT can't repeat the string "<|endoftext|>" back to me:
Be careful out there.

Source: x.com/karpathy/status/1823418177197646104
Solution writeup is now public:

Source: x.com/marktenenholtz/status/1823400293284917758