100% agreed. Statistically, AI is trained on social media posts and the garbage throwaway exercise/practice code in personal GitHub repos. Meanwhile, good specs and code (i.e. most things of proper size, with, for example, a complex domain and a rich DDD model, Hexagonal Architecture built from well-crafted specs, a proper test pyramid, good CI/CD practices, observability, etc.) are the exception: examples often locked away in private repos or scattered across books and rare content that make up, what, less than 0.00001% of the datasets these models are trained on.
Good specs and code are basically irrelevant compared to all of the social media posts and horrible code the models ingest.
I don’t get why people are even surprised by specs full of emojis, agents publishing PII to public S3 buckets, or agents deleting tests when asked to fix something. This is exactly how I’d expect models to behave given the training data they’ve been fed.