You can't "vibe code" an app like nostr:nprofile1qqs83nn04fezvsu89p8xg7axjwye2u67errat3dx2um725fs7qnrqlgzqtdq0

I used to have doubts, now I'm 100% certain.

I say this after having created #purplestack with exactly that idea. (It can still do smaller apps just fine.)

These agents/LLMs are just not good enough. With the latest models including GPT-5, progress on this front appears to have stalled.

Medium-term bearish on AI systems not involving regular human input.



Discussion

It's possible, but I think LLM capability has plateaued. You have to extend the brain by adding memory and other functions. The rest of what's missing isn't LLMs but regular code.

What does that look like in practice? MCP servers and context files, or something else?

Retrieval-augmented generation helps a lot, and it still has room for improvement imo. If we start seeing more specialized LLMs, they could also start cross-referencing each other using RAG. I don't think this exists yet, though.
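A minimal sketch of that cross-referencing idea, assuming hypothetical embed() and call_llm() stand-ins for whatever embedding model and chat model you'd plug in (none of this is a specific API):

```python
import numpy as np

# Hypothetical stand-ins; swap in a real embedding model and chat model.
def embed(text: str) -> np.ndarray:
    raise NotImplementedError

def call_llm(system_prompt: str, user_prompt: str) -> str:
    raise NotImplementedError

def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
    """Rank documents by cosine similarity to the query and keep the top k."""
    q = embed(query)

    def score(d: str) -> float:
        v = embed(d)
        return float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))

    return sorted(docs, key=score, reverse=True)[:k]

def cross_reference(question: str, corpus: list[str]) -> str:
    """One 'specialized' model exposes its corpus; another answers from the
    retrieved snippets instead of its own parametric memory."""
    context = "\n\n".join(retrieve(question, corpus))
    return call_llm(
        system_prompt="Answer using only the provided context; say so if it's not there.",
        user_prompt=f"Context:\n{context}\n\nQuestion: {question}",
    )
```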

An AI script on a cron job. It would manage a mind-map that fits entirely in its context window. It would use this to produce tasks that would be fed into the system prompts of other AI calls. It would need specialized tools to verify results (e.g., a tool to navigate an Android emulator and screenshot it). Upon getting results, it would update its mind-map and repeat.
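A minimal sketch of one tick of that loop, assuming hypothetical call_llm() and verify_on_emulator() placeholders and a JSON mind-map on disk (nothing here is a specific API):

```python
import json

# Hypothetical placeholders; swap in whatever model call and emulator tooling you actually use.
def call_llm(system_prompt: str, user_prompt: str) -> str:
    raise NotImplementedError

def verify_on_emulator(task: dict, output: str) -> dict:
    """Drive an Android emulator, screenshot it, and return an observation."""
    raise NotImplementedError

MIND_MAP_PATH = "mind_map.json"  # persisted between cron runs

def run_once() -> None:
    """One cron tick: read the mind-map, plan tasks, execute and verify them, update the map."""
    with open(MIND_MAP_PATH) as f:
        mind_map = json.load(f)

    # 1. Planner call: the whole mind-map fits in the context window.
    tasks = json.loads(call_llm(
        system_prompt="You maintain this project mind-map. Reply with the next tasks as a JSON list.",
        user_prompt=json.dumps(mind_map),
    ))

    # 2. Worker calls: each task becomes the system prompt of a separate AI call,
    #    and a specialized tool checks the result instead of trusting the model.
    results = []
    for task in tasks:
        output = call_llm(system_prompt=task["instructions"], user_prompt=task["input"])
        results.append({"task": task, "output": output,
                        "observation": verify_on_emulator(task, output)})

    # 3. Update call: fold verified results back into the mind-map and persist it,
    #    so the next cron run continues from here.
    updated = call_llm(
        system_prompt="Update the mind-map to reflect these verified results. Reply with JSON only.",
        user_prompt=json.dumps({"mind_map": mind_map, "results": results}),
    )
    with open(MIND_MAP_PATH, "w") as f:
        f.write(updated)

if __name__ == "__main__":
    run_once()
```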

To scale this (if the mind-map exceeds the context window), you would first show the AI a shallow version of the map, and it would call a tool to expand branches of it before generating any tasks.
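Continuing the same sketch, the shallow-view-plus-expand step might look like this (again hypothetical; it reuses call_llm from the sketch above):

```python
import json

def shallow_view(node: dict, depth: int) -> dict:
    """Keep only titles down to `depth`; anything deeper is collapsed."""
    if depth == 0:
        return {"title": node["title"], "children": "<collapsed>"}
    return {"title": node["title"],
            "children": [shallow_view(c, depth - 1) for c in node.get("children", [])]}

def get_branch(root: dict, path: list[str]) -> dict:
    """Follow a list of child titles from the root and return that subtree in full."""
    node = root
    for title in path:
        node = next(c for c in node.get("children", []) if c["title"] == title)
    return node

def plan_with_expansion(mind_map: dict) -> list[dict]:
    """Show the AI a shallow map, let it expand branches via a tool call, then take its tasks."""
    view = shallow_view(mind_map, depth=1)
    expanded: dict = {}
    while True:
        reply = json.loads(call_llm(  # call_llm as defined in the earlier sketch
            system_prompt=('You see a collapsed mind-map plus branches you already expanded. '
                           'Reply with {"expand": ["path", "of", "titles"]} to open a branch, '
                           'or {"tasks": [...]} when ready to plan.'),
            user_prompt=json.dumps({"map": view, "expanded": expanded}),
        ))
        if "tasks" in reply:
            return reply["tasks"]  # hand these to the worker loop above
        path = reply["expand"]
        expanded["/".join(path)] = get_branch(mind_map, path)
```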

Makes sense. Is this something you came up with, or is it an established idea?

I'm still skeptical, seeing how these LLMs often "forget" things that are well within their context window.

Yeah. I also think the current LLM architecture has stalled. I'm not too upset about it, though. The world needs a minute to digest the giant leaps we've already made, imo.

I'm fine with it stalling too. Time to adjust expectations.

What I really want them to introduce is a bullshit detector. These things constantly lie even when they have the answer right in their context window.

For now.

Not holding my breath

The thing with innovations is that they never work until they do.