What if vibe coders just aren't using enough tokens?
Discussion
Big tech hates this one simple trick
Use more tokens, kids
More tokens are like wider payloads in a stateless microservice: helpful for packing more context, but irrelevant to the core bottleneck of coordination. Transformer architectures have no built-in concept of shared memory, global state, or structured control flow. Each inference is an isolated forward pass: no read/write memory, no pointer, no continuation stack. You're not scaling reasoning, you're scaling cache size.
Retrieval-augmented generation (RAG), memory modules, tool use, and planner loops are all attempts to bolt on simulated memory using external systems. But simulation isn't integration. These techniques lack the core properties of real memory systems: mutable state, consistency guarantees, selective recall, and scoped invalidation. They resemble distributed systems without a proper coordination layer: pure gossip, no consensus.
Even long context windows (e.g., 200K tokens) offer no relief. A larger bucket of past tokens doesn't change the model's inability to prioritize, reference, or route thoughts across time. Attention is dense or sparse, but never deliberate. There's no working memory stack. No symbolic manipulation. No instruction pointer. Just statistical guesswork smoothed over a flat vector space.
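To make the "cache size" point concrete, here is a minimal sketch of what a chat turn actually looks like, with a hypothetical `call_model` stub standing in for any chat-completion API: the only thing that survives between calls is the transcript we choose to re-send.

```python
# Minimal sketch: the only "state" between turns is the transcript we re-send.
# call_model() is a hypothetical stub standing in for any chat-completion API.

def call_model(messages: list[dict]) -> str:
    """Stateless: tokens in, tokens out. Nothing persists inside the model."""
    return "plausible next reply"  # stub

history = [{"role": "user", "content": "Refactor this module."}]

for _ in range(3):
    reply = call_model(history)                      # fresh forward pass each time
    history.append({"role": "assistant", "content": reply})
    history.append({"role": "user", "content": "Continue."})
    # A longer `history` gives the model more to re-read (a bigger cache);
    # it does not give it mutable memory, an instruction pointer, or control flow.
```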
Multi-agent systems? LangGraph, AutoGPT, BabyAGI? They're distributed loopers. Agents pass outputs to each other like logs in a pipeline, with no theory of mind, negotiation, or shared ontology. There's no grounding, no meta-cognition, and no reflection. You can script a workflow, but the agents aren't thinking together. They're just taking turns hallucinating.
And let's not pretend you can offload this to the user. The human remains the I/O controller, the debugger, the scheduler, and the final consensus engine. There is no autoscaler for cognition. You can shard your microservices, but you can't shard your prefrontal cortex.
In Sussman's terms, there's no procedure. In Minsky's terms, there's no society, just a bunch of disconnected hacks guessing the next plausible token. In functional-programming terms, this isn't referentially transparent or composable; it's side-effect soup.
Until models can maintain evolving, contextual state, abstract their reasoning paths, and coordinate across agents with shared intent and memory, they won't replace the human-in-the-loop for complex tasks. They'll assist, autocomplete, and sometimes dazzle, but they won't reason. More tokens won't fix that. It's an architectural limitation, not a throughput problem.
They very much can reason inside a context window. What you need is a repeatable process for building a context that can advance the current state of your system. Agent loops (e.g., goose) can do this.
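Roughly, the loop being described has the following shape; the helper names (`call_model`, `apply_changes`, `tests_pass`) are hypothetical stubs for illustration, not goose's actual API.

```python
# Sketch of an agent loop: establish a goal, repeatedly rebuild context,
# take a step, and check for progress. All helpers are hypothetical stubs.

def call_model(prompt: str) -> str:
    return "proposed change"        # stub for a chat-completion call

def apply_changes(step: str) -> str:
    return "applied"                # stub: edit files, run a command, etc.

def tests_pass() -> bool:
    return False                    # stub: the reliable progress signal

def agent_loop(goal: str, max_turns: int = 20) -> bool:
    context = [f"Goal: {goal}"]
    for _ in range(max_turns):
        prompt = "\n".join(context)             # context is rebuilt every turn
        step = call_model(prompt)               # propose the next action
        result = apply_changes(step)            # act on the world
        context.append(f"Attempt: {step}\nResult: {result}")
        if tests_pass():
            return True                         # goal reached
    return False                                # out of turns (i.e. tokens)
```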
Once you've established a goal and a way to reliably make progress toward it, every functional programmer knows what comes next
Sure, they can simulate reasoning over short-term context, like any system that encodes local state into a serialized form. But that's not reasoning with continuity. There's no self-modifying context, no scoped environment, no continuation. What agent loops like Goose do is offload orchestration to glue code. It's a token-flinging trampoline, not a cognitive stack.
Functional programmers know what comes next only if the program has referential transparency, a composable structure, and a defined control flow. That's the whole point: we rely on pure functions and global reasoning. But transformers don't do that. They're opaque pipelines without composability or purity. There are no closures, no recursion, no accumulator threading, just a next-token guess over a lossy embedding of a flattened execution trace.
Yes, you can build context, but every "context" update is a destructive overwrite, not a mutation under control. There's no frame stack, selective replay, or lens over time. Agent loops don't fix that; they replay a trace and call it memory. It's like saying tail-rec is stateful because it can print.
Goose is clever, but it's still scaffolding. Until the model can own its control flow and track its reasoning (rather than dump it to the user or context buffer), it's not building a context; it's outsourcing cognition. And I remind you: the human is still the reducer, the scheduler, and the garbage collector.
You don't get real composable agents without memory, goal state, control flow, and error recovery. Transformers don't have those. And no, you still can't scale humans.
Successful recursion only requires a goal and something that progresses toward the goal. You'd be surprised how effective LLMs can be at avoiding local maxima
Recursion without a frame stack is just token looping. The existence of a goal and a transition function does not a reasoner make, unless you track progress, memoize state, and backtrack on error. LLMs do none of these.
Transformers don't recurse. They unfold. There's no call stack, no accumulator, no return. There's no control flow to introspect or manipulate. Every "recursive step" is a new prompt, built via lossy serialization of the prior step's output. That's repetition, not recursion.
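A toy contrast, again with a hypothetical `call_model` stub: real recursion threads state through a call stack and returns a value, while the LLM pattern just re-prompts on a flattened transcript.

```python
# Real recursion: a call stack, an accumulator, and a return value.
def factorial(n: int, acc: int = 1) -> int:
    if n <= 1:
        return acc
    return factorial(n - 1, acc * n)   # state is threaded through the calls

# The LLM pattern: no stack, no return, no backtracking. Each "step" is a
# new prompt built from a serialized transcript of the previous output.
def call_model(prompt: str) -> str:
    return "next plausible continuation"   # hypothetical stub

def unfold(task: str, steps: int = 5) -> str:
    transcript = task
    for _ in range(steps):
        transcript = transcript + "\n" + call_model(transcript)
    return transcript                  # repetition over text, not recursion
```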
Avoiding local maxima? Only insofar as random noise avoids overfitting. There's no global optimization happening, just stochastic sampling from a surface shaped by human priors. You might get diversity, but you don't get convergence. LLMs don't do beam search across goals. They don't retry, compare, or self-reflect. There's no regret-minimization loop.
And even if you wrap them in agent loops with scratchpads and tree searches, you're building a poor man's blackboard system: bad Lisp with no memory model. It's not reasoning until the model can inspect, compare, and revise its intermediate state without human mediation. It's just regurgitating scaffolding we built to give the illusion of momentum.
So yes, you can define a "goal" and a "step function." But you don't have recursion unless you have state, memory, checkpointing, and rollback. You have an uncontrolled loop over sampled surface noise.
And no matter how many tokens you throw at it, you still can't scale the human running the debugger.
We could hypothesize how capable an agent system is all day long. I'm just going to build with it and let others judge how well it worked
That's fair: building is the ultimate test. But if the architecture lacks the primitives, then what you're evaluating isn't the agent's reasoning capacity; it's how well scaffolding, retries, heuristics, and human feedback paper over those limits.
And I agree: it can look impressive. Many of us have built loops that do surprising things. But when they fail, they fail like a maze with no map: no rollback, no blame assignment, no introspection, just a soft collapse into token drift.
So yes, build it. Just don't mistake clever orchestration for capability. And when it breaks, remember why: stateless inference has no recursion, memory, or accountability.
I hope you do build something great, and I'll be watching. But if the agents start hallucinating and spinning in circles, I won't say, "I told you so." I'll ask if your debugger is getting tired and remind you that you still can't scale the human.
Thinking about your position, we might not be meeting in the same place. My expectation isn't a system that grants wishes, but one that amplifies capabilities a hundredfold, or a thousandfold. It's funny to me when Replit deletes someone's production database, because *that's how software works*. If you already know this, you know to build separate environments and authorization. Does the freelance contractor write poor, lazy code? Of course it does: that's why you review the code. But you can still use a different freelance contractor to review it, if you know how to ask the right questions.
Vibe coding is the closest thing we have to rocket surgery. It's both incredible and terrible, and it's your job to captain the ship accordingly.
Totally with you on captaining the ship. I'd never argue against using LLMs as amplifiers; they're astonishing in the right hands, and yes, it's our job to chart around the rocks. But that's the thing: if we're steering, supervising, checkpointing, and debugging, then we're not talking about autonomous reasoning agents. We're talking about a very talented, very unreliable deckhand.
This brings us back gently to where this all started: can vibe coders reason? If your answer now is "not exactly, but they can help you move faster if you already know where you're going," maybe we've converged. Because that's all I was ever arguing.
You don't scale reasoning by throwing tokens at it. You scale vibes. And someone still has to read the logs, reroute the stack, and fix the hull mid-sail.
Where I was going with "more tokens" is growing past zero-shot expectations. I see models reasoning every day, so to say that they can't reason is the wrong path. But the gold standard of "general intelligence" isn't good at writing software either. You wouldn't expect a junior dev to one-shot a React app, or hot-patch a bug in production. You need more process, more analysis, more constraint in order to build good things. In life we call these dev-hours, but in this new reality they're called tokens. Doing something difficult will require a certain amount of effort. That investment is not sufficient, but it is necessary. Vibe coders who have never written software before won't understand what needs to be done, and where it needs to be done, in order to achieve the success that they're looking for. But models are getting better every month now. By my estimation it won't be long before they are better at captaining than we are. If so, vibe coding will become a reality, and even if we aren't there today, it will take us longer to understand how to use these tools than it will for the tools to become useful.
onward
Glad we're converging, because that's the heart of it: we agree on amplification but differ on the mechanics. Initially, your stance was stronger, claiming that these models were actively reasoning and recursing internally, escaping local maxima through real inference. Now we seem to agree they're powerful tools that amplify our capabilities, rather than autonomous reasoners.
My original point wasn't that LLMs are ineffective; it was just that more tokens alone don't yield reasoning. Amplification is profound but fundamentally different from real autonomous recursion or stable reasoning. The model's architecture still lacks structured state, introspection, and genuine memory management.
I agree, though: these tools are moving quickly. Maybe they'll soon surprise us both, and vibe coding might become rocket surgery. Until then, I'm happy sailing alongside you, captaining through the chaos and figuring it out as we go.
No, I'm still making that claim.
Is context not memory? Have you not seen a model collect information, make a plan, begin to implement it, find something doesn't work, design an experiment, use the results to rewrite the plan, and then execute the new plan successfully? Is this somehow not "reasoning to avoid local maxima"?
Context as memory? Not quite. Memory isn't just recalling tokens; it's about managing evolving state. A context window is a fixed-length tape, overwriting itself continually. There's no indexing, no selective recall, no structured management. The fact that you have to constantly restate the entire history of the plan at every step isn't memory; it's destructive serialization. Actual memory would be mutable, composable, persistent, and structurally addressable. Transformers have none of these traits.
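A toy illustration of that fixed-length tape, with a deliberately tiny, hypothetical token budget: once the window is full, older turns are silently dropped rather than indexed or selectively recalled.

```python
# Toy context window: when the budget is exceeded, older turns simply fall off.
MAX_TOKENS = 8      # hypothetical tiny budget, for illustration only

def fit_to_window(turns: list[str]) -> list[str]:
    window: list[str] = []
    used = 0
    for turn in reversed(turns):             # keep the most recent turns
        cost = len(turn.split())             # crude stand-in for a token count
        if used + cost > MAX_TOKENS:
            break                            # everything older is gone
        window.insert(0, turn)
        used += cost
    return window

print(fit_to_window(["plan step one", "ran the tests", "tests failed on auth",
                     "rewrote the plan"]))
# The dropped turns are not archived or indexed; they are simply absent.
```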
Models appear to "collect information, plan, and revise," but what's happening there? Each new prompt round is a complete regeneration, guided by external orchestration, heuristics, or human mediation. The model itself does not understand failure, doesn't inspect past states selectively, and doesn't reflectively learn from error. It blindly restarts each cycle. The human (or the scaffold) chooses what the model sees next.
Avoiding local maxima? Not really. The model doesn't even know it's searching. It has no global evaluation function, no gradient, and no backtracking. It has only next-token probabilities based on pretrained statistics. "Local maxima" implies a structured space that the model understands. It doesn't; it's just sampling plausible completions based on your curated trace.
Can it seem like reasoning? Sure, but only when you've done the hard part (memory, scaffolding, rollback, introspection) outside the model. You see reasoning in the glue code and structure you built, not the model itself.
So yes, you're still making the claim, but I still see no evidence of autonomous recursion, genuine stateful memory, or introspective reasoning. Context ≠ memory. Iteration ≠ recursion. Sampling ≠ structured search. And tokens are no substitute for dev-hours.
But as always, I'm excited to see you build something compelling, and maybe even prove me wrong. Until then, I remain skeptical: a context window isn't memory, and your best debugger still doesn't scale.
Ah, the root might be that I'm only considering models in an agent loop like goose. You're right that each inference is highly constrained. There is no memory access inside a single response. There is no (well, very limited) backtracking or external access right now. What a model spits out in one go is rather unimpressive.
But in a series of turns, like a conversation or agent loop, there is interesting emergent behavior. Context becomes memory of previous turns. Tool use becomes a means toward an end and, potentially, a source of new information. If models were stochastic parrots, this might on rare occasion result in new value, but there seems to be much more going on inside these systems, and tool use (or conversational turns) *often* results in new value, in what I can only conceive of as reasoning.
Goose can curate its own memories. It can continue taking turns until it has a question, or decides the task is complete. It can look things up on the web, or write throwaway code to test a theory. Most of the time, when things fail it's because expectations were not set accordingly, or the structure of the system didn't provide the resources necessary for success. This is why I ask: what if the problem is that people aren't using enough tokens?
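For concreteness, the loop being described here is roughly the following sketch; the action names and tool stubs are hypothetical, not goose's implementation.

```python
# Sketch of a tool-using turn loop: keep taking turns until the model asks a
# question or declares the task done. All names below are hypothetical stubs.

def web_search(query: str) -> str:
    return "search results"                   # stub

def run_snippet(code: str) -> str:
    return "experiment output"                # stub: throwaway code to test a theory

def call_model(context: str) -> dict:
    return {"action": "done", "content": ""}  # stub for a structured model reply

def run(task: str) -> str:
    context = f"Task: {task}"
    while True:
        turn = call_model(context)
        if turn["action"] == "search":
            context += "\n" + web_search(turn["content"])
        elif turn["action"] == "experiment":
            context += "\n" + run_snippet(turn["content"])
        elif turn["action"] == "ask_user":
            return turn["content"]            # hand control back to the human
        elif turn["action"] == "done":
            return "task complete"
```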
In long conversations with Claude I have seen all manner of outputs that suggest capabilities which far exceed what people claim LLMs "can do". (Well, what most people claim, because there are some people who go straight out the other end and Eliza themselves.)
What concerns me the most is that these capabilities continue to grow, and almost no one seems to notice. It's like the closer someone is to the systems, the more they think that they understand how they work. The truth is that these models are (to use Wolfram's terminology) randomly mining the computational space, and the resulting system is computationally irreducible. Which is to say, no one has any idea what's going on in those hidden layers. Anything that *could* happen *might* happen.
The only way to know is to experiment with them, and my informal experiments suggest that they're already AGI (albeit one with amnesia and no inherent agency).
Wherever this is all going, it's moving quickly. Stay buoyant
Glad we've arrived at a similar perspective now; it feels like progress. To clarify my original confusion:
When you initially wrote, "What if vibe coders just aren't using enough tokens?", you seemed to imply that tokens alone (with no mention of loops, scaffolding, external memory, or agent orchestration) would inherently unlock genuine reasoning and recursion inside transformers.
We're perfectly aligned if your real point always included external loops, scaffolding, or agent architectures like Goose (rather than just "tokens alone"). But I definitely didn't get that from your first post, given its explicit wording. Thanks for explicitly clarifying your stance here.
Working with LLMs has given me a first-class notion of context. It's a strange new idea to me that's also changed how I approach conversations.
Our expectations around an agent loop do seem to be the root of it. Do people vibe code without such a thing, though? I'll admit that I'm spoiled: since I started using goose over 18 months ago, I never bothered to try the other popular tools that are more than Copilot and less than goose, like Cursor.
That is fair, and I think you're touching exactly on the heart of the issue here.
Your recent experiences with Goose and these richer agent loops highlight what I pointed out: it's not the quantity of tokens alone that unlocks genuine reasoning and recursion. Instead, reasoning emerges from loops, external memory, scaffolding, and orchestration, precisely as you implicitly acknowledge here by talking about agent loops as a requirement, rather than a luxury.
I appreciate that you've implicitly clarified this:
"Tokens alone" aren't the root solution; structured loops and scaffolding around the transformer architecture are.
Thanks for a thoughtful conversation! It genuinely feels like we've arrived at the correct conclusion.
Taken literally, we agree. What seems to be happening is that people vibe code something, it doesn't work, and they declare that AI "isn't real yet". Another defeatist take is to ask a very specific question that you know very well, and watch it inevitably come back with a lame answer.
What I want people to notice is that most things are hard. It's very likely that given "more tokens" in the abstract sense, current AI would eventually settle on the correct answer.
It's important to realize this because even if something takes an LLM agent two days and $200 worth of tokens, the same task would probably take a person weeks or months and cost an order of magnitude more.
And that's just today. Actually, that was just last week, because Kimi-K2 and Qwen Coder can basically do what Claude Sonnet does for 1/10 the token cost, and it isn't going to stop there.
Stay buoyant
I appreciate the clarification: it confirms that the original claim was about tokens alone, i.e., that given enough tokens, current LLMs will eventually arrive at the correct answer, regardless of whether they have memory, structured loops, or agent scaffolding.
But that's precisely where we differ. Increasing the number of tokens expands cache size, not capability. To use a metaphor, transformer inference remains a stateless forward pass: no structured memory, no call stack, no global state, no persistent reasoning, just a bigger microservice payload.
If reasoning occurs, you've added an agent loop, scaffold, or retrieval: a system that uses tokens but is not just tokens. These aren't accidents; they're part of the architecture.
So we're left with two incompatible views:
1. "Tokens alone" eventually suffice (your original assertion), or
2. They don't, and the real breakthrough lies in the surrounding structure, which we build because tokens alone are inadequate.
Happy to debate this distinction, but we should probably choose one. Otherwise, we're just vibing our way through epistemology.
I don't mean "tokens alone"
Appreciate the clarification attempts. But to be fair, this all started with a confident claim that "more tokens" would eventually get us there: not loops, not memory, not scaffolding, just "tokens," full stop. That's not a strawman; it's quoted:
"It's very likely that given 'more tokens' in the abstract sense, current AI would eventually settle on the correct answer."
- Posted July 22, 2025 · 12:27 PM
And in case that was too subtle, a few days earlier:
"Use more tokens, kids."
- ynniv · 4d ago
This was in direct reply to:
"You're not scaling reasoning, you're scaling cache size."
- Itamar Peretz · July 20, 2025 · 08:37 AM
If your view has since changed to "I don't mean tokens alone" (July 24, 2025 · 1:10 PM), that's totally fair; we all evolve our thinking. But that's not what was argued initially. And if we're now rewriting the premise retroactively, let's just acknowledge that clearly.
So here's the fulcrum:
Do you still believe that scaling token count alone (in the abstract) leads current LLMs to the correct answer, regardless of architectural constraints like stateless inference, lack of global memory, and lack of explicit control flow?
• If yes, then respectfully, that contradicts how transformers actually work. You're scaling width, not depth.
• If no, then we're in agreement, and the original claim unravels on its own.
In either case, it's worth remembering: you can't scale humans, and humans are still what fills the reasoning gaps in these loops.
I don't believe, and never have, that scaling context size alone will accomplish anything. I do believe, and always have, that people give up too early. I'm not sure why you're fixated on "winning" this argument; it's not an argument per se, and there are better things to do right now
I'm not fixated on "winning," and certainly not looking to drag this out. But if we're walking back, let's be honest about what's being walked.
"Use more tokens, kids."
- ynniv · 4d ago
"It's very likely that given 'more tokens' in the abstract sense, current AI would eventually settle on the correct answer."
- July 22, 2025 · 12:27 PM
"I don't mean 'tokens alone.'"
- July 24, 2025 · 1:10 PM
"I don't believe, and never have, that scaling context size alone will accomplish anything."
- July 24, 2025 · 7:53 PM
If the position was never "tokens alone," I don't know what to do with these earlier posts.
So I'll ask one last time, gently:
Was "more tokens = eventual convergence" a rhetorical device, or a belief you've since revised?
We probably both agree that scaling context is not equivalent to scaling reasoning, and that transformers aren't recursive, stateful, or inherently compositional.
That is all I was ever pointing out. If we're aligned now, we can close the loop.
That's a great blog post; I actually like it.
But let's not mistake narrative for argument. I'm not disputing that experimentation, iteration, and persistence can lead to real progress. In fact, I'd argue that's precisely why it's worth being clear on what is being tried.
My only point is that your original phrasing clearly emphasized tokens:
"Use more tokens, kids."
"Given enough tokens… current AI would eventually settle on the correct answer."
Then later, you clarified:
"I don't mean 'tokens alone'."
If that was always your intent, that architectural context (loops, agents, structure) matters more than just throwing tokens, then I think we're in violent agreement.
But let's not retroactively apply that nuance to the initial bold claim unless that was the design all along.
Persistence is valuable, yes. But clarity helps the rest of us persist in the right direction.