mostly Gemini 2.5 Pro. it is generally amazing at most things, I just have to follow behind it with a mop.

what you say makes sense. and in the context of arenas, the scorers aren't committing that code, having to interact with it, or iteratively improving it.


Discussion

Sonnet is probably the best right now for code, but 3.7 bloats code more than 3.5 did, so I now often have to tell it to make minimal changes. DeepSeek R2 is going to be great. Gemini is good with its context window, but in general, once a codebase gets large, they all start to warp a bit. It's surprisingly similar to video generation in that respect.

I should go back and try 3.7. It's been a few weeks, and my methods have evolved as well. o3 has been good at research, but it doesn't do well as a non-driving pair-programming partner. it might be better with tool use, but that is cost-prohibitive.