Yeah, that one looks perfect. What do you get with Gemini 3 Pro on the same prompt? I know you don't like it, but for my use cases, e.g. porting code from PyTorch to JAX, or recreating a quick simulation from a paper in Python, they both work well. When I get annoyed by one I use the other.
Discussion
ɪ ʜᴀᴠᴇ ɴᴏᴛ ʜᴀᴅ ɴᴇᴀʀʟʏ ᴀꜱ ᴘᴏꜱɪᴛɪᴠᴇ ᴇxᴘᴇʀɪᴇɴᴄᴇ ᴡɪᴛʜ ᴀɴʏ ᴏᴛʜᴇʀ ᴀɢᴇɴᴛ, ʙᴜᴛ ɪᴛ'ꜱ ɴᴏᴛ ꜱᴏ ᴍᴜᴄʜ ᴛʜᴇ LLM ʙᴜᴛ ᴛʜᴇ ᴀᴄᴛᴜᴀʟ ᴀɢᴇɴᴛ ᴀɴᴅ ʜᴏᴡ ɪᴛ ᴛʀᴀɴꜱʟᴀᴛᴇꜱ ᴡʜᴀᴛ ʏᴏᴜ ᴀꜱᴋ ɪᴛ, ɪɴᴛᴏ ᴛʜᴇ ᴀᴄᴛᴜᴀʟ ǫᴜᴇʀʏ ꜱᴇɴᴛ ᴛᴏ ᴛʜᴇ LLM. ᴄʟᴀᴜᴅᴇ ᴄᴏᴅᴇ/ᴄʟɪ ɪꜱ ᴛʜᴇ ᴍᴏꜱᴛ ʀᴇꜰɪɴᴇᴅ ᴀɴᴅ ᴀᴅᴠᴀɴᴄᴇᴅ ᴀɢᴇɴᴛ. ɪ ᴄᴀɴ ʙᴇʟɪᴇᴠᴇ ᴛʜᴀᴛ ᴀɴᴛɪɢʀᴀᴠɪᴛʏ ɪꜱ ᴘʀᴏʙᴀʙʟʏ ᴛʜᴇ ᴇǫᴜɪᴠᴀʟᴇɴᴛ ꜰᴏʀ ᴛʜᴇ ɢᴇᴍɪɴɪ ᴍᴏᴅᴇʟꜱ ᴀꜱ ᴄʟᴀᴜᴅᴇ ᴄᴏᴅᴇ/ᴄʟɪ ɪꜱ ꜰᴏʀ ᴄʟᴀᴜᴅᴇ.
ᴀʟꜱᴏ, ᴜɴʟᴇꜱꜱ ʏᴏᴜ ᴀʀᴇ ᴜꜱɪɴɢ ᴀɴᴛʜʀᴏᴘɪᴄ'ꜱ ᴛᴏᴏʟ ᴡɪᴛʜ ᴄʟᴀᴜᴅᴇ, ʏᴏᴜ ᴅᴏɴ'ᴛ ɢᴇᴛ ᴛᴏ ᴀᴄᴄᴇꜱꜱ ᴍᴏꜱᴛ ᴏꜰ ɪᴛꜱ ɪᴍᴘᴏʀᴛᴀɴᴛ ꜰᴇᴀᴛᴜʀᴇꜱ, ᴄᴏᴍᴍᴀɴᴅꜱ, ꜱᴋɪʟʟꜱ, ᴀɴᴅ ᴛʜᴇ CLAUDE.ᴍᴅ. ᴛʜᴇ ᴀɢᴇɴᴛ ᴜꜱᴇꜱ ᴛʜᴇꜱᴇ ꜰɪʟᴇꜱ ᴛᴏ ᴀᴜɢᴍᴇɴᴛ ᴛʜᴇ ᴘʀᴏᴍᴘᴛɪɴɢ ᴛᴏ ᴀᴠᴏɪᴅ ᴄᴏᴍᴍᴏɴ ᴅᴜᴍʙ ᴍɪꜱᴛᴀᴋᴇꜱ.
I don't like CLI tools because I cannot easily check what they are doing until everything is done. In cline/roocode I can quickly review file changes before saving them, or easily revert code a few steps back without git. It is slower than proper vibe coding, but it saves me time in other ways. I have not tried AGENTS.md yet, but it is a similar idea to clinerules.
claude code cli gives you a lot of visibility into the process: you can show its thinking and its ongoing processes using shortcut keys. i virtually don't even use the IDE for anything anymore, but i looked and there are no sleek, minimal IDEs that aren't based on VSCode, and i'm not using that javascript crap. i have a perpetual license now for a good version of goland, and for web apps i use webstorm non-commercial.
Will give it another try.
This might be useful if you have not tried it before https://github.com/steveyegge/beads
ɪ'ᴍ ɢᴏɪɴɢ ᴛᴏ ꜰᴏʀᴋ ᴛʜᴀᴛ ᴛᴏ ᴍʏ ɢɪᴛᴇᴀ, ᴀɴᴅ ʟᴏᴏᴋ ᴀᴛ ɪᴛ ʟᴀᴛᴇʀ. ᴛʜᴇ ᴛʜᴇᴏʀʏ ᴏꜰ ɪᴛ ꜱᴏᴜɴᴅꜱ ʟɪᴋᴇ ᴀ ᴍᴜᴄʜ ʙᴇᴛᴛᴇʀ ᴀᴘᴘʀᴏᴀᴄʜ.
ɪ ᴡᴀꜱ ʀᴇᴀᴅɪɴɢ ᴀ ꜰᴇᴡ ᴡᴇᴇᴋꜱ ʙᴀᴄᴋ ᴛʜᴀᴛ ᴛʜᴇʀᴇ ᴀʀᴇ ꜱᴏᴍᴇ ʀᴇꜱᴇᴀʀᴄʜᴇʀꜱ ᴡʜᴏ ᴀʀᴇ ᴇʟɪᴍɪɴᴀᴛɪɴɢ ᴛʜᴇ ᴄᴏɴᴠᴇʀꜱɪᴏɴ ᴏꜰ ᴛʜᴇ ᴇᴍʙᴇᴅᴅɪɴɢꜱ ᴏꜰ ᴛʜɪɴᴋɪɴɢ ᴛᴏ ᴛᴇxᴛ, ᴀɴᴅ ꜰɪɴᴅɪɴɢ ᴛʜᴀᴛ ɪᴛ ᴅʀᴀᴍᴀᴛɪᴄᴀʟʟʏ ᴅᴇᴄʀᴇᴀꜱᴇꜱ ᴛʜᴇ ᴛᴏᴋᴇɴ ᴄᴏꜱᴛ ᴀɴᴅ ᴛʜᴜꜱ ɪɴᴄʀᴇᴀꜱᴇꜱ ᴛʜᴇ ᴅᴇᴘᴛʜ ᴏꜰ ᴛʜɪɴᴋɪɴɢ. ᴛʜᴀᴛ, ᴄᴏᴍʙɪɴᴇᴅ ᴡɪᴛʜ ᴀ ᴍᴏʀᴇ ᴇꜰꜰɪᴄɪᴇɴᴛ, ᴇᴍʙᴇᴅᴅɪɴɢꜱ-ɴᴀᴛɪᴠᴇ ɢʀᴀᴘʜ ᴅᴀᴛᴀ ꜰᴏʀᴍᴀᴛ, ᴡᴏᴜʟᴅ ᴘʀᴏʙᴀʙʟʏ ᴘʀᴏᴅᴜᴄᴇ ꜰᴀʀ ᴍᴏʀᴇ ᴇꜰꜰᴇᴄᴛɪᴠᴇ ʀᴇꜱᴜʟᴛꜱ.
ᴏᴘᴛɪᴍɪᴢᴀᴛɪᴏɴ ᴏꜰ ᴄᴏᴅɪɴɢ ᴀɢᴇɴᴛꜱ ɪꜱ ᴏɴʟʏ ᴊᴜꜱᴛ ꜱᴛᴀʀᴛɪɴɢ. ᴍᴏꜱᴛ ᴏꜰ ᴛʜᴇ ʙɪɢ ᴀɪ ᴅᴇᴠ ᴄᴏᴍᴘᴀɴɪᴇꜱ ᴀʀᴇ ꜰᴏᴄᴜꜱᴇᴅ ᴏɴ ᴡʀɪᴛɪɴɢ ᴀɴᴅ ɪᴍᴀɢᴇ/ᴠɪᴅᴇᴏ ɢᴇɴᴇʀᴀᴛɪᴏɴ ꜰᴏʀ ɴᴏʀᴍɪᴇꜱ, ꜰᴇᴡ ᴏᴛʜᴇʀ ᴛʜᴀɴ ᴀɴᴛʜʀᴏᴘɪᴄ ᴀʀᴇ ᴛʜʀᴏᴡɪɴɢ ᴀꜱ ᴍᴜᴄʜ ᴇɴᴇʀɢʏ ɪɴᴛᴏ ᴍᴀᴋɪɴɢ ᴄᴏᴅɪɴɢ ᴡᴏʀᴋ ʙᴇᴛᴛᴇʀ.
ɪ ɢᴀᴠᴇ ᴛʜᴇ ᴘʀᴏᴍᴘᴛ, ʏᴏᴜ ᴛʀʏ ɪᴛ. ɪᴛ ᴡɪʟʟ ʙᴇ ᴛᴇʟʟɪɴɢ, ɪ ᴛʜɪɴᴋ, ᴡʜᴀᴛ ɪᴛ ᴅᴏᴇꜱ.
ᴀ 4 ʏᴇᴀʀ ᴏʟᴅ ᴄʜɪʟᴅ ᴄᴀɴ ᴅʀᴀᴡ ᴀɴ ᴀɴᴀʟᴏɢ ᴄʟᴏᴄᴋ ᴄᴏʀʀᴇᴄᴛʟʏ ɪꜰ ʏᴏᴜ ᴀꜱᴋ ᴛʜᴇᴍ, ʙᴇᴛᴛᴇʀ ᴛʜᴀɴ ᴀʟʟ ᴛʜᴏꜱᴇ ᴏᴛʜᴇʀ ᴍᴏᴅᴇʟꜱ, ᴀɴᴅ ᴏɴʟʏ ᴀ ʙɪᴛ ʀᴏᴜɢʜᴇʀ ᴛʜᴀɴ ᴡʜᴀᴛ ᴄʟᴀᴜᴅᴇ ᴍᴀᴅᴇ.
ᴀʟꜱᴏ, ɪᴛ ᴍᴀʏ ʙᴇ ᴘᴇʀᴛɪɴᴇɴᴛ ᴛʜᴀᴛ ɪ ʜᴀᴠᴇ ᴇxᴛᴇɴꜱɪᴠᴇ ꜱᴋɪʟʟꜱ ᴄᴏɴꜰɪɢᴜʀᴇᴅ ꜰᴏʀ ᴛʜᴇ ʀᴇᴘᴏ ᴡʜᴇʀᴇ ɪ ᴄʀᴇᴀᴛᴇᴅ ᴛʜᴇꜱᴇ ʜᴛᴍʟꜱ. ᴛʜᴀᴛ ᴡᴏᴜʟᴅ ᴘʀᴏʙᴀʙʟʏ ᴍᴀᴋᴇ ᴛʜᴇ ᴄᴏᴍᴘᴀʀɪꜱᴏɴ ᴜɴꜰᴀɪʀ.
Ah I see, the question mark on the other page reveals the prompt 😅
there are two htmls now matching clock*.html; the 2000 one is the fair test
Here is my attempt with different models, although I do not know how to limit them to 2000-token responses. They all got the time right; Gemini Flash did not animate the seconds, so it is just a static clock. Only gpt-oss did a poor job.
If I had to choose, Gemini 3 High produced the best minimalist design of the clock.
idk about other models, but if you add the constraint to only use 2000 tokens, claude sticks to it correctly. and it balls it up, of course. i think it takes about 4000 tokens to actually get it 100% right. also, with maybe a little more, the clock can actually use the system time
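For anyone checking the outputs by hand: the math behind "getting the time right" on an analog clock is small. Here is a minimal sketch in Python, assuming a standard 12-hour dial with angles measured clockwise from the 12 o'clock position. `hand_angles` is a hypothetical helper name for illustration, not something taken from the prompt or from any model's output.

```python
from datetime import datetime


def hand_angles(hour: int, minute: int, second: int) -> tuple:
    """Return (hour, minute, second) hand angles in degrees, clockwise from 12."""
    # Second hand: 360 degrees / 60 seconds = 6 degrees per second.
    second_angle = second * 6.0
    # Minute hand: 6 degrees per minute, plus a small creep as seconds pass.
    minute_angle = minute * 6.0 + second / 10.0
    # Hour hand: 30 degrees per hour, plus creep from minutes and seconds.
    hour_angle = (hour % 12) * 30.0 + minute * 0.5 + second / 120.0
    return hour_angle, minute_angle, second_angle


if __name__ == "__main__":
    # Read the system time, which is what a "live" clock would do on each tick.
    now = datetime.now()
    print(hand_angles(now.hour, now.minute, now.second))
```

A static clock like the Gemini Flash one would compute these angles once; an animated one recomputes them on every tick, usually once per second.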