Here is my attempt with different models, although I do not know how to limit them to 2000-token responses. They all got the time right, but Gemini Flash did not animate the seconds, so its clock is static. Only gpt-oss did a poor job.
Discussion
If I had to choose, Gemini 3 high produced the best minimalist clock design.
I don't know about other models, but if you add the constraint to use only 2000 tokens, Claude sticks to it. And it balls it up, of course. I think it takes about 4000 tokens to actually get it 100% right. Also, with maybe a little more, the clock could actually use the system time.
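
For what it's worth, the system-time part is not much code. Here is a rough sketch in Python of the math a clock needs to turn the current time into hand angles. The function name `hand_angles` is my own invention for illustration and is not taken from any of the model outputs; angles are in degrees, measured clockwise from the 12 o'clock position.

```python
from datetime import datetime


def hand_angles(hour: int, minute: int, second: int) -> tuple[float, float, float]:
    """Return (hour, minute, second) hand angles in degrees, clockwise from 12.

    Hypothetical helper for illustration only.
    """
    # The second hand covers 360 degrees in 60 seconds: 6 degrees per second.
    second_angle = second * 6.0
    # The minute hand covers 6 degrees per minute and creeps forward with the seconds.
    minute_angle = (minute + second / 60) * 6.0
    # The hour hand covers 30 degrees per hour and creeps forward with minutes and seconds.
    hour_angle = ((hour % 12) + minute / 60 + second / 3600) * 30.0
    return hour_angle, minute_angle, second_angle


if __name__ == "__main__":
    # Read the system clock instead of hard-coding a time.
    now = datetime.now()
    print(hand_angles(now.hour, now.minute, now.second))
```

A browser version would do the same arithmetic on a timer and apply each angle as a rotation to the matching hand, which is also what makes the seconds animate instead of staying static.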