The key to success is using an agent loop like goose and specifying a hard test. People love to write tiny "coverage" unit tests, but those are garbage. You write one test that says "do the thing like it will be done in production", and then you burn tokens until it passes. It might take a while, and you might need to suggest some ideas, but it will probably only cost a few dollars and not much of your mental energy. Then you review the code, find what else is wrong with it, add another hard test, and repeat.
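
To make "hard test" concrete, here is a rough sketch of what one might look like in Python. Everything in it is made up for illustration: `summarize_orders`, the CSV columns, and the file names are hypothetical, not from any real project. The point is only the shape: real files on disk, messy input of the kind production would see, and a single assertion on the final artifact.

```python
import csv
import json
import tempfile
from pathlib import Path


def summarize_orders(csv_path: Path, out_path: Path) -> None:
    """Hypothetical production entry point: read orders, write totals per customer."""
    totals: dict[str, float] = {}
    with csv_path.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            name = row["customer"].strip()
            if not name:
                continue  # skip rows that have no customer name
            totals[name] = totals.get(name, 0.0) + float(row["amount"])
    out_path.write_text(json.dumps(totals, sort_keys=True), encoding="utf-8")


def test_end_to_end_like_production() -> None:
    """One hard test: goes through real files and checks the final output."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "orders.csv"
        dst = Path(tmp) / "totals.json"
        # Deliberately messy input: padded names, a quoted comma,
        # a row with a blank customer, and a repeated customer.
        src.write_text(
            "customer,amount\n"
            "alice,10.50\n"
            " bob ,3\n"
            '"Smith, Jo",7.25\n'
            ",99\n"
            "alice,0.50\n",
            encoding="utf-8",
        )
        summarize_orders(src, dst)
        result = json.loads(dst.read_text(encoding="utf-8"))
    assert result == {"Smith, Jo": 7.25, "alice": 11.0, "bob": 3.0}
```

In practice you would hand the agent only the test and let it write (and rewrite) the implementation until the test passes. A tiny unit test on the `strip()` call alone would pass long before the whole path actually worked.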