Nice report. Btw, when I try to zap you I get an error saying your wallet doesn't work.
Have you looked into METR at all? They've done some interesting work measuring AI model capability and autonomy on a benchmark of software engineering tasks, each with a baseline measurement of how long it takes a human to complete.
TL;DR: frontier models currently have a ~50% success rate on tasks that would take a human a little over two hours, and that task length has been doubling every 7 months. If that trajectory holds, they'll have the autonomy to complete a full 40-hour human work week's worth of work in less than three years.
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
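
For anyone who wants to check the "less than three years" figure, here's a quick back-of-the-envelope sketch. It assumes a starting horizon of exactly 2 hours (the real number is a little higher) and a steady 7-month doubling time; the function name and parameters are just made up for illustration, not anything from METR.

```python
import math

def months_until_horizon(current_hours, target_hours, doubling_months):
    """Months for the task horizon to grow from current_hours to
    target_hours, assuming steady exponential doubling."""
    doublings = math.log2(target_hours / current_hours)
    return doublings * doubling_months

# Assumed inputs: ~2 hour horizon today, 40 hour target, 7 month doubling time.
months = months_until_horizon(current_hours=2.0, target_hours=40.0, doubling_months=7.0)
print(f"{months:.1f} months (~{months / 12:.1f} years)")  # 30.3 months (~2.5 years)
```

That's a bit over four doublings, so roughly two and a half years, and the whole estimate depends on the trend continuing unchanged.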