Have you looked into METR at all? They've done some interesting work measuring AI model capability and autonomy against a benchmark of software engineering tasks, each with a baseline measurement of how long it takes a human to complete.

TL;DR, frontier models currently have a ~50% success rate on tasks that would take humans a little over two hours, and that task length has been doubling every 7 months. Assuming that trajectory holds, they'll have the autonomy to complete a full 40-hour human work week's worth of work in less than three years.
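For anyone who wants to check the "less than three years" arithmetic, here is a rough back-of-the-envelope sketch. It assumes a starting horizon of exactly 2 hours (the figure above is "a little over two"), a clean 7-month doubling time, and plain exponential growth; the function name is just illustrative and is not from the METR report.

```python
import math

def months_until_horizon(current_hours, target_hours, doubling_months):
    """Months for the task horizon to grow from current_hours to target_hours,
    assuming it doubles every doubling_months (simple exponential extrapolation)."""
    doublings = math.log2(target_hours / current_hours)
    return doublings * doubling_months

# Assumed inputs: 2-hour horizon today, 40-hour target, 7-month doubling time.
months = months_until_horizon(2.0, 40.0, 7.0)
print(f"{months:.1f} months, about {months / 12:.1f} years")
# prints "30.3 months, about 2.5 years"
```

Going from 2 to 40 hours takes about 4.3 doublings, so roughly two and a half years if the trend continues unchanged.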

https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/

Discussion

Nice report. Btw your wallet says it doesn't work when trying to zap you.