Have you looked into METR at all? They've done some interesting work measuring AI model capability and autonomy against a benchmark of software engineering tasks, each with a baseline measurement of how long it takes a human to complete.
TL;DR: frontier models currently have a ~50% success rate on tasks that take humans a little over two hours, and that task length has been doubling roughly every 7 months. If that trajectory holds, they'll be able to complete tasks worth a 40-hour human work week, at that same ~50% success rate, in less than three years.
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
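The "less than three years" figure is simple exponential arithmetic. Here's a rough sketch of it, assuming a clean fixed doubling time and plugging in the numbers from the comment (2 hours now, a 40-hour target, 7-month doubling). The function name is just for illustration, not anything from METR:

```python
import math

def months_until_horizon(current_hours: float, target_hours: float, doubling_months: float) -> float:
    """Months until the 50% time horizon reaches target_hours,
    assuming growth stays exponential with a fixed doubling time."""
    # Number of doublings needed to go from current_hours to target_hours
    doublings = math.log2(target_hours / current_hours)
    return doublings * doubling_months

# Assumed inputs: ~2 hours today, a 40-hour work week target, 7-month doubling
months = months_until_horizon(current_hours=2.0, target_hours=40.0, doubling_months=7.0)
print(f"{months:.1f} months (~{months / 12:.1f} years)")  # 30.3 months (~2.5 years)
```

Going from 2 to 40 hours is a factor of 20, or about 4.3 doublings, which at 7 months each works out to roughly 2.5 years. Starting from "a little over" two hours only shortens that.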