Jetpacks for software engineers.

“We evaluated Devin on SWE-bench, a challenging benchmark that asks agents to resolve real-world GitHub issues found in open source projects like Django and scikit-learn.

Devin correctly resolves 13.86%* of the issues end-to-end, far exceeding the previous state-of-the-art of 1.96%. Even when given the exact files to edit, the best previous models can only resolve 4.80% of issues.”

https://www.cognition-labs.com/introducing-devin

Discussion

Heard about this on AI Unchained, but haven't finished the episode.

It won’t take software engineers’ jobs. It will enable them to build bigger things in less time. 🫡

Looking forward to the open source equivalent.

👀

I’ll take 13%

I’ve tried a few different LLMs for coding and found they’re good for really constrained issues with well-known APIs. For example, they're good at Svelte and SwiftUI, but even when you remind them and give examples, they hallucinate about how NDK and Nostr actually work.

Sometimes they just get stuck and stop providing anything useful at all until I start over from scratch.

I am hopeful that as context windows grow, we’ll be able to fix that.

Has anyone tried this new one yet?