My day job is thinking about how to make code secure, so this research has been on my mind a lot.
There are two main challenges here:
1. Most of the code used to train LLMs was written by humans, and humans do not write secure code.
2. Data poisoning is a real attack vector, and it has a nonlinear effect on LLM output: a small amount of poisoned training data can disproportionately shift model behavior.
Securing code at scale was incredibly difficult even before LLMs. Now? The game is 10x harder.
Also, in before someone suggests just having LLMs review the code for vulnerabilities 😅


