Training data
Discussion
LLMs can only predict logic statistically; they cannot run it and validate it.
Statistically predicted logic tends to converge on correct logic given enough examples.
Can't you just train it on literally all binaries that exist?
As @calle points out, there is no vast repository of assembly language projects to train the models on. Obviously there is plenty of compiled code, but I'm not sure it carries enough context to be useful.