Some of the ways the IABIED plan can backfire
Published on September 22, 2025 3:02 PM GMT

If one thinks the chance of an existential disaster is close to 100%, one might tend to worry less about the potential for a plan to counter it to backfire. It's not clear whether that is the correct approach even if one does put the chances that high, but I am going to set that aside.
If one thinks the chance of an existential disaster is "anywhere between 10% and 90%", one should definitely worry about the potential for any plan to counter it to backfire.
Out of all the ways the IABIED plan to ban AI development and to ban publication of AI research could backfire, I want to list the three that seem most obvious and salient. I think it's useful to have them separate from the object-level discussions.
1. Change of the winner. The most obvious possibility is that the plan would fail to stop ASI, but would change the winner of the race. If one thinks that the chance of an existential disaster is "anywhere between 10% and 90%", but that the actual probability depends on the identity and practices of the race winner(s), this change might make the chances much worse. Unless one thinks the chances of an existential disaster are already very close to 100%, one should not like the prospect of an underground lab winning the race during the prohibition period.
2. Intensified race and other possible countermeasures. The road to prohibition is a gradual process, not a switch one can flip immediately; this plan is not talking about a "prohibition via a coup". When it starts looking like the chances of a prohibition being enacted are significant, this can spur a particularly intense race (a number of AI orgs would view the threat of prohibition as being on par with the threat of a competitor winning). Again, if one thinks the chances of an existential disaster are already very close to 100%, this might not matter too much, but otherwise a further accelerated race might make the chances of avoiding existential disasters worse. Before it succeeds at "shutting it all down", the gradual advancement of this plan will create a "crisis mode", with various actors doing various things in that crisis mode.
3. Various impairments for AI safety research. Regarding the proposed ban on publication of AI research, one needs to ask where various branches of AI safety research stand. The boundary between safety research and capability research is thin, and there is a large overlap. For example, talking about interpretability research, Nate wrote (April 2023, https://www.lesswrong.com/posts/BinkknLBYxskMXuME/if-interpretability-research-goes-well-it-may-get-dangerous):
I'm still supportive of interpretability research. However, I do not necessarily think that all of it should be done in the open indefinitely. Indeed, insofar as interpretability researchers gain understanding of AIs that could significantly advance the capabilities frontier, I encourage interpretability researchers to keep their research closed.
It would be good to have some clarity on this from the authors of the plan. Do they propose that the ban on publications cover all research that might advance AI capabilities, including AI safety research that might do so? Where do they stand on this? For those of us who think the chance of an existential disaster is "anywhere between 10% and 90%", this feels like something with strong potential to make our chances worse. Not only does this whole plan increase the chances that the ASI race winner will be an underground lab, but would that underground lab also be deprived of the benefits of being aware of advances in AI safety research, and would AI safety research itself slow down by orders of magnitude?