I guess that flagging and filtering out images is very expensive, because you need to use classic or ML-based computer vision models. I cannot think of a cheaper way to do it. Plus, training a CNN to filter CSAM means first collecting such a dataset and then training on it. Training means putting the images in a training environment, probably AWS or a similar provider, which would put you in a dangerous position...
Something may already exist out there, but it is certainly more complex than just filtering text.
LLMs are probably not the best tool for that.