I don't display images by default, and I have no way to tell (other than by knowing the person I'm following) whether to display a given image or not. Once I see it, I cannot unsee it.
I think encouraging self-publication of honest text descriptions of non-text content is the way to go. I recently encountered someone who did not wish to publish a content warning, so my recourse was to mute that person. If we repeat this process continuously, with both clients and relays trying to evaluate whether a description of media is honest and blocking dishonest descriptions, that would go a long way.
In other words, use classification systems not to score one particular type of content on a scale from 0.0 to 1.0, but to judge whether the description attached to an image or video is honest on a scale from 0.0 to 1.0. Then everybody can block dishonest npubs and filter what they wish to see or not see based on descriptions. If an image or video URL does not have a description alongside it, score it 0.0. I don't know which NIPs this would use or add. Nobody would be required to use the classification systems, but they might help identify dishonest sources.
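
A rough sketch of what I mean, in Python. Everything named here is hypothetical: `MediaItem`, `honesty_score`, `dishonest_npubs`, `visible_items`, and the 0.5 threshold are my own placeholders, not part of any NIP or existing client. The `classifier` argument stands in for whatever system compares a description to the actual media; I am only assuming it returns a number between 0.0 and 1.0.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Iterable, Optional

# Hypothetical classifier signature: (media_url, description) -> honesty in [0.0, 1.0].
Classifier = Callable[[str, str], float]


@dataclass
class MediaItem:
    npub: str                   # who published the media
    url: str                    # image or video URL
    description: Optional[str]  # self-published text description, if any


def honesty_score(item: MediaItem, classifier: Classifier) -> float:
    """Score how honest the description is; a missing description scores 0.0."""
    if not item.description or not item.description.strip():
        return 0.0
    # Clamp, in case the (assumed) classifier strays outside 0.0..1.0.
    return max(0.0, min(1.0, classifier(item.url, item.description)))


def dishonest_npubs(items: Iterable[MediaItem], classifier: Classifier,
                    threshold: float = 0.5) -> set:
    """Return npubs whose average honesty score falls below the threshold."""
    scores = {}
    for item in items:
        scores.setdefault(item.npub, []).append(honesty_score(item, classifier))
    return {npub for npub, values in scores.items() if mean(values) < threshold}


def visible_items(items: Iterable[MediaItem], classifier: Classifier,
                  unwanted_terms: Iterable[str], threshold: float = 0.5) -> list:
    """Drop media from dishonest npubs, then filter the rest by description text."""
    items = list(items)
    blocked = dishonest_npubs(items, classifier, threshold)
    terms = [term.lower() for term in unwanted_terms]
    return [
        item for item in items
        if item.npub not in blocked
        and not any(term in (item.description or "").lower() for term in terms)
    ]
```

The point of the design is that the classifier never decides what I get to see. It only decides whether a description can be trusted, and my own filter terms do the rest.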