Yes, I think so. Anything that has a sequential structure could be handled by a transformer: text, music, video.
But just focusing on images, this Midjourney v5.1 image is crazy. Some ad photo shoots could be done entirely this way, and far more cheaply than with a traditional human crew of models, photographers, and so on…