Maybe just establish a simple standard that maps each emoji at each position to a random chunk of hex characters, where chunk length = final string length / number of emoji positions in the seed?
So there's a standardised "translation" step from emojis to a long hex string (made by concatenating the shorter hex chunks).
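
Rough sketch of what I mean, in Python. Big assumption: instead of a published table of random chunks, I derive each chunk from SHA-256 of the (position, emoji) pair, which is only a stand-in so the example is deterministic and self-contained. The names `chunk_for` and `emojis_to_hex` are made up for illustration, and the seed is passed as a list because a single emoji can be several code points.

```python
import hashlib


def chunk_for(emoji: str, position: int, chunk_len: int) -> str:
    """Return the hex chunk for one emoji at one position.

    Assumption: a real standard would look this up in a fixed, published
    table; hashing (position, emoji) is just a deterministic stand-in.
    """
    digest = hashlib.sha256(f"{position}:{emoji}".encode("utf-8")).hexdigest()
    if chunk_len > len(digest):
        # SHA-256 yields 64 hex characters, so longer chunks need another source.
        raise ValueError("chunk_len must be at most 64 in this sketch")
    return digest[:chunk_len]


def emojis_to_hex(seed: list[str], final_len: int) -> str:
    """Translate an emoji seed into a hex string of exactly final_len characters."""
    if not seed:
        raise ValueError("seed must contain at least one emoji")
    if final_len % len(seed) != 0:
        raise ValueError("final_len must divide evenly by the number of emojis")
    chunk_len = final_len // len(seed)
    # Concatenate one chunk per position, in seed order.
    return "".join(chunk_for(e, i, chunk_len) for i, e in enumerate(seed))


if __name__ == "__main__":
    print(emojis_to_hex(["🐶", "🍕", "🚀", "🌙"], 64))
```

Because the position is part of the lookup, the same emojis in a different order give a different hex string, so order in the seed matters.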