phones have pretty good transcription APIs these days. I would imagine nostr:npub1t89vhkp66hz54kga4n635jwqdc977uc2crnuyddx7maznwfrpupqwra5h9 is using on-device transcription.
why is it a weird client requirement? it seems practical. having transcriptions that are asynchronously tacked on afterwards by others seems a lot more complicated and less reliable.