It’s a tough problem. I think about this a lot, and it’s all tradeoffs. What you described is worth pursuing, and if someone needs perfect privacy, they have to run the models themselves on their own hardware. TEE environments from AWS are getting better, and we might get some advances from FHE tech, but that’s probably still years away and will be more expensive than the cheap models offered by big tech.
Trade-offs all the way down.
I've thought about AWS Nitro Enclaves + NVIDIA "Confidential Computing" too, like nostr:npub10hpcheepez0fl5uz6yj4taz659l0ag7gn6gnpjquxg84kn6yqeksxkdxkr. I can see nostr:npub130mznv74rxs032peqym6g3wqavh472623mt3z5w73xq9r6qqdufs7ql29s proxy providers offering this kind of setup as a premium tier for their services.
There’s one more option I’ve recently discovered that might be a new spot on the spectrum of tradeoffs. GPU providers like modal.com let you spin up GPU environments and only pay for the seconds or minutes they’re up and running. So you could create a container that boots up with a freshly generated key, decrypts incoming messages, runs them through the model, encrypts the outputs, and sends them back to the user — the plaintext never leaves RAM or VRAM. The cloud provider should theoretically have a harder time spying on you, even though they still could. And because this is on demand, you can use big open source models.
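A rough sketch of that flow, just to make the idea concrete. This is entirely hypothetical — `run_model` is a stub, and the SHA-256 counter-mode "cipher" is a toy stand-in for a real AEAD scheme (you'd use something like NaCl sealed boxes in practice). The point is the lifecycle: the session key lives only in the container's RAM, plaintext exists only between decrypt and encrypt, and everything is discarded when the ephemeral container dies.

```python
import hashlib
import secrets


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy SHA-256 counter-mode keystream -- stand-in for a real cipher,
    # NOT secure, illustration only.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor(data: bytes, ks: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, ks))


def run_model(prompt: bytes) -> bytes:
    # Placeholder for actual GPU inference inside the container.
    return b"echo: " + prompt


def serve_once(encrypted_prompt: bytes, nonce: bytes, session_key: bytes) -> bytes:
    # 1. Decrypt the prompt -- plaintext exists only in RAM here.
    prompt = xor(encrypted_prompt, keystream(session_key, nonce, len(encrypted_prompt)))
    # 2. Run inference on the plaintext.
    reply = run_model(prompt)
    # 3. Re-encrypt before anything leaves the box.
    reply_nonce = secrets.token_bytes(16)
    return reply_nonce + xor(reply, keystream(session_key, reply_nonce, len(reply)))
```

A client would share `session_key` with the container out of band (e.g. sealed to a key the container generates at boot), encrypt its prompt, call `serve_once`, and decrypt the returned blob with the same key. The provider only ever sees ciphertext on the wire — assuming they don't inspect the container's memory.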