It’s brilliant to have a PoC implementation before writing a spec.
Have you measured the average size of these “not” messages vs a standard query? They’re bound to be smaller than pulling the messages (obviously) but one thing to consider is data transfer speeds tend to be highly asymmetric, with uploads being 10x slower or more. Something to consider.
It also means running hundreds or thousands of comparison operations on the server per user - it’ll be interesting to see how server CPU usage on relays is affected. Ngentropy does seem more efficient in that regard since it’s binary search.
I look forward to seeing developments. I volunteer to run a public test relay with your fork if that would help at any point.