Cc nostr:nprofile1qythwumn8ghj76twvfhhstnwdaehgu3wwa5kuef0qyghwumn8ghj7mn0wd68ytnhd9hx2tcprpmhxue69uhkxun9v968ytnwdaehgu3wwa5kuef0qqsrmpp2lmx4u2fl9zmxy7fnwp9rlwxwz5a2j8tep2c376n494z2gtgtunul2 the case for DVMs
Discussion
I definitely understand DVMs for things that are not tied to relays/nostr, like translations.
What I don’t really understand is DVMs for things like search instead of NIP-50. You are still using a small handful of relays (since only a few support text search) and adding a ton of round trips.
DVM Search flow:
1. Client connects to relay(s) where DVM is listening.
2. Client makes a DVM search request on behalf of the user by signing a job request
3. DVM uses NIP-50 search or out of band API to gather results for job request
4. DVM responds with event IDs that match the search
5. Client makes REQ across multiple relays for result IDs
6. Client displays results (if they can find the events)
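To make the client-side work in steps 2, 4 and 5 concrete, here is a rough sketch. It assumes kind 5302 is the search job kind and that the job result's content is a JSON list of `["e", <id>]` tags; both are assumptions to check against the DVM kind registry. The helper names are made up for illustration, and signing is left out.

```python
import hashlib
import json
import time

SEARCH_JOB_KIND = 5302  # assumption: content-search job kind, verify before use


def event_id(pubkey, created_at, kind, tags, content):
    """NIP-01 event id: sha256 over the canonical JSON array."""
    payload = json.dumps(
        [0, pubkey, created_at, kind, tags, content],
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_search_job(pubkey, query, created_at=None):
    """Step 2: unsigned job request carrying the query as a text input."""
    created_at = int(time.time()) if created_at is None else created_at
    tags = [["i", query, "text"]]
    event = {
        "pubkey": pubkey,
        "created_at": created_at,
        "kind": SEARCH_JOB_KIND,
        "tags": tags,
        "content": "",
    }
    event["id"] = event_id(pubkey, created_at, SEARCH_JOB_KIND, tags, "")
    return event  # still needs a "sig" before it can be published


def parse_result_ids(result_content):
    """Step 4: pull event ids out of the job result (assumed format)."""
    return [t[1] for t in json.loads(result_content) if len(t) >= 2 and t[0] == "e"]


def build_fetch_req(sub_id, ids):
    """Step 5: a plain REQ for the ids the DVM returned."""
    return json.dumps(["REQ", sub_id, {"ids": ids}])
```

Even in this stripped-down form the client handles two event shapes (job request and job result) before it gets to the ordinary REQ in step 5.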
NIP-50 Flow
1. Client connects to relays that support NIP-50
2. Client makes a REQ to relay
3. Client displays results
What am I missing here?
Also, in terms of ease of implementation, the DVM flow for search is a good bit different from translation on the client side.
Just because it’s under the “DVM” name doesn’t mean they don’t have to do more work to make it useful in their client.
Adding a new REQ term “search” and keeping all the event parsing exactly the same seems WAY easier to implement.
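For comparison, a minimal sketch of the NIP-50 side. The only new piece is the `search` key in the filter; the response handling is the usual EVENT/EOSE loop. The function names and the default `limit` are illustrative assumptions.

```python
import json


def build_search_req(sub_id, query, kinds=None, limit=20):
    """A normal REQ whose filter carries the NIP-50 "search" field."""
    filt = {"search": query, "limit": limit}
    if kinds:
        filt["kinds"] = kinds
    return json.dumps(["REQ", sub_id, filt])


def collect_events(raw_messages, sub_id):
    """Same parsing as any other subscription: gather EVENTs until EOSE."""
    events = []
    for raw in raw_messages:
        msg = json.loads(raw)
        if msg[0] == "EVENT" and msg[1] == sub_id:
            events.append(msg[2])
        elif msg[0] == "EOSE" and msg[1] == sub_id:
            break
    return events
```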
DVMs are very easy to implement for clients in fact (unless you're paying for the service, that's a little bit harder). See my search implementation here: https://github.com/coracle-social/coracle/blob/master/src/engine/network/utils/dvms.ts, it's only about 50 lines of code.
Have you read my article on DVMs as functions? I think I dropped the link before but it describes what I'm thinking of here: https://habla.news/u/hodlbod@coracle.social/0gmn3DDizCIesG-PCD-JK
Here's how I imagine the DVM search flow looking:
1. Client connects to relay(s) where any DVM that supports search is listening (based on a NIP-89 advertisement).
2. Client makes a DVM search request on behalf of the user by signing a job request, optionally including the relay URLs they want to search
3. DVM uses negentropy or some other sync implementation to index the requested relays or a static set of relays.
4. DVM responds with event IDs that match the search
5. Client makes REQ across multiple relays for result IDs
6. Client displays results (if they can find the events)
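Steps 1 and 2 of this flow could look roughly like the sketch below. Kind 31990 with `k` tags is the NIP-89 handler advertisement. Everything else is an assumption: kind 5302 as the search job kind, and a `["param", "relay", <url>]` tag as the way to name the relays to search (a hypothetical convention, not something a spec defines).

```python
import json

SEARCH_JOB_KIND = 5302     # assumption: content-search job kind
HANDLER_INFO_KIND = 31990  # NIP-89 handler information


def build_handler_discovery_req(sub_id, job_kind=SEARCH_JOB_KIND):
    """Step 1: find DVMs that advertise support for the search job kind."""
    filt = {"kinds": [HANDLER_INFO_KIND], "#k": [str(job_kind)]}
    return json.dumps(["REQ", sub_id, filt])


def build_job_tags(query, target_relays=()):
    """Step 2: job request tags, optionally naming the relays to search."""
    tags = [["i", query, "text"]]
    for url in target_relays:
        # Hypothetical parameter name; a DVM would have to document its own.
        tags.append(["param", "relay", url])
    return tags
```

A DVM that does not index one of the named relays can simply decline the job, which is what lets the client target small or private relays.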
Step 3 is obviously the difference here. It's a fairly difficult implementation for the DVM, because it relies on keeping an index across multiple relays and potentially rebuilding it on the fly for clients.
There are two benefits though:
- Any relay can be searched using any algorithm, regardless of nip 50 support or implementation quality
- DVM caches can be as big or small as they want. If they want to index the whole network, they can, and you avoid duplicate results from querying a bunch of relays using nip 50. If they only want to index a single relay (even a private one), they can. Returning event ids means they're not leaking any information about private data, since clients still have to AUTH with the relay that holds the information.
The idea is that relays are data structures (or names encapsulating data structures), and DVMs are functions. Functions can do *anything* with data. So search is only one of an infinite number of possibilities — very few of which would ever be widely supported by relays.
50 lines of code is a lot more than leaving it to a relay to implement actual search
DVMs are a retarded idea, all databases have to have some kind of indexes and metadata so it's just, IMO, a problem of too much funding going to so many clients and too little going to relay development
I’ve read a lot of your and others long form posts on DVMs. I understand the ideas and philosophy but again you’re losing me on the actual implementation details.
Step 3 as you’ve described it, is just an aggregator. Nobody can do this on the fly for text search. The operators that have this data are going to be large DBs who have been aggregating nostr events from many relays. These happen to be the exact same providers/relays that already offer NIP-50.
You’ve doubled the amount of steps and still you’re going to the same sources for information. Search is a function that is most useful when provided by aggregators.
That DVM search implementation is many times more complicated (and slower) than adding a “search” REQ filter. That’s the comparison I’m making.
and as i have been pointing out, that can be a special tag, not altering any other processes or parsing for anyone, doesn't even need to be a new key
You're right that latency is not great, and on the fly indexing might not be feasible. Maybe a direct connection, bypassing DVMs and returning actual events would be a better interface. That's another implementation detail we can add later once the DVM model is proven.
The thing about using relays as aggregators is there's currently no way to know if they have the events you want. I drafted a NIP a looong time ago that attempted to solve this problem using relays: https://github.com/nostr-protocol/nips/pull/259. The idea is that a relay would be able to recommend another relay that indexes its content, and which supports the functionality the client is looking for. So small relays could still self-host, but rely on big indexers to serve search. Otherwise, clients still have to rely on centralized services like nostr.band and nostr.wine to find the notes they're looking for.
This is particularly bad with relays that enforce AUTH, since there's no way for an indexer to build an index for content exclusively on those relays. Authorized indexers (indexers that are authenticated with these private relays) are the only way to expose that data other than making relays build every search/discovery feature directly into the relay interface. So if a client wants to search/analyze data from a particular relay, they need to know who is indexing that relay. DVM requests allow clients to specify the relays they want to look at, which DVMs can answer or not depending on whether they index the target relays.
I'm not saying DVMs are integral to this process, just that they're the only things that exist that can support separating data and functionality right now. It would be fine to have a "search relay", or an "analytics relay" and for the NIP 11 document to point clients to themselves for search/analysis, or to some other relay for search/analysis.
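As a sketch of that last option: a client could read a relay's NIP-11 document, use the relay itself if it lists NIP-50 in `supported_nips`, and otherwise follow a pointer to another relay. The `search_relay` field below is hypothetical and does not exist in NIP-11 today.

```python
def pick_search_relay(relay_url, info):
    """Return the relay a client should send search REQs to, or None."""
    if 50 in info.get("supported_nips", []):
        return relay_url  # the relay serves NIP-50 search itself
    # Hypothetical field: a pointer to an indexer that covers this relay.
    return info.get("search_relay")
```

For example, a small relay whose document says `{"supported_nips": [1, 11], "search_relay": "wss://indexer.example"}` would send the client to the indexer, while a relay with neither would return `None` and stay unsearchable.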
A succinct way to describe my goal is: there should be no stranded data in nostr. All of it should be indexable and discoverable to the extent the custodian wants it to be, however small the relay instance. NIP 50 search built into nostr.wine doesn't solve this.
I know we are discussing DVMs more broadly but I want to focus on search (and I believe topic classification fits into the same category).
For search, you generally don’t know what events you want, so I don’t think not knowing whether an aggregator has particular events is really a concern. If you don’t find what you’re looking for, you can always look on a different aggregator. Better yet, connect to a handful of them and execute your REQs in parallel. If you don’t find them on any of the aggregators, I don’t see why you would be able to find them on a DVM.
Relays that enforce AUTH for certain or all events generally do NOT want their events to be indexed. Creatr.nostr.wine certainly doesn’t want public indexers involved. If we wanted to offer search for our patreon relay, we would offer it directly on the relay - that makes way more sense than authorizing certain aggregators to access the data.
There have been on and off discussions about implementing NIP-50 into strfry. A few relay implementations supporting it out of the box would decentralize search a lot more overnight. I’m afraid I still don’t see any reason to involve a DVM here.
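The "connect to a handful of aggregators and run the REQs in parallel" idea can be sketched as below. `search_relay` is a stand-in that reads from an in-memory dict instead of opening a websocket, so the relay URLs and the `fake_index` argument are purely illustrative.

```python
import asyncio


async def search_relay(url, query, fake_index):
    """Stand-in for a NIP-50 REQ; a real client would talk to `url` here."""
    await asyncio.sleep(0)
    events = fake_index.get(url, [])
    return [e for e in events if query.lower() in e["content"].lower()]


async def search_all(urls, query, fake_index):
    """Query every aggregator concurrently and merge results by event id."""
    batches = await asyncio.gather(
        *(search_relay(u, query, fake_index) for u in urls),
        return_exceptions=True,
    )
    merged = {}
    for batch in batches:
        if isinstance(batch, Exception):
            continue  # one failing aggregator should not sink the search
        for event in batch:
            merged.setdefault(event["id"], event)  # drop duplicates
    return sorted(merged.values(), key=lambda e: e["created_at"], reverse=True)
```

Deduplicating by id on the client covers the duplicate-results concern, at the cost of downloading the same event more than once.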
On a separate but related note - I think we can expand NIP-50 to offer a lot more optional functions in the search, including one you mentioned.
Isolating results to only one relay (even if NIP-50 relay is an aggregator), negative keywords, wildcards, topics, language, sentiment, nsfw, etc etc.
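One way this could fit into the existing `search` string is as `key:value` tokens next to the plain terms, which is how NIP-50 already treats extensions such as `language:` and `nsfw:`. The `-word` negative-keyword syntax and the `relay:` key below are hypothetical, and the helper names are made up.

```python
def build_search_string(terms, exclude=(), **extensions):
    """Client side: plain terms, then negatives, then key:value options."""
    parts = list(terms)
    parts += [f"-{word}" for word in exclude]  # hypothetical syntax
    for key, value in sorted(extensions.items()):
        if isinstance(value, bool):
            value = "true" if value else "false"
        parts.append(f"{key}:{value}")
    return " ".join(parts)


def parse_search_string(search):
    """Relay side: split the string back into terms, negatives and options.

    Naive on purpose: any token containing ':' is read as an option, so a
    bare URL used as a search term would be misclassified.
    """
    terms, exclude, extensions = [], [], {}
    for token in search.split():
        key, sep, value = token.partition(":")
        if sep and key and value:
            extensions[key] = value
        elif token.startswith("-") and len(token) > 1:
            exclude.append(token[1:])
        else:
            terms.append(token)
    return terms, exclude, extensions
```

A relay that does not understand an option can ignore it, so unsupported options degrade to a plain text search.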
Obviously I don’t expect every relay to implement these, but even if we only have a handful of aggregators we can query directly, that’s at least on the decentralization spectrum compared to all existing social media search. We should also expect the number of aggregators and search providers to grow with time if nostr succeeds (even if their data becomes more fragmented).
One last thing and then I’ll stop harassing you guys, just lots of questions and want to understand.
We DO have common ground here. I think DVMs are useful, particularly for things like translation, text to speech, etc. The more generalized the request can be and the less coupled it is to a relay, the more sense it makes to me to use a DVM. In those instances it’s incredibly valuable to have multiple service providers accessible through a single job request on any client/relay.
When the providers of the data are relays, as in the case of NIP-50, I don’t see why we should bend DVMs to try to fit this use case.
Happy Sunday!
You as well! I think maybe a part of our disagreement is our vision of how big relays ought ultimately to be. I would like there to be many small relays which are cheap to operate and sovereign, but which can delegate advanced functionality to other services. This doesn't make sense for nostr.wine, because you have no problem running those other services. Both visions can coexist, but as a client dev I'd like to access both with the same interface.
But these are just opinions, the proof is in the pudding. I look forward to seeing what you come up with. 🫡