a flurry of discussions and things relating to my work lately has led me to something
there are two existing query types in nostr, the REQ and the COUNT
REQ has no concrete way of signalling how many events match a request beyond whatever limit it has been hard coded to cap results at
COUNT gives you back a number but no other metadata, so there is no easy way to learn more without making multiple COUNT queries
"since" and "until" fields in filters can be used to create a boundary that limits the number of results from a REQ, but it is inadequate
i could swear i already made this suggestion before
but i'm gonna make it again, if i did, or for the first time if not
there should be a query that just spits back a list of all the event IDs that match a query, and if you set no limit, it just returns the whole set
if you consider that some follow events take as much as 512kb of data, and this is often a common size limit for individual events too, then that much space is good for somewhere around 7,800 individual event IDs in a result (64 hex characters plus quotes and a comma is 67 bytes per ID), it could be as simple as an array, so the only overhead is the brackets, quotes and commas
perhaps this is not sufficient though, maybe you want to include the timestamp next to each event ID... or maybe you could make it so you give the full timestamp on the first event ID, and after that each entry is the offset in seconds from the previous one, this would mean that the list would be something like
[[12345678,"<event id>"],[5,"<event id>"],[17,"<event id>"], ...]
i'm inclined to say fuck the datestamps, i'm just gonna make a new REQ variant that returns the IDs as an array instead of the full results, and to keep with the style, it will just be
["REQID","subscriptionID","
the relay can already specify a size limit in its nip-11 relay information, so it can just stop just before that limit, and the user can query for the last event in the list and use its timestamp as "since" to get the rest
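for reference, nip-11 already carries this in the "limitation" object, e.g. (real nip-11 field names, made-up values):

{"limitation":{"max_message_length":524288,"max_limit":500}}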
nostr:npub1ntlexmuxr9q3cp5ju9xh4t6fu3w0myyw32w84lfuw2nyhgxu407qf0m38t what do you think about this idea?
if the query has sufficiently reasonable bounds, like, it is very unlikely you want more than 7,800 events over a period of, let's say, the last day, of a specific kind, and certainly not if you limit it to some set of npubs
but you would still know where the results end, and so long as you stick to the invariant of "this is what i have on hand right now", the question of propagating queries is capped by the answer of "what i have"; whether you have a second layer, and whether you then cache the results of that query so next time you can send a more complete list, is implementation internal
and i haven't even factored in this option:
what about if instead of returning the IDs encoded in hex (50% efficiency versus the binary hash size) you send them as base64 encoded versions of the event IDs, that gives you 75%, or in other words expands the hypothetical max results of just IDs from around 7,800 to around 11,000
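checking the arithmetic: unpadded base64 of 32 bytes is 43 characters (44 with the "=" pad), against 64 for hex, so with quotes and a comma each array element costs 67 bytes hex versus 47 base64:

524,288 / 67 ≈ 7,800 IDs per 512kb message, hex
524,288 / 47 ≈ 11,000 IDs per 512kb message, base64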
ws maximum message size is a problem so you should use multiple messages
it can be pretty low depending on the client
yeah but that is configurable, you can set a standard
except gay clients, and they can have a gay standard
this is relevant to what you have been saying about protocol headers, but i think it's stomping on nip-11, and if clients aren't querying for that then they can go to hell, actually, literally be set on fire and burn alive
oh yeah... easy to solve that
this new query type has a "max bytes" field. done
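so the request grows one field, something like this ("maxbytes" is a name i'm making up on the spot, nothing specifies it):

["REQID","subscriptionID",{"kinds":[1],"since":1716800000,"maxbytes":262144}]

the relay fills the array until the next ID would bust that number, then stops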
if they want moar, then tough shit because the relay sed, what it sed, nip-11 bitch
also you could easily perform a DoS attack on certain events by creating a lot of events with a higher event ID (or lower depending on the relay impl) and the same created at as the target event
you reach your limit before the target event gets returned and there is no way to find it except by id
this will also become a problem once nostr has a high event volume per second
the only way to fix this is to support proper pagination
higher event ID? by timestamp? based on a filter?
that's a pretty narrow attack surface
clients can mitigate it by being more specific, things like authors are a huge limiter
also i don't think you really fully understand how you implement pagination
you have to generate a list of matching event IDs to do it, then the relay has to store that in a space limited query cache that expires after some time or volume
the idea to make a new query type that just returns these came from me thinking about that fact
it's the first step towards it, but it probably eliminates the problem altogether, seriously, 7,800-11,000 event ID matches for a query without busting the current typical max response sizes?
DoS attack mitigation would not be relevant there, that's a network layer issue, not an application layer issue, insofar as this is a per-query limit and not about the total volume from a client, which is not the same thing
pagination can be represented statelessly with an event ID and its timestamp
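i.e. you re-issue the same filter with "until" set to the created_at of the last event from the previous page, something like:

["REQ","subscriptionID",{"kinds":[1],"until":1716799999,"limit":50}]

and drop any IDs you already saw at that exact timestamp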
no, because what happens when a new event that matches the filter gets stored?
what happens if i ask for the most recent 50 events matching a filter
then ask for page one of a 10-page split of that result after 5 more new events come in?
nope, definitely stateful
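to spell it out (using a hypothetical "page" parameter that nothing actually defines):

t=0   most recent 50 matching events in 10 pages of 5: page 1 = events 50..46
      5 new matching events get stored
t=60  ask for page 1 again: now it's events 55..51, and what used to be page 1 is page 2

page numbers only stay meaningful if the relay froze the result list at t=0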
i don't know what you are saying
so i send a query again for "events since X" and it's been a minute and 5 more have been stored that match
how is that not stateful
the query is pinned to a time, yes; the results are not
Can't you just paginate the results or something?
that's why i've been saying we need at least a "results as a list of IDs" query type
pagination is a query engine thing
it gets a query, plus some spec of page X and per-page Y, and that big list of IDs has to be cached so the rest of the same query result list can be consumed in later requests
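i.e. something like this hypothetical shape (the "PAGE" verb and page/per-page fields are invented here):

["PAGE","subscriptionID",{"kinds":[1]},{"page":3,"per_page":100}]

and serving page 3 correctly means the full ID list from the original query is still sitting in that cache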
i say, first of all, relays are typically limiting responses to around 500kb anyway
that's enough to spew around 7,800 event IDs
so why not have a simple query that just gives you that whole list, and forget pagination, you do that yourself, bitch!
honestly, the "count" query is a buncha bullshit, a completely stupid idea
it should always have been "GETIDS" or something like that, or "QUERY" even, fuck it, what even is "REQ"