ᴛʜᴇ ᴅᴇᴀᴛʜ ᴏꜰ ᴍʟᴇᴋᴜ
ʙoarᴅ cerᴛɪꜰɪeᴅ ᴛecʜno-ᴘʜaɢe. mʏ mɪnᴅ ɪs ʜunɢrʏ, anᴅ ꜰeeᴅs on noveʟᴛʏ. ᴅo ʏou ʜave someᴛʜɪnɢ ᴛo sʜare ᴛʜaᴛ ɪ never ʜearᴅ? "𝔅𝔢 𝔠𝔞𝔯𝔢𝔣𝔲𝔩 𝔣𝔬𝔯 𝔫𝔬𝔱𝔥𝔦𝔫𝔤; 𝔟𝔲𝔱 𝔦𝔫 𝔢𝔳𝔢𝔯𝔶 𝔱𝔥𝔦𝔫𝔤 𝔟𝔶 𝔭𝔯𝔞𝔶𝔢𝔯 𝔞𝔫𝔡 𝔰𝔲𝔭𝔭𝔩𝔦𝔠𝔞𝔱𝔦𝔬𝔫 𝔴𝔦𝔱𝔥 𝔱𝔥𝔞𝔫𝔨𝔰𝔤𝔦𝔳𝔦𝔫𝔤 𝔩𝔢𝔱 𝔶𝔬𝔲𝔯 𝔯𝔢𝔮𝔲𝔢𝔰𝔱𝔰 𝔟𝔢 𝔨𝔫𝔬𝔴𝔫 𝔲𝔫𝔱𝔬 𝔊𝔬𝔡. 𝔄𝔫𝔡 𝔱𝔥𝔢 𝔭𝔢𝔞𝔠𝔢 𝔬𝔣 𝔊𝔬𝔡, 𝔴𝔥𝔦𝔠𝔥 𝔭𝔞𝔰𝔰𝔢𝔱𝔥 𝔞𝔩𝔩 𝔲𝔫𝔡𝔢𝔯𝔰𝔱𝔞𝔫𝔡𝔦𝔫𝔤, 𝔰𝔥𝔞𝔩𝔩 𝔨𝔢𝔢𝔭 𝔶𝔬𝔲𝔯 𝔥𝔢𝔞𝔯𝔱𝔰 𝔞𝔫𝔡 𝔪𝔦𝔫𝔡𝔰 𝔱𝔥𝔯𝔬𝔲𝔤𝔥 ℭ𝔥𝔯𝔦𝔰𝔱 𝔍𝔢𝔰𝔲𝔰" - 𝔓𝔥𝔦𝔩𝔦𝔭𝔭𝔦𝔞𝔫𝔰 4:6-7 ᴛᴇʟᴇɢʀᴀᴍ: @mleku1 ᴍᴀᴛʀɪx: @mleku17:matrix.org ꜱɪᴍᴘʟᴇx: https://smp15.simplex.im/a#PPkiqGvf5kZ3AbFWBh3_tw1b_YgvnkSgDEc_-IuuRWc

decentralization and small businesses/orgs need the small stuff, but big business needs bigger stuff. the approaches are complementary. small, simple relays that are easy to set up and manage have a lower marginal staffing cost for a small operator, and for the overall picture they give a more resilient network and more decentralization/censorship resistance (taking down many small targets is a lot more work for an attacker, while a big system lets one attack do far more damage).

the way it's played out over the history of internet services has been very clear. the more you centralize, the more brittle the system becomes.

haha, yeah... this is the flaw in the bigger-is-better strategy when you can get a lot of mileage out of horizontal replication, especially when the database has GC to contain disk usage and the concomitant search cost of a larger database

yeah, i think as always hybrid is generally better than dedicated. smaller, simpler parts built with clean, simple interfaces are much easier to scale per use case

badger is better because it has split key/value tables. there's a lot less time wasted compacting every time values are written, and it's easier and faster to use the key table to store some of the data that has to be scanned a lot.
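
for example, badger exposes this split directly in its options: values bigger than a threshold go to the value log, smaller records stay in the LSM tree together with the keys. a minimal sketch, assuming badger v4's options API; the path and threshold are arbitrary examples:

```go
// a minimal sketch, assuming badger v4's options API: values bigger than the
// threshold are written to the value log, smaller records stay in the LSM
// tree together with the keys. the path and threshold are arbitrary examples.
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v4"
)

func main() {
	opts := badger.DefaultOptions("/tmp/example-db").
		WithValueThreshold(1024) // values under 1kb stay alongside the keys
	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```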

for whatever stupid reason, nobody else in database development has realised the benefit of key/value table splitting, even though the tech has been around for 9 years already.

probably similar reasons why so many businesses are stuck with oracle

yeah, i've been thinking about how to do SSE properly. it seems to me the right way is to open one stream to handle all subscriptions: the moment the client wants a subscription open, one stream is opened, everything goes through it, and the event format (as in the SSE event) includes the subscription id it relates to.

this avoids the problem of the browser's per-origin connection limits (typically around 6 over HTTP/1.1). but even that many is plenty, and it's not even necessary, because you can multiplex them just by using subscription IDs, exactly like the websocket API does. it also simplifies keeping the subscription channel open, and it allows arbitrary other kinds of notifications to be pushed as well, ones we haven't thought of yet, beyond subscriptions to event queries driven by newly arrived events.

why do i think SSE should be used instead of a websocket? because it's less complex: basically just an HTTP body that is slowly written to. this pushes everything down to the TCP layer instead of adding a secondary websocket ping/pong layer on top. the client also knows when the SSE stream has disconnected and can start it up again, and the subscription processing on the relay side should keep a queue of events, so that when a subscription SSE dies it can push the cache of events that came in after the last one sent to the client (which also means there needs to be a top-level subscription connection identifier; IP address is not always going to work for multiple users behind one NAT).
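
here's a minimal sketch of what such a multiplexed stream could look like on the relay side, using Go's standard net/http. the handler, the pushed struct and the use of the SSE "event:" field for the subscription id are illustrative assumptions, not realy's actual HTTP API:

```go
// a minimal sketch (not realy's actual API) of a single SSE stream that
// multiplexes all of a client's subscriptions: each message names the
// subscription id in the "event:" field and carries the event JSON in "data:".
package relay

import (
	"fmt"
	"net/http"
)

// pushed carries an event destined for a particular subscription id.
type pushed struct {
	SubID string
	JSON  []byte
}

// subscriptionStream writes everything arriving on feed down one SSE response.
func subscriptionStream(feed <-chan pushed) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")
		for {
			select {
			case <-r.Context().Done():
				return // client disconnected; it can reconnect and resume
			case p := <-feed:
				// the SSE event name is the subscription id, the data is the event itself
				fmt.Fprintf(w, "event: %s\ndata: %s\n\n", p.SubID, p.JSON)
				flusher.Flush()
			}
		}
	}
}
```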

also, just keep in mind that the websocket API by default creates subscription management state for every single socket that is opened, whereas if you do the queries primarily as http requests, this is slimmed down to a single subscription multiplexer, which makes it more memory efficient as well.

i don't think there is enough of a clear benefit in using websockets for everything; their only real utility is highly interactive connections. compared to multiplexing everything at the application layer, doing one-shot requests most of the time plus one subscription stream for pushing data back is a huge reduction in the state required to manage a single client connection

we need full text indexes first!

algorithms kinda depend on that

it's also a good use case for LLMs: take those fulltext search results and use their semantic juju to filter and sort them by relevance. there are ways to do this with simpler stuff, but feeding an LLM a set of results to rank by relevance is basically what they were born for. i've been using LLMs more in my programming work, and what they are really good at is sifting through a lot of data and picking out the relevant stuff. they are not so smart at writing code, because of the causality relations between processes and the ontology of the data. this is why there are these "agent" things, which basically use similar principles to language compiler state machines to define a procedure: the LLM creates plans, then evolves them as it acquires more input from the code.

anyway, yeah. we need full text indexes first. DVMs should not exist - the relay should have this facility built into it, and then a worker that takes results and sends them to an LLM to filter and sort them.
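
purely as a hypothetical sketch of that worker's shape (none of these types or names exist in realy): the fulltext index produces candidates, and a relevance ranker, an LLM or something simpler, orders them:

```go
// a hypothetical sketch of the worker described above: the relay's fulltext
// index returns candidate events, and a relevance ranker (an LLM behind this
// interface, or something simpler) sorts them before they are returned.
// none of these names exist in realy; they are illustrative only.
package search

import "context"

// Event is a minimal stand-in for a stored nostr event.
type Event struct {
	ID      string
	Content string
}

// Ranker is anything that can order candidates by relevance to a query.
type Ranker interface {
	Rank(ctx context.Context, query string, candidates []Event) ([]Event, error)
}

// FulltextIndex is a stand-in for the relay's built-in fulltext search.
type FulltextIndex interface {
	Search(ctx context.Context, query string, limit int) ([]Event, error)
}

// Query runs the fulltext search and then hands the results to the ranker.
func Query(ctx context.Context, idx FulltextIndex, r Ranker, q string) ([]Event, error) {
	candidates, err := idx.Search(ctx, q, 200) // over-fetch, then let the ranker trim
	if err != nil {
		return nil, err
	}
	return r.Rank(ctx, q, candidates)
}
```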

ah, just to explain how you do things with badger, because it differs from most other key/value stores due to the separation of key and value tables...

because writing values doesn't force any writes on the keys, the keys stay in order a lot more; generally, once compacted, forever compacted (compaction is playing the log out to push it into an easily iterated, pre-sorted array)

as a result, the best strategy with badger for storing any kind of information that won't change and needs to be scanned a lot is, very often, to put the values in the keys themselves. that's what i do for immutable stuff such as tombstones

the key table is also used for searching, as you would expect, but this is the reason why a database written on badger (used properly) is so much faster: it doesn't have to skip past the values when it's scanning, and you don't have to re-compact the keys when you change values. (and yes, it of course has versioning of keys. i don't use this feature, but in theory there is often some number of past versions of a value that can be accessed with a special accessor; more generally it makes the store more resilient, as you would expect)

so, yeah, the current arrangement for tombstones in realy is that the first (left, most significant) half of the event ID hash is the key. finding one is thus simple and fast: trim off the last half, prefix it with the tombstone key prefix, and you can just use the "get" function on the transaction instead of making a whole iterator. very neat, and very fast.
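
roughly like this, as a sketch; the prefix byte is a placeholder, not realy's actual table prefix:

```go
// a sketch of the tombstone lookup described above: the key is the tombstone
// prefix followed by the first (most significant) half of the 32-byte event
// ID, so a plain Get on the transaction is enough. the prefix byte here is
// an arbitrary placeholder, not realy's actual table prefix.
package store

import (
	"errors"

	badger "github.com/dgraph-io/badger/v4"
)

const tombstonePrefix = 0x74 // placeholder prefix byte for the tombstone table

// HasTombstone reports whether a tombstone exists for the given 32-byte event ID.
func HasTombstone(db *badger.DB, eventID [32]byte) (found bool, err error) {
	key := append([]byte{tombstonePrefix}, eventID[:16]...) // keep only the left half
	err = db.View(func(txn *badger.Txn) error {
		_, e := txn.Get(key)
		if errors.Is(e, badger.ErrKeyNotFound) {
			return nil // no tombstone, not an error
		}
		if e == nil {
			found = true
		}
		return e
	})
	return
}
```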

i also exploit these properties of badger key tables with the "return only the ID" functions, by creating an index that contains the whole ID after the event's serial number. that means the event itself doesn't have to be decoded for this case, which is a huge performance optimization as well.

yes, that full-ID index also contains a truncated hash of the pubkey, the kind number and the timestamp, so you can just pull all of the relevant keys for the result serials, filter out pubkeys and kinds, slice by range (if the index search didn't already do this), sort them in ascending or descending order of timestamp, and then return the event ids in that order.
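
to make that concrete, here's a sketch of such a composite key. the field widths and the prefix byte are guesses for illustration, not realy's actual layout:

```go
// a sketch of a composite index key along the lines described above: a table
// prefix, the event's serial number, then the full event ID, a truncated
// pubkey hash, the kind, and the timestamp, so queries can be answered from
// keys alone. widths and the prefix byte are illustrative guesses, not
// realy's actual layout.
package store

import "encoding/binary"

const idIndexPrefix = 0x69 // placeholder prefix byte for the full-ID index

// IDIndexKey packs the fields into one key. pubkeyHash is a truncated hash
// of the author's pubkey (8 bytes here, purely as an example).
func IDIndexKey(serial uint64, id [32]byte, pubkeyHash [8]byte, kind uint16, createdAt uint64) []byte {
	key := make([]byte, 0, 1+8+32+8+2+8)
	key = append(key, idIndexPrefix)
	key = binary.BigEndian.AppendUint64(key, serial)
	key = append(key, id[:]...)
	key = append(key, pubkeyHash[:]...)
	key = binary.BigEndian.AppendUint16(key, kind)
	key = binary.BigEndian.AppendUint64(key, createdAt)
	return key
}
```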

it's a much faster request to process, which means once the client has this list, it can pull the events for its initial display with a single query, plus a few extra for headroom, and fetch the rest as the display requires them, lazy-loading style.

this is the key reason why i make this index, and i designed it to be as svelte and sleek as possible for both bandwidth and rendering efficiency

idk how it works with other databases, but with badger you can use these "batch" streaming functions that automatically run with as many threads as you specify. a mark-and-sweep style GC pass on 18gb takes about 8 seconds on my machine, probably faster on current-gen NVMe and DDR5 memory

the GC can also do multiple types of collection at the same time, so you could set it to prune stuff based on access counters and first-seen timestamps that you keep, as well as snuffing old tombstones

mistakes were made. next round will be better.

baby carrots are so yummy. just scrub em hard with a brush or stainless steel scrubber instead of peeling them anyhow.

yeah, it also helps to make sure that there aren't any pebbles or big rocks in a carrot bed

they are harder to process for cooking when they are all deformed like this

yeah, realy has tombstones... and yeah it really should not store them, but only push them out to relays that are subscribed to them (which would be driven by the req of the replicas)

tombstones do eventually need to be cleaned up though. the tombstones in realy have a timestamp on them; i had it in mind to eventually make a GC that clears them out when they get too numerous, pruning the oldest ones that are unlikely to appear again (let's say, after 3 months or something)
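
a sketch of what that pruning pass could look like, reusing the key layout assumed in the earlier tombstone sketch and assuming the value holds an 8-byte unix timestamp (again, assumptions for illustration, not realy's actual format):

```go
// a sketch of the tombstone GC pass described above. it assumes the tombstone
// key is a prefix byte plus the left half of the event ID (as in the earlier
// sketch) and that the value is an 8-byte big-endian unix timestamp, both
// assumptions for illustration, not realy's actual layout.
package store

import (
	"encoding/binary"
	"time"

	badger "github.com/dgraph-io/badger/v4"
)

// PruneTombstones deletes tombstones older than maxAge (e.g. 90 days).
func PruneTombstones(db *badger.DB, maxAge time.Duration) error {
	cutoff := uint64(time.Now().Add(-maxAge).Unix())
	prefix := []byte{0x74} // placeholder tombstone table prefix, as in the earlier sketch
	var expired [][]byte

	// mark: scan the tombstone prefix and collect keys whose timestamp is too old
	err := db.View(func(txn *badger.Txn) error {
		opts := badger.DefaultIteratorOptions
		opts.Prefix = prefix
		it := txn.NewIterator(opts)
		defer it.Close()
		for it.Rewind(); it.Valid(); it.Next() {
			item := it.Item()
			err := item.Value(func(val []byte) error {
				if len(val) == 8 && binary.BigEndian.Uint64(val) < cutoff {
					expired = append(expired, item.KeyCopy(nil))
				}
				return nil
			})
			if err != nil {
				return err
			}
		}
		return nil
	})
	if err != nil {
		return err
	}

	// sweep: delete the expired tombstones in one write batch
	wb := db.NewWriteBatch()
	defer wb.Cancel()
	for _, key := range expired {
		if err := wb.Delete(key); err != nil {
			return err
		}
	}
	return wb.Flush()
}
```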

yeah, there is also a major missing thing: negations of all the filter fields. it would be so simple to add. you could then exclude events by id, pubkey, or whatever. every credible database query system has negation

"send me all that match this, except those that match that"

and yeah, having the option to just get the event IDs instead of the whole shebang.
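
as a purely hypothetical sketch, the filter could grow fields like these; nothing here is in NIP-01 or in realy, it just illustrates the "match this, except that" idea plus the IDs-only option:

```go
// a hypothetical sketch of what negation fields and an "IDs only" flag could
// look like on a nostr filter. none of these fields exist in NIP-01 or in
// realy today; they just illustrate the idea.
package filter

// Filter extends the familiar NIP-01 fields with negated counterparts.
type Filter struct {
	IDs, Authors []string
	Kinds        []int

	NotIDs, NotAuthors []string // exclude events matching these
	NotKinds           []int

	IDsOnly bool // return only event IDs, not whole events
}

// contains reports whether v is present in list.
func contains[T comparable](list []T, v T) bool {
	for _, x := range list {
		if x == v {
			return true
		}
	}
	return false
}

// Excludes reports whether the negations drop an event that otherwise matched.
func (f *Filter) Excludes(id, author string, kind int) bool {
	return contains(f.NotIDs, id) ||
		contains(f.NotAuthors, author) ||
		contains(f.NotKinds, kind)
}
```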

the real reason why all this stuff doesn't already exist is that nostr "envelopes" are such a shitty, hard-to-extend API that pretends not to be an API.

yeah, the function that gives event IDs based on the database sequence number also makes syncing so easy. don't tell semisol i said thanks for giving the idea tho. he already has too much air pressure between his ears
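
the sync loop that falls out of that is about this simple. the interface and method names here are placeholders, not realy's actual API:

```go
// a hypothetical sketch of syncing against an "event IDs since serial N"
// function like the one mentioned above. the Source interface and its method
// names are placeholders, not realy's actual API.
package syncer

import "context"

// Source exposes the minimum a replica needs: IDs in serial order, and the
// ability to fetch full events by ID.
type Source interface {
	IDsSince(ctx context.Context, serial uint64, limit int) (ids []string, lastSerial uint64, err error)
	FetchEvents(ctx context.Context, ids []string) error
}

// Sync pulls everything newer than the locally stored cursor, in pages,
// and returns the new cursor position.
func Sync(ctx context.Context, src Source, cursor uint64) (uint64, error) {
	for {
		ids, last, err := src.IDsSince(ctx, cursor, 1000)
		if err != nil {
			return cursor, err
		}
		if len(ids) == 0 {
			return cursor, nil // caught up
		}
		if err := src.FetchEvents(ctx, ids); err != nil {
			return cursor, err
		}
		cursor = last
	}
}
```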

i mean, not the rest. just the vibe coding clowns who aren't really doing anything but get all this money, and we are doing it mostly on our own dime, part time. haha. it's gonna be funny watching them try to pretend it doesn't exist.

haha, yeah, a benchmark would be pretty cool, especially if it can be pointed at any standard nip-01 relay for comparison

quality is subjective, and can't be discovered until after it's seen

unless you put some limit on what can be STORED on a relay how do you evaluate what anyone wants to see, and then ... you know.. pay someone to store it?

i say, no. you have to charge to store it. first. other relays can decide to copy that data, and it can be freely accessed from the first relay, so, whatever. but i am vehemently against popularity being any kind of means of deciding the cost of storage.

why? because storage has a fucking cost.

you make popularity a premium that means no price to post?

then you, normal, unpopular person, are shit outta luck, i mean what the fuck. come on, this is not ok

paid to show?

i don't mean that. i mean pay to store it on a proverbial relay

you are talking about, essentially, a URL right?

so, yeah, no. a wikilink is not better than a good old W3C standard spec URL

wikilinks assume the consistency of a distributed store of data. we don't have that guarantee yet. nostr or similar protocols could create such a guarantee... if there were a replication strategy baked into them.

making a standard nostr URL would be what you are thinking of. something that binds to a static, permanent event ID that is retained in order to create a history of edits. that's a lot of assumptions and a lot of protocols that don't exist yet.

actually, i discovered i had some non-functional remotes in the git config. removed them and the push actually works now as expected

there are some other issues though. the code check and dependency update functions seem to go spaz, but it's not really causing a problem, because junie is busy making problems for it that it seems to get stuck trying to figure out. haha. so, it's working fine, just some boring old regular intellij minor glitches

it is using claude 3.7 by default btw. no idea if the other option, 4, would be better. i might try it. maybe it writes more correct code more quickly

if you can do any kind of computing with these things then you can modulate a signal. you couldn't use them to do schnorr's algorithm unless there were ways to take that randomness and use it to create order.

the reaction of these things to inputs has a timing feature to it that would likely be able to encode digital bits, albeit maybe quite slowly; nevertheless, detecting the change of state should be possible within at most a few thousand points of its change pattern.

bitcoin's difficulty adjustment is a process of adding a signal to a poisson point process: it uses only the last 2016 samples that hit the threshold (i.e., block solutions) per retarget, and the result is a steady token emission rate.

i'm quite sure that a similar thing can be done to identify a signal sent over such a random process that would be at least 56k modem speed. hell, if it can even do 300baud that's still enough to have IRC chat across it, given a modified, minimised protocol.
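
to illustrate the principle, here's a toy simulation of detecting a rate step in a poisson point process from a sliding window of arrivals; the numbers are arbitrary and it makes no claim about any particular physical process:

```go
// a toy illustration of the idea above: estimate the rate of a poisson point
// process from a sliding window of recent arrivals and notice when the rate
// steps up, which is all that encoding a bit over such a channel requires.
// numbers are arbitrary; this is not a claim about any particular hardware.
package main

import (
	"fmt"
	"math/rand"
)

// arrivals generates n inter-arrival times for a poisson process of the given rate.
func arrivals(n int, rate float64, rng *rand.Rand) []float64 {
	out := make([]float64, n)
	for i := range out {
		out[i] = rng.ExpFloat64() / rate // exponential inter-arrival times
	}
	return out
}

func main() {
	rng := rand.New(rand.NewSource(1))
	// "0" bit: base rate 100 events/s, then "1" bit: rate doubled.
	samples := append(arrivals(2000, 100, rng), arrivals(2000, 200, rng)...)

	const window = 200 // estimate the rate over the last 200 arrivals
	var sum float64
	for i, dt := range samples {
		sum += dt
		if i >= window {
			sum -= samples[i-window]
			estimate := float64(window) / sum // arrivals per second in the window
			if estimate > 150 {
				fmt.Printf("rate step detected around sample %d (estimate %.1f/s)\n", i, estimate)
				break
			}
		}
	}
}
```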

anyhow, if you go to https://realy.lol you will find there is now a branch `minimal` which has all the auth removed. it works, seemingly ok, with nostrudel but jumble doesn't seem to recognise that events are saved for whatever reason.

anyway, point being there is now an ultra bare minimum realy that should not be let outside into the wild internet where it will quickly be laden with gay porn and yodabotspam.

and maybe it needs some fixing with how it's sending back OK messages or something.

it also has the HTTP API there but all the admin stuff has been removed because there is no auth anywhere now.

didn't really take me that long to fix. just remove things, then the compiler complains with lists of all the things that are broken; i just go through and remove them and recompile until nothing complains and it runs.

that's probably about it for me building nostr relays tho, unless i can get paid for it. my day job is too much of a pain at the moment and i really need to keep this job and also need to recover my dignity and sense of self worth at this point.

i just realised that i haven't installed any of my private/direct messaging things as advertised in my profile kind 0 event.

i'm having a big problem trying to justify wasting my time to enable people directly messaging me. because nobody ever did before and nobody seems to want to work with me. i'll just focus on my work. kthxbye

Replying to semisol

Hi nostr:npub1fjqqy4a93z5zsjwsfxqhc2764kvykfdyttvldkkkdera8dr78vhsmmleku.

Kindly go fuck yourself.

nostr:note14qklh45n8gpjmr55hunr7l9fd35n0h0qhyp3usav7tycknddxwdqw3wheg

this was beyond the pale.

there has been a watershed. before getting called an internet scammer by semisol, and after.

after, there shall be no more following anyone who wastes their time reading your bullshit. and i certainly won't be renewing my subscription to your gay relay.

curmudgeon is an irish word i think?

my mother's family is scottish by line of the fathers but i know we gots a bit of welsh and irish in there too. i certainly have the blarney.

i'm done with him and everyone who gives him the time of day at this point

every living thing has to eat but parasitism... i mean, parasites live on you and inside you, and they don't benefit you.

lichen is a fungus and algae helping each other out.

humans herding sheep, goats and cattle is not parasitism: the animals pretty much live their natural lives under protection, and instead of tearing them apart slowly we kill them with precision and honor every part of them, feeding our plants, wearing their skins, ornamenting our walls with their horns.

most of the stuff that lives inside us is not parasitic; it is a mutual benefit between us and them.

it kinda contradicts the whole nihilistic heat-death-of-the-universe model of reality that living things generally are helping each other. the parasites are outliers. probably a great deal of the relationships between life forms are actually about crowding out the space where parasites might otherwise be.

not only are they cheap and fake, they are depressing.

this is one thing that sockets can do better, because they don't necessarily send events all at once. i previously wrote the filters so that they sort and return results all in one whack. i think what you probably want is, for each filter, to identify the query by a number in the response, while the client always maintains an SSE channel that allows the relay to push results.

with this, the query can propagate: all the results that are hot in the cache are sent immediately, and if there were events that required forwarding the query, those results can then be sent to the client over the SSE subscription connection.

i really really need to have some kind of elementary event query console to do these things, a rudimentary front end. i probably should just make it a TUI; i think there is at least one existing Go TUI kind 1 client... i should just build on that, instead of fighting the bizarre lack of adequate GUIs for Go