#orly #devstr #progressreport

since adding a spider to collect the directory events of the social graph of designated relay owners, the relay's memory utilization has climbed to around 11gb at peak, and the kernel OOM killer keeps restarting the relay on wss://realy.mleku.dev

i've gotta refactor the spider so it's more memory efficient. even after necking it down to a single thread it's still burning too much memory.

aargh. the LLM coding agent wasn't successful in improving the situation either; i need to refactor this spider code so it disposes of memory more efficiently.

Discussion

Apologies if I am not following completely, but out of curiosity, what language are we talking about, and is the code on GitHub?

https://orly.dev is the address of the repo (it's a redirect to github using a reverse proxy i modded to serve "go vanity imports"), and yes, the language is #golang
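
for anyone curious how the "go vanity imports" trick works: the go tool fetches the import path with ?go-get=1 and reads a go-import meta tag pointing at the real repository. a minimal sketch of such a handler is below (the github path shown is just a placeholder, not the actual repo location):

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

// metaTpl is the go-import meta tag the go tool looks for when resolving a
// vanity import path. The github URL is a placeholder, not the real repo.
const metaTpl = `<!DOCTYPE html>
<html><head>
<meta name="go-import" content="orly.dev git https://github.com/example/orly">
</head><body>orly.dev</body></html>`

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Requests from the go tool carry ?go-get=1; serve the meta tag.
		if r.URL.Query().Get("go-get") == "1" {
			w.Header().Set("Content-Type", "text/html; charset=utf-8")
			fmt.Fprint(w, metaTpl)
			return
		}
		// Everything else redirects to the repository host.
		http.Redirect(w, r, "https://github.com/example/orly", http.StatusFound)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```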

i have discovered in the last 18 months working on nostr relay dev that it's quite easy to temporarily cause a shitload of memory allocation that gets applications killed by the kernel when they exhaust the available memory.

the solution usually just involves changing the algorithm to avoid piling up large amounts of data at once and instead processing things in a pipeline where the memory gets freed properly before it builds into a giant slab of OOM death (out of memory).
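
a rough sketch of what that pipeline shape looks like in go, with a small buffered channel instead of one giant slice (the event type and the fetch are placeholders, not the actual relay code):

```go
package main

import "fmt"

// event stands in for a relay event; the real type lives in the relay code.
type event struct {
	ID      string
	Content string
}

// fetchEvents streams events into a small, bounded channel instead of
// returning one huge slice; the buffer size caps how much sits in memory
// at any moment, so processed events become garbage promptly.
func fetchEvents(ids []string) <-chan event {
	out := make(chan event, 64) // small buffer instead of a giant slice
	go func() {
		defer close(out)
		for _, id := range ids {
			// placeholder for the actual spider/relay fetch
			out <- event{ID: id, Content: "..."}
		}
	}()
	return out
}

func main() {
	ids := []string{"a", "b", "c"}
	// Consume one event at a time; nothing accumulates beyond the buffer.
	for ev := range fetchEvents(ids) {
		fmt.Println("processed", ev.ID)
	}
}
```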

I am feeling you. For something as dynamic as Nostr, pipelining and strict memory hygiene seem like the only feasible way.

yeah, for this spider stuff, fetching events for whitelisted users on the relay, and for bulk import, there are some serious challenges in keeping memory from blowing up.
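
for the bulk import side the same idea applies: stream the export file one event at a time instead of slurping the whole thing into memory. a sketch assuming a one-event-per-line (jsonl) export, which may not match the actual format:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
)

// importEvents reads an export file line by line (assuming one JSON event
// per line) instead of loading the whole file, so the memory footprint
// stays roughly constant regardless of file size.
func importEvents(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // allow long event lines
	for sc.Scan() {
		line := sc.Bytes()
		// placeholder: decode and store this single event, then move on
		_ = line
	}
	return sc.Err()
}

func main() {
	if err := importEvents("events.jsonl"); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```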

One thing that comes to mind is a project like TigerBeetle choosing Zig for deterministic, explicit memory control (no GC surprises), but their use case (a financial database) is much more predictable. For your relay's open-ended datasets, careful pipelining is probably the main solution, regardless of the language.

Are you experiencing real GC pain points or just the challenges of processing large data streams?

fortunately i was able to fix the problem after all, using the jetbrains junie LLM coding agent. haha. the thing that totally nailed it was one of the longest and most complicated prompts i've written to date. it managed to understand it and guide a refactoring process that appears to have completely fixed the problem. yay!