decentralize the archives. anyone doing heavy archiving on nostr would run their own relay where they would post WARC web archive records as nostr notes.
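
a rough sketch of what "a WARC record as a nostr note" could look like. the kind number and the "r" tag are assumptions made up for illustration, not an assigned kind or an agreed convention. the id hashing follows the NIP-01 serialization, and signing is left out since it needs a schnorr library.

```python
import hashlib
import json
import time

# Hypothetical kind number for "WARC record" notes; not an assigned nostr kind.
WARC_NOTE_KIND = 9999


def make_warc_note(pubkey_hex, warc_record_text, target_uri, created_at=None):
    """Build an unsigned nostr event whose content is one WARC record."""
    event = {
        "pubkey": pubkey_hex,
        "created_at": int(time.time()) if created_at is None else created_at,
        "kind": WARC_NOTE_KIND,
        "tags": [["r", target_uri]],  # assumed tag naming the captured URL
        "content": warc_record_text,
    }
    # NIP-01: the event id is the sha256 of this compact JSON array.
    serialized = json.dumps(
        [0, event["pubkey"], event["created_at"], event["kind"],
         event["tags"], event["content"]],
        separators=(",", ":"),
        ensure_ascii=False,
    )
    event["id"] = hashlib.sha256(serialized.encode("utf-8")).hexdigest()
    return event  # still needs a "sig" field before a relay will accept it
```

binary response bodies (images, fonts) would need some text encoding such as base64 before going into "content". that detail is left open here.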

the CDX index entries for those pages would be posted as notes and shared around, and anyone wishing to replay a page would (automatically, in the background) fetch all the relevant WARC notes to reconstruct it.
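
the lookup step could be sketched like this: given some CDX-style entries, pick the capture closest to the time you want and follow it to the note holding the record. `CdxEntry`, its `note_id` field, and `closest_capture` are hypothetical names. real CDX lines carry more fields (status, mime type, digest, offsets).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CdxEntry:
    url: str
    timestamp: str  # 14-digit capture time, e.g. "20240115093000"
    note_id: str    # hypothetical: id of the nostr note holding the WARC record


def _parse(ts):
    """Turn a 14-digit CDX timestamp into a datetime."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def closest_capture(entries, url, wanted_timestamp):
    """Return the entry for `url` captured nearest to `wanted_timestamp`."""
    candidates = [e for e in entries if e.url == url]
    if not candidates:
        return None
    wanted = _parse(wanted_timestamp)
    return min(candidates, key=lambda e: abs(_parse(e.timestamp) - wanted))
```

a replay client would run this once for the page itself and again for every sub-asset the page asks for.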

the same idea as the wayback machine web archive, but distributed.

Discussion

Would it store the pages converted into AsciiDoc or leave them as HTML?

We are planning on storing entire webpages in the kind 31 citation events for "external web references", so that people can refer to those, rather than the web pages, in case the pages change. The page content then goes in the event's "content": "" field.

Is this the sort of thing you mean?

https://next-alexandria.gitcitadel.eu/publication?d=gitcitadel-project-documentation-citations-specification-9-by-stella-v-1

that can work, but the playback experience will be severely limited. for instance, any "external assets" (images, CSS, etc.) will either break or try to load the live asset.

WARC handles this by capturing the original request to that asset and recording it. so when loading later you can also replay these recorded sub-assets.
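
a toy version of that idea, assuming recorded responses are simply keyed by URL (`ReplayStore` is a made-up name, and real replay tools also match on method, timestamp and more):

```python
class ReplayStore:
    """Holds recorded responses and serves them back instead of the live web."""

    def __init__(self):
        self._responses = {}

    def record(self, url, status, headers, body):
        """Store the response seen for `url` at capture time."""
        self._responses[url] = (status, headers, body)

    def replay(self, url):
        """Return the recorded response, or a 404 if it was never captured.

        Nothing here ever falls through to the live asset.
        """
        return self._responses.get(url, (404, {}, b""))


store = ReplayStore()
store.record("https://example.com/", 200, {"Content-Type": "text/html"},
             b'<link rel="stylesheet" href="/style.css">')
store.record("https://example.com/style.css", 200, {"Content-Type": "text/css"},
             b"body { color: black }")
```

when the replayed page asks for `/style.css`, it gets the recorded stylesheet back, not whatever the site serves today.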

see my other reply

Yeah, it's more meant as a snapshot for document references.

I dig. that makes sense for this purpose!

the system I'm describing *does* require the whole replay client side setup, so it's constrained in that way.

having a basic "reader mode" view of the snapshot you're describing sounds really appropriate for nostr.

...might have some issues with sites that load content dynamically on scroll.. but it's impossible to handle everything. the web is so sadly broken these days. hardly anything is a document anymore!

the WARC (and WACZ) file format (used by Internet Archive and others) is a bit special. HTTP requests and responses are recorded and written into the WARC file, which replay software later uses to "play back" those responses in place of the original server that once provided them.

this is why Wayback Machine can provide such high fidelity replay experiences.

it's a fundamentally simple approach that provides a really powerful experience. as a server, you only need to be able to store basic text in order to provide it.
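
to show how little structure is involved, here is a minimal sketch that writes and reads back a single WARC "response" record. the function names are made up for illustration and only a handful of header fields are set. a real archiver would use an existing WARC library and also write request and metadata records.

```python
import uuid
from datetime import datetime, timezone


def build_response_record(target_uri, http_response_bytes, when=None):
    """Serialize one WARC/1.0 response record wrapping raw HTTP response bytes."""
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    headers = [
        ("WARC-Type", "response"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", when.strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "application/http; msgtype=response"),
        ("Content-Length", str(len(http_response_bytes))),
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # A record ends with two CRLFs after the content block.
    return head.encode("utf-8") + http_response_bytes + b"\r\n\r\n"


def parse_record(record_bytes):
    """Split one record into (version line, header fields, content block)."""
    head, _, rest = record_bytes.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    fields = dict(line.split(": ", 1) for line in lines[1:])
    length = int(fields["Content-Length"])
    return lines[0], fields, rest[:length]
```

the content block is the HTTP response exactly as the server sent it, status line and headers included, which is what lets replay software hand it back unchanged.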