Would it store the pages converted to AsciiDoc, or leave them as HTML?
Discussion
We are planning on storing entire webpages in the kind 31 citation events for "external web references", so that people can refer to those, rather than to the live web pages, in case the pages change. The page content is then the event "content": "
Is this the sort of thing you mean?
that can work, but the playback experience will be severely limited. for instance, any "external assets" (images, CSS, etc.) will either break or try to load the live asset.
WARC handles this by capturing the original request to that asset and recording it. so when loading later you can also replay these recorded sub-assets.
see my other reply
Yeah, it's more meant as a snapshot for document references.
I dig. that makes sense for this purpose!
the system I'm describing *does* require the whole replay client side setup, so it's constrained in that way.
having a basic "reader mode" view of the snapshot you're describing sounds really appropriate for nostr.
...might have some issues with sites that load content dynamically on scroll.. but it's impossible to handle everything. the web is so sadly broken these days. hardly anything is a document anymore!
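A rough sketch of what a "reader mode" pass over a stored snapshot could look like, assuming the snapshot is plain HTML held in a string. The names `ReaderModeExtractor` and `reader_text` are made up for illustration, and this only keeps visible text; it does nothing for content that a page loads dynamically.

```python
from html.parser import HTMLParser


class ReaderModeExtractor(HTMLParser):
    """Collects visible text and skips script/style/noscript content."""

    SKIPPED = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def reader_text(html):
    """Return the readable text of an HTML snapshot, one chunk per line."""
    parser = ReaderModeExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "\n".join(parser.chunks)


# Hypothetical snapshot, standing in for HTML stored with a citation.
snapshot = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Title</h1><script>alert(1)</script>"
    "<p>Hello <b>world</b></p></body></html>"
)
print(reader_text(snapshot))
```

Since no external assets are fetched, a view like this cannot break or reach out to the live site, at the cost of losing layout and images.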
the WARC (and WACZ) file format (used by Internet Archive and others) works differently. HTTP requests and responses are recorded and written into the WARC file, and replay software later "plays back" those recorded responses in place of the original server that once provided them.
this is why Wayback Machine can provide such high fidelity replay experiences.
it's a fundamentally simple approach that provides a really powerful experience. as a server, you only need to be able to store basic text in order to provide this experience.
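To make the record-and-replay idea concrete, here is a toy sketch. It is not the real WARC format and does not use any WARC library; `Archive` and `RecordedResponse` are hypothetical names, and the URLs and bodies are placeholders. It only shows the principle: responses are stored keyed by the request that produced them, then served back later instead of contacting the original server.

```python
from dataclasses import dataclass, field


@dataclass
class RecordedResponse:
    status: int
    headers: dict = field(default_factory=dict)
    body: bytes = b""


class Archive:
    """Toy stand-in for a WARC file: maps (method, URL) to a recorded response."""

    def __init__(self):
        self._records = {}

    def record(self, method, url, response):
        # Capture time: store what the original server answered.
        self._records[(method.upper(), url)] = response

    def replay(self, method, url):
        # Replay time: answer from the archive only. A miss returns 404
        # rather than falling through to the live web.
        missing = RecordedResponse(404, {}, b"not in archive")
        return self._records.get((method.upper(), url), missing)


archive = Archive()
# The page and its sub-asset are both captured, so both can be replayed.
archive.record(
    "GET",
    "https://example.com/",
    RecordedResponse(200, {"Content-Type": "text/html"}, b'<img src="/logo.png">'),
)
archive.record(
    "GET",
    "https://example.com/logo.png",
    RecordedResponse(200, {"Content-Type": "image/png"}, b"placeholder image bytes"),
)

page = archive.replay("GET", "https://example.com/")
logo = archive.replay("GET", "https://example.com/logo.png")
gone = archive.replay("GET", "https://example.com/never-captured.css")
print(page.status, logo.status, gone.status)
```

The replay side is just a lookup, which is why the server storing the archive can stay simple while the replay client does the work.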