web archiving is child's play.

what we want to do is decentralize web crawling data. nobody uses YaCy, which suggests it has failed.

what is the total size of the latest Common Crawl? (estimate)

Compressed size (gzipped WARC): roughly 250–350 TB
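A quick back-of-envelope check of that range, as a minimal Python sketch; the capture count and per-page compressed size below are my own assumptions for illustration, not figures from Common Crawl or from this note:

```python
# Rough estimate of one Common Crawl snapshot's compressed WARC size.
# Assumptions (illustrative, not sourced from the note above):
#   ~3 billion page captures per crawl
#   ~100 KB of gzipped WARC per capture on average
PAGES_PER_CRAWL = 3_000_000_000   # assumed number of captures
BYTES_PER_PAGE = 100_000          # assumed avg gzipped WARC bytes per capture

total_bytes = PAGES_PER_CRAWL * BYTES_PER_PAGE
total_tb = total_bytes / 1e12     # decimal terabytes

print(f"~{total_tb:.0f} TB compressed")  # ~300 TB, inside the 250-350 TB range
```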

solve this.
