web archiving is child's play.
what we want to do is decentralize web crawl data. nobody uses YaCy, which implies it failed.
what is the total size of the latest Common Crawl? (estimate)
Compressed size (gzipped WARC): roughly 250-350 TB
solve this.
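for scale, here's a minimal back-of-envelope sketch (Python) of what spreading that volume over a peer network would demand per participant. the peer counts and the 3x replication factor are illustrative assumptions, not figures from the post or from Common Crawl.

```python
# Back-of-envelope: per-peer storage if the 250-350 TB compressed WARC
# estimate above were sharded evenly across a volunteer network.
# Peer counts and replication factor below are assumptions for illustration.

def per_peer_storage_tb(total_tb: float, peers: int, replication: int = 3) -> float:
    """Storage each peer must contribute, assuming even sharding."""
    return total_tb * replication / peers

for total in (250, 350):                      # low / high end of the estimate
    for peers in (1_000, 10_000, 100_000):    # hypothetical network sizes
        need = per_peer_storage_tb(total, peers)
        print(f"{total} TB total, {peers:>7} peers, 3x replication -> "
              f"{need:.2f} TB per peer")
```

even at 100,000 peers and the low end of the estimate, each node would still need to hold several GB, so "solve this" is really a question of how many participants you can recruit and how much replication you accept.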