Replying to Avatar Silberengel

Yeah, nostr:nprofile1qqs82et8gqsfjcx8fl3h8e55879zr2ufdzyas6gjw6nqlp42m0y0j2spz9mhxue69uhkummnw3ezuamfdejj7kpr32f was saying he might have a parser for books, but we'd probably need to adjust it and build a second one for research papers.

I talked to my cousin about it and he is just converting the html5 version of the project Gutenberg ebooks into epub3 after doing some clean up.

Reply to this note

Please Login to reply.

Discussion

Hm. They already usually have epub3 format.

Using HTML has the benefit of tags and images, of course. I should probably switch to those and write a parser based upon the HTML tags.

Is he just scraping their website?

I am not sure how he handles the initial download. But he wanted to put in a bunch of heuristics to make a more presentable epub without miss-sized images, unfortunate line breaks etc.

That's probably where an LLM would come in useful, for refining.