Kreuzberg is a Python library for asynchronous text extraction from PDFs, images, and office documents. It processes files locally with minimal dependencies, supports both single-file and batch extraction, and relies on Tesseract OCR and Pandoc for broad format coverage.
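
To illustrate the single-item versus batch pattern mentioned above, here is a minimal standard-library sketch. The names `extract_text` and `batch_extract_text` are hypothetical stand-ins that only read plain-text files; they are not Kreuzberg's actual functions, so check the repository for the real API.

```python
import asyncio
from pathlib import Path


async def extract_text(path: Path) -> str:
    """Hypothetical stand-in for an async extractor (plain-text files only)."""
    # Run the blocking file read in a worker thread so the event loop stays free.
    return await asyncio.to_thread(path.read_text, encoding="utf-8")


async def batch_extract_text(paths: list[Path]) -> list[str]:
    """Hypothetical batch helper: extract concurrently, results in input order."""
    return await asyncio.gather(*(extract_text(p) for p in paths))
```

A caller would run either coroutine with `asyncio.run(...)` or `await` it from existing async code. The batch form probably pays off most when each item is slow to process, as OCR tends to be.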

https://github.com/Goldziher/kreuzberg

#textprocessing #pythonlibrary #ocr #documentextraction #fileconversion
