Kreuzberg is a Python library for asynchronous text extraction from a variety of document formats, including PDFs, images, and office files. It processes everything locally with minimal dependencies, offers both single-item and batch processing, and integrates tools such as Tesseract OCR and Pandoc for broad format support.
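The difference between single-item and batch asynchronous extraction can be sketched in plain asyncio. This is not Kreuzberg's API. `extract_one`, `extract_batch` and `max_concurrency` are hypothetical names, and the extractor only reads UTF-8 text files. It stands in for the OCR and conversion backends a real library would dispatch to.

```python
import asyncio
import tempfile
from pathlib import Path


async def extract_one(path: Path) -> str:
    """Hypothetical single-item extractor (not Kreuzberg's API).

    Only handles UTF-8 text files. A real extractor would pick a backend
    by file type, e.g. OCR for images or a converter for office files.
    """
    # Run the blocking file read in a worker thread so the event loop stays free.
    return await asyncio.to_thread(path.read_text, encoding="utf-8")


async def extract_batch(paths: list[Path], max_concurrency: int = 4) -> list[str]:
    """Hypothetical batch extractor: runs single-item extractions concurrently.

    A semaphore caps how many extractions are in flight at once, and
    asyncio.gather returns results in the same order as `paths`.
    """
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(p: Path) -> str:
        async with sem:
            return await extract_one(p)

    return await asyncio.gather(*(guarded(p) for p in paths))


if __name__ == "__main__":
    # Demo: write three small text files, then extract them as one batch.
    with tempfile.TemporaryDirectory() as d:
        files = [Path(d) / f"doc{i}.txt" for i in range(3)]
        for i, f in enumerate(files):
            f.write_text(f"document {i}", encoding="utf-8")
        print(asyncio.run(extract_batch(files)))
```

The design choice here is to bound concurrency in the batch path. That matters once each item triggers heavy work such as an OCR process, because an unbounded `gather` would start all of that work at once.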
https://github.com/Goldziher/kreuzberg
#textprocessing #pythonlibrary #ocr #documentextraction #fileconversion