Pure Local Doc Extraction: Free, Open-Source & No Dependencies!
DocExt: Open-source, local document extraction tool—no OCR, no cloud. Extract fields & tables from receipts, invoices & more.
DocExt is an open-source project from Nanonets: a local document structuring and extraction tool that runs end to end without OCR and without any cloud dependency.
It supports various document types such as receipts, passports, and invoices, with capabilities for field and table recognition.
Even in the era of large AI models, traditional OCR + LLM workflows often require manual tuning, template setup, and external API support.
DocExt (Document Extractor) takes a different approach: it uses vision-language models (VLMs) to interpret document images directly, which gives it three properties:
Zero OCR: Eliminates reliance on engines like Tesseract/EasyOCR, avoiding OCR error propagation;
Zero Cloud Calls: Local deployment, fully offline operation, ensuring data privacy;
Zero Template Restrictions: No need for manual template creation; works with preset or custom fields.
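To make the "preset or custom fields" idea concrete, here is a minimal Python sketch of how a field list could replace a hand-built template. It is an illustration only, not DocExt's actual API: `Field`, `INVOICE_PRESET`, `build_prompt`, and `parse_answer` are hypothetical names, and the model reply is a hard-coded stand-in, so no VLM is called.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One value to extract from a document image (hypothetical schema, not DocExt's)."""
    name: str
    description: str = ""


# A "preset" here is simply a reusable field list for one document type.
INVOICE_PRESET = [
    Field("invoice_number", "Identifier printed on the invoice"),
    Field("invoice_date", "Date of issue"),
    Field("total_amount", "Grand total including tax"),
]


def build_prompt(fields):
    """Turn a field list into an instruction that a vision-language model could follow."""
    lines = [
        f"- {f.name}: {f.description}" if f.description else f"- {f.name}"
        for f in fields
    ]
    return (
        "Read the attached document image and return a JSON object "
        "with exactly these keys (use null when a value is absent):\n"
        + "\n".join(lines)
    )


def parse_answer(raw, fields):
    """Keep only the requested keys; keys missing from the reply become None."""
    data = json.loads(raw)
    return {f.name: data.get(f.name) for f in fields}


# Custom fields are appended to the preset; no template is drawn or tuned.
fields = INVOICE_PRESET + [Field("po_number", "Purchase order reference")]
prompt = build_prompt(fields)

# Stand-in for a model reply (assumption: the model answers with JSON text).
fake_reply = '{"invoice_number": "INV-001", "total_amount": "120.50", "extra": "ignored"}'
result = parse_answer(fake_reply, fields)
print(result)
```

In this sketch, changing what gets extracted means editing a list of names and descriptions rather than redrawing zones on a page layout, which is the practical meaning of having no template restrictions.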