

Paperwise is a self-hosted system that turns raw files such as PDFs, scans, and images into structured data. It combines OCR with AI-driven extraction to organize documents through auto-tagging and metadata extraction.
The software is deployed via Docker Compose on local infrastructure, allowing users to maintain control of their data.
Beyond data extraction, the platform supports a grounded Q&A system. This allows users to ask natural language questions across multiple documents and receive answers that include source citations traceable to the original files.
Buyers should confirm that they can manage a Docker-based installation and can supply the AI model API keys or local model resources that the extraction and Q&A functions require.
Supports switching between local OCR and LLM-based OCR to handle various document qualities, including dense layouts and scans.
Supports natural language questions across documents with source quotes that link back to the original files.
Automatically categorizes documents by type, date, entity, and custom tags to create filterable views.
Provides three model slots for assigning different AI models to OCR, extraction, and Q&A tasks.
Deployed via Docker Compose on local infrastructure to keep data under user control.
Converting invoices and monthly billing statements into structured tables of costs and service periods.
Tracking changes in policy statements or identifying specific clauses across multiple legal agreements.
Batch uploading PDFs and images to be automatically tagged and organized by entity or date.
Pricing information was not clearly available at the time of writing. Buyers should confirm current pricing on the vendor website.
Paperwise is self-hosted and deployed using Docker Compose on your own local infrastructure or servers.
Paperwise is designed to handle scans, dense layouts, and messier files by switching between local and LLM-based OCR.
Users can ask natural language questions, and the system provides answers grounded in the uploaded documents, including quotes traceable to the original files.
Source category: Data & Analytics
Source subcategory: Document Automation
Paperwise is a self-hosted document intelligence tool for businesses that need to extract and query data from PDFs and scans while maintaining data control. It supports OCR and grounded Q&A with source citations. Buyers will need to be comfortable with Docker Compose deployment.