A scanned PDF is just pictures of pages. It has no text layer, so search and AI classification have nothing to work with. dossier fixes that with on-device OCR (optical character recognition) using Apple’s Vision framework, so nothing is uploaded anywhere.
When it happens automatically
- During a batch auto-file run, image-only PDFs are OCR’d on the fly so they can be classified like any other document.
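How dossier decides that a PDF is "image-only" isn't documented here. The Swift sketch below shows one plausible heuristic, which is an assumption: a PDF needs OCR when none of its pages has any extractable text. `isImageOnly(pageTexts:)` and `pdfNeedsOCR(at:)` are hypothetical names. The PDFKit part (`PDFDocument`, `PDFPage.string`) only compiles on Apple platforms, so it is guarded with `#if canImport(PDFKit)`.

```swift
import Foundation
#if canImport(PDFKit)
import PDFKit
#endif

/// Hypothetical heuristic (not dossier's documented rule): a PDF is
/// image-only when it has pages and none of them yields non-blank text.
func isImageOnly(pageTexts: [String?]) -> Bool {
    !pageTexts.isEmpty && pageTexts.allSatisfy { text in
        (text ?? "").trimmingCharacters(in: .whitespacesAndNewlines).isEmpty
    }
}

#if canImport(PDFKit)
/// Reads each page's existing text layer with PDFKit and applies the heuristic.
/// Returns false if the file cannot be opened as a PDF.
func pdfNeedsOCR(at url: URL) -> Bool {
    guard let document = PDFDocument(url: url) else { return false }
    let texts = (0..<document.pageCount).map { document.page(at: $0)?.string }
    return isImageOnly(pageTexts: texts)
}
#endif
```

Checking every page, rather than only the first, avoids skipping OCR for a scan that happens to begin with a digitally generated cover sheet.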
When you want to run it manually
- Right-click a PDF in the file list → OCR.
- dossier rasterizes each page, extracts the text, and stores it in the catalog. The PDF file itself is not modified.
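The rasterize-then-recognize step can be sketched with PDFKit and Vision as shown below. This is an illustration under stated assumptions, not dossier's actual code. The 300 DPI default, the `.accurate` recognition level, language correction, and the use of `PDFPage.thumbnail(of:for:)` as the rasterizer are all assumptions, and `pixelSize(forPageSize:dpi:)` and `recognizeText(inPDFAt:dpi:)` are hypothetical helpers. PDF page sizes are measured in points (1/72 inch), so the helper converts them to pixels before rendering. The file is only read, which matches the note that the PDF is not modified.

```swift
import Foundation
#if canImport(PDFKit) && canImport(Vision) && canImport(AppKit)
import AppKit
import PDFKit
import Vision
#endif

/// Converts a page size in PDF points (1/72 inch) to pixels at the given DPI.
func pixelSize(forPageSize points: CGSize, dpi: CGFloat) -> CGSize {
    CGSize(width: (points.width / 72 * dpi).rounded(),
           height: (points.height / 72 * dpi).rounded())
}

#if canImport(PDFKit) && canImport(Vision) && canImport(AppKit)
/// Sketch: render each page to a bitmap and run Vision text recognition on it.
/// Returns one string per page (empty if a page could not be rendered).
func recognizeText(inPDFAt url: URL, dpi: CGFloat = 300) throws -> [String] {
    guard let document = PDFDocument(url: url) else { return [] }
    var pages: [String] = []
    for index in 0..<document.pageCount {
        guard let page = document.page(at: index) else { continue }
        let size = pixelSize(forPageSize: page.bounds(for: .mediaBox).size, dpi: dpi)
        // Assumption: a thumbnail at full pixel size is good enough as a rasterizer.
        let image = page.thumbnail(of: size, for: .mediaBox)
        guard let cgImage = image.cgImage(forProposedRect: nil, context: nil, hints: nil) else {
            pages.append("")
            continue
        }
        let request = VNRecognizeTextRequest()
        request.recognitionLevel = .accurate
        request.usesLanguageCorrection = true
        try VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])
        // Keep the top candidate for each recognized line of text.
        let lines = (request.results ?? []).compactMap { $0.topCandidates(1).first?.string }
        pages.append(lines.joined(separator: "\n"))
    }
    return pages
}
#endif
```

At 300 DPI a US Letter page (612 × 792 points) becomes a 2550 × 3300 pixel image. A higher DPI can help with small print but costs time and memory per page.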
How you see the result
- OCR’d files show an OCR badge in the file list.
- The extracted text is included in full-text search right away, and the search filters have an Only OCR’d files toggle that narrows results to these documents.
- The AI can now read the document’s content when proposing where it belongs.
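To show how OCR text can live in the catalog while the PDF stays untouched, here is a minimal Swift sketch of a hypothetical catalog record and search. `CatalogEntry`, its fields, and `search(_:in:onlyOCRd:)` are illustrative names, not dossier's data model. The search is a plain case-insensitive substring match over both the PDF's own text and the OCR text, and `onlyOCRd` stands in for the Only OCR’d files toggle.

```swift
import Foundation

/// Hypothetical catalog record: OCR output is stored here, never in the PDF.
struct CatalogEntry {
    let path: String
    let embeddedText: String?  // text layer the PDF already had, if any
    let ocrText: String?       // text produced by OCR

    /// Would drive the "OCR" badge in the file list.
    var isOCRd: Bool { ocrText != nil }

    /// Everything full-text search should see for this file.
    var searchableText: String {
        [embeddedText, ocrText].compactMap { $0 }.joined(separator: "\n")
    }
}

/// Case-insensitive substring search, optionally limited to OCR'd files.
func search(_ query: String, in catalog: [CatalogEntry], onlyOCRd: Bool = false) -> [String] {
    catalog
        .filter { !onlyOCRd || $0.isOCRd }
        .filter { $0.searchableText.range(of: query, options: .caseInsensitive) != nil }
        .map(\.path)
}
```

A linear scan keeps the sketch short. A real catalog of many documents would more likely use a full-text index such as SQLite's FTS5, though nothing here says which approach dossier takes.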
OCR quality depends on the scan. For a crooked 1990s fax, expect the extracted text to be imperfect, though it is usually still good enough for search and classification.