What this tool does
OCR PDF takes a scanned, image-only PDF and makes its text searchable and selectable by adding an invisible text layer to each page. The output looks identical to the input: the text layer is invisible and aligned with the existing page image, so the document still looks like a scan, but Ctrl+F and copy-paste now work.
When to use this
- Scanned contracts where you need to search for clauses or copy specific sections.
- Scanned receipts you want to feed into an expense tracker.
- Old book pages or PDFs from image-based scanners where the text is currently "locked" inside pixels.
- Before running PDF-to-Word or PDF-to-Excel on a scanned document — those tools need a text layer to extract anything.
When NOT to use this
- If your PDF already has selectable text (try Ctrl+F to check), OCR won't improve anything — just use the text that's there.
- For handwritten notes, our OCR engine (and OCR in general) produces poor results. This tool only handles printed, typeset text.
Input limits
- Single file per OCR job
- Up to 100 MB per upload
- One job at a time: OCR is memory-heavy (around 500 MB of RAM per job), so we serialise requests. If another OCR job is in progress, you'll see a "server busy" message; retry in a minute.
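The one-job-at-a-time rule can be pictured as a single slot that a job must claim before it starts. The sketch below is only an illustration of that idea, not the service's actual code; `run_exclusive`, `ServerBusy`, and `_ocr_slot` are hypothetical names.

```python
import threading

# Hypothetical single slot shared by all OCR requests (an assumption for illustration).
_ocr_slot = threading.Lock()


class ServerBusy(Exception):
    """Raised when another OCR job already holds the single slot."""


def run_exclusive(job):
    """Run job() only if no other OCR job is in progress."""
    # Non-blocking acquire: fail fast instead of queueing behind a running job.
    if not _ocr_slot.acquire(blocking=False):
        raise ServerBusy("server busy, retry in a minute")
    try:
        return job()
    finally:
        # Always free the slot, even if the job raised.
        _ocr_slot.release()
```

Rejecting the second request straight away, rather than queueing it, keeps memory use bounded to one job and matches the "server busy" message described above.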
Language: English only
This tool is English-only by design. We removed the multi-language picker because users kept confusing it with translation — picking "Hindi" on an English document made the OCR engine look for Devanagari characters that don't exist in the scan, producing garbage output and a frustrating experience.
How OCR runs
The pipeline:
- Your PDF is uploaded over HTTPS to a temporary folder.
- Each page is rendered to a high-resolution raster image.
- The OCR engine analyses each image and recognises the printed characters along with their position on the page.
- The recognised text is embedded back into the PDF as an invisible layer, aligned with the visible text, so the page looks unchanged but the words are now selectable.
- The combined PDF is returned to you and the working files are deleted.
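To line the invisible text up with the scan, each recognised word's position has to be converted from pixels in the rendered image to coordinates on the PDF page. The sketch below shows that conversion under stated assumptions: word boxes are measured from the top-left of the image, the page uses the default PDF coordinate space (72 points per inch, origin at the bottom-left), and the render resolution is known. `PixelBox`, `to_pdf_rect`, and the 300 DPI figure in the usage example are illustrative, not details of our engine.

```python
from dataclasses import dataclass

POINTS_PER_INCH = 72  # default PDF user-space unit


@dataclass(frozen=True)
class PixelBox:
    """A word box in image pixels, measured from the top-left corner."""
    left: int
    top: int
    width: int
    height: int


def to_pdf_rect(box, dpi, page_height_pt):
    """Return (x0, y0, x1, y1) in PDF points, origin at the bottom-left."""
    # Scale pixels to points: points = pixels * 72 / dpi.
    x0 = box.left * POINTS_PER_INCH / dpi
    x1 = (box.left + box.width) * POINTS_PER_INCH / dpi
    # Flip the y axis: image rows grow downward, PDF y grows upward.
    y1 = page_height_pt - box.top * POINTS_PER_INCH / dpi
    y0 = page_height_pt - (box.top + box.height) * POINTS_PER_INCH / dpi
    return (x0, y0, x1, y1)
```

For example, on a US Letter page (792 points tall) rendered at an assumed 300 DPI, a word found 300 pixels from the left and top edges, 600 pixels wide and 150 pixels tall, maps like this:

```python
to_pdf_rect(PixelBox(left=300, top=300, width=600, height=150), dpi=300, page_height_pt=792)
# (72.0, 684.0, 216.0, 720.0)
```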
Expected accuracy and speed
- Typed, high-quality scans — accuracy is typically 95%+. You'll see occasional mistakes on unusual typefaces or low-resolution scans.
- Faded, skewed, or repeatedly photocopied scans: accuracy drops. Pre-processing the scan (deskewing, increasing contrast) before OCR helps a lot.
- Handwritten content — don't rely on OCR for this.
- Speed — OCR is the slowest tool in our suite. Expect noticeably longer processing than other conversions, especially for long documents — each page has to go through a full image-to-text recognition pass.
Tips for the best OCR result
- Check first — press Ctrl+F in your PDF. If text is already found, it's not a scan and OCR isn't needed.
- Cleaner scans = better accuracy. Straighten skewed pages and raise contrast before uploading; faded photocopies give the worst results.
- OCR, then convert. Run OCR first, then feed the result into PDF to Word or PDF to Excel — those need a text layer to extract anything.
- Just need the words? Extract Text pulls the recognised text out as plain text.
Troubleshooting
- "Server busy" — only one OCR job runs at a time; retry in a minute.
- Garbled output — the scan is too low quality, skewed, or handwritten; OCR handles only clean printed text.
- No selectable text after OCR — the source may already be vector text (nothing to recognise) or the page images are too poor.
- Upload rejected — confirm a valid PDF under 100 MB.
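If an upload keeps getting rejected, you can run the same kind of check locally before retrying. The sketch below is an assumption about what such a check might look like, not our server's actual validation: `check_upload` is a hypothetical helper, and it treats "100 MB" as 100 × 1024 × 1024 bytes.

```python
MAX_UPLOAD_BYTES = 100 * 1024 * 1024  # assumes "100 MB" means 100 MiB


def check_upload(data, max_bytes=MAX_UPLOAD_BYTES):
    """Return None if the upload looks acceptable, else a short reason."""
    if len(data) > max_bytes:
        return "file is larger than the upload limit"
    # A PDF starts with a "%PDF-" header; some readers tolerate a little
    # junk before it, so look within the first 1024 bytes.
    if b"%PDF-" not in data[:1024]:
        return "file does not look like a PDF"
    return None
```

A file that passes this check can still be damaged further in, so treat it as a quick first filter rather than proof that the PDF is valid.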
Privacy and file handling
OCR runs on our secure server because character recognition is far too heavy to run in a browser tab. Your PDF is uploaded over HTTPS, processed, and both the source PDF and the generated text-layer files are deleted as soon as your download is ready. If you close the tab mid-OCR, the job is cancelled and its temporary files are cleared automatically. No sign-up, no watermark, no copies retained.
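One common way to guarantee that working files disappear whether a job finishes, fails, or is cancelled is to keep everything inside a temporary folder that is removed automatically. The sketch below illustrates that pattern with Python's standard `tempfile` module; `process_in_temp_dir` and the `ocr` callable are hypothetical stand-ins, not our server's code.

```python
import tempfile
from pathlib import Path


def process_in_temp_dir(pdf_bytes, ocr):
    """Run ocr(source_path, work_dir) and return its bytes; leave nothing on disk."""
    # TemporaryDirectory removes the folder and everything in it on exit,
    # whether the block finishes normally or raises (e.g. a cancelled job).
    with tempfile.TemporaryDirectory(prefix="ocr-") as work_dir:
        work = Path(work_dir)
        source = work / "source.pdf"
        source.write_bytes(pdf_bytes)
        # ocr is assumed to return the finished PDF as bytes, not a path,
        # so the result survives after the folder is deleted.
        return ocr(source, work)
```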