How to OCR a PDF

How to OCR a PDF Using Adobe Acrobat Professional

Contributor(s): Scholars' Lab staff, Adriana Barcenas, Steven Weinberger, Zach Rowinski

To make a PDF searchable, run OCR on it in Acrobat Professional as follows:

  1. Optimize most PDFs after scanning them and before running OCR: rename the file, then pull down the Document menu and select Optimize.
  2. Open the PDF file you want to run OCR on.
  3. Pull down the File menu, choose "Save As," and change the end of the file name to "-ocr.pdf" (for example, "scan.pdf" becomes "scan-ocr.pdf").
  4. Pull down the Document menu, point to "OCR Text Recognition," choose "Recognize Text Using OCR…," and click "Start."
  5. The OCR process will start. It will take some time, depending on the number of pages in the PDF.
  6. When it finishes, save the file. Check the result by searching for "the" or another word that appears in the file, and make sure the search returns results.
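
For batches of files, the naming convention in step 3 and the search check in step 6 can be scripted outside Acrobat. The following is only a minimal sketch in Python, not part of the THL workflow: the function names are hypothetical, and extract_text assumes the separate pdftotext command-line tool (from Poppler or Xpdf) is installed, which this page does not otherwise require.

```python
import re
import shutil
import subprocess
from pathlib import Path


def ocr_copy_name(pdf_path: str) -> Path:
    """Return the '-ocr.pdf' file name from step 3 (scan.pdf -> scan-ocr.pdf)."""
    path = Path(pdf_path)
    return path.with_name(path.stem + "-ocr.pdf")


def contains_word(text: str, word: str = "the") -> bool:
    """Case-insensitive whole-word search, mirroring the check in step 6."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def extract_text(pdf_path: Path) -> str:
    """Read the PDF's text layer with pdftotext (assumed to be on PATH)."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    # Passing '-' as the output file makes pdftotext write to standard output.
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def is_searchable(pdf_path: Path, word: str = "the") -> bool:
    """True if the word can be found in the PDF's extracted text."""
    return contains_word(extract_text(pdf_path), word)
```

For example, is_searchable(ocr_copy_name("scan.pdf")) would report whether the OCR copy of a hypothetical scan.pdf returns a hit for "the". A False result suggests the OCR step did not produce a text layer, or that the chosen word does not occur in the document.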

To OCR Roman-script text with diacritic characters, consider Abbyy's FineReader (http://www.abbyy.com/). No THL staff have used it, so we have no experience with it. For more information, see Zach Rowinski's assessment.

Provided for unrestricted use by the Tibetan and Himalayan Library