How to Install Tesseract OCR

This tutorial explains, based on my own experience, how to install Tesseract OCR and libtiff on Linux.

You can also OCR documents with Tesseract by uploading them to this web site.

Install libtiff (optional)

Tesseract works without libtiff, but without it Tesseract cannot recognize compressed TIFF images.

  • Download libtiff from here (I used tiff-3.8.2.tar.gz). Extract it and change into the extracted directory.
  • As the libtiff build instructions describe, run the following three commands (the # prompt indicates that the last one must be run as root):
    % ./configure
        ...lots of messages...
    % make
        ...lots of messages...
    # make install
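Before moving on to Tesseract, it may be worth confirming that libtiff actually landed where later builds can find it. The sketch below is a hypothetical helper, not part of libtiff: have_libtiff only looks for the tiffio.h header and a libtiff library file, and it assumes the default install prefix /usr/local that ./configure uses when no --prefix is given.

```shell
#!/bin/sh
# have_libtiff PREFIX: succeed if PREFIX appears to contain an installed libtiff.
# Hypothetical helper; assumes headers in PREFIX/include and libraries in PREFIX/lib.
have_libtiff() {
    prefix=${1:-/usr/local}
    # tiffio.h is the main public libtiff header.
    [ -f "$prefix/include/tiffio.h" ] || return 1
    # Accept either a shared library (any version suffix) or a static one.
    # An unmatched glob stays literal, so -e simply fails for it.
    for lib in "$prefix"/lib/libtiff.so* "$prefix"/lib/libtiff.a; do
        [ -e "$lib" ] && return 0
    done
    return 1
}

if have_libtiff /usr/local; then
    echo "libtiff found under /usr/local"
else
    echo "libtiff not found under /usr/local"
fi
```

If the check passes but programs still cannot find the shared library at run time, running ldconfig as root may be needed so the dynamic linker picks up the new library in /usr/local/lib.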


Installing Tesseract

  • Download the Tesseract source code from here (I used tesseract-2.01.tar.gz); the latest release should be in the featured download box. Extract it; this should create a directory with a name like tesseract-2.01.
  • Download the language data file for the language you want Tesseract to recognize (I used tesseract-2.00.eng.tar.gz). Change into the directory created in the previous step and extract the language archive there:
    % cd YOURDOWNLOADDIRECTORY/tesseract-2.01
    % tar zxf YOURDOWNLOADDIRECTORY/tesseract-2.00.eng.tar.gz
  • As described in the Tesseract ReadMe, run the following three commands (as with libtiff, make install typically needs to be run as root):
    % ./configure
    % make
    # make install
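A common mistake in the steps above is extracting the language archive in the wrong directory, which may only show up later when Tesseract tries to load its language data. The sketch below, run from YOURDOWNLOADDIRECTORY before ./configure, checks for this. It assumes the archive unpacks into a tessdata/ subdirectory with file names starting with the language code (eng. for English); has_lang_data is a hypothetical helper name.

```shell
#!/bin/sh
# has_lang_data SRCDIR LANG: succeed if SRCDIR/tessdata holds at least one LANG.* file.
# Hypothetical helper; assumes the language archive unpacks into tessdata/.
has_lang_data() {
    srcdir=$1
    lang=${2:-eng}
    for f in "$srcdir"/tessdata/"$lang".*; do
        # If the glob matched nothing, $f is the literal pattern and -f fails.
        [ -f "$f" ] && return 0
    done
    return 1
}

# Example: check the English data in the tesseract-2.01 source tree.
if has_lang_data tesseract-2.01 eng; then
    echo "English language data present"
else
    echo "English language data missing: re-extract the language archive" >&2
fi
```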


Testing Installation

Test Tesseract by downloading phototest.tif, or by using the copy that ships with Tesseract, and running:

    % tesseract phototest.tif output

Tesseract should then write a file named output.txt containing the recognized text.

If you installed libtiff, test recognition of compressed images by downloading phototest-fax4.tif and repeating the command above with that file in place of phototest.tif.
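The two test runs above can be scripted as a quick smoke test. This is a sketch rather than part of Tesseract: ocr_smoke_test and the TESSERACT override variable are hypothetical names, and the script assumes Tesseract exits with status 0 on success. It only checks that a non-empty .txt file was produced, not that the recognized text is correct, so compare the output against the image by eye.

```shell
#!/bin/sh
# ocr_smoke_test IMAGE OUTBASE: run Tesseract and check that OUTBASE.txt is non-empty.
# Hypothetical helper. Set TESSERACT to override the command (defaults to "tesseract").
ocr_smoke_test() {
    image=$1
    outbase=$2
    "${TESSERACT:-tesseract}" "$image" "$outbase" || return 1
    # Tesseract appends .txt to the output base name.
    [ -s "$outbase.txt" ]
}

# Test the uncompressed image, then the compressed one (needs libtiff).
for img in phototest.tif phototest-fax4.tif; do
    [ -f "$img" ] || { echo "skipping $img (not found)"; continue; }
    base=${img%.tif}-output
    if ocr_smoke_test "$img" "$base"; then
        echo "$img: OK, see $base.txt"
    else
        echo "$img: recognition failed or produced no text" >&2
    fi
done
```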

Hopefully this works for you. For further reading, check out the scripts we use with Tesseract on this site. You can also read more about A Billion Billion here, or read about A Billion Billion's free OCR.
