How to Install Tesseract OCR
This tutorial explains how to install Tesseract OCR and libtiff on Linux, based on my own experience.
Installing libtiff (optional)
Tesseract works without libtiff, but it then cannot recognize compressed TIFF images.
- Download libtiff from here (I used tiff-3.8.2.tar.gz). Extract it and change into the directory it creates.
- As the build instructions here describe, run the following three commands (the "#" prompt marks the one to run as root):
% ./configure
...lots of messages...
% make
...lots of messages...
# make install
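Before moving on to Tesseract, it can help to confirm that libtiff actually landed where you expect. The helper below is a hypothetical sketch, not part of libtiff: it assumes the default install prefix /usr/local (what ./configure uses when no --prefix option is given) and simply looks for the tiffio.h header and any libtiff.* library file under that prefix.

```shell
#!/bin/sh
# Hypothetical helper: report whether libtiff's header and library files
# exist under an install prefix. Defaults to /usr/local, which is where
# ./configure installs when no --prefix option is given.
have_libtiff() {
  tiff_prefix="${1:-/usr/local}"

  # tiffio.h is the public header that programs built against libtiff include
  if [ ! -f "$tiff_prefix/include/tiffio.h" ]; then
    echo "tiffio.h not found in $tiff_prefix/include" >&2
    return 1
  fi

  # Accept any libtiff library file (shared .so*, static .a, or libtool .la).
  # If nothing matches, the glob stays unexpanded and the -e test fails.
  for tiff_lib in "$tiff_prefix"/lib/libtiff.*; do
    if [ -e "$tiff_lib" ]; then
      echo "libtiff found: $tiff_lib"
      return 0
    fi
  done

  echo "no libtiff library found in $tiff_prefix/lib" >&2
  return 1
}
```

Source the file and run have_libtiff with no arguments for the default prefix, or pass another prefix (for example have_libtiff /opt/tiff) if you configured libtiff with --prefix.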
Installing Tesseract
- Download the Tesseract source code from here (I used tesseract-2.01.tar.gz); the latest release should be in the featured download box. Extract it, which should create a directory named something like tesseract-2.01.
- Download the language data for the language you want Tesseract to recognize (I used tesseract-2.00.eng.tar.gz). Change into the directory created in the previous step and extract the archive there:
% cd YOURDOWNLOADDIRECTORY/tesseract-2.01
% tar zxf YOURDOWNLOADDIRECTORY/tesseract-2.00.eng.tar.gz
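To check that the language data ended up in the right place, a small sketch like the one below counts the files for one language code. It assumes the archive unpacks into a tessdata/ subdirectory with files named like eng.something (you can confirm the layout with tar tzf on the archive before extracting); the function name has_lang_data is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count the language data files for one language code
# under <tesseract source dir>/tessdata. Assumes the archive unpacks into
# tessdata/ with files named <lang>.<suffix>, e.g. eng.unicharset.
has_lang_data() {
  src_dir="$1"        # e.g. YOURDOWNLOADDIRECTORY/tesseract-2.01
  lang="${2:-eng}"    # language code; defaults to English
  count=0

  # Unmatched globs stay literal, so only count paths that really exist
  for data_file in "$src_dir"/tessdata/"$lang".*; do
    if [ -e "$data_file" ]; then
      count=$((count + 1))
    fi
  done

  if [ "$count" -eq 0 ]; then
    echo "no $lang.* files in $src_dir/tessdata" >&2
    return 1
  fi
  echo "$count $lang data file(s) in $src_dir/tessdata"
}
```

For example, has_lang_data YOURDOWNLOADDIRECTORY/tesseract-2.01 eng should report a non-zero count if the extraction worked.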
- As the Tesseract README describes, run the following three commands:
% ./configure
% make
# make install
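After make install, the shell has to be able to find the new tesseract program. This hypothetical sketch uses the POSIX command -v builtin to look it up on PATH; with the default prefix I would expect it in /usr/local/bin, but that is an assumption that changes if you passed --prefix to ./configure.

```shell
#!/bin/sh
# Hypothetical helper: after "make install", check that the shell can find
# a tesseract executable on PATH and print where it is.
find_tesseract() {
  # command -v prints the resolved path and fails if nothing is found
  if tess_path=$(command -v tesseract); then
    echo "tesseract installed at $tess_path"
    return 0
  fi
  echo "tesseract not found on PATH (check that /usr/local/bin is listed)" >&2
  return 1
}
```

If this fails even though make install succeeded, adding the install directory to PATH (or opening a new shell) is usually the fix.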
Testing Installation
Test Tesseract by downloading phototest.tif, or by using the copy that comes with Tesseract, and running:
tesseract phototest.tif output
Tesseract should then write a file named output.txt containing the recognized text.
If you installed libtiff, test recognition of compressed images by downloading phototest-fax4.tif and repeating the steps above with that file.
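Both tests can be scripted. The sketch below is not part of Tesseract; the function names are hypothetical. It relies only on the behaviour shown above, where "tesseract phototest.tif output" writes output.txt (so tesseract appends .txt to the output base name), and it treats a missing or empty text file as a failure. Unlike the manual test, it writes each result next to its image (phototest.txt, phototest-fax4.txt) rather than to output.txt.

```shell
#!/bin/sh
# Hypothetical helpers: run tesseract on each test image and confirm that it
# wrote a non-empty text file. Requires tesseract on PATH.

# Succeed only if the file exists and is larger than zero bytes.
check_ocr_output() {
  if [ -s "$1" ]; then
    echo "OK: $1"
    return 0
  fi
  echo "FAIL: $1 is missing or empty" >&2
  return 1
}

# Usage: ocr_test IMAGE.tif...
ocr_test() {
  status=0
  for img in "$@"; do
    base="${img%.tif}"     # phototest.tif -> phototest
    rm -f "$base.txt"      # don't let an old result hide a failure
    # As with "tesseract phototest.tif output", tesseract adds .txt itself
    tesseract "$img" "$base" || status=1
    check_ocr_output "$base.txt" || status=1
  done
  return "$status"
}
```

Run ocr_test phototest.tif for the basic check, or ocr_test phototest.tif phototest-fax4.tif if you also installed libtiff. A non-empty file only shows that recognition ran; you still need to open it and compare the text with the image.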
Hopefully that works for you. For further reading, check out the scripts we use with Tesseract on this site. You can also read more about A Billion Billion here, or about A Billion Billion's free OCR.

