Blog

Saturday February 23, 2008

OCR of color TIFFs is now supported. Tesseract OCR, the engine we use to OCR documents, does not work with color images by default, so attempting to OCR a color TIFF produced garbage output. We now automatically convert color images to grayscale before sending them to be OCRed, which gives good results. Learn more about OCR on A Billion Billion here. Developers, check out the scripts we used to accomplish this here.
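To illustrate what reducing a color image to grayscale involves, here is a minimal Python sketch. It is not the script linked above, and it does not read TIFF files. It assumes the image is already available as rows of (R, G, B) tuples and uses the common ITU-R BT.601 luma weights. The function names are made up for this example.

```python
def rgb_to_gray(pixel):
    """Map one (R, G, B) pixel with 0-255 channels to a single 0-255 gray level."""
    r, g, b = pixel
    # ITU-R BT.601 luma weights, done in integer arithmetic with rounding.
    return (299 * r + 587 * g + 114 * b + 500) // 1000


def image_to_gray(rows):
    """Convert an image given as rows of (R, G, B) tuples to rows of gray levels."""
    return [[rgb_to_gray(px) for px in row] for row in rows]


if __name__ == "__main__":
    # A tiny 2x2 "image": white, red, green, black.
    color = [[(255, 255, 255), (255, 0, 0)], [(0, 255, 0), (0, 0, 0)]]
    print(image_to_gray(color))
```

In practice an imaging tool or library would do this conversion on the TIFF itself before it is handed to the OCR engine. The sketch only shows the per-pixel arithmetic.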


Plone OCR

Posted by Veikko at Mar 20, 2008 05:23 PM
Great work

I have been looking for this kind of application for years, especially for Plone because of its superior authentication system.

I think the next steps are:

1. Output to PDF file
- convert the image and embed the text (it doesn't necessarily have to be positioned visually over the text at the first stage)
- - an example can be found at scanr.com

2. Background conversion and indexing
- inbox -> outbox (?)
- this way the scanned documents can be copied to the server using a synchronized (WebDAV) folder on the desktop computer

3. Then you have a nice document management system for small companies, small associations, clubs, etc.
- indexed and OCR'd documents
- multilevel authentication for users to see the documents
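
The inbox -> outbox idea in step 2 could look roughly like the Python sketch below. This is only an illustration under assumptions, not an existing feature: the folder layout, the `process_inbox` name, and the `ocr` callable (a stand-in for whatever turns a scanned file into text) are all hypothetical.

```python
import shutil
from pathlib import Path


def process_inbox(inbox, outbox, ocr):
    """Run `ocr` on every file in `inbox`, then move the file and its text to `outbox`."""
    inbox, outbox = Path(inbox), Path(outbox)
    outbox.mkdir(parents=True, exist_ok=True)
    done = []
    for src in sorted(p for p in inbox.iterdir() if p.is_file()):
        text = ocr(src)  # hypothetical callable: file path in, recognized text out
        # Store the recognized text next to the scan so it can be indexed later.
        (outbox / (src.name + ".txt")).write_text(text, encoding="utf-8")
        # Moving the original out of the inbox marks it as processed.
        shutil.move(str(src), str(outbox / src.name))
        done.append(src.name)
    return done
```

A background job could call `process_inbox` periodically on the synchronized folder, so anything dropped into the inbox from the desktop ends up in the outbox together with its text.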