This optical character recognition (OCR) engine converts text in scanned images into editable form. It also includes tools for automatic text line finding.
Tesseract
Tesseract is an optical character recognition engine that runs on Windows as well as other platforms. It can be used as a standalone command-line utility or integrated with other programs to create a complete OCR solution.
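As a rough illustration of the standalone mode, the sketch below calls the tesseract command-line program from a Python script. It assumes the executable is installed and on PATH, and that the basic form `tesseract <image> <output base> -l <language>` is sufficient. The helper names `build_tesseract_command` and `ocr_to_text` are hypothetical and not part of Tesseract.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for a basic Tesseract CLI call.

    Tesseract appends the ".txt" extension itself, so `output_base`
    is given without it.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_to_text(image_path, output_base, lang="eng"):
    """Run Tesseract on one image and return the recognized text.

    Assumes the tesseract executable is installed; raises RuntimeError
    if it cannot be found on PATH.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    command = build_tesseract_command(image_path, output_base, lang)
    subprocess.run(command, check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

With Tesseract installed, `ocr_to_text("scan.png", "scan")` would write `scan.txt` next to the script and return its contents. Keeping the command construction in its own function makes it easy to inspect the call without running the engine.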
Conversion
Similar to ABBYY FineReader, the engine processes input pictures by analyzing their pixels and detecting text blocks. Character outlines are first gathered into text lines, which are then split into words. Classification methods then recognize the individual characters and match them with the corresponding symbols.
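The block detection and line gathering steps described above can be illustrated with a toy example. This is not Tesseract's actual code: it is a simplified sketch that finds connected groups of dark pixels in a tiny binary image and then groups their bounding boxes into lines by vertical overlap. The function names and the sample `page` are made up for illustration.

```python
from collections import deque


def find_components(image):
    """Return bounding boxes (top, left, bottom, right) of 4-connected
    groups of 1-pixels in a binary image given as a list of rows."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def group_into_lines(boxes):
    """Group boxes whose vertical ranges overlap into lines,
    with each line sorted left to right."""
    lines = []  # each entry: [line_top, line_bottom, member_boxes]
    for box in sorted(boxes):
        top, _, bottom, _ = box
        for line in lines:
            if top <= line[1] and bottom >= line[0]:
                # Overlaps this line vertically: extend it and join.
                line[0], line[1] = min(line[0], top), max(line[1], bottom)
                line[2].append(box)
                break
        else:
            lines.append([top, bottom, [box]])
    return [sorted(members, key=lambda b: b[1]) for _, _, members in lines]


# A 4x5 "page" with two shapes on the top line and two on the bottom line.
page = [
    [1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1],
]
lines = group_into_lines(find_components(page))
```

Here `lines` holds two lines of two boxes each. A real engine works on much noisier input and has to cope with skewed pages, touching characters and broken strokes, which this sketch ignores.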
Tesseract was originally developed at Hewlett-Packard between the mid-1980s and the mid-1990s. The project was released as an open source program in late 2005.
Advanced capabilities
An adaptive classifier is included. It learns from characters that were recognized confidently earlier on the page and uses them to improve accuracy on the rest of the text. Additionally, the engine can automatically identify text lines and paragraphs so that the page layout is carried over correctly.
This tool is available for free under the Apache License 2.0. OCR functions can be integrated into other programs through the provided C++ application programming interface.
Features
- contains a popular open source optical character recognition engine;
- free to download and use;
- automatically converts scanned images to editable text;
- provides advanced symbol classification methods;
- can be integrated into other programs using the included API;
- compatible with all modern versions of Windows.