Tesseract OCR is a Windows software that lets you convert text on photos and scanned images to editable form using advanced optical character recognition methods. Please note that all interactions with Tesseract OCR are performed via a command line console.

Text extraction

The list of available CLI arguments with detailed descriptions is located in the provided documentation. Users can process multiple imported pictures in batch mode, saving time and effort. Similar to ABBYY FineReader, there is support for a wide variety of common languages from around the world such as:

Chinese;
English;
Arabic;
Russian and more.

It is possible to integrate the program into Python scripts and third party applications. You are able to save the output as plain text, searchable PDF document or HTML page.

Use cases

The utility is invaluable when transforming paper based records into editable files. Moreover, users can quickly extract data from scanned invoices, receipts and business cards for database entries. Old books, manuscripts and historical documents may be preserved in digital form as well, simplifying further research.

Features

free to download and use;
designed to help you convert words on images to editable text documents;
supports over 100 languages, including English, Portuguese and German;
requires certain familiarity with the command line interface;
compatible with all modern versions of Windows.