With the help of this open source image processing library written in Python, users can automatically improve picture quality and extract text from scanned pages.
Vndocr
Vndocr is a Python module for Windows that performs automatic optical character recognition (OCR). It includes tools for cleaning up scanned documents to improve the accuracy of text extraction. Supported languages include Vietnamese, English, French, German and Chinese, among others.
Visual enhancements
After importing a photo, you can remove digital noise and deskew the image. It is also possible to adjust the overall brightness and contrast. Both steps make the text clearer and help ensure that it is converted correctly.
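The brightness and contrast adjustment described above can be illustrated with a minimal sketch. It does not use the Vndocr API, which is not documented here. The function name `adjust_brightness_contrast` and its parameters are assumptions made for illustration, and the image is modeled as a plain list of grayscale rows with values from 0 to 255.

```python
def adjust_brightness_contrast(pixels, contrast=1.0, brightness=0):
    """Return a new grayscale image with a linear adjustment applied.

    pixels: list of rows, each a list of ints in the range 0..255.
    contrast: gain applied around mid-gray (128); 1.0 changes nothing.
    brightness: offset added after the contrast scaling.
    Hypothetical helper for illustration, not part of Vndocr.
    """
    def clamp(value):
        # Keep the result inside the valid 8-bit range.
        return max(0, min(255, int(round(value))))

    return [
        [clamp(contrast * (p - 128) + 128 + brightness) for p in row]
        for row in pixels
    ]


# A faint 2x3 "scan": dark-gray text pixels on a light-gray background.
page = [[100, 180, 100],
        [180, 100, 180]]
enhanced = adjust_brightness_contrast(page, contrast=2.0, brightness=10)
print(enhanced)  # [[82, 242, 82], [242, 82, 242]]
```

Raising the contrast pushes text and background values further apart, which is what makes characters easier to separate before recognition.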
Note that Vndocr has been discontinued and is no longer supported by the developer. The installation files are no longer available. Alternatives with similar functionality include HP Smart Document Scan.
Document processing
The module provides tools for analyzing the layout of scanned pages and identifying elements such as text, images and tables. Users can generate structured documents from source pictures.
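As a rough illustration of what layout analysis involves, the sketch below finds horizontal bands of text with a projection profile, a common basic technique. It is not taken from Vndocr and may differ from what the module does internally. The function name `find_text_rows` is hypothetical, and the page is assumed to be already binarized, with 1 for ink and 0 for background.

```python
def find_text_rows(binary_image, min_ink=1):
    """Return (start, end) row ranges that contain ink; end is exclusive.

    binary_image: list of rows, each a list of 0 (background) or 1 (ink).
    min_ink: minimum number of ink pixels for a row to count as text.
    Hypothetical helper for illustration, not part of Vndocr.
    """
    bands = []
    start = None
    for y, row in enumerate(binary_image):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y                  # a new band of text begins
        elif not has_ink and start is not None:
            bands.append((start, y))   # a blank row closes the band
            start = None
    if start is not None:
        bands.append((start, len(binary_image)))
    return bands


page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
print(find_text_rows(page))  # [(1, 3), (4, 5)]
```

Real layout analysis also has to separate columns, tables and pictures, so this only covers the simplest case of stacked text lines.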
There is support for multiple input formats, including PDF, JPEG and PNG. You can copy extracted text to the system clipboard or save it as a text file.
Using Vndocr, developers can automate the processing of scanned documents and improve the accuracy of data entry. The library also provides tools for building new OCR applications.
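A batch workflow of this kind might look like the following sketch. Since the Vndocr API is not documented here, the recognition step is passed in as a caller-supplied function. `batch_extract`, `recognize` and `SUPPORTED_SUFFIXES` are hypothetical names, and the suffix list is an assumption based on the input formats mentioned above.

```python
from pathlib import Path

# Assumed from the formats named in the text: PDF, JPEG and PNG.
SUPPORTED_SUFFIXES = {".pdf", ".jpg", ".jpeg", ".png"}


def batch_extract(input_dir, output_dir, recognize):
    """Run `recognize` on every supported file and save the text.

    recognize: caller-supplied function taking a Path and returning a
        str; it stands in for whichever OCR call is actually used.
    Returns the list of text files that were written.
    """
    input_dir = Path(input_dir)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for source in sorted(input_dir.iterdir()):
        if not source.is_file():
            continue
        if source.suffix.lower() not in SUPPORTED_SUFFIXES:
            continue  # skip formats the OCR step does not accept
        # Keep the original extension in the name so that scan.pdf and
        # scan.png do not overwrite each other's output.
        target = output_dir / (source.name + ".txt")
        target.write_text(recognize(source), encoding="utf-8")
        written.append(target)
    return written
```

Passing the OCR step as a parameter keeps the file handling independent of any particular library, so the same loop works with whichever engine replaces Vndocr.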
Features
- free to download and use;
- contains a Python package for optical character recognition;
- automatically improves the quality of scanned pages;
- analyzes the document structure to identify text blocks, tables and pictures;
- compatible with all modern versions of Windows.