Python text recognition library


docTR (Document Text Recognition) — a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.


mindee/doctr


Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch

What you can expect from this repository:

  • efficient ways to parse textual information (localize and identify each word) from your documents
  • guidance on how to integrate this in your current architecture

*(Figure: end-to-end OCR example)*

Getting your pretrained model

End-to-End OCR is achieved in docTR with a two-stage approach: text detection (localizing words), then text recognition (identifying the characters in each word). You can therefore select the architecture used for text detection and the one for text recognition from the list of available implementations.
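To make the two-stage flow concrete, here is a toy sketch (not the docTR API — the detector and recognizer below are stand-ins) showing how detection output feeds recognition:

```python
# Toy illustration of two-stage OCR: a detector proposes word boxes,
# then a recognizer decodes the characters inside each box.
# Both stages are fake stand-ins, not docTR models.
def detect(page):
    # pretend detector: returns (bounding_box, crop_id) pairs
    return [((0, 0, 10, 5), "crop_a"), ((12, 0, 22, 5), "crop_b")]

def recognize(crop_id):
    # pretend recognizer: decodes the word contained in a crop
    return {"crop_a": "Hello", "crop_b": "world"}[crop_id]

def end_to_end_ocr(page):
    # stage 1: localize words; stage 2: identify characters per word
    return [(box, recognize(crop)) for box, crop in detect(page)]

print(end_to_end_ocr("page1"))
```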

```python
from doctr.models import ocr_predictor

model = ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
```

Documents can be interpreted from PDF or images:

```python
from doctr.io import DocumentFile

# PDF
pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Image
single_img_doc = DocumentFile.from_images("path/to/your/img.jpg")
# Webpage
webpage_doc = DocumentFile.from_url("https://www.yoursite.com")
# Multiple page images
multi_img_doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"])
```

Let’s use the default pretrained model for an example:

```python
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Analyze
result = model(doc)
```

Dealing with rotated documents

Should you use docTR on documents that include rotated pages, or pages with multiple box orientations, you have multiple options to handle it:

  • If you only use straight document pages with straight words (horizontal, same reading direction), consider passing `assume_straight_pages=True` to the `ocr_predictor`. It will fit straight boxes directly on your pages and return straight boxes, which makes it the fastest option.
  • If you want the predictor to output straight boxes regardless of the orientation of your pages (the final localizations will be converted to straight boxes), pass `export_as_straight_boxes=True` to the predictor. Otherwise, if `assume_straight_pages=False`, it will return rotated bounding boxes (potentially with an angle of 0°).

If both options are set to False, the predictor will always fit and return rotated boxes.
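As a sketch of what exporting as straight boxes amounts to (an illustration, not docTR internals): keep the axis-aligned rectangle that encloses the rotated box's four corners.

```python
# Convert a rotated box (four corner points) into the straight,
# axis-aligned box that encloses it -- the gist of converting
# rotated localizations to straight boxes, in plain Python.
def straighten(corners):
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))

# a square rotated 45 degrees around (1, 1)
rotated = [(1, 0), (2, 1), (1, 2), (0, 1)]
print(straighten(rotated))  # (0, 0, 2, 2)
```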

To interpret your model's predictions, you can visualize them interactively with the result's `show()` method (e.g. `result.show()`).

Or even rebuild the original document from its predictions:

```python
import matplotlib.pyplot as plt

synthetic_pages = result.synthesize()
plt.imshow(synthetic_pages[0]); plt.axis('off'); plt.show()
```

*(Figure: synthesis sample)*

The `ocr_predictor` returns a `Document` object with a nested structure (with `Page`, `Block`, `Line`, `Word`, `Artefact`). To get a better understanding of our document model, check our documentation:

You can also export them as a nested dict, more appropriate for JSON format:

json_output = result.export()
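The exported dict mirrors the Page/Block/Line/Word hierarchy, so it can be walked with plain loops. A minimal sketch (the key names below are an assumption based on that hierarchy and may differ between versions):

```python
# Walk a docTR-style export dict and collect every recognized word.
# Key names ("pages", "blocks", "lines", "words", "value") are assumed
# from the documented Page/Block/Line/Word structure.
def collect_words(export):
    words = []
    for page in export.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                for word in line.get("words", []):
                    words.append(word["value"])
    return words

sample = {"pages": [{"blocks": [{"lines": [
    {"words": [{"value": "Hello"}, {"value": "world"}]}
]}]}]}
print(collect_words(sample))  # ['Hello', 'world']
```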

The KIE predictor is more flexible than the OCR predictor, as its detection model can detect multiple classes in a document. For example, you can have a detection model that detects just dates and addresses in a document.

The KIE predictor makes it possible to combine a multi-class detector with a recognition model, with the whole pipeline already set up for you.

```python
from doctr.io import DocumentFile
from doctr.models import kie_predictor

# Model
model = kie_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Analyze
result = model(doc)

predictions = result.pages[0].predictions
for class_name in predictions.keys():
    list_predictions = predictions[class_name]
    for prediction in list_predictions:
        print(f"Prediction for {class_name}: {prediction}")
```

The KIE predictor results for each page are in a dictionary format, with each key representing a class name and its value being the list of predictions for that class.

If you are looking for support from the Mindee team

*(Figure: an OCR test image with bad detections, asking the developer if they need help)*

Python 3.8 (or higher) and pip are required to install docTR.

Since we use WeasyPrint, you will need extra system dependencies if you are not running Linux.

For macOS users, you can install them as follows:

brew install cairo pango gdk-pixbuf libffi

For Windows users, those dependencies are included in GTK. You can find the latest installer over here.

You can then install the latest release of the package from PyPI with `pip install python-doctr`.

⚠️ Please note that the basic installation is not standalone, as it does not provide a deep learning framework, which is required for the package to run.

We try to keep framework-specific dependencies to a minimum. You can install framework-specific builds as follows:

```shell
# for TensorFlow
pip install "python-doctr[tf]"
# for PyTorch
pip install "python-doctr[torch]"
```

For MacBooks with an M1 chip, you will need some additional packages or specific versions.

Alternatively, you can install it from source, which requires Git. First, clone the project repository and install it:

```shell
git clone https://github.com/mindee/doctr.git
pip install -e doctr/.
```

Again, if you prefer to avoid the risk of missing dependencies, you can install the TensorFlow or the PyTorch build:

```shell
# for TensorFlow
pip install -e doctr/.[tf]
# for PyTorch
pip install -e doctr/.[torch]
```

Credit where it's due: this repository implements, among others, architectures from published research papers.

The full package documentation is available here for detailed specifications.

A minimal demo app is provided for you to play with our end-to-end OCR models!

Demo app

Courtesy of 🤗 Hugging Face 🤗, docTR now has a fully deployed version available on Spaces! Check it out.

If you prefer to use it locally, there is an extra dependency (Streamlit) that is required.

For TensorFlow:

```shell
pip install -r demo/tf-requirements.txt
```

Then run your app in your default browser with:

```shell
USE_TF=1 streamlit run demo/app.py
```

For PyTorch:

```shell
pip install -r demo/pt-requirements.txt
```

Then run your app in your default browser with:

```shell
USE_TORCH=1 streamlit run demo/app.py
```

Would you prefer to run everything in your web browser instead of having your demo run Python? Check out our TensorFlow.js demo to get started!

TFJS demo

If you wish to deploy containerized environments, you can use the provided Dockerfile to build a docker image:

```shell
docker build . -t <YOUR_IMAGE_TAG>
```

An example script is provided for a simple document analysis of a PDF or image file:

```shell
python scripts/analyze.py path/to/your/doc.pdf
```

All script arguments can be checked using `python scripts/analyze.py --help`.

Looking to integrate docTR into your API? Here is a template to get you started with a fully working API using the wonderful FastAPI framework.

Specific dependencies are required to run the API template, which you can install as follows:

```shell
cd api/
pip install poetry
make lock
pip install -r requirements.txt
```

You can now run your API locally:

```shell
uvicorn --reload --workers 1 --host 0.0.0.0 --port=8002 --app-dir api/ app.main:app
```

Alternatively, if you prefer, you can run the same server in a Docker container using:

```shell
PORT=8002 docker-compose up -d --build
```

Your API should now be running locally on port 8002. Access your automatically-built documentation at http://localhost:8002/redoc and enjoy your functional routes (`/detection`, `/recognition`, `/ocr`, `/kie`). Here is an example using Python to send a request to the OCR route:

```python
import requests

with open('/path/to/your/doc.jpg', 'rb') as f:
    data = f.read()
response = requests.post("http://localhost:8002/ocr", files={'file': data}).json()
```

Looking for more illustrations of docTR features? You might want to check the Jupyter notebooks designed to give you a broader overview.

If you wish to cite this project, feel free to use this BibTeX reference:

```bibtex
@misc{doctr2021,
    title={docTR: Document Text Recognition},
    author={Mindee},
    year={2021},
    publisher={GitHub},
    howpublished={\url{https://github.com/mindee/doctr}}
}
```

If you scrolled down to this section, you most likely appreciate open source. Do you feel like extending the range of our supported characters? Or perhaps submitting a paper implementation? Or contributing in any other way?

You're in luck, we compiled a short guide (cf. CONTRIBUTING) for you to easily do so!

Distributed under the Apache 2.0 License. See LICENSE for more information.


module for text recognition from an image

ruzhova/text-recognition


Handwritten text recognition

This module allows fairly accurate recognition of handwritten text.

Create a virtual environment, activate it, and install the dependencies:

```shell
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Download the EMNIST dataset with 62 unbalanced classes and set the path to it in cnn.py. Choose the image you want to process and set the path to it in project.py. Select venv as the interpreter, save all changes, and run project.py.

If recognition problems arise, tune the convolution kernels used by the erode function in preprocessing.py. If letters are not recognized at all, the kernel is too rectangular and a more "square" aspect ratio should be used. If letters are recognized in the wrong order, use a more rectangular one instead. The first value should always be the larger one; otherwise whole groups of letters will be recognized at once and their order will be wrong.
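To see why the kernel's aspect ratio matters, here is a minimal binary-erosion sketch in plain NumPy (an illustration only, not the module's actual preprocessing code): a tall kernel erodes along rows, a wide one along columns, which changes how letter blobs are shaped before segmentation.

```python
import numpy as np

# Minimal binary erosion: a pixel survives only if the whole
# kh x kw neighborhood around it is set (zero padding at borders).
# Illustrates how the kernel's aspect ratio changes the result.
def erode(img, kh, kw):
    h, w = img.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img)
    for i in range(h):
        for j in range(w):
            if padded[i:i + kh, j:j + kw].all():
                out[i, j] = 1
    return out

img = np.ones((5, 5), dtype=np.uint8)
tall = erode(img, 3, 1)   # erodes the top and bottom rows
wide = erode(img, 1, 3)   # erodes the left and right columns
print(tall.sum(), wide.sum())  # 15 15
```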

Sample images for recognition: handwritten1.jpg and hand-written-3.jpg.
