docTR (Document Text Recognition) — a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
mindee/doctr
Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
What you can expect from this repository:
- efficient ways to parse textual information (localize and identify each word) from your documents
- guidance on how to integrate this in your current architecture
Getting your pretrained model
End-to-end OCR is achieved in docTR using a two-stage approach: text detection (localizing words), then text recognition (identifying all characters in each word). As such, you can select the architecture used for text detection and the one used for text recognition from the list of available implementations.
from doctr.models import ocr_predictor
model = ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
Documents can be interpreted from PDF or images:
from doctr.io import DocumentFile
# PDF
pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Image
single_img_doc = DocumentFile.from_images("path/to/your/img.jpg")
# Webpage
webpage_doc = DocumentFile.from_url("https://www.yoursite.com")
# Multiple page images
multi_img_doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"])
Let’s use the default pretrained model for an example:
from doctr.io import DocumentFile
from doctr.models import ocr_predictor
model = ocr_predictor(pretrained=True)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Analyze
result = model(doc)
Dealing with rotated documents
Should you use docTR on documents that include rotated pages, or pages with multiple box orientations, you have multiple options to handle it:
- If you only use straight document pages with straight words (horizontal, same reading direction), consider passing assume_straight_pages=True to the ocr_predictor. It will directly fit straight boxes on your page and return straight boxes, which makes it the fastest option.
- If you want the predictor to output straight boxes (no matter the orientation of your pages, the final localizations will be converted to straight boxes), you need to pass export_as_straight_boxes=True to the predictor. Otherwise, if assume_straight_pages=False, it will return rotated bounding boxes (potentially with an angle of 0°).
If both options are set to False, the predictor will always fit and return rotated boxes.
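To make the difference between rotated and straight boxes concrete, here is a minimal sketch of how a rotated box (four corner points) can be reduced to the straight box that encloses it. This is only an illustration of the idea: the helper name to_straight_box and the sample coordinates are assumptions, and docTR's own conversion may differ in its details.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def to_straight_box(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Return the axis-aligned box (xmin, ymin, xmax, ymax) enclosing a polygon.

    Hypothetical helper for illustration, not part of the docTR API.
    """
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


# A word box rotated by 45 degrees, in relative page coordinates (made-up values)
rotated = [(0.5, 0.4), (0.6, 0.5), (0.5, 0.6), (0.4, 0.5)]
straight = to_straight_box(rotated)
print(straight)
```

The straight box is always at least as large as the rotated one, which is why straight boxes on heavily rotated text tend to overlap their neighbours.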
To interpret your model's predictions, you can visualize them interactively by calling the show method of the returned result.
Or even rebuild the original document from its predictions:
import matplotlib.pyplot as plt
synthetic_pages = result.synthesize()
plt.imshow(synthetic_pages[0]); plt.axis('off'); plt.show()
The ocr_predictor returns a Document object with a nested structure (Page, Block, Line, Word, Artefact). To get a better understanding of the document model, check the documentation.
You can also export them as a nested dict, more appropriate for JSON format:
json_output = result.export()
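Once exported, the nested dict can be walked with plain Python. The sketch below collects the recognized words of each page. It assumes the export mirrors the Page / Block / Line / Word nesting with the keys "pages", "blocks", "lines", "words" and "value"; the helper words_per_page and the sample data are hypothetical, so check the key names against the output of your installed version.

```python
from typing import Any, Dict, List


def words_per_page(export: Dict[str, Any]) -> List[List[str]]:
    """Collect word values page by page from an exported document dict.

    Assumes the keys "pages" > "blocks" > "lines" > "words" > "value".
    """
    pages = []
    for page in export.get("pages", []):
        words = [
            word["value"]
            for block in page.get("blocks", [])
            for line in block.get("lines", [])
            for word in line.get("words", [])
        ]
        pages.append(words)
    return pages


# Hand-written stand-in for the output of result.export()
sample_export = {
    "pages": [
        {
            "blocks": [
                {
                    "lines": [
                        {"words": [{"value": "Hello"}, {"value": "world"}]},
                        {"words": [{"value": "docTR"}]},
                    ]
                }
            ]
        }
    ]
}

print(words_per_page(sample_export))
```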
The KIE predictor is more flexible than the OCR predictor because its detection model can detect multiple classes in a document. For example, you can use a detection model that detects only dates and addresses in a document.
The KIE predictor lets you combine a multi-class detector with a recognition model, with the whole pipeline already set up for you.
from doctr.io import DocumentFile
from doctr.models import kie_predictor
# Model
model = kie_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Analyze
result = model(doc)
predictions = result.pages[0].predictions
for class_name in predictions.keys():
    list_predictions = predictions[class_name]
    for prediction in list_predictions:
        print(f"Prediction for {class_name}: {prediction}")
The KIE predictor returns its results per page as a dictionary: each key is a class name, and its value is the list of predictions for that class.
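As a rough illustration of working with such a per-class dictionary, the sketch below keeps only the text of sufficiently confident predictions for each class. StubPrediction is a stand-in defined here, not a docTR class, and it assumes each prediction exposes value and confidence attributes; the helper values_by_class, the threshold and the sample data are likewise made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StubPrediction:
    """Stand-in for a KIE prediction (assumed to expose value and confidence)."""
    value: str
    confidence: float


def values_by_class(
    predictions: Dict[str, List[StubPrediction]], min_confidence: float = 0.5
) -> Dict[str, List[str]]:
    """Map each class name to the values of its confident predictions."""
    return {
        class_name: [p.value for p in preds if p.confidence >= min_confidence]
        for class_name, preds in predictions.items()
    }


# Made-up predictions for a single page
sample = {
    "date": [StubPrediction("2021-03-04", 0.98)],
    "address": [StubPrediction("12 Main St", 0.91), StubPrediction("1Z M@in", 0.20)],
}
summary = values_by_class(sample)
print(summary)
```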
If you are looking for support from the Mindee team
Python 3.8 (or higher) and pip are required to install docTR.
Since we use weasyprint, you will need extra dependencies if you are not running Linux.
For macOS users, you can install them as follows:
brew install cairo pango gdk-pixbuf libffi
For Windows users, those dependencies are included in GTK. You can find the latest installer over here.
You can then install the latest release of the package from PyPI with pip install python-doctr.
⚠️ Please note that the basic installation is not standalone, as it does not provide a deep learning framework, which is required for the package to run.
We try to keep framework-specific dependencies to a minimum. You can install framework-specific builds as follows:
# for TensorFlow
pip install "python-doctr[tf]"
# for PyTorch
pip install "python-doctr[torch]"
For MacBooks with M1 chip, you will need some additional packages or specific versions:
Alternatively, you can install it from source, which will require you to install Git. First clone the project repository:
git clone https://github.com/mindee/doctr.git
pip install -e doctr/.
Again, if you prefer to avoid the risk of missing dependencies, you can install the TensorFlow or the PyTorch build:
# for TensorFlow
pip install -e doctr/.[tf]
# for PyTorch
pip install -e doctr/.[torch]
Credits where it’s due: this repository is implementing, among others, architectures from published research papers.
The full package documentation is available here for detailed specifications.
A minimal demo app is provided for you to play with our end-to-end OCR models!
Courtesy of 🤗 Hugging Face 🤗, docTR now has a fully deployed version available on Spaces. Check it out!
If you prefer to run it locally, you need one extra dependency (Streamlit).
For the TensorFlow backend, install the requirements:
pip install -r demo/tf-requirements.txt
Then run your app in your default browser with:
USE_TF=1 streamlit run demo/app.py
For the PyTorch backend, install the requirements:
pip install -r demo/pt-requirements.txt
Then run your app in your default browser with:
USE_TORCH=1 streamlit run demo/app.py
Would you rather run everything in your web browser than have your demo run Python? Check out our TensorFlow.js demo to get started!
If you wish to deploy containerized environments, you can use the provided Dockerfile to build a docker image:
docker build . -t <YOUR_IMAGE_TAG>
An example script is provided for a simple document analysis of a PDF or image file:
python scripts/analyze.py path/to/your/doc.pdf
All script arguments can be checked using python scripts/analyze.py --help
Looking to integrate docTR into your API? Here is a template to get you started with a fully working API using the wonderful FastAPI framework.
Specific dependencies are required to run the API template, which you can install as follows:
cd api/
pip install poetry
make lock
pip install -r requirements.txt
You can now run your API locally:
uvicorn --reload --workers 1 --host 0.0.0.0 --port=8002 --app-dir api/ app.main:app
Alternatively, you can run the same server on a docker container if you prefer using:
PORT=8002 docker-compose up -d --build
Your API should now be running locally on port 8002. Access the automatically built documentation at http://localhost:8002/redoc and enjoy your four functional routes ("/detection", "/recognition", "/ocr", "/kie"). Here is an example in Python that sends a request to the OCR route:
import requests
with open('/path/to/your/doc.jpg', 'rb') as f:
    data = f.read()
response = requests.post("http://localhost:8002/ocr", files={'file': data}).json()
Looking for more illustrations of docTR features? You might want to check the Jupyter notebooks designed to give you a broader overview.
If you wish to cite this project, feel free to use this BibTeX reference:
@misc{doctr2021,
    title={docTR: Document Text Recognition},
    author={Mindee},
    year={2021},
    publisher = {GitHub},
    howpublished = {\url{https://github.com/mindee/doctr}}
}
If you scrolled down to this section, you most likely appreciate open source. Do you feel like extending the range of our supported characters? Or perhaps submitting a paper implementation? Or contributing in any other way?
You're in luck: we compiled a short guide (cf. CONTRIBUTING) for you to easily do so!
Distributed under the Apache 2.0 License. See LICENSE for more information.
module for text recognition from an image
ruzhova/text-recognition
Handwritten text recognition
This module recognizes handwritten text with fairly high accuracy.
Create a virtual environment and activate it:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Download the EMNIST dataset with 62 unbalanced classes and set the path to it in cnn.py. Choose the image you want to process and set the path to it in project.py. Select venv as the interpreter, save all changes, and run project.py.
If recognition problems occur, tune the convolution kernels used by the erode function in preprocessing.py. If letters are not recognized at all, the kernel is too rectangular and a more "square" proportion should be used. If letters are recognized in the wrong order, use a more rectangular kernel instead. The first value must always be the larger one; otherwise individual letters are picked up straight away and their order will be wrong.
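The effect of the kernel proportions can be sketched without any image library. The example below is a pure-Python stand-in for an erosion (a minimum filter) on a tiny dark-ink-on-white image, not the code from preprocessing.py: the functions erode and count_blobs, the kernel sizes and the sample image are all assumptions made for illustration. It shows how a wider kernel merges neighbouring letters into one word-sized blob, while a taller one leaves each letter separate.

```python
from typing import List

Image = List[List[int]]  # 0 = ink, 255 = background


def erode(image: Image, kernel_w: int, kernel_h: int) -> Image:
    """Minimum filter: each output pixel is the darkest pixel in its window."""
    rows, cols = len(image), len(image[0])
    half_w, half_h = kernel_w // 2, kernel_h // 2
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - half_h), min(rows, r + half_h + 1))
                for cc in range(max(0, c - half_w), min(cols, c + half_w + 1))
            ]
            out_row.append(min(window))
        out.append(out_row)
    return out


def count_blobs(image: Image) -> int:
    """Count runs of adjacent columns that contain at least one ink pixel."""
    has_ink = [any(row[c] == 0 for row in image) for c in range(len(image[0]))]
    return sum(
        1 for c, ink in enumerate(has_ink) if ink and (c == 0 or not has_ink[c - 1])
    )


W, K = 255, 0
# Two "letters" one pixel apart (one word), then a third letter after a wider gap
page = [
    [W] * 11,
    [W, W, K, W, K, W, W, W, K, W, W],
    [W] * 11,
]

wide = count_blobs(erode(page, kernel_w=3, kernel_h=1))  # letters merge into words
tall = count_blobs(erode(page, kernel_w=1, kernel_h=3))  # letters stay separate
print(wide, tall)
```

With the wide kernel the first two letters fuse into a single blob, so the page splits into words first; with the tall kernel every letter is its own blob.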
Sample images for recognition: handwritten1.jpg and hand-written-3.jpg.