OCR in Odoo

Table of Contents

OCR using Tesseract Open Source OCR Engine

Optical character recognition (also optical character reader, OCR) is the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene-photo (for example the text on signs and billboards in a landscape photo) or from subtitle text superimposed on an image (for example from a television broadcast).

Why Tesseract.js for OCR ?

Tesseract.js is a pure Javascript port of the popular Tesseract OCR engine.

This library supports over 60 languages, automatic text orientation and script detection, a simple interface for reading paragraph, word, and character bounding boxes. Tesseract.js can run either in a browser and on a server with NodeJS.

Tesseract.js can be integrated with odoo frontend and use for conversion of images to text.

Example : (Image consisting of english text converted to text)

Example : (Image consisting of chinese text converted to chinese text)

Tesseract js accepts any Image like object, which can be of following type

An img, video, or canvas element
A CanvasRenderingContext2D (returned by canvas.getContext(‘2d’))
A File object (from a file <input> or drag-drop event)
A Blob object
A ImageData instance (an object containing width, height and data properties)
A path or URL to an accessible image (the image must either be hosted locally or accessible by CORS)

How we implemented OCR in Odoo ?

We had a pdf to parse in incoming mail, During this process we parsed that pdf, by first converting it to image, the converted image was then passed to tesseract.js which parsed contents of image to text.

However the accurateness of extracted text depends on the quality of image passed to OCR.

Below is the ocr content(Dutch Language) after processing it with custom logic.

SHARE | FOLLOW | SUBSCRIBE

Odoo Quickbooks Connector Make It Meal – Odoo

Beyonce Adams

OCR in Odoo

OCR using Tesseract Open Source OCR Engine

Why Tesseract.js for OCR ?

Tesseract js accepts any Image like object, which can be of following type

How we implemented OCR in Odoo ?

Leave a Reply Cancel reply

Subscribe to Blog via Email.

Recent Posts

Recent Comments

Categories

Related Posts

How to ensure every new contact in Odoo receives a manager’s approval

Mastering Landed Cost in Odoo 18 – A Guide for Smartphone & Gadget Importers (2025 Edition)

What Shopify doesn’t do after ‘checkout’ – and why it’s hurting your operations

How tour operators can fix the 7 most expensive mistakes in 2025