Pragmatic – Leading provider of open source business applications OpenERP, Ruby on Rails, Node.js, Talend, jaspersoft  – Pragmatic
Beyonce Adams

OCR in Odoo

OCR using Tesseract Open Source OCR Engine

Optical character recognition (also optical character reader, OCR) is the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene-photo (for example the text on signs and billboards in a landscape photo) or from subtitle text superimposed on an image (for example from a television broadcast).

 

Why Tesseract.js for OCR ?

Tesseract.js is a pure Javascript port of the popular Tesseract OCR engine.

This library supports over 60 languages, automatic text orientation and script detection, a simple interface for reading paragraph, word, and character bounding boxes. Tesseract.js can run either in a browser and on a server with NodeJS.

Tesseract.js can be integrated with odoo frontend and use for conversion of images to text.

Example : (Image consisting of english text converted to text)

 

Example : (Image consisting of chinese text converted to chinese text)

 

 

Tesseract js accepts any Image like object, which can be of following type

  • An img, video, or canvas element
  • A CanvasRenderingContext2D (returned by canvas.getContext(‘2d’))
  • A File object (from a file <input> or drag-drop event)
  • A Blob object
  • A ImageData instance (an object containing width, height and data properties)
  • A path or URL to an accessible image (the image must either be hosted locally or accessible by CORS)

 

How we implemented OCR in Odoo ?

We had a pdf to parse in incoming mail, During this process we parsed that pdf, by first converting it to image, the converted image was then passed to tesseract.js which parsed contents of image to text.

However the accurateness of extracted text depends on the quality of image passed to OCR.

Below is the ocr content(Dutch Language) after processing it with custom logic.

 

Leave a Reply

Subscribe to Blog via Email.

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

Related Posts