Insurance Industry, US

The Situation

In a first-of-its-kind data and analytics platform addressing challenges faced by the automobile insurance industry and auto buyers, MotorDNA, a US-based data and analytics company, created the largest repository of vehicle specifications for all makes and models in the US. The idea was to use this knowledge to build actionable insights that would transform how the auto insurance industry develops risk factors for smart underwriting decisions, and how consumers could save money on insurance premiums by choosing the safest vehicle, as indicated by the Vehicle IQ™ Score. This required OCR implementation at a mammoth scale to read, process and preserve more than 25 million documents (Vehicle Specification Stickers).


The original files were old, archived documents with distorted fonts and characters, unrecognizable color depth and a huge variety of document structures, all compounded by the sheer scale of the data.

Solution / Result

Our team completed one of the largest OCR implementations in the industry using Google TensorFlow (default logic). We also created an additional model called ‘Character Purification’ to further clean the data to near-perfect accuracy and usability. The data collected was found fit for building insights using Artificial Intelligence and Machine Learning techniques, and for developing Vehicle Build Specifications in a normalized and standardized format across all makes and models of vehicles sold in the US.

Finance Industry

The Situation

A global client in the financial industry required an OCR implementation to read and evaluate financial documents for audit purposes.


The challenges spanned a complex medley of formats and structures, distorted fonts and characters, and varying quality of paper and print.

Solution / Result


We developed OCR software with a significant investment of time and resources in R&D across large volumes of documents, keeping these challenges at the forefront. Using TensorFlow and Python, we developed and deployed the tool with accuracy and extraction rates of up to 82%.


The client was able to use the tool to process hundreds of thousands of documents and conduct its audit without facing challenges around data extraction, purification and usage.
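
The source does not say how the accuracy figure above was measured. As one possible illustration only, a character-level accuracy can be computed by aligning OCR output against a hand-checked reference text. The function name `character_accuracy` and the sample strings are assumptions made for this sketch, not part of the delivered tool.

```python
from difflib import SequenceMatcher


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Share of reference characters that the OCR output reproduces in order."""
    if not reference:
        # An empty reference is matched perfectly only by empty output.
        return 1.0 if not ocr_output else 0.0
    # autojunk=False keeps the alignment exact for long documents.
    matcher = SequenceMatcher(None, reference, ocr_output, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)


# Hypothetical sample: the OCR engine confused the letter O and the digit 0.
score = character_accuracy("INVOICE 2041", "INV0ICE 2O41")
print(f"{score:.0%}")
```

In this sample 10 of the 12 reference characters line up, so the score is about 83%. Other definitions (word-level accuracy, field-level extraction rate) would give different numbers for the same output.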

What an OCR System Is

Optical character recognition (OCR) technology provides automated data extraction, storage, recognition and analysis capabilities, taking advantage of Artificial Intelligence (AI). It forms part of intelligent business processes that save time, cost and other resources.

An OCR program scans documents and images, typically stored as PDF or JPEG files, isolates the characters in them and re-assembles them as text. This enables editing and repurposing of the original content and eliminates the need for huge volumes of manual data entry.

OCR programs can deliver great accuracy. They can be trained to extract data at volumes that would be impractical to extract and use through any other method. They can automate complex document-processing workflows, and some are even available to the public.

Leveraging AI modelling, OCR software can be trained through advanced methods to identify handwriting styles and various languages.

How Does an OCR System Work


OCR systems combine hardware and software to convert physical, printed documents into machine-readable text.


Once a document has been scanned, the OCR software converts it into a two-color (black-and-white) version.


The scanned image or bitmap, carrying strings of words, numbers and images, is analyzed for light and dark areas.


The dark areas are identified as characters that need to be recognized. The light areas are identified as background.


The dark areas are processed to find alphabetic letters or numeric digits, targeting one character, word or block of text at a time.


Characters are then identified using algorithms such as pattern recognition and feature recognition, along with other rules the system has been trained on.


When a character is identified, it is converted into a character code such as ASCII, which computer systems use for further processing.


The OCR program also analyzes the structure of the document image, dividing the page into elements such as blocks of text, tables or images.


After processing all likely matches, the program presents the recognized text as results. These pass to Document Management Systems, where they can be used as reliable sources of information for varied purposes.
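
The steps above can be sketched in Python with a toy example. This is a minimal illustration under stated assumptions, not a production OCR engine: the 3x5 glyph templates, the fixed threshold of 128 and the helper names (`render`, `binarize`, `segment_columns`, `match_glyph`, `recognize`) are all invented for this sketch, and the "scan" is synthetic rather than coming from a scanner. Production systems, such as the TensorFlow models mentioned in the case studies, typically learn features from data instead of comparing against fixed templates.

```python
# Toy 3x5 glyph templates; '#' marks ink. Purely illustrative.
TEMPLATES = {
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
    "L": ["#..", "#..", "#..", "#..", "###"],
}


def render(text, ink=30, paper=220):
    """Build a synthetic grayscale 'scan' of text (stand-in for a scanner)."""
    rows = []
    for y in range(5):
        # One blank column separates neighbouring glyphs.
        line = ".".join(TEMPLATES[ch][y] for ch in text)
        rows.append([ink if cell == "#" else paper for cell in line])
    return rows


def binarize(gray, threshold=128):
    """Dark pixels (below the threshold) become ink; the rest is background."""
    return [[px < threshold for px in row] for row in gray]


def segment_columns(bitmap):
    """Split the bitmap into character boxes at fully blank columns."""
    width = len(bitmap[0])
    has_ink = [any(row[x] for row in bitmap) for x in range(width)]
    boxes, start = [], None
    for x, ink in enumerate(has_ink + [False]):  # sentinel closes the last box
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            boxes.append((start, x))
            start = None
    return boxes


def match_glyph(bitmap, box):
    """Pattern recognition: pick the template agreeing with the most pixels."""
    x0, x1 = box
    glyph = [row[x0:x1] for row in bitmap]

    def score(template):
        return sum(
            (cell == "#") == glyph[y][x]
            for y, line in enumerate(template)
            for x, cell in enumerate(line)
            if y < len(glyph) and x < len(glyph[y])
        )

    return max(TEMPLATES, key=lambda ch: score(TEMPLATES[ch]))


def recognize(gray):
    """Run the pipeline and return the text plus its ASCII codes."""
    bitmap = binarize(gray)
    boxes = segment_columns(bitmap)
    text = "".join(match_glyph(bitmap, box) for box in boxes)
    return text, [ord(ch) for ch in text]


text, codes = recognize(render("L71"))
print(text, codes)  # L71 [76, 55, 49]
```

Splitting on blank columns only works for cleanly spaced print; touching or skewed characters, as in the archived documents described earlier, need more robust segmentation.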

What are the benefits

OCR technology simplifies data entry by making text search, editing and storage effortless. OCR allows businesses and individuals to store files on their computers, laptops and other devices, ensuring constant access to all documentation.

OCR Use Case Examples

OCR is recognized as a business necessity for automating data entry; indexing documents such as passports, license plates, invoices, bank statements and business cards for search engines; and optimizing big-data modeling by converting paper and scanned image documents into machine-readable, searchable PDF files.

OCR is widely used in industries such as banking, education, insurance, communication, tourism, health, legal, and retail.


OCR-based cheque depositing, evaluating customer data, maintaining workflows efficiently, and capturing sensitive information in pay slips, mortgage and loan applications


Digitizing affidavits, judgments, filings, statements, wills, among other documents


Automating check-ins by scanning passports into a hotel’s website or mobile application


Redeeming vouchers by scanning for serial codes using mobile phones


Digitizing report data from X-rays, patient histories, treatments, diagnostics, tests, and overall hospital records


Automating insurance claims processing for faster transactions


Digitizing books and unstructured documents to make communication easier, e.g. Google Translate OCR

Other Success Stories