Text Detection and Recognition
Detecting and recognizing text in images is a common task performed in computer vision applications. For example, you can capture video of a road scene from a moving vehicle, recognize signposts in the captured scene, and alert the driver about the signs. The toolbox provides functions to detect and recognize text in multiple languages.
The first step in text recognition is to detect and segment the text regions in an image. To detect the text regions, use local image feature detectors and descriptors, or deep learning models pretrained to detect text in complex image scenes. The examples in the toolbox demonstrate how to detect text by using blob analysis, the maximally stable extremal regions (MSER) feature detector, and the character region awareness for text detection (CRAFT) deep learning model.
Blob analysis works well if the test image is a binarized image with text regions in the foreground. The method uses region statistics to effectively localize and extract text in the image foreground. Use segmentation approaches like image thresholding to binarize an image.
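The idea behind blob analysis can be illustrated with a minimal, language-neutral sketch in Python. It is not the toolbox implementation: the function names `find_blobs` and `keep_text_like`, the 4-connectivity choice, and the `min_area` value are assumptions made for illustration. The sketch labels connected foreground regions in a binarized image, computes the area and bounding box of each region, and keeps only regions large enough to be text candidates.

```python
from collections import deque


def find_blobs(binary):
    """Label 4-connected foreground regions and return their statistics."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill collects one connected region.
            queue = deque([(r, c)])
            seen[r][c] = True
            area = 0
            top, bottom, left, right = r, r, c, c
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Bounding box is stored as (x, y, width, height).
            bbox = (left, top, right - left + 1, bottom - top + 1)
            blobs.append({"area": area, "bbox": bbox})
    return blobs


def keep_text_like(blobs, min_area=3):
    """Discard regions too small to be text (hypothetical threshold)."""
    return [blob for blob in blobs if blob["area"] >= min_area]


# Toy binarized image: 1 marks foreground, 0 marks background.
image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1, 0],
]
blobs = find_blobs(image)
candidates = keep_text_like(blobs)
```

In this toy image, the isolated single pixel is treated as noise and removed, while the two larger regions remain as text candidates.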
The MSER feature detector works well if the geometric characteristics of the text regions in the image are known in advance. Also, the text regions in the image must be high-contrast regions with uniform intensity or color values. The feature detector uses geometric constraints to filter out non-text regions and detect text regions in images with both uniform and complex backgrounds.
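Filtering by geometric constraints can be sketched as follows. This Python example does not perform MSER detection and is not the toolbox implementation. It assumes candidate regions have already been detected and shows only the filtering step. The function name `is_text_like`, the choice of aspect ratio and extent as properties, and all threshold values are hypothetical and would need tuning for real images.

```python
def is_text_like(region, max_aspect_ratio=3.0, min_extent=0.2, max_extent=0.9):
    """Return True if the region's geometry is plausible for a character.

    All thresholds are illustrative assumptions, not recommended values.
    """
    x, y, width, height = region["bbox"]
    aspect_ratio = width / height
    # Extent is the fraction of the bounding box covered by the region.
    extent = region["area"] / (width * height)
    return aspect_ratio <= max_aspect_ratio and min_extent <= extent <= max_extent


# Hypothetical candidate regions, each with a pixel area and an
# (x, y, width, height) bounding box.
regions = [
    {"name": "letter", "area": 60, "bbox": (10, 10, 8, 12)},
    {"name": "thin line", "area": 95, "bbox": (0, 40, 100, 1)},
    {"name": "solid block", "area": 400, "bbox": (50, 50, 20, 20)},
]
kept = [region["name"] for region in regions if is_text_like(region)]
```

Here the thin line fails the aspect ratio constraint and the solid block fails the extent constraint, so only the letter-shaped region is kept.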
The CRAFT model is a robust approach to detecting text regions in images regardless of factors like image background, contrast, and intensity values. Use the CRAFT model when segmenting the text regions in an image is difficult. This model requires more computational resources than other text detection approaches.
You can perform text segmentation as a preprocessing or postprocessing step to improve the accuracy of text detection. To segment text from an image region, use image segmentation techniques such as image thresholding and clustering. For information about MATLAB® functions for image segmentation, see Image Segmentation. Alternatively, you can use the Color Thresholder and Image Segmenter apps to interactively segment the desired text regions in the image.
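As an illustration of image thresholding, the following Python sketch computes a global threshold and binarizes a small set of grayscale values. This page does not name a specific thresholding method. Otsu's method is used here as one common choice, and the function name `otsu_threshold` and the sample pixel values are assumptions for the example. The sketch also assumes dark text on a bright background.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level that maximizes the between-class variance."""
    histogram = [0] * levels
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    low_count, low_sum = 0, 0
    for level in range(levels):
        # Pixels with value <= level form the "low" class.
        low_count += histogram[level]
        low_sum += level * histogram[level]
        high_count = total - low_count
        if low_count == 0:
            continue
        if high_count == 0:
            break
        low_mean = low_sum / low_count
        high_mean = (total_sum - low_sum) / high_count
        variance = low_count * high_count * (low_mean - high_mean) ** 2
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold


# Flattened 8-bit grayscale values: dark text pixels on a bright background.
pixels = [200, 210, 25, 30, 205, 20, 220, 215]
threshold = otsu_threshold(pixels)

# Mark dark pixels as foreground (1) so that the text ends up in the foreground.
mask = [1 if value <= threshold else 0 for value in pixels]
```

The resulting mask places the dark text pixels in the foreground, which is the form of input that blob analysis expects.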
The next step is to recognize the text in the detected or segmented regions by using machine learning (ML)-based classification or the optical character recognition (OCR) method. The ocr function uses the OCR Language Data support files from the Tesseract Open Source OCR Engine page. The support files contain pretrained language data files for recognizing characters in multiple languages. You can download the additional language files by using either the visionSupportPackages function or the Add-On Explorer. For more information on downloading add-ons, see Get and Manage Add-Ons. For instructions on installing and using the OCR Language Data support files from the Tesseract Open Source OCR Engine, see Install OCR Language Data Files.
Apps
OCR Trainer | Train an optical character recognition model to recognize a specific set of characters |
Functions
Topics
Get Started
- Local Feature Detection and Extraction
Learn the benefits and applications of local feature detection and extraction.
- Point Feature Types
Choose functions that return and accept points objects for several types of features.
Use Optical Character Recognition
- Train Optical Character Recognition for Custom Fonts
Train the ocr function to recognize a custom language or font by using the OCR Trainer app.
- Install OCR Language Data Files
Support files for optical character recognition (OCR) languages.
- Troubleshoot ocr Function Results
Troubleshooting for the optical character recognition (OCR) ocr function.