Train Custom OCR Model

Training an Optical Character Recognition (OCR) model to recognize custom text consists of three steps:

Prepare training data
Train an OCR model
Evaluate OCR training

Prepare Training Data

Computer Vision Toolbox™ provides deep-learning-based OCR training and supports transfer learning and fine-tuning of the OCR models shipped with the toolbox. Training with deep learning requires hundreds of training samples for each character in the character set. After collecting training images, you must label, save, and combine the data into a datastore before training an OCR model. Use these steps to prepare the data.

Diagram: The labeled images and label data are stored in a groundTruth object. The ocrTrainingData function takes the labeled ground truth data as input and returns image, ROI, and text datastores. The combine function combines the image, ROI, and text datastores into the single datastore that the trainOCR function requires.

Label Training Images

You can use the Image Labeler app to interactively label image ground truth data. Ground truth for OCR must contain the location of text regions and the actual text within the regions. You can specify the location and size of the text region using a rectangle ROI label. You can specify the actual text within each rectangle ROI by adding a string Attribute to the rectangle ROI label. Use one of these methods to launch the Image Labeler:

MATLAB® Toolstrip: On the Apps tab, under Image Processing and Computer Vision, click the Image Labeler app icon.
MATLAB command prompt: Enter imageLabeler.
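
As a minimal sketch of the command-prompt route, you can also pass a folder of images so that the app opens with the collection already loaded. The folder name ocrTrainingImages is a hypothetical placeholder, not a folder shipped with the toolbox.

```matlab
% Hypothetical folder that contains the images you want to label.
imageFolder = fullfile(pwd,"ocrTrainingImages");

% Open the Image Labeler app and load all images from that folder.
imageLabeler(imageFolder)
```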

The Image Labeler toolstrip provides these buttons to use for labeling OCR data:

Import — Load a collection of images.
Label — Add Rectangle bounding box labels.
Attribute — Add a string Attribute to the rectangle ROI label which defines the type of content in the bounding box.
Export — Export labels and label definitions as a ground truth object.

For more details about using the Image Labeler app, see Get Started with the Image Labeler.

Create Label Data Using Image Labeler

Load an image collection from a folder or an ImageDatastore object into the Image Labeler app.
Define a rectangle ROI label and name it, for example, Text.
Define a string attribute for the label, which holds the text string in the ROI, and name it, for example, word.
Label the text in the collection of images, or use an automation algorithm to prelabel some of the text automatically. For more details about using an automation algorithm, see Automate Ground Truth Labeling for OCR.
Export the labeled data to the workspace or save it to a file. The app exports the labels as a groundTruth object.

Load Training Data From Ground Truth

Use the ocrTrainingData function to load training data from the exported groundTruth object. The ocrTrainingData function returns three datastores for images, bounding boxes, and text. For the purposes of training, combine those datastores using the combine function.
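
The following sketch shows one way this step might look. It assumes that gTruth is the groundTruth object exported from the Image Labeler and that the rectangle ROI label and string attribute use the example names Text and word from the labeling steps. Substitute the names used in your own labeling session.

```matlab
% Assumption: gTruth is a groundTruth object exported from the Image Labeler.
% Assumption: the ROI label is named "Text" and its string attribute "word".
labelName = "Text";
attributeName = "word";

% Create separate datastores for the images, bounding boxes, and text.
[imds,boxds,txtds] = ocrTrainingData(gTruth,labelName,attributeName);

% Combine them into the single datastore that trainOCR requires.
cds = combine(imds,boxds,txtds);

% Optionally read one sample to check its contents, then rewind the datastore.
sample = read(cds);
reset(cds)
```

Each read of the combined datastore returns the image together with its bounding boxes and text, which is a quick way to confirm that the labels were exported as expected.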

Train an OCR Model

Use the trainOCR function to train an OCR model, and configure the training options using the ocrTrainingOptions function. Optionally, you can quantize the trained model using the quantizeOCR function. Quantization gives faster performance, which can be helpful when you deploy the OCR model on resource-constrained systems, but it can decrease the accuracy of the model. For an example that demonstrates how to use these functions, see Train an OCR Model to Recognize Seven-Segment Digits.
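
As a rough sketch of how these functions fit together, the code below fine-tunes the shipped English model and then quantizes the result. It assumes that cdsTrain is a combined training datastore created as described in Load Training Data From Ground Truth. The model names, the output folder, and the option values are hypothetical placeholders, not recommended settings.

```matlab
% Assumption: cdsTrain is a combined datastore of images, boxes, and text.

% Hypothetical folder in which to save the trained model.
outputDir = fullfile(pwd,"ocrModel");
if ~isfolder(outputDir)
    mkdir(outputDir);
end

% Configure training. These values are placeholders; tune them for your data.
ocrOptions = ocrTrainingOptions( ...
    MaxEpochs=20, ...
    InitialLearnRate=1e-3, ...
    OutputLocation=outputDir);

% Fine-tune the English model shipped with the toolbox.
modelName = "myCustomModel";   % hypothetical output model name
baseModel = "english";
outputModel = trainOCR(cdsTrain,modelName,baseModel,ocrOptions);

% Optional: quantize the trained model for faster performance.
% Quantization can reduce accuracy, so evaluate both models.
quantizedModelName = "myCustomModelQuantized";   % hypothetical name
quantizedModel = quantizeOCR(outputModel,quantizedModelName);
```

Keeping both the original and the quantized model files makes it possible to compare their accuracy before choosing one for deployment.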

Evaluate OCR Training

Use the metrics generated by the evaluateOCR function to evaluate the quality of the OCR model.
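
A minimal sketch of the evaluation step follows. It assumes that cdsTest is a combined datastore of labeled data that was held out from training and that outputModel is the model file returned by trainOCR. Both variable names are illustrative.

```matlab
% Assumption: cdsTest is a combined datastore of held-out labeled data.
% Assumption: outputModel is the model file returned by trainOCR.

% Run the trained model on the test data.
results = ocr(cdsTest,Model=outputModel);

% Compare the recognition results against the ground truth text.
metrics = evaluateOCR(results,cdsTest);

% Report accuracy derived from the dataset-level character error rate.
cer = metrics.DataSetMetrics.CharacterErrorRate;
fprintf("Character accuracy: %.2f%%\n",100*(1 - cer));
```

Evaluating on data that the model did not see during training gives a more realistic estimate of how it will perform on new images.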