Optical Character Recognition (OCR) with OpenVINO

This tutorial is also available as a Jupyter notebook that can be cloned directly from GitHub. See the installation guide for instructions to run this tutorial locally on Windows, Linux or macOS.


This tutorial demonstrates how to perform optical character recognition (OCR) with OpenVINO models. It is a continuation of the 004-hello-detection tutorial, which shows only text detection.

The horizontal-text-detection-0001 and text-recognition-resnet-fc models are used together: the first detects text regions, and the second recognizes the text in each detected region.

In this tutorial, Open Model Zoo tools including Model Downloader, Model Converter and Info Dumper are used to download and convert the models from the Open Model Zoo. See the 104-model-tools tutorial for more information about these tools.

Imports

import shutil
import sys
from pathlib import Path

import cv2
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import Markdown, display
from PIL import Image
from openvino.runtime import Core
from yaspin import yaspin

sys.path.append("../utils")
from notebook_utils import load_image

Settings

ie = Core()

model_dir = Path("model")
precision = "FP16"
detection_model = "horizontal-text-detection-0001"
recognition_model = "text-recognition-resnet-fc"
base_model_dir = Path("~/open_model_zoo_models").expanduser()
omz_cache_dir = Path("~/open_model_zoo_cache").expanduser()

model_dir.mkdir(exist_ok=True)
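
The ie Core object can also report which inference devices are available. The following optional check is not part of the original notebook; the models are compiled on "CPU" later in this tutorial.

# Optional: list the inference devices OpenVINO Runtime detects on this machine.
print(ie.available_devices)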

Download Models

The next cells will run Open Model Zoo’s Model Downloader to download the detection and recognition models. If the models have been downloaded before, they will not be downloaded again.

download_command = f"omz_downloader --name {detection_model},{recognition_model} --output_dir {base_model_dir} --cache_dir {omz_cache_dir} --precision {precision}"
display(Markdown(f"Download command: `{download_command}`"))
with yaspin(text=f"Downloading {detection_model}, {recognition_model}") as sp:
    download_result = !$download_command
    print(download_result)
    sp.text = f"Finished downloading {detection_model}, {recognition_model}"
    sp.ok("✔")

Download command: omz_downloader --name horizontal-text-detection-0001,text-recognition-resnet-fc --output_dir /opt/home/k8sworker/open_model_zoo_models --cache_dir /opt/home/k8sworker/open_model_zoo_cache --precision FP16

⠸ Downloading horizontal-text-detection-0001, text-recognition-resnet-fc['################|| Downloading horizontal-text-detection-0001 ||################', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/intel/horizontal-text-detection-0001/FP16/horizontal-text-detection-0001.xml from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/intel/horizontal-text-detection-0001/FP16/horizontal-text-detection-0001.bin from the cache', '', '################|| Downloading text-recognition-resnet-fc ||################', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/__init__.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/builder.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/model.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/weight_init.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/registry.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/heads/__init__.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/heads/builder.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/heads/fc_head.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/heads/registry.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/__init__.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/builder.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/registry.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/body.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/component.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/sequences/__init__.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/sequences/builder.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/sequences/registry.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/__init__.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/builder.py from the cache', '', '========== Retrieving 
/opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/decoders/__init__.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/decoders/builder.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/decoders/registry.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/decoders/bricks/__init__.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/decoders/bricks/bricks.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/decoders/bricks/builder.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/decoders/bricks/registry.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/encoders/__init__.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/encoders/builder.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/encoders/backbones/__init__.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/encoders/backbones/builder.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/encoders/backbones/registry.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/encoders/backbones/resnet.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/encoders/enhance_modules/__init__.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/encoders/enhance_modules/builder.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/encoders/enhance_modules/registry.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/utils/__init__.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/utils/builder.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/utils/conv_module.py from the cache', '', '========== Retrieving 
/opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/utils/fc_module.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/utils/norm.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/utils/registry.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/utils/__init__.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/utils/common.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/utils/registry.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/utils/config.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/configs/resnet_fc.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/ckpt/resnet_fc.pth from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/addict-2.4.0-py3-none-any.whl from the cache', '', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/heads/__init__.py', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/__init__.py', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/sequences/__init__.py', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/component.py', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/decoders/__init__.py', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/decoders/bricks/__init__.py', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/encoders/backbones/__init__.py', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/encoders/enhance_modules/__init__.py', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/utils/__init__.py', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/utils/__init__.py', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/utils/config.py', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/utils/config.py', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/utils/config.py', '========== Replacing text in 
/opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/utils/config.py', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/utils/config.py', '========== Unpacking /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/addict-2.4.0-py3-none-any.whl', '']
✔ Finished downloading horizontal-text-detection-0001, text-recognition-resnet-fc
### The text-recognition-resnet-fc model consists of many files. All filenames are printed in
### Model Downloader's output. Uncomment the next two lines to show this output

# for line in download_result:
#    print(line)

Convert Models

The downloaded detection model is an Intel model, which is already in OpenVINO’s Intermediate Representation (IR) format. The text recognition model is a public model that needs to be converted to IR. Since this model was downloaded from Open Model Zoo, we can use Model Converter to convert it to IR format.

Model Converter output will be displayed. Conversion is successful if the last lines of the output include [ SUCCESS ] Generated IR version 11 model.

convert_command = f"omz_converter --name {recognition_model} --precisions {precision} --download_dir {base_model_dir} --output_dir {base_model_dir}"
display(Markdown(f"Convert command: `{convert_command}`"))
display(Markdown(f"Converting {recognition_model}..."))
! $convert_command

Convert command: omz_converter --name text-recognition-resnet-fc --precisions FP16 --download_dir /opt/home/k8sworker/open_model_zoo_models --output_dir /opt/home/k8sworker/open_model_zoo_models

Converting text-recognition-resnet-fc…

========== Converting text-recognition-resnet-fc to ONNX
Conversion to ONNX command: /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-188/.workspace/scm/ov-notebook/.venv/bin/python -- /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-188/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/openvino/model_zoo/internal_scripts/pytorch_to_onnx.py --model-path=/opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-188/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/openvino/model_zoo/models/public/text-recognition-resnet-fc --model-path=/opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc --model-name=get_model --import-module=model '--model-param=file_config=r"/opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/configs/resnet_fc.py"' '--model-param=weights=r"/opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/ckpt/resnet_fc.pth"' --input-shape=1,1,32,100 --input-names=input --output-names=output --output-file=/opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/resnet_fc.onnx

ONNX check passed successfully.

========== Converting text-recognition-resnet-fc to IR (FP16)
Conversion command: /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-188/.workspace/scm/ov-notebook/.venv/bin/python -- /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-188/.workspace/scm/ov-notebook/.venv/bin/mo --framework=onnx --data_type=FP16 --output_dir=/opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/FP16 --model_name=text-recognition-resnet-fc --input=input '--mean_values=input[127.5]' '--scale_values=input[127.5]' --output=output --input_model=/opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/resnet_fc.onnx '--layout=input(NCHW)' '--input_shape=[1, 1, 32, 100]'

Model Optimizer arguments:
Common parameters:
    - Path to the Input Model:  /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/resnet_fc.onnx
    - Path for generated IR:    /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/FP16
    - IR output name:   text-recognition-resnet-fc
    - Log level:    ERROR
    - Batch:    Not specified, inherited from the model
    - Input layers:     input
    - Output layers:    output
    - Input shapes:     [1, 1, 32, 100]
    - Source layout:    Not specified
    - Target layout:    Not specified
    - Layout:   input(NCHW)
    - Mean values:  input[127.5]
    - Scale values:     input[127.5]
    - Scale factor:     Not specified
    - Precision of IR:  FP16
    - Enable fusing:    True
    - User transformations:     Not specified
    - Reverse input channels:   False
    - Enable IR generation for fixed input shape:   False
    - Use the transformations config file:  None
Advanced parameters:
    - Force the usage of legacy Frontend of Model Optimizer for model conversion into IR:   False
    - Force the usage of new Frontend of Model Optimizer for model conversion into IR:  False
OpenVINO runtime found in:  /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-188/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/openvino
OpenVINO runtime version:   2022.1.0-7019-cdb9bec7210-releases/2022/1
Model Optimizer version:    2022.1.0-7019-cdb9bec7210-releases/2022/1
[ SUCCESS ] Generated IR version 11 model.
[ SUCCESS ] XML file: /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/FP16/text-recognition-resnet-fc.xml
[ SUCCESS ] BIN file: /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/FP16/text-recognition-resnet-fc.bin
[ SUCCESS ] Total execution time: 3.69 seconds.
[ SUCCESS ] Memory consumed: 1443 MB.
It's been a while, check for a new version of Intel(R) Distribution of OpenVINO(TM) toolkit here https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit/download.html?cid=other&source=prod&campid=ww_2022_bu_IOTG_OpenVINO-2022-1&content=upg_all&medium=organic or on the GitHub*
[ INFO ] The model was converted to IR v11, the latest model format that corresponds to the source DL framework input/output format. While IR v11 is backwards compatible with OpenVINO Inference Engine API v1.0, please use API v2.0 (as of 2022.1) to take advantage of the latest improvements in IR v11.
Find more information about API v2.0 and IR v11 at https://docs.openvino.ai

Copy Models

To make it easier to work with the models, we copy the models from the Open Model Zoo tree to the model subdirectory relative to this Jupyter notebook. We get the path to the Open Model Zoo model directory from Open Model Zoo’s omz_info_dumper tool.

models_info_output = %sx omz_info_dumper --name $detection_model,$recognition_model
print(f'sx omz_info_dumper --name {detection_model},{recognition_model}')
detection_model_info, recognition_model_info = [
    {
        "name": "horizontal-text-detection-0001",
        "composite_model_name": None,
        "description": "Horizontal text detector based on FCOS with light MobileNetV2 backbone",
        "framework": "dldt",
        "license_url": "https://raw.githubusercontent.com/openvinotoolkit/open_model_zoo/master/LICENSE",
        "precisions": [
            "FP16",
            "FP16-INT8",
            "FP32"
        ],
        "quantization_output_precisions": [],
        "subdirectory": "intel/horizontal-text-detection-0001",
        "task_type": "detection"
    },
    {
        "name": "text-recognition-resnet-fc",
        "composite_model_name": None,
        "description": "\"text-recognition-resnet-fc\" is a simple and preformant scene text recognition model based on ResNet with Fully Connected text recognition head. Source implementation on a PyTorch* framework could be found here <https://github.com/Media-Smart/vedastr>. Model is able to recognize alphanumeric text.",
        "framework": "pytorch",
        "license_url": "https://raw.githubusercontent.com/Media-Smart/vedastr/0fd2a0bd7819ae4daa2a161501e9f1c2ac67e96a/LICENSE",
        "precisions": [
            "FP16",
            "FP32"
        ],
        "quantization_output_precisions": [],
        "subdirectory": "public/text-recognition-resnet-fc",
        "task_type": "optical_character_recognition"
    }
]

for model_info in (detection_model_info, recognition_model_info):
    omz_dir = Path(model_info["subdirectory"])
    omz_model_dir = base_model_dir / omz_dir / precision
    print(omz_model_dir)
    for model_file in omz_model_dir.iterdir():
        try:
            shutil.copyfile(model_file, model_dir / model_file.name)
        except FileExistsError:
            pass

detection_model_path = (model_dir / detection_model).with_suffix(".xml")
recognition_model_path = (model_dir / recognition_model).with_suffix(".xml")

sx omz_info_dumper --name horizontal-text-detection-0001,text-recognition-resnet-fc
/opt/home/k8sworker/open_model_zoo_models/intel/horizontal-text-detection-0001/FP16
/opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/FP16
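
As an optional check (not part of the original notebook), confirm that both IR files were copied next to this notebook before they are loaded in the following cells.

# Optional: verify that the copied model files exist.
for model_path in (detection_model_path, recognition_model_path):
    print(model_path, model_path.exists())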

Object Detection

Load the detection model, load an image, run inference, and get the detection results.

Load Detection Model

detection_model = ie.read_model(
    model=detection_model_path, weights=detection_model_path.with_suffix(".bin")
)
detection_compiled_model = ie.compile_model(model=detection_model, device_name="CPU")

detection_input_layer = detection_compiled_model.input(0)

Load an Image

# image_file can point to a URL or local image
image_file = "https://github.com/openvinotoolkit/openvino_notebooks/raw/main/notebooks/004-hello-detection/data/intel_rnb.jpg"

image = load_image(image_file)

# N,C,H,W = batch size, number of channels, height, width
N, C, H, W = detection_input_layer.shape

# Resize image to meet network expected input sizes
resized_image = cv2.resize(image, (W, H))

# Reshape to network input shape
input_image = np.expand_dims(resized_image.transpose(2, 0, 1), 0)

plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB));
../_images/208-optical-character-recognition-with-output_15_0.png

Do Inference

Text boxes are detected in the image and returned as a blob of data with shape [100, 5]. Each detection has the format [x_min, y_min, x_max, y_max, conf].

output_key = detection_compiled_model.output("boxes")
boxes = detection_compiled_model([input_image])[output_key]

# Remove boxes that contain only zeros
boxes = boxes[~np.all(boxes == 0, axis=1)]
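
As a quick, optional inspection (not part of the original notebook), print the first few remaining detections. Each row has the format [x_min, y_min, x_max, y_max, conf], with coordinates given at the resized network input resolution.

# Optional: show the first detections in [x_min, y_min, x_max, y_max, conf] format.
for x_min, y_min, x_max, y_max, conf in boxes[:3]:
    print(f"({x_min:.0f}, {y_min:.0f}) - ({x_max:.0f}, {y_max:.0f}), confidence {conf:.2f}")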

Get Detection Results

def multiply_by_ratio(ratio_x, ratio_y, box):
    # Scale box coordinates from the resized image back to the original image.
    # Even indices are x coordinates (scaled by ratio_x); odd indices are y
    # coordinates (scaled by ratio_y, with a minimum of 10 pixels). The
    # confidence value in the last position is dropped.
    return [
        max(shape * ratio_y, 10) if idx % 2 else shape * ratio_x
        for idx, shape in enumerate(box[:-1])
    ]


def run_preprocessing_on_crop(crop, net_shape):
    # Resize a text crop to the recognition network input size and add batch
    # and channel dimensions: (H, W) -> (1, 1, H, W).
    temp_img = cv2.resize(crop, net_shape)
    temp_img = temp_img.reshape((1,) * 2 + temp_img.shape)
    return temp_img


def convert_result_to_image(bgr_image, resized_image, boxes, threshold=0.3, conf_labels=True):
    # Define colors for boxes and descriptions
    colors = {"red": (255, 0, 0), "green": (0, 255, 0), "white": (255, 255, 255)}

    # Fetch image shapes to calculate ratio
    (real_y, real_x), (resized_y, resized_x) = bgr_image.shape[:2], resized_image.shape[:2]
    ratio_x, ratio_y = real_x / resized_x, real_y / resized_y

    # Convert base image from bgr to rgb format
    rgb_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)

    # Iterate through non-zero boxes
    for box, annotation in boxes:
        # Pick confidence factor from last place in array
        conf = box[-1]
        if conf > threshold:
            # Convert float to int and multiply position of each box by x and y ratio
            (x_min, y_min, x_max, y_max) = map(int, multiply_by_ratio(ratio_x, ratio_y, box))

            # Draw box based on position, parameters in rectangle function are: image, start_point, end_point, color, thickness
            cv2.rectangle(rgb_image, (x_min, y_min), (x_max, y_max), colors["green"], 3)

            # Add text to image based on position and confidence, parameters in putText function are: image, text, bottomleft_corner_textfield, font, font_scale, color, thickness, line_type
            if conf_labels:
                # Create background box based on annotation length
                (text_w, text_h), _ = cv2.getTextSize(
                    f"{annotation}", cv2.FONT_HERSHEY_TRIPLEX, 0.8, 1
                )
                image_copy = rgb_image.copy()
                cv2.rectangle(
                    image_copy,
                    (x_min, y_min - text_h - 10),
                    (x_min + text_w, y_min - 10),
                    colors["white"],
                    -1,
                )
                # Add weighted image copy with white boxes under text
                cv2.addWeighted(image_copy, 0.4, rgb_image, 0.6, 0, rgb_image)
                cv2.putText(
                    rgb_image,
                    f"{annotation}",
                    (x_min, y_min - 10),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.8,
                    colors["red"],
                    1,
                    cv2.LINE_AA,
                )

    return rgb_image

Text Recognition

Load the text recognition model and do inference on the detected boxes from the detection model.

Load Text Recognition Model

recognition_model = ie.read_model(
    model=recognition_model_path, weights=recognition_model_path.with_suffix(".bin")
)

recognition_compiled_model = ie.compile_model(model=recognition_model, device_name="CPU")

recognition_output_layer = recognition_compiled_model.output(0)
recognition_input_layer = recognition_compiled_model.input(0)

# Get height and width of input layer
_, _, H, W = recognition_input_layer.shape
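
As a quick sanity check (not part of the original notebook), the compiled recognition model should report the [1, 1, 32, 100] input shape that was used during model conversion, so the crops passed to it must be grayscale images resized to 100x32 pixels.

# Optional: confirm the expected input shape of the recognition model.
print(f"Recognition model input shape: {recognition_input_layer.shape} (H={H}, W={W})")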

Do Inference

# Calculate scale for image resizing
(real_y, real_x), (resized_y, resized_x) = image.shape[:2], resized_image.shape[:2]
ratio_x, ratio_y = real_x / resized_x, real_y / resized_y

# Convert image to grayscale for text recognition model
grayscale_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Get the character set used to decode the output, based on the model documentation
letters = "~0123456789abcdefghijklmnopqrstuvwxyz"

# Prepare empty list for annotations
annotations = list()
cropped_images = list()
# fig, ax = plt.subplots(len(boxes), 1, figsize=(5,15), sharex=True, sharey=True)
# For each crop defined by the boxes from the detection model, get the annotation
for i, crop in enumerate(boxes):
    # Get coordinates on corners of crop
    (x_min, y_min, x_max, y_max) = map(int, multiply_by_ratio(ratio_x, ratio_y, crop))
    image_crop = run_preprocessing_on_crop(grayscale_image[y_min:y_max, x_min:x_max], (W, H))

    # Run inference with recognition model
    result = recognition_compiled_model([image_crop])[recognition_output_layer]

    # Squeeze output to remove unnecessary dimension
    recognition_results_test = np.squeeze(result)

    # Read annotation based on probabilities from output layer
    annotation = list()
    for letter in recognition_results_test:
        parsed_letter = letters[letter.argmax()]

        # An argmax of 0 (the "~" character) signals the end of the string
        if parsed_letter == letters[0]:
            break
        annotation.append(parsed_letter)
    annotations.append("".join(annotation))
    cropped_image = Image.fromarray(image[y_min:y_max, x_min:x_max])
    cropped_images.append(cropped_image)

boxes_with_annotations = list(zip(boxes, annotations))
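
As an optional plain-text summary (not part of the original notebook), print each recognized string together with the detection confidence stored in the last element of its box, before visualizing the results in the next section.

# Optional: print recognized text with its detection confidence.
for box, text in boxes_with_annotations:
    print(f"{text:>15}  confidence: {box[-1]:.2f}")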

Show Results

Show Detected Text Boxes and OCR Results for the Image

Visualize the results by drawing boxes around the recognized text and showing the OCR result from the text recognition model.

plt.figure(figsize=(12, 12))
plt.imshow(convert_result_to_image(image, resized_image, boxes_with_annotations, conf_labels=True));
../_images/208-optical-character-recognition-with-output_25_0.png

Show the OCR Result per Bounding Box

Depending on the image, the OCR result may not be readable in the annotated image displayed in the cell above. In the next cell, we show the extracted boxes and the OCR result per box.

for cropped_image, annotation in zip(cropped_images, annotations):
    display(cropped_image, Markdown(annotation))
../_images/208-optical-character-recognition-with-output_27_0.png

building

../_images/208-optical-character-recognition-with-output_27_2.png

noyce

../_images/208-optical-character-recognition-with-output_27_4.png

2200

../_images/208-optical-character-recognition-with-output_27_6.png

n

../_images/208-optical-character-recognition-with-output_27_8.png

center

../_images/208-optical-character-recognition-with-output_27_10.png

robert