Optical Character Recognition (OCR) with OpenVINO™

This tutorial is also available as a Jupyter notebook that can be cloned directly from GitHub. See the installation guide for instructions to run this tutorial locally on Windows, Linux or macOS.


This tutorial demonstrates how to perform optical character recognition (OCR) with OpenVINO models. It is a continuation of the 004-hello-detection tutorial, which shows only text detection.

The horizontal-text-detection-0001 and text-recognition-resnet-fc models are used together for text detection followed by text recognition.

In this tutorial, the Open Model Zoo tools Model Downloader, Model Converter, and Info Dumper are used to download and convert the models from Open Model Zoo. For more information, refer to the 104-model-tools tutorial.

Imports

import shutil
import sys
from pathlib import Path

import cv2
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import Markdown, display
from PIL import Image
from openvino.runtime import Core
from yaspin import yaspin

sys.path.append("../utils")
from notebook_utils import load_image

Settings

ie = Core()

model_dir = Path("model")
precision = "FP16"
detection_model = "horizontal-text-detection-0001"
recognition_model = "text-recognition-resnet-fc"
base_model_dir = Path("~/open_model_zoo_models").expanduser()
omz_cache_dir = Path("~/open_model_zoo_cache").expanduser()

model_dir.mkdir(exist_ok=True)

Download Models

The next cells will run Model Downloader to download the detection and recognition models. If the models have been downloaded before, they will not be downloaded again.

download_command = f"omz_downloader --name {detection_model},{recognition_model} --output_dir {base_model_dir} --cache_dir {omz_cache_dir} --precision {precision}"
display(Markdown(f"Download command: `{download_command}`"))
with yaspin(text=f"Downloading {detection_model}, {recognition_model}") as sp:
    download_result = !$download_command
    print(download_result)
    sp.text = f"Finished downloading {detection_model}, {recognition_model}"
    sp.ok("✔")

Download command: omz_downloader --name horizontal-text-detection-0001,text-recognition-resnet-fc --output_dir /opt/home/k8sworker/open_model_zoo_models --cache_dir /opt/home/k8sworker/open_model_zoo_cache --precision FP16

⠼ Downloading horizontal-text-detection-0001, text-recognition-resnet-fc['################|| Downloading horizontal-text-detection-0001 ||################', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/intel/horizontal-text-detection-0001/FP16/horizontal-text-detection-0001.xml from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/intel/horizontal-text-detection-0001/FP16/horizontal-text-detection-0001.bin from the cache', '', '################|| Downloading text-recognition-resnet-fc ||################', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/__init__.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/builder.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/model.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/weight_init.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/registry.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/heads/__init__.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/heads/builder.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/heads/fc_head.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/heads/registry.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/__init__.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/builder.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/registry.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/body.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/component.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/sequences/__init__.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/sequences/builder.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/sequences/registry.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/__init__.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/builder.py from the cache', '', '========== Retrieving 
/opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/decoders/__init__.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/decoders/builder.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/decoders/registry.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/decoders/bricks/__init__.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/decoders/bricks/bricks.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/decoders/bricks/builder.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/decoders/bricks/registry.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/encoders/__init__.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/encoders/builder.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/encoders/backbones/__init__.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/encoders/backbones/builder.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/encoders/backbones/registry.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/encoders/backbones/resnet.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/encoders/enhance_modules/__init__.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/encoders/enhance_modules/builder.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/encoders/enhance_modules/registry.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/utils/__init__.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/utils/builder.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/utils/conv_module.py from the cache', '', '========== Retrieving 
/opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/utils/fc_module.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/utils/norm.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/utils/registry.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/utils/__init__.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/utils/common.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/utils/registry.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/utils/config.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/configs/resnet_fc.py from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/ckpt/resnet_fc.pth from the cache', '', '========== Retrieving /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/addict-2.4.0-py3-none-any.whl from the cache', '', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/heads/__init__.py', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/__init__.py', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/sequences/__init__.py', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/component.py', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/decoders/__init__.py', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/decoders/bricks/__init__.py', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/encoders/backbones/__init__.py', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/bodies/feature_extractors/encoders/enhance_modules/__init__.py', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/models/utils/__init__.py', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/utils/__init__.py', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/utils/config.py', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/utils/config.py', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/utils/config.py', '========== Replacing text in 
/opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/utils/config.py', '========== Replacing text in /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/utils/config.py', '========== Unpacking /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/addict-2.4.0-py3-none-any.whl', '']
✔ Finished downloading horizontal-text-detection-0001, text-recognition-resnet-fc
### The text-recognition-resnet-fc model consists of many files. All filenames are printed in
### the output of Model Downloader. Uncomment the next two lines to show this output.

# for line in download_result:
#    print(line)
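
As a quick check (a sketch, assuming the default directories configured in the Settings section), the downloaded detection model files should now be on disk:

# Sketch: list the downloaded IR files of the detection model. Intel models are
# stored under intel/<model_name>/<precision>/ (see the download log above).
detection_ir_dir = base_model_dir / "intel" / detection_model / precision
print(sorted(path.name for path in detection_ir_dir.iterdir()))
# e.g. ['horizontal-text-detection-0001.bin', 'horizontal-text-detection-0001.xml']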

Convert Models

The downloaded detection model is an Intel model, which is already in OpenVINO Intermediate Representation (OpenVINO IR) format. The text recognition model is a public model which needs to be converted to OpenVINO IR. Since this model was downloaded from Open Model Zoo, use Model Converter to convert the model to OpenVINO IR format.

The output of Model Converter will be displayed. When the conversion is successful, the last lines of output will include [ SUCCESS ] Generated IR version 11 model.
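
Optionally, conversion can be skipped when the IR files already exist. A minimal sketch, assuming the output layout shown in the conversion log below:

# Sketch (not part of the original flow): check for an existing converted model.
recognition_ir_xml = (
    base_model_dir / "public" / recognition_model / precision / f"{recognition_model}.xml"
)
if recognition_ir_xml.exists():
    print(f"{recognition_model} has already been converted to OpenVINO IR.")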

convert_command = f"omz_converter --name {recognition_model} --precisions {precision} --download_dir {base_model_dir} --output_dir {base_model_dir}"
display(Markdown(f"Convert command: `{convert_command}`"))
display(Markdown(f"Converting {recognition_model}..."))
! $convert_command

Convert command: omz_converter --name text-recognition-resnet-fc --precisions FP16 --download_dir /opt/home/k8sworker/open_model_zoo_models --output_dir /opt/home/k8sworker/open_model_zoo_models

Converting text-recognition-resnet-fc…

========== Converting text-recognition-resnet-fc to ONNX
Conversion to ONNX command: /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-231/.workspace/scm/ov-notebook/.venv/bin/python -- /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-231/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/openvino/model_zoo/internal_scripts/pytorch_to_onnx.py --model-path=/opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-231/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/openvino/model_zoo/models/public/text-recognition-resnet-fc --model-path=/opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc --model-name=get_model --import-module=model '--model-param=file_config=r"/opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/configs/resnet_fc.py"' '--model-param=weights=r"/opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/vedastr/ckpt/resnet_fc.pth"' --input-shape=1,1,32,100 --input-names=input --output-names=output --output-file=/opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/resnet_fc.onnx

ONNX check passed successfully.

========== Converting text-recognition-resnet-fc to IR (FP16)
Conversion command: /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-231/.workspace/scm/ov-notebook/.venv/bin/python -- /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-231/.workspace/scm/ov-notebook/.venv/bin/mo --framework=onnx --data_type=FP16 --output_dir=/opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/FP16 --model_name=text-recognition-resnet-fc --input=input '--mean_values=input[127.5]' '--scale_values=input[127.5]' --output=output --input_model=/opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/resnet_fc.onnx '--layout=input(NCHW)' '--input_shape=[1, 1, 32, 100]'

Model Optimizer arguments:
Common parameters:
    - Path to the Input Model:  /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/resnet_fc.onnx
    - Path for generated IR:    /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/FP16
    - IR output name:   text-recognition-resnet-fc
    - Log level:    ERROR
    - Batch:    Not specified, inherited from the model
    - Input layers:     input
    - Output layers:    output
    - Input shapes:     [1, 1, 32, 100]
    - Source layout:    Not specified
    - Target layout:    Not specified
    - Layout:   input(NCHW)
    - Mean values:  input[127.5]
    - Scale values:     input[127.5]
    - Scale factor:     Not specified
    - Precision of IR:  FP16
    - Enable fusing:    True
    - User transformations:     Not specified
    - Reverse input channels:   False
    - Enable IR generation for fixed input shape:   False
    - Use the transformations config file:  None
Advanced parameters:
    - Force the usage of legacy Frontend of Model Optimizer for model conversion into IR:   False
    - Force the usage of new Frontend of Model Optimizer for model conversion into IR:  False
OpenVINO runtime found in:  /opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-231/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/openvino
OpenVINO runtime version:   2022.1.0-7019-cdb9bec7210-releases/2022/1
Model Optimizer version:    2022.1.0-7019-cdb9bec7210-releases/2022/1
[ SUCCESS ] Generated IR version 11 model.
[ SUCCESS ] XML file: /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/FP16/text-recognition-resnet-fc.xml
[ SUCCESS ] BIN file: /opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/FP16/text-recognition-resnet-fc.bin
[ SUCCESS ] Total execution time: 3.45 seconds.
[ SUCCESS ] Memory consumed: 1447 MB.
It's been a while, check for a new version of Intel(R) Distribution of OpenVINO(TM) toolkit here https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit/download.html?cid=other&source=prod&campid=ww_2022_bu_IOTG_OpenVINO-2022-1&content=upg_all&medium=organic or on the GitHub*
[ INFO ] The model was converted to IR v11, the latest model format that corresponds to the source DL framework input/output format. While IR v11 is backwards compatible with OpenVINO Inference Engine API v1.0, please use API v2.0 (as of 2022.1) to take advantage of the latest improvements in IR v11.
Find more information about API v2.0 and IR v11 at https://docs.openvino.ai

Copy Models

To make it easier to work with the models, copy the models from the Open Model Zoo tree to the model subdirectory relative to this Jupyter notebook. Get the path to the model directory of Open Model Zoo from the omz_info_dumper tool.

models_info_output = %sx omz_info_dumper --name $detection_model,$recognition_model
print(f'sx omz_info_dumper --name {detection_model},{recognition_model}')
detection_model_info, recognition_model_info = [
    {
        "name": "horizontal-text-detection-0001",
        "composite_model_name": None,
        "description": "Horizontal text detector based on FCOS with light MobileNetV2 backbone",
        "framework": "dldt",
        "license_url": "https://raw.githubusercontent.com/openvinotoolkit/open_model_zoo/master/LICENSE",
        "precisions": [
            "FP16",
            "FP16-INT8",
            "FP32"
        ],
        "quantization_output_precisions": [],
        "subdirectory": "intel/horizontal-text-detection-0001",
        "task_type": "detection"
    },
    {
        "name": "text-recognition-resnet-fc",
        "composite_model_name": None,
        "description": "\"text-recognition-resnet-fc\" is a simple and preformant scene text recognition model based on ResNet with Fully Connected text recognition head. Source implementation on a PyTorch* framework could be found here <https://github.com/Media-Smart/vedastr>. Model is able to recognize alphanumeric text.",
        "framework": "pytorch",
        "license_url": "https://raw.githubusercontent.com/Media-Smart/vedastr/0fd2a0bd7819ae4daa2a161501e9f1c2ac67e96a/LICENSE",
        "precisions": [
            "FP16",
            "FP32"
        ],
        "quantization_output_precisions": [],
        "subdirectory": "public/text-recognition-resnet-fc",
        "task_type": "optical_character_recognition"
    }
]

for model_info in (detection_model_info, recognition_model_info):
    omz_dir = Path(model_info["subdirectory"])
    omz_model_dir = base_model_dir / omz_dir / precision
    print(omz_model_dir)
    for model_file in omz_model_dir.iterdir():
        try:
            shutil.copyfile(model_file, model_dir / model_file.name)
        except FileExistsError:
            pass

detection_model_path = (model_dir / detection_model).with_suffix(".xml")
recognition_model_path = (model_dir / recognition_model).with_suffix(".xml")
sx omz_info_dumper --name horizontal-text-detection-0001,text-recognition-resnet-fc
/opt/home/k8sworker/open_model_zoo_models/intel/horizontal-text-detection-0001/FP16
/opt/home/k8sworker/open_model_zoo_models/public/text-recognition-resnet-fc/FP16
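
The two model information dictionaries above are shown inline for readability. As a minimal sketch (assuming the `models_info_output` captured in the cell above), the same information can be parsed programmatically, since Info Dumper prints a JSON list with one entry per requested model:

import json

# Sketch: parse the omz_info_dumper JSON output instead of copying it by hand.
models_info = json.loads("\n".join(models_info_output))
detection_model_info, recognition_model_info = models_info
print(detection_model_info["subdirectory"])    # intel/horizontal-text-detection-0001
print(recognition_model_info["subdirectory"])  # public/text-recognition-resnet-fc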

Object Detection

Load the detection model, load an image, run inference, and get the detection results.

Load a Detection Model

detection_model = ie.read_model(
    model=detection_model_path, weights=detection_model_path.with_suffix(".bin")
)
detection_compiled_model = ie.compile_model(model=detection_model, device_name="CPU")

detection_input_layer = detection_compiled_model.input(0)

Load an Image

# The `image_file` variable can point to a URL or a local image.
image_file = "https://github.com/openvinotoolkit/openvino_notebooks/raw/main/notebooks/004-hello-detection/data/intel_rnb.jpg"

image = load_image(image_file)

# N,C,H,W = batch size, number of channels, height, width.
N, C, H, W = detection_input_layer.shape

# Resize the image to meet network expected input sizes.
resized_image = cv2.resize(image, (W, H))

# Reshape to the network input shape.
input_image = np.expand_dims(resized_image.transpose(2, 0, 1), 0)

plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB));
../_images/208-optical-character-recognition-with-output_15_0.png

Do Inference

Text boxes are detected in the image and returned as blobs of data with shape [100, 5]. Each detection is described in the [x_min, y_min, x_max, y_max, conf] format.

output_key = detection_compiled_model.output("boxes")
boxes = detection_compiled_model([input_image])[output_key]

# Remove zero-only boxes.
boxes = boxes[~np.all(boxes == 0, axis=1)]
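
As a quick sanity check (illustrative; the number of boxes depends on the input image), each remaining row follows the format described above:

# Each row is [x_min, y_min, x_max, y_max, conf], with coordinates relative to
# the resized network input (H and W from `detection_input_layer.shape`).
print(boxes.shape)  # (number_of_detected_boxes, 5)
print(int((boxes[:, -1] > 0.3).sum()), "boxes above a 0.3 confidence threshold")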

Get Detection Results

def multiply_by_ratio(ratio_x, ratio_y, box):
    # Scale box coordinates from the resized network input back to the original
    # image. Even indices are x coordinates, odd indices are y coordinates; the
    # confidence value is dropped with box[:-1].
    return [
        max(shape * ratio_y, 10) if idx % 2 else shape * ratio_x
        for idx, shape in enumerate(box[:-1])
    ]


def run_preprocesing_on_crop(crop, net_shape):
    # Resize a crop to the recognition network input size and add batch and
    # channel dimensions: (H, W) -> (1, 1, H, W).
    temp_img = cv2.resize(crop, net_shape)
    temp_img = temp_img.reshape((1,) * 2 + temp_img.shape)
    return temp_img
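# Illustration (not part of the original cell): run_preprocesing_on_crop turns a
# grayscale crop of any size into a tensor for the recognition model, e.g.
# run_preprocesing_on_crop(np.zeros((50, 120), np.uint8), (100, 32)) returns an
# array with shape (1, 1, 32, 100).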


def convert_result_to_image(bgr_image, resized_image, boxes, threshold=0.3, conf_labels=True):
    # Define colors for boxes and descriptions.
    colors = {"red": (255, 0, 0), "green": (0, 255, 0), "white": (255, 255, 255)}

    # Fetch image shapes to calculate a ratio.
    (real_y, real_x), (resized_y, resized_x) = bgr_image.shape[:2], resized_image.shape[:2]
    ratio_x, ratio_y = real_x / resized_x, real_y / resized_y

    # Convert the base image from BGR to RGB format.
    rgb_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)

    # Iterate through non-zero boxes.
    for box, annotation in boxes:
        # Pick the confidence factor from the last element of the box array.
        conf = box[-1]
        if conf > threshold:
            # Convert float to int and multiply position of each box by x and y ratio.
            (x_min, y_min, x_max, y_max) = map(int, multiply_by_ratio(ratio_x, ratio_y, box))

            # Draw a box based on the position. Parameters in the `rectangle` function are: image, start_point, end_point, color, thickness.
            cv2.rectangle(rgb_image, (x_min, y_min), (x_max, y_max), colors["green"], 3)

            # Add text to the image based on the position and confidence. Parameters in the `putText` function are: image, text, bottom-left corner of the text, font, font_scale, color, thickness, line_type.
            if conf_labels:
                # Create a background box based on annotation length.
                (text_w, text_h), _ = cv2.getTextSize(
                    f"{annotation}", cv2.FONT_HERSHEY_TRIPLEX, 0.8, 1
                )
                image_copy = rgb_image.copy()
                cv2.rectangle(
                    image_copy,
                    (x_min, y_min - text_h - 10),
                    (x_min + text_w, y_min - 10),
                    colors["white"],
                    -1,
                )
                # Add weighted image copy with white boxes under a text.
                cv2.addWeighted(image_copy, 0.4, rgb_image, 0.6, 0, rgb_image)
                cv2.putText(
                    rgb_image,
                    f"{annotation}",
                    (x_min, y_min - 10),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.8,
                    colors["red"],
                    1,
                    cv2.LINE_AA,
                )

    return rgb_image

Text Recognition

Load the text recognition model and run inference on the text boxes found by the detection model.

Load Text Recognition Model

recognition_model = ie.read_model(
    model=recognition_model_path, weights=recognition_model_path.with_suffix(".bin")
)

recognition_compiled_model = ie.compile_model(model=recognition_model, device_name="CPU")

recognition_output_layer = recognition_compiled_model.output(0)
recognition_input_layer = recognition_compiled_model.input(0)

# Get the height and width of the input layer. The recognition model expects
# grayscale crops with shape [1, 1, 32, 100] (NCHW).
_, _, H, W = recognition_input_layer.shape

Do Inference

# Calculate scale for image resizing.
(real_y, real_x), (resized_y, resized_x) = image.shape[:2], resized_image.shape[:2]
ratio_x, ratio_y = real_x / resized_x, real_y / resized_y

# Convert the image to grayscale for the text recognition model.
grayscale_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Get a charlist to decode the output, based on the model documentation.
letters = "~0123456789abcdefghijklmnopqrstuvwxyz"

# Prepare an empty list for annotations.
annotations = list()
cropped_images = list()
# fig, ax = plt.subplots(len(boxes), 1, figsize=(5,15), sharex=True, sharey=True)
# Get annotations for each crop, based on boxes given by the detection model.
for i, crop in enumerate(boxes):
    # Get the coordinates of the crop's corners.
    (x_min, y_min, x_max, y_max) = map(int, multiply_by_ratio(ratio_x, ratio_y, crop))
    image_crop = run_preprocesing_on_crop(grayscale_image[y_min:y_max, x_min:x_max], (W, H))

    # Run inference with the recognition model.
    result = recognition_compiled_model([image_crop])[recognition_output_layer]

    # Squeeze the output to remove the unnecessary batch dimension.
    recognition_results_test = np.squeeze(result)

    # Read an annotation based on probabilities from the output layer.
    annotation = list()
    for letter in recognition_results_test:
        parsed_letter = letters[letter.argmax()]

        # An index of 0 from `argmax` signals the end of the string.
        if parsed_letter == letters[0]:
            break
        annotation.append(parsed_letter)
    annotations.append("".join(annotation))
    cropped_image = Image.fromarray(image[y_min:y_max, x_min:x_max])
    cropped_images.append(cropped_image)

boxes_with_annotations = list(zip(boxes, annotations))
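
To make the decoding loop above concrete, here is the same greedy argmax decoding applied to a toy probability matrix (a sketch with made-up values; the real model emits one row of class probabilities per character position):

import numpy as np

letters = "~0123456789abcdefghijklmnopqrstuvwxyz"

# Toy output with four time steps over the 37-symbol alphabet (illustrative).
toy_output = np.zeros((4, len(letters)))
toy_output[0, letters.index("h")] = 1.0
toy_output[1, letters.index("i")] = 1.0
toy_output[2, 0] = 1.0                   # index 0 ('~') marks the end of the string
toy_output[3, letters.index("x")] = 1.0  # never reached: decoding stops at '~'

decoded = ""
for step in toy_output:
    symbol = letters[int(step.argmax())]
    if symbol == letters[0]:
        break
    decoded += symbol
print(decoded)  # hi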

Show Results

Show Detected Text Boxes and OCR Results for the Image

Visualize the result by drawing boxes around recognized text and showing the OCR result from the text recognition model.

plt.figure(figsize=(12, 12))
plt.imshow(convert_result_to_image(image, resized_image, boxes_with_annotations, conf_labels=True));
../_images/208-optical-character-recognition-with-output_25_0.png

Show the OCR Result per Bounding Box

Depending on the image, the OCR result may not be readable in the image with boxes, as displayed in the cell above. Use the code below to display the extracted boxes and the OCR result per box.

for cropped_image, annotation in zip(cropped_images, annotations):
    display(cropped_image, Markdown("".join(annotation)))
../_images/208-optical-character-recognition-with-output_27_0.png

building

../_images/208-optical-character-recognition-with-output_27_2.png

noyce

../_images/208-optical-character-recognition-with-output_27_4.png

2200

../_images/208-optical-character-recognition-with-output_27_6.png

n

../_images/208-optical-character-recognition-with-output_27_8.png

center

../_images/208-optical-character-recognition-with-output_27_10.png

robert