This demo shows how to run Text Spotting models. Text Spotting models allow us to simultaneously detect and recognize text.
NOTE: Only batch size of 1 is supported.
The demo application expects a text spotting model that is split into three parts. Every model part must be in the Intermediate Representation (IR) format.
The first model is a Mask-RCNN-like text detector with the following constraints:

* Inputs: `im_data` for the input image and `im_info` for meta-information about the image (actual height, width, and scale).
* Outputs:
  * `boxes` with absolute bounding box coordinates of the input image
  * `scores` with confidence scores for all bounding boxes
  * `classes` with object class IDs for all bounding boxes
  * `raw_masks` with fixed-size segmentation heat maps for all classes of all bounding boxes
  * `text_features` with text features, which are fed further to the Text Recognition Head

The second model is a Text Recognition Encoder that takes `text_features` as input and produces `encoded text`.
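All three parts are read from their IR files before inference, and the detector's required input and output names listed above can be verified at load time. A minimal sketch, assuming the OpenVINO 2.0 Python API; the file names are placeholders, not the actual model names:

```python
# A minimal sketch, assuming the OpenVINO 2.0 Python API and that a matching
# *.bin file sits next to each placeholder *.xml file.
from openvino.runtime import Core

core = Core()
detector = core.read_model("detector.xml")      # Mask-RCNN-like text detector
encoder = core.read_model("text_encoder.xml")   # Text Recognition Encoder
decoder = core.read_model("text_decoder.xml")   # Text Recognition Decoder

# Verify that the detector exposes the inputs/outputs the demo relies on.
input_names = {name for port in detector.inputs for name in port.get_names()}
output_names = {name for port in detector.outputs for name in port.get_names()}
assert {"im_data", "im_info"} <= input_names
assert {"boxes", "scores", "classes", "raw_masks", "text_features"} <= output_names
```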
The third model is a Text Recognition Decoder that takes the `encoded text` from the Text Recognition Encoder, the `previous symbol`, and the `hidden state`. On the first step, a special `Start Of Sequence (SOS)` symbol and a zero `hidden state` are fed to the Text Recognition Decoder. The decoder produces a `symbols distribution` and the current `hidden state` on each step until the `End Of Sequence (EOS)` symbol is generated.
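The greedy decoding loop just described can be sketched as follows. Note that `run_decoder`, the SOS/EOS indices, the hidden-state shape, and the sequence-length limit are illustrative assumptions, not part of the demo's actual API:

```python
# A minimal sketch of the greedy decoding loop described above.
# run_decoder, SOS_INDEX, EOS_INDEX, HIDDEN_SHAPE and MAX_SEQ_LEN are
# hypothetical placeholders; the real demo wires this through OpenVINO.
import numpy as np

SOS_INDEX, EOS_INDEX = 0, 1      # assumed alphabet positions
HIDDEN_SHAPE = (1, 1, 256)       # assumed hidden-state shape
MAX_SEQ_LEN = 28                 # guard against runaway decoding

def decode_text(encoded_text, run_decoder, alphabet):
    """Feed SOS and a zero hidden state first, then keep feeding back the
    previous symbol and hidden state until EOS is produced."""
    prev_symbol = SOS_INDEX
    hidden = np.zeros(HIDDEN_SHAPE, dtype=np.float32)
    text = ""
    for _ in range(MAX_SEQ_LEN):
        # One decoder step: a distribution over symbols and a new state.
        symbols_distribution, hidden = run_decoder(encoded_text, prev_symbol, hidden)
        prev_symbol = int(np.argmax(symbols_distribution))
        if prev_symbol == EOS_INDEX:
            break
        text += alphabet[prev_symbol]
    return text
```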
Examples of valid inputs to specify with the command-line argument `-i` are a path to a video file or a numeric ID of a web camera.
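Both kinds of input can be handled with a single OpenCV capture object; a minimal sketch (argument parsing omitted):

```python
# A minimal sketch: treat a purely numeric -i value as a camera ID,
# anything else as a path to a video file.
import cv2

def open_input(input_arg: str) -> cv2.VideoCapture:
    source = int(input_arg) if input_arg.isdigit() else input_arg
    cap = cv2.VideoCapture(source)
    if not cap.isOpened():
        raise RuntimeError(f"Cannot open input: {input_arg}")
    return cap
```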
The demo workflow is the following:

1. The demo application reads frames from the provided input and resizes them to fit the input image blob of the network (`im_data`).
2. The `im_info` input blob passes the resulting resolution and scale of the pre-processed image to the network to perform inference of the Mask-RCNN-like text detector (see the pre-processing sketch after the note below).
3. The demo visualizes the detected and recognized text instances. If you specify the `--show_boxes` and `--show_scores` arguments, bounding boxes and confidence scores are also shown. To disable tracking of text instances across frames, specify the `--no_track` argument.

NOTE: By default, Open Model Zoo demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the demo application or reconvert your model using the Model Optimizer tool with the `--reverse_input_channels` argument specified. For more information about the argument, refer to the When to Reverse Input Channels section of Converting a Model Using General Conversion Parameters.
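The pre-processing in steps 1 and 2 can be sketched as follows; the target resolution here is a placeholder value, not the demo's actual configuration:

```python
# A minimal pre-processing sketch, assuming a fixed detector input size.
import cv2
import numpy as np

TARGET_H, TARGET_W = 768, 1280   # placeholder network input resolution

def preprocess(frame):
    """Resize a BGR frame to fit the network input and build im_data/im_info."""
    scale = min(TARGET_H / frame.shape[0], TARGET_W / frame.shape[1])
    new_h = min(TARGET_H, int(frame.shape[0] * scale))
    new_w = min(TARGET_W, int(frame.shape[1] * scale))
    resized = cv2.resize(frame, (new_w, new_h))
    # Pad to the fixed input size; the network learns the actual content
    # size through im_info.
    im_data = np.zeros((TARGET_H, TARGET_W, 3), dtype=np.float32)
    im_data[:new_h, :new_w] = resized
    im_data = im_data.transpose(2, 0, 1)[np.newaxis]   # NCHW, batch of 1
    # im_info: actual height, width, and scale of the pre-processed image.
    im_info = np.array([[new_h, new_w, scale]], dtype=np.float32)
    return im_data, im_info
```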
Run the application with the `-h` option to see the following usage message:
Running the application with an empty list of options yields the short version of the usage message and an error message.
To run the demo, you can use public or pre-trained models. To download the pre-trained models, use the OpenVINO Model Downloader or go to https://download.01.org/opencv/.
NOTE: Before running the demo with a trained model, make sure the model is converted to the Inference Engine format (`*.xml` + `*.bin`) using the Model Optimizer tool.
To run the demo, provide paths to all three models in the IR format and to an input with images:
The application uses OpenCV to display resulting text instances and current inference performance.
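A minimal sketch of such a display step; the rendering of the text instances themselves is elided, and `fps` is assumed to be computed elsewhere:

```python
# A minimal sketch of the OpenCV display step; drawing of the detected
# text instances is elided.
import cv2

def show_frame(frame, fps):
    cv2.putText(frame, f"Inference: {fps:.1f} FPS", (10, 25),
                cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)
    cv2.imshow("Text Spotting Demo", frame)
    key = cv2.waitKey(1) & 0xFF
    return key != 27   # continue until Esc is pressed
```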