text-recognition-0014#
Use Case and High-Level Description#
This is a network for text recognition scenario. It consists of ResNext101-like backbone (stage-1-2) and bidirectional LSTM encoder-decoder. The network is able to recognize case-insensitive alphanumeric text (36 unique symbols).
Example#
-> openvino
Specification#
Metric |
Value |
---|---|
Accuracy on the alphanumeric subset of ICDAR13 |
0.8887 |
Accuracy on the alphanumeric subset of ICDAR03 |
0.9077 |
Accuracy on the alphanumeric subset of ICDAR15 |
0.6908 |
Accuracy on the alphanumeric subset of SVT |
0.83 |
Accuracy on the alphanumeric subset of IIIT5K |
0.8157 |
Text location requirements |
Tight aligned crop |
GFlops |
0.2726 |
MParams |
1.4187 |
Source framework |
PyTorch* |
Inputs#
Image, name: imgs
, shape: 1, 1, 32, 128
in the format B, C, H, W
, where:
B
- batch sizeC
- number of channelsH
- image heightW
- image width
Note that the source image should be tight aligned crop with detected text converted to grayscale.
Outputs#
The net output is a blob with name logits
and the shape 16, 1, 37
in the format W, B, L
, where:
W
- output sequence lengthB
- batch sizeL
- confidence distribution across alphanumeric symbols:#0123456789abcdefghijklmnopqrstuvwxyz
, where # - special blank character for CTC decoding algorithm.
The network output can be decoded by CTC Greedy Decoder or CTC Beam Search decoder.
Use text-detection demo#
Model is supported by text-detection c++ demo. In order to use this model in the demo, user should pass the following options:
-m_tr_ss "0123456789abcdefghijklmnopqrstuvwxyz", note special symbol `#` should not be used.
-tr_pt_first
-tr_o_blb_nm "logits"
For more information, please, see documentation of the demo.
Demo usage#
The model can be used in the following demos provided by the Open Model Zoo to show its capabilities:
Legal Information#
[*] Other names and brands may be claimed as the property of others.