formula-recognition-medium-scan-0001 (composite)#

Use Case and High-Level Description#

This is an im2latex composite model that recognizes LaTeX formulas. The model uses the vocabulary file vocab.json to predict a sequence of LaTeX tokens. It is built on the ResNeXt-50 backbone with an additional attention-based text recognition head.

The vocabulary file is downloaded to the corresponding model directory, <models_dir>/models/intel/formula-recognition-medium-scan-0001/formula-recognition-medium-scan-0001-im2latex-decoder/vocab.json. The model can predict uppercase and lowercase letters, numbers, some Greek letters, trigonometric functions (e.g. cos, sin, coth), the logarithmic function, square roots (sqrt), and superscripts.
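To illustrate how predicted token indices become a formula string, here is a minimal sketch. It assumes a vocabulary that maps integer ids to LaTeX tokens; the actual layout of vocab.json is not described on this page, so the toy vocabulary, the `load_id_to_token` and `ids_to_latex` helpers, and the end-of-sequence id are all hypothetical.

```python
import json


def load_id_to_token(vocab_json_text):
    """Parse a JSON object mapping token ids (as strings) to LaTeX tokens.

    Assumption: the real vocab.json may be organized differently.
    """
    raw = json.loads(vocab_json_text)
    return {int(idx): token for idx, token in raw.items()}


def ids_to_latex(token_ids, id_to_token, end_id=None):
    """Join predicted token ids into a space-separated LaTeX string.

    Decoding stops at the first occurrence of end_id, if one is given.
    """
    tokens = []
    for idx in token_ids:
        if end_id is not None and idx == end_id:
            break
        tokens.append(id_to_token[idx])
    return " ".join(tokens)


# Hypothetical toy vocabulary, for illustration only.
toy_vocab_text = json.dumps(
    {"0": "<end>", "1": "W", "2": "^", "3": "{", "4": "1", "5": "}"}
)
id_to_token = load_id_to_token(toy_vocab_text)

# Everything after the end token (id 0 here) is ignored.
formula = ids_to_latex([1, 2, 3, 4, 5, 0, 1], id_to_token, end_id=0)
print(formula)
```

The space-separated output mirrors the token-by-token form of the example output shown on this page.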

Example of the input data#

Example of the output#

4 7 4 W ^ { 1 } + 7 . 1 9 o ^ { 4 } - 6 - 0 . 9 6 L ^ { 1 } y

Composite model specification#

| Metric | Value |
|---|---|
| im2latex_medium_photographed dataset, im2latex-match-images metric | 81.5% |
| im2latex_medium_rendered dataset, im2latex-match-images metric | 95.7% |
| Source framework | PyTorch* |

The im2latex-match-images metric is calculated by <omz_dir>/tools/accuracy_checker/accuracy_checker/metrics/im2latex_images_match.py.

Encoder model specification#

The formula-recognition-medium-scan-0001-encoder model is a ResNeXt-50-like backbone with some initialization layers for the decoder.

| Metric | Value |
|---|---|
| GFlops | 16.56 |
| MParams | 1.69 |

Inputs#

Image, name: imgs, shape: 1, 3, 160, 1400 in the 1, C, H, W format, where:

  • C - number of channels

  • H - image height

  • W - image width

The expected channel order is BGR.
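The fixed 1, 3, 160, 1400 input means an arbitrary image has to be brought to that size and layout first. The sketch below shows one possible approach, an aspect-preserving resize followed by padding on the bottom and right; this page does not state how the demos actually preprocess images, so the strategy and the helper names `fit_to_input` and `hwc_to_nchw` are assumptions. Nested lists stand in for image arrays to keep the example dependency-free.

```python
def fit_to_input(src_h, src_w, dst_h=160, dst_w=1400):
    """Return (new_h, new_w, pad_bottom, pad_right) for an aspect-preserving
    resize into a dst_h x dst_w canvas.

    Assumption: padding is added on the bottom and right only.
    """
    scale = min(dst_h / src_h, dst_w / src_w)
    new_h = max(1, min(dst_h, round(src_h * scale)))
    new_w = max(1, min(dst_w, round(src_w * scale)))
    return new_h, new_w, dst_h - new_h, dst_w - new_w


def hwc_to_nchw(image):
    """Convert an H x W x C nested list into the 1, C, H, W layout."""
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    return [[
        [[image[y][x][ch] for x in range(width)] for y in range(height)]
        for ch in range(channels)
    ]]


# A 320 x 1400 scan is limited by its height, so it is halved and padded.
geometry = fit_to_input(320, 1400)
print(geometry)

# Toy 1 x 2 image with three channels in B, G, R order.
toy_bgr = [[[1, 2, 3], [4, 5, 6]]]
batched = hwc_to_nchw(toy_bgr)
print(batched)
```

In practice the resize and padding would be done with an image library, and the channel order must be BGR before the layout conversion.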

Outputs#

  1. Name: hidden, shape: 1, 512. Initial context state of the LSTM cell.

  2. Name: context, shape: 1, 512. Initial hidden state of the LSTM cell.

  3. Name: init_0, shape: 1, 256. Initial state of the decoder.

  4. Name: row_enc_out, shape: 1, 20, 175, 512. Features from the encoder that are fed to the decoder.

Decoder model specification#

The formula-recognition-medium-scan-0001-decoder model is an LSTM-based decoder with an attention module.

| Metric | Value |
|---|---|
| GFlops | 1.86 |
| MParams | 2.56 |

Inputs#

  1. Name: dec_st_c, shape: 1, 512. Current context state of the LSTM cell.

  2. Name: dec_st_h, shape: 1, 512. Current hidden state of the LSTM cell.

  3. Name: output_prev, shape: 1, 256. Current state of the decoder.

  4. Name: row_enc_out, shape: 1, 20, 175, 512. Encoded features.

  5. Name: tgt, shape: 1, 1. Index of the previous symbol.

Outputs#

  1. Name: dec_st_c, shape: 1, 512. Current context state of the LSTM cell.

  2. Name: dec_st_h, shape: 1, 512. Current hidden state of the LSTM cell.

  3. Name: output, shape: 1, 256. Current state of the decoder.

  4. Name: logit, shape: 1, Vocab_Size. Classification confidence scores in the [0, 1] range for every token.
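The encoder and decoder interfaces above suggest a step-by-step decoding loop: the encoder runs once, then the decoder is called repeatedly, feeding its state outputs back in as the next inputs. The following is a minimal greedy-decoding sketch under several assumptions: the encoder outputs hidden and context are taken to initialize dec_st_h and dec_st_c respectively, `run_decoder` is a hypothetical callable standing in for real inference, and the start and end token ids are placeholders rather than the model's actual values.

```python
def greedy_decode(encoder_outputs, run_decoder, start_id, end_id, max_steps=256):
    """Greedily decode token ids, one decoder call per token.

    encoder_outputs: dict with keys hidden, context, init_0, row_enc_out.
    run_decoder: callable taking a dict of decoder inputs and returning a
                 dict of decoder outputs (hypothetical inference wrapper).
    """
    # Assumed wiring between encoder outputs and decoder inputs.
    dec_st_h = encoder_outputs["hidden"]
    dec_st_c = encoder_outputs["context"]
    output_prev = encoder_outputs["init_0"]
    row_enc_out = encoder_outputs["row_enc_out"]

    tgt = [[start_id]]  # shape 1, 1: index of the previous symbol
    token_ids = []
    for _ in range(max_steps):
        result = run_decoder({
            "dec_st_c": dec_st_c,
            "dec_st_h": dec_st_h,
            "output_prev": output_prev,
            "row_enc_out": row_enc_out,
            "tgt": tgt,
        })
        # Feed the updated states back in on the next step.
        dec_st_c = result["dec_st_c"]
        dec_st_h = result["dec_st_h"]
        output_prev = result["output"]

        scores = result["logit"][0]
        best = max(range(len(scores)), key=scores.__getitem__)
        if best == end_id:
            break
        token_ids.append(best)
        tgt = [[best]]
    return token_ids


def fake_decoder(inputs):
    """Toy stand-in with a 4-token vocabulary: predicts previous id + 1."""
    nxt = min(inputs["tgt"][0][0] + 1, 3)
    return {
        "dec_st_c": inputs["dec_st_c"],
        "dec_st_h": inputs["dec_st_h"],
        "output": inputs["output_prev"],
        "logit": [[1.0 if i == nxt else 0.0 for i in range(4)]],
    }


toy_encoder_outputs = {"hidden": [], "context": [], "init_0": [], "row_enc_out": []}
decoded = greedy_decode(toy_encoder_outputs, fake_decoder, start_id=0, end_id=3)
print(decoded)
```

The max_steps cap guards against sequences that never emit the end token. With a real runtime the tensors would be arrays rather than nested lists, and the argmax over logit would typically use the array library's own routine.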

Demo usage#

The model can be used in the following demos provided by the Open Model Zoo to show its capabilities: