# machine-translation-nar-ru-en-0002

## Use Case and High-Level Description

This is a Russian-to-English machine translation model based on a non-autoregressive Transformer topology. The model is trained on an internal dataset.

Tokenization is performed with the SentencePieceBPETokenizer (see the demo code for implementation details) and the enclosed `tokenizer_src` and `tokenizer_tgt` folders.

## Specification

| Metric           | Value    |
|------------------|----------|
| GOps             | 23.17    |
| MParams          | 69.29    |
| Source framework | PyTorch* |

## Accuracy

The quality metrics were calculated on the wmt19-ru-en dataset ("test" split in lower case).

| Metric | Value  |
|--------|--------|
| BLEU   | 23.1 % |

Use `accuracy_check [...] --model_attributes <path_to_folder_with_downloaded_model>` to specify the path to additional model attributes, where `<path_to_folder_with_downloaded_model>` is the path to the folder into which the current model was downloaded by the Model Downloader tool.

## Input

- name: `tokens`
- shape: `1, 192`
- description: sequence of tokens (integer values) representing the tokenized sentence. The sequence structure is as follows (`<s>`, `</s>` and `<pad>` should be replaced by the corresponding token IDs as specified by the dictionary): `<s>` + tokenized sentence + `</s>` + (`<pad>` tokens to pad to the maximum sequence length of 192)
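The input layout above can be sketched in plain Python. Note that `BOS_ID`, `EOS_ID`, and `PAD_ID` below are placeholder values for illustration only — the real IDs of `<s>`, `</s>`, and `<pad>` are defined by the dictionary shipped in the `tokenizer_src` folder.

```python
# Placeholder special-token IDs -- the actual values come from the
# model's dictionary (tokenizer_src), not from this sketch.
BOS_ID, EOS_ID, PAD_ID = 0, 2, 1
MAX_LEN = 192  # fixed sequence length expected by the model

def build_input(token_ids):
    """Wrap a tokenized sentence as <s> ... </s> and pad to MAX_LEN."""
    seq = [BOS_ID] + list(token_ids) + [EOS_ID]
    if len(seq) > MAX_LEN:
        raise ValueError("sentence does not fit into the 192-token input")
    return seq + [PAD_ID] * (MAX_LEN - len(seq))

# Example with arbitrary subword IDs; add a batch dimension to get shape (1, 192).
tokens = build_input([101, 57, 3042])
```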

## Output

- name: `pred`
- shape: `1, 192`
- description: sequence of tokens (integer values) representing the tokenized translation. The sequence structure is as follows (`<s>`, `</s>` and `<pad>` should be replaced by the corresponding token IDs as specified by the dictionary): `<s>` + tokenized translation + `</s>` + (`<pad>` tokens to pad to the maximum sequence length of 192)
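Recovering the translation from the output row is the inverse of the input packing: drop padding, then strip the sentinel tokens. As before, the special-token IDs here are placeholders — the real values are defined by the dictionary in the `tokenizer_tgt` folder.

```python
# Placeholder special-token IDs; the actual values are taken from the
# target dictionary (tokenizer_tgt).
BOS_ID, EOS_ID, PAD_ID = 0, 2, 1

def extract_translation(pred_row):
    """Strip <pad>, <s> and </s> from one row of the model output."""
    ids = [t for t in pred_row if t != PAD_ID]   # drop padding
    if ids and ids[0] == BOS_ID:                 # drop leading <s>
        ids = ids[1:]
    if ids and ids[-1] == EOS_ID:                # drop trailing </s>
        ids = ids[:-1]
    return ids                                   # feed these IDs to the detokenizer
```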

## Demo usage

The model can be used in the following demos provided by the Open Model Zoo to show its capabilities: