machine-translation-nar-en-ru-0002¶
Use Case and High-Level Description¶
This is an English-Russian machine translation model based on a non-autoregressive Transformer topology. The model is trained on an internal dataset.
Tokenization uses the SentencePieceBPETokenizer (see the demo code for implementation details). The tokenizer files are provided in the tokenizer_src and tokenizer_tgt folders.
Specification¶
Metric | Value |
---|---|
GOps | 23.17 |
MParams | 69.29 |
Source framework | PyTorch* |
Accuracy¶
The quality metrics were calculated on the wmt19-ru-en dataset (“test” split in lower case).
Metric | Value |
---|---|
BLEU | 22.7 % |
Use accuracy_check [...] --model_attributes <path_to_folder_with_downloaded_model> to specify the path to additional model attributes. Here, path_to_folder_with_downloaded_model is the path to the folder where the Model Downloader tool saved the current model.
Input¶
name: tokens
shape: 1, 192
description: sequence of tokens (integer values) representing the tokenized sentence.
The sequence structure is as follows (<s>, </s> and <pad> should be replaced by corresponding token IDs as specified by the dictionary):
<s> + tokenized sentence + </s> + (<pad> tokens to pad to the maximum sequence length of 192)
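The input layout described above can be sketched in plain Python. This is a minimal illustration, not the demo's implementation: the function name build_input and the token IDs used in the example (0, 2 and 1 for <s>, </s> and <pad>) are assumptions, and the real IDs must be taken from the tokenizer dictionary.

```python
from typing import List

MAX_LEN = 192  # maximum sequence length, taken from the input shape 1, 192


def build_input(token_ids: List[int], bos_id: int, eos_id: int,
                pad_id: int, max_len: int = MAX_LEN) -> List[List[int]]:
    """Build <s> + tokens + </s> and pad with <pad> up to max_len."""
    seq = [bos_id] + list(token_ids) + [eos_id]
    if len(seq) > max_len:
        # Assumption: reject over-long input instead of silently truncating.
        raise ValueError(f"sequence of {len(seq)} tokens exceeds {max_len}")
    seq += [pad_id] * (max_len - len(seq))
    return [seq]  # leading batch dimension of 1, so the shape is (1, max_len)


# Hypothetical IDs for illustration only; look up the real ones in the dictionary.
example = build_input([17, 42, 99], bos_id=0, eos_id=2, pad_id=1)
```

Raising an error on over-long input is a design choice made for this sketch; an application may prefer to truncate or split the sentence.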
Output¶
name: pred
shape: 1, 192
description: sequence of tokens (integer values) representing the tokenized translation.
The sequence structure is as follows (<s>, </s> and <pad> should be replaced by corresponding token IDs as specified by the dictionary):
<s> + tokenized translation + </s> + (<pad> tokens to pad to the maximum sequence length of 192)
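Before detokenizing, the special tokens in the output have to be removed. The following is a rough sketch under stated assumptions: strip_special is a hypothetical helper, the token IDs (0, 2 and 1 for <s>, </s> and <pad>) are placeholders for the values in the target dictionary, and reading stops at the first </s>.

```python
from typing import List, Sequence


def strip_special(pred_row: Sequence[int], bos_id: int, eos_id: int,
                  pad_id: int) -> List[int]:
    """Return the translation token IDs without <s>, </s> and <pad>."""
    tokens = []
    for tok in pred_row:
        if tok == eos_id:
            break  # assumption: everything after the first </s> is padding
        if tok in (bos_id, pad_id):
            continue
        tokens.append(tok)
    return tokens


# Hypothetical output row of length 192, for illustration only.
pred = [[0, 5, 6, 7, 2] + [1] * 187]
translation_ids = strip_special(pred[0], bos_id=0, eos_id=2, pad_id=1)
```

The resulting IDs can then be passed to the target-side tokenizer to obtain the translated text.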
Demo usage¶
The model can be used in the following demos provided by the Open Model Zoo to show its capabilities:
Legal Information¶
[*] Other names and brands may be claimed as the property of others.