gpt-2¶
Use Case and High-Level Description¶
The gpt-2 model is one of the Generative Pre-trained Transformer (GPT) family of models, pre-trained on a very large corpus of English data in a self-supervised fashion. The GPT architecture is a deep neural network, specifically a transformer, which uses attention in place of earlier recurrence- and convolution-based architectures. Attention mechanisms allow the model to focus selectively on the segments of the input text that it predicts to be most relevant. GPT-2 is trained with a simple objective: predict the next word, given all of the previous words in a text.
More details are provided in the paper, repository and model card.
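As a rough illustration of the attention idea described above, the following minimal sketch computes scaled dot-product attention with a causal mask, so that each position only looks at itself and earlier positions. It is a toy in plain Python, not the GPT-2 implementation: the function name `causal_attention` and the list-of-lists vectors are assumptions made for this example, and real models use multiple heads and learned projections.

```python
import math


def causal_attention(queries, keys, values):
    """Toy single-head attention: position i attends only to positions 0..i."""
    dim = len(queries[0])
    outputs = []
    for i, query in enumerate(queries):
        # Scaled dot-product scores against the current and earlier positions only.
        scores = [
            sum(q * k for q, k in zip(query, keys[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        # Numerically stable softmax turns the scores into attention weights.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # The output is the attention-weighted sum of the visible value vectors.
        outputs.append([
            sum(w * values[j][d] for j, w in enumerate(weights))
            for d in range(len(values[0]))
        ])
    return outputs
```

Because later positions are never visible to earlier ones, the output at each position depends only on the preceding text, which is what makes next-word prediction a meaningful training objective.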
Specification¶
Metric | Value |
---|---|
Type | Text Prediction |
GFlops | 293.0489 |
MParams | 175.6203 |
Source framework | PyTorch* |
GFlops are calculated for the 1, 1024 input shape, which is suitable for long contexts.
Accuracy¶
Perplexity obtained on the WikiText-2 raw character-level dataset for the converted model.
Metric | Value |
---|---|
Perplexity | 29.00% |
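For readers unfamiliar with the metric, perplexity is commonly defined as the exponential of the average negative log-likelihood that the model assigns to the true tokens. The sketch below shows that general definition only; the exact evaluation procedure behind the figure above is not described here, and the function name `perplexity` is an assumption made for this example.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("at least one token log-probability is required")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)
```

Under this definition, a model that assigns probability 0.25 to every correct token has a perplexity of 4, which can be read as being as uncertain as a uniform choice among four options.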
Input¶
Original model¶
Token ids, name: input, dynamic shape in the format B, L, where:
- B - batch size
- L - sequence length
Converted model¶
Token ids, name: input, dynamic shape in the format B, L, where:
- B - batch size
- L - sequence length
Output¶
Original model¶
Prediction scores of language modeling head, name: output, dynamic shape B, L, 50257 in the format B, L, S, where:
- B - batch size
- L - sequence length
- S - vocab size
Converted model¶
Prediction scores of language modeling head, name: output, dynamic shape B, L, 50257 in the format B, L, S, where:
- B - batch size
- L - sequence length
- S - vocab size
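To show how an output in the B, L, S layout is typically consumed, here is a minimal sketch that takes the scores at the last sequence position of each batch item and picks the highest-scoring token id (greedy decoding). It uses nested Python lists and a tiny vocabulary in place of a real tensor with S = 50257; the helper names `softmax` and `greedy_next_tokens` are assumptions made for this example, and practical decoders often sample instead of taking the argmax.

```python
import math


def softmax(scores):
    """Convert raw prediction scores for one position into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_next_tokens(logits):
    """Given scores shaped [B][L][S], return the most likely next token id per batch item."""
    next_ids = []
    for sequence_scores in logits:
        # Only the last position predicts the token that follows the input.
        last_position = sequence_scores[-1]
        next_ids.append(max(range(len(last_position)), key=last_position.__getitem__))
    return next_ids
```

Appending the chosen id to the input and running the model again yields the following token, which is how text generation proceeds one step at a time.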
Download a Model and Convert it into OpenVINO™ IR Format¶
You can download models and, if necessary, convert them into OpenVINO™ IR format using the Model Downloader and other automation tools, as shown in the examples below.
An example of using the Model Downloader:
omz_downloader --name <model_name>
An example of using the Model Converter:
omz_converter --name <model_name>
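If the two tools need to be driven from a script, one possible approach is to build the commands shown above and run them only when the tool is installed. The sketch below assumes the model name is gpt-2, as in the title of this page; the helpers `omz_command` and `run_if_available` are hypothetical names introduced for this example and are not part of the Open Model Zoo tooling.

```python
import shutil
import subprocess


def omz_command(tool, model_name):
    """Build the argument list for an Open Model Zoo tool invocation."""
    return [tool, "--name", model_name]


def run_if_available(command):
    """Run the command if its executable is on PATH; otherwise return None."""
    if shutil.which(command[0]) is None:
        return None
    # check=True raises CalledProcessError when the tool exits with an error.
    return subprocess.run(command, check=True)


# Example usage (downloads and converts the model when the tools are installed):
# run_if_available(omz_command("omz_downloader", "gpt-2"))
# run_if_available(omz_command("omz_converter", "gpt-2"))
```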
Demo usage¶
The model can be used in the following demos provided by the Open Model Zoo to show its capabilities:
Legal Information¶
The original model is distributed under the MIT License.