Most Efficient Large Language Models for AI PC

This page is updated regularly to help you identify the best-performing LLMs on the Intel® Core™ Ultra processor family and AI PCs. The data is current as of OpenVINO 2025.2 (30 June 2025).

The tables below list the key performance indicators for LLM inference on built-in GPUs.
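In these tables, "1st latency" is the time to produce the first output token after the prompt is submitted, and "2nd latency" is the average time per subsequent token, so the throughput column follows directly from it. A minimal Python sketch of that conversion (illustrative only; the function name is ours, not part of any benchmarking tool):

```python
def second_tokens_per_sec(second_latency_ms: float) -> float:
    """Convert steady-state (2nd token) latency in ms to tokens per second."""
    # Latencies in the tables are reported in milliseconds,
    # so tokens/s = 1000 / latency_ms.
    return 1000.0 / second_latency_ms

# Example row: opt-125m-gptq, INT4-MIXED, input size 32, 2nd latency 3.2 ms
print(round(second_tokens_per_sec(3.2), 2))  # -> 312.5
```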

Topology | Precision | Input Size (tokens) | 1st latency (ms) | 2nd latency (ms) | Max RSS memory (MB) | 2nd tokens per sec (1000 / 2nd latency) |
---|---|---|---|---|---|---|
opt-125m-gptq | INT4-MIXED | 32 | 13.2 | 3.2 | 968.7 | 312.50 |
opt-125m-gptq | INT4-MIXED | 1024 | 22.2 | 3.4 | 1084.8 | 294.12 |
red-pajama-incite-chat-3b-v1 | INT4-MIXED | 32 | 99.1 | 23.6 | 3231.4 | 42.37 |
red-pajama-incite-chat-3b-v1 | INT4-MIXED | 1024 | 436.7 | 27.9 | 3491.7 | 35.84 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 154.3 | 28.5 | 3295.2 | 35.09 |
phi-2 | INT4-MIXED | 32 | 156 | 30.6 | 3174.3 | 32.68 |
red-pajama-incite-chat-3b-v1 | INT8-CW | 32 | 66.8 | 33.3 | 4340.5 | 30.03 |
stable-zephyr-3b-dpo | INT8-CW | 32 | 58.5 | 34.1 | 4370.6 | 29.33 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 90.7 | 34.5 | 3631.2 | 28.99 |
stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 387.2 | 34.5 | 3677.8 | 28.99 |
stablelm-3b-4e1t | INT8-CW | 32 | 84.4 | 34.6 | 4364.6 | 28.90 |
phi-2 | INT4-MIXED | 1024 | 386.9 | 35.8 | 3430 | 27.93 |
phi-2 | INT8-CW | 32 | 95.4 | 35.9 | 4284.6 | 27.86 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 116.1 | 36.1 | 3425.3 | 27.70 |
red-pajama-incite-chat-3b-v1 | INT8-CW | 1024 | 376.2 | 36.5 | 4606.7 | 27.40 |
stable-zephyr-3b-dpo | INT8-CW | 1024 | 332.9 | 36.9 | 4645.4 | 27.10 |
stablelm-3b-4e1t | INT8-CW | 1024 | 345 | 37.8 | 4629.7 | 26.46 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 539.5 | 38 | 4073.2 | 26.32 |
phi-2 | INT8-CW | 1024 | 329.1 | 38.2 | 4525.7 | 26.18 |
stablelm-3b-4e1t | INT4-MIXED | 1024 | 330.2 | 39.5 | 3634.9 | 25.32 |
chatglm3-6b | INT4-MIXED | 32 | 71.7 | 44.3 | 5139.6 | 22.57 |
flan-t5-xxl | INT4-MIXED | 33 | 100.2 | 44.8 | 13574.6 | 22.32 |
phi-3-mini-4k-instruct | INT8-CW | 32 | 74.8 | 45.3 | 5378.8 | 22.08 |
chatglm3-6b | INT4-MIXED | 1024 | 720.4 | 46.1 | 4696.4 | 21.69 |
phi-3-mini-4k-instruct | INT8-CW | 1024 | 445.9 | 48.7 | 5674.8 | 20.53 |
flan-t5-xxl | INT4-MIXED | 1139 | 367.4 | 52.5 | 15343.1 | 19.05 |
gpt-j-6b | INT4-MIXED | 32 | 191.1 | 52.9 | 5170.3 | 18.90 |
gpt-j-6b | INT4-MIXED | 1024 | 774.2 | 56.6 | 6185.7 | 17.67 |
falcon-7b-instruct | INT4-MIXED | 32 | 110.7 | 57.5 | 5449.4 | 17.39 |
codegen25-7b | INT4-MIXED | 32 | 122.5 | 58 | 5387.4 | 17.24 |
falcon-7b-instruct | INT4-MIXED | 1024 | 789.3 | 60 | 5045.2 | 16.67 |
codegen25-7b | INT4-MIXED | 1024 | 878.6 | 62.4 | 5775.7 | 16.03 |
gemma-7b-it | INT4-MIXED | 32 | 132.3 | 68.8 | 6467.9 | 14.53 |
red-pajama-incite-chat-3b-v1 | FP16 | 32 | 76.1 | 69.5 | 6687.4 | 14.39 |
chatglm3-6b | INT8-CW | 32 | 97.2 | 69.8 | 7527.1 | 14.33 |
llama-2-7b-gptq | INT4-MIXED | 32 | 107.8 | 70.1 | 5019.9 | 14.27 |
phi-2 | FP16 | 32 | 110.2 | 71.4 | 6930 | 14.01 |
mistral-7b-v0.1 | INT4-MIXED | 32 | 119.6 | 71.4 | 5795.7 | 14.01 |
chatglm3-6b | INT8-CW | 1024 | 510.8 | 72 | 7336.2 | 13.89 |
gemma-7b-it | INT4-MIXED | 1024 | 920.7 | 72.2 | 6815 | 13.85 |
stable-zephyr-3b-dpo | FP16 | 32 | 93.5 | 73.3 | 6941.9 | 13.64 |
stablelm-3b-4e1t | FP16 | 32 | 127.2 | 73.5 | 6932.8 | 13.61 |
red-pajama-incite-chat-3b-v1 | FP16 | 1024 | 519.7 | 73.6 | 7502.4 | 13.59 |
flan-t5-xxl | INT8-CW | 33 | 325.1 | 73.7 | 23418.2 | 13.57 |
gpt-j-6b | INT8-CW | 32 | 132.8 | 74.1 | 7406.1 | 13.50 |
mistral-7b-v0.1 | INT4-MIXED | 1024 | 601.5 | 74.4 | 5570.8 | 13.44 |
phi-2 | FP16 | 1024 | 477.2 | 76.3 | 7507.7 | 13.11 |
chatglm3-6b-gptq | INT4-MIXED | 32 | 83.8 | 77 | 5596.8 | 12.99 |
qwen-7b-chat-gptq | INT4-MIXED | 32 | 192 | 77.2 | 6241.4 | 12.95 |
stable-zephyr-3b-dpo | FP16 | 1024 | 521.4 | 77.6 | 7520.5 | 12.89 |
llama-2-7b-gptq | INT4-MIXED | 1024 | 568.4 | 77.6 | 6556 | 12.89 |
gpt-j-6b | INT8-CW | 1024 | 627.9 | 78 | 8648.4 | 12.82 |
stablelm-3b-4e1t | FP16 | 1024 | 517.9 | 78.4 | 7519.9 | 12.76 |
chatglm3-6b-gptq | INT4-MIXED | 1024 | 544.5 | 78.9 | 5419.5 | 12.67 |
codegen25-7b | INT8-CW | 32 | 110.9 | 79.7 | 8283.6 | 12.55 |
qwen-7b-chat-gptq | INT4-MIXED | 1024 | 762.3 | 81.5 | 6633.9 | 12.27 |
mistral-7b-v0.1 | INT8-CW | 32 | 120.4 | 84 | 8586 | 11.90 |
codegen25-7b | INT8-CW | 1024 | 589.2 | 84 | 8754.4 | 11.90 |
flan-t5-xxl | INT8-CW | 1139 | 528.4 | 84.9 | 25288 | 11.78 |
falcon-7b-instruct | INT8-CW | 32 | 124 | 86.4 | 8149.2 | 11.57 |
mistral-7b-v0.1 | INT8-CW | 1024 | 603.5 | 86.8 | 8421.3 | 11.52 |
falcon-7b-instruct | INT8-CW | 1024 | 594.1 | 89 | 7890.9 | 11.24 |
baichuan2-13b-chat | INT4-MIXED | 32 | 178.4 | 93.9 | 9320.9 | 10.65 |
gemma-7b-it | INT8-CW | 32 | 149.3 | 99.7 | 9596.8 | 10.03 |

Topology | Precision | Input Size (tokens) | 1st latency (ms) | 2nd latency (ms) | Max RSS memory (MB) | 2nd tokens per sec (1000 / 2nd latency) |
---|---|---|---|---|---|---|
opt-125m-gptq | INT4-MIXED | 32 | 13.6 | 3.4 | 1184.3 | 294.12 |
opt-125m-gptq | INT4-MIXED | 1024 | 17.3 | 3.8 | 1302.2 | 263.16 |
gemma-2b-it | INT4-MIXED | 32 | 31.4 | 17.7 | 3364.6 | 56.50 |
red-pajama-incite-chat-3b-v1 | INT4-MIXED | 32 | 49.1 | 17.8 | 3382.2 | 56.18 |
dolly-v2-3b | INT4-MIXED | 32 | 50.8 | 18.3 | 3402.1 | 54.64 |
gemma-2b-it | INT4-MIXED | 1024 | 189.1 | 18.6 | 3294.5 | 53.76 |
phi-2 | INT4-MIXED | 32 | 57 | 19.3 | 3326.9 | 51.81 |
red-pajama-incite-chat-3b-v1 | INT4-MIXED | 1024 | 359.9 | 19.7 | 3717 | 50.76 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 59.7 | 20.2 | 3410.6 | 49.50 |
dolly-v2-3b | INT4-MIXED | 1024 | 383.1 | 20.3 | 3773.8 | 49.26 |
phi-2 | INT4-MIXED | 1024 | 326.1 | 21.3 | 3845.6 | 46.95 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 60.8 | 21.9 | 3566.3 | 45.66 |
stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 330 | 22.2 | 3929.3 | 45.05 |
stablelm-3b-4e1t | INT4-MIXED | 1024 | 299.8 | 23.8 | 4032.5 | 42.02 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 44.8 | 25.5 | 3772.6 | 39.22 |
gemma-2b-it | INT8-CW | 32 | 35.5 | 28 | 3737.4 | 35.71 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 457.5 | 28.1 | 4361.9 | 35.59 |
gemma-2b-it | INT8-CW | 1024 | 208.9 | 28.9 | 3878.2 | 34.60 |
red-pajama-incite-chat-3b-v1 | INT8-CW | 32 | 39.8 | 29.8 | 4317.7 | 33.56 |
dolly-v2-3b | INT8-CW | 32 | 46.6 | 30.5 | 4426.2 | 32.79 |
stablelm-3b-4e1t | INT8-CW | 32 | 48.9 | 30.8 | 4444.1 | 32.47 |
phi-2 | INT8-CW | 32 | 54.1 | 30.9 | 4429.9 | 32.36 |
stable-zephyr-3b-dpo | INT8-CW | 32 | 49.4 | 30.9 | 4456.8 | 32.36 |
red-pajama-incite-chat-3b-v1 | INT8-CW | 1024 | 324 | 31.9 | 4713.6 | 31.35 |
dolly-v2-3b | INT8-CW | 1024 | 365.7 | 32.6 | 4826.6 | 30.67 |
phi-2 | INT8-CW | 1024 | 305.9 | 32.9 | 4817.9 | 30.40 |
stable-zephyr-3b-dpo | INT8-CW | 1024 | 328.1 | 32.9 | 4844.5 | 30.40 |
stablelm-3b-4e1t | INT8-CW | 1024 | 330.6 | 32.9 | 4818.1 | 30.40 |
chatglm3-6b | INT4-MIXED | 32 | 52.5 | 33.7 | 5202.4 | 29.67 |
flan-t5-xxl | INT4-MIXED | 33 | 57.7 | 34.8 | 13591.1 | 28.74 |
chatglm3-6b | INT4-MIXED | 1024 | 467.2 | 35.8 | 5187.5 | 27.93 |
codegen25-7b | INT4-MIXED | 32 | 63.8 | 37.7 | 5518 | 26.53 |
llama-2-7b-gptq | INT4-MIXED | 32 | 61.8 | 37.8 | 5352.1 | 26.46 |
gpt-j-6b | INT4-MIXED | 32 | 76.5 | 39.5 | 5071.7 | 25.32 |
flan-t5-xxl | INT4-MIXED | 1139 | 252.9 | 39.7 | 15513.3 | 25.19 |
chatglm3-6b-gptq | INT4-MIXED | 32 | 69.3 | 40.5 | 5781.1 | 24.69 |
codegen25-7b | INT4-MIXED | 1024 | 539.9 | 41.4 | 5994.8 | 24.15 |
phi-3-mini-4k-instruct | INT8-CW | 32 | 53.3 | 42.2 | 5452.9 | 23.70 |
chatglm3-6b-gptq | INT4-MIXED | 1024 | 427.9 | 42.8 | 5642.5 | 23.36 |
gpt-j-6b | INT4-MIXED | 1024 | 565.8 | 43 | 6405.8 | 23.26 |
qwen-7b-chat-gptq | INT4-MIXED | 32 | 77.1 | 43.1 | 6371 | 23.20 |
falcon-7b-instruct | INT4-MIXED | 32 | 66.7 | 43.3 | 5357.6 | 23.09 |
llama-2-7b-gptq | INT4-MIXED | 1024 | 438.6 | 43.6 | 6815.2 | 22.94 |
mistral-7b-v0.1 | INT4-MIXED | 32 | 70.7 | 44.5 | 5862.7 | 22.47 |
phi-3-mini-4k-instruct | INT8-CW | 1024 | 453.4 | 44.9 | 5810.4 | 22.27 |
falcon-7b-instruct | INT4-MIXED | 1024 | 568.1 | 45.2 | 5293.6 | 22.12 |
qwen-7b-chat-gptq | INT4-MIXED | 1024 | 712.2 | 46.8 | 6839.2 | 21.37 |
mistral-7b-v0.1 | INT4-MIXED | 1024 | 467 | 47 | 5757.8 | 21.28 |
zephyr-7b-beta | INT4-MIXED | 32 | 72.1 | 48.4 | 6074.7 | 20.66 |
baichuan2-7b-chat | INT4-MIXED | 32 | 73.5 | 49.7 | 6579.6 | 20.12 |
zephyr-7b-beta | INT4-MIXED | 1024 | 509.8 | 51 | 6049.1 | 19.61 |
gemma-7b-it | INT4-MIXED | 32 | 88.7 | 51.5 | 6571.3 | 19.42 |
baichuan2-7b-chat | INT4-MIXED | 1024 | 1547.3 | 54.5 | 7293.8 | 18.35 |
gemma-7b-it | INT4-MIXED | 1024 | 714.8 | 57 | 7044.8 | 17.54 |
qwen-7b-chat | INT4-MIXED | 32 | 85.7 | 60 | 7335.9 | 16.67 |
gemma-2b-it | FP16 | 32 | 66 | 62.1 | 6218.5 | 16.10 |
gemma-2b-it | FP16 | 1024 | 250.7 | 62.8 | 6370.7 | 15.92 |
red-pajama-incite-chat-3b-v1 | FP16 | 32 | 64.9 | 63.8 | 6906.6 | 15.67 |
dolly-v2-3b | FP16 | 32 | 65.6 | 64.4 | 6920.6 | 15.53 |
phi-2 | FP16 | 32 | 66.3 | 64.6 | 6938.9 | 15.48 |
qwen-7b-chat | INT4-MIXED | 1024 | 776.9 | 64.8 | 7960.4 | 15.43 |
stablelm-3b-4e1t | FP16 | 32 | 62.3 | 65.5 | 7146.8 | 15.27 |
stable-zephyr-3b-dpo | FP16 | 32 | 62.7 | 65.6 | 7157.5 | 15.24 |
chatglm3-6b | INT8-CW | 32 | 77.8 | 66 | 7573.1 | 15.15 |
red-pajama-incite-chat-3b-v1 | FP16 | 1024 | 387.5 | 67.6 | 7741.9 | 14.79 |
phi-2 | FP16 | 1024 | 364.6 | 68.3 | 7748.2 | 14.64 |
chatglm3-6b | INT8-CW | 1024 | 465.1 | 68.4 | 7521.8 | 14.62 |
dolly-v2-3b | FP16 | 1024 | 430.9 | 68.4 | 7731.2 | 14.62 |
stable-zephyr-3b-dpo | FP16 | 1024 | 348.8 | 69.3 | 7768.6 | 14.43 |
stablelm-3b-4e1t | FP16 | 1024 | 354.9 | 69.3 | 7752 | 14.43 |
flan-t5-xxl | INT8-CW | 33 | 300 | 69.4 | 23449.5 | 14.41 |
falcon-7b-instruct | INT8-CW | 32 | 88 | 73.5 | 8224.5 | 13.61 |
gpt-j-6b | INT8-CW | 32 | 86.8 | 75.1 | 7450.1 | 13.32 |
flan-t5-xxl | INT8-CW | 1139 | 368.4 | 75.4 | 25452.7 | 13.26 |
falcon-7b-instruct | INT8-CW | 1024 | 625.2 | 76.3 | 8166.2 | 13.11 |
codegen25-7b | INT8-CW | 32 | 89.2 | 76.5 | 8423.7 | 13.07 |
baichuan2-13b-chat | INT4-MIXED | 32 | 121.9 | 77.1 | 9488 | 12.97 |
gpt-j-6b | INT8-CW | 1024 | 597.9 | 78.7 | 8861.5 | 12.71 |
baichuan2-7b-chat | INT8-CW | 32 | 89.2 | 79.9 | 8620.5 | 12.52 |
mistral-7b-v0.1 | INT8-CW | 32 | 94.3 | 80.5 | 8715.1 | 12.42 |
zephyr-7b-beta | INT8-CW | 32 | 95.4 | 80.6 | 8622.5 | 12.41 |
codegen25-7b | INT8-CW | 1024 | 554.4 | 81 | 8948.7 | 12.35 |
qwen-7b-chat | INT8-CW | 32 | 100 | 82.7 | 8975.7 | 12.09 |
baichuan2-13b-chat | INT4-MIXED | 1024 | 2739.6 | 83.7 | 10552.9 | 11.95 |
mistral-7b-v0.1 | INT8-CW | 1024 | 598.2 | 83.8 | 8597.9 | 11.93 |
zephyr-7b-beta | INT8-CW | 1024 | 615.2 | 83.8 | 8512.3 | 11.93 |
baichuan2-7b-chat | INT8-CW | 1024 | 1681.7 | 84.2 | 9540.5 | 11.88 |
qwen-7b-chat | INT8-CW | 1024 | 942.8 | 86.9 | 9861.6 | 11.51 |
starcoder | INT4-MIXED | 32 | 145.4 | 88.2 | 9617.2 | 11.34 |
starcoder | INT4-MIXED | 1024 | 1499 | 91.9 | 9452.9 | 10.88 |
gemma-7b-it | INT8-CW | 32 | 113.9 | 92.5 | 9744.4 | 10.81 |
gemma-7b-it | INT8-CW | 1024 | 778.6 | 96.1 | 10562 | 10.41 |

Topology | Precision | Input Size (tokens) | 1st latency (ms) | 2nd latency (ms) | Max RSS memory (MB) | 2nd tokens per sec (1000 / 2nd latency) |
---|---|---|---|---|---|---|
dolly-v2-3b | INT4-MIXED | 32 | 71.8 | 24 | 3331.5 | 41.67 |
phi-2 | INT4-MIXED | 32 | 69 | 24.5 | 3233.9 | 40.82 |
gemma-2b-it | INT4-MIXED | 32 | 64.5 | 24.7 | 3797.4 | 40.49 |
red-pajama-incite-chat-3b-v1 | INT4-MIXED | 32 | 70.1 | 24.7 | 3260.3 | 40.49 |
gemma-2b-it | INT4-MIXED | 1024 | 765.3 | 25.6 | 3730.1 | 39.06 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 82.4 | 26.6 | 3334.7 | 37.59 |
dolly-v2-3b | INT4-MIXED | 1024 | 1107.7 | 27 | 3750 | 37.04 |
phi-2 | INT4-MIXED | 1024 | 1088.2 | 27.4 | 3635.7 | 36.50 |
red-pajama-incite-chat-3b-v1 | INT4-MIXED | 1024 | 1089.9 | 27.8 | 3651.1 | 35.97 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 76.2 | 29.1 | 3514.4 | 34.36 |
stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 1119.8 | 29.6 | 3712.9 | 33.78 |
stablelm-3b-4e1t | INT4-MIXED | 1024 | 1095.1 | 32.1 | 3823.9 | 31.15 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 97.5 | 34.9 | 3850 | 28.65 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 1440.6 | 38.2 | 4169.5 | 26.18 |
gemma-2b-it | INT8-CW | 32 | 91.4 | 40.8 | 4488.2 | 24.51 |
red-pajama-incite-chat-3b-v1 | INT8-CW | 32 | 120.5 | 41.3 | 4216.6 | 24.21 |
stable-zephyr-3b-dpo | INT8-CW | 32 | 91.6 | 41.5 | 4233.2 | 24.10 |
phi-2 | INT8-CW | 32 | 102 | 41.7 | 4250.5 | 23.98 |
gemma-2b-it | INT8-CW | 1024 | 814.8 | 41.8 | 4469.3 | 23.92 |
stablelm-3b-4e1t | INT8-CW | 32 | 93.4 | 41.8 | 4216 | 23.92 |
dolly-v2-3b | INT8-CW | 32 | 103.9 | 42.3 | 4232.8 | 23.64 |
red-pajama-incite-chat-3b-v1 | INT8-CW | 1024 | 1175.3 | 44.1 | 4622.9 | 22.68 |
stable-zephyr-3b-dpo | INT8-CW | 1024 | 1201.8 | 44.4 | 4650.5 | 22.52 |
stablelm-3b-4e1t | INT8-CW | 1024 | 1192 | 44.7 | 4621.4 | 22.37 |
phi-2 | INT8-CW | 1024 | 1164.9 | 44.8 | 4669 | 22.32 |
dolly-v2-3b | INT8-CW | 1024 | 1198.1 | 45.4 | 4669.7 | 22.03 |
flan-t5-xxl | INT4-MIXED | 33 | 87.6 | 47.9 | 13562.5 | 20.88 |
gpt-j-6b | INT4-MIXED | 32 | 137 | 49.7 | 5273.5 | 20.12 |
chatglm3-6b | INT4-MIXED | 32 | 141 | 50.1 | 5035.2 | 19.96 |
chatglm3-6b | INT4-MIXED | 1024 | 2069.7 | 51.6 | 4837.8 | 19.38 |
flan-t5-xxl | INT4-MIXED | 1139 | 464.4 | 53.5 | 15673 | 18.69 |
gpt-j-6b | INT4-MIXED | 1024 | 2300.8 | 53.7 | 6412.9 | 18.62 |
codegen25-7b | INT4-MIXED | 32 | 160.8 | 54.4 | 5357.6 | 18.38 |
llama-2-7b-gptq | INT4-MIXED | 32 | 158.3 | 54.8 | 5420.1 | 18.25 |
chatglm3-6b-gptq | INT4-MIXED | 32 | 160.3 | 55.2 | 5814.9 | 18.12 |
falcon-7b-instruct | INT4-MIXED | 32 | 182.6 | 55.8 | 5630 | 17.92 |
phi-3-mini-4k-instruct | INT8-CW | 32 | 114.1 | 56.8 | 5372.2 | 17.61 |
chatglm3-6b-gptq | INT4-MIXED | 1024 | 2007.3 | 57.3 | 5519.6 | 17.45 |
falcon-7b-instruct | INT4-MIXED | 1024 | 2569.8 | 57.3 | 5567.5 | 17.45 |
codegen25-7b | INT4-MIXED | 1024 | 2422.8 | 58.7 | 5911.4 | 17.04 |
phi-3-mini-4k-instruct | INT8-CW | 1024 | 1582.7 | 60.1 | 5715.3 | 16.64 |
qwen-7b-chat-gptq | INT4-MIXED | 32 | 169.8 | 61 | 6144.3 | 16.39 |
llama-2-7b-gptq | INT4-MIXED | 1024 | 2388.8 | 62.8 | 6655 | 15.92 |
mistral-7b-v0.1 | INT4-MIXED | 32 | 178.6 | 62.8 | 5675.7 | 15.92 |
qwen-7b-chat-gptq | INT4-MIXED | 1024 | 2463.5 | 65.1 | 6735.6 | 15.36 |
mistral-7b-v0.1 | INT4-MIXED | 1024 | 2563.3 | 65.2 | 5739.7 | 15.34 |
zephyr-7b-beta | INT4-MIXED | 32 | 178.3 | 66.9 | 5928.6 | 14.95 |
baichuan2-7b-chat | INT4-MIXED | 32 | 156.7 | 68 | 6595.2 | 14.71 |
zephyr-7b-beta | INT4-MIXED | 1024 | 2553 | 69.2 | 6112.7 | 14.45 |
gemma-7b-it | INT4-MIXED | 32 | 219.2 | 69.8 | 7239.4 | 14.33 |
gemma-2b-it | FP16 | 32 | 94.8 | 70.1 | 7443.3 | 14.27 |
gemma-2b-it | FP16 | 1024 | 1399.2 | 70.9 | 7254.5 | 14.10 |
baichuan2-7b-chat | INT4-MIXED | 1024 | 2979.5 | 72.3 | 7276.7 | 13.83 |
gemma-7b-it | INT4-MIXED | 1024 | 3057.5 | 73.8 | 7809.5 | 13.55 |
phi-2 | FP16 | 32 | 104.6 | 74 | 6848.1 | 13.51 |
red-pajama-incite-chat-3b-v1 | FP16 | 32 | 109.9 | 74.3 | 6824.9 | 13.46 |
dolly-v2-3b | FP16 | 32 | 104.8 | 74.4 | 6831.8 | 13.44 |
stable-zephyr-3b-dpo | FP16 | 32 | 100.9 | 74.8 | 6876.2 | 13.37 |
stablelm-3b-4e1t | FP16 | 32 | 103.1 | 74.9 | 6962.5 | 13.35 |
phi-2 | FP16 | 1024 | 1467.3 | 78.6 | 7643.7 | 12.72 |
dolly-v2-3b | FP16 | 1024 | 1508.2 | 79 | 7644.4 | 12.66 |
qwen-7b-chat | INT4-MIXED | 32 | 158.1 | 79 | 7303.8 | 12.66 |
red-pajama-incite-chat-3b-v1 | FP16 | 1024 | 1488.9 | 79 | 7617.8 | 12.66 |
stable-zephyr-3b-dpo | FP16 | 1024 | 1460.4 | 79.5 | 7673.9 | 12.58 |
stablelm-3b-4e1t | FP16 | 1024 | 1449 | 79.6 | 7643.6 | 12.56 |
qwen-7b-chat | INT4-MIXED | 1024 | 2607.4 | 83.3 | 8072.1 | 12.00 |
gpt-j-6b | INT8-CW | 32 | 152 | 84 | 7357.5 | 11.90 |
chatglm3-6b | INT8-CW | 32 | 154.9 | 85.7 | 7465.7 | 11.67 |
chatglm3-6b | INT8-CW | 1024 | 2664.8 | 87.1 | 7432 | 11.48 |
gpt-j-6b | INT8-CW | 1024 | 2376.3 | 87.9 | 8775.2 | 11.38 |
falcon-7b-instruct | INT8-CW | 32 | 186.8 | 94.7 | 8518 | 10.56 |
flan-t5-xxl | INT8-CW | 33 | 174.8 | 94.7 | 19917.6 | 10.56 |
falcon-7b-instruct | INT8-CW | 1024 | 2775.5 | 96.1 | 8317.8 | 10.41 |
codegen25-7b | INT8-CW | 32 | 161.8 | 96.2 | 8295.9 | 10.40 |

All models listed here were tested with the following parameters:

- Framework: PyTorch
- Beam: 1
- Batch size: 1
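For readers who want to approximate these metrics themselves, first- and second-token latencies can be measured by timing a streaming generation call. The sketch below is a hedged illustration only: `generate_stream` is a hypothetical callable (not an OpenVINO API) that yields one decoded token at a time, and the published numbers come from OpenVINO's own benchmarking tooling rather than this snippet.

```python
import time
from typing import Callable, Iterable

def measure_latencies(generate_stream: Callable[[str], Iterable[str]],
                      prompt: str) -> tuple[float, float]:
    """Return (1st-token latency, mean 2nd-token latency), both in ms."""
    timestamps = []
    start = time.perf_counter()
    for _ in generate_stream(prompt):  # hypothetical streaming generator
        timestamps.append(time.perf_counter())
    # 1st latency: time from prompt submission to the first generated token.
    first_ms = (timestamps[0] - start) * 1000.0
    # 2nd latency: average gap between consecutive tokens after the first.
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    second_ms = (sum(gaps) / len(gaps)) * 1000.0 if gaps else float("nan")
    return first_ms, second_ms
```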