Most Efficient Large Language Models for AI PC
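This page is regularly updated to help you identify the best-performing LLMs on the Intel® Core™ Ultra processor family and AI PCs. The data is current as of OpenVINO 2026.0 (26 February 2026).
The tables below list the key performance indicators for inference on built-in GPUs. Latencies are in milliseconds: 1st latency is the time to produce the first output token, including processing of the input, and 2nd latency is the average time for each subsequent token. The last column is the resulting generation rate, 1000 divided by the 2nd latency. Max RSS memory is the peak resident memory of the benchmark process.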
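The per-second column can be reproduced from the 2nd latency, and the two latency columns together give a rough estimate of how long a complete response takes. The sketch below is a simplification: it assumes every token after the first costs exactly the measured 2nd latency and ignores model loading, tokenization, and sampling overhead. The function names are illustrative and are not part of OpenVINO.

```python
def tokens_per_second(second_latency_ms: float) -> float:
    """Steady-state generation rate implied by the 2nd-token latency (ms)."""
    if second_latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / second_latency_ms


def estimated_response_ms(first_ms: float, second_ms: float, new_tokens: int) -> float:
    """Approximate wall-clock time to generate `new_tokens` output tokens.

    The first token costs `first_ms` (input processing plus one decode step);
    each remaining token is assumed to cost `second_ms`.
    """
    if new_tokens < 1:
        raise ValueError("new_tokens must be at least 1")
    return first_ms + (new_tokens - 1) * second_ms


# minicpm4-0.5b, INT4-MIXED, input size 32 (first table): 1st = 16.5 ms, 2nd = 4.3 ms
print(f"{tokens_per_second(4.3):.2f} tokens/s")           # 232.56 tokens/s
print(f"{estimated_response_ms(16.5, 4.3, 128):.1f} ms")  # 562.6 ms
```

For short replies the 1st latency dominates, so a longer prompt (for example 39.6 ms instead of 16.5 ms at input size 1024 for the same model) matters more; for long replies the 2nd latency dominates.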
Topology | Precision | Input Size | 1st latency (ms) | 2nd latency (ms) | Max RSS memory (MB) | 2nd tokens per sec |
|---|---|---|---|---|---|---|
minicpm4-0.5b | INT4-MIXED | 32 | 16.5 | 4.3 | 1012.4 | 232.56 |
minicpm4-0.5b | INT4-MIXED | 32 | 17 | 4.4 | 1143.7 | 227.27 |
minicpm4-0.5b | INT4-MIXED | 32 | 16.6 | 4.4 | 1025.8 | 227.27 |
minicpm4-0.5b | INT4-MIXED | 1024 | 39.6 | 4.5 | 1065.7 | 222.22 |
minicpm4-0.5b | INT4-MIXED | 1024 | 56.5 | 4.6 | 1201.1 | 217.39 |
gemma-3-270m | INT4-MIXED | 1024 | 28.6 | 4.8 | 1116.8 | 208.33 |
minicpm4-0.5b | INT4-MIXED | 1024 | 29 | 4.8 | 1059.7 | 208.33 |
gemma-3-270m | INT4-MIXED | 32 | 17.8 | 4.8 | 1072.3 | 208.33 |
gemma-3-270m | INT8-CW | 1024 | 27.2 | 4.8 | 1148.3 | 208.33 |
gemma-3-270m | INT8-CW | 32 | 16.9 | 4.8 | 1091.9 | 208.33 |
distil-large-v2 | INT4-MIXED | prompt0 | 122.3 | 4.9 | 1560.7 | 204.08 |
distil-large-v2 | INT4-MIXED | prompt1 | 178.4 | 5 | 1592.4 | 200.00 |
distil-large-v2 | INT8-CW | prompt1 | 175 | 5.1 | 1847 | 196.08 |
distil-large-v2 | INT8-CW | prompt0 | 118.2 | 5.2 | 1814.4 | 192.31 |
minicpm4-0.5b | INT8-CW | 32 | 18.2 | 5.7 | 1276.6 | 175.44 |
minicpm4-0.5b | INT8-CW | 1024 | 60.6 | 6.2 | 1321.6 | 161.29 |
gemma-3-270m | FP16 | 32 | 15.5 | 6.9 | 1345.6 | 144.93 |
distil-large-v2 | FP16 | prompt1 | 194.1 | 7.1 | 2574.1 | 140.85 |
distil-large-v2 | FP16 | prompt0 | 133.5 | 7.2 | 2543 | 138.89 |
gemma-3-270m | FP16 | 1024 | 39.8 | 7.5 | 1433.4 | 133.33 |
llama-3.2-1b-instruct | INT4-MIXED | 32 | 13.5 | 8.1 | 1501.3 | 123.46 |
llama-3.2-1b-instruct | INT4-MIXED | 32 | 13.9 | 8.2 | 1522.4 | 121.95 |
gemma-3-1b-it | INT4-MIXED | 32 | 22 | 8.4 | 1474.7 | 119.05 |
llama-3.2-1b-instruct | INT4-MIXED | 1024 | 60.1 | 8.5 | 1599.7 | 117.65 |
gemma-3-1b-it | INT4-MIXED | 1024 | 64 | 8.7 | 1567.9 | 114.94 |
gemma-3-1b-it | INT4-MIXED | 32 | 23.1 | 8.7 | 1536.3 | 114.94 |
minicpm4-0.5b | FP4-NORMALIZED | 32 | 14.5 | 9 | 1615.2 | 111.11 |
gemma-3-1b-it | INT4-MIXED | 1024 | 73.2 | 9 | 1625.1 | 111.11 |
llama-3.2-1b-instruct | INT4-MIXED | 1024 | 82.2 | 9.1 | 1626.2 | 109.89 |
minicpm4-0.5b | FP4-NORMALIZED | 1024 | 60.9 | 9.3 | 1651.2 | 107.53 |
minicpm4-0.5b | FP16 | 32 | 15.2 | 9.7 | 1573.9 | 103.09 |
minicpm4-0.5b | FP16 | 1024 | 61.9 | 10.1 | 1618.8 | 99.01 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 22.5 | 11 | 1790.1 | 90.91 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 101.4 | 11.6 | 1952.5 | 86.21 |
gemma-3-1b-it | INT8-CW | 32 | 25.5 | 11.6 | 1840.8 | 86.21 |
gemma-3-1b-it | INT8-CW | 1024 | 96.9 | 11.9 | 1915.2 | 84.03 |
llama-3.2-1b-instruct | INT8-CW | 32 | 15.1 | 12.8 | 2065.2 | 78.13 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 23.1 | 13.3 | 2345.4 | 75.19 |
llama-3.2-1b-instruct | INT8-CW | 1024 | 85.3 | 13.4 | 2168.9 | 74.63 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 116 | 14 | 2474.2 | 71.43 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 22.5 | 18 | 2646.4 | 55.56 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 28.7 | 18.2 | 2534 | 54.95 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 28.7 | 18.4 | 2538.7 | 54.35 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 24.1 | 18.5 | 2628.6 | 54.05 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 23.7 | 18.8 | 2685 | 53.19 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 127.1 | 19.1 | 2662.8 | 52.36 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 30.7 | 19.1 | 2806.4 | 52.36 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 175.4 | 19.2 | 2878 | 52.08 |
phi-3-mini-128k-instruct | INT4-MIXED | 32 | 25.4 | 19.4 | 2876 | 51.55 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 24.7 | 19.4 | 2787.3 | 51.55 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 24.7 | 19.4 | 2790 | 51.55 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 150.4 | 19.6 | 2830.9 | 51.02 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 130 | 19.6 | 2664.2 | 51.02 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 26.1 | 19.9 | 2862.8 | 50.25 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 283.3 | 20 | 2924.2 | 50.00 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 26.6 | 20 | 2878.3 | 50.00 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 206.4 | 20.2 | 2895.9 | 49.50 |
gemma-3-1b-it | FP16 | 32 | 24.4 | 20.4 | 2743.9 | 49.02 |
phi-3-mini-128k-instruct | INT4-MIXED | 32 | 27.8 | 20.4 | 3029.5 | 49.02 |
gemma-3-1b-it | FP16 | 1024 | 124.8 | 20.8 | 2881.1 | 48.08 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 31.5 | 21.3 | 3091.5 | 46.95 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 32.2 | 21.4 | 3095.2 | 46.73 |
phi-3-mini-128k-instruct | INT4-MIXED | 1024 | 235.7 | 21.7 | 3511.9 | 46.08 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 237.9 | 21.8 | 3418.8 | 45.87 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 237.7 | 21.8 | 3423.5 | 45.87 |
internvl2-4b | INT4-MIXED | 297 | 163.2 | 22.2 | 4246.4 | 45.05 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 202.5 | 22.4 | 3487.2 | 44.64 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 204.7 | 22.4 | 3509.2 | 44.64 |
phi-3-mini-128k-instruct | INT4-MIXED | 1024 | 313 | 22.8 | 3647.4 | 43.86 |
phi-4-mini-instruct | INT4-MIXED | 32 | 35.3 | 22.9 | 3091 | 43.67 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 34.2 | 22.9 | 2990.2 | 43.67 |
internvl2-4b | INT4-MIXED | 297 | 170.8 | 23.3 | 4417 | 42.92 |
llama-3.2-1b-instruct | FP16 | 32 | 25.2 | 23.4 | 3104.9 | 42.74 |
phi-4-mini-instruct | INT4-MIXED | 32 | 37.7 | 23.5 | 3166.5 | 42.55 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 35.7 | 23.5 | 3165 | 42.55 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 341.7 | 23.7 | 3720.1 | 42.19 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 35.2 | 23.7 | 3212.8 | 42.19 |
llama-3.2-1b-instruct | FP16 | 1024 | 134.4 | 23.8 | 3241.8 | 42.02 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 366 | 23.8 | 3739.5 | 42.02 |
afm-4.5b | INT4-MIXED | 32 | 29.4 | 23.8 | 3626.8 | 42.02 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 219.8 | 24.3 | 3365.5 | 41.15 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 229.4 | 24.3 | 3264.4 | 41.15 |
phi-4-mini-instruct | INT4-MIXED | 32 | 38.1 | 24.6 | 3275.2 | 40.65 |
internvl2-4b | INT4-MIXED | 1027 | 367.3 | 24.8 | 5781.4 | 40.32 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 210.1 | 24.8 | 3443.2 | 40.32 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 208.8 | 24.8 | 3440.8 | 40.32 |
phi-3.5-vision-instruct | INT4-MIXED | 802 | 334.3 | 25 | 5540.8 | 40.00 |
afm-4.5b | INT4-MIXED | 1024 | 265.4 | 25.2 | 3797.6 | 39.68 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 289.5 | 25.2 | 3471.2 | 39.68 |
gemma-3-4b-it | INT4-MIXED | 32 | 51.3 | 25.2 | 4406.1 | 39.68 |
gemma-3-4b-it | INT4-MIXED | 32 | 51 | 25.7 | 4528.7 | 38.91 |
phi-3.5-vision-instruct | INT4-MIXED | 1032 | 375 | 25.9 | 5829.4 | 38.61 |
internvl2-4b | INT4-MIXED | 1027 | 389.5 | 25.9 | 5870.4 | 38.61 |
gemma-3-4b-it | INT4-MIXED | 32 | 53.8 | 25.9 | 4595.5 | 38.61 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 328.4 | 26 | 3553.5 | 38.46 |
gemma-3-4b-it | INT4-MIXED | 1024 | 266.6 | 26.6 | 6761.2 | 37.59 |
gemma-3-4b-it | INT4-MIXED | 1024 | 275.3 | 27.1 | 6874.8 | 36.90 |
gemma-3-4b-it | INT4-MIXED | 1024 | 348.3 | 27.4 | 6940 | 36.50 |
gpt-oss-20b | INT4-MIXED | 32 | 135.9 | 27.4 | 12961.2 | 36.50 |
gpt-oss-20b | INT4-MIXED | 1024 | 465.3 | 28.5 | 13189.8 | 35.09 |
minicpm3-4b | INT4-MIXED | 32 | 92 | 28.5 | 3193.6 | 35.09 |
minicpm3-4b | INT4-MIXED | 32 | 93.6 | 29.2 | 3317.6 | 34.25 |
minicpm3-4b | INT4-MIXED | 32 | 99.2 | 29.7 | 3511.1 | 33.67 |
deepseek-r1-distill-qwen-1.5b | FP16 | 32 | 32.4 | 29.8 | 4152.9 | 33.56 |
qwen2.5-1.5b-instruct | FP16 | 32 | 32.3 | 29.8 | 3709.2 | 33.56 |
deepseek-r1-distill-qwen-1.5b | FP16 | 1024 | 161.2 | 30.3 | 4295.8 | 33.00 |
qwen2.5-1.5b-instruct | FP16 | 1024 | 161 | 30.3 | 3851.8 | 33.00 |
gpt-j-6b | INT4-MIXED | 32 | 42 | 30.5 | 4024.6 | 32.79 |
qwen2.5-coder-3b-instruct | INT8-CW | 32 | 35.1 | 30.5 | 3782.1 | 32.79 |
flan-t5-xxl | INT4-MIXED | 33 | 46.7 | 30.6 | 12828.9 | 32.68 |
qwen2.5-coder-3b-instruct | INT8-CW | 1024 | 205.1 | 31.6 | 3900.3 | 31.65 |
llama-3.2-3b-instruct | INT8-CW | 32 | 37.2 | 31.6 | 4003.7 | 31.65 |
llama-2-7b-chat-hf | INT4-MIXED | 32 | 38.1 | 32.5 | 4240 | 30.77 |
llama-3.2-3b-instruct | INT8-CW | 1024 | 222.9 | 32.9 | 4222.2 | 30.40 |
gpt-j-6b | INT4-MIXED | 1024 | 300.4 | 33.1 | 5174.8 | 30.21 |
gpt-j-6b | INT4-MIXED | 32 | 49.9 | 33.1 | 4507.4 | 30.21 |
minicpm3-4b | INT4-MIXED | 1024 | 527.3 | 33.4 | 4380.6 | 29.94 |
llama-2-7b-chat-hf | INT4-MIXED | 32 | 41.1 | 33.5 | 4409.2 | 29.85 |
minicpm3-4b | INT4-MIXED | 1024 | 493.8 | 34 | 4502.7 | 29.41 |
falcon-7b-instruct | INT4-MIXED | 32 | 41.3 | 34.3 | 4342.1 | 29.15 |
chatglm3-6b | FP16 | 7 | 36.8 | 34.4 | 4391.2 | 29.07 |
biomistral-7b-slerp | INT4-MIXED | 7 | 36.9 | 34.4 | 4391.2 | 29.07 |
chatglm3-6b | INT4-MIXED | 7 | 36.8 | 34.4 | 4391.2 | 29.07 |
chatglm3-6b | INT8-CW | 7 | 36.8 | 34.4 | 4391.2 | 29.07 |
minicpm3-4b | INT4-MIXED | 1024 | 600.1 | 34.5 | 4688.6 | 28.99 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 41.1 | 34.5 | 4400.7 | 28.99 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 41.7 | 34.5 | 4408.8 | 28.99 |
flan-t5-xxl | INT4-MIXED | 1139 | 289.8 | 35 | 15033.6 | 28.57 |
falcon-7b-instruct | INT4-MIXED | 1024 | 329.8 | 35.5 | 4489.3 | 28.17 |
llama-2-7b-chat-hf | INT4-MIXED | 1024 | 291.9 | 35.5 | 4959.2 | 28.17 |
gpt-j-6b | INT4-MIXED | 1024 | 437.4 | 35.6 | 5663.3 | 28.09 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 44.7 | 35.6 | 4589.8 | 28.09 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 45.1 | 35.7 | 4678.4 | 28.01 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 42.9 | 36 | 5057.1 | 27.78 |
qwen2-7b-instruct | INT4-MIXED | 32 | 42.9 | 36 | 4970.5 | 27.78 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 43.7 | 36 | 4971.2 | 27.78 |
biomistral-7b-slerp | INT4-MIXED | 7 | 38.6 | 36 | 4779.3 | 27.78 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 45.5 | 36.1 | 4794.1 | 27.70 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 32 | 43.4 | 36.1 | 4971.9 | 27.70 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 299 | 36.2 | 4713.1 | 27.62 |
mistral-7b-instruct-v0.1 | INT4-MIXED | 32 | 45.7 | 36.2 | 4695.9 | 27.62 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 45.6 | 36.2 | 4799.9 | 27.62 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 301.3 | 36.3 | 4705.4 | 27.55 |
llama-2-7b-chat-hf | INT4-MIXED | 1024 | 294.5 | 36.5 | 5113.9 | 27.40 |
phi-3-mini-128k-instruct | INT8-CW | 32 | 43.4 | 36.7 | 4517 | 27.25 |
phi-3-mini-4k-instruct | INT8-CW | 32 | 43.4 | 36.9 | 4520.5 | 27.10 |
phi-3.5-mini-instruct | INT8-CW | 32 | 43.8 | 36.9 | 4517.5 | 27.10 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 45.8 | 37 | 5150.4 | 27.03 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 45.7 | 37 | 5160.2 | 27.03 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 252 | 37.3 | 5279.3 | 26.81 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 286.6 | 37.3 | 4874.3 | 26.81 |
qwen2-7b-instruct | INT4-MIXED | 1024 | 258.9 | 37.3 | 5190.9 | 26.81 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 253.9 | 37.3 | 5191.7 | 26.81 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 276.7 | 37.4 | 4970 | 26.74 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 1024 | 262.4 | 37.4 | 5192.5 | 26.74 |
falcon-7b-instruct | INT4-MIXED | 32 | 48.2 | 37.4 | 4927 | 26.74 |
qwen2-7b-instruct | INT4-MIXED | 32 | 46.7 | 37.5 | 5365.9 | 26.67 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 46.7 | 37.6 | 5366.6 | 26.60 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 46.5 | 37.6 | 5275.8 | 26.60 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 32 | 46.7 | 37.6 | 5274.2 | 26.60 |
mistral-7b-instruct-v0.1 | INT4-MIXED | 1025 | 416.4 | 37.9 | 4989.3 | 26.39 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 389.7 | 37.9 | 5089.3 | 26.39 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 400.4 | 38 | 5093.7 | 26.32 |
llama-3-8b-instruct | INT4-MIXED | 32 | 44.9 | 38 | 5176.7 | 26.32 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 44.7 | 38 | 5177.8 | 26.32 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 44.5 | 38.1 | 5177.6 | 26.25 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 272.5 | 38.3 | 5376.8 | 26.11 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 258.6 | 38.3 | 5383.1 | 26.11 |
phi-4-mini-instruct | INT8-CW | 32 | 45.8 | 38.3 | 4626.1 | 26.11 |
phi-4-mini-reasoning | INT8-CW | 32 | 45 | 38.3 | 4531.1 | 26.11 |
falcon-7b-instruct | INT4-MIXED | 1024 | 414 | 38.7 | 5117.2 | 25.84 |
qwen2-7b-instruct | INT4-MIXED | 1024 | 382.5 | 38.8 | 5571.3 | 25.77 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 373.5 | 38.8 | 5483.2 | 25.77 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 376.1 | 38.9 | 5572.7 | 25.71 |
minicpm4-8b | INT4-MIXED | 32 | 46.2 | 38.9 | 5022.6 | 25.71 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 1024 | 371.4 | 39 | 5479.5 | 25.64 |
llama-3-8b-instruct | INT4-MIXED | 32 | 49.8 | 39.1 | 5378.2 | 25.58 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 47.9 | 39.1 | 5456.7 | 25.58 |
phi-3-mini-128k-instruct | INT8-CW | 1024 | 318.3 | 39.2 | 5135.6 | 25.51 |
phi-3-mini-4k-instruct | INT8-CW | 1024 | 324.2 | 39.2 | 5138.6 | 25.51 |
phi-3.5-mini-instruct | INT8-CW | 1024 | 320.9 | 39.2 | 5135.3 | 25.51 |
llama-3-8b-instruct | INT4-MIXED | 32 | 48.9 | 39.5 | 5366 | 25.32 |
internvl2-4b | INT8-CW | 297 | 162.9 | 39.5 | 5778.2 | 25.32 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 304.1 | 39.7 | 5482.5 | 25.19 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 309.8 | 39.7 | 5482.1 | 25.19 |
llama-3-8b-instruct | INT4-MIXED | 32 | 50.1 | 39.7 | 5481.4 | 25.19 |
gemma-3-4b-it | INT8-CW | 32 | 56.2 | 39.7 | 6003.9 | 25.19 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 307.8 | 39.8 | 5481.1 | 25.13 |
phi-4-mini-instruct | INT8-CW | 1024 | 282.1 | 39.8 | 4882.7 | 25.13 |
phi-4-mini-reasoning | INT8-CW | 1024 | 286.2 | 39.8 | 4790.1 | 25.13 |
minicpm4-8b | INT4-MIXED | 32 | 51 | 40.2 | 5239.7 | 24.88 |
minicpm4-8b | INT4-MIXED | 1024 | 332.4 | 40.3 | 5196.3 | 24.81 |
minicpm4-8b | INT4-MIXED | 32 | 54.1 | 40.6 | 5452.6 | 24.63 |
afm-4.5b | INT8-CW | 32 | 45.9 | 40.6 | 5349 | 24.63 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 301.9 | 40.8 | 5683.6 | 24.51 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 295.1 | 40.9 | 5764.8 | 24.45 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 293.3 | 41 | 5674.6 | 24.39 |
gemma-3-4b-it | INT8-CW | 1024 | 407.2 | 41.2 | 8371.7 | 24.27 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 400.4 | 41.4 | 5774.6 | 24.15 |
phi-3.5-vision-instruct | INT8-CW | 802 | 298 | 41.4 | 6797.3 | 24.15 |
minicpm4-8b | INT4-MIXED | 1024 | 327.8 | 41.5 | 5414.6 | 24.10 |
afm-4.5b | INT8-CW | 1024 | 269.3 | 41.9 | 5519.4 | 23.87 |
minicpm4-8b | INT4-MIXED | 1024 | 427.3 | 42.1 | 5614.6 | 23.75 |
internvl2-4b | INT8-CW | 1027 | 399.5 | 42.1 | 6911.6 | 23.75 |
phi-3.5-vision-instruct | INT8-CW | 1032 | 360.5 | 42.2 | 7100.3 | 23.70 |
gemma-7b-it | INT4-MIXED | 32 | 52.5 | 44 | 5402.3 | 22.73 |
gemma-7b-it | INT4-MIXED | 32 | 56.3 | 45.7 | 5833 | 21.88 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 58.1 | 46.4 | 6082 | 21.55 |
gemma-7b-it | INT4-MIXED | 1024 | 351.8 | 46.5 | 6112.7 | 21.51 |
minicpm3-4b | INT8-CW | 32 | 102.1 | 46.5 | 5179.6 | 21.51 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 62.2 | 47 | 6448.8 | 21.28 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 61.2 | 47 | 6356.4 | 21.28 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 62.6 | 47.6 | 6295.5 | 21.01 |
glm-4-9b-chat-hf | INT4-MIXED | 1024 | 453.8 | 48 | 6639.5 | 20.83 |
gemma-7b-it | INT4-MIXED | 1024 | 465.8 | 48.2 | 6526.7 | 20.75 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 449.2 | 48.7 | 6653.7 | 20.53 |
llava-next-video-7b-hf | INT4-MIXED | 2945 | 1900 | 48.8 | 8941 | 20.49 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 448.3 | 48.8 | 6747.6 | 20.49 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 64.9 | 48.8 | 6448.3 | 20.49 |
gemma-2-9b-it | INT4-MIXED | 32 | 60.2 | 49.1 | 5851.1 | 20.37 |
glm-4-9b-chat-hf | INT4-MIXED | 1024 | 479 | 49.5 | 6854.1 | 20.20 |
glm-4-9b-chat-hf | INT4-MIXED | 1024 | 542.7 | 50.1 | 6993 | 19.96 |
gemma-2-9b-it | INT4-MIXED | 32 | 64.7 | 50.4 | 6093.8 | 19.84 |
llava-next-video-7b-hf | INT4-MIXED | 2945 | 2132.4 | 50.5 | 9169.6 | 19.80 |
gemma-2-9b-it | INT4-MIXED | 32 | 65.5 | 51 | 6338.5 | 19.61 |
ltx-video | INT8-CW | 11 | 52.1 | 51.4 | 9317.9 | 19.46 |
minicpm3-4b | INT8-CW | 1024 | 605.6 | 51.8 | 6340 | 19.31 |
gemma-2-9b-it | INT4-MIXED | 1024 | 408.5 | 52 | 6453.4 | 19.23 |
gemma-2-9b-it | INT4-MIXED | 1024 | 398.2 | 53.3 | 6701.5 | 18.76 |
gemma-2-9b-it | INT4-MIXED | 1024 | 521.5 | 54 | 6937.4 | 18.52 |
flan-t5-xxl | INT8-CW | 33 | 128.5 | 54.4 | 22574 | 18.38 |
ltx-video | INT4-MIXED | 11 | 56.5 | 55.4 | 6459 | 18.05 |
llama-3.2-3b-instruct | FP4-NORMALIZED | 32 | 59.5 | 56.4 | 6668.8 | 17.73 |
qwen2.5-coder-3b-instruct | FP16 | 32 | 60.4 | 57.3 | 6762.7 | 17.45 |
gpt-j-6b | INT8-CW | 32 | 68 | 57.3 | 6813.4 | 17.45 |
llama-3.2-3b-instruct | FP4-NORMALIZED | 1024 | 265.8 | 57.6 | 6893.8 | 17.36 |
qwen2.5-coder-3b-instruct | FP16 | 1024 | 311.8 | 57.9 | 6936.2 | 17.27 |
llama-3.2-3b-instruct | FP16 | 32 | 65.6 | 59.7 | 6954.9 | 16.75 |
flan-t5-xxl | INT8-CW | 1139 | 287.3 | 59.7 | 24753.2 | 16.75 |
gpt-j-6b | INT8-CW | 1024 | 416.7 | 59.9 | 7948.1 | 16.69 |
lcm-dreamshaper-v7 | INT8-CW | 1024 | 63.5 | 60.6 | 5206.3 | 16.50 |
llama-2-13b-chat-hf | INT4-MIXED | 32 | 72.3 | 60.7 | 7354.6 | 16.47 |
lcm-dreamshaper-v7 | INT8-CW | 1024 | 63.1 | 60.8 | 4943.5 | 16.45 |
llama-2-7b-chat-hf | INT8-CW | 32 | 67.8 | 60.8 | 7432.6 | 16.45 |
llama-3.2-3b-instruct | FP16 | 1024 | 270.6 | 61.3 | 7279.4 | 16.31 |
lcm-dreamshaper-v7 | INT8-CW | 32 | 61.4 | 62.6 | 5030 | 15.97 |
lcm-dreamshaper-v7 | INT8-CW | 32 | 66.5 | 63 | 5204.7 | 15.87 |
gemma-3-12b-it | INT4-MIXED | 32 | 85.9 | 64 | 8912.2 | 15.63 |
llama-2-7b-chat-hf | INT8-CW | 1024 | 365.5 | 64 | 8138.2 | 15.63 |
biomistral-7b-slerp | INT8-CW | 7 | 67.4 | 65 | 7821.6 | 15.38 |
mistral-7b-instruct-v0.1 | INT8-CW | 32 | 73.7 | 65.1 | 7737.2 | 15.36 |
mistral-7b-instruct-v0.2 | INT8-CW | 32 | 73.8 | 65.2 | 7736.5 | 15.34 |
mistral-7b-instruct-v0.3 | INT8-CW | 32 | 73.8 | 65.2 | 7840.8 | 15.34 |
llama-2-13b-chat-hf | INT4-MIXED | 1024 | 556.8 | 65.9 | 8448.5 | 15.17 |
qwen2-7b-instruct | INT8-CW | 32 | 75.1 | 65.9 | 8087.4 | 15.17 |
qwen2.5-7b-instruct | INT8-CW | 32 | 75.1 | 65.9 | 8088.8 | 15.17 |
qwen2.5-7b-instruct-1m | INT8-CW | 32 | 75.2 | 66 | 8182.4 | 15.15 |
phi-4-mini-instruct | FP4-NORMALIZED | 32 | 71.9 | 66.1 | 7585 | 15.13 |
deepseek-r1-distill-qwen-7b | INT8-CW | 32 | 74.9 | 66.1 | 8175.9 | 15.13 |
phi-4-mini-reasoning | FP4-NORMALIZED | 32 | 71.8 | 66.3 | 7685.1 | 15.08 |
gemma-3-12b-it | INT4-MIXED | 32 | 116.6 | 66.6 | 9357.9 | 15.02 |
mistral-7b-instruct-v0.2 | INT8-CW | 1024 | 383.2 | 67 | 8035.9 | 14.93 |
mistral-7b-instruct-v0.1 | INT8-CW | 1025 | 403.5 | 67.1 | 8023.1 | 14.90 |
mistral-7b-instruct-v0.3 | INT8-CW | 1024 | 385 | 67.2 | 8136.8 | 14.88 |
deepseek-r1-distill-qwen-7b | INT8-CW | 1024 | 368.2 | 67.5 | 8391.1 | 14.81 |
phi-4-mini-reasoning | FP4-NORMALIZED | 1024 | 338.2 | 67.6 | 7933.1 | 14.79 |
qwen2-7b-instruct | INT8-CW | 1024 | 372.1 | 67.6 | 8299.1 | 14.79 |
qwen2.5-7b-instruct | INT8-CW | 1024 | 373.5 | 67.6 | 8298.4 | 14.79 |
phi-4-mini-instruct | FP4-NORMALIZED | 1024 | 363.4 | 67.8 | 7841.4 | 14.75 |
qwen2.5-7b-instruct-1m | INT8-CW | 1024 | 370.1 | 67.8 | 8392.6 | 14.75 |
gemma-3-12b-it | INT4-MIXED | 1024 | 653.3 | 68 | 11641 | 14.71 |
phi-4 | INT4-MIXED | 32 | 79.7 | 68.3 | 8359.4 | 14.64 |
phi-4-reasoning | INT4-MIXED | 32 | 79.5 | 68.3 | 8358.1 | 14.64 |
llama-3.1-8b-instruct | INT8-CW | 32 | 75.9 | 68.6 | 8521.5 | 14.58 |
deepseek-r1-distill-llama-8b | INT8-CW | 32 | 76.1 | 68.7 | 8611.3 | 14.56 |
llama-3-8b-instruct | INT8-CW | 32 | 76.9 | 68.7 | 8614.3 | 14.56 |
phi-3-mini-128k-instruct | FP16 | 32 | 73.4 | 68.8 | 8305 | 14.53 |
phi-3-mini-4k-instruct | FP16 | 32 | 75.5 | 68.9 | 8211.3 | 14.51 |
phi-3.5-mini-instruct | FP16 | 32 | 73.6 | 68.9 | 8191 | 14.51 |
baichuan2-13b-chat | INT4-MIXED | 32 | 87.3 | 68.9 | 8815.4 | 14.51 |
qwen1.5-14b-chat | INT4-MIXED | 32 | 84.4 | 69 | 9086.1 | 14.49 |
deepseek-r1-distill-qwen-14b | INT4-MIXED | 32 | 80.3 | 69.4 | 8779.6 | 14.41 |
gemma-3-12b-it | INT4-MIXED | 1024 | 743 | 70.7 | 12037.7 | 14.14 |
llama-3.1-8b-instruct | INT8-CW | 1024 | 383 | 70.8 | 8810.4 | 14.12 |
deepseek-r1-distill-llama-8b | INT8-CW | 1024 | 376.6 | 70.9 | 8903.4 | 14.10 |
llama-3-8b-instruct | INT8-CW | 1024 | 427.1 | 70.9 | 8902.9 | 14.10 |
phi-4-mini-instruct | FP16 | 32 | 77.3 | 71.4 | 8169.8 | 14.01 |
phi-4 | INT4-MIXED | 1024 | 619.7 | 71.4 | 8829.3 | 14.01 |
phi-4-reasoning | INT4-MIXED | 1024 | 615.3 | 71.4 | 8826.8 | 14.01 |
internvl2-4b | FP16 | 297 | 216.4 | 71.6 | 9512.2 | 13.97 |
phi-4-mini-reasoning | FP16 | 32 | 77.1 | 71.6 | 8171.4 | 13.97 |
phi-4 | INT4-MIXED | 32 | 89.4 | 71.9 | 9057 | 13.91 |
deepseek-r1-distill-qwen-14b | INT4-MIXED | 32 | 90.6 | 72.4 | 9400.5 | 13.81 |
phi-3.5-mini-instruct | FP16 | 1024 | 389.2 | 72.7 | 9095.2 | 13.76 |
phi-3-mini-4k-instruct | FP16 | 1024 | 386.1 | 72.8 | 9094 | 13.74 |
deepseek-r1-distill-qwen-14b | INT4-MIXED | 1024 | 592.9 | 72.9 | 9169.9 | 13.72 |
phi-3-mini-128k-instruct | FP16 | 1024 | 345 | 73.1 | 9186.6 | 13.68 |
phi-4-mini-instruct | FP16 | 1024 | 367.6 | 73.2 | 8562.5 | 13.66 |
phi-3.5-vision-instruct | FP16 | 802 | 406.7 | 73.2 | 10330.5 | 13.66 |
phi-4-mini-reasoning | FP16 | 1024 | 342.5 | 73.4 | 8560.5 | 13.62 |
baichuan2-13b-chat | INT4-MIXED | 1024 | 1837.1 | 73.9 | 13059.7 | 13.53 |
minicpm4-8b | INT8-CW | 32 | 83 | 73.9 | 8743 | 13.53 |
phi-3.5-vision-instruct | FP16 | 1032 | 518.5 | 74.1 | 10638.2 | 13.50 |
qwen1.5-14b-chat | INT4-MIXED | 1024 | 664.3 | 74.1 | 10171.1 | 13.50 |
internvl2-4b | FP16 | 1027 | 543.4 | 74.3 | 10380.3 | 13.46 |
phi-4-reasoning | INT4-MIXED | 32 | 103.3 | 74.7 | 9650.4 | 13.39 |
phi-4 | INT4-MIXED | 1024 | 738.8 | 74.9 | 9509.4 | 13.35 |
gemma-3-4b-it | FP16 | 32 | 84 | 75 | 10702.8 | 13.33 |
minicpm4-8b | INT8-CW | 1024 | 428 | 75.7 | 8906.9 | 13.21 |
deepseek-r1-distill-qwen-14b | INT4-MIXED | 1024 | 722.9 | 75.8 | 9786 | 13.19 |
gemma-3-4b-it | FP16 | 1024 | 399.2 | 76.7 | 13070.1 | 13.04 |
llava-next-video-7b-hf | INT8-CW | 2945 | 2147.7 | 77 | 11933.6 | 12.99 |
phi-4-reasoning | INT4-MIXED | 1024 | 823.7 | 77.7 | 9782.8 | 12.87 |
gemma-7b-it | INT8-CW | 32 | 92.3 | 79.9 | 9104.5 | 12.52 |
afm-4.5b | FP16 | 32 | 84.4 | 80 | 9635.6 | 12.50 |
afm-4.5b | FP16 | 1024 | 314.6 | 81.2 | 9865.1 | 12.32 |
glm-4-9b-chat-hf | INT8-CW | 32 | 95.2 | 82.6 | 9980.1 | 12.11 |
gemma-7b-it | INT8-CW | 1024 | 446 | 82.7 | 9807.5 | 12.09 |
glm-4-9b-chat-hf | INT8-CW | 1024 | 530.4 | 84.5 | 10503.6 | 11.83 |
phi-4-multimodal-instruct | FP16 | 578 | 472.2 | 85.2 | 12425.3 | 11.74 |
phi-4-multimodal-instruct | FP16 | 786 | 550.6 | 85.5 | 13352.9 | 11.70 |
phi-4-multimodal-instruct | FP16 | 1362 | 1114.8 | 86.5 | 14960.9 | 11.56 |
phi-4-multimodal-instruct | FP16 | 1570 | 1135.4 | 86.8 | 15711.6 | 11.52 |
gemma-2-9b-it | INT8-CW | 32 | 99.4 | 87.6 | 9830.1 | 11.42 |
gemma-2-9b-it | INT8-CW | 1024 | 503.3 | 90.7 | 10407.2 | 11.03 |
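To shortlist configurations from the table above, one approach is to discard every row whose peak memory exceeds your budget and then keep the lowest 2nd latency for each model. The sketch below assumes the max rss memory values are in megabytes; the rows are copied from the table above, while the `Row` type and the `best_per_model` helper are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Row(NamedTuple):
    topology: str
    precision: str
    input_size: str
    first_ms: float
    second_ms: float
    max_rss_mb: float  # assumed to be megabytes


# A few rows copied from the table above (input size 32).
ROWS = [
    Row("llama-3.2-3b-instruct", "INT4-MIXED", "32", 22.5, 18.0, 2646.4),
    Row("llama-3.2-3b-instruct", "INT8-CW", "32", 37.2, 31.6, 4003.7),
    Row("llama-3.2-3b-instruct", "FP16", "32", 65.6, 59.7, 6954.9),
    Row("phi-4-mini-instruct", "INT4-MIXED", "32", 35.3, 22.9, 3091.0),
    Row("qwen2.5-7b-instruct", "INT4-MIXED", "32", 43.7, 36.0, 4971.2),
    Row("qwen2.5-7b-instruct", "INT8-CW", "32", 75.1, 65.9, 8088.8),
]


def best_per_model(rows: list[Row], budget_mb: float) -> dict[str, Row]:
    """For each model, return the row that fits the budget with the lowest 2nd latency."""
    best: dict[str, Row] = {}
    for row in rows:
        if row.max_rss_mb > budget_mb:
            continue  # does not fit in the memory budget
        current = best.get(row.topology)
        if current is None or row.second_ms < current.second_ms:
            best[row.topology] = row
    return best


for model, row in best_per_model(ROWS, budget_mb=4500).items():
    print(f"{model}: {row.precision}, {row.second_ms} ms/token, {row.max_rss_mb} MB")
```

With a 4500 MB budget, qwen2.5-7b-instruct drops out entirely; raising `budget_mb` to 5000 admits its INT4-MIXED variant.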
Topology | Precision | Input Size | 1st latency (ms) | 2nd latency (ms) | Max RSS memory (MB) | 2nd tokens per sec |
|---|---|---|---|---|---|---|
t5-small | INT4-MIXED | 32 | 10 | 5 | 1052.1 | 200.00 |
t5-small | INT8-CW | 32 | 11.7 | 5.2 | 1218.7 | 192.31 |
t5-small | INT4-MIXED | 1024 | 14.1 | 5.3 | 1010.3 | 188.68 |
t5-small | INT4-MIXED | 32 | 10.9 | 5.7 | 1142.7 | 175.44 |
t5-small | INT8-CW | 1024 | 16.1 | 5.7 | 1216.8 | 175.44 |
t5-small | INT4-MIXED | 1024 | 17.4 | 5.8 | 1114.4 | 172.41 |
t5-small | INT4-MIXED | 32 | 16.1 | 5.9 | 1155.4 | 169.49 |
t5-small | FP16 | 32 | 18.5 | 6.1 | 1201.4 | 163.93 |
t5-small | INT4-MIXED | 1024 | 17 | 6.1 | 1115 | 163.93 |
t5-small | FP16 | 1024 | 12.5 | 6.6 | 1327.6 | 151.52 |
distil-large-v2 | INT8-CW | 32 | 237.5 | 9 | 2643.4 | 111.11 |
minicpm4-0.5b | INT4-MIXED | 32 | 28.2 | 9.1 | 1397.6 | 109.89 |
minicpm4-0.5b | INT4-MIXED | 32 | 32.1 | 9.7 | 1445.1 | 103.09 |
minicpm4-0.5b | INT4-MIXED | 1024 | 65.6 | 10.2 | 1274.1 | 98.04 |
whisper-large-v3-turbo | INT8-CW | 32 | 257.8 | 10.5 | 2680.7 | 95.24 |
distil-large-v2 | INT4-MIXED | 32 | 260.5 | 10.8 | 2128.4 | 92.59 |
minicpm4-0.5b | INT4-MIXED | 32 | 36.4 | 10.8 | 1414 | 92.59 |
whisper-large-v3-turbo | INT8-CW | 1024 | 336.3 | 11.1 | 2463.5 | 90.09 |
distil-large-v2 | INT4-MIXED | 1024 | 348 | 11.2 | 1976 | 89.29 |
distil-large-v2 | INT8-CW | 1024 | 327.1 | 11.2 | 2441.6 | 89.29 |
whisper-large-v3-turbo | INT4-MIXED | 1024 | 355.6 | 11.3 | 2082.5 | 88.50 |
minicpm4-0.5b | INT8-CW | 32 | 35.8 | 11.4 | 1736.1 | 87.72 |
minicpm4-0.5b | INT4-MIXED | 1024 | 73.1 | 11.5 | 1314.9 | 86.96 |
whisper-large-v3-turbo | INT4-MIXED | 32 | 282 | 12 | 2229.2 | 83.33 |
minicpm4-0.5b | INT4-MIXED | 1024 | 69.8 | 12.2 | 1295.2 | 81.97 |
whisper-small | INT4-MIXED | 32 | 126.3 | 12.9 | 1554.1 | 77.52 |
gemma-3-270m | INT4-MIXED | 32 | 42.5 | 13 | 1614.4 | 76.92 |
minicpm4-0.5b | INT8-CW | 1024 | 65.3 | 13 | 1605.3 | 76.92 |
whisper-small | INT4-MIXED | 1024 | 180.8 | 13.1 | 1410.2 | 76.34 |
gemma-3-270m | INT4-MIXED | 1024 | 53.6 | 13.8 | 1470.5 | 72.46 |
whisper-small | INT4-MIXED | 32 | 129.6 | 14.4 | 1671.9 | 69.44 |
gemma-3-270m | INT8-CW | 1024 | 59.1 | 14.9 | 1581.3 | 67.11 |
gemma-3-270m | INT8-CW | 32 | 40.3 | 14.9 | 1696.7 | 67.11 |
whisper-small | INT4-MIXED | 1024 | 197.6 | 15.2 | 1533.7 | 65.79 |
gemma-3-270m | FP16 | 32 | 42.2 | 15.6 | 1586.8 | 64.10 |
whisper-small | INT4-MIXED | 32 | 141.8 | 15.6 | 1652.4 | 64.10 |
whisper-small | INT4-MIXED | 1024 | 184.6 | 15.7 | 1513.1 | 63.69 |
whisper-small | INT8-CW | 1024 | 202.4 | 16.1 | 1702.5 | 62.11 |
whisper-small | INT8-CW | 32 | 151.8 | 16.2 | 1817.9 | 61.73 |
whisper-large-v3-turbo | FP16 | 32 | 320.5 | 16.4 | 3183.7 | 60.98 |
whisper-large-v3-turbo | FP16 | 1024 | 418.2 | 16.5 | 3213.9 | 60.61 |
gemma-3-270m | FP16 | 1024 | 51.8 | 16.6 | 1661.4 | 60.24 |
distil-large-v2 | FP16 | 1024 | 380.6 | 16.9 | 2693.5 | 59.17 |
llama-3.2-1b-instruct | INT4-MIXED | 32 | 43.2 | 17.1 | 2470.4 | 58.48 |
distil-large-v2 | FP16 | 32 | 304.1 | 17.2 | 2662.8 | 58.14 |
whisper-small | FP16 | 32 | 136 | 18 | 1888.7 | 55.56 |
llama-3.2-1b-instruct | INT4-MIXED | 1024 | 143 | 18.4 | 2273.5 | 54.35 |
whisper-small | FP16 | 1024 | 203.4 | 18.5 | 1920.6 | 54.05 |
llama-3.2-1b-instruct | INT4-MIXED | 32 | 32.2 | 21 | 2443.1 | 47.62 |
llama-3.2-1b-instruct | INT8-CW | 32 | 48.1 | 21.8 | 2549.1 | 45.87 |
llama-3.2-1b-instruct | INT4-MIXED | 1024 | 114.3 | 22 | 2240.9 | 45.45 |
gemma-3-1b-it | INT4-MIXED | 32 | 75.1 | 22.6 | 2123.2 | 44.25 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 75.9 | 22.8 | 2662.2 | 43.86 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 79.2 | 22.9 | 2610.7 | 43.67 |
gemma-3-1b-it | INT4-MIXED | 1024 | 119.2 | 23 | 2013.9 | 43.48 |
llama-3.2-1b-instruct | INT8-CW | 1024 | 133.5 | 23 | 2501.4 | 43.48 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 155.3 | 23.2 | 2381.6 | 43.10 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 159.5 | 23.8 | 2321.7 | 42.02 |
gemma-3-1b-it | INT4-MIXED | 1024 | 138.9 | 24.1 | 2182.3 | 41.49 |
gemma-3-1b-it | INT4-MIXED | 32 | 81.6 | 24.1 | 2299.9 | 41.49 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 69.6 | 24.8 | 2959.9 | 40.32 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 88 | 25.1 | 2713.8 | 39.84 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 163.8 | 26.1 | 2634.8 | 38.31 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 70.9 | 26.1 | 2434.7 | 38.31 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 203.7 | 26.4 | 2563.3 | 37.88 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 165.5 | 27 | 2263.4 | 37.04 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 90.3 | 27 | 3127.3 | 37.04 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 203.9 | 27.4 | 2954.8 | 36.50 |
smolvlm2-256m-video-instruct | INT4-MIXED | 1141 | 618.3 | 27.7 | 2952.6 | 36.10 |
deepseek-r1-distill-qwen-1.5b | INT8-CW | 32 | 75.3 | 28 | 3540.4 | 35.71 |
gemma-3-1b-it | INT8-CW | 32 | 90.7 | 28.6 | 2469.1 | 34.97 |
deepseek-r1-distill-qwen-1.5b | INT8-CW | 1024 | 166.9 | 28.7 | 3240.5 | 34.84 |
qwen2.5-1.5b-instruct | INT8-CW | 32 | 78.8 | 28.7 | 3004.4 | 34.84 |
gemma-3-1b-it | INT8-CW | 1024 | 133.9 | 29.3 | 2368.4 | 34.13 |
qwen2.5-1.5b-instruct | INT8-CW | 1024 | 169 | 29.5 | 2726.9 | 33.90 |
smolvlm2-256m-video-instruct | INT8-CW | 1141 | 624.1 | 34 | 2964.9 | 29.41 |
smolvlm2-256m-video-instruct | FP16 | 1141 | 701.4 | 34.6 | 3688.7 | 28.90 |
phi-2 | INT4-MIXED | 32 | 134.8 | 34.6 | 2997.2 | 28.90 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 103.1 | 36.2 | 3009.3 | 27.62 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 107.5 | 36.3 | 3018.4 | 27.55 |
afm-4.5b | INT4-MIXED | 32 | 84.2 | 36.7 | 4387 | 27.25 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 133.2 | 36.7 | 2990.3 | 27.25 |
phi-2 | INT4-MIXED | 1024 | 386.3 | 36.8 | 3174.5 | 27.17 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 137.2 | 37.1 | 2990 | 26.95 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 103.6 | 37.3 | 3486.2 | 26.81 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 327.5 | 37.5 | 3097.5 | 26.67 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 294.1 | 37.5 | 2987 | 26.67 |
phi-3-mini-128k-instruct | INT4-MIXED | 32 | 111.1 | 37.9 | 3660.5 | 26.39 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 117.6 | 38.1 | 3653.5 | 26.25 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 141.4 | 38.2 | 3618.9 | 26.18 |
afm-4.5b | INT4-MIXED | 1024 | 460.3 | 38.5 | 4232 | 25.97 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 424.5 | 38.8 | 3315.4 | 25.77 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 113.9 | 38.8 | 3654.6 | 25.77 |
phi-2 | INT4-MIXED | 32 | 137.2 | 38.9 | 3268.1 | 25.71 |
stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 387.4 | 39.1 | 3157.7 | 25.58 |
stablelm-3b-4e1t | INT4-MIXED | 1024 | 389.9 | 39.3 | 3153.9 | 25.45 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 152.4 | 39.8 | 3352.9 | 25.13 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 493.3 | 40.3 | 3249.6 | 24.81 |
phi-3-mini-128k-instruct | INT4-MIXED | 1024 | 514 | 40.8 | 3714.6 | 24.51 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 513.5 | 41.1 | 3715.5 | 24.33 |
stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 473.3 | 41.2 | 3488.4 | 24.27 |
phi-3-mini-128k-instruct | INT4-MIXED | 32 | 122.8 | 41.3 | 3632 | 24.21 |
phi-2 | INT4-MIXED | 1024 | 485.9 | 41.5 | 3414.5 | 24.10 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 510.5 | 41.7 | 3720.9 | 23.98 |
gemma-3-1b-it | FP16 | 32 | 99.7 | 42.2 | 3200.1 | 23.70 |
gemma-3-1b-it | FP16 | 1024 | 182.7 | 42.4 | 3328.3 | 23.58 |
phi-3-mini-128k-instruct | INT4-MIXED | 1024 | 653.9 | 43.4 | 3804 | 23.04 |
biomistral-7b-slerp | INT4-MIXED | 7 | 62.4 | 43.4 | 5201.3 | 23.04 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 106.1 | 43.7 | 3649 | 22.88 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 91.8 | 44.4 | 3332.6 | 22.52 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 139.2 | 44.7 | 3677.5 | 22.37 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 137.8 | 45.1 | 3531.1 | 22.17 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 134.2 | 45.4 | 3506.4 | 22.03 |
phi-2 | INT8-CW | 32 | 134.2 | 45.8 | 4346.6 | 21.83 |
stablelm-3b-4e1t | INT8-CW | 32 | 121.3 | 46.1 | 4345.1 | 21.69 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 328.6 | 46.2 | 3141.2 | 21.65 |
stable-zephyr-3b-dpo | INT8-CW | 32 | 123.9 | 46.2 | 4443.6 | 21.65 |
biomistral-7b-slerp | INT4-MIXED | 7 | 64.3 | 46.5 | 5410.6 | 21.51 |
internvl2-4b | INT4-MIXED | 297 | 358.2 | 47 | 4322.1 | 21.28 |
phi-4-mini-instruct | INT4-MIXED | 32 | 158.5 | 47.1 | 3432.6 | 21.23 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 667.6 | 47.4 | 3885 | 21.10 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 431.6 | 47.6 | 3574.9 | 21.01 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 675.8 | 48.1 | 3908.7 | 20.79 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 427.8 | 48.3 | 3675 | 20.70 |
stablelm-3b-4e1t | INT4-MIXED | 1024 | 413.4 | 48.5 | 3617.1 | 20.62 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 166.6 | 48.7 | 3856.7 | 20.53 |
phi-2 | INT8-CW | 1024 | 415.4 | 48.7 | 4383.8 | 20.53 |
internvl2-4b | INT4-MIXED | 297 | 397.2 | 48.9 | 4351.7 | 20.45 |
stable-zephyr-3b-dpo | INT8-CW | 1024 | 421 | 48.9 | 4539.5 | 20.45 |
stablelm-3b-4e1t | INT8-CW | 1024 | 414.7 | 48.9 | 4398.9 | 20.45 |
internvl2-4b | INT4-MIXED | 1027 | 804.1 | 49.1 | 4896 | 20.37 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 535.8 | 50.2 | 3751.2 | 19.92 |
phi-3.5-vision-instruct | INT4-MIXED | 802 | 707.5 | 50.3 | 5136.7 | 19.88 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 110.8 | 50.9 | 3521.5 | 19.65 |
afm-4.5b | INT8-CW | 32 | 71.5 | 50.9 | 5726.2 | 19.65 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 116.9 | 51.2 | 3512.7 | 19.53 |
llama-3.2-3b-instruct | INT8-CW | 32 | 90.7 | 51.2 | 4467 | 19.53 |
qwen2.5-coder-3b-instruct | INT8-CW | 32 | 136 | 51.5 | 4367.4 | 19.42 |
internvl2-4b | INT4-MIXED | 1027 | 884.3 | 51.7 | 4965.2 | 19.34 |
phi-4-mini-instruct | INT4-MIXED | 32 | 169.7 | 51.8 | 3998.2 | 19.31 |
phi-3.5-vision-instruct | INT4-MIXED | 1032 | 891.5 | 52.3 | 5107.8 | 19.12 |
llama-3.2-3b-instruct | INT8-CW | 1024 | 347.6 | 52.4 | 4581.6 | 19.08 |
qwen2.5-coder-3b-instruct | INT8-CW | 1024 | 407.7 | 52.6 | 4327 | 19.01 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 63.6 | 52.9 | 3224.7 | 18.90 |
afm-4.5b | INT8-CW | 1024 | 346.2 | 53 | 5770.5 | 18.87 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 571.5 | 53.5 | 3749.6 | 18.69 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 509.1 | 54.2 | 3751 | 18.45 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 500 | 54.4 | 3742.7 | 18.38 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 311.9 | 54.4 | 2949.9 | 18.38 |
phi-4-mini-instruct | INT4-MIXED | 32 | 148.9 | 55.1 | 3964.9 | 18.15 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 136.9 | 55.3 | 3968.1 | 18.08 |
deepseek-r1-distill-qwen-1.5b | FP16 | 32 | 89.6 | 55.7 | 4594.4 | 17.95 |
deepseek-r1-distill-qwen-1.5b | FP16 | 1024 | 250.5 | 56 | 4698.2 | 17.86 |
whisper-large-v3 | INT4-MIXED | 1024 | 566.5 | 57.1 | 3372.1 | 17.51 |
qwen2.5-1.5b-instruct | FP16 | 32 | 96.7 | 57.2 | 4045.2 | 17.48 |
whisper-large-v3 | INT8-CW | 1024 | 536.1 | 57.4 | 4018.2 | 17.42 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 437.4 | 57.5 | 3627.7 | 17.39 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 433.8 | 57.7 | 3625.9 | 17.33 |
qwen2.5-1.5b-instruct | FP16 | 1024 | 259.3 | 57.8 | 4158.7 | 17.30 |
llama-2-7b-chat-hf | INT4-MIXED | 32 | 109.5 | 57.8 | 5118.3 | 17.30 |
phi-3-mini-128k-instruct | INT8-CW | 32 | 123.6 | 57.9 | 5357.2 | 17.27 |
whisper-large-v3 | INT8-CW | 32 | 480.2 | 58 | 4230.1 | 17.24 |
phi-3-mini-4k-instruct | INT8-CW | 32 | 101.2 | 58 | 5355.9 | 17.24 |
phi-3.5-mini-instruct | INT8-CW | 32 | 108.6 | 58.2 | 5357.3 | 17.18 |
whisper-large-v3 | INT4-MIXED | 32 | 504.5 | 58.3 | 3718.6 | 17.15 |
gpt-j-6b | INT4-MIXED | 32 | 200 | 58.3 | 4869.3 | 17.15 |
gemma-3-4b-it | INT4-MIXED | 32 | 182.8 | 60.3 | 4828.2 | 16.58 |
phi-3.5-mini-instruct | INT8-CW | 1024 | 535.8 | 61.1 | 5325.4 | 16.37 |
phi-3-mini-128k-instruct | INT8-CW | 1024 | 541.2 | 61.2 | 5299 | 16.34 |
phi-3-mini-4k-instruct | INT8-CW | 1024 | 550.5 | 61.2 | 5286.9 | 16.34 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 114.6 | 61.3 | 5300.5 | 16.31 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 104.9 | 61.9 | 5231.7 | 16.16 |
gpt-j-6b | INT4-MIXED | 1024 | 715.2 | 62 | 5443.4 | 16.13 |
falcon-7b-instruct | INT4-MIXED | 32 | 114.4 | 62.1 | 5113.3 | 16.10 |
phi-4-mini-reasoning | INT8-CW | 32 | 137.9 | 62.4 | 4985 | 16.03 |
llama-2-7b-chat-hf | INT4-MIXED | 1024 | 718.7 | 62.7 | 5154.4 | 15.95 |
phi-4-mini-instruct | INT8-CW | 32 | 140.5 | 62.7 | 4986.5 | 15.95 |
falcon-7b-instruct | INT4-MIXED | 1024 | 740.1 | 63.2 | 4690.1 | 15.82 |
gemma-3-4b-it | INT4-MIXED | 1024 | 577.1 | 63.4 | 6729.3 | 15.77 |
gemma-3-4b-it | INT4-MIXED | 32 | 206.3 | 63.4 | 5024.3 | 15.77 |
phi-4-mini-reasoning | INT8-CW | 1024 | 458.7 | 63.7 | 5035.9 | 15.70 |
phi-4-mini-instruct | INT8-CW | 1024 | 458.9 | 63.8 | 5060.4 | 15.67 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 709.5 | 63.9 | 4931.5 | 15.65 |
flan-t5-xxl | INT4-MIXED | 33 | 296.5 | 63.9 | 13708.9 | 15.65 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 725.1 | 64 | 5022.1 | 15.63 |
internvl2-4b | INT8-CW | 297 | 371.8 | 64.1 | 6073.4 | 15.60 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 32 | 101 | 64.6 | 5443.8 | 15.48 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 98.2 | 64.9 | 5442.1 | 15.41 |
qwen2-7b-instruct | INT4-MIXED | 32 | 118.1 | 65 | 5537.5 | 15.38 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 104.9 | 65 | 5447 | 15.38 |
mistral-7b-instruct-v0.1 | INT4-MIXED | 32 | 113.7 | 65.3 | 5306.7 | 15.31 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 653.2 | 65.6 | 5474.1 | 15.24 |
gpt-j-6b | INT4-MIXED | 32 | 190 | 65.8 | 5243.2 | 15.20 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 652.2 | 65.9 | 5468.6 | 15.17 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 1024 | 640.4 | 65.9 | 5469.1 | 15.17 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 118.1 | 65.9 | 5271.6 | 15.17 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 128.5 | 65.9 | 5425.7 | 15.17 |
gemma-3-4b-it | INT4-MIXED | 1024 | 701.4 | 66.2 | 6928.2 | 15.11 |
whisper-large-v3 | FP16 | 1024 | 605.4 | 66.3 | 5200.1 | 15.08 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 108.8 | 66.4 | 5640.1 | 15.06 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 108 | 66.5 | 5636.3 | 15.04 |
llama-3-8b-instruct | INT4-MIXED | 32 | 112.8 | 66.6 | 5637.1 | 15.02 |
phi-3.5-vision-instruct | INT8-CW | 802 | 611.7 | 66.6 | 6908.6 | 15.02 |
qwen2-7b-instruct | INT4-MIXED | 1024 | 656.8 | 66.8 | 5562.2 | 14.97 |
mistral-7b-instruct-v0.1 | INT4-MIXED | 1025 | 998.7 | 67.4 | 5059.1 | 14.84 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 984.9 | 67.5 | 5183.8 | 14.81 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 1003.8 | 67.6 | 5090.7 | 14.79 |
minicpm-v-2_6 | INT4-MIXED | 228 | 911.7 | 67.6 | 6679.1 | 14.79 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 112.1 | 67.9 | 6095.5 | 14.73 |
minicpm-o-2_6 | INT4-MIXED | 238 | 910 | 68.1 | 6699.5 | 14.68 |
phi-3.5-vision-instruct | INT8-CW | 1032 | 791.4 | 68.2 | 6580.4 | 14.66 |
gpt-j-6b | INT4-MIXED | 1024 | 951.4 | 68.4 | 5791.6 | 14.62 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 32 | 113.3 | 68.4 | 6062 | 14.62 |
internvl2-4b | INT8-CW | 1027 | 795 | 68.4 | 6426.4 | 14.62 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 110.7 | 68.5 | 5993 | 14.60 |
gemma-3-4b-it | INT4-MIXED | 32 | 194.9 | 68.6 | 4971.6 | 14.58 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 722.8 | 68.7 | 5710.2 | 14.56 |
minicpm4-8b | INT4-MIXED | 32 | 110.5 | 68.7 | 5503.7 | 14.56 |
qwen2-7b-instruct | INT4-MIXED | 32 | 91.6 | 69 | 5996 | 14.49 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 721.8 | 69.3 | 5707.8 | 14.43 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 727.5 | 69.3 | 5706.1 | 14.43 |
qwen3-8b | INT4-MIXED | 32 | 146.4 | 69.3 | 5828.2 | 14.43 |
phi-4-multimodal-instruct | INT4-MIXED | 578 | 689 | 69.4 | 6106 | 14.41 |
minicpm3-4b | INT4-MIXED | 32 | 317.7 | 69.5 | 3764.5 | 14.39 |
phi-4-multimodal-instruct | INT4-MIXED | 786 | 755.7 | 69.6 | 6478.7 | 14.37 |
minicpm4-8b | INT4-MIXED | 1024 | 787.4 | 69.7 | 5498.9 | 14.35 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 1024 | 900.3 | 69.9 | 5722.3 | 14.31 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 886.8 | 70 | 5734.2 | 14.29 |
qwen2-7b-instruct | INT4-MIXED | 1024 | 896.1 | 70 | 5624.3 | 14.29 |
flan-t5-xxl | INT4-MIXED | 1139 | 321.1 | 70.2 | 14929 | 14.25 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 905.2 | 70.3 | 5631.7 | 14.22 |
llama-3-8b-instruct | INT4-MIXED | 32 | 122.7 | 70.5 | 6107.2 | 14.18 |
phi-4-multimodal-instruct | INT4-MIXED | 1362 | 1590.6 | 70.6 | 8154 | 14.16 |
whisper-large-v3 | FP16 | 32 | 551.2 | 70.7 | 5248.6 | 14.14 |
phi-4-multimodal-instruct | INT4-MIXED | 1570 | 1790 | 70.9 | 8750.6 | 14.10 |
falcon-7b-instruct | INT4-MIXED | 32 | 119 | 71.5 | 5468 | 13.99 |
qwen3-8b | INT4-MIXED | 1024 | 773.6 | 71.7 | 5889.9 | 13.95 |
minicpm-v-2_6 | INT4-MIXED | 228 | 992 | 71.9 | 6931.3 | 13.91 |
minicpm-o-2_6 | INT4-MIXED | 238 | 997.4 | 72.7 | 6860.1 | 13.76 |
falcon-7b-instruct | INT4-MIXED | 1024 | 976.5 | 72.8 | 5105.8 | 13.74 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 990.2 | 73.2 | 5868.2 | 13.66 |
qwen3-8b | INT4-MIXED | 32 | 136.5 | 73.4 | 6403.3 | 13.62 |
gemma-3-4b-it | INT4-MIXED | 1024 | 589.2 | 73.9 | 6873.6 | 13.53 |
minicpm4-8b | INT4-MIXED | 32 | 129.9 | 74 | 6026.1 | 13.51 |
minicpm-v-4_5 | INT4-MIXED | 217 | 987.9 | 74.3 | 7092.6 | 13.46 |
gemma-3-4b-it | INT8-CW | 32 | 195.4 | 74.5 | 6371.2 | 13.42 |
minicpm4-8b | INT4-MIXED | 1024 | 1056.3 | 74.7 | 5670.6 | 13.39 |
qwen3-8b | INT4-MIXED | 1024 | 1038.2 | 75.9 | 6136.9 | 13.18 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 95.8 | 76.2 | 5917.6 | 13.12 |
minicpm3-4b | INT4-MIXED | 32 | 360.4 | 76.4 | 3890.8 | 13.09 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 672 | 77.6 | 5651.2 | 12.89 |
minicpm-v-4_5 | INT4-MIXED | 217 | 1033.9 | 77.7 | 7334.1 | 12.87 |
minicpm3-4b | INT4-MIXED | 32 | 321.5 | 78.2 | 4062.2 | 12.79 |
gemma-7b-it | INT4-MIXED | 32 | 113.7 | 78.6 | 5804.4 | 12.72 |
minicpm3-4b | INT4-MIXED | 1024 | 832.4 | 78.7 | 4366.5 | 12.71 |
gemma-3-4b-it | INT8-CW | 1024 | 607.1 | 79.7 | 8300 | 12.55 |
biomistral-7b-slerp | INT8-CW | 7 | 110.1 | 80.5 | 8559.4 | 12.42 |
llama-2-7b-chat-hf | INT4-MIXED | 32 | 114.8 | 81.6 | 5220.3 | 12.25 |
gemma-7b-it | INT4-MIXED | 1024 | 854.1 | 82 | 6099.7 | 12.20 |
minicpm3-4b | INT8-CW | 32 | 346.1 | 82.2 | 5664.8 | 12.17 |
llava-next-video-7b-hf | INT4-MIXED | 2945 | 3976.6 | 83.2 | 8722.1 | 12.02 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 133.8 | 83.2 | 7029.4 | 12.02 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 32 | 355.2 | 83.3 | 6747 | 12.00 |
gemma-7b-it | INT4-MIXED | 32 | 122.3 | 83.4 | 6492.6 | 11.99 |
phi-4-multimodal-instruct | INT8-CW | 786 | 797.7 | 83.7 | 8287.3 | 11.95 |
phi-4-multimodal-instruct | INT8-CW | 578 | 711.3 | 83.7 | 7914.3 | 11.95 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 145.3 | 83.9 | 7063.8 | 11.92 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 99.9 | 84.2 | 5326.9 | 11.88 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 90.9 | 84.2 | 5831.2 | 11.88 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 105 | 84.3 | 5431.5 | 11.86 |
phi-4-multimodal-instruct | INT8-CW | 1362 | 1655.9 | 84.6 | 10029.6 | 11.82 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 1024 | 904.2 | 84.7 | 7499.3 | 11.81 |
phi-4-multimodal-instruct | INT4-MIXED | 786 | 960.4 | 85 | 7092.1 | 11.76 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 926.6 | 85.2 | 6609.2 | 11.74 |
phi-4-multimodal-instruct | INT8-CW | 1570 | 1853.7 | 85.2 | 10678 | 11.74 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 288.7 | 85.5 | 6469.5 | 11.70 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 952.7 | 85.7 | 6557.9 | 11.67 |
phi-4-multimodal-instruct | INT4-MIXED | 578 | 783.1 | 85.8 | 6684.5 | 11.66 |
glm-4-9b-chat-hf | INT4-MIXED | 1024 | 952.9 | 86.1 | 6706.6 | 11.61 |
llama-2-7b-chat-hf | INT4-MIXED | 1024 | 700.5 | 86.2 | 5083.9 | 11.60 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 650.9 | 86.2 | 5441.7 | 11.60 |
llava-next-video-7b-hf | INT4-MIXED | 2945 | 4458.6 | 86.4 | 9034.3 | 11.57 |
phi-4-multimodal-instruct | INT4-MIXED | 1570 | 2150.6 | 86.7 | 9445.3 | 11.53 |
phi-4-multimodal-instruct | INT4-MIXED | 1362 | 1946 | 86.7 | 8733.7 | 11.53 |
minicpm-v-2_6 | INT4-MIXED | 228 | 878.4 | 86.7 | 6772.8 | 11.53 |
minicpm3-4b | INT4-MIXED | 1024 | 944.6 | 86.8 | 4420.1 | 11.52 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 715.1 | 86.9 | 4909.6 | 11.51 |
minicpm3-4b | INT4-MIXED | 1024 | 859.7 | 87.1 | 4331.2 | 11.48 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 716 | 87.1 | 4908 | 11.48 |
gemma-7b-it | INT4-MIXED | 1024 | 1126.4 | 87.5 | 6346.7 | 11.43 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 32 | 351.1 | 87.6 | 6938.1 | 11.42 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 293.3 | 87.7 | 7160 | 11.40 |
gemma-2-9b-it | INT4-MIXED | 32 | 166.7 | 87.9 | 6266.3 | 11.38 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 1024 | 1113.9 | 89.1 | 7658.6 | 11.22 |
stable-diffusion-xl-1.0-inpainting-0.1 | INT8-CW | 32 | 90.4 | 89.1 | 6910.7 | 11.22 |
llama-3-8b-instruct | INT4-MIXED | 32 | 112.5 | 89.3 | 6208.5 | 11.20 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 109.5 | 89.4 | 5976.8 | 11.19 |
llama-3-8b-instruct | INT4-MIXED | 32 | 110.5 | 89.9 | 6199.6 | 11.12 |
phi-2 | FP16 | 32 | 127.7 | 90 | 7000.7 | 11.11 |
minicpm3-4b | INT8-CW | 1024 | 892.6 | 90.3 | 6200.6 | 11.07 |
glm-4-9b-chat-hf | INT4-MIXED | 1024 | 1273 | 91.4 | 7041.1 | 10.94 |
stablelm-3b-4e1t | FP16 | 32 | 136 | 91.7 | 6671.3 | 10.91 |
stable-zephyr-3b-dpo | FP16 | 32 | 148.2 | 92 | 7007.2 | 10.87 |
gemma-2-9b-it | INT4-MIXED | 1024 | 982.1 | 92 | 6411.7 | 10.87 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 724.3 | 92.5 | 5694.5 | 10.81 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 721.3 | 92.5 | 5689.5 | 10.81 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 724.7 | 92.7 | 5695.4 | 10.79 |
baichuan2-13b-chat | INT4-MIXED | 32 | 140.4 | 92.7 | 9190.6 | 10.79 |
gemma-2-9b-it | INT4-MIXED | 32 | 173.3 | 93 | 7051.7 | 10.75 |
gpt-j-6b | INT8-CW | 32 | 177.4 | 93 | 7473.9 | 10.75 |
minicpm4-8b | INT4-MIXED | 32 | 107 | 93.9 | 5970.6 | 10.65 |
phi-2 | FP16 | 1024 | 605.6 | 94.5 | 6967.8 | 10.58 |
stable-diffusion-xl-1.0-inpainting-0.1 | INT8-CW | 32 | 96.3 | 94.8 | 6988.7 | 10.55 |
minicpm4-8b | INT4-MIXED | 1024 | 791.8 | 96.1 | 5588 | 10.41 |
stable-zephyr-3b-dpo | FP16 | 1024 | 623.4 | 96.3 | 6975.3 | 10.38 |
stablelm-3b-4e1t | FP16 | 1024 | 616.7 | 96.3 | 6882.2 | 10.38 |
gpt-j-6b | INT8-CW | 1024 | 749.8 | 96.7 | 7964 | 10.34 |
gemma-2-9b-it | INT4-MIXED | 1024 | 1282.7 | 96.8 | 6753.7 | 10.33 |
llama-2-7b-chat-hf | INT8-CW | 32 | 133.7 | 97.8 | 8213.7 | 10.22 |
baichuan2-13b-chat | INT4-MIXED | 1024 | 2432.6 | 99.8 | 12861.6 | 10.02 |
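In the rows above, the `2nd token per sec` column is consistent with 1000 divided by `2nd latency (ms)`, rounded to two decimals (for example, 27 ms gives 37.04 tokens/s). The Python sketch below parses a few rows copied verbatim from the table, checks that relationship, and ranks the rows that fit under a memory budget. The helper names (`tokens_per_sec`, `parse_row`, `fastest_within`) are hypothetical and not part of OpenVINO. The budget is compared in whatever unit the `max rss memory` column uses, which this section does not state.

```python
# Sketch: parse rows copied from the table above and relate the latency and
# throughput columns. Helper names are hypothetical, not part of OpenVINO.
SAMPLE = """\
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 165.5 | 27 | 2263.4 | 37.04 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 103.1 | 36.2 | 3009.3 | 27.62 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 114.6 | 61.3 | 5300.5 | 16.31 |
gemma-2-9b-it | INT4-MIXED | 32 | 166.7 | 87.9 | 6266.3 | 11.38 |
"""


def tokens_per_sec(second_latency_ms: float) -> float:
    """Convert the per-token (2nd token) latency in ms to tokens per second."""
    return 1000.0 / second_latency_ms


def parse_row(line: str) -> dict:
    """Split one pipe-delimited table row into typed fields."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    topology, precision, input_size, first_ms, second_ms, max_rss, tps = cells
    return {
        "topology": topology,
        "precision": precision,
        "input_size": input_size,  # kept as text: some rows use "prompt0"
        "first_latency_ms": float(first_ms),
        "second_latency_ms": float(second_ms),
        "max_rss": float(max_rss),  # same unit as the table column
        "tokens_per_sec": float(tps),
    }


def fastest_within(rows: list, max_rss_budget: float) -> list:
    """Return rows whose max RSS fits the budget, fastest generation first."""
    fitting = [r for r in rows if r["max_rss"] <= max_rss_budget]
    return sorted(fitting, key=lambda r: r["tokens_per_sec"], reverse=True)


rows = [parse_row(line) for line in SAMPLE.splitlines() if line.strip()]

for r in rows:
    # The reported throughput should match 1000 / latency up to rounding.
    assert abs(tokens_per_sec(r["second_latency_ms"]) - r["tokens_per_sec"]) < 0.01

for r in fastest_within(rows, 4000):
    print(f'{r["topology"]:<28} {r["precision"]:<11} {r["tokens_per_sec"]:.2f} tok/s')
```

With a budget of 4000, only the qwen2.5-1.5b-instruct and llama-3.2-3b-instruct rows qualify, and they print in that order. Ranking by the throughput column rather than by 1st latency reflects that, for long generations, per-token speed usually dominates total time; for short answers to long prompts, the 1st latency column matters more.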
Topology | Precision | Input Size | 1st latency (ms) | 2nd latency (ms) | max RSS memory | 2nd token throughput (tokens/s) |
|---|---|---|---|---|---|---|
t5-small | FP16 | 1024 | 10.1 | 5.2 | 1241.1 | 192.31 |
t5-small | INT4-MIXED | 1024 | 11.5 | 5.2 | 1083.7 | 192.31 |
t5-small | INT4-MIXED | 1024 | 11.8 | 5.3 | 1071.5 | 188.68 |
t5-small | INT8-CW | 1024 | 12 | 5.3 | 1120.9 | 188.68 |
t5-small | INT4-MIXED | 1024 | 11.8 | 5.4 | 1086.6 | 185.19 |
t5-small | INT4-MIXED | 32 | 10.7 | 5.5 | 933.7 | 181.82 |
t5-small | INT4-MIXED | 32 | 11.2 | 5.6 | 946.9 | 178.57 |
t5-small | INT4-MIXED | 32 | 11 | 5.6 | 922.1 | 178.57 |
t5-small | INT8-CW | 32 | 11.4 | 5.6 | 971.3 | 178.57 |
t5-small | FP16 | 32 | 10.3 | 6 | 1090.7 | 166.67 |
distil-large-v2 | INT4-MIXED | 1024 | 210.2 | 6 | 1573.9 | 166.67 |
distil-large-v2 | INT4-MIXED | 32 | 163.7 | 6 | 1542.2 | 166.67 |
whisper-large-v3-turbo | INT8-CW | 1024 | 222.8 | 6.3 | 1945.7 | 158.73 |
whisper-large-v3-turbo | INT4-MIXED | 1024 | 223.2 | 6.4 | 1672.6 | 156.25 |
whisper-large-v3-turbo | INT4-MIXED | 32 | 176 | 6.4 | 1640.5 | 156.25 |
whisper-large-v3-turbo | INT8-CW | 32 | 175.1 | 6.4 | 1913.2 | 156.25 |
minicpm4-0.5b | INT4-MIXED | 32 | 28.2 | 7.2 | 1075.4 | 138.89 |
gemma-3-270m | INT4-MIXED | 32 | 28.2 | 7.3 | 1130.4 | 136.99 |
minicpm4-0.5b | INT4-MIXED | 32 | 28 | 7.3 | 1172.6 | 136.99 |
distil-large-v2 | INT8-CW | 1024 | 217.9 | 7.3 | 1829.7 | 136.99 |
minicpm4-0.5b | INT4-MIXED | 1024 | 44.1 | 7.4 | 1127.7 | 135.14 |
gemma-3-270m | INT8-CW | 32 | 30.4 | 7.4 | 1164.5 | 135.14 |
gemma-3-270m | INT4-MIXED | 1024 | 33.6 | 7.5 | 1209 | 133.33 |
minicpm4-0.5b | INT4-MIXED | 1024 | 47 | 7.5 | 1200.7 | 133.33 |
gemma-3-270m | INT8-CW | 1024 | 34.6 | 7.5 | 1224.4 | 133.33 |
distil-large-v2 | INT8-CW | 32 | 168.2 | 7.5 | 1797.2 | 133.33 |
minicpm4-0.5b | INT8-CW | 32 | 29.8 | 7.6 | 1338.9 | 131.58 |
minicpm4-0.5b | INT4-MIXED | 32 | 29.4 | 7.7 | 1207.3 | 129.87 |
minicpm4-0.5b | INT4-MIXED | 1024 | 52.9 | 7.9 | 1260.4 | 126.58 |
minicpm4-0.5b | INT8-CW | 1024 | 51.6 | 7.9 | 1387.5 | 126.58 |
distil-large-v2 | FP16 | 1024 | 207.7 | 8.2 | 2612.5 | 121.95 |
whisper-large-v3-turbo | FP16 | 1024 | 219.6 | 8.2 | 2647.9 | 121.95 |
distil-large-v2 | FP16 | 32 | 161.4 | 8.2 | 2580.9 | 121.95 |
whisper-large-v3-turbo | FP16 | 32 | 170.7 | 8.2 | 2617.9 | 121.95 |
gemma-3-270m | FP16 | 32 | 24.6 | 8.3 | 1406.2 | 120.48 |
gemma-3-270m | FP16 | 1024 | 30.5 | 8.4 | 1496.9 | 119.05 |
llama-3.2-1b-instruct | INT4-MIXED | 32 | 20.7 | 9.4 | 1565.7 | 106.38 |
llama-3.2-1b-instruct | INT4-MIXED | 32 | 23.1 | 9.6 | 1585.4 | 104.17 |
llama-3.2-1b-instruct | INT4-MIXED | 1024 | 83.9 | 10 | 1657.4 | 100 |
llama-3.2-1b-instruct | INT4-MIXED | 1024 | 94.9 | 10.1 | 1687.6 | 99.01 |
whisper-small | INT4-MIXED | 32 | 93.3 | 10.2 | 1342.4 | 98.04 |
whisper-small | INT4-MIXED | 1024 | 138.1 | 10.4 | 1374.8 | 96.15 |
whisper-small | INT4-MIXED | 1024 | 136.8 | 10.5 | 1390.8 | 95.24 |
whisper-small | INT8-CW | 1024 | 141.5 | 10.6 | 1481 | 94.34 |
whisper-small | FP16 | 1024 | 128.8 | 10.8 | 1625.5 | 92.59 |
whisper-small | FP16 | 32 | 86 | 10.8 | 1593.2 | 92.59 |
whisper-small | INT4-MIXED | 32 | 96.1 | 10.8 | 1359.1 | 92.59 |
whisper-small | INT4-MIXED | 1024 | 141.2 | 10.9 | 1424.6 | 91.74 |
whisper-small | INT8-CW | 32 | 97.2 | 10.9 | 1448.7 | 91.74 |
whisper-small | INT4-MIXED | 32 | 96.8 | 11.2 | 1394.3 | 89.29 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 35.1 | 11.2 | 1921.3 | 89.29 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 35.1 | 11.2 | 1787.9 | 89.29 |
gemma-3-1b-it | INT4-MIXED | 32 | 37.1 | 11.3 | 1552.3 | 88.5 |
minicpm4-0.5b | FP16 | 32 | 25.2 | 11.4 | 1740.4 | 87.72 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 33.5 | 11.5 | 1747.7 | 86.96 |
gemma-3-1b-it | INT4-MIXED | 1024 | 83 | 11.6 | 1658 | 86.21 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 33.8 | 11.6 | 2062.1 | 86.21 |
gemma-3-1b-it | INT4-MIXED | 32 | 40.5 | 11.7 | 1689.6 | 85.47 |
minicpm4-0.5b | FP16 | 1024 | 56.6 | 11.8 | 1787.2 | 84.75 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 123.8 | 11.9 | 1879.8 | 84.03 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 123 | 12 | 2009.9 | 83.33 |
gemma-3-1b-it | INT4-MIXED | 1024 | 90.7 | 12 | 1800.9 | 83.33 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 134.5 | 12.2 | 1901.6 | 81.97 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 134.8 | 12.3 | 2221.6 | 81.3 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 36.5 | 12.5 | 1858.5 | 80 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 141.2 | 13.2 | 2023.2 | 75.76 |
gemma-3-1b-it | INT8-CW | 32 | 40 | 13.5 | 1890.8 | 74.07 |
gemma-3-1b-it | INT8-CW | 1024 | 93.7 | 14.3 | 1981.7 | 69.93 |
llama-3.2-1b-instruct | INT8-CW | 32 | 22.9 | 14.4 | 2083.3 | 69.44 |
smolvlm2-256m-video-instruct | INT4-MIXED | 1141 | 385.3 | 14.7 | 2877.5 | 68.03 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 37.2 | 14.9 | 2405 | 67.11 |
llama-3.2-1b-instruct | INT8-CW | 1024 | 101 | 15.2 | 2199.2 | 65.79 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 150 | 15.6 | 2542 | 64.1 |
smolvlm2-256m-video-instruct | FP16 | 1141 | 376.9 | 16 | 3507.1 | 62.5 |
smolvlm2-256m-video-instruct | INT8-CW | 1141 | 385.3 | 17.1 | 2955.8 | 58.48 |
phi-2 | INT4-MIXED | 32 | 51.1 | 17.6 | 2380.5 | 56.82 |
deepseek-r1-distill-qwen-1.5b | INT8-CW | 32 | 37.6 | 17.8 | 2557.2 | 56.18 |
qwen2.5-1.5b-instruct | INT8-CW | 32 | 38 | 17.8 | 2333 | 56.18 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 51 | 17.9 | 2373.7 | 55.87 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 50.9 | 17.9 | 2374.2 | 55.87 |
qwen2.5-1.5b-instruct | INT8-CW | 32 | 37.5 | 18.1 | 2326.4 | 55.25 |
deepseek-r1-distill-qwen-1.5b | INT8-CW | 1024 | 129 | 18.6 | 2652.9 | 53.76 |
qwen2.5-1.5b-instruct | INT8-CW | 1024 | 129.5 | 18.6 | 2426.7 | 53.76 |
qwen2.5-1.5b-instruct | INT8-CW | 1024 | 130.6 | 19 | 2429.7 | 52.63 |
phi-2 | INT4-MIXED | 32 | 53.1 | 19.4 | 2637.3 | 51.55 |
phi-2 | INT4-MIXED | 1024 | 269.4 | 19.8 | 2915.8 | 50.51 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 43.9 | 19.8 | 2499.2 | 50.51 |
stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 263.6 | 19.9 | 2892.7 | 50.25 |
stablelm-3b-4e1t | INT4-MIXED | 1024 | 264.6 | 20 | 2893.3 | 50 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 59.1 | 20.3 | 2648.9 | 49.26 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 44.7 | 20.4 | 2593.3 | 49.02 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 35.4 | 20.9 | 2688.6 | 47.85 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 191.1 | 21.3 | 2630.8 | 46.95 |
phi-2 | INT4-MIXED | 1024 | 334.4 | 21.5 | 3167.1 | 46.51 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 36 | 21.6 | 2682.9 | 46.3 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 49.9 | 21.6 | 2863.6 | 46.3 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 203.2 | 21.7 | 2729.2 | 46.08 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 37.6 | 21.8 | 2825.1 | 45.87 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 59.2 | 21.9 | 2768.3 | 45.66 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 204.6 | 22.4 | 2922.2 | 44.64 |
stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 303.2 | 22.5 | 3163.8 | 44.44 |
phi-3-mini-128k-instruct | INT4-MIXED | 32 | 43 | 22.6 | 2924.7 | 44.25 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 40.2 | 22.6 | 2826.3 | 44.25 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 40.6 | 22.6 | 2826 | 44.25 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 347.2 | 23 | 2992.1 | 43.48 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 214.6 | 23.1 | 2892 | 43.29 |
gemma-3-1b-it | FP16 | 32 | 38.6 | 23.3 | 2804 | 42.92 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 243.3 | 23.4 | 3045.1 | 42.74 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 39.3 | 23.4 | 3019 | 42.74 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 39.7 | 23.4 | 2934.4 | 42.74 |
gemma-3-1b-it | FP16 | 1024 | 101.9 | 23.8 | 2955.3 | 42.02 |
phi-3-mini-128k-instruct | INT4-MIXED | 32 | 40.9 | 23.8 | 3095.1 | 42.02 |
stablelm-3b-4e1t | INT4-MIXED | 1024 | 293.3 | 24.2 | 3302.8 | 41.32 |
qwen2.5-1.5b-instruct | INT8-CW | 32 | 77.9 | 24.3 | 2772.6 | 41.15 |
qwen2.5-1.5b-instruct | INT8-CW | 1024 | 179 | 25.2 | 2905.3 | 39.68 |
phi-3-mini-128k-instruct | INT4-MIXED | 1024 | 312.2 | 25.3 | 3555.4 | 39.53 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 313.9 | 25.4 | 3456.9 | 39.37 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 313.2 | 25.4 | 3456.4 | 39.37 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 40.6 | 25.4 | 3249.5 | 39.37 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 44.4 | 25.6 | 3258.6 | 39.06 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 321.1 | 26 | 3630.2 | 38.46 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 320.3 | 26.2 | 3568 | 38.17 |
internvl2-4b | INT4-MIXED | 297 | 257.2 | 26.5 | 4284.6 | 37.74 |
phi-3-mini-128k-instruct | INT4-MIXED | 1024 | 410.7 | 26.6 | 3710.3 | 37.59 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 56.1 | 26.8 | 3033.1 | 37.31 |
afm-4.5b | INT4-MIXED | 32 | 45.8 | 26.9 | 3687.3 | 37.17 |
phi-4-mini-instruct | INT4-MIXED | 32 | 54.6 | 26.9 | 3132.7 | 37.17 |
phi-4-mini-instruct | INT4-MIXED | 32 | 56 | 27.5 | 3217 | 36.36 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 54.8 | 27.6 | 3214.8 | 36.23 |
internvl2-4b | INT4-MIXED | 297 | 255.6 | 27.8 | 4409.4 | 35.97 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 57 | 27.8 | 3273.6 | 35.97 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 446.2 | 28.2 | 3864.6 | 35.46 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 475.6 | 28.2 | 3892.1 | 35.46 |
afm-4.5b | INT4-MIXED | 1024 | 318.2 | 28.5 | 3878.3 | 35.09 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 321.8 | 28.6 | 3314.2 | 34.97 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 319.6 | 28.8 | 3413.5 | 34.72 |
whisper-large-v3 | INT4-MIXED | 32 | 281.9 | 29 | 2986.4 | 34.48 |
gemma-3-4b-it | INT4-MIXED | 32 | 91.1 | 29.2 | 4496.5 | 34.25 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 327 | 29.3 | 3503.6 | 34.13 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 328 | 29.3 | 3499.2 | 34.13 |
internvl2-4b | INT4-MIXED | 1027 | 611.1 | 29.4 | 5603.2 | 34.01 |
phi-4-mini-instruct | INT4-MIXED | 32 | 62.4 | 29.4 | 3431.4 | 34.01 |
llama-3.2-1b-instruct | FP16 | 32 | 34.3 | 29.5 | 3157.9 | 33.9 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 352.5 | 29.6 | 3543.1 | 33.78 |
whisper-large-v3 | INT4-MIXED | 1024 | 331.9 | 29.8 | 3018.8 | 33.56 |
llama-3.2-1b-instruct | FP16 | 1024 | 129.4 | 30.1 | 3282.3 | 33.22 |
phi-3.5-vision-instruct | INT4-MIXED | 802 | 525.8 | 30.1 | 5356 | 33.22 |
gemma-3-4b-it | INT4-MIXED | 32 | 91.4 | 30.2 | 4559 | 33.11 |
gemma-3-4b-it | INT4-MIXED | 32 | 94.7 | 30.4 | 4648.3 | 32.89 |
whisper-large-v3 | INT8-CW | 32 | 285.1 | 30.4 | 3575.6 | 32.89 |
internvl2-4b | INT4-MIXED | 1027 | 619.4 | 30.7 | 5564.1 | 32.57 |
phi-2 | INT8-CW | 32 | 57.3 | 30.7 | 3590.7 | 32.57 |
stable-zephyr-3b-dpo | INT8-CW | 32 | 57.3 | 30.8 | 3598.6 | 32.47 |
stablelm-3b-4e1t | INT8-CW | 32 | 56.8 | 30.8 | 3598 | 32.47 |
phi-3.5-vision-instruct | INT4-MIXED | 1032 | 616.4 | 30.9 | 5747.6 | 32.36 |
gemma-3-4b-it | INT4-MIXED | 1024 | 418.3 | 30.9 | 6833.3 | 32.36 |
whisper-large-v3 | INT8-CW | 1024 | 335.6 | 31 | 3608.1 | 32.26 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 412.1 | 31.2 | 3719.6 | 32.05 |
gemma-3-4b-it | INT4-MIXED | 1024 | 433.9 | 31.9 | 6870.8 | 31.35 |
gemma-3-4b-it | INT4-MIXED | 1024 | 456.1 | 32.2 | 6944.3 | 31.06 |
deepseek-r1-distill-qwen-1.5b | FP16 | 32 | 36.8 | 32.7 | 4206.2 | 30.58 |
qwen2.5-1.5b-instruct | FP16 | 32 | 37.4 | 32.7 | 3760.8 | 30.58 |
phi-2 | INT8-CW | 1024 | 295.4 | 33 | 4124.2 | 30.3 |
stable-zephyr-3b-dpo | INT8-CW | 1024 | 300.6 | 33 | 4109.1 | 30.3 |
stablelm-3b-4e1t | INT8-CW | 1024 | 299.4 | 33 | 4108.8 | 30.3 |
qwen2.5-coder-3b-instruct | INT8-CW | 32 | 49.2 | 33 | 3938.6 | 30.3 |
deepseek-r1-distill-qwen-1.5b | FP16 | 1024 | 142.9 | 33.4 | 4330.3 | 29.94 |
qwen2.5-1.5b-instruct | FP16 | 1024 | 142.3 | 33.4 | 3902.2 | 29.94 |
gpt-j-6b | INT4-MIXED | 32 | 68.8 | 34.3 | 4086.5 | 29.15 |
qwen2.5-coder-3b-instruct | INT8-CW | 1024 | 232.3 | 34.5 | 4056.5 | 28.99 |
gpt-oss-20b | INT4-MIXED | 32 | 205.1 | 34.8 | 13043.3 | 28.74 |
phi-4-multimodal-instruct | INT4-MIXED | 578 | 488.3 | 35 | 5791.7 | 28.57 |
llama-2-7b-chat-hf | INT4-MIXED | 32 | 44.6 | 35 | 4386.6 | 28.57 |
phi-4-multimodal-instruct | INT4-MIXED | 786 | 583 | 35.2 | 6813 | 28.41 |
flan-t5-xxl | INT4-MIXED | 33 | 58.5 | 35.7 | 12841.3 | 28.01 |
whisper-large-v3 | FP16 | 32 | 264.3 | 35.8 | 4801.7 | 27.93 |
whisper-large-v3 | FP16 | 1024 | 313.1 | 36 | 4831.8 | 27.78 |
llama-3.2-3b-instruct | INT8-CW | 32 | 44.6 | 36.3 | 4041.3 | 27.55 |
llama-2-7b-chat-hf | INT4-MIXED | 32 | 53.2 | 36.4 | 4482 | 27.47 |
gpt-oss-20b | INT4-MIXED | 1024 | 617.9 | 36.5 | 13270.2 | 27.4 |
phi-4-multimodal-instruct | INT4-MIXED | 1362 | 1215.4 | 36.7 | 8440 | 27.25 |
phi-4-multimodal-instruct | INT4-MIXED | 1570 | 1355.5 | 36.9 | 9162.2 | 27.1 |
biomistral-7b-slerp | INT4-MIXED | 7 | 40.9 | 37 | 4532.7 | 27.03 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 48.6 | 37.1 | 4547.3 | 26.95 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 48.7 | 37.5 | 4445.1 | 26.67 |
llama-3.2-3b-instruct | INT8-CW | 1024 | 253.8 | 37.8 | 4271.3 | 26.46 |
gpt-j-6b | INT4-MIXED | 1024 | 437.3 | 38 | 5244.8 | 26.32 |
minicpm3-4b | INT4-MIXED | 32 | 166 | 38.3 | 3267.7 | 26.11 |
llama-2-7b-chat-hf | INT4-MIXED | 1024 | 395.1 | 38.5 | 5103.6 | 25.97 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 57.3 | 38.5 | 4656.4 | 25.97 |
falcon-7b-instruct | INT4-MIXED | 32 | 51.1 | 38.6 | 4287.7 | 25.91 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 56.4 | 38.6 | 4741.8 | 25.91 |
minicpm3-4b | INT4-MIXED | 32 | 170.1 | 38.9 | 3440.6 | 25.71 |
biomistral-7b-slerp | INT4-MIXED | 7 | 43.5 | 39.3 | 4837.6 | 25.45 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 409.5 | 39.4 | 4852.2 | 25.38 |
gpt-j-6b | INT4-MIXED | 32 | 85.3 | 39.4 | 4591.2 | 25.38 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 56.9 | 39.5 | 4762.1 | 25.32 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 59 | 39.5 | 4860.3 | 25.32 |
mistral-7b-instruct-v0.1 | INT4-MIXED | 32 | 58.7 | 39.8 | 4855 | 25.13 |
flan-t5-xxl | INT4-MIXED | 1139 | 182.2 | 40 | 14929.6 | 25 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 420.1 | 40 | 4750.5 | 25 |
minicpm3-4b | INT4-MIXED | 1024 | 716.5 | 40.2 | 4449.7 | 24.88 |
falcon-7b-instruct | INT4-MIXED | 1024 | 437.6 | 40.3 | 4435.2 | 24.81 |
llama-2-7b-chat-hf | INT4-MIXED | 1024 | 421.6 | 40.4 | 5209.4 | 24.75 |
minicpm3-4b | INT4-MIXED | 32 | 168.7 | 40.4 | 3554.8 | 24.75 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 52.1 | 40.5 | 5017.2 | 24.69 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 52.3 | 40.7 | 5017.3 | 24.57 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 32 | 51.6 | 40.7 | 5017.7 | 24.57 |
minicpm3-4b | INT4-MIXED | 1024 | 735.7 | 40.8 | 4643.9 | 24.51 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 52.6 | 40.9 | 5222.6 | 24.45 |
llama-3-8b-instruct | INT4-MIXED | 32 | 56.3 | 40.9 | 5224.6 | 24.45 |
phi-4-multimodal-instruct | INT4-MIXED | 578 | 552.2 | 41 | 6789.7 | 24.39 |
qwen2-7b-instruct | INT4-MIXED | 32 | 51.3 | 41 | 5016.3 | 24.39 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 55.4 | 41.1 | 5224.9 | 24.33 |
stable-diffusion-xl-1.0-inpainting-0.1 | INT8-CW | 32 | 43.1 | 41.1 | 6711.7 | 24.33 |
phi-4-multimodal-instruct | INT4-MIXED | 786 | 682 | 41.2 | 7263.8 | 24.27 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 432.9 | 41.6 | 4938.8 | 24.04 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 449.3 | 41.7 | 5024.4 | 23.98 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 56.1 | 41.8 | 5213.4 | 23.92 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 507.1 | 41.9 | 5058.1 | 23.87 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 56.4 | 41.9 | 5213.8 | 23.87 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 391.5 | 42 | 5241.3 | 23.81 |
minicpm3-4b | INT4-MIXED | 1024 | 764.2 | 42 | 4754.4 | 23.81 |
phi-3.5-mini-instruct | INT8-CW | 32 | 55 | 42 | 4559.3 | 23.81 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 383.3 | 42.1 | 5239.5 | 23.75 |
phi-3-mini-128k-instruct | INT8-CW | 32 | 55 | 42.1 | 4651.2 | 23.75 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 503.9 | 42.2 | 5154.9 | 23.7 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 1024 | 390.3 | 42.2 | 5240.3 | 23.7 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 32 | 58.9 | 42.2 | 5417.2 | 23.7 |
phi-3-mini-4k-instruct | INT8-CW | 32 | 55.1 | 42.2 | 4646.7 | 23.7 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 56.6 | 42.3 | 5422.3 | 23.64 |
qwen2-7b-instruct | INT4-MIXED | 32 | 56.7 | 42.3 | 5330.2 | 23.64 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 60.8 | 42.4 | 5519.2 | 23.58 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 56.9 | 42.4 | 5419.4 | 23.58 |
llama-3-8b-instruct | INT4-MIXED | 32 | 60.8 | 42.5 | 5427.2 | 23.53 |
llama-3-8b-instruct | INT4-MIXED | 32 | 60 | 42.5 | 5426.2 | 23.53 |
qwen3-8b | INT4-MIXED | 32 | 58 | 42.5 | 5415.5 | 23.53 |
phi-4-multimodal-instruct | INT4-MIXED | 1570 | 1542.7 | 42.7 | 9587.1 | 23.42 |
qwen2-7b-instruct | INT4-MIXED | 1024 | 388.5 | 42.7 | 5238.5 | 23.42 |
minicpm4-8b | INT4-MIXED | 32 | 62.6 | 42.7 | 5166.6 | 23.42 |
phi-4-multimodal-instruct | INT4-MIXED | 1362 | 1380.1 | 43 | 8862.7 | 23.26 |
gpt-j-6b | INT4-MIXED | 1024 | 608.5 | 43 | 5754.2 | 23.26 |
falcon-7b-instruct | INT4-MIXED | 32 | 59.4 | 43 | 5001.5 | 23.26 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 413.6 | 43.2 | 5529.3 | 23.15 |
mistral-7b-instruct-v0.1 | INT4-MIXED | 1025 | 552.4 | 43.3 | 5139.7 | 23.09 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 404.3 | 43.3 | 5414.2 | 23.09 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 410.4 | 43.3 | 5440.4 | 23.09 |
llama-3-8b-instruct | INT4-MIXED | 32 | 61.3 | 43.3 | 5538.2 | 23.09 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 418.6 | 43.4 | 5529.9 | 23.04 |
minicpm-o-2_6 | INT4-MIXED | 238 | 622.9 | 43.4 | 6276.5 | 23.04 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 415.7 | 43.5 | 5530 | 22.99 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 477 | 43.8 | 5634.8 | 22.83 |
qwen2-7b-instruct | INT4-MIXED | 1024 | 472.5 | 43.8 | 5542.9 | 22.83 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 1024 | 469.8 | 43.9 | 5630.2 | 22.78 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 500 | 44 | 5632.1 | 22.73 |
phi-4-mini-instruct | INT8-CW | 32 | 63.2 | 44 | 4667.7 | 22.73 |
stable-diffusion-xl-1.0-inpainting-0.1 | INT8-CW | 32 | 44.3 | 44.2 | 7286 | 22.62 |
minicpm-v-2_6 | INT4-MIXED | 228 | 629 | 44.4 | 6264.2 | 22.52 |
phi-4-mini-reasoning | INT8-CW | 32 | 62.1 | 44.4 | 4580.8 | 22.52 |
minicpm4-8b | INT4-MIXED | 32 | 69 | 44.7 | 5379.5 | 22.37 |
phi-3.5-mini-instruct | INT8-CW | 1024 | 431.5 | 44.7 | 5178 | 22.37 |
minicpm4-8b | INT4-MIXED | 1024 | 455.2 | 44.8 | 5343 | 22.32 |
minicpm-v-2_6 | INT4-MIXED | 228 | 639.4 | 44.8 | 6443.9 | 22.32 |
qwen3-8b | INT4-MIXED | 32 | 64.4 | 44.8 | 5816.3 | 22.32 |
phi-3-mini-128k-instruct | INT8-CW | 1024 | 430.6 | 44.9 | 5274.5 | 22.27 |
phi-3-mini-4k-instruct | INT8-CW | 1024 | 432 | 45 | 5270.3 | 22.22 |
falcon-7b-instruct | INT4-MIXED | 1024 | 528 | 45.2 | 5172.9 | 22.12 |
qwen3-8b | INT4-MIXED | 1024 | 445.9 | 45.3 | 5724.8 | 22.08 |
minicpm-o-2_6 | INT4-MIXED | 238 | 639.6 | 45.4 | 6695.4 | 22.03 |
minicpm-v-2_6 | INT4-MIXED | 228 | 658.4 | 45.5 | 6561.8 | 21.98 |
gemma-3-4b-it | INT8-CW | 32 | 98.9 | 45.5 | 6007.1 | 21.98 |
minicpm4-8b | INT4-MIXED | 32 | 69.1 | 45.6 | 5512.7 | 21.93 |
afm-4.5b | INT8-CW | 32 | 57.5 | 45.7 | 5297.5 | 21.88 |
phi-4-mini-instruct | INT8-CW | 1024 | 376.2 | 45.8 | 4942.6 | 21.83 |
internvl2-4b | INT8-CW | 297 | 251.2 | 46 | 5803.2 | 21.74 |
phi-4-mini-reasoning | INT8-CW | 1024 | 375.8 | 46.1 | 4848.5 | 21.69 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 446 | 46.2 | 5737.6 | 21.65 |
minicpm-v-4_5 | INT4-MIXED | 217 | 647.9 | 46.2 | 6768.9 | 21.65 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 442.3 | 46.3 | 5829.4 | 21.6 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 439.7 | 46.8 | 5739.3 | 21.37 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 527.7 | 47 | 5834.6 | 21.28 |
gemma-3-4b-it | INT8-CW | 1024 | 464.4 | 47.2 | 8348.9 | 21.19 |
qwen3-8b | INT4-MIXED | 1024 | 537.1 | 47.5 | 6111.9 | 21.05 |
afm-4.5b | INT8-CW | 1024 | 326.4 | 47.5 | 5483 | 21.05 |
phi-3.5-vision-instruct | INT8-CW | 802 | 532 | 47.8 | 6689.2 | 20.92 |
minicpm4-8b | INT4-MIXED | 1024 | 504.6 | 48.4 | 5532.1 | 20.66 |
phi-3.5-vision-instruct | INT8-CW | 1032 | 634.5 | 48.7 | 6969.2 | 20.53 |
minicpm4-8b | INT4-MIXED | 1024 | 569.1 | 48.8 | 5676.8 | 20.49 |
internvl2-4b | INT8-CW | 1027 | 646.8 | 48.8 | 6905.5 | 20.49 |
minicpm-v-4_5 | INT4-MIXED | 217 | 662.3 | 49 | 7055.3 | 20.41 |
gemma-7b-it | INT4-MIXED | 32 | 70.1 | 49.1 | 5455.1 | 20.37 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 110.9 | 50.7 | 6147.9 | 19.72 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 32 | 182.9 | 51 | 6161.5 | 19.61 |
gemma-7b-it | INT4-MIXED | 32 | 75 | 51.5 | 5890.3 | 19.42 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 1024 | 650.6 | 52.2 | 7559.2 | 19.16 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 115.8 | 52.9 | 6278.6 | 18.9 |
llava-next-video-7b-hf | INT4-MIXED | 2945 | 3044.9 | 53.5 | 8946.5 | 18.69 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 114.4 | 53.8 | 6507.7 | 18.59 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 32 | 188.1 | 54.1 | 6460.2 | 18.48 |
gemma-7b-it | INT4-MIXED | 1024 | 523.7 | 54.2 | 6170.3 | 18.45 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 1024 | 737.6 | 54.6 | 7727.8 | 18.32 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 75.4 | 54.8 | 6507.3 | 18.25 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 75.2 | 54.8 | 6504 | 18.25 |
phi-4-multimodal-instruct | INT8-CW | 578 | 533.9 | 54.8 | 7575.7 | 18.25 |
minicpm3-4b | INT8-CW | 32 | 184.2 | 55 | 5258.3 | 18.18 |
phi-4-multimodal-instruct | INT8-CW | 786 | 641.3 | 55.1 | 8485.8 | 18.15 |
gemma-2-9b-it | INT4-MIXED | 32 | 71.9 | 55.3 | 5914.8 | 18.08 |
phi-2 | FP16 | 32 | 67.2 | 55.5 | 6239.6 | 18.02 |
phi-4-multimodal-instruct | INT8-CW | 1362 | 1311.7 | 56.2 | 10174.5 | 17.79 |
stablelm-3b-4e1t | FP16 | 32 | 67.5 | 56.3 | 6256.7 | 17.76 |
stable-zephyr-3b-dpo | FP16 | 32 | 66.4 | 56.4 | 6267.4 | 17.73 |
glm-4-9b-chat-hf | INT4-MIXED | 1024 | 620.7 | 56.4 | 6729.9 | 17.73 |
llava-next-video-7b-hf | INT4-MIXED | 2945 | 3257.8 | 56.5 | 9086.1 | 17.7 |
phi-4-multimodal-instruct | INT8-CW | 1570 | 1466.7 | 56.5 | 10906.7 | 17.7 |
gemma-7b-it | INT4-MIXED | 1024 | 748 | 57.4 | 6593.8 | 17.42 |
glm-4-9b-chat-hf | INT4-MIXED | 1024 | 804.7 | 57.4 | 6867.4 | 17.42 |
glm-4-9b-chat-hf | INT4-MIXED | 1024 | 888.6 | 58.4 | 7093.1 | 17.12 |
gemma-2-9b-it | INT4-MIXED | 32 | 79.1 | 58.7 | 6155.2 | 17.04 |
gemma-2-9b-it | INT4-MIXED | 32 | 79.6 | 58.8 | 6397.3 | 17.01 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 602.5 | 58.9 | 6810.2 | 16.98 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 619.5 | 58.9 | 6806.8 | 16.98 |
phi-2 | FP16 | 1024 | 395.5 | 60.1 | 6981.6 | 16.64 |
stable-zephyr-3b-dpo | FP16 | 1024 | 347.4 | 60.1 | 6991.6 | 16.64 |
stablelm-3b-4e1t | FP16 | 1024 | 342.9 | 60.3 | 7010.2 | 16.58 |
minicpm3-4b | INT8-CW | 1024 | 768.7 | 60.8 | 6424.3 | 16.45 |
gemma-2-9b-it | INT4-MIXED | 1024 | 703.7 | 61.8 | 6523.4 | 16.18 |
stable-diffusion-xl-1.0-inpainting-0.1 | FP16 | 32 | 64.8 | 63 | 9176.7 | 15.87 |
gemma-2-9b-it | INT4-MIXED | 1024 | 808.1 | 64.6 | 7013.1 | 15.48 |
gemma-2-9b-it | INT4-MIXED | 1024 | 780.9 | 65 | 6773.2 | 15.38 |
llama-3.2-3b-instruct | FP16 | 32 | 74 | 65.8 | 6979.5 | 15.2 |
llama-3.2-3b-instruct | FP16 | 1024 | 362.3 | 68.6 | 7308.8 | 14.58 |
llama-2-13b-chat-hf | INT4-MIXED | 32 | 95.7 | 69.2 | 7415.2 | 14.45 |
qwen2.5-coder-3b-instruct | FP16 | 32 | 78.3 | 72 | 6728.3 | 13.89 |
flan-t5-xxl | INT8-CW | 33 | 180.7 | 72.9 | 22597.6 | 13.72 |
qwen2.5-coder-3b-instruct | FP16 | 1024 | 284.5 | 73.3 | 6902.6 | 13.64 |
gemma-3-12b-it | INT4-MIXED | 32 | 149.3 | 73.5 | 8987.5 | 13.61 |
falcon-7b-instruct | INT8-CW | 32 | 91.3 | 73.6 | 7547.6 | 13.59 |
qwen2.5-7b-instruct | INT8-CW | 32 | 95.3 | 74.3 | 8139.5 | 13.46 |
qwen2.5-7b-instruct-1m | INT8-CW | 32 | 94.9 | 74.3 | 8139.8 | 13.46 |
deepseek-r1-distill-qwen-7b | INT8-CW | 32 | 95.8 | 74.4 | 8228.3 | 13.44 |
qwen2-7b-instruct | INT8-CW | 32 | 96.8 | 74.4 | 8140.1 | 13.44 |
gpt-j-6b | INT8-CW | 32 | 94.8 | 74.7 | 6886.4 | 13.39 |
ltx-video | INT4-MIXED | 11 | 75.4 | 74.8 | 6586.6 | 13.37 |
qwen2.5-7b-instruct | INT8-CW | 32 | 95.9 | 74.8 | 8229.7 | 13.37 |
llama-2-13b-chat-hf | INT4-MIXED | 1024 | 992.4 | 75.1 | 8509.7 | 13.32 |
llama-2-7b-chat-hf | INT8-CW | 32 | 90.2 | 75.4 | 7490.1 | 13.26 |
phi-3-mini-4k-instruct | FP16 | 32 | 86.3 | 76.3 | 8325.4 | 13.11 |
falcon-7b-instruct | INT8-CW | 1024 | 819.8 | 76.4 | 7692.2 | 13.09 |
qwen2.5-7b-instruct-1m | INT8-CW | 1024 | 576.6 | 76.4 | 8355.9 | 13.09 |
deepseek-r1-distill-qwen-7b | INT8-CW | 1024 | 584.8 | 76.5 | 8447.6 | 13.07 |
qwen2.5-7b-instruct | INT8-CW | 1024 | 596.2 | 76.5 | 8353.7 | 13.07 |
minicpm-o-2_6 | INT8-CW | 238 | 632.3 | 76.6 | 9462.5 | 13.05 |
minicpm-v-2_6 | INT8-CW | 228 | 686.2 | 76.6 | 9359.2 | 13.05 |
qwen2-7b-instruct | INT8-CW | 1024 | 579 | 76.7 | 8355.8 | 13.04 |
qwen2.5-7b-instruct | INT8-CW | 1024 | 564.7 | 76.7 | 8447.7 | 13.04 |
phi-3-mini-128k-instruct | FP16 | 32 | 85.9 | 76.9 | 8342.4 | 13 |
phi-3.5-mini-instruct | FP16 | 32 | 86.8 | 77.1 | 8244.7 | 12.97 |
phi-4 | INT4-MIXED | 32 | 112.7 | 77.1 | 8501.3 | 12.97 |
phi-4-reasoning | INT4-MIXED | 32 | 112.1 | 77.1 | 8412.9 | 12.97 |
ltx-video | INT8-CW | 11 | 77.8 | 77.1 | 9485.6 | 12.97 |
gemma-3-12b-it | INT4-MIXED | 32 | 178.1 | 77.3 | 9319.5 | 12.94 |
gemma-3-12b-it | INT4-MIXED | 1024 | 1159.1 | 77.7 | 11709.2 | 12.87 |
gpt-j-6b | INT8-CW | 1024 | 601.5 | 78.2 | 8019.7 | 12.79 |
flan-t5-xxl | INT8-CW | 1139 | 332.7 | 78.4 | 24785.2 | 12.76 |
deepseek-r1-distill-qwen-14b | INT4-MIXED | 32 | 106.8 | 78.8 | 8753.7 | 12.69 |
qwen1.5-14b-chat | INT4-MIXED | 32 | 115.9 | 79.3 | 9158 | 12.61 |
phi-4-mini-instruct | FP16 | 32 | 92.3 | 79.7 | 8296.6 | 12.55 |
phi-4-mini-reasoning | FP16 | 32 | 91.6 | 79.7 | 8200.7 | 12.55 |
llama-2-7b-chat-hf | INT8-CW | 1024 | 586 | 79.7 | 8192.9 | 12.55 |
baichuan2-13b-chat | INT4-MIXED | 32 | 112.9 | 80.2 | 8899.9 | 12.47 |
mistral-7b-instruct-v0.3 | INT8-CW | 32 | 96.9 | 80.3 | 7790.5 | 12.45 |
mistral-7b-instruct-v0.1 | INT8-CW | 32 | 100.6 | 80.7 | 7788.6 | 12.39 |
internvl2-4b | FP16 | 297 | 323.6 | 80.8 | 9541.4 | 12.38 |
phi-3-mini-128k-instruct | FP16 | 1024 | 570.3 | 81 | 9213.8 | 12.35 |
phi-3-mini-4k-instruct | FP16 | 1024 | 576.1 | 81 | 9206.4 | 12.35 |
biomistral-7b-slerp | INT8-CW | 7 | 88.1 | 81 | 7773.4 | 12.35 |
phi-4-reasoning | INT4-MIXED | 1024 | 1100.5 | 81.1 | 8885.2 | 12.33 |
phi-4 | INT4-MIXED | 32 | 118.2 | 81.1 | 9114 | 12.33 |
phi-4 | INT4-MIXED | 1024 | 1108.5 | 81.2 | 8973.9 | 12.32 |
phi-3.5-mini-instruct | FP16 | 1024 | 564.6 | 81.4 | 9112.8 | 12.29 |
phi-4-mini-instruct | FP16 | 1024 | 555.4 | 81.7 | 8689.8 | 12.24 |
mistral-7b-instruct-v0.2 | INT8-CW | 32 | 97.3 | 81.7 | 7883.7 | 12.24 |
phi-3.5-vision-instruct | FP16 | 802 | 740.6 | 81.8 | 10290.1 | 12.22 |
gemma-3-12b-it | INT4-MIXED | 1024 | 1244.9 | 81.8 | 11963.2 | 12.22 |
phi-4-mini-reasoning | FP16 | 1024 | 548.7 | 82.1 | 8597.9 | 12.18 |
deepseek-r1-distill-qwen-14b | INT4-MIXED | 32 | 115.1 | 82.9 | 9447.4 | 12.06 |
phi-3.5-vision-instruct | FP16 | 1032 | 924.7 | 83.1 | 10549.5 | 12.03 |
deepseek-r1-distill-qwen-14b | INT4-MIXED | 1024 | 1022.6 | 83.1 | 9148.2 | 12.03 |
mistral-7b-instruct-v0.3 | INT8-CW | 1024 | 644.3 | 83.5 | 8096.8 | 11.98 |
llama-3-8b-instruct | INT8-CW | 32 | 102.1 | 83.6 | 8564.3 | 11.96 |
internvl2-4b | FP16 | 1027 | 884.1 | 83.7 | 10393.5 | 11.95 |
qwen2.5-vl-7b-instruct | INT8-CW | 32 | 215.2 | 83.8 | 9272.9 | 11.93 |
mistral-7b-instruct-v0.1 | INT8-CW | 1025 | 661.9 | 83.9 | 8080.1 | 11.92 |
deepseek-r1-distill-llama-8b | INT8-CW | 32 | 101 | 84.3 | 8563.4 | 11.86 |
mistral-7b-instruct-v0.2 | INT8-CW | 1024 | 620 | 85.1 | 8181.9 | 11.75 |
llama-3.1-8b-instruct | INT8-CW | 32 | 101.9 | 85.1 | 8564.2 | 11.75 |
phi-4 | INT4-MIXED | 1024 | 1282.1 | 85.3 | 9432.2 | 11.72 |
qwen2.5-vl-7b-instruct | INT8-CW | 1024 | 771.8 | 85.3 | 10690.7 | 11.72 |
phi-4-reasoning | INT4-MIXED | 32 | 124.2 | 85.8 | 9629.6 | 11.66 |
baichuan2-13b-chat | INT4-MIXED | 1024 | 2744.9 | 85.9 | 13038.5 | 11.64 |
llama-3-8b-instruct | INT8-CW | 1024 | 604.5 | 86.3 | 8868.6 | 11.59 |
qwen1.5-14b-chat | INT4-MIXED | 1024 | 1154.1 | 86.4 | 10208.4 | 11.57 |
llama-2-13b-chat-hf | INT4-MIXED | 32 | 121.4 | 86.9 | 9355.6 | 11.51 |
deepseek-r1-distill-llama-8b | INT8-CW | 1024 | 611.9 | 87 | 8868.4 | 11.49 |
gemma-3-4b-it | FP16 | 32 | 130.6 | 87.8 | 10641.4 | 11.39 |
deepseek-r1-distill-qwen-14b | INT4-MIXED | 1024 | 1187.6 | 88.2 | 9700.6 | 11.34 |
llama-3.1-8b-instruct | INT8-CW | 1024 | 606.2 | 88.2 | 8870.4 | 11.34 |
lcm-dreamshaper-v7 | INT8-HYBRID | 1024 | 90.3 | 88.4 | 3746 | 11.31 |
lcm-dreamshaper-v7 | INT8-HYBRID | 32 | 90.8 | 88.4 | 3745.2 | 11.31 |
afm-4.5b | FP16 | 32 | 101.5 | 88.6 | 9661.4 | 11.29 |
gemma-3-4b-it | FP16 | 1024 | 612 | 89.5 | 13009.8 | 11.17 |
qwen3-8b | INT8-CW | 32 | 105.1 | 89.9 | 8745.6 | 11.12 |
phi-4-reasoning | INT4-MIXED | 1024 | 1359.7 | 90 | 9690.3 | 11.11 |
starcoder2-15b | INT4-MIXED | 32 | 142.5 | 90.4 | 9573.1 | 11.06 |
lcm-dreamshaper-v7 | INT8-CW | 1024 | 93.9 | 90.5 | 4963.7 | 11.05 |
afm-4.5b | FP16 | 1024 | 503.6 | 90.6 | 9899.6 | 11.04 |
lcm-dreamshaper-v7 | INT8-CW | 32 | 92.5 | 90.7 | 4962.7 | 11.03 |
lcm-dreamshaper-v7 | INT8-CW | 1024 | 93.6 | 91.2 | 5210.7 | 10.96 |
lcm-dreamshaper-v7 | INT8-CW | 32 | 95.1 | 91.2 | 5209.2 | 10.96 |
lcm-dreamshaper-v7 | FP16 | 1024 | 92.3 | 91.7 | 4875.5 | 10.91 |
lcm-dreamshaper-v7 | FP16 | 32 | 92.4 | 91.7 | 4870.8 | 10.91 |
minicpm-v-4_5 | INT8-CW | 217 | 645.4 | 92.2 | 10004.9 | 10.85 |
llama-2-13b-chat-hf | INT4-MIXED | 1024 | 1142.6 | 92.8 | 10186.6 | 10.78 |
gemma-7b-it | INT8-CW | 32 | 118.8 | 92.9 | 9163.2 | 10.76 |
llava-next-video-7b-hf | INT8-CW | 2945 | 3284.5 | 93 | 11912.2 | 10.75 |
qwen3-8b | INT8-CW | 1024 | 657.3 | 93.3 | 9046.8 | 10.72 |
minicpm3-4b | FP16 | 32 | 173.1 | 93.7 | 9130.6 | 10.67 |
minicpm4-8b | INT8-CW | 32 | 111.5 | 94.4 | 8797.4 | 10.59 |
starcoder2-15b | INT4-MIXED | 1024 | 1447.5 | 94.5 | 9751.8 | 10.58 |
gemma-7b-it | INT8-CW | 1024 | 762.4 | 96.8 | 9864.9 | 10.33 |
minicpm4-8b | INT8-CW | 1024 | 674.7 | 97 | 8963.8 | 10.31 |
phi-4-multimodal-instruct | FP16 | 578 | 703.1 | 99.4 | 12472.6 | 10.06 |
phi-4-multimodal-instruct | FP16 | 786 | 845.9 | 99.6 | 13307 | 10.04 |
glm-4-9b-chat-hf | INT8-CW | 32 | 138.7 | 99.6 | 9953.3 | 10.04 |
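
In these tables, the 2nd token per sec column appears to be the reciprocal of the 2nd latency: 1000 / (2nd latency in ms), rounded to two decimals. For example, 40 ms gives 25 tokens/s. This relationship is inferred from the values in the table, not stated on this page. The minimal Python sketch below recomputes a few rows copied from the table above. The helper name `tokens_per_second` is hypothetical.

```python
def tokens_per_second(latency_ms: float) -> float:
    """Convert per-token (2nd token) latency in milliseconds to tokens per second."""
    return round(1000.0 / latency_ms, 2)


# (topology, 2nd latency in ms, reported 2nd token per sec) copied from the table above.
rows = [
    ("mistral-7b-instruct-v0.2", 40.0, 25.0),
    ("minicpm3-4b", 40.2, 24.88),
    ("llama-3-8b-instruct", 42.5, 23.53),
    ("glm-4-9b-chat-hf", 99.6, 10.04),
]

for name, latency_ms, reported in rows:
    computed = tokens_per_second(latency_ms)
    print(f"{name}: {latency_ms} ms -> {computed} tokens/s (table: {reported})")
```

Because throughput is the reciprocal of latency, small latency differences matter more at the fast end of the table. A 1 ms change near 4 ms shifts throughput far more than the same change near 90 ms.
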

Topology | Precision | Input Size | 1st latency (ms) | 2nd latency (ms) | max rss memory (MB) | 2nd token per sec |
|---|---|---|---|---|---|---|
t5-small | FP16 | 1024 | 12.6 | 5.4 | 1447 | 185.19 |
t5-small | FP16 | 32 | 11.8 | 5.5 | 1312.9 | 181.82 |
t5-small | INT4-MIXED | 1024 | 14.5 | 5.9 | 1315.2 | 169.49 |
t5-small | INT4-MIXED | 1024 | 14.6 | 5.9 | 1310.9 | 169.49 |
t5-small | INT4-MIXED | 32 | 13.1 | 5.9 | 1168.6 | 169.49 |
t5-small | INT4-MIXED | 1024 | 14.7 | 6 | 1302.5 | 166.67 |
t5-small | INT4-MIXED | 32 | 13.3 | 6 | 1154.1 | 166.67 |
t5-small | INT4-MIXED | 32 | 13.5 | 6 | 1162.4 | 166.67 |
t5-small | INT8-CW | 1024 | 15.5 | 6.2 | 1320.6 | 161.29 |
t5-small | INT8-CW | 32 | 13.9 | 6.4 | 1170.8 | 156.25 |
minicpm4-0.5b | INT4-MIXED | 32 | 27.4 | 7.1 | 1554.1 | 140.85 |
minicpm4-0.5b | INT4-MIXED | 32 | 26.2 | 7.1 | 1513.2 | 140.85 |
minicpm4-0.5b | INT4-MIXED | 32 | 25.3 | 7.1 | 1513.1 | 140.85 |
minicpm4-0.5b | INT4-MIXED | 1024 | 173.6 | 7.3 | 1596.8 | 136.99 |
minicpm4-0.5b | INT4-MIXED | 1024 | 167.1 | 7.3 | 1555.5 | 136.99 |
minicpm4-0.5b | INT4-MIXED | 1024 | 167.3 | 7.5 | 1555.4 | 133.33 |
minicpm4-0.5b | INT4-MIXED | 4097 | 920.9 | 7.6 | 1758.4 | 131.58 |
minicpm4-0.5b | INT4-MIXED | 4097 | 898.8 | 7.6 | 1717.5 | 131.58 |
minicpm4-0.5b | INT4-MIXED | 2049 | 390 | 7.6 | 1656.4 | 131.58 |
minicpm4-0.5b | INT4-MIXED | 4097 | 910.6 | 7.7 | 1717 | 129.87 |
minicpm4-0.5b | INT4-MIXED | 2049 | 372.9 | 7.7 | 1614.8 | 129.87 |
minicpm4-0.5b | INT4-MIXED | 2049 | 382.8 | 7.8 | 1615.1 | 128.21 |
gemma-3-270m | INT4-MIXED | 32 | 27.4 | 8.3 | 1727 | 120.48 |
gemma-3-270m | INT4-MIXED | 4096 | 396.7 | 8.4 | 2028 | 119.05 |
gemma-3-270m | INT4-MIXED | 2048 | 166.8 | 8.4 | 1833.6 | 119.05 |
gemma-3-270m | INT4-MIXED | 1024 | 78.9 | 8.4 | 1763.5 | 119.05 |
gemma-3-270m | INT8-CW | 32 | 27.4 | 10.4 | 1786.2 | 96.15 |
gemma-3-270m | INT8-CW | 1024 | 87.3 | 10.8 | 1826.5 | 92.59 |
minicpm4-0.5b | INT8-CW | 32 | 33.2 | 10.9 | 1835.3 | 91.74 |
gemma-3-270m | INT8-CW | 4096 | 419.6 | 11.2 | 2101.6 | 89.29 |
gemma-3-270m | INT8-CW | 2048 | 176.5 | 11.2 | 1903.4 | 89.29 |
minicpm4-0.5b | INT8-CW | 1024 | 193.2 | 11.4 | 1905.7 | 87.72 |
minicpm4-0.5b | INT8-CW | 4097 | 1132.7 | 11.8 | 2138.1 | 84.75 |
minicpm4-0.5b | INT8-CW | 2049 | 487.1 | 11.8 | 1990.4 | 84.75 |
gemma-3-270m | FP16 | 32 | 30 | 11.9 | 1990.8 | 84.03 |
llama-3.2-1b-instruct | INT4-MIXED | 32 | 33.1 | 12.3 | 2662.8 | 81.30 |
gemma-3-270m | FP16 | 1024 | 78.8 | 12.4 | 2086.2 | 80.65 |
llama-3.2-1b-instruct | INT4-MIXED | 32 | 31.8 | 12.5 | 2690.2 | 80.00 |
gemma-3-270m | FP16 | 2048 | 162.7 | 12.7 | 2208.5 | 78.74 |
llama-3.2-1b-instruct | INT4-MIXED | 1024 | 374 | 12.9 | 2758.1 | 77.52 |
llama-3.2-1b-instruct | INT4-MIXED | 1024 | 379.3 | 13 | 2785.6 | 76.92 |
llama-3.2-1b-instruct | INT4-MIXED | 2048 | 782.2 | 13.4 | 2920.1 | 74.63 |
llama-3.2-1b-instruct | INT4-MIXED | 4096 | 1751.3 | 13.5 | 3078.8 | 74.07 |
llama-3.2-1b-instruct | INT4-MIXED | 2048 | 770.9 | 13.5 | 2918.6 | 74.07 |
distil-large-v2 | INT4-MIXED | prompt0 | 592.2 | 13.6 | 2160.2 | 73.53 |
gemma-3-1b-it | INT4-MIXED | 32 | 41.4 | 13.6 | 2522.6 | 73.53 |
gemma-3-270m | FP16 | 4096 | 379.8 | 13.7 | 2375.1 | 72.99 |
llama-3.2-1b-instruct | INT4-MIXED | 4096 | 1759.4 | 13.7 | 3079.4 | 72.99 |
gemma-3-1b-it | INT4-MIXED | 32 | 39.5 | 13.8 | 2433.3 | 72.46 |
distil-large-v2 | INT4-MIXED | prompt1 | 636.8 | 13.9 | 2193.5 | 71.94 |
gemma-3-1b-it | INT4-MIXED | 1024 | 314.4 | 13.9 | 2605.3 | 71.94 |
minicpm4-0.5b | FP16 | 32 | 30.3 | 14.2 | 2713 | 70.42 |
gemma-3-1b-it | INT4-MIXED | 1024 | 309 | 14.2 | 2517 | 70.42 |
whisper-small | FP16 | prompt1 | 251 | 14.4 | 2074.9 | 69.44 |
gemma-3-1b-it | INT4-MIXED | 4096 | 1378 | 14.4 | 3070.9 | 69.44 |
gemma-3-1b-it | INT4-MIXED | 2048 | 644.5 | 14.5 | 2743.6 | 68.97 |
whisper-small | INT4-MIXED | prompt0 | 191.3 | 14.6 | 1670.1 | 68.49 |
whisper-small | FP16 | prompt0 | 184 | 14.7 | 2041.4 | 68.03 |
whisper-small | INT4-MIXED | prompt1 | 249.2 | 14.7 | 1706.1 | 68.03 |
whisper-small | INT8-CW | prompt1 | 258.6 | 14.7 | 1816.9 | 68.03 |
minicpm4-0.5b | FP16 | 1024 | 208.5 | 14.8 | 2771.2 | 67.57 |
gemma-3-1b-it | INT4-MIXED | 4096 | 1326.3 | 14.8 | 2979.8 | 67.57 |
whisper-small | INT4-MIXED | prompt0 | 193.8 | 14.9 | 1668.1 | 67.11 |
gemma-3-1b-it | INT4-MIXED | 2048 | 623.1 | 14.9 | 2655.5 | 67.11 |
minicpm4-0.5b | FP16 | 4097 | 1554.8 | 15.1 | 2897.4 | 66.23 |
minicpm4-0.5b | FP16 | 2049 | 2371.1 | 15.1 | 2835.5 | 66.23 |
whisper-small | INT4-MIXED | prompt1 | 248.2 | 15.1 | 1738.4 | 66.23 |
whisper-small | INT4-MIXED | prompt0 | 195.4 | 15.1 | 1705.4 | 66.23 |
whisper-small | INT8-CW | prompt0 | 199.1 | 15.1 | 1782.7 | 66.23 |
whisper-large-v3-turbo | INT4-MIXED | prompt1 | 669.7 | 15.2 | 2295.6 | 65.79 |
whisper-large-v3-turbo | INT4-MIXED | prompt0 | 601.4 | 15.3 | 2262.8 | 65.36 |
whisper-small | INT4-MIXED | prompt1 | 261.9 | 15.4 | 1701.2 | 64.94 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 46.8 | 15.4 | 3047.6 | 64.94 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 44.8 | 15.4 | 2879.6 | 64.94 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 50 | 15.5 | 3124.2 | 64.52 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 49 | 15.5 | 2953.1 | 64.52 |
distil-large-v2 | INT8-CW | prompt1 | 744.7 | 15.9 | 2664.8 | 62.89 |
distil-large-v2 | INT8-CW | prompt0 | 686.3 | 15.9 | 2631 | 62.89 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 483.7 | 16 | 3112.5 | 62.50 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 487.2 | 16.1 | 3189.1 | 62.11 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 481 | 16.1 | 2962.1 | 62.11 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 478.5 | 16.1 | 3035.2 | 62.11 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 2048 | 1002 | 16.4 | 3131.3 | 60.98 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 2048 | 992.1 | 16.6 | 3161.8 | 60.24 |
qwen2.5-1.5b-instruct | INT4-MIXED | 2048 | 985.3 | 16.6 | 3040.1 | 60.24 |
qwen2.5-1.5b-instruct | INT4-MIXED | 2048 | 985.4 | 16.6 | 3113.4 | 60.24 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 4097 | 2190 | 16.8 | 3089.9 | 59.52 |
whisper-large-v3-turbo | INT8-CW | prompt1 | 781.2 | 16.8 | 2798.4 | 59.52 |
whisper-large-v3-turbo | INT8-CW | prompt0 | 704.3 | 16.8 | 2765.2 | 59.52 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 4097 | 2207.8 | 16.9 | 3119.9 | 59.17 |
qwen2.5-1.5b-instruct | INT4-MIXED | 4096 | 2151.3 | 16.9 | 3143.3 | 59.17 |
qwen2.5-1.5b-instruct | INT4-MIXED | 4096 | 2166.2 | 16.9 | 3169.3 | 59.17 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 55.3 | 17.3 | 3209.2 | 57.80 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 503.6 | 17.9 | 3334.7 | 55.87 |
qwen2.5-1.5b-instruct | INT4-MIXED | 2048 | 1025.7 | 18.3 | 3279 | 54.64 |
qwen2.5-1.5b-instruct | INT4-MIXED | 4096 | 2266.5 | 18.7 | 3376.4 | 53.48 |
distil-large-v2 | FP16 | prompt1 | 837 | 19.6 | 2813.9 | 51.02 |
distil-large-v2 | FP16 | prompt0 | 773.2 | 19.6 | 2785.4 | 51.02 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 51.7 | 19.8 | 3658.9 | 50.51 |
llama-3.2-1b-instruct | INT8-CW | 32 | 47 | 20.2 | 3512.6 | 49.50 |
whisper-large-v3-turbo | FP16 | prompt1 | 839.8 | 20.4 | 3435.3 | 49.02 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 539.4 | 20.4 | 3571.7 | 49.02 |
llama-3.2-1b-instruct | INT8-CW | 1024 | 432.3 | 20.8 | 3469.5 | 48.08 |
whisper-large-v3-turbo | FP16 | prompt0 | 785.1 | 20.9 | 3548.1 | 47.85 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 2048 | 1085.1 | 20.9 | 3504.4 | 47.85 |
llama-3.2-1b-instruct | INT8-CW | 2048 | 904.2 | 21.2 | 3427.6 | 47.17 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 4097 | 2747.4 | 21.3 | 3660.7 | 46.95 |
gemma-3-1b-it | INT8-CW | 32 | 57.4 | 21.3 | 3072.7 | 46.95 |
llama-3.2-1b-instruct | INT8-CW | 4096 | 2031.7 | 21.5 | 3137.4 | 46.51 |
gemma-3-1b-it | INT8-CW | 1024 | 358.3 | 22 | 3169.9 | 45.45 |
gemma-3-1b-it | INT8-CW | 4096 | 1518.6 | 22.6 | 3322.3 | 44.25 |
gemma-3-1b-it | INT8-CW | 2048 | 697.6 | 22.7 | 3313.1 | 44.05 |
smolvlm2-256m-video-instruct | INT4-MIXED | 1141 | 1779.7 | 22.8 | 2991.3 | 43.86 |
phi-2 | INT4-MIXED | 32 | 63.8 | 23.7 | 3715.7 | 42.19 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 77.9 | 24 | 3555.1 | 41.67 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 76.2 | 24.1 | 3680.2 | 41.49 |
phi-2 | INT4-MIXED | 32 | 70 | 24.4 | 3857.5 | 40.98 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 82.1 | 25.7 | 3926.3 | 38.91 |
phi-2 | INT4-MIXED | 1024 | 1022.7 | 26.3 | 3790.8 | 38.02 |
stablelm-3b-4e1t | INT4-MIXED | 1024 | 998.8 | 26.7 | 3607.1 | 37.45 |
stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 1009.3 | 26.8 | 3731.5 | 37.31 |
phi-2 | INT4-MIXED | 1024 | 1075.8 | 26.9 | 3961 | 37.17 |
smolvlm2-256m-video-instruct | INT8-CW | 1141 | 1799.3 | 26.9 | 3079.4 | 37.17 |
deepseek-r1-distill-qwen-1.5b | INT8-CW | 32 | 73.3 | 27.3 | 3600.3 | 36.63 |
qwen2.5-1.5b-instruct | INT8-CW | 32 | 73.2 | 27.4 | 3835.2 | 36.50 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 70.7 | 28 | 3879.5 | 35.71 |
deepseek-r1-distill-qwen-1.5b | INT8-CW | 1024 | 584.4 | 28 | 3482.2 | 35.71 |
qwen2.5-1.5b-instruct | INT8-CW | 1024 | 575.7 | 28.2 | 3714.3 | 35.46 |
stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 1091 | 28.3 | 4033.8 | 35.34 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 81.1 | 28.4 | 4160.3 | 35.21 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 83.3 | 28.4 | 3958.9 | 35.21 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 88.9 | 28.5 | 4076.6 | 35.09 |
deepseek-r1-distill-qwen-1.5b | INT8-CW | 2048 | 1207.4 | 28.5 | 3603.7 | 35.09 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 74.1 | 28.7 | 4288.5 | 34.84 |
qwen2.5-1.5b-instruct | INT8-CW | 2048 | 1208.2 | 28.7 | 3630.8 | 34.84 |
deepseek-r1-distill-qwen-1.5b | INT8-CW | 4097 | 3213.6 | 28.9 | 3465.1 | 34.60 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 78.1 | 29 | 4419.6 | 34.48 |
qwen2.5-1.5b-instruct | INT8-CW | 4096 | 2567.8 | 29 | 3400.7 | 34.48 |
stablelm-3b-4e1t | INT4-MIXED | 2048 | 2439.9 | 29.3 | 3836.7 | 34.13 |
stable-zephyr-3b-dpo | INT4-MIXED | 2076 | 2517.2 | 29.4 | 3962.2 | 34.01 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 967.3 | 29.6 | 3792.9 | 33.78 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 960.2 | 29.7 | 3933.8 | 33.67 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 86.1 | 29.7 | 4253.9 | 33.67 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 941.3 | 29.8 | 4050.6 | 33.56 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 955.6 | 30 | 4199.7 | 33.33 |
gemma-3-1b-it | FP16 | 32 | 55.9 | 30.3 | 4485.5 | 33.00 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 962 | 30.4 | 4318 | 32.89 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 98.1 | 30.4 | 3793.9 | 32.89 |
phi-3-mini-128k-instruct | INT4-MIXED | 32 | 100 | 30.5 | 4004.3 | 32.79 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 93.2 | 30.5 | 4025.5 | 32.79 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 89.1 | 30.6 | 3997.4 | 32.68 |
stablelm-3b-4e1t | INT4-MIXED | 1024 | 1090.9 | 30.7 | 4166.4 | 32.57 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 91.7 | 30.7 | 4191.5 | 32.57 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 959.5 | 30.8 | 4084.5 | 32.47 |
phi-3-mini-128k-instruct | INT4-MIXED | 32 | 96.4 | 30.9 | 4083.7 | 32.36 |
gemma-3-1b-it | FP16 | 1024 | 383.8 | 31 | 4364.1 | 32.26 |
stable-zephyr-3b-dpo | INT4-MIXED | 2076 | 2719.9 | 31 | 4150.2 | 32.26 |
llama-3.2-3b-instruct | INT4-MIXED | 2048 | 1969.2 | 31.2 | 4132.8 | 32.05 |
llama-3.2-3b-instruct | INT4-MIXED | 2048 | 2007.6 | 31.3 | 4175.5 | 31.95 |
gemma-3-1b-it | FP16 | 2048 | 761.7 | 31.6 | 4223.7 | 31.65 |
llama-3.2-3b-instruct | INT4-MIXED | 2048 | 2042.6 | 31.7 | 4235.8 | 31.55 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 96 | 32.1 | 4133.1 | 31.15 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 100 | 32.2 | 4148 | 31.06 |
gemma-3-1b-it | FP16 | 4096 | 1585.6 | 32.9 | 4100.9 | 30.40 |
llama-3.2-3b-instruct | INT4-MIXED | 4096 | 4821.1 | 33 | 4043.3 | 30.30 |
llama-3.2-3b-instruct | INT4-MIXED | 4096 | 4922.5 | 33.1 | 3963.9 | 30.21 |
stablelm-3b-4e1t | INT4-MIXED | 2048 | 2552.6 | 33.2 | 4227.6 | 30.12 |
llama-3.2-3b-instruct | INT4-MIXED | 4096 | 4998.5 | 33.4 | 4110.3 | 29.94 |
phi-3-mini-128k-instruct | INT4-MIXED | 1024 | 1314.6 | 33.5 | 4252.9 | 29.85 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 1295.6 | 33.5 | 4136.7 | 29.85 |
internvl2-4b | INT4-MIXED | 297 | 768 | 33.5 | 4638.9 | 29.85 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 1309.2 | 33.6 | 4248.2 | 29.76 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 1328.5 | 33.6 | 4298.7 | 29.76 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 1332.1 | 33.7 | 4313.3 | 29.67 |
llama-3.2-1b-instruct | FP16 | 32 | 51 | 33.9 | 4721.8 | 29.50 |
internvl2-4b | INT4-MIXED | 297 | 770.1 | 34 | 4758.2 | 29.41 |
phi-3-mini-128k-instruct | INT4-MIXED | 1024 | 1394.3 | 34.1 | 4400.6 | 29.33 |
stablelm-3b-4e1t | INT4-MIXED | 4096 | 6399 | 34.6 | 4207.4 | 28.90 |
llama-3.2-1b-instruct | FP16 | 1024 | 518.6 | 34.7 | 4588.6 | 28.82 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 1412 | 35.2 | 4518.7 | 28.41 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 1376.8 | 35.3 | 4489.6 | 28.33 |
phi-3.5-vision-instruct | INT4-MIXED | 802 | 1990.8 | 35.4 | 5138.7 | 28.25 |
llama-3.2-1b-instruct | FP16 | 2048 | 1008.7 | 35.5 | 4202.8 | 28.17 |
internvl2-4b | INT4-MIXED | 1027 | 1821.6 | 35.5 | 4614.3 | 28.17 |
phi-4-mini-instruct | INT4-MIXED | 32 | 94.3 | 35.8 | 4678.1 | 27.93 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 90.8 | 35.8 | 4663.3 | 27.93 |
phi-3.5-vision-instruct | INT4-MIXED | 1032 | 2367.9 | 35.9 | 4843.9 | 27.86 |
llama-3.2-1b-instruct | FP16 | 4096 | 2210.1 | 36 | 4276.6 | 27.78 |
internvl2-4b | INT4-MIXED | 1027 | 1859.4 | 36 | 4752.7 | 27.78 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 90.7 | 36.1 | 4810.6 | 27.70 |
phi-4-mini-instruct | INT4-MIXED | 32 | 93.5 | 36.2 | 4818.1 | 27.62 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 93.5 | 36.5 | 4998 | 27.40 |
phi-3-mini-4k-instruct | INT4-MIXED | 2048 | 3027.4 | 36.6 | 4172.4 | 27.32 |
phi-3.5-mini-instruct | INT4-MIXED | 2048 | 3034.8 | 36.6 | 4092.8 | 27.32 |
smolvlm2-256m-video-instruct | FP16 | 1141 | 1559.7 | 36.7 | 3683 | 27.25 |
phi-3-mini-4k-instruct | INT4-MIXED | 2048 | 3076.2 | 36.7 | 4225.3 | 27.25 |
phi-3.5-mini-instruct | INT4-MIXED | 2048 | 3080.2 | 36.7 | 4179.3 | 27.25 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 1107.7 | 37.2 | 4679.9 | 26.88 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 1109.1 | 37.3 | 4663.9 | 26.81 |
afm-4.5b | INT4-MIXED | 32 | 108.5 | 37.3 | 4699.2 | 26.81 |
phi-4-mini-instruct | INT4-MIXED | 32 | 86.4 | 37.6 | 5075.3 | 26.60 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 1126.9 | 37.7 | 4730.9 | 26.53 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 1133.1 | 37.7 | 4735.9 | 26.53 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 1158.7 | 38 | 4893.5 | 26.32 |
phi-3.5-mini-instruct | INT4-MIXED | 2048 | 3270.2 | 38.2 | 4615.1 | 26.18 |
stablelm-3b-4e1t | INT4-MIXED | 4096 | 6583.4 | 38.3 | 4965.4 | 26.11 |
internvl2-4b | INT4-MIXED | 2051 | 3624.3 | 38.3 | 5584.2 | 26.11 |
phi-3-mini-4k-instruct | INT4-MIXED | 2048 | 3193.1 | 38.3 | 4409.5 | 26.11 |
afm-4.5b | INT4-MIXED | 1024 | 1402.7 | 38.5 | 4577.7 | 25.97 |
phi-3.5-vision-instruct | INT4-MIXED | 2056 | 4151.7 | 38.6 | 5847.1 | 25.91 |
phi-4-mini-instruct | INT4-MIXED | 2048 | 2354.3 | 38.7 | 4332.9 | 25.84 |
internvl2-4b | INT4-MIXED | 2051 | 3681.3 | 38.8 | 5718.9 | 25.77 |
phi-4-mini-reasoning | INT4-MIXED | 1986 | 2337.2 | 38.8 | 4338.9 | 25.77 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 1214.1 | 39 | 5044.5 | 25.64 |
phi-4-mini-instruct | INT4-MIXED | 2048 | 2406.2 | 39.2 | 4441.2 | 25.51 |
phi-4-mini-reasoning | INT4-MIXED | 1986 | 2398.7 | 39.2 | 4462 | 25.51 |
phi-4-mini-reasoning | INT4-MIXED | 1986 | 2418.9 | 39.5 | 4641.6 | 25.32 |
stable-zephyr-3b-dpo | INT8-CW | 32 | 87 | 39.5 | 4873.1 | 25.32 |
stablelm-3b-4e1t | INT8-CW | 32 | 95.7 | 39.6 | 4767.2 | 25.25 |
afm-4.5b | INT4-MIXED | 2048 | 2971.5 | 39.7 | 4591 | 25.19 |
phi-2 | INT8-CW | 32 | 97.2 | 40 | 4904.3 | 25.00 |
afm-4.5b | INT4-MIXED | 4096 | 6608.7 | 40.5 | 4582.5 | 24.69 |
phi-4-mini-instruct | INT4-MIXED | 2048 | 2597.4 | 40.5 | 4866.9 | 24.69 |
qwen3-4b | INT4-MIXED | 2048 | 2736.2 | 40.6 | 4943.8 | 24.63 |
qwen3-4b | INT4-MIXED | 2048 | 2762.9 | 41 | 4809.3 | 24.39 |
phi-4-mini-instruct | INT4-MIXED | 3623 | 4742 | 41.1 | 4536.6 | 24.33 |
phi-4-mini-reasoning | INT4-MIXED | 3623 | 4759.7 | 41.3 | 4519.1 | 24.21 |
qwen3-4b | INT4-MIXED | 4096 | 6472.1 | 41.6 | 5090.7 | 24.04 |
phi-4-mini-instruct | INT4-MIXED | 3623 | 4836.9 | 41.6 | 4582.5 | 24.04 |
phi-4-mini-reasoning | INT4-MIXED | 3623 | 4894.3 | 41.7 | 4592.2 | 23.98 |
phi-4-mini-reasoning | INT4-MIXED | 3623 | 4935.8 | 42 | 4632.5 | 23.81 |
stable-zephyr-3b-dpo | INT8-CW | 1024 | 1190.6 | 42 | 4746.2 | 23.81 |
qwen3-4b | INT4-MIXED | 4096 | 6568.5 | 42.1 | 5168.6 | 23.75 |
stablelm-3b-4e1t | INT8-CW | 1024 | 1193.5 | 42.1 | 4650.6 | 23.75 |
deepseek-r1-distill-qwen-1.5b | FP16 | 32 | 71.3 | 42.3 | 5040.7 | 23.64 |
phi-2 | INT8-CW | 1024 | 1216 | 42.3 | 4751.1 | 23.64 |
qwen2.5-1.5b-instruct | FP16 | 32 | 70.7 | 42.4 | 5460.8 | 23.58 |
whisper-large-v3 | INT4-MIXED | prompt0 | 772.9 | 42.6 | 3854.8 | 23.47 |
phi-3.5-mini-instruct | INT4-MIXED | 4096 | 7629.4 | 42.6 | 5027 | 23.47 |
phi-3-mini-4k-instruct | INT4-MIXED | 4096 | 7602.8 | 42.7 | 5129.5 | 23.42 |
phi-3.5-mini-instruct | INT4-MIXED | 4096 | 7747.3 | 42.7 | 5163.1 | 23.42 |
phi-3-mini-4k-instruct | INT4-MIXED | 4096 | 7734.1 | 42.8 | 5164.7 | 23.36 |
phi-4-mini-instruct | INT4-MIXED | 3623 | 5172.4 | 42.9 | 5043.3 | 23.31 |
deepseek-r1-distill-qwen-1.5b | FP16 | 1024 | 651 | 43.2 | 5124.9 | 23.15 |
qwen2.5-1.5b-instruct | FP16 | 1024 | 651.4 | 43.3 | 5144.8 | 23.09 |
deepseek-r1-distill-qwen-1.5b | FP16 | 2048 | 1309.9 | 43.8 | 4815 | 22.83 |
internvl2-4b | INT4-MIXED | 4099 | 8950.9 | 43.8 | 7377.6 | 22.83 |
qwen2.5-1.5b-instruct | FP16 | 2048 | 1304.7 | 43.9 | 4726.6 | 22.78 |
deepseek-r1-distill-qwen-1.5b | FP16 | 4097 | 8260.8 | 44.3 | 4881.6 | 22.57 |
qwen2.5-1.5b-instruct | FP16 | 4096 | 2788.9 | 44.3 | 4898.5 | 22.57 |
internvl2-4b | INT4-MIXED | 4099 | 9058.1 | 44.3 | 7365.2 | 22.57 |
phi-3.5-mini-instruct | INT4-MIXED | 4096 | 8269.2 | 44.3 | 5492.1 | 22.57 |
phi-3-mini-4k-instruct | INT4-MIXED | 4096 | 8098.8 | 44.4 | 5317.7 | 22.52 |
stable-zephyr-3b-dpo | INT8-CW | 2076 | 4103.6 | 44.6 | 4797.2 | 22.42 |
stablelm-3b-4e1t | INT8-CW | 2048 | 2821 | 44.6 | 4750.7 | 22.42 |
gpt-j-6b | INT4-MIXED | 32 | 140.6 | 45.3 | 5178.9 | 22.08 |
whisper-large-v3 | INT4-MIXED | prompt1 | 855.2 | 45.4 | 3886.9 | 22.03 |
llama-3.2-3b-instruct | INT8-CW | 32 | 93.7 | 46.4 | 5553 | 21.55 |
phi-3.5-vision-instruct | INT4-MIXED | 5239 | 13020.6 | 47.4 | 8087 | 21.10 |
minicpm3-4b | INT4-MIXED | 32 | 181.5 | 47.6 | 4621.1 | 21.01 |
whisper-large-v3 | INT8-CW | prompt0 | 884.2 | 47.6 | 4683.3 | 21.01 |
flan-t5-xxl | INT4-MIXED | 33 | 468.3 | 47.7 | 13996.1 | 20.96 |
llama-3.2-3b-instruct | INT8-CW | 1024 | 1159.5 | 47.7 | 5326.8 | 20.96 |
whisper-large-v3 | INT8-CW | prompt1 | 947.3 | 48.1 | 4594 | 20.79 |
gpt-j-6b | INT4-MIXED | 32 | 153.2 | 48.4 | 5508.1 | 20.66 |
minicpm3-4b | INT4-MIXED | 32 | 215.5 | 48.5 | 4426.7 | 20.62 |
gpt-j-6b | INT4-MIXED | 1024 | 2092.6 | 48.6 | 6045.2 | 20.58 |
minicpm3-4b | INT4-MIXED | 32 | 220 | 48.7 | 4385.5 | 20.53 |
llama-3.2-3b-instruct | INT8-CW | 2048 | 2401.4 | 49 | 4945.1 | 20.41 |
stablelm-3b-4e1t | INT8-CW | 4096 | 7153.3 | 49.7 | 5660.4 | 20.12 |
llama-2-7b-chat-hf | INT4-MIXED | 32 | 165.3 | 50.2 | 5377.6 | 19.92 |
llama-3.2-3b-instruct | INT8-CW | 4096 | 5655.6 | 50.8 | 5450.1 | 19.69 |
qwen2.5-coder-3b-instruct | INT8-CW | 32 | 116.4 | 50.9 | 5415.6 | 19.65 |
llama-2-7b-chat-hf | INT4-MIXED | 32 | 172.9 | 51.1 | 5508 | 19.57 |
gpt-j-6b | INT4-MIXED | 1024 | 2252.9 | 51.9 | 6377.5 | 19.27 |
falcon-7b-instruct | INT4-MIXED | 32 | 166.4 | 51.9 | 5731.2 | 19.27 |
qwen2.5-coder-3b-instruct | INT8-CW | 1024 | 1014.5 | 52 | 5106.1 | 19.23 |
flan-t5-xxl | INT4-MIXED | 1139 | 422.9 | 52.1 | 15612.4 | 19.19 |
gpt-j-6b | INT4-MIXED | 2057 | 4835.9 | 52.8 | 7320.2 | 18.94 |
falcon-7b-instruct | INT4-MIXED | 1024 | 2194 | 53.2 | 5561.1 | 18.80 |
phi-3-mini-128k-instruct | INT8-CW | 32 | 108 | 53.4 | 5721.8 | 18.73 |
phi-3-mini-4k-instruct | INT8-CW | 32 | 113.9 | 53.4 | 5778.1 | 18.73 |
phi-3.5-mini-instruct | INT8-CW | 32 | 109.2 | 53.4 | 5703.6 | 18.73 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 175.5 | 53.7 | 5621.5 | 18.62 |
minicpm3-4b | INT4-MIXED | 1024 | 1758.4 | 53.9 | 5081.9 | 18.55 |
chatglm3-6b | FP16 | 7 | 74.1 | 54.3 | 5655 | 18.42 |
llama-2-7b-chat-hf | INT4-MIXED | 1024 | 2213.7 | 54.3 | 5448.1 | 18.42 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 175.3 | 54.3 | 5652.8 | 18.42 |
biomistral-7b-slerp | INT4-MIXED | 7 | 74.1 | 54.3 | 5655 | 18.42 |
chatglm3-6b | INT4-MIXED | 7 | 74.1 | 54.3 | 5655 | 18.42 |
chatglm3-6b | INT8-CW | 7 | 74.1 | 54.3 | 5655 | 18.42 |
phi-4-multimodal-instruct | INT4-MIXED | 578 | 4897.7 | 54.6 | 6524.5 | 18.32 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 186.2 | 54.6 | 5833.8 | 18.32 |
phi-4-multimodal-instruct | INT4-MIXED | 786 | 6613.3 | 54.8 | 6722 | 18.25 |
minicpm3-4b | INT4-MIXED | 1024 | 1757.6 | 55.1 | 4980.6 | 18.15 |
llama-2-7b-chat-hf | INT4-MIXED | 1024 | 2216.8 | 55.3 | 5530.5 | 18.08 |
minicpm3-4b | INT4-MIXED | 1024 | 1727.3 | 55.3 | 4874.5 | 18.08 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 188.3 | 55.4 | 5748.5 | 18.05 |
phi-4-multimodal-instruct | INT4-MIXED | 1362 | 18175.5 | 55.5 | 8029.9 | 18.02 |
falcon-7b-instruct | INT4-MIXED | 32 | 181.9 | 55.6 | 6288.2 | 17.99 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 188.3 | 55.7 | 5965.6 | 17.95 |
phi-4-multimodal-instruct | INT4-MIXED | 1570 | 13503 | 55.8 | 8935.9 | 17.92 |
biomistral-7b-slerp | INT4-MIXED | 7 | 78.6 | 55.8 | 5854.6 | 17.92 |
phi-4-multimodal-instruct | INT4-MIXED | 1685 | 13140.2 | 55.9 | 9393.9 | 17.89 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 2233 | 55.9 | 5438.2 | 17.89 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 174.3 | 56 | 5720.8 | 17.86 |
mistral-7b-instruct-v0.1 | INT4-MIXED | 32 | 189.1 | 56 | 5958.5 | 17.86 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 187.4 | 56 | 5768.1 | 17.86 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 181.6 | 56 | 5721.6 | 17.86 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 32 | 177.2 | 56 | 5811.4 | 17.86 |
phi-4-mini-instruct | INT8-CW | 32 | 111.2 | 56 | 6409.4 | 17.86 |
gpt-j-6b | INT4-MIXED | 2057 | 5028.8 | 56.1 | 7740.4 | 17.83 |
qwen2-7b-instruct | INT4-MIXED | 32 | 178.4 | 56.1 | 5838.1 | 17.83 |
phi-4-mini-reasoning | INT8-CW | 32 | 113.7 | 56.1 | 6280.5 | 17.83 |
internvl2-4b | INT8-CW | 297 | 981.1 | 56.3 | 6407.6 | 17.76 |
whisper-large-v3 | FP16 | prompt0 | 933.8 | 56.4 | 5766.8 | 17.73 |
phi-3-mini-128k-instruct | INT8-CW | 1024 | 1604.9 | 56.5 | 5732.6 | 17.70 |
phi-3-mini-4k-instruct | INT8-CW | 1024 | 1590.8 | 56.5 | 5734.7 | 17.70 |
phi-3.5-mini-instruct | INT8-CW | 1024 | 1592.8 | 56.5 | 5625.6 | 17.70 |
flan-t5-xxl | INT4-MIXED | 2048 | 690.7 | 56.6 | 19603.4 | 17.67 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 2234.4 | 56.6 | 5426.5 | 17.67 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 2239.3 | 56.8 | 5532.3 | 17.61 |
whisper-large-v3 | FP16 | prompt1 | 1002.7 | 56.9 | 5545.9 | 17.57 |
falcon-7b-instruct | INT4-MIXED | 1024 | 2323.2 | 57 | 6004.5 | 17.54 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 181.1 | 57.3 | 6223 | 17.45 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 2038.7 | 57.4 | 5859.6 | 17.42 |
qwen2-7b-instruct | INT4-MIXED | 1024 | 2104.6 | 57.4 | 5977 | 17.42 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 2025.4 | 57.4 | 5860.3 | 17.42 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 1024 | 2028.1 | 57.4 | 5950.1 | 17.42 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 183.3 | 57.4 | 6420.8 | 17.42 |
phi-4-mini-instruct | INT8-CW | 1024 | 1341.4 | 57.5 | 6156 | 17.39 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 2245.2 | 57.6 | 5553.9 | 17.36 |
phi-4-mini-reasoning | INT8-CW | 1024 | 1350.9 | 57.7 | 6162.4 | 17.33 |
phi-3.5-vision-instruct | INT8-CW | 802 | 7401.7 | 57.7 | 6893.7 | 17.33 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 2324.5 | 57.9 | 5647.1 | 17.27 |
minicpm-v-2_6 | INT4-MIXED | 228 | 2896.3 | 57.9 | 7147.8 | 17.27 |
mistral-7b-instruct-v0.1 | INT4-MIXED | 1025 | 2444.7 | 58 | 5670.2 | 17.24 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 2315.8 | 58.1 | 5646.2 | 17.21 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 182.3 | 58.1 | 6177.7 | 17.21 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 32 | 179.8 | 58.1 | 6333.6 | 17.21 |
minicpm-o-2_6 | INT4-MIXED | 238 | 2962.2 | 58.2 | 7165.8 | 17.18 |
qwen2-7b-instruct | INT4-MIXED | 32 | 188.1 | 58.2 | 6525.8 | 17.18 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 180.6 | 58.2 | 6352 | 17.18 |
phi-3.5-vision-instruct | INT8-CW | 1032 | 3862.9 | 58.2 | 6296.3 | 17.18 |
internvl2-4b | INT8-CW | 1027 | 4017 | 58.4 | 6261.8 | 17.12 |
llama-2-7b-chat-hf | INT4-MIXED | 2048 | 4980 | 58.6 | 5792.1 | 17.06 |
qwen2-7b-instruct | INT4-MIXED | 2048 | 4371.3 | 58.6 | 5559.7 | 17.06 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 2048 | 4364.6 | 58.7 | 5441.3 | 17.04 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 2064 | 58.7 | 5969.2 | 17.04 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 2072.6 | 58.8 | 6084.8 | 17.01 |
phi-4-multimodal-instruct | INT4-MIXED | 578 | 5157.9 | 58.8 | 6919.7 | 17.01 |
phi-4-multimodal-instruct | INT4-MIXED | 786 | 6791.8 | 59 | 7372.8 | 16.95 |
minicpm-v-2_6 | INT4-MIXED | 228 | 3002.8 | 59.1 | 7262 | 16.92 |
phi-4-mini-instruct | INT8-CW | 2048 | 2819.2 | 59.1 | 5817.7 | 16.92 |
phi-4-mini-reasoning | INT8-CW | 1986 | 5427.8 | 59.2 | 5825.9 | 16.89 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 4097 | 9472.8 | 59.3 | 5805.7 | 16.86 |
qwen2-7b-instruct | INT4-MIXED | 4096 | 9372.7 | 59.3 | 5918.5 | 16.86 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 2135.2 | 59.5 | 6060.1 | 16.81 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 2135.9 | 59.5 | 6062.5 | 16.81 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 1024 | 2125.7 | 59.5 | 6163.5 | 16.81 |
llama-2-7b-chat-hf | INT4-MIXED | 2048 | 4978.1 | 59.6 | 5871.6 | 16.78 |
qwen2-7b-instruct | INT4-MIXED | 1024 | 2221 | 59.6 | 6183.6 | 16.78 |
llama-3-8b-instruct | INT4-MIXED | 32 | 180.6 | 59.6 | 5936.8 | 16.78 |
phi-3-mini-4k-instruct | INT8-CW | 2048 | 3646.6 | 59.6 | 5851.8 | 16.78 |
phi-3.5-mini-instruct | INT8-CW | 2048 | 3619.2 | 59.6 | 5736.2 | 16.78 |
phi-4-multimodal-instruct | INT4-MIXED | 1362 | 18669 | 59.7 | 8758.1 | 16.75 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 184.6 | 59.7 | 6054.9 | 16.75 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 2048 | 4460 | 59.8 | 5549.2 | 16.72 |
phi-4-multimodal-instruct | INT4-MIXED | 1570 | 13974.8 | 60 | 9654.1 | 16.67 |
minicpm-v-2_6 | INT4-MIXED | 228 | 3130.7 | 60 | 7387 | 16.67 |
phi-4-multimodal-instruct | INT4-MIXED | 1685 | 13655.4 | 60.1 | 10068.6 | 16.64 |
minicpm-o-2_6 | INT4-MIXED | 238 | 3188.1 | 60.1 | 6894.6 | 16.64 |
phi-4-multimodal-instruct | INT4-MIXED | 4506 | 36268.1 | 60.2 | 14572.5 | 16.61 |
minicpm3-4b | INT4-MIXED | 2049 | 4566.4 | 60.2 | 5834.6 | 16.61 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 180.2 | 60.2 | 5936.8 | 16.61 |
llama-3-8b-instruct | INT4-MIXED | 32 | 191.1 | 60.5 | 6454.4 | 16.53 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 4097 | 9634.9 | 60.6 | 5915.3 | 16.50 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 2048 | 4523.8 | 60.6 | 5639.5 | 16.50 |
qwen2-7b-instruct | INT4-MIXED | 2048 | 4595.8 | 60.8 | 5778.2 | 16.45 |
minicpm4-8b | INT4-MIXED | 32 | 198.2 | 60.8 | 5897.5 | 16.45 |
llama-3-8b-instruct | INT4-MIXED | 32 | 193.6 | 60.9 | 6496.6 | 16.42 |
phi-3.5-vision-instruct | INT8-CW | 2056 | 7298.7 | 60.9 | 7394.5 | 16.42 |
internvl2-4b | INT8-CW | 2051 | 7364.9 | 61.1 | 7237.1 | 16.37 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 193.3 | 61.3 | 6417.7 | 16.31 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 4097 | 9847.7 | 61.4 | 6003.4 | 16.29 |
minicpm3-4b | INT4-MIXED | 2049 | 4560.3 | 61.4 | 5723.3 | 16.29 |
qwen3-8b | INT4-MIXED | 32 | 180.5 | 61.4 | 6242.9 | 16.29 |
phi-4-mini-instruct | INT8-CW | 3623 | 10432.1 | 61.4 | 6278.4 | 16.29 |
qwen2-7b-instruct | INT4-MIXED | 4096 | 9798.4 | 61.5 | 6123.9 | 16.26 |
gpt-j-6b | INT4-MIXED | 4112 | 11653.7 | 61.6 | 10386.6 | 16.23 |
minicpm4-8b | INT4-MIXED | 32 | 211.3 | 61.6 | 6455.4 | 16.23 |
phi-4-mini-reasoning | INT8-CW | 3623 | 10471.7 | 61.6 | 6268 | 16.23 |
minicpm3-4b | INT4-MIXED | 2049 | 4495.4 | 61.7 | 5694.7 | 16.21 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 2256.5 | 61.7 | 6114.2 | 16.21 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 2247.8 | 61.8 | 6232.2 | 16.18 |
llama-3-8b-instruct | INT4-MIXED | 32 | 190.3 | 62 | 6490.2 | 16.13 |
llava-next-video-7b-hf | INT4-MIXED | 2945 | 9332.9 | 62.4 | 7936.1 | 16.03 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 2258.1 | 62.4 | 6114.1 | 16.03 |
afm-4.5b | INT8-CW | 32 | 159.6 | 62.4 | 6132.1 | 16.03 |
minicpm4-8b | INT4-MIXED | 1024 | 2364.2 | 62.5 | 6017.6 | 16.00 |
minicpm4-8b | INT4-MIXED | 32 | 214.6 | 62.5 | 6605.5 | 16.00 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 2236.8 | 62.7 | 6203 | 15.95 |
qwen3-8b | INT4-MIXED | 32 | 179.3 | 62.8 | 6763 | 15.92 |
qwen3-4b | INT8-CW | 2048 | 3273.7 | 63.1 | 6609.6 | 15.85 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 2228.7 | 63.2 | 6349.2 | 15.82 |
minicpm4-8b | INT4-MIXED | 1024 | 2439.5 | 63.4 | 6153.7 | 15.77 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 2244.5 | 63.5 | 6199.3 | 15.75 |
afm-4.5b | INT8-CW | 1024 | 1517 | 63.5 | 5963.3 | 15.75 |
minicpm4-8b | INT4-MIXED | 4097 | 11075.9 | 63.7 | 5863.4 | 15.70 |
llama-3-8b-instruct | INT4-MIXED | 2048 | 4755.4 | 63.9 | 5770.4 | 15.65 |
qwen3-8b | INT4-MIXED | 1024 | 2222.3 | 63.9 | 6427.4 | 15.65 |
llava-next-video-7b-hf | INT4-MIXED | 2945 | 9696.9 | 64 | 8038.1 | 15.63 |
minicpm4-8b | INT4-MIXED | 2049 | 5198.3 | 64 | 5502.8 | 15.63 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 2048 | 4771.4 | 64 | 5889.3 | 15.63 |
minicpm4-8b | INT4-MIXED | 1024 | 2548.5 | 64.1 | 6259.1 | 15.60 |
qwen3-4b | INT8-CW | 4096 | 7519 | 64.1 | 6479.4 | 15.60 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 2332.4 | 64.2 | 6443.5 | 15.58 |
phi-4-multimodal-instruct | INT4-MIXED | 4506 | 36990.5 | 64.4 | 15329.2 | 15.53 |
minicpm-v-4_5 | INT4-MIXED | 217 | 3015.3 | 64.4 | 7566.2 | 15.53 |
llama-3.1-8b-instruct | INT4-MIXED | 2048 | 4745.3 | 64.5 | 5770.8 | 15.50 |
minicpm4-8b | INT4-MIXED | 4097 | 11347.1 | 64.6 | 5994.3 | 15.48 |
afm-4.5b | INT8-CW | 2048 | 3187.1 | 64.7 | 5907.2 | 15.46 |
gpt-j-6b | INT4-MIXED | 4112 | 12121.1 | 64.8 | 10729.4 | 15.43 |
llama-3-8b-instruct | INT4-MIXED | 4096 | 10403.5 | 64.8 | 6298.9 | 15.43 |
minicpm4-8b | INT4-MIXED | 2049 | 5350.1 | 64.9 | 5628.1 | 15.41 |
llama-3-8b-instruct | INT4-MIXED | 2048 | 4802.5 | 64.9 | 5856.1 | 15.41 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 4086 | 10391.6 | 65.1 | 6425 | 15.36 |
minicpm-v-4_5 | INT4-MIXED | 217 | 3132.4 | 65.2 | 7794.5 | 15.34 |
llama-3-8b-instruct | INT4-MIXED | 2048 | 4783.3 | 65.3 | 6000.9 | 15.31 |
qwen3-8b | INT4-MIXED | 1024 | 2353.2 | 65.3 | 6654.6 | 15.31 |
minicpm4-8b | INT4-MIXED | 4097 | 11649.3 | 65.4 | 6099.8 | 15.29 |
llama-3.1-8b-instruct | INT4-MIXED | 4096 | 10431 | 65.4 | 6299.9 | 15.29 |
minicpm4-8b | INT4-MIXED | 2049 | 5509 | 65.6 | 5751.1 | 15.24 |
afm-4.5b | INT8-CW | 4096 | 7029.2 | 65.6 | 6288.2 | 15.24 |
llama-3.1-8b-instruct | INT4-MIXED | 2048 | 4820.6 | 65.7 | 5851.1 | 15.22 |
phi-3-mini-4k-instruct | INT8-CW | 4096 | 8952.2 | 65.7 | 6953.5 | 15.22 |
phi-3.5-mini-instruct | INT8-CW | 4096 | 8942.6 | 65.7 | 6837.4 | 15.22 |
llama-3-8b-instruct | INT4-MIXED | 4096 | 10507.7 | 65.8 | 6385.2 | 15.20 |
llama-3-8b-instruct | INT4-MIXED | 4096 | 10486.7 | 66.1 | 6529.5 | 15.13 |
llama-3-8b-instruct | INT4-MIXED | 2048 | 4936.2 | 66.3 | 6095.7 | 15.08 |
qwen3-8b | INT4-MIXED | 2048 | 4806.1 | 66.3 | 6107.6 | 15.08 |
llama-3.1-8b-instruct | INT4-MIXED | 4096 | 10521.9 | 66.5 | 6379.9 | 15.04 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 32 | 303.8 | 66.6 | 6965.7 | 15.02 |
internvl2-4b | INT8-CW | 4099 | 10469.2 | 66.7 | 8614.4 | 14.99 |
gemma-7b-it | INT4-MIXED | 32 | 224.2 | 67 | 7202.4 | 14.93 |
llama-2-7b-chat-hf | INT4-MIXED | 4096 | 12428.9 | 67.2 | 7167.6 | 14.88 |
llama-3-8b-instruct | INT4-MIXED | 4096 | 10823.9 | 67.2 | 6624 | 14.88 |
qwen3-8b | INT4-MIXED | 4096 | 10581.4 | 67.2 | 6661.2 | 14.88 |
flan-t5-xxl | INT4-MIXED | 4096 | 1377.3 | 67.5 | 28697.9 | 14.81 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 1024 | 2426.5 | 67.6 | 7483.5 | 14.79 |
qwen3-8b | INT4-MIXED | 2048 | 5027.4 | 67.7 | 6324.6 | 14.77 |
llama-2-7b-chat-hf | INT4-MIXED | 4096 | 12334.6 | 68.1 | 7247.6 | 14.68 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 32 | 309.6 | 68.5 | 7452.8 | 14.60 |
gemma-7b-it | INT4-MIXED | 32 | 233.5 | 68.6 | 7523.4 | 14.58 |
qwen3-8b | INT4-MIXED | 4096 | 11033 | 68.7 | 6883.5 | 14.56 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 2048 | 5190.3 | 68.7 | 9568.6 | 14.56 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 1024 | 2540.2 | 69.6 | 7690 | 14.37 |
phi-3.5-vision-instruct | INT8-CW | 5239 | 24507.7 | 69.7 | 9417.7 | 14.35 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 2048 | 5344.7 | 70.7 | 9787.8 | 14.14 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 4096 | 11437.7 | 70.9 | 13142.7 | 14.10 |
gemma-7b-it | INT4-MIXED | 1024 | 2750.2 | 71 | 7115.5 | 14.08 |
stable-zephyr-3b-dpo | FP16 | 32 | 97.5 | 71.4 | 7250.1 | 14.01 |
stablelm-3b-4e1t | FP16 | 32 | 97.4 | 71.4 | 7405.6 | 14.01 |
phi-2 | FP16 | 32 | 101.4 | 71.5 | 7239.9 | 13.99 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 176.9 | 71.9 | 7503.1 | 13.91 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 181 | 72.1 | 7448.5 | 13.87 |
gemma-7b-it | INT4-MIXED | 1024 | 2824.2 | 72.5 | 7554.3 | 13.79 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 218.6 | 72.5 | 6896 | 13.79 |
minicpm3-4b | INT4-MIXED | 4097 | 12436.8 | 72.9 | 7833.4 | 13.72 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 4096 | 11899.3 | 72.9 | 13335.9 | 13.72 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 236.7 | 73.6 | 7400.7 | 13.59 |
minicpm3-4b | INT4-MIXED | 4097 | 12490.3 | 74 | 7750.3 | 13.51 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 2473.8 | 74.1 | 7115.6 | 13.50 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 2471.7 | 74.1 | 7329.9 | 13.50 |
minicpm3-4b | INT4-MIXED | 4097 | 12312.3 | 74.3 | 7713.8 | 13.46 |
glm-4-9b-chat-hf | INT4-MIXED | 1024 | 2689.9 | 74.3 | 7123.8 | 13.46 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 230.4 | 74.6 | 7402.4 | 13.40 |
gemma-7b-it | INT4-MIXED | 2048 | 5975.4 | 75 | 7910.5 | 13.33 |
glm-4-9b-chat-hf | INT4-MIXED | 1024 | 2770.3 | 75.3 | 7226.6 | 13.28 |
stable-zephyr-3b-dpo | FP16 | 1024 | 1419.6 | 75.9 | 7137 | 13.18 |
stablelm-3b-4e1t | FP16 | 1024 | 1413.1 | 75.9 | 7015.3 | 13.18 |
glm-4-9b-chat-hf | INT4-MIXED | 4096 | 12262.4 | 75.9 | 8870.3 | 13.18 |
phi-2 | FP16 | 1024 | 1409.1 | 76 | 7088.2 | 13.16 |
glm-4-9b-chat-hf | INT4-MIXED | 2048 | 5683.9 | 76 | 7506.1 | 13.16 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 2048 | 5281 | 76.2 | 7093.4 | 13.12 |
glm-4-9b-chat-hf | INT4-MIXED | 1024 | 2810.4 | 76.2 | 7362 | 13.12 |
llama-3.1-8b-instruct | INT4-MIXED | 2048 | 5307.2 | 76.4 | 7049.5 | 13.09 |
gemma-2-9b-it | INT4-MIXED | 32 | 200.4 | 76.4 | 7933.9 | 13.09 |
gemma-7b-it | INT4-MIXED | 2048 | 6109.3 | 76.5 | 8115.3 | 13.07 |
minicpm3-4b | INT8-CW | 32 | 182.7 | 76.7 | 6235.7 | 13.04 |
glm-4-9b-chat-hf | INT4-MIXED | 4096 | 12495.3 | 77.1 | 8983.8 | 12.97 |
llama-3.1-8b-instruct | INT4-MIXED | 4096 | 11469.5 | 77.2 | 7713.4 | 12.95 |
glm-4-9b-chat-hf | INT4-MIXED | 2048 | 5761.7 | 77.2 | 7678.9 | 12.95 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 4086 | 11479.9 | 77.3 | 7762 | 12.94 |
gemma-2-9b-it | INT4-MIXED | 32 | 214.2 | 77.3 | 7913.9 | 12.94 |
gemma-2-9b-it | INT4-MIXED | 32 | 213.7 | 77.5 | 8270.5 | 12.90 |
phi-4-multimodal-instruct | INT8-CW | 578 | 5176.6 | 77.8 | 8310.9 | 12.85 |
glm-4-9b-chat-hf | INT4-MIXED | 2048 | 5924.4 | 78 | 7742.2 | 12.82 |
phi-4-multimodal-instruct | INT8-CW | 786 | 6915.9 | 78 | 8445.9 | 12.82 |
glm-4-9b-chat-hf | INT4-MIXED | 4096 | 12806.4 | 78.1 | 9116.7 | 12.80 |
phi-4-multimodal-instruct | INT8-CW | 1362 | 18870.5 | 78.8 | 9847.9 | 12.69 |
phi-4-multimodal-instruct | INT8-CW | 1570 | 14309.2 | 79 | 10734.4 | 12.66 |
phi-4-multimodal-instruct | INT8-CW | 1685 | 13971.7 | 79.2 | 11213.2 | 12.63 |
stablelm-3b-4e1t | FP16 | 2048 | 3162.5 | 80.5 | 7823.7 | 12.42 |
stable-zephyr-3b-dpo | FP16 | 2076 | 9447.5 | 80.6 | 7836.8 | 12.41 |
gemma-2-9b-it | INT4-MIXED | 1024 | 2851.6 | 80.6 | 7715.2 | 12.41 |
gemma-2-9b-it | INT4-MIXED | 1024 | 2942.9 | 81.5 | 7841.6 | 12.27 |
gemma-2-9b-it | INT4-MIXED | 1024 | 3018.3 | 81.8 | 8017 | 12.22 |
flan-t5-xxl | INT8-CW | 33 | 310.9 | 81.9 | 23709.8 | 12.21 |
qwen2.5-coder-3b-instruct | FP16 | 32 | 126 | 82.7 | 8611.3 | 12.09 |
phi-4-multimodal-instruct | INT8-CW | 4506 | 37716.5 | 82.8 | 16439.1 | 12.08 |
minicpm3-4b | INT8-CW | 1024 | 2028.6 | 83 | 6596.6 | 12.05 |
gpt-j-6b | INT8-CW | 32 | 158.4 | 83 | 7982.6 | 12.05 |
gemma-7b-it | INT4-MIXED | 4096 | 14072.9 | 83.1 | 9060.3 | 12.03 |
qwen2.5-coder-3b-instruct | FP16 | 1024 | 1394.7 | 84.3 | 7990.9 | 11.86 |
gemma-7b-it | INT4-MIXED | 4096 | 14448.1 | 84.7 | 9245 | 11.81 |
gemma-2-9b-it | INT4-MIXED | 2048 | 6150.2 | 84.9 | 8125.5 | 11.78 |
llama-3.2-3b-instruct | FP16 | 32 | 108.2 | 85.4 | 9000.5 | 11.71 |
gemma-2-9b-it | INT4-MIXED | 2048 | 6334.1 | 85.8 | 8257.3 | 11.66 |
gemma-2-9b-it | INT4-MIXED | 2048 | 6513.8 | 86 | 8408.9 | 11.63 |
gpt-j-6b | INT8-CW | 1024 | 2465.2 | 87.1 | 8211.5 | 11.48 |
flan-t5-xxl | INT8-CW | 1139 | 509.5 | 87.2 | 25123.7 | 11.47 |
llama-3.2-3b-instruct | FP16 | 1024 | 1413.1 | 87.5 | 8422.3 | 11.43 |
gemma-2-9b-it | INT4-MIXED | 4096 | 13749.8 | 88.4 | 9292.1 | 11.31 |
gemma-2-9b-it | INT4-MIXED | 4096 | 14169.9 | 89.3 | 9423.6 | 11.20 |
minicpm3-4b | INT8-CW | 2049 | 7300.5 | 89.3 | 7623.8 | 11.20 |
stablelm-3b-4e1t | FP16 | 4096 | 7749 | 89.4 | 9238.6 | 11.19 |
gemma-2-9b-it | INT4-MIXED | 4096 | 14397.3 | 89.5 | 9574.8 | 11.17 |
llama-3.2-3b-instruct | FP16 | 2048 | 2892.4 | 89.6 | 8470.6 | 11.16 |
gpt-j-6b | INT8-CW | 2057 | 7011.4 | 91.3 | 9570.4 | 10.95 |
llama-2-7b-chat-hf | INT8-CW | 32 | 156.2 | 91.6 | 8478.1 | 10.92 |
llama-3.2-3b-instruct | FP16 | 4096 | 6578.9 | 92.1 | 9073.1 | 10.86 |
flan-t5-xxl | INT8-CW | 2048 | 836.5 | 92.3 | 29363 | 10.83 |
llama-2-13b-chat-hf | INT4-MIXED | 32 | 315.3 | 93.9 | 8271.6 | 10.65 |
falcon-7b-instruct | INT8-CW | 32 | 178.6 | 94.9 | 8853.9 | 10.54 |
llama-2-7b-chat-hf | INT8-CW | 1024 | 2675.6 | 95.6 | 8222.6 | 10.46 |
falcon-7b-instruct | INT8-CW | 1024 | 2619.8 | 96.2 | 8208.9 | 10.40 |
deepseek-r1-distill-qwen-7b | INT8-CW | 32 | 183 | 97.9 | 8857.8 | 10.21 |
qwen2.5-7b-instruct | INT8-CW | 32 | 193.6 | 97.9 | 8859.7 | 10.21 |
qwen2.5-7b-instruct-1m | INT8-CW | 32 | 188.6 | 97.9 | 8941.8 | 10.21 |
qwen2-7b-instruct | INT8-CW | 32 | 184.8 | 98 | 8969.5 | 10.20 |
mistral-7b-instruct-v0.2 | INT8-CW | 32 | 181.2 | 99 | 8740.7 | 10.10 |
mistral-7b-instruct-v0.3 | INT8-CW | 32 | 176.1 | 99 | 8756.2 | 10.10 |
phi-3-mini-4k-instruct | FP16 | 32 | 127.9 | 99.1 | 9385.8 | 10.09 |
phi-3-mini-128k-instruct | FP16 | 32 | 126.5 | 99.2 | 9397.8 | 10.08 |
deepseek-r1-distill-qwen-7b | INT8-CW | 1024 | 2504.8 | 99.3 | 8385.8 | 10.07 |
qwen2.5-7b-instruct | INT8-CW | 1024 | 2505.2 | 99.4 | 8386.7 | 10.06 |
qwen2.5-7b-instruct-1m | INT8-CW | 1024 | 2518.2 | 99.4 | 8470.4 | 10.06 |
phi-3.5-mini-instruct | FP16 | 32 | 128.7 | 99.5 | 9291.7 | 10.05 |
qwen2-7b-instruct | INT8-CW | 1024 | 2566.8 | 99.5 | 8504.9 | 10.05 |
biomistral-7b-slerp | INT8-CW | 7 | 117.9 | 99.5 | 8777.8 | 10.05 |
minicpm-v-2_6 | INT8-CW | 228 | 3132.9 | 99.7 | 10104.6 | 10.03 |
llama-2-7b-chat-hf | INT8-CW | 2048 | 5933.5 | 99.9 | 8960.9 | 10.01 |
minicpm-o-2_6 | INT8-CW | 238 | 3081.2 | 100 | 10113.2 | 10.00 |
All models listed here were tested with the following parameters:
Framework: PyTorch
Beam: 1
Batch size: 1
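In every row, the throughput column appears to follow directly from the 2nd-token latency: tokens per second ≈ 1000 / (2nd latency in ms). For example, phi-2 INT8-CW at 40 ms gives 25.00 tokens/s, and whisper-large-v3 INT4-MIXED at 42.6 ms gives 23.47 tokens/s. The sketch below assumes that relationship. It parses rows in the pipe-separated form used above and gives a rough end-to-end estimate for a chosen output length: first-token latency plus the 2nd-token latency for each remaining token. The helper names (`parse_row`, `tokens_per_sec`, `estimated_generation_ms`) are hypothetical and are not part of OpenVINO. The memory column is assumed to be in megabytes, because the table header does not state a unit.

```python
from dataclasses import dataclass


@dataclass
class Row:
    topology: str
    precision: str
    input_size: str          # token count, or a named prompt such as "prompt0" (Whisper rows)
    first_latency_ms: float
    second_latency_ms: float
    max_rss_mb: float        # assumption: the header gives no unit; MB is a guess
    tokens_per_sec: float


def parse_row(line: str) -> Row:
    """Parse one pipe-separated row, e.g. 'phi-2 | INT8-CW | 32 | 97.2 | 40 | 4904.3 | 25.00 |'."""
    cells = [c.strip() for c in line.split("|") if c.strip()]
    topology, precision, input_size, first, second, rss, tps = cells[:7]
    return Row(topology, precision, input_size,
               float(first), float(second), float(rss), float(tps))


def tokens_per_sec(second_latency_ms: float) -> float:
    # Steady-state decode rate: one new token every `second_latency_ms` milliseconds.
    return 1000.0 / second_latency_ms


def estimated_generation_ms(row: Row, new_tokens: int) -> float:
    # Rough estimate only. It assumes the per-token latency stays constant as the
    # output grows, which real runs need not do.
    if new_tokens < 1:
        raise ValueError("new_tokens must be at least 1")
    return row.first_latency_ms + (new_tokens - 1) * row.second_latency_ms


if __name__ == "__main__":
    sample = [
        "phi-2 | INT8-CW | 32 | 97.2 | 40 | 4904.3 | 25.00 |",
        "qwen3-8b | INT4-MIXED | 1024 | 2222.3 | 63.9 | 6427.4 | 15.65 |",
    ]
    for r in map(parse_row, sample):
        print(r.topology, r.precision, r.input_size,
              round(tokens_per_sec(r.second_latency_ms), 2),
              round(estimated_generation_ms(r, 128), 1))
```

With long prompts, the first-token latency can dominate the total. For instance, qwen3-8b INT4-MIXED at an input size of 1024 spends more than 2 seconds before the first token. For this reason, comparing rows by throughput alone can be misleading for short replies.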