Most Efficient Large Language Models for AI PC
This page is regularly updated to help you identify the best-performing LLMs on the Intel® Core™ Ultra processor family and AI PCs. The current data is as of OpenVINO 2026.1, 7 April 2026.
The tables below list the key performance indicators for inference on built-in GPUs.
Topology | Precision | Input Size | 1st latency (ms) | 2nd latency (ms) | max rss memory | 2nd token per sec |
|---|---|---|---|---|---|---|
t5-small | INT4-MIXED | 32 | 11 | 5.3 | 1161.7 | 188.6792453 |
t5-small | INT4-MIXED | 1024 | 14.1 | 5.4 | 1032.5 | 185.1851852 |
t5-small | INT8-CW | 32 | 11.3 | 5.5 | 1219.6 | 181.8181818 |
t5-small | INT4-MIXED | 32 | 11.5 | 5.6 | 1131 | 178.5714286 |
t5-small | INT8-CW | 1024 | 13.5 | 5.7 | 1097.2 | 175.4385965 |
t5-small | INT4-MIXED | 32 | 11.2 | 5.8 | 1149 | 172.4137931 |
t5-small | INT4-MIXED | 1024 | 18.2 | 6.1 | 1039.8 | 163.9344262 |
t5-small | FP16 | 32 | 10.5 | 6.2 | 1138.4 | 161.2903226 |
t5-small | INT4-MIXED | 1024 | 11.5 | 6.2 | 1020.7 | 161.2903226 |
t5-small | FP16 | 1024 | 14.5 | 6.8 | 1233.5 | 147.0588235 |
distil-large-v2 | INT4-MIXED | 32 | 239.7 | 8.9 | 2170.7 | 112.3595506 |
distil-large-v2 | INT4-MIXED | 1024 | 305.4 | 8.9 | 1662.6 | 112.3595506 |
distil-large-v2 | INT8-CW | 1024 | 281.3 | 9.2 | 2160.5 | 108.6956522 |
distil-large-v2 | INT8-CW | 32 | 237.5 | 9.3 | 2451.7 | 107.5268817 |
whisper-large-v3-turbo | INT4-MIXED | 1024 | 312.5 | 9.6 | 1980.9 | 104.1666667 |
whisper-large-v3-turbo | INT8-CW | 1024 | 309.9 | 9.6 | 2167.9 | 104.1666667 |
whisper-large-v3-turbo | INT4-MIXED | 32 | 263.5 | 9.7 | 2240.8 | 103.0927835 |
whisper-large-v3-turbo | INT8-CW | 32 | 247.2 | 9.7 | 2482.7 | 103.0927835 |
codet5-base-sum | INT8-CW | 291 | 22.2 | 9.8 | 1505.7 | 102.0408163 |
codet5-base-sum | INT4-MIXED | 26 | 22.9 | 9.9 | 1453.6 | 101.010101 |
codet5-base-sum | INT4-MIXED | 291 | 21.3 | 9.9 | 1276 | 101.010101 |
codet5-base-sum | INT4-MIXED | 291 | 21.2 | 9.9 | 1232 | 101.010101 |
codet5-base-sum | INT8-CW | 26 | 20 | 10 | 1666.4 | 100 |
codet5-base-sum | FP16 | 26 | 17.7 | 10.3 | 1975.1 | 97.08737864 |
codet5-base-sum | INT4-MIXED | 26 | 27.1 | 10.3 | 1398.8 | 97.08737864 |
minicpm4-0.5b | INT4-MIXED | 32 | 37.2 | 10.3 | 1588.9 | 97.08737864 |
codet5-base-sum | FP16 | 291 | 21.1 | 10.4 | 2012.4 | 96.15384615 |
minicpm4-0.5b | INT4-MIXED | 32 | 41.7 | 11 | 1633.8 | 90.90909091 |
minicpm4-0.5b | INT4-MIXED | 1024 | 62.1 | 11 | 1253.4 | 90.90909091 |
minicpm4-0.5b | INT4-MIXED | 32 | 38.9 | 11 | 1492.6 | 90.90909091 |
minicpm4-0.5b | INT4-MIXED | 1024 | 67.3 | 11.1 | 1283.8 | 90.09009009 |
minicpm4-0.5b | INT4-MIXED | 1024 | 75.9 | 11.3 | 1444.5 | 88.49557522 |
minicpm4-0.5b | INT8-CW | 32 | 38.9 | 11.6 | 1816.9 | 86.20689655 |
gemma-3-270m | INT4-MIXED | 32 | 34.9 | 12.5 | 1624.4 | 80 |
minicpm4-0.5b | INT8-CW | 1024 | 73.8 | 12.6 | 1408.4 | 79.36507937 |
gemma-3-270m | INT4-MIXED | 1024 | 65.7 | 12.8 | 1224.2 | 78.125 |
whisper-small | INT4-MIXED | 32 | 116.7 | 12.8 | 1776.5 | 78.125 |
whisper-small | INT4-MIXED | 32 | 124.9 | 13 | 1667.8 | 76.92307692 |
whisper-small | INT4-MIXED | 1024 | 185.8 | 13.2 | 1582.8 | 75.75757576 |
whisper-small | INT4-MIXED | 1024 | 172.7 | 13.2 | 1454.7 | 75.75757576 |
distil-large-v2 | FP16 | 1024 | 321.4 | 13.4 | 2642.5 | 74.62686567 |
distil-large-v2 | FP16 | 32 | 278.9 | 13.5 | 2610.8 | 74.07407407 |
whisper-large-v3-turbo | FP16 | 32 | 287.6 | 13.6 | 3008.2 | 73.52941176 |
whisper-large-v3-turbo | FP16 | 1024 | 348.9 | 13.6 | 2937.9 | 73.52941176 |
whisper-small | INT4-MIXED | 32 | 125.1 | 13.9 | 1748 | 71.94244604 |
whisper-small | INT4-MIXED | 1024 | 189.3 | 14 | 1552 | 71.42857143 |
whisper-small | INT8-CW | 32 | 124.7 | 14.5 | 1916.3 | 68.96551724 |
whisper-small | INT8-CW | 1024 | 184.6 | 14.5 | 1717.8 | 68.96551724 |
gemma-3-270m | INT8-CW | 32 | 45.7 | 14.6 | 1804 | 68.49315068 |
gemma-3-270m | INT8-CW | 1024 | 60.1 | 15.4 | 1458.1 | 64.93506494 |
gemma-3-270m | FP16 | 1024 | 54.2 | 16.4 | 1452 | 60.97560976 |
tiny-llama-1.1b-chat | INT4-MIXED | 32 | 64.5 | 16.6 | 2258.2 | 60.24096386 |
whisper-small | FP16 | 32 | 126.7 | 16.6 | 1929.7 | 60.24096386 |
tiny-llama-1.1b-chat | INT4-MIXED | 1024 | 152.9 | 16.8 | 1566.4 | 59.52380952 |
gemma-3-270m | FP16 | 32 | 44.2 | 17.1 | 1522.4 | 58.47953216 |
whisper-small | FP16 | 1024 | 182.5 | 17.1 | 1956.8 | 58.47953216 |
llama-3.2-1b-instruct | INT4-MIXED | 32 | 40.5 | 17.2 | 2166.2 | 58.13953488 |
tiny-llama-1.1b-chat | INT4-MIXED | 32 | 43.1 | 17.5 | 2039.9 | 57.14285714 |
llama-3.2-1b-instruct | INT4-MIXED | 1024 | 140 | 18.1 | 1627.3 | 55.24861878 |
flan-t5-large-grammar-synthesis | INT4-MIXED | 32 | 43.8 | 18.3 | 2426.6 | 54.64480874 |
flan-t5-large-grammar-synthesis | INT4-MIXED | 1024 | 63.6 | 18.8 | 2701.5 | 53.19148936 |
flan-t5-large-grammar-synthesis | INT8-CW | 32 | 48.7 | 18.8 | 3181.1 | 53.19148936 |
tiny-llama-1.1b-chat | INT4-MIXED | 1024 | 123.3 | 18.9 | 1430.6 | 52.91005291 |
tiny-llama-1.1b-chat | INT8-CW | 32 | 47.8 | 19.3 | 2541.3 | 51.8134715 |
flan-t5-large-grammar-synthesis | INT8-CW | 1024 | 73.4 | 19.8 | 3283.8 | 50.50505051 |
tiny-llama-1.1b-chat | INT8-CW | 1024 | 131.6 | 19.9 | 1866.9 | 50.25125628 |
nanollava | INT8-CW | 760 | 243.4 | 20.4 | 3539.5 | 49.01960784 |
llama-3.2-1b-instruct | INT4-MIXED | 32 | 31 | 20.5 | 2182.6 | 48.7804878 |
llama-3.2-1b-instruct | INT4-MIXED | 1024 | 124.9 | 21.5 | 1592 | 46.51162791 |
nanollava | INT4-MIXED | 760 | 260.6 | 22 | 3484.7 | 45.45454545 |
gemma-3-1b-it | INT4-MIXED | 32 | 75 | 22.3 | 2216.5 | 44.84304933 |
llama-3.2-1b-instruct | INT8-CW | 32 | 43.1 | 22.3 | 2633.5 | 44.84304933 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 78.2 | 22.4 | 2451.9 | 44.64285714 |
gemma-3-1b-it | INT4-MIXED | 1024 | 121.8 | 22.7 | 1773.4 | 44.05286344 |
llama-3.2-1b-instruct | INT8-CW | 1024 | 132.6 | 22.7 | 2185 | 44.05286344 |
flan-t5-large-grammar-synthesis | FP16 | 32 | 41.4 | 23 | 4519.4 | 43.47826087 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 163.9 | 23.1 | 1971.3 | 43.29004329 |
minicpm4-0.5b | FP16 | 32 | 71.5 | 23.1 | 2318.3 | 43.29004329 |
gemma-3-1b-it | INT4-MIXED | 32 | 66.5 | 23.6 | 2243.9 | 42.37288136 |
nanollava | INT8-CW | 1752 | 362.3 | 23.6 | 4985.1 | 42.37288136 |
minicpm4-0.5b | FP16 | 1024 | 94.7 | 23.7 | 1728.9 | 42.19409283 |
gemma-3-1b-it | INT4-MIXED | 32 | 81.6 | 23.9 | 2374.2 | 41.84100418 |
smolvlm2-256m-video-instruct | FP16 | 1141 | 682.9 | 23.9 | 3507.4 | 41.84100418 |
nanollava | INT4-MIXED | 1752 | 364.1 | 24 | 4754.7 | 41.66666667 |
gemma-3-1b-it | INT4-MIXED | 1024 | 147.8 | 24.5 | 1854.2 | 40.81632653 |
gemma-3-1b-it | INT4-MIXED | 1024 | 129.4 | 24.6 | 1713.5 | 40.6504065 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 89.4 | 25.2 | 2440.5 | 39.68253968 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 199.2 | 25.5 | 1931.8 | 39.21568627 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 63 | 26 | 2312.8 | 38.46153846 |
glm-edge-1.5b-chat | INT4-MIXED | 32 | 107.8 | 26.3 | 2460.5 | 38.02281369 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 77.1 | 26.4 | 2569 | 37.87878788 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 171.7 | 26.4 | 1781.7 | 37.87878788 |
minicpm-1b-sft | INT4-MIXED | 31 | 102.4 | 26.5 | 2317 | 37.73584906 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 88.7 | 26.6 | 2875.8 | 37.59398496 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 172.6 | 26.7 | 2003 | 37.45318352 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 213 | 26.8 | 2414 | 37.31343284 |
smolvlm2-256m-video-instruct | INT8-CW | 1141 | 626.9 | 26.9 | 3014.8 | 37.17472119 |
glm-edge-1.5b-chat | INT4-MIXED | 1024 | 263.2 | 27.6 | 1879.7 | 36.23188406 |
gemma-2b-it | INT4-MIXED | 32 | 66.3 | 27.8 | 2841.5 | 35.97122302 |
minicpm-1b-sft | INT4-MIXED | 1014 | 188.2 | 27.9 | 1773.4 | 35.84229391 |
qwen2.5-1.5b-instruct | INT8-CW | 32 | 74.7 | 28 | 2863.4 | 35.71428571 |
minicpm-1b-sft | INT4-MIXED | 31 | 111.1 | 28.3 | 2359.9 | 35.33568905 |
deepseek-r1-distill-qwen-1.5b | INT8-CW | 32 | 74 | 28.5 | 3355.1 | 35.0877193 |
deepseek-r1-distill-qwen-1.5b | INT8-CW | 1024 | 175.5 | 28.6 | 2723.3 | 34.96503497 |
gemma-2b-it | INT4-MIXED | 1024 | 228.3 | 28.8 | 2393.4 | 34.72222222 |
gemma-3-1b-it | INT8-CW | 32 | 97.5 | 28.8 | 2555.6 | 34.72222222 |
minicpm-1b-sft | INT4-MIXED | 31 | 114.6 | 28.9 | 2509.7 | 34.60207612 |
minicpm-1b-sft | INT8-CW | 31 | 94.2 | 28.9 | 2936.2 | 34.60207612 |
qwen2.5-1.5b-instruct | INT8-CW | 1024 | 175.8 | 28.9 | 2404.6 | 34.60207612 |
gemma-3-1b-it | INT8-CW | 1024 | 147.6 | 29.4 | 2049.8 | 34.01360544 |
glm-edge-1.5b-chat | INT8-CW | 32 | 95.9 | 29.4 | 3031.8 | 34.01360544 |
smolvlm2-256m-video-instruct | INT4-MIXED | 1141 | 630.4 | 29.5 | 2988.6 | 33.89830508 |
gemma-2b-it | INT4-MIXED | 32 | 55.9 | 29.6 | 3051.8 | 33.78378378 |
flan-t5-large-grammar-synthesis | FP16 | 1024 | 67.9 | 29.7 | 4481.7 | 33.67003367 |
gemma-2b-it | INT4-MIXED | 1024 | 293.7 | 30.1 | 2546.2 | 33.22259136 |
minicpm-1b-sft | INT4-MIXED | 1014 | 230.9 | 30.5 | 1922.6 | 32.78688525 |
glm-edge-1.5b-chat | INT8-CW | 1024 | 241.4 | 30.8 | 2409.1 | 32.46753247 |
minicpm-1b-sft | INT4-MIXED | 1014 | 199.6 | 31.6 | 1801.1 | 31.64556962 |
minicpm-1b-sft | INT8-CW | 1014 | 202.9 | 31.7 | 2356.2 | 31.54574132 |
nanollava | FP16 | 760 | 301.8 | 32.4 | 4296.9 | 30.86419753 |
nanollava | FP16 | 1752 | 422.3 | 34 | 5973.7 | 29.41176471 |
phi-2 | INT4-MIXED | 32 | 141.6 | 34.1 | 2842.2 | 29.3255132 |
gemma-2-2b | INT4-MIXED | 33 | 98.1 | 34.7 | 2957.5 | 28.8184438 |
gemma-2-2b | INT4-MIXED | 1025 | 291.6 | 35.9 | 2661.5 | 27.8551532 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 93.7 | 35.9 | 3091.4 | 27.8551532 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 106.3 | 36.3 | 3112.6 | 27.54820937 |
phi-2 | INT4-MIXED | 1024 | 391.1 | 36.7 | 2865.2 | 27.2479564 |
gemma-2-2b | INT4-MIXED | 33 | 100.8 | 36.8 | 3181 | 27.17391304 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 143.2 | 36.8 | 2848.2 | 27.17391304 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 324.7 | 36.9 | 2797.1 | 27.100271 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 149.4 | 36.9 | 2834.2 | 27.100271 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 303.1 | 37.2 | 2657.7 | 26.88172043 |
gemma-2-2b | INT4-MIXED | 1025 | 332.8 | 37.6 | 2846.1 | 26.59574468 |
phi-2 | INT4-MIXED | 32 | 146.6 | 37.9 | 3067.2 | 26.38522427 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 98.8 | 38.3 | 3178.1 | 26.10966057 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 122 | 38.5 | 3303.3 | 25.97402597 |
stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 394.5 | 38.6 | 2858.5 | 25.90673575 |
stablelm-3b-4e1t | INT4-MIXED | 1024 | 389.8 | 38.6 | 2849.7 | 25.90673575 |
tiny-llama-1.1b-chat | FP16 | 32 | 60.3 | 39 | 3427.6 | 25.64102564 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 421.4 | 39.1 | 2873.9 | 25.57544757 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 122.8 | 39.3 | 3421.7 | 25.44529262 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 110.1 | 39.4 | 3426.2 | 25.38071066 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 486.2 | 39.4 | 2807.8 | 25.38071066 |
tiny-llama-1.1b-chat | FP16 | 1024 | 184.4 | 39.4 | 3013.6 | 25.38071066 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 156.8 | 39.7 | 3157 | 25.18891688 |
phi-3-mini-128k-instruct | INT4-MIXED | 32 | 123.6 | 39.8 | 3501 | 25.12562814 |
phi-2 | INT4-MIXED | 1024 | 487 | 40.1 | 3098.8 | 24.93765586 |
gemma-2b-it | INT8-CW | 32 | 67.4 | 40.3 | 3814 | 24.81389578 |
gemma-2b-it | INT8-CW | 1024 | 246.4 | 40.9 | 3367.4 | 24.44987775 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 504.6 | 41.4 | 3370 | 24.15458937 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 510 | 41.8 | 3370 | 23.92344498 |
stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 474.8 | 41.8 | 3190.4 | 23.92344498 |
phi-3-mini-128k-instruct | INT4-MIXED | 1024 | 510.3 | 41.9 | 3466.9 | 23.86634845 |
phi-3-mini-128k-instruct | INT4-MIXED | 32 | 113.7 | 42 | 3457 | 23.80952381 |
gemma-3-1b-it | FP16 | 32 | 92.3 | 42.1 | 3227.1 | 23.75296912 |
gemma-3-1b-it | FP16 | 1024 | 178.4 | 42.4 | 2997.7 | 23.58490566 |
phi-3-mini-128k-instruct | INT4-MIXED | 1024 | 653.4 | 44.2 | 3552.8 | 22.62443439 |
gemma-2-2b | INT8-CW | 33 | 92 | 44.3 | 3957 | 22.57336343 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 89.2 | 45.1 | 3154.7 | 22.172949 |
gemma-2-2b | INT8-CW | 1025 | 297.7 | 45.5 | 3657.9 | 21.97802198 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 121.1 | 45.5 | 3535.2 | 21.97802198 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 131.2 | 45.7 | 3512.5 | 21.88183807 |
llama-3.2-1b-instruct | FP16 | 32 | 54.5 | 45.8 | 3557.3 | 21.83406114 |
phi-2 | INT8-CW | 32 | 127.4 | 45.8 | 4154.6 | 21.83406114 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 139.6 | 46 | 3287 | 21.73913043 |
qwen3-4b | INT4-MIXED | 32 | 150 | 46.1 | 3522.3 | 21.69197397 |
llama-3.2-1b-instruct | FP16 | 1024 | 202.1 | 46.4 | 3332.1 | 21.55172414 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 334.3 | 46.4 | 2843.4 | 21.55172414 |
phi-4-mini-instruct | INT4-MIXED | 32 | 141.4 | 46.4 | 3537.8 | 21.55172414 |
stable-zephyr-3b-dpo | INT8-CW | 32 | 135.1 | 46.5 | 4243.2 | 21.50537634 |
stablelm-3b-4e1t | INT8-CW | 32 | 137.2 | 46.5 | 4144.2 | 21.50537634 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 140.1 | 46.7 | 3446.2 | 21.41327623 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 425.9 | 47.1 | 3287.3 | 21.23142251 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 423.4 | 47.1 | 3286 | 21.23142251 |
phi-2 | INT8-CW | 1024 | 407.7 | 48.1 | 4077.8 | 20.79002079 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 655.9 | 48.2 | 3637.5 | 20.74688797 |
afm-4.5b | INT4-MIXED | 32 | 127.4 | 48.3 | 4145.2 | 20.70393375 |
qwen3-4b | INT4-MIXED | 1024 | 479.8 | 48.5 | 3308.2 | 20.6185567 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 159.6 | 48.7 | 3705.2 | 20.5338809 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 671.4 | 48.8 | 3559.8 | 20.49180328 |
stablelm-3b-4e1t | INT4-MIXED | 1024 | 419.7 | 48.8 | 3309.3 | 20.49180328 |
stablelm-3b-4e1t | INT8-CW | 1024 | 415.1 | 48.9 | 4092.6 | 20.44989775 |
afm-4.5b | INT4-MIXED | 1024 | 558 | 49.2 | 3735.3 | 20.32520325 |
stable-zephyr-3b-dpo | INT8-CW | 1024 | 412.6 | 49.3 | 4223.7 | 20.28397566 |
internvl2-4b | INT4-MIXED | 297 | 317.6 | 49.4 | 4276.1 | 20.24291498 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 535.1 | 50.5 | 3452.1 | 19.8019802 |
llama-3.2-3b-instruct | INT8-CW | 32 | 90.2 | 51 | 4561.4 | 19.60784314 |
qwen2.5-coder-3b-instruct | INT8-CW | 32 | 113.6 | 51.4 | 4461.1 | 19.45525292 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 105.2 | 51.6 | 3345.5 | 19.37984496 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 106.8 | 51.7 | 3355.2 | 19.34235977 |
phi-4-mini-instruct | INT4-MIXED | 32 | 160.3 | 52 | 3864.3 | 19.23076923 |
llama-3.2-3b-instruct | INT8-CW | 1024 | 354.2 | 52.3 | 4271.7 | 19.12045889 |
phi-3.5-vision-instruct | INT4-MIXED | 802 | 698.5 | 52.3 | 5042.9 | 19.12045889 |
qwen2.5-coder-3b-instruct | INT8-CW | 1024 | 405.8 | 52.3 | 4001.2 | 19.12045889 |
phi-3.5-vision-instruct | INT4-MIXED | 1032 | 872.8 | 52.4 | 5254.7 | 19.08396947 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 66.8 | 52.7 | 3056.7 | 18.97533207 |
chatglm3-6b | INT4-MIXED | 32 | 90.9 | 52.9 | 4587.8 | 18.90359168 |
internvl2-4b | INT4-MIXED | 1027 | 812.3 | 53.1 | 5112 | 18.83239171 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 563 | 53.6 | 3554.3 | 18.65671642 |
chatglm3-6b | INT4-MIXED | 1024 | 607.2 | 53.9 | 4178.8 | 18.5528757 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 308 | 54 | 2611.4 | 18.51851852 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 503 | 54.4 | 3426.3 | 18.38235294 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 503.1 | 54.5 | 3412.1 | 18.34862385 |
minicpm-1b-sft | FP16 | 31 | 152.5 | 55.1 | 3885.2 | 18.14882033 |
chatglm3-6b | INT4-MIXED | 32 | 92.1 | 55.8 | 4827.9 | 17.92114695 |
minicpm-1b-sft | FP16 | 1014 | 280.5 | 55.8 | 3849.4 | 17.92114695 |
phi-4-mini-instruct | INT4-MIXED | 32 | 138.5 | 55.9 | 3641.7 | 17.88908766 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 144.5 | 56 | 3657 | 17.85714286 |
deepseek-r1-distill-qwen-1.5b | FP16 | 32 | 86.8 | 56.7 | 4630.7 | 17.6366843 |
internvl2-4b | INT4-MIXED | 297 | 306.6 | 56.9 | 4261 | 17.57469244 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 443.5 | 57.1 | 3335 | 17.51313485 |
qwen2.5-1.5b-instruct | FP16 | 32 | 84.3 | 57.1 | 4096 | 17.51313485 |
deepseek-r1-distill-qwen-1.5b | FP16 | 1024 | 253.8 | 57.2 | 4365.2 | 17.48251748 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 443.2 | 57.2 | 3331.3 | 17.48251748 |
glm-edge-4b-chat | INT4-MIXED | 32 | 230.1 | 57.3 | 3892.8 | 17.45200698 |
chatglm3-6b | INT4-MIXED | 1024 | 848.8 | 57.4 | 4402.1 | 17.42160279 |
qwen2.5-1.5b-instruct | FP16 | 1024 | 247.6 | 57.4 | 3827.1 | 17.42160279 |
whisper-large-v3 | INT4-MIXED | 32 | 485 | 57.7 | 3586 | 17.33102253 |
whisper-large-v3 | INT8-CW | 1024 | 511.6 | 57.8 | 3761.8 | 17.30103806 |
glm-edge-1.5b-chat | FP16 | 32 | 137.2 | 57.9 | 4261.6 | 17.27115717 |
whisper-large-v3 | INT8-CW | 32 | 500.5 | 58 | 4232.8 | 17.24137931 |
codellama-7b | INT4-MIXED | 32 | 102.6 | 58.2 | 4712.1 | 17.18213058 |
whisper-large-v3 | INT4-MIXED | 1024 | 514.1 | 58.4 | 3295.1 | 17.12328767 |
flan-t5-xxl | INT4-MIXED | 33 | 78.2 | 58.6 | 13267.6 | 17.06484642 |
phi-3-mini-4k-instruct | INT8-CW | 32 | 107 | 58.6 | 5175.1 | 17.06484642 |
llama-2-7b-chat-hf | INT4-MIXED | 32 | 112.7 | 58.7 | 4714.6 | 17.03577513 |
phi-3-mini-128k-instruct | INT8-CW | 32 | 108.5 | 58.8 | 5100.8 | 17.00680272 |
baichuan2-7b-chat | INT4-MIXED | 32 | 98.1 | 58.9 | 6354.6 | 16.97792869 |
glm-edge-4b-chat | INT4-MIXED | 1024 | 701.5 | 58.9 | 3598 | 16.97792869 |
phi-3.5-mini-instruct | INT8-CW | 32 | 106.9 | 58.9 | 5147 | 16.97792869 |
glm-edge-1.5b-chat | FP16 | 1024 | 310.9 | 59.7 | 3917.2 | 16.75041876 |
qwen3-4b | INT4-MIXED | 32 | 122.6 | 60.3 | 3691.7 | 16.58374793 |
neural-chat-7b-v3-3 | INT4-MIXED | 32 | 113.6 | 61.1 | 4937.8 | 16.36661211 |
zephyr-7b-beta | INT4-MIXED | 32 | 111.8 | 61.1 | 4949.6 | 16.36661211 |
codellama-7b | INT4-MIXED | 32 | 116.8 | 61.2 | 4968.6 | 16.33986928 |
biomistral-7b-slerp | INT4-MIXED | 7 | 109.2 | 61.4 | 4935.6 | 16.28664495 |
phi-3-mini-128k-instruct | INT8-CW | 1024 | 545.4 | 61.6 | 5112 | 16.23376623 |
phi-3.5-mini-instruct | INT8-CW | 1024 | 540.3 | 61.7 | 5120.9 | 16.20745543 |
gemma-3-4b-it | INT4-MIXED | 32 | 178.8 | 61.9 | 4932.9 | 16.15508885 |
phi-3-mini-4k-instruct | INT8-CW | 1024 | 543.1 | 61.9 | 5106.8 | 16.15508885 |
internvl2-4b | INT4-MIXED | 1027 | 729.9 | 62 | 4958.7 | 16.12903226 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 120.6 | 62 | 4942.4 | 16.12903226 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 109 | 62.2 | 4954 | 16.07717042 |
phi-4-mini-instruct | INT8-CW | 32 | 146 | 62.4 | 5081.5 | 16.02564103 |
phi-4-mini-reasoning | INT8-CW | 32 | 127.8 | 62.4 | 5060.8 | 16.02564103 |
qwen3-4b | INT4-MIXED | 1024 | 482.2 | 62.4 | 3442.7 | 16.02564103 |
codellama-7b | INT4-MIXED | 1024 | 709.7 | 62.6 | 4967.8 | 15.97444089 |
baichuan2-7b-chat | INT4-MIXED | 1024 | 768 | 62.7 | 6614.9 | 15.94896332 |
falcon-7b-instruct | INT4-MIXED | 32 | 118.6 | 62.8 | 4801.7 | 15.92356688 |
neural-chat-7b-v3-3 | INT4-MIXED | 1024 | 725.8 | 62.8 | 4726.6 | 15.92356688 |
llama-2-7b-chat-hf | INT4-MIXED | 1024 | 708.8 | 63 | 4968.6 | 15.87301587 |
zephyr-7b-beta | INT4-MIXED | 1024 | 734.9 | 63.2 | 4726.9 | 15.82278481 |
gemma-3-4b-it | INT4-MIXED | 32 | 211 | 63.5 | 5118.4 | 15.7480315 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 729.2 | 63.5 | 4729.9 | 15.7480315 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 733.7 | 63.5 | 4732.8 | 15.7480315 |
falcon-7b-instruct | INT4-MIXED | 1024 | 733.3 | 63.6 | 4375.9 | 15.72327044 |
phi-4-mini-instruct | INT8-CW | 1024 | 463.9 | 63.7 | 4840.9 | 15.69858713 |
phi-4-mini-reasoning | INT8-CW | 1024 | 461.9 | 63.7 | 4843.8 | 15.69858713 |
gemma-3-4b-it | INT4-MIXED | 1024 | 572.5 | 64.7 | 6744.2 | 15.45595054 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 101.1 | 64.8 | 5522.2 | 15.43209877 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 32 | 100.3 | 64.8 | 5537.9 | 15.43209877 |
qwen3-4b | INT8-CW | 32 | 122.8 | 64.9 | 5276.8 | 15.40832049 |
internvl2-4b | INT8-CW | 297 | 303.2 | 65 | 5784.3 | 15.38461538 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 102.1 | 65 | 5537.8 | 15.38461538 |
neural-chat-7b-v3-3 | INT4-MIXED | 32 | 115.7 | 65.1 | 5118 | 15.3609831 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 113.8 | 65.2 | 5184.6 | 15.33742331 |
biomistral-7b-slerp | INT4-MIXED | 7 | 101.9 | 65.3 | 5257.9 | 15.31393568 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 652.1 | 65.4 | 5168.2 | 15.29051988 |
codellama-7b | INT4-MIXED | 1024 | 972 | 65.6 | 5129.2 | 15.24390244 |
mistral-7b-instruct-v0.1 | INT4-MIXED | 32 | 125.4 | 65.6 | 5121.9 | 15.24390244 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 645.9 | 65.6 | 5180.9 | 15.24390244 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 1024 | 661.6 | 65.6 | 5178.1 | 15.24390244 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 125.8 | 65.7 | 5097.2 | 15.22070015 |
bloomz-7b1 | INT4-MIXED | 32 | 107.8 | 65.9 | 5255.9 | 15.17450683 |
qwen3-vl-4b-thinking | INT4-MIXED | 4909 | 8353.3 | 66 | 14334.2 | 15.15151515 |
gemma-3-4b-it | INT4-MIXED | 1024 | 687.7 | 66.8 | 6933.3 | 14.97005988 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 111.6 | 66.9 | 5709.2 | 14.94768311 |
neural-chat-7b-v3-3 | INT4-MIXED | 1024 | 988.8 | 67 | 4890.5 | 14.92537313 |
qwen3-4b | INT8-CW | 1024 | 502.8 | 67 | 5087.2 | 14.92537313 |
qwen3-vl-4b-thinking | INT4-MIXED | 4939 | 8254.8 | 67 | 16940.5 | 14.92537313 |
llama-3-8b-instruct | INT4-MIXED | 32 | 115.3 | 67.1 | 5736.4 | 14.90312966 |
phi-3.5-vision-instruct | INT8-CW | 802 | 594.4 | 67.1 | 6514.1 | 14.90312966 |
afm-4.5b | INT8-CW | 32 | 108 | 67.3 | 5825.3 | 14.85884101 |
mistral-7b-instruct-v0.1 | INT4-MIXED | 1025 | 1004 | 67.3 | 4869.9 | 14.85884101 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 122.2 | 67.4 | 5718.4 | 14.83679525 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 1000.3 | 67.5 | 4889.5 | 14.81481481 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 1019.4 | 67.7 | 4988.9 | 14.77104874 |
minicpm-v-2_6 | INT4-MIXED | 228 | 743.2 | 68.2 | 6749.1 | 14.6627566 |
whisper-large-v3 | FP16 | 32 | 507.2 | 68.2 | 5213.9 | 14.6627566 |
whisper-large-v3 | FP16 | 1024 | 596.5 | 68.3 | 4894.2 | 14.64128843 |
phi-4-multimodal-instruct | INT4-MIXED | 578 | 693.4 | 68.4 | 6162.3 | 14.61988304 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 115.1 | 68.5 | 5778.3 | 14.59854015 |
afm-4.5b | INT8-CW | 1024 | 440.7 | 68.6 | 5471.2 | 14.57725948 |
phi-3.5-vision-instruct | INT8-CW | 1032 | 793.2 | 68.6 | 6591.6 | 14.57725948 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 32 | 109.9 | 68.6 | 5777.1 | 14.57725948 |
internvl2-4b | INT8-CW | 1027 | 740 | 68.7 | 6491.5 | 14.55604076 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 122.1 | 68.7 | 5697.7 | 14.55604076 |
qwen2-7b-instruct | INT4-MIXED | 32 | 116.3 | 68.8 | 5685 | 14.53488372 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 736.3 | 69.1 | 5509.9 | 14.47178003 |
minicpm4-8b | INT4-MIXED | 32 | 124 | 69.1 | 5587.8 | 14.47178003 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 726.1 | 69.2 | 5513.2 | 14.45086705 |
bloomz-7b1 | INT4-MIXED | 32 | 109.1 | 69.3 | 5436.7 | 14.43001443 |
minicpm-o-2_6 | INT4-MIXED | 238 | 743.6 | 69.3 | 6851.2 | 14.43001443 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 729.9 | 69.4 | 5504.4 | 14.4092219 |
qwen3-vl-4b-thinking | INT4-MIXED | 4909 | 8859 | 69.4 | 14496.9 | 14.4092219 |
minicpm3-4b | INT4-MIXED | 32 | 340.9 | 69.5 | 3629.7 | 14.38848921 |
qwen3-vl-4b-thinking | INT4-MIXED | 4939 | 8708.8 | 69.5 | 17092.5 | 14.38848921 |
phi-4-multimodal-instruct | INT4-MIXED | 786 | 753 | 69.6 | 6185.6 | 14.36781609 |
qwen2-7b-instruct | INT4-MIXED | 1024 | 903.6 | 69.7 | 5342.7 | 14.3472023 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 892.9 | 69.7 | 5342.4 | 14.3472023 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 882.9 | 69.8 | 5438.3 | 14.32664756 |
glm-edge-4b-chat | INT8-CW | 32 | 212.4 | 69.8 | 5580.3 | 14.32664756 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 1024 | 892.6 | 69.8 | 5434.8 | 14.32664756 |
bloomz-7b1 | INT4-MIXED | 1024 | 784.1 | 70.1 | 5458.3 | 14.26533524 |
minicpm4-8b | INT4-MIXED | 1024 | 794.5 | 70.1 | 5178.5 | 14.26533524 |
qwen3-8b | INT4-MIXED | 32 | 148.3 | 70.1 | 5895.8 | 14.26533524 |
phi-4-multimodal-instruct | INT4-MIXED | 1362 | 1573.2 | 70.2 | 8029.9 | 14.24501425 |
phi-4-multimodal-instruct | INT4-MIXED | 1570 | 1774.1 | 70.5 | 8405.6 | 14.18439716 |
llama-3-8b-instruct | INT4-MIXED | 32 | 124.8 | 71.1 | 5899.1 | 14.06469761 |
qwen3-8b | INT4-MIXED | 1024 | 778 | 71.9 | 5712.7 | 13.90820584 |
glm-edge-4b-chat | INT8-CW | 1024 | 613.2 | 72.1 | 5334.6 | 13.86962552 |
falcon-7b-instruct | INT4-MIXED | 32 | 122 | 72.3 | 5254.9 | 13.83125864 |
gemma-3-4b-it | INT4-MIXED | 32 | 180.5 | 72.3 | 5068.8 | 13.83125864 |
minicpm-v-2_6 | INT4-MIXED | 228 | 802 | 73 | 6984.7 | 13.69863014 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 979.9 | 73.1 | 5678.3 | 13.67989056 |
falcon-7b-instruct | INT4-MIXED | 1024 | 978.3 | 73.2 | 4786.3 | 13.66120219 |
minicpm4-8b | INT4-MIXED | 32 | 133.8 | 73.2 | 5864.7 | 13.66120219 |
bloomz-7b1 | INT4-MIXED | 1024 | 1024.4 | 73.4 | 5601.2 | 13.6239782 |
minicpm-o-2_6 | INT4-MIXED | 238 | 804.1 | 73.5 | 6734.8 | 13.60544218 |
qwen3-8b | INT4-MIXED | 32 | 136 | 73.8 | 6176.3 | 13.5501355 |
minicpm4-8b | INT4-MIXED | 1024 | 1078.9 | 74.4 | 5444.7 | 13.44086022 |
minicpm-v-4_5 | INT4-MIXED | 217 | 792.8 | 75.7 | 7180.4 | 13.21003963 |
qwen3-8b | INT4-MIXED | 1024 | 1034 | 75.7 | 5959.2 | 13.21003963 |
minicpm3-4b | INT4-MIXED | 32 | 349.3 | 76.2 | 3819.4 | 13.12335958 |
gemma-3-4b-it | INT8-CW | 32 | 182.1 | 76.3 | 6465.5 | 13.1061599 |
gemma-3-4b-it | INT4-MIXED | 1024 | 594.6 | 77.6 | 6893.1 | 12.88659794 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 124.4 | 78.2 | 6492 | 12.78772379 |
minicpm3-4b | INT4-MIXED | 1024 | 834.6 | 78.4 | 4424.7 | 12.75510204 |
minicpm-v-4_5 | INT4-MIXED | 217 | 831.8 | 79 | 7412.9 | 12.65822785 |
minicpm3-4b | INT4-MIXED | 32 | 301.7 | 79.6 | 3738.4 | 12.56281407 |
gemma-7b-it | INT4-MIXED | 32 | 108.8 | 79.8 | 5895.4 | 12.53132832 |
glm-4-9b-chat-hf | INT4-MIXED | 1024 | 890.5 | 79.9 | 6112.2 | 12.51564456 |
gemma-3-4b-it | INT8-CW | 1024 | 611 | 80.6 | 8310.2 | 12.40694789 |
qwen3-vl-4b-thinking | INT4-MIXED | 4939 | 8401.5 | 81.1 | 17067.5 | 12.33045623 |
baichuan2-7b-chat | INT8-CW | 32 | 103.3 | 81.3 | 8607.3 | 12.300123 |
qwen3-vl-4b-thinking | INT4-MIXED | 4909 | 8568.3 | 81.4 | 14372.9 | 12.28501229 |
gemma-2b-it | FP16 | 32 | 95.6 | 81.9 | 6044.5 | 12.21001221 |
minicpm3-4b | INT8-CW | 32 | 325.9 | 81.9 | 5515.4 | 12.21001221 |
gemma-2b-it | FP16 | 1024 | 398.3 | 82.1 | 5798.5 | 12.18026797 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 149.3 | 82.4 | 6789.5 | 12.13592233 |
gemma-7b-it | INT4-MIXED | 1024 | 842.4 | 82.5 | 6108.5 | 12.12121212 |
llama-2-7b-chat-hf | INT4-MIXED | 32 | 101.7 | 82.9 | 4815.8 | 12.06272618 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 95.8 | 83.4 | 5645.9 | 11.99040767 |
zephyr-7b-beta | INT4-MIXED | 32 | 113.1 | 83.5 | 5829 | 11.9760479 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 138.7 | 83.6 | 6778.1 | 11.96172249 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 141.3 | 83.8 | 6682.7 | 11.93317422 |
minicpm3-4b | INT4-MIXED | 1024 | 959.9 | 83.8 | 4656.1 | 11.93317422 |
phi-4-multimodal-instruct | INT8-CW | 578 | 725.5 | 83.8 | 8060.8 | 11.93317422 |
phi-4-multimodal-instruct | INT8-CW | 786 | 798.8 | 83.8 | 8200.9 | 11.93317422 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 32 | 328.8 | 83.8 | 6553.8 | 11.93317422 |
gemma-7b-it | INT4-MIXED | 32 | 113.4 | 84.2 | 6166.6 | 11.87648456 |
glm-4-9b-chat-hf | INT4-MIXED | 1024 | 1210.2 | 84.2 | 6401.6 | 11.87648456 |
qwen2-7b-instruct | INT4-MIXED | 32 | 104.9 | 84.2 | 5623.8 | 11.87648456 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 102.8 | 84.2 | 5623.9 | 11.87648456 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 103.4 | 84.4 | 5032 | 11.84834123 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 101.1 | 84.5 | 5071.2 | 11.83431953 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 660.1 | 84.6 | 5283.4 | 11.82033097 |
qwen3-vl-4b-thinking | INT8-CW | 4939 | 8513.4 | 84.7 | 18797.4 | 11.80637544 |
qwen3-vl-4b-thinking | INT8-CW | 4909 | 8608.3 | 84.8 | 16095 | 11.79245283 |
phi-4-multimodal-instruct | INT8-CW | 1362 | 1679 | 85 | 9867 | 11.76470588 |
baichuan2-7b-chat | INT8-CW | 1024 | 603.2 | 85.1 | 8894.7 | 11.75088132 |
phi-4-multimodal-instruct | INT8-CW | 1570 | 1861.7 | 85.1 | 10623.9 | 11.75088132 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 942 | 85.5 | 6584.5 | 11.69590643 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 657.5 | 85.6 | 5285.4 | 11.68224299 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 1024 | 915.2 | 85.6 | 7543.6 | 11.68224299 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 946.4 | 85.7 | 6474.4 | 11.66861144 |
qwen2-7b-instruct | INT4-MIXED | 1024 | 670.7 | 85.7 | 5283.7 | 11.66861144 |
zephyr-7b-beta | INT4-MIXED | 1024 | 733.1 | 85.7 | 5665.4 | 11.66861144 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 720.1 | 86.6 | 4829.1 | 11.54734411 |
llava-next-video-7b-hf | INT4-MIXED | 2945 | 4277.4 | 86.8 | 9040.3 | 11.52073733 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 726.9 | 86.8 | 4835.9 | 11.52073733 |
llama-2-7b-chat-hf | INT4-MIXED | 1024 | 706.3 | 86.9 | 5065.4 | 11.50747986 |
minicpm3-4b | INT4-MIXED | 1024 | 865.2 | 87 | 4566.5 | 11.49425287 |
minicpm-v-2_6 | INT4-MIXED | 228 | 717 | 87.3 | 6844.7 | 11.45475372 |
phi-4-multimodal-instruct | INT4-MIXED | 578 | 791.6 | 87.4 | 6823.2 | 11.4416476 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 32 | 350.5 | 87.4 | 6701.9 | 11.4416476 |
gemma-7b-it | INT4-MIXED | 1024 | 1139.7 | 87.6 | 6374.4 | 11.41552511 |
chatglm3-6b | INT8-CW | 32 | 120.9 | 88 | 7413.5 | 11.36363636 |
gemma-2-2b | FP16 | 33 | 115.7 | 88.3 | 7447.3 | 11.32502831 |
phi-4-multimodal-instruct | INT4-MIXED | 1362 | 1920.2 | 88.3 | 8544.4 | 11.32502831 |
phi-4-multimodal-instruct | INT4-MIXED | 786 | 971.2 | 88.5 | 7181.3 | 11.29943503 |
gemma-2-9b-it | INT4-MIXED | 32 | 170.4 | 88.6 | 6361.2 | 11.28668172 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 102 | 88.9 | 5812.6 | 11.24859393 |
phi-4-multimodal-instruct | INT4-MIXED | 1570 | 2152.8 | 88.9 | 9329.7 | 11.24859393 |
chatglm3-6b | INT8-CW | 1024 | 620.4 | 89.5 | 7040.1 | 11.17318436 |
gemma-2-2b | FP16 | 1025 | 464.7 | 89.7 | 7430.9 | 11.14827202 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 1024 | 1123.4 | 89.7 | 7703.3 | 11.14827202 |
minicpm3-4b | INT8-CW | 1024 | 893.1 | 89.8 | 6342.4 | 11.13585746 |
llama-3-8b-instruct | INT4-MIXED | 32 | 106.2 | 90.2 | 5827.8 | 11.0864745 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 733.8 | 90.4 | 5608.9 | 11.0619469 |
llama-3-8b-instruct | INT4-MIXED | 32 | 130 | 90.4 | 5816.9 | 11.0619469 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 113.5 | 90.4 | 5839.3 | 11.0619469 |
phi-2 | FP16 | 32 | 125.9 | 90.8 | 6745.8 | 11.01321586 |
gemma-2-9b-it | INT4-MIXED | 1024 | 997.1 | 92.2 | 6452.3 | 10.84598698 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 724.9 | 92.3 | 5606.4 | 10.83423619 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 737.2 | 92.4 | 5612.2 | 10.82251082 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 728.5 | 92.4 | 5608 | 10.82251082 |
stable-diffusion-xl-1.0-inpainting-0.1 | INT8-CW | 32 | 92 | 92.4 | 6785.5 | 10.82251082 |
qwen3-8b | INT4-MIXED | 32 | 149.9 | 93 | 6076.2 | 10.75268817 |
stable-zephyr-3b-dpo | FP16 | 32 | 153.5 | 93.6 | 6664.6 | 10.68376068 |
gemma-2-9b-it | INT4-MIXED | 32 | 173.4 | 93.7 | 6653.9 | 10.67235859 |
minicpm4-8b | INT4-MIXED | 32 | 133.8 | 94.4 | 5777.2 | 10.59322034 |
phi-2 | FP16 | 1024 | 599.9 | 94.9 | 7085.8 | 10.5374078 |
qwen3-8b | INT4-MIXED | 1024 | 774 | 95.5 | 5884.6 | 10.47120419 |
minicpm4-8b | INT4-MIXED | 1024 | 798 | 95.6 | 5376 | 10.46025105 |
stable-diffusion-xl-1.0-inpainting-0.1 | INT8-CW | 32 | 100.5 | 96.4 | 6971.5 | 10.37344398 |
codellama-7b | INT8-CW | 32 | 127.9 | 96.6 | 7829.5 | 10.35196687 |
stable-zephyr-3b-dpo | FP16 | 1024 | 613.4 | 97.1 | 7105 | 10.29866117 |
gemma-2-9b-it | INT4-MIXED | 1024 | 1322.4 | 97.7 | 6768.1 | 10.23541453 |
ltx-video | INT8-CW | 11 | 95.7 | 98.4 | 9779.2 | 10.16260163 |
llama-2-7b-chat-hf | INT8-CW | 32 | 136.3 | 98.8 | 7925.9 | 10.12145749 |
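
In the rows above, the "2nd token per sec" column appears to be the reciprocal of the "2nd latency (ms)" column, scaled from milliseconds to seconds. The following is a minimal sketch of that conversion, useful for sanity-checking a row or for adding the throughput figure to your own latency measurements. The function name `tokens_per_second` is hypothetical and is not part of OpenVINO or its benchmarking tools.

```python
def tokens_per_second(second_token_latency_ms: float) -> float:
    """Convert a per-token latency in milliseconds to tokens per second.

    Assumption: throughput is simply 1000 / latency_ms, which matches the
    values published in the table rows above.
    """
    if second_token_latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / second_token_latency_ms


# "2nd latency (ms)" values copied from rows of the table above.
for latency_ms in (88.9, 92.4, 98.8):
    print(f"{latency_ms} ms -> {tokens_per_second(latency_ms):.8f} tokens/s")
```

Because the two columns carry the same information, sorting by either one gives the same ranking; the latency column is usually the easier one to compare against a responsiveness target.
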
Topology | Precision | Input Size | 1st latency (ms) | 2nd latency (ms) | max rss memory | 2nd token per sec |
|---|---|---|---|---|---|---|
distil-large-v2 | INT4-MIXED | 32 | 165.2 | 5.9 | 1578.7 | 169.4915254 |
distil-large-v2 | INT4-MIXED | 1024 | 209.7 | 6 | 1609.8 | 166.6666667 |
distil-large-v2 | INT8-CW | 1024 | 209 | 6.1 | 1868.3 | 163.9344262 |
distil-large-v2 | INT8-CW | 32 | 164.3 | 6.1 | 1835.8 | 163.9344262 |
gemma-3-270m | INT4-MIXED | 32 | 29 | 7.2 | 1257.8 | 138.8888889 |
minicpm4-0.5b | INT4-MIXED | 32 | 28.5 | 7.2 | 1117.5 | 138.8888889 |
gemma-3-270m | INT4-MIXED | 1024 | 34 | 7.3 | 1326.7 | 136.9863014 |
minicpm4-0.5b | INT4-MIXED | 1024 | 44.2 | 7.3 | 1166.3 | 136.9863014 |
gemma-3-270m | INT8-CW | 32 | 30.1 | 7.3 | 1310.4 | 136.9863014 |
minicpm4-0.5b | INT4-MIXED | 32 | 27.9 | 7.3 | 1131.4 | 136.9863014 |
gemma-3-270m | INT8-CW | 1024 | 34.8 | 7.4 | 1351.6 | 135.1351351 |
tiny-llama-1.1b-chat | INT4-MIXED | 32 | 25.3 | 7.4 | 1557.4 | 135.1351351 |
minicpm4-0.5b | INT4-MIXED | 1024 | 47.3 | 7.6 | 1157.1 | 131.5789474 |
minicpm4-0.5b | INT8-CW | 32 | 30.8 | 7.6 | 1283.3 | 131.5789474 |
minicpm4-0.5b | INT4-MIXED | 1024 | 53.8 | 7.8 | 1312.8 | 128.2051282 |
minicpm4-0.5b | INT8-CW | 1024 | 51 | 7.8 | 1331.8 | 128.2051282 |
minicpm4-0.5b | INT4-MIXED | 32 | 30 | 7.8 | 1247.3 | 128.2051282 |
tiny-llama-1.1b-chat | INT4-MIXED | 1024 | 88.3 | 8 | 1606.4 | 125 |
gemma-3-270m | FP16 | 1024 | 31 | 8.1 | 1626.7 | 123.4567901 |
gemma-3-270m | FP16 | 32 | 24.8 | 8.1 | 1526.7 | 123.4567901 |
tiny-llama-1.1b-chat | INT4-MIXED | 32 | 26.6 | 8.1 | 1638 | 123.4567901 |
distil-large-v2 | FP16 | 1024 | 209.3 | 8.2 | 2639.9 | 121.9512195 |
distil-large-v2 | FP16 | 32 | 164.5 | 8.2 | 2607.5 | 121.9512195 |
tiny-llama-1.1b-chat | INT4-MIXED | 1024 | 110.9 | 8.7 | 1736.2 | 114.9425287 |
codet5-base-sum | INT4-MIXED | 291 | 20.4 | 8.9 | 1235.6 | 112.3595506 |
codet5-base-sum | FP16 | 26 | 15.9 | 8.9 | 1914.2 | 112.3595506 |
codet5-base-sum | FP16 | 291 | 16.1 | 9 | 1970.1 | 111.1111111 |
llama-3.2-1b-instruct | INT4-MIXED | 32 | 22 | 9.4 | 1611.4 | 106.3829787 |
llama-3.2-1b-instruct | INT4-MIXED | 32 | 22.7 | 9.5 | 1626.4 | 105.2631579 |
codet5-base-sum | INT4-MIXED | 26 | 19.4 | 9.5 | 1326.7 | 105.2631579 |
codet5-base-sum | INT8-CW | 26 | 19.9 | 9.5 | 1471.8 | 105.2631579 |
codet5-base-sum | INT4-MIXED | 26 | 18.9 | 9.6 | 1183.9 | 104.1666667 |
codet5-base-sum | INT4-MIXED | 291 | 21.1 | 9.8 | 1365.9 | 102.0408163 |
codet5-base-sum | INT8-CW | 291 | 20.8 | 9.9 | 1522 | 101.010101 |
llama-3.2-1b-instruct | INT4-MIXED | 1024 | 85.4 | 10 | 1695.3 | 100 |
llama-3.2-1b-instruct | INT4-MIXED | 1024 | 98.2 | 10.1 | 1760.9 | 99.00990099 |
minicpm4-0.5b | FP16 | 32 | 25.1 | 10.8 | 1765.2 | 92.59259259 |
minicpm4-0.5b | FP16 | 1024 | 61 | 11.1 | 1813.2 | 90.09009009 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 33.9 | 11.1 | 1958.6 | 90.09009009 |
gemma-3-1b-it | INT4-MIXED | 32 | 37.8 | 11.3 | 1595.3 | 88.49557522 |
gemma-3-1b-it | INT4-MIXED | 32 | 38 | 11.3 | 1692.9 | 88.49557522 |
gemma-3-1b-it | INT4-MIXED | 1024 | 88.9 | 11.5 | 1700.7 | 86.95652174 |
gemma-3-1b-it | INT4-MIXED | 1024 | 91.8 | 11.5 | 1799.7 | 86.95652174 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 34.3 | 11.5 | 2012.5 | 86.95652174 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 34.2 | 11.5 | 1790.4 | 86.95652174 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 34.5 | 11.5 | 1791.8 | 86.95652174 |
gemma-3-1b-it | INT4-MIXED | 32 | 39.2 | 11.7 | 1778.1 | 85.47008547 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 126.5 | 11.8 | 2055.8 | 84.74576271 |
gemma-3-1b-it | INT4-MIXED | 1024 | 97.1 | 11.8 | 1868 | 84.74576271 |
nanollava | INT4-MIXED | 760 | 133.1 | 11.8 | 2985.7 | 84.74576271 |
tiny-llama-1.1b-chat | INT8-CW | 32 | 27.9 | 11.8 | 1980.6 | 84.74576271 |
nanollava | INT8-CW | 760 | 134.6 | 12 | 3084.5 | 83.33333333 |
nanollava | INT4-MIXED | 1752 | 230.1 | 12.1 | 4775.2 | 82.6446281 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 136.9 | 12.1 | 2206 | 82.6446281 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 137.3 | 12.2 | 1977.8 | 81.96721311 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 136.1 | 12.2 | 1959.4 | 81.96721311 |
nanollava | INT8-CW | 1752 | 223.3 | 12.3 | 4759.4 | 81.30081301 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 38.1 | 12.4 | 1902.1 | 80.64516129 |
tiny-llama-1.1b-chat | INT8-CW | 1024 | 100.2 | 12.5 | 2061.5 | 80 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 37.8 | 12.5 | 1900.7 | 80 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 145.1 | 13 | 2057 | 76.92307692 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 145.7 | 13.1 | 2060.4 | 76.33587786 |
nanollava | FP16 | 760 | 129.1 | 13.1 | 4024.5 | 76.33587786 |
gemma-3-1b-it | INT8-CW | 32 | 42 | 13.5 | 2023.4 | 74.07407407 |
flan-t5-large-grammar-synthesis | FP16 | 32 | 35.6 | 13.9 | 4197.8 | 71.94244604 |
gemma-3-1b-it | INT8-CW | 1024 | 99.8 | 14.2 | 2119.6 | 70.42253521 |
flan-t5-large-grammar-synthesis | FP16 | 1024 | 33.9 | 14.3 | 4685.5 | 69.93006993 |
llama-3.2-1b-instruct | INT8-CW | 32 | 25.3 | 14.4 | 2129.5 | 69.44444444 |
nanollava | FP16 | 1752 | 236.1 | 14.5 | 5889.7 | 68.96551724 |
llama-3.2-1b-instruct | INT8-CW | 1024 | 105.2 | 14.9 | 2248.8 | 67.11409396 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 38.4 | 14.9 | 2444.3 | 67.11409396 |
flan-t5-large-grammar-synthesis | INT4-MIXED | 32 | 38.9 | 15.1 | 2105.4 | 66.22516556 |
flan-t5-large-grammar-synthesis | INT8-CW | 32 | 42.9 | 15.4 | 2706.9 | 64.93506494 |
flan-t5-large-grammar-synthesis | INT4-MIXED | 1024 | 41.5 | 15.5 | 2614.1 | 64.51612903 |
flan-t5-large-grammar-synthesis | INT8-CW | 1024 | 42.9 | 15.5 | 3215.6 | 64.51612903 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 153.5 | 15.6 | 2511.5 | 64.1025641 |
minicpm-1b-sft | INT4-MIXED | 31 | 61.9 | 16 | 1709 | 62.5 |
minicpm-1b-sft | INT4-MIXED | 31 | 60.5 | 16.3 | 1819.8 | 61.34969325 |
minicpm-1b-sft | INT4-MIXED | 1014 | 136 | 16.4 | 1891.8 | 60.97560976 |
minicpm-1b-sft | INT4-MIXED | 1014 | 140 | 16.7 | 1986.7 | 59.88023952 |
minicpm-1b-sft | INT4-MIXED | 31 | 66 | 16.8 | 1875.8 | 59.52380952 |
minicpm-1b-sft | INT4-MIXED | 1014 | 159.6 | 17.1 | 2006.7 | 58.47953216 |
gemma-2b-it | INT4-MIXED | 32 | 29.3 | 17.1 | 2359.4 | 58.47953216 |
smolvlm2-256m-video-instruct | FP16 | 1141 | 384.7 | 17.2 | 3353.8 | 58.13953488 |
smolvlm2-256m-video-instruct | INT8-CW | 1141 | 391.5 | 17.6 | 2935.6 | 56.81818182 |
qwen2.5-1.5b-instruct | INT8-CW | 32 | 38.7 | 17.8 | 2372.7 | 56.17977528 |
qwen2.5-1.5b-instruct | INT8-CW | 32 | 39.8 | 17.8 | 2374.1 | 56.17977528 |
deepseek-r1-distill-qwen-1.5b | INT8-CW | 32 | 41.7 | 17.9 | 2696.1 | 55.86592179 |
gemma-2b-it | INT4-MIXED | 32 | 30.9 | 17.9 | 2561.3 | 55.86592179 |
smolvlm2-256m-video-instruct | INT4-MIXED | 1141 | 407.1 | 18.3 | 2839 | 54.64480874 |
gemma-2b-it | INT4-MIXED | 1024 | 151.3 | 18.4 | 2497.3 | 54.34782609 |
deepseek-r1-distill-qwen-1.5b | INT8-CW | 1024 | 133.3 | 18.5 | 2792.6 | 54.05405405 |
phi-2 | INT4-MIXED | 32 | 58.9 | 18.5 | 2522.9 | 54.05405405 |
qwen2.5-1.5b-instruct | INT8-CW | 1024 | 133 | 18.6 | 2474.9 | 53.76344086 |
qwen2.5-1.5b-instruct | INT8-CW | 1024 | 133.5 | 18.6 | 2477.1 | 53.76344086 |
minicpm-1b-sft | INT8-CW | 31 | 72.1 | 18.6 | 2416.8 | 53.76344086 |
gemma-2b-it | INT4-MIXED | 1024 | 207.6 | 19.2 | 2659.7 | 52.08333333 |
gemma-2-2b | INT4-MIXED | 33 | 49.7 | 19.3 | 2536.8 | 51.8134715 |
minicpm-1b-sft | INT8-CW | 1014 | 150.3 | 20.1 | 2581.6 | 49.75124378 |
phi-2 | INT4-MIXED | 32 | 56 | 20.2 | 2873 | 49.5049505 |
gemma-2-2b | INT4-MIXED | 33 | 44.4 | 20.6 | 2771.3 | 48.54368932 |
phi-2 | INT4-MIXED | 1024 | 310.8 | 20.9 | 3058.4 | 47.84688995 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 62.7 | 21 | 2553.1 | 47.61904762 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 47.9 | 21 | 2640.8 | 47.61904762 |
gemma-2-2b | INT4-MIXED | 1025 | 190.4 | 21.3 | 2802.8 | 46.94835681 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 85.2 | 21.3 | 2145.9 | 46.94835681 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 186.6 | 21.5 | 2345.2 | 46.51162791 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 78.8 | 21.5 | 2242.5 | 46.51162791 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 47.1 | 21.6 | 2670.1 | 46.2962963 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 191.6 | 21.8 | 2437 | 45.87155963 |
gemma-2-2b | INT4-MIXED | 1025 | 230 | 22.2 | 2975.9 | 45.04504505 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 231.9 | 22.4 | 2679.1 | 44.64285714 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 239.9 | 22.5 | 2725.3 | 44.44444444 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 43.3 | 22.5 | 2754.8 | 44.44444444 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 55.1 | 22.5 | 2910.8 | 44.44444444 |
phi-2 | INT4-MIXED | 1024 | 374.6 | 22.6 | 3268.8 | 44.24778761 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 40.5 | 22.9 | 2805.1 | 43.66812227 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 243.6 | 23.3 | 2936.5 | 42.91845494 |
gemma-3-1b-it | FP16 | 32 | 46.4 | 23.3 | 2963.8 | 42.91845494 |
gemma-3-1b-it | FP16 | 1024 | 112.7 | 23.7 | 3126.1 | 42.19409283 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 428.9 | 23.8 | 2906.6 | 42.01680672 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 57.9 | 23.9 | 2995.1 | 41.84100418 |
phi-3-mini-128k-instruct | INT4-MIXED | 32 | 57.7 | 23.9 | 3086.4 | 41.84100418 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 59.9 | 23.9 | 2993.6 | 41.84100418 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 252 | 24.2 | 2955.4 | 41.32231405 |
qwen2.5-1.5b-instruct | INT8-CW | 32 | 91.6 | 24.3 | 2813.1 | 41.15226337 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 275.4 | 24.5 | 2994.2 | 40.81632653 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 51.4 | 24.9 | 3103.8 | 40.16064257 |
qwen2.5-1.5b-instruct | INT8-CW | 1024 | 187.9 | 25 | 2947.1 | 40 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 51.9 | 25 | 3102.4 | 40 |
phi-3-mini-128k-instruct | INT4-MIXED | 32 | 52.6 | 25.3 | 3255.6 | 39.5256917 |
tiny-llama-1.1b-chat | FP16 | 32 | 30.4 | 25.6 | 3022.4 | 39.0625 |
tiny-llama-1.1b-chat | FP16 | 1024 | 116.2 | 26.3 | 3125.8 | 38.02281369 |
qwen3-4b | INT4-MIXED | 32 | 71 | 26.6 | 3111.2 | 37.59398496 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 382.2 | 26.8 | 3577.1 | 37.31343284 |
phi-3-mini-128k-instruct | INT4-MIXED | 1024 | 378.8 | 26.8 | 3671.6 | 37.31343284 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 382.5 | 26.8 | 3576.1 | 37.31343284 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 50.8 | 26.9 | 3417.1 | 37.17472119 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 55 | 27.2 | 3331.6 | 36.76470588 |
internvl2-4b | INT4-MIXED | 297 | 224 | 27.5 | 4421.1 | 36.36363636 |
phi-4-mini-instruct | INT4-MIXED | 32 | 74.8 | 27.6 | 3104.1 | 36.23188406 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 75.2 | 27.6 | 3105.9 | 36.23188406 |
qwen3-4b | INT4-MIXED | 32 | 61.6 | 27.7 | 3306.9 | 36.10108303 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 389.8 | 27.8 | 3623.5 | 35.97122302 |
internvl2-4b | INT4-MIXED | 297 | 221.9 | 27.8 | 4435.7 | 35.97122302 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 387.1 | 27.9 | 3608.9 | 35.84229391 |
afm-4.5b | INT4-MIXED | 32 | 54 | 28.2 | 3743.2 | 35.46099291 |
phi-3-mini-128k-instruct | INT4-MIXED | 1024 | 476.3 | 28.3 | 3744.1 | 35.33568905 |
phi-4-mini-instruct | INT4-MIXED | 32 | 67.8 | 28.5 | 3203.6 | 35.0877193 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 68.8 | 28.5 | 3201 | 35.0877193 |
gemma-2b-it | INT8-CW | 32 | 39.7 | 28.6 | 3402.1 | 34.96503497 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 68.1 | 28.8 | 3353.2 | 34.72222222 |
gemma-3-4b-it | INT4-MIXED | 32 | 92.6 | 29.1 | 4554.7 | 34.36426117 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 378.2 | 29.4 | 3422.6 | 34.01360544 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 378.3 | 29.4 | 3423.3 | 34.01360544 |
qwen3-4b | INT4-MIXED | 1024 | 353 | 29.4 | 3445 | 34.01360544 |
gemma-2b-it | INT8-CW | 1024 | 217.2 | 29.5 | 3536.8 | 33.89830508 |
afm-4.5b | INT4-MIXED | 1024 | 387.2 | 29.7 | 3830.9 | 33.67003367 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 506 | 29.8 | 3829 | 33.55704698 |
phi-3.5-vision-instruct | INT4-MIXED | 802 | 507.2 | 29.8 | 5366.7 | 33.55704698 |
llama-3.2-1b-instruct | FP16 | 32 | 35.9 | 29.9 | 3306.9 | 33.44481605 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 545.3 | 30 | 3761.1 | 33.33333333 |
gemma-3-4b-it | INT4-MIXED | 32 | 90.9 | 30.1 | 4690.4 | 33.22259136 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 380.9 | 30.2 | 3446.4 | 33.11258278 |
internvl2-4b | INT4-MIXED | 1027 | 590 | 30.3 | 5622.2 | 33.00330033 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 384.7 | 30.3 | 3478.8 | 33.00330033 |
gemma-3-4b-it | INT4-MIXED | 32 | 94.1 | 30.3 | 4742.4 | 33.00330033 |
qwen3-4b | INT4-MIXED | 1024 | 362.9 | 30.4 | 3593.4 | 32.89473684 |
phi-4-mini-instruct | INT4-MIXED | 32 | 74.3 | 30.4 | 3501.2 | 32.89473684 |
llama-3.2-1b-instruct | FP16 | 1024 | 136.4 | 30.5 | 3445.6 | 32.78688525 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 403 | 30.6 | 3585.7 | 32.67973856 |
phi-3.5-vision-instruct | INT4-MIXED | 1032 | 616.5 | 30.7 | 5853.6 | 32.5732899 |
gemma-2-2b | INT8-CW | 33 | 55.4 | 30.7 | 3596.8 | 32.5732899 |
internvl2-4b | INT4-MIXED | 1027 | 596.5 | 30.8 | 5620.7 | 32.46753247 |
gemma-3-4b-it | INT4-MIXED | 1024 | 410.6 | 30.8 | 6846.8 | 32.46753247 |
phi-2 | INT8-CW | 32 | 78.8 | 31.7 | 3736.3 | 31.54574132 |
gemma-3-4b-it | INT4-MIXED | 1024 | 427.8 | 31.8 | 6962.5 | 31.44654088 |
gemma-3-4b-it | INT4-MIXED | 1024 | 448.9 | 32 | 7037.8 | 31.25 |
gemma-2-2b | INT8-CW | 1025 | 245 | 32.2 | 3825.7 | 31.05590062 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 472.2 | 32.2 | 3645.3 | 31.05590062 |
minicpm-1b-sft | FP16 | 31 | 62 | 32.7 | 3677.6 | 30.58103976 |
chatglm3-6b | INT4-MIXED | 32 | 49.3 | 33.3 | 4117.6 | 30.03003003 |
qwen2.5-1.5b-instruct | FP16 | 32 | 40.7 | 33.5 | 3812.1 | 29.85074627 |
deepseek-r1-distill-qwen-1.5b | FP16 | 32 | 41.8 | 33.6 | 4351.5 | 29.76190476 |
qwen2.5-coder-3b-instruct | INT8-CW | 32 | 71.4 | 33.9 | 3890.2 | 29.49852507 |
phi-2 | INT8-CW | 1024 | 342.2 | 34 | 4182.9 | 29.41176471 |
minicpm-1b-sft | FP16 | 1014 | 186.7 | 34.2 | 3974.8 | 29.23976608 |
qwen2.5-1.5b-instruct | FP16 | 1024 | 166.7 | 34.3 | 3926.1 | 29.15451895 |
deepseek-r1-distill-qwen-1.5b | FP16 | 1024 | 165 | 34.4 | 4461.5 | 29.06976744 |
flan-t5-xxl | INT4-MIXED | 33 | 58.2 | 34.5 | 12873.2 | 28.98550725 |
phi-4-multimodal-instruct | INT4-MIXED | 578 | 484.1 | 34.8 | 5784.4 | 28.73563218 |
gpt-oss-20b | INT4-MIXED | 32 | 228.1 | 35.1 | 13065.5 | 28.49002849 |
chatglm3-6b | INT4-MIXED | 32 | 50 | 35.2 | 4471.1 | 28.40909091 |
chatglm3-6b | INT4-MIXED | 1024 | 464.1 | 35.3 | 4286.6 | 28.3286119 |
qwen2.5-coder-3b-instruct | INT8-CW | 1024 | 281.7 | 35.3 | 3997.4 | 28.3286119 |
phi-4-multimodal-instruct | INT4-MIXED | 786 | 577 | 36.1 | 6726.3 | 27.70083102 |
gpt-oss-20b | INT4-MIXED | 1024 | 716 | 36.8 | 13273.7 | 27.17391304 |
llama-2-7b-chat-hf | INT4-MIXED | 32 | 62.5 | 36.8 | 4493.9 | 27.17391304 |
codellama-7b | INT4-MIXED | 32 | 65.2 | 36.9 | 4495.3 | 27.100271 |
phi-4-multimodal-instruct | INT4-MIXED | 1570 | 1350.2 | 37 | 8869.1 | 27.02702703 |
llama-3.2-3b-instruct | INT8-CW | 32 | 55.6 | 37 | 4116 | 27.02702703 |
phi-4-multimodal-instruct | INT4-MIXED | 1362 | 1206 | 37.1 | 8188.8 | 26.9541779 |
chatglm3-6b | INT4-MIXED | 1024 | 540.3 | 37.2 | 4520.4 | 26.88172043 |
llama-3.2-3b-instruct | INT8-CW | 1024 | 283.4 | 38.6 | 4343.2 | 25.90673575 |
llama-2-7b-chat-hf | INT4-MIXED | 32 | 55.5 | 38.6 | 4671 | 25.90673575 |
minicpm3-4b | INT4-MIXED | 32 | 204.2 | 38.6 | 3565.6 | 25.90673575 |
biomistral-7b-slerp | INT4-MIXED | 7 | 47.3 | 38.6 | 4512.9 | 25.90673575 |
zephyr-7b-beta | INT4-MIXED | 32 | 62.8 | 38.8 | 4526.7 | 25.77319588 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 62.9 | 38.9 | 4525.4 | 25.70694087 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 61.9 | 39 | 4531.9 | 25.64102564 |
neural-chat-7b-v3-3 | INT4-MIXED | 32 | 62.7 | 39 | 4522.5 | 25.64102564 |
codellama-7b | INT4-MIXED | 32 | 56.7 | 39.5 | 4871.4 | 25.3164557 |
minicpm3-4b | INT4-MIXED | 32 | 206.9 | 39.5 | 3632.2 | 25.3164557 |
falcon-7b-instruct | INT4-MIXED | 32 | 64.2 | 40.3 | 4332.1 | 24.81389578 |
minicpm3-4b | INT4-MIXED | 32 | 206.6 | 40.4 | 3854.5 | 24.75247525 |
llama-2-7b-chat-hf | INT4-MIXED | 1024 | 534.2 | 40.6 | 5094.6 | 24.63054187 |
codellama-7b | INT4-MIXED | 1024 | 530.8 | 40.8 | 5096.4 | 24.50980392 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 59.1 | 40.9 | 4704 | 24.44987775 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 59.1 | 40.9 | 4700.7 | 24.44987775 |
phi-4-multimodal-instruct | INT4-MIXED | 578 | 559.7 | 41.1 | 6677.2 | 24.33090024 |
stable-diffusion-xl-1.0-inpainting-0.1 | INT8-CW | 32 | 42.6 | 41.1 | 6862.7 | 24.33090024 |
phi-4-multimodal-instruct | INT4-MIXED | 786 | 674.1 | 41.2 | 7218.8 | 24.27184466 |
zephyr-7b-beta | INT4-MIXED | 1024 | 543.5 | 41.4 | 4833.1 | 24.15458937 |
biomistral-7b-slerp | INT4-MIXED | 7 | 47.1 | 41.4 | 4912.6 | 24.15458937 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 545.5 | 41.5 | 4831.8 | 24.09638554 |
mistral-7b-instruct-v0.1 | INT4-MIXED | 32 | 59.8 | 41.5 | 4829.2 | 24.09638554 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 58.7 | 41.5 | 4826.9 | 24.09638554 |
neural-chat-7b-v3-3 | INT4-MIXED | 32 | 58.5 | 41.5 | 4825.9 | 24.09638554 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 545 | 41.6 | 4838.2 | 24.03846154 |
neural-chat-7b-v3-3 | INT4-MIXED | 1024 | 556.6 | 41.6 | 4829.7 | 24.03846154 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 58.8 | 41.6 | 4836.5 | 24.03846154 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 60.8 | 41.9 | 5072.2 | 23.86634845 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 60.8 | 41.9 | 5069.7 | 23.86634845 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 59.8 | 41.9 | 5068.4 | 23.86634845 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 32 | 60.2 | 41.9 | 5069.1 | 23.86634845 |
falcon-7b-instruct | INT4-MIXED | 1024 | 572.1 | 42.2 | 4476.9 | 23.69668246 |
llama-2-7b-chat-hf | INT4-MIXED | 1024 | 557.5 | 42.5 | 5161.9 | 23.52941176 |
phi-4-multimodal-instruct | INT4-MIXED | 1570 | 1535 | 42.6 | 9687.5 | 23.4741784 |
stable-diffusion-xl-1.0-inpainting-0.1 | INT8-CW | 32 | 44.1 | 42.6 | 6896.5 | 23.4741784 |
minicpm3-4b | INT4-MIXED | 1024 | 756.2 | 42.7 | 4524.9 | 23.41920375 |
llama-3-8b-instruct | INT4-MIXED | 32 | 67 | 42.7 | 5305.2 | 23.41920375 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 68.4 | 42.8 | 5303.7 | 23.36448598 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 68 | 42.8 | 5302.2 | 23.36448598 |
phi-4-multimodal-instruct | INT4-MIXED | 1362 | 1388 | 43 | 8972.4 | 23.25581395 |
codellama-7b | INT4-MIXED | 1024 | 623.6 | 43.2 | 5334.6 | 23.14814815 |
minicpm3-4b | INT4-MIXED | 1024 | 828.5 | 43.2 | 4587.8 | 23.14814815 |
minicpm-v-2_6 | INT4-MIXED | 228 | 547.3 | 43.3 | 6302.9 | 23.09468822 |
phi-3-mini-4k-instruct | INT8-CW | 32 | 62.9 | 43.3 | 4722.9 | 23.09468822 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 565.5 | 43.4 | 4913.5 | 23.04147465 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 60 | 43.4 | 5252.5 | 23.04147465 |
phi-3.5-mini-instruct | INT8-CW | 32 | 62 | 43.4 | 4724.1 | 23.04147465 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 59.2 | 43.4 | 5254.5 | 23.04147465 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 500.6 | 43.5 | 5277 | 22.98850575 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 560.9 | 43.5 | 4909.1 | 22.98850575 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 486.3 | 43.5 | 5275.5 | 22.98850575 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 1024 | 487 | 43.5 | 5274 | 22.98850575 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 59.3 | 43.5 | 5252.6 | 22.98850575 |
qwen2-7b-instruct | INT4-MIXED | 32 | 59.4 | 43.5 | 5255.5 | 22.98850575 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 489.2 | 43.6 | 5273.3 | 22.93577982 |
minicpm-o-2_6 | INT4-MIXED | 238 | 539.4 | 43.6 | 6409.8 | 22.93577982 |
minicpm3-4b | INT4-MIXED | 1024 | 867 | 43.8 | 4743.3 | 22.83105023 |
mistral-7b-instruct-v0.1 | INT4-MIXED | 1025 | 694 | 43.9 | 4969.7 | 22.77904328 |
neural-chat-7b-v3-3 | INT4-MIXED | 1024 | 653.5 | 43.9 | 4980.3 | 22.77904328 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 59 | 43.9 | 5458.8 | 22.77904328 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 59.2 | 43.9 | 5370.3 | 22.77904328 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 59.6 | 43.9 | 5373.5 | 22.77904328 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 32 | 59.2 | 43.9 | 5367.9 | 22.77904328 |
qwen2-7b-instruct | INT4-MIXED | 32 | 59.4 | 43.9 | 5370.4 | 22.77904328 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 646 | 44 | 4978.7 | 22.72727273 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 646.2 | 44.1 | 4988.5 | 22.67573696 |
bloomz-7b1 | INT4-MIXED | 32 | 62.8 | 44.4 | 5010.2 | 22.52252252 |
falcon-7b-instruct | INT4-MIXED | 32 | 62.3 | 44.6 | 5026.1 | 22.42152466 |
minicpm4-8b | INT4-MIXED | 32 | 65.9 | 44.7 | 5111.5 | 22.37136465 |
minicpm-v-2_6 | INT4-MIXED | 228 | 570.2 | 44.8 | 6456.6 | 22.32142857 |
llama-3-8b-instruct | INT4-MIXED | 32 | 64.5 | 44.8 | 5461.5 | 22.32142857 |
llama-3-8b-instruct | INT4-MIXED | 32 | 63.5 | 44.8 | 5464 | 22.32142857 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 63.4 | 44.9 | 5464.9 | 22.27171492 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 63.6 | 44.9 | 5462.2 | 22.27171492 |
phi-4-mini-instruct | INT8-CW | 32 | 81.9 | 44.9 | 4649.1 | 22.27171492 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 520.8 | 45.1 | 5369.9 | 22.172949 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 509.2 | 45.1 | 5371.1 | 22.172949 |
qwen2-7b-instruct | INT4-MIXED | 1024 | 518.6 | 45.1 | 5354.4 | 22.172949 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 513.8 | 45.2 | 5354.4 | 22.12389381 |
phi-4-mini-reasoning | INT8-CW | 32 | 79.8 | 45.2 | 4648.3 | 22.12389381 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 545.6 | 45.3 | 5611.1 | 22.07505519 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 547.8 | 45.4 | 5610.8 | 22.02643172 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 549.3 | 45.4 | 5609.5 | 22.02643172 |
minicpm-v-2_6 | INT4-MIXED | 228 | 585.5 | 45.4 | 6679.8 | 22.02643172 |
llama-3-8b-instruct | INT4-MIXED | 32 | 63.6 | 45.4 | 5574.1 | 22.02643172 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 589.6 | 45.5 | 5529.9 | 21.97802198 |
minicpm-o-2_6 | INT4-MIXED | 238 | 566.7 | 45.5 | 6672.9 | 21.97802198 |
gemma-3-4b-it | INT8-CW | 32 | 98 | 45.5 | 6144.7 | 21.97802198 |
phi-3-mini-128k-instruct | INT8-CW | 32 | 84.4 | 45.5 | 4725.1 | 21.97802198 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 587.6 | 45.6 | 5438.2 | 21.92982456 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 589 | 45.6 | 5435.3 | 21.92982456 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 1024 | 587.6 | 45.6 | 5433 | 21.92982456 |
qwen2-7b-instruct | INT4-MIXED | 1024 | 588.8 | 45.6 | 5433.9 | 21.92982456 |
internvl2-4b | INT8-CW | 297 | 219 | 45.9 | 5858.2 | 21.78649237 |
minicpm-v-4_5 | INT4-MIXED | 217 | 567.6 | 46 | 6820.1 | 21.73913043 |
phi-3-mini-4k-instruct | INT8-CW | 1024 | 489.4 | 46.1 | 5207.2 | 21.69197397 |
phi-3.5-mini-instruct | INT8-CW | 1024 | 499 | 46.2 | 5211.1 | 21.64502165 |
qwen3-4b | INT8-CW | 32 | 77.5 | 46.2 | 4949.6 | 21.64502165 |
qwen3-8b | INT4-MIXED | 32 | 74.5 | 46.6 | 5736 | 21.45922747 |
falcon-7b-instruct | INT4-MIXED | 1024 | 673.9 | 46.7 | 4919.7 | 21.41327623 |
minicpm4-8b | INT4-MIXED | 1024 | 574.6 | 46.7 | 5276.5 | 21.41327623 |
phi-4-mini-instruct | INT8-CW | 1024 | 437.7 | 46.7 | 4925.6 | 21.41327623 |
afm-4.5b | INT8-CW | 32 | 65.5 | 46.9 | 5361.7 | 21.32196162 |
bloomz-7b1 | INT4-MIXED | 32 | 68 | 46.9 | 5258 | 21.32196162 |
minicpm4-8b | INT4-MIXED | 32 | 70.4 | 46.9 | 5324.8 | 21.32196162 |
phi-4-mini-reasoning | INT8-CW | 1024 | 431 | 47 | 4924.2 | 21.27659574 |
qwen3-8b | INT4-MIXED | 1024 | 631.5 | 47.2 | 5812.6 | 21.18644068 |
qwen3-8b | INT4-MIXED | 32 | 72.9 | 47.2 | 5760.9 | 21.18644068 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 568.7 | 47.3 | 5685 | 21.14164905 |
gemma-3-4b-it | INT8-CW | 1024 | 462.4 | 47.3 | 8462.5 | 21.14164905 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 575.1 | 47.3 | 5683.6 | 21.14164905 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 571.7 | 47.3 | 5685 | 21.14164905 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 571.8 | 47.4 | 5699.6 | 21.09704641 |
minicpm4-8b | INT4-MIXED | 32 | 70.4 | 47.5 | 5535.6 | 21.05263158 |
phi-3.5-vision-instruct | INT8-CW | 802 | 521.6 | 47.8 | 6783.7 | 20.92050209 |
bloomz-7b1 | INT4-MIXED | 1024 | 678.6 | 48 | 5571.2 | 20.83333333 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 642.2 | 48.1 | 5755.2 | 20.79002079 |
qwen3-8b | INT4-MIXED | 32 | 101.2 | 48.1 | 5503.4 | 20.79002079 |
phi-3.5-vision-instruct | INT8-CW | 1032 | 627.9 | 48.5 | 7122.5 | 20.6185567 |
afm-4.5b | INT8-CW | 1024 | 381.9 | 48.5 | 5519.4 | 20.6185567 |
phi-3-mini-128k-instruct | INT8-CW | 1024 | 519.2 | 48.5 | 5210.2 | 20.6185567 |
minicpm-v-4_5 | INT4-MIXED | 217 | 591 | 48.8 | 7147.1 | 20.49180328 |
internvl2-4b | INT8-CW | 1027 | 627.6 | 48.9 | 6898.9 | 20.44989775 |
qwen3-4b | INT8-CW | 1024 | 402.4 | 48.9 | 5232.3 | 20.44989775 |
qwen3-8b | INT4-MIXED | 1024 | 592.8 | 49.2 | 5997.3 | 20.32520325 |
minicpm4-8b | INT4-MIXED | 1024 | 617.5 | 49.3 | 5363.9 | 20.28397566 |
minicpm4-8b | INT4-MIXED | 1024 | 691.7 | 49.6 | 5565.2 | 20.16129032 |
qwen3-8b | INT4-MIXED | 1024 | 668.1 | 49.9 | 5954.1 | 20.04008016 |
zephyr-7b-beta | INT4-MIXED | 32 | 67 | 49.9 | 5438.5 | 20.04008016 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 77.2 | 50.1 | 6038.3 | 19.96007984 |
bloomz-7b1 | INT4-MIXED | 1024 | 786.7 | 50.3 | 5796 | 19.88071571 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 32 | 185.1 | 50.3 | 6220.1 | 19.88071571 |
gemma-7b-it | INT4-MIXED | 32 | 74.4 | 51.1 | 5633.1 | 19.56947162 |
baichuan2-7b-chat | INT4-MIXED | 32 | 67.5 | 51.5 | 6147.4 | 19.41747573 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 1024 | 643.1 | 52.1 | 7603.7 | 19.19385797 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 74.3 | 52.1 | 6334.3 | 19.19385797 |
zephyr-7b-beta | INT4-MIXED | 1024 | 571.9 | 52.4 | 5700 | 19.08396947 |
gpt-oss-20b | INT4-MIXED | 32 | 273.3 | 52.4 | 12343.5 | 19.08396947 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 74.1 | 52.7 | 6482.8 | 18.97533207 |
glm-4-9b-chat-hf | INT4-MIXED | 1024 | 674.3 | 52.9 | 6220.4 | 18.90359168 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 32 | 186.8 | 53 | 6434.9 | 18.86792453 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 32 | 187.3 | 53.3 | 6550.4 | 18.76172608 |
gemma-7b-it | INT4-MIXED | 32 | 78.7 | 53.6 | 5931.9 | 18.65671642 |
gpt-oss-20b | INT4-MIXED | 1024 | 691.4 | 53.8 | 12528.5 | 18.58736059 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 1024 | 681.6 | 54.1 | 7682 | 18.48428835 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 1024 | 746.2 | 54.3 | 7768.1 | 18.41620626 |
ltx-video | INT4-MIXED | 11 | 55.5 | 54.5 | 6160.4 | 18.34862385 |
glm-4-9b-chat-hf | INT4-MIXED | 1024 | 692.1 | 54.8 | 6413.3 | 18.24817518 |
phi-4-multimodal-instruct | INT8-CW | 578 | 524.6 | 54.9 | 7564.4 | 18.21493625 |
gemma-7b-it | INT4-MIXED | 1024 | 640.3 | 55 | 6203.8 | 18.18181818 |
phi-4-multimodal-instruct | INT8-CW | 786 | 631.8 | 55.2 | 8482 | 18.11594203 |
baichuan2-7b-chat | INT4-MIXED | 1024 | 650 | 55.4 | 6694.2 | 18.05054152 |
glm-4-9b-chat-hf | INT4-MIXED | 1024 | 778.2 | 55.5 | 6481.4 | 18.01801802 |
llava-next-video-7b-hf | INT4-MIXED | 2945 | 3136.4 | 56.3 | 9258 | 17.76198934 |
minicpm3-4b | INT8-CW | 32 | 197 | 56.3 | 5459.5 | 17.76198934 |
phi-4-multimodal-instruct | INT8-CW | 1570 | 1471.2 | 56.5 | 10810.3 | 17.69911504 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 75.6 | 56.5 | 6452.1 | 17.69911504 |
phi-4-multimodal-instruct | INT8-CW | 1362 | 1318.5 | 56.6 | 9941.4 | 17.66784452 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 75 | 56.6 | 6541.9 | 17.66784452 |
ltx-video | INT8-CW | 11 | 57.3 | 56.7 | 9508.1 | 17.6366843 |
phi-2 | FP16 | 32 | 66.7 | 57.6 | 6547.1 | 17.36111111 |
gemma-2-9b-it | INT4-MIXED | 32 | 91.9 | 57.7 | 6056.6 | 17.33102253 |
gemma-7b-it | INT4-MIXED | 1024 | 749.6 | 57.8 | 6465.9 | 17.30103806 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 711.3 | 58.8 | 6550.5 | 17.00680272 |
gemma-2-2b | FP16 | 33 | 66.4 | 58.8 | 7241.3 | 17.00680272 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 718.4 | 58.9 | 6642.4 | 16.97792869 |
gemma-2-9b-it | INT4-MIXED | 32 | 83.1 | 59.8 | 6276.2 | 16.72240803 |
gemma-2-2b | FP16 | 1025 | 335.5 | 60.4 | 7549.3 | 16.55629139 |
gemma-2-9b-it | INT4-MIXED | 32 | 84.3 | 60.5 | 6434.5 | 16.52892562 |
gemma-2b-it | FP16 | 32 | 68.5 | 60.9 | 5774 | 16.42036125 |
phi-2 | FP16 | 1024 | 423.8 | 61.6 | 7173 | 16.23376623 |
gemma-2b-it | FP16 | 1024 | 260.4 | 61.7 | 5915.1 | 16.20745543 |
gemma-2-9b-it | INT4-MIXED | 1024 | 717.1 | 62.2 | 6555.7 | 16.07717042 |
minicpm3-4b | INT8-CW | 1024 | 902.9 | 62.3 | 6404.7 | 16.05136437 |
stable-diffusion-xl-1.0-inpainting-0.1 | FP16 | 32 | 64.1 | 63.1 | 9346.9 | 15.84786054 |
gemma-2-9b-it | INT4-MIXED | 1024 | 741.3 | 64.6 | 6761 | 15.47987616 |
gemma-2-9b-it | INT4-MIXED | 1024 | 806.7 | 65.2 | 6850 | 15.33742331 |
lcm-dreamshaper-v7 | INT8-HYBRID | 32 | 69.5 | 66.3 | 3567.4 | 15.08295626 |
lcm-dreamshaper-v7 | INT8-HYBRID | 1024 | 67.9 | 66.7 | 3273.3 | 14.99250375 |
chatglm3-6b | INT8-CW | 32 | 81.8 | 67.1 | 6940.8 | 14.90312966 |
llama-3.2-3b-instruct | FP16 | 32 | 78 | 67.4 | 7170.1 | 14.83679525 |
dolly-v2-12b | INT4-MIXED | 32 | 104.2 | 68.2 | 7600.2 | 14.6627566 |
chatglm3-6b | INT8-CW | 1024 | 530.4 | 69 | 7102.7 | 14.49275362 |
llama-3.2-3b-instruct | FP16 | 1024 | 410.6 | 69.2 | 7458.2 | 14.45086705 |
llama-2-13b-chat-hf | INT4-MIXED | 32 | 102.9 | 70.3 | 7696.4 | 14.22475107 |
qwen2.5-coder-3b-instruct | FP16 | 32 | 80 | 72.7 | 6876.7 | 13.75515818 |
dolly-v2-12b | INT4-MIXED | 1024 | 1257.7 | 73.3 | 8405.5 | 13.6425648 |
gemma-3-12b-it | INT4-MIXED | 32 | 168 | 73.5 | 9114.4 | 13.60544218 |
flan-t5-xxl | INT8-CW | 33 | 247.2 | 73.7 | 22627.8 | 13.56852103 |
qwen2.5-coder-3b-instruct | FP16 | 1024 | 317.4 | 74.3 | 7032 | 13.4589502 |
falcon-7b-instruct | INT8-CW | 32 | 93.2 | 74.7 | 7580.7 | 13.38688086 |
qwen2.5-7b-instruct-1m | INT8-CW | 32 | 100.5 | 74.9 | 8186 | 13.35113485 |
qwen2.5-7b-instruct | INT8-CW | 32 | 99 | 75 | 8186.7 | 13.33333333 |
qwen2.5-7b-instruct | INT8-CW | 32 | 98.3 | 75 | 8185.9 | 13.33333333 |
qwen2-7b-instruct | INT8-CW | 32 | 99.6 | 75 | 8187.4 | 13.33333333 |
deepseek-r1-distill-qwen-7b | INT8-CW | 32 | 99.3 | 75.1 | 8187.5 | 13.31557923 |
llama-2-13b-chat-hf | INT4-MIXED | 1024 | 1016.5 | 75.9 | 8610.4 | 13.17523057 |
falcon-7b-instruct | INT8-CW | 1024 | 830.2 | 76.5 | 7740.6 | 13.07189542 |
qwen2.5-7b-instruct | INT8-CW | 1024 | 577.1 | 76.5 | 8395.8 | 13.07189542 |
minicpm-o-2_6 | INT8-CW | 238 | 550 | 76.5 | 9494 | 13.07189542 |
minicpm-v-2_6 | INT8-CW | 228 | 557.2 | 76.5 | 9481.8 | 13.07189542 |
deepseek-r1-distill-qwen-7b | INT8-CW | 1024 | 581.4 | 76.6 | 8395.8 | 13.05483029 |
qwen2-7b-instruct | INT8-CW | 1024 | 575.5 | 76.6 | 8395.1 | 13.05483029 |
qwen2.5-7b-instruct | INT8-CW | 1024 | 575.5 | 76.7 | 8395.9 | 13.03780965 |
qwen2.5-7b-instruct-1m | INT8-CW | 1024 | 577.2 | 76.7 | 8397 | 13.03780965 |
llama-2-7b-chat-hf | INT8-CW | 32 | 92.2 | 76.7 | 7681.5 | 13.03780965 |
gemma-3-12b-it | INT4-MIXED | 32 | 174.6 | 77.3 | 9390.6 | 12.93661061 |
phi-3-mini-128k-instruct | FP16 | 32 | 86.9 | 77.5 | 8582.3 | 12.90322581 |
gemma-3-12b-it | INT4-MIXED | 1024 | 1105.7 | 77.6 | 11768.6 | 12.88659794 |
codellama-7b | INT8-CW | 32 | 93.6 | 77.6 | 7680.1 | 12.88659794 |
phi-4 | INT4-MIXED | 32 | 114.5 | 77.6 | 8505.8 | 12.88659794 |
phi-4-reasoning | INT4-MIXED | 32 | 114.9 | 77.6 | 8506.8 | 12.88659794 |
phi-3.5-mini-instruct | FP16 | 32 | 90.1 | 77.7 | 8487 | 12.87001287 |
phi-3-mini-4k-instruct | FP16 | 32 | 88.5 | 77.7 | 8486.7 | 12.87001287 |
lcm-dreamshaper-v7 | INT8-CW | 1024 | 81.5 | 78.3 | 3490.1 | 12.77139208 |
lcm-dreamshaper-v7 | INT8-CW | 32 | 79.8 | 78.6 | 5041 | 12.72264631 |
deepseek-r1-distill-qwen-14b | INT4-MIXED | 32 | 109.6 | 79.2 | 8841.3 | 12.62626263 |
lcm-dreamshaper-v7 | FP16 | 32 | 80 | 79.3 | 4024.1 | 12.61034048 |
lcm-dreamshaper-v7 | FP16 | 1024 | 80.2 | 79.4 | 3930.4 | 12.59445844 |
lcm-dreamshaper-v7 | INT8-CW | 1024 | 83.4 | 79.5 | 3493.8 | 12.57861635 |
lcm-dreamshaper-v7 | INT8-CW | 32 | 80.6 | 79.6 | 5194.2 | 12.56281407 |
qwen1.5-14b-chat | INT4-MIXED | 32 | 116.6 | 79.7 | 9202.9 | 12.54705144 |
internvl2-4b | FP16 | 297 | 287.7 | 79.8 | 9597.4 | 12.53132832 |
phi-4-mini-reasoning | FP16 | 32 | 91.2 | 80.1 | 8410.1 | 12.48439451 |
llama-2-7b-chat-hf | INT8-CW | 1024 | 592.1 | 80.4 | 8262.7 | 12.43781095 |
phi-4-mini-instruct | FP16 | 32 | 92.1 | 80.7 | 8406.9 | 12.39157373 |
phi-4-reasoning | INT4-MIXED | 32 | 122.8 | 80.7 | 8926 | 12.39157373 |
baichuan2-7b-chat | INT8-CW | 32 | 97.9 | 80.9 | 8456.2 | 12.36093943 |
codellama-7b | INT8-CW | 1024 | 682.5 | 81.2 | 8263.5 | 12.31527094 |
mistral-7b-instruct-v0.2 | INT8-CW | 32 | 102.7 | 81.2 | 7862.6 | 12.31527094 |
phi-4-reasoning | INT4-MIXED | 1024 | 1107.8 | 81.3 | 8982.9 | 12.300123 |
zephyr-7b-beta | INT8-CW | 32 | 99.3 | 81.3 | 7861.6 | 12.300123 |
gemma-3-12b-it | INT4-MIXED | 1024 | 1232.1 | 81.4 | 12032.5 | 12.28501229 |
phi-4 | INT4-MIXED | 1024 | 1122.2 | 81.5 | 8982.4 | 12.26993865 |
phi-4 | INT4-MIXED | 32 | 123 | 81.5 | 9155.6 | 12.26993865 |
phi-3-mini-128k-instruct | FP16 | 1024 | 567 | 81.6 | 9341.5 | 12.25490196 |
biomistral-7b-slerp | INT8-CW | 7 | 88.5 | 81.7 | 7852.1 | 12.23990208 |
phi-3.5-mini-instruct | FP16 | 1024 | 565.9 | 81.9 | 9247.1 | 12.21001221 |
mistral-7b-instruct-v0.1 | INT8-CW | 32 | 100.6 | 82 | 7860.3 | 12.19512195 |
neural-chat-7b-v3-3 | INT8-CW | 32 | 98 | 82 | 7858.8 | 12.19512195 |
phi-3.5-vision-instruct | FP16 | 802 | 699.6 | 82.1 | 10357.6 | 12.18026797 |
phi-3-mini-4k-instruct | FP16 | 1024 | 569.5 | 82.2 | 9247 | 12.16545012 |
mistral-7b-instruct-v0.3 | INT8-CW | 32 | 98.6 | 82.2 | 7868.9 | 12.16545012 |
phi-4-mini-reasoning | FP16 | 1024 | 554.2 | 82.4 | 8729.5 | 12.13592233 |
internvl2-4b | FP16 | 1027 | 849.8 | 82.7 | 10403.4 | 12.09189843 |
phi-4-mini-instruct | FP16 | 1024 | 558.5 | 82.8 | 8730.7 | 12.07729469 |
phi-3.5-vision-instruct | FP16 | 1032 | 866.9 | 83 | 10576.5 | 12.04819277 |
deepseek-r1-distill-qwen-14b | INT4-MIXED | 1024 | 1028.7 | 83.3 | 9242.7 | 12.00480192 |
deepseek-r1-distill-qwen-14b | INT4-MIXED | 32 | 114.8 | 83.4 | 9494.7 | 11.99040767 |
mistral-7b-instruct-v0.2 | INT8-CW | 1024 | 639.4 | 83.7 | 8168.7 | 11.9474313 |
qwen2.5-vl-7b-instruct | INT8-CW | 32 | 218 | 83.7 | 9415.1 | 11.9474313 |
zephyr-7b-beta | INT8-CW | 1024 | 612.5 | 83.8 | 8163.9 | 11.93317422 |
mistral-7b-instruct-v0.1 | INT8-CW | 1025 | 660.7 | 84.3 | 8155.5 | 11.8623962 |
neural-chat-7b-v3-3 | INT8-CW | 1024 | 613.9 | 84.5 | 8161.2 | 11.83431953 |
phi-4-reasoning | INT4-MIXED | 1024 | 1158.4 | 84.5 | 9270.2 | 11.83431953 |
baichuan2-7b-chat | INT8-CW | 1024 | 647.5 | 84.6 | 9043.3 | 11.82033097 |
mistral-7b-instruct-v0.3 | INT8-CW | 1024 | 616.3 | 84.6 | 8170.7 | 11.82033097 |
llama-3.1-8b-instruct | INT8-CW | 32 | 104.5 | 84.7 | 8640.4 | 11.80637544 |
bloomz-7b1 | INT8-CW | 32 | 95.2 | 84.9 | 7994.2 | 11.77856302 |
qwen2.5-vl-7b-instruct | INT8-CW | 1024 | 752 | 85 | 10806.4 | 11.76470588 |
llama-3-8b-instruct | INT8-CW | 32 | 102.9 | 85.1 | 8640 | 11.75088132 |
qwen1.5-14b-chat | INT4-MIXED | 1024 | 1154.6 | 85.3 | 10086.1 | 11.72332943 |
phi-4 | INT4-MIXED | 1024 | 1306.3 | 85.4 | 9376.9 | 11.70960187 |
deepseek-r1-distill-llama-8b | INT8-CW | 32 | 104.1 | 85.6 | 8726.7 | 11.68224299 |
qwen3-4b | FP16 | 32 | 96.4 | 85.7 | 8773.1 | 11.66861144 |
phi-4-reasoning | INT4-MIXED | 32 | 123.9 | 86.3 | 9505.1 | 11.58748552 |
llama-3.1-8b-instruct | INT8-CW | 1024 | 618.6 | 87 | 8944.8 | 11.49425287 |
deepseek-r1-distill-qwen-14b | INT4-MIXED | 1024 | 1205.8 | 87.4 | 9653.2 | 11.4416476 |
llama-3-8b-instruct | INT8-CW | 1024 | 618.6 | 87.4 | 8944.1 | 11.4416476 |
deepseek-r1-distill-llama-8b | INT8-CW | 1024 | 622.9 | 87.7 | 9040 | 11.40250855 |
gemma-3-4b-it | FP16 | 32 | 127.8 | 87.9 | 10778.6 | 11.37656428 |
llama-2-13b-chat-hf | INT4-MIXED | 32 | 121.4 | 87.9 | 9388.1 | 11.37656428 |
bloomz-7b1 | INT8-CW | 1024 | 792.8 | 88.3 | 8547.2 | 11.32502831 |
llava-v1.6-mistral-7b-hf | INT8-CW | 2944 | 2891.8 | 88.8 | 10236.5 | 11.26126126 |
qwen3-4b | FP16 | 1024 | 496.2 | 88.9 | 9122.1 | 11.24859393 |
afm-4.5b | FP16 | 32 | 101.1 | 89.1 | 9829.9 | 11.22334456 |
gemma-3-4b-it | FP16 | 1024 | 605 | 89.3 | 13129.3 | 11.19820829 |
starcoder | INT4-MIXED | 32 | 141.2 | 89.3 | 9249.4 | 11.19820829 |
phi-4-reasoning | INT4-MIXED | 1024 | 1365.3 | 90.2 | 9669.2 | 11.0864745 |
afm-4.5b | FP16 | 1024 | 496.9 | 90.9 | 10076.8 | 11.00110011 |
qwen3-8b | INT8-CW | 32 | 105.3 | 90.9 | 8916.7 | 11.00110011 |
minicpm-v-4_5 | INT8-CW | 217 | 571.7 | 92.3 | 10151.6 | 10.83423619 |
starcoder | INT4-MIXED | 1024 | 1426.3 | 92.7 | 9069.2 | 10.78748652 |
minicpm4-8b | INT8-CW | 32 | 111.1 | 93 | 8735.4 | 10.75268817 |
qwen3-8b | INT8-CW | 1024 | 664.4 | 93.4 | 9222.7 | 10.70663812 |
gemma-7b-it | INT8-CW | 32 | 119.4 | 93.4 | 9334.7 | 10.70663812 |
llama-2-13b-chat-hf | INT4-MIXED | 1024 | 1164.9 | 93.5 | 10230 | 10.69518717 |
llava-next-video-7b-hf | INT8-CW | 2945 | 3277.1 | 93.6 | 12069.4 | 10.68376068 |
minicpm4-8b | INT8-CW | 1024 | 727.2 | 95.2 | 8900.9 | 10.50420168 |
minicpm3-4b | FP16 | 32 | 176.5 | 95.5 | 9596.7 | 10.47120419 |
glm-4-9b-chat-hf | INT8-CW | 32 | 118.5 | 96.9 | 10027.2 | 10.31991744 |
gemma-7b-it | INT8-CW | 1024 | 764.7 | 97.2 | 9901.3 | 10.28806584 |
phi-4-multimodal-instruct | FP16 | 578 | 659.8 | 99.3 | 12503.4 | 10.07049345 |
glm-4-9b-chat-hf | INT8-CW | 1024 | 782.8 | 99.8 | 10198.1 | 10.02004008 |
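
In every row above, the "2nd token per sec" value matches 1000 divided by the "2nd latency (ms)" value. This relationship is inferred from the listed numbers; the page does not state it. The sketch below checks it on three rows copied from the table. The helper name `tokens_per_second` is hypothetical and is not part of any OpenVINO API.

```python
def tokens_per_second(second_token_latency_ms: float) -> float:
    """Convert a per-token latency in milliseconds to tokens per second."""
    if second_token_latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / second_token_latency_ms


# Rows copied from the table above:
# (topology, 2nd latency in ms, listed "2nd token per sec")
rows = [
    ("qwen2.5-7b-instruct", 75.0, 13.33333333),
    ("llama-2-13b-chat-hf", 75.9, 13.17523057),
    ("glm-4-9b-chat-hf", 99.8, 10.02004008),
]

for name, latency_ms, listed in rows:
    computed = tokens_per_second(latency_ms)
    # Compare the recomputed value with the value printed in the table.
    print(f"{name}: computed {computed:.8f} tokens/s, listed {listed}")
```

If the relationship holds, the throughput column adds no information beyond the 2nd latency column, so either one can be used to rank configurations.
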

Topology | Precision | Input Size | 1st latency (ms) | 2nd latency (ms) | max rss memory | 2nd token per sec |
|---|---|---|---|---|---|---|
t5-small | FP16 | 1024 | 6.7 | 3.1 | 1301.9 | 322.5806452 |
t5-small | FP16 | 32 | 6.3 | 3.1 | 1180.4 | 322.5806452 |
t5-small | INT4-MIXED | 1024 | 7.2 | 3.2 | 1129.4 | 312.5 |
t5-small | INT4-MIXED | 1024 | 7.3 | 3.2 | 1022.7 | 312.5 |
t5-small | INT4-MIXED | 1024 | 7.2 | 3.2 | 1125.6 | 312.5 |
t5-small | INT8-CW | 1024 | 7.5 | 3.2 | 1169.2 | 312.5 |
t5-small | INT4-MIXED | 32 | 6.8 | 3.3 | 1019.4 | 303.030303 |
t5-small | INT4-MIXED | 32 | 6.8 | 3.3 | 909.2 | 303.030303 |
t5-small | INT4-MIXED | 32 | 7 | 3.3 | 1009.1 | 303.030303 |
t5-small | INT8-CW | 32 | 6.8 | 3.4 | 1053.2 | 294.1176471 |
minicpm4-0.5b | INT4-MIXED | 32 | 16.3 | 4.1 | 1056.3 | 243.902439 |
minicpm4-0.5b | INT4-MIXED | 32 | 17.6 | 4.2 | 1195.6 | 238.0952381 |
minicpm4-0.5b | INT4-MIXED | 1024 | 23.6 | 4.2 | 1109.1 | 238.0952381 |
minicpm4-0.5b | INT4-MIXED | 32 | 16.4 | 4.2 | 1155.1 | 238.0952381 |
minicpm4-0.5b | INT4-MIXED | 1024 | 29.1 | 4.4 | 1259.2 | 227.2727273 |
minicpm4-0.5b | INT4-MIXED | 1024 | 24.4 | 4.4 | 1190.9 | 227.2727273 |
gemma-3-270m | INT4-MIXED | 32 | 17.4 | 4.5 | 1207 | 222.2222222 |
gemma-3-270m | INT8-CW | 1024 | 21.2 | 4.5 | 1285.8 | 222.2222222 |
gemma-3-270m | INT8-CW | 32 | 18 | 4.5 | 1241.3 | 222.2222222 |
whisper-large-v3-turbo | INT4-MIXED | 1024 | 173.9 | 4.5 | 1711.3 | 222.2222222 |
distil-large-v2 | INT4-MIXED | 1024 | 163.7 | 4.6 | 1639.3 | 217.3913043 |
gemma-3-270m | INT4-MIXED | 1024 | 20.2 | 4.6 | 1256.1 | 217.3913043 |
distil-large-v2 | INT4-MIXED | 32 | 111.3 | 4.7 | 1615.2 | 212.7659574 |
whisper-large-v3-turbo | INT4-MIXED | 32 | 117.8 | 4.7 | 1678.5 | 212.7659574 |
distil-large-v2 | INT8-CW | 1024 | 155.4 | 4.8 | 1899 | 208.3333333 |
whisper-large-v3-turbo | INT8-CW | 1024 | 166.9 | 4.8 | 1996 | 208.3333333 |
distil-large-v2 | INT8-CW | 32 | 105.3 | 4.9 | 1866 | 204.0816327 |
whisper-large-v3-turbo | INT8-CW | 32 | 112 | 4.9 | 1963.7 | 204.0816327 |
minicpm4-0.5b | INT8-CW | 32 | 18.4 | 5.7 | 1315.8 | 175.4385965 |
whisper-small | INT4-MIXED | 32 | 66.5 | 6 | 1291.6 | 166.6666667 |
minicpm4-0.5b | INT8-CW | 1024 | 26.4 | 6.1 | 1364.4 | 163.9344262 |
whisper-small | INT4-MIXED | 1024 | 118.7 | 6.2 | 1324.4 | 161.2903226 |
whisper-small | INT4-MIXED | 32 | 71.1 | 6.3 | 1444 | 158.7301587 |
whisper-small | INT4-MIXED | 1024 | 123.2 | 6.3 | 1477.7 | 158.7301587 |
whisper-small | INT8-CW | 1024 | 122.4 | 6.3 | 1531.2 | 158.7301587 |
whisper-small | INT4-MIXED | 1024 | 120.2 | 6.5 | 1427.5 | 153.8461538 |
whisper-small | INT8-CW | 32 | 69.2 | 6.5 | 1497.2 | 153.8461538 |
whisper-large-v3-turbo | FP16 | 1024 | 175 | 6.6 | 2752.3 | 151.5151515 |
whisper-large-v3-turbo | FP16 | 32 | 119.9 | 6.7 | 2721.3 | 149.2537313 |
distil-large-v2 | FP16 | 32 | 113.5 | 6.8 | 2682.2 | 147.0588235 |
distil-large-v2 | FP16 | 1024 | 163.8 | 6.8 | 2714.8 | 147.0588235 |
gemma-3-270m | FP16 | 1024 | 19 | 6.8 | 1565.7 | 147.0588235 |
gemma-3-270m | FP16 | 32 | 15.6 | 6.8 | 1481 | 147.0588235 |
whisper-small | INT4-MIXED | 32 | 69.6 | 6.8 | 1396.7 | 147.0588235 |
nanollava | INT4-MIXED | 760 | 87.9 | 7.1 | 2898.9 | 140.8450704 |
nanollava | INT4-MIXED | 1752 | 133.7 | 7.7 | 4601 | 129.8701299 |
nanollava | INT8-CW | 760 | 91 | 7.7 | 2983.1 | 129.8701299 |
llama-3.2-1b-instruct | INT4-MIXED | 32 | 13.8 | 8.1 | 1662.4 | 123.4567901 |
llama-3.2-1b-instruct | INT4-MIXED | 32 | 13.5 | 8.1 | 1639.2 | 123.4567901 |
whisper-small | FP16 | 1024 | 114.8 | 8.1 | 1722.2 | 123.4567901 |
whisper-small | FP16 | 32 | 62.4 | 8.2 | 1691.4 | 121.9512195 |
gemma-3-1b-it | INT4-MIXED | 32 | 22.8 | 8.3 | 1521.2 | 120.4819277 |
gemma-3-1b-it | INT4-MIXED | 32 | 23 | 8.5 | 1646.4 | 117.6470588 |
llama-3.2-1b-instruct | INT4-MIXED | 1024 | 43.6 | 8.5 | 1730.3 | 117.6470588 |
gemma-3-1b-it | INT4-MIXED | 32 | 23.9 | 8.6 | 1704.5 | 116.2790698 |
llama-3.2-1b-instruct | INT4-MIXED | 1024 | 51.2 | 8.6 | 1787.3 | 116.2790698 |
gemma-3-1b-it | INT4-MIXED | 1024 | 48.9 | 8.7 | 1615.7 | 114.9425287 |
gemma-3-1b-it | INT4-MIXED | 1024 | 49 | 8.8 | 1737.7 | 113.6363636 |
minicpm4-0.5b | FP4-NORMALIZED | 32 | 14.8 | 8.8 | 1645.9 | 113.6363636 |
nanollava | INT8-CW | 1752 | 132.5 | 8.8 | 4709 | 113.6363636 |
minicpm4-0.5b | FP4-NORMALIZED | 1024 | 31.5 | 9 | 1704.9 | 111.1111111 |
gemma-3-1b-it | INT4-MIXED | 1024 | 55.7 | 9.1 | 1800 | 109.8901099 |
minicpm4-0.5b | FP16 | 32 | 15.7 | 9.6 | 1717.5 | 104.1666667 |
minicpm4-0.5b | FP16 | 1024 | 32.5 | 9.8 | 1790.4 | 102.0408163 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 20.9 | 9.9 | 2001.7 | 101.010101 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 20.8 | 10.1 | 2049.5 | 99.00990099 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 21.3 | 10.2 | 1729.2 | 98.03921569 |
smolvlm2-256m-video-instruct | INT8-CW | 1141 | 220.7 | 10.2 | 2645.7 | 98.03921569 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 75.7 | 10.5 | 2097.6 | 95.23809524 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 79.8 | 10.5 | 2241.5 | 95.23809524 |
glm-edge-1.5b-chat | INT4-MIXED | 32 | 33.3 | 10.5 | 1809.5 | 95.23809524 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 80.4 | 10.6 | 1908 | 94.33962264 |
glm-edge-1.5b-chat | INT4-MIXED | 1024 | 116.8 | 11 | 1922.2 | 90.90909091 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 25.5 | 11.1 | 1940.2 | 90.09009009 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 85.7 | 11.3 | 2143.6 | 88.49557522 |
gemma-3-1b-it | INT8-CW | 32 | 25.5 | 11.6 | 1972.8 | 86.20689655 |
nanollava | FP16 | 760 | 94 | 11.7 | 3970.7 | 85.47008547 |
gemma-3-1b-it | INT8-CW | 1024 | 61.1 | 11.9 | 2052.9 | 84.03361345 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 24.4 | 12.7 | 2399 | 78.74015748 |
llama-3.2-1b-instruct | INT8-CW | 32 | 15.3 | 12.8 | 2121.6 | 78.125 |
nanollava | FP16 | 1752 | 140.1 | 12.9 | 6028.1 | 77.51937984 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 90.7 | 13.2 | 2463 | 75.75757576 |
llama-3.2-1b-instruct | INT8-CW | 1024 | 56.7 | 13.2 | 2228 | 75.75757576 |
smolvlm2-256m-video-instruct | INT4-MIXED | 1141 | 244.6 | 14.5 | 2797.1 | 68.96551724 |
smolvlm2-256m-video-instruct | FP16 | 1141 | 225 | 15.1 | 3104.3 | 66.22516556 |
deepseek-r1-distill-qwen-1.5b | INT8-CW | 32 | 24.6 | 15.9 | 2650.1 | 62.89308176 |
qwen2.5-1.5b-instruct | INT8-CW | 32 | 25 | 16 | 2336.6 | 62.5 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 37.1 | 16 | 2574.8 | 62.5 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 37.3 | 16 | 2489.1 | 62.5 |
phi-2 | INT4-MIXED | 32 | 33 | 16.2 | 2475.4 | 61.72839506 |
deepseek-r1-distill-qwen-1.5b | INT8-CW | 1024 | 75.2 | 16.3 | 2749.2 | 61.34969325 |
glm-edge-1.5b-chat | INT8-CW | 32 | 36.1 | 16.5 | 2395.9 | 60.60606061 |
qwen2.5-1.5b-instruct | INT8-CW | 1024 | 75.3 | 16.5 | 2433.3 | 60.60606061 |
phi-2 | INT4-MIXED | 32 | 33.1 | 17.2 | 2829.3 | 58.13953488 |
glm-edge-1.5b-chat | INT8-CW | 1024 | 121.6 | 17.3 | 2533 | 57.80346821 |
whisper-large-v3 | INT4-MIXED | 32 | 187.9 | 17.3 | 2997.6 | 57.80346821 |
gemma-3-1b-it | FP4-NORMALIZED | 32 | 22.1 | 17.7 | 2613.4 | 56.49717514 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 41.4 | 17.9 | 2822.7 | 55.86592179 |
stablelm-3b-4e1t | INT4-MIXED | 1024 | 163 | 17.9 | 2999 | 55.86592179 |
whisper-large-v3 | INT4-MIXED | 1024 | 245 | 17.9 | 3027.1 | 55.86592179 |
gemma-3-1b-it | FP4-NORMALIZED | 1024 | 70.8 | 18 | 2722.8 | 55.55555556 |
stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 163.3 | 18 | 3096.8 | 55.55555556 |
phi-2 | INT4-MIXED | 1024 | 170.4 | 18.1 | 3010.8 | 55.24861878 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 25.4 | 18.3 | 2632.2 | 54.64480874 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 33 | 18.4 | 2593 | 54.34782609 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 27.3 | 18.8 | 2686.8 | 53.19148936 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 29.5 | 18.8 | 2586.7 | 53.19148936 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 27.6 | 19 | 2828.6 | 52.63157895 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 124.5 | 19.1 | 2683.7 | 52.35602094 |
phi-2 | INT4-MIXED | 1024 | 217 | 19.2 | 3241.9 | 52.08333333 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 126.2 | 19.3 | 2896.1 | 51.8134715 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 40.1 | 19.3 | 2859.2 | 51.8134715 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 41.7 | 19.4 | 2943.9 | 51.54639175 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 124.6 | 19.5 | 2711.6 | 51.28205128 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 128.2 | 19.9 | 2929.4 | 50.25125628 |
stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 186.3 | 19.9 | 3302.6 | 50.25125628 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 147.6 | 20 | 3062.1 | 50 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 216.2 | 20 | 2867 | 50 |
phi-3-mini-128k-instruct | INT4-MIXED | 32 | 29 | 20.1 | 3060.5 | 49.75124378 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 29.2 | 20.1 | 2957.9 | 49.75124378 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 29.5 | 20.1 | 2953.5 | 49.75124378 |
gemma-3-1b-it | FP16 | 32 | 24.8 | 20.4 | 2912.9 | 49.01960784 |
whisper-large-v3 | INT8-CW | 32 | 188.3 | 20.4 | 3603.5 | 49.01960784 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 29.2 | 20.7 | 3033.6 | 48.30917874 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 29.7 | 20.7 | 3037.1 | 48.30917874 |
qwen3-30b-a3b | INT4-MIXED | 32 | 147.7 | 20.7 | 16556.5 | 48.30917874 |
whisper-large-v3 | INT8-CW | 1024 | 245 | 20.7 | 3637.6 | 48.30917874 |
qwen3-30b-a3b | INT4-MIXED | 32 | 148.1 | 20.8 | 16457.2 | 48.07692308 |
gemma-3-1b-it | FP16 | 1024 | 73.5 | 20.9 | 3062.3 | 47.84688995 |
phi-3-mini-128k-instruct | INT4-MIXED | 32 | 30.7 | 21.1 | 3214.2 | 47.39336493 |
stablelm-3b-4e1t | INT4-MIXED | 1024 | 182.5 | 21.1 | 3436.5 | 47.39336493 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 30.9 | 22 | 3367.2 | 45.45454545 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 33 | 22.2 | 3370.1 | 45.04504505 |
phi-3-mini-128k-instruct | INT4-MIXED | 1024 | 190.3 | 22.4 | 3666.8 | 44.64285714 |
qwen3-30b-a3b | INT4-MIXED | 1024 | 553 | 22.4 | 16809.1 | 44.64285714 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 190.8 | 22.5 | 3564.4 | 44.44444444 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 189.8 | 22.5 | 3558.3 | 44.44444444 |
qwen3-30b-a3b | INT4-MIXED | 1024 | 554.7 | 22.5 | 16688.5 | 44.44444444 |
internvl2-4b | INT4-MIXED | 297 | 124.2 | 22.9 | 4369.1 | 43.66812227 |
qwen3-4b | INT4-MIXED | 32 | 36.8 | 22.9 | 3065.8 | 43.66812227 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 196.2 | 23.1 | 3613.7 | 43.29004329 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 195.4 | 23.1 | 3600.9 | 43.29004329 |
internvl2-4b | INT4-MIXED | 297 | 125.8 | 23.2 | 4420.1 | 43.10344828 |
phi-4-mini-instruct | INT4-MIXED | 32 | 41 | 23.2 | 3159.8 | 43.10344828 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 41.1 | 23.2 | 3063.7 | 43.10344828 |
llama-3.2-1b-instruct | FP16 | 32 | 25.3 | 23.4 | 3257.7 | 42.73504274 |
phi-3-mini-128k-instruct | INT4-MIXED | 1024 | 252.1 | 23.5 | 3736.6 | 42.55319149 |
qwen3-4b | INT4-MIXED | 32 | 36.8 | 23.5 | 3237.2 | 42.55319149 |
llama-3.2-1b-instruct | FP16 | 1024 | 78.2 | 23.8 | 3416.3 | 42.01680672 |
phi-4-mini-instruct | INT4-MIXED | 32 | 41.1 | 23.8 | 3231.5 | 42.01680672 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 41.4 | 23.8 | 3143.8 | 42.01680672 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 42.7 | 24.1 | 3298 | 41.49377593 |
afm-4.5b | INT4-MIXED | 32 | 32.1 | 24.2 | 3702.7 | 41.32231405 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 271.3 | 24.3 | 3802 | 41.15226337 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 191.9 | 24.3 | 3473.7 | 41.15226337 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 193.1 | 24.3 | 3376.6 | 41.15226337 |
qwen3-4b | INT4-MIXED | 1024 | 183.6 | 24.4 | 3402.5 | 40.98360656 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 291.4 | 24.5 | 3819.3 | 40.81632653 |
phi-4-mini-instruct | INT4-MIXED | 32 | 45.5 | 24.9 | 3450.1 | 40.16064257 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 196.8 | 24.9 | 3526.9 | 40.16064257 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 197.5 | 24.9 | 3440.4 | 40.16064257 |
afm-4.5b | INT4-MIXED | 1024 | 200 | 25 | 3803.5 | 40 |
gemma-3-4b-it | INT4-MIXED | 32 | 50.8 | 25 | 4448.2 | 40 |
phi-3.5-vision-instruct | INT4-MIXED | 802 | 270.2 | 25 | 5425.9 | 40 |
qwen3-4b | INT4-MIXED | 1024 | 186.9 | 25.1 | 3554 | 39.84063745 |
glm-edge-4b-chat | INT4-MIXED | 32 | 58.9 | 25.2 | 3470.7 | 39.68253968 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 217.7 | 25.2 | 3551.2 | 39.68253968 |
internvl2-4b | INT4-MIXED | 1027 | 319 | 25.5 | 5736.7 | 39.21568627 |
gemma-3-4b-it | INT4-MIXED | 32 | 52.3 | 25.7 | 4631.8 | 38.91050584 |
phi-3.5-vision-instruct | INT4-MIXED | 1032 | 331.7 | 25.7 | 5981.5 | 38.91050584 |
internvl2-4b | INT4-MIXED | 1027 | 333.2 | 25.8 | 5794.7 | 38.75968992 |
gemma-3-4b-it | INT4-MIXED | 32 | 51.5 | 25.9 | 4690.2 | 38.61003861 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 253.5 | 26 | 3653.5 | 38.46153846 |
gemma-3-4b-it | INT4-MIXED | 1024 | 246.5 | 26.5 | 6784.8 | 37.73584906 |
glm-edge-4b-chat | INT4-MIXED | 1024 | 318 | 26.6 | 3781.7 | 37.59398496 |
gpt-oss-20b | INT4-MIXED | 32 | 124.9 | 26.6 | 13007.7 | 37.59398496 |
gemma-3-4b-it | INT4-MIXED | 1024 | 253.5 | 27.1 | 6959.3 | 36.900369 |
phi-2 | INT8-CW | 32 | 40.6 | 27.3 | 3773.7 | 36.63003663 |
gemma-3-4b-it | INT4-MIXED | 1024 | 269.3 | 27.4 | 7015 | 36.49635036 |
gpt-oss-20b | INT4-MIXED | 1024 | 401.4 | 27.5 | 13243.6 | 36.36363636 |
gpt-oss-20b | INT4-MIXED | 32 | 130.6 | 27.6 | 12064 | 36.23188406 |
stable-zephyr-3b-dpo | INT8-CW | 32 | 40.3 | 27.6 | 3802.1 | 36.23188406 |
stablelm-3b-4e1t | INT8-CW | 32 | 39.6 | 27.6 | 3703.5 | 36.23188406 |
deepseek-r1-distill-qwen-1.5b | FP4-NORMALIZED | 32 | 30.6 | 27.7 | 3850 | 36.10108303 |
qwen2.5-1.5b-instruct | FP4-NORMALIZED | 32 | 31 | 27.7 | 3531.7 | 36.10108303 |
deepseek-r1-distill-qwen-1.5b | FP4-NORMALIZED | 1024 | 89.5 | 28.1 | 3927 | 35.58718861 |
qwen2.5-1.5b-instruct | FP4-NORMALIZED | 1024 | 90 | 28.1 | 3615.4 | 35.58718861 |
gpt-oss-20b | INT4-MIXED | 1024 | 388.2 | 28.6 | 12273.9 | 34.96503497 |
phi-4-multimodal-instruct | INT4-MIXED | 578 | 270 | 28.7 | 5685.6 | 34.84320557 |
glm-edge-1.5b-chat | FP16 | 32 | 36 | 29.1 | 3772.5 | 34.36426117 |
phi-4-multimodal-instruct | INT4-MIXED | 786 | 322.4 | 29.1 | 6599.5 | 34.36426117 |
phi-2 | INT8-CW | 1024 | 183.1 | 29.2 | 4289.2 | 34.24657534 |
stable-zephyr-3b-dpo | INT8-CW | 1024 | 173.9 | 29.4 | 4292.4 | 34.01360544 |
stablelm-3b-4e1t | INT8-CW | 1024 | 173.5 | 29.4 | 4195.2 | 34.01360544 |
minicpm3-4b | INT4-MIXED | 32 | 107.4 | 29.5 | 3501.2 | 33.89830508 |
glm-edge-1.5b-chat | FP16 | 1024 | 128 | 29.7 | 3989.8 | 33.67003367 |
chatglm3-6b | INT4-MIXED | 32 | 34.4 | 29.9 | 4072.9 | 33.44481605 |
deepseek-r1-distill-qwen-1.5b | FP16 | 32 | 32.2 | 29.9 | 4315.2 | 33.44481605 |
flan-t5-xxl | INT4-MIXED | 33 | 41.3 | 29.9 | 12873.3 | 33.44481605 |
qwen2.5-1.5b-instruct | FP16 | 32 | 35.5 | 29.9 | 3759.1 | 33.44481605 |
phi-4-multimodal-instruct | INT4-MIXED | 1362 | 654.6 | 30 | 8108.2 | 33.33333333 |
deepseek-r1-distill-qwen-1.5b | FP16 | 1024 | 92.4 | 30.3 | 4450.4 | 33.00330033 |
phi-4-multimodal-instruct | INT4-MIXED | 1570 | 728.1 | 30.3 | 9127.2 | 33.00330033 |
qwen2.5-1.5b-instruct | FP16 | 1024 | 92 | 30.3 | 3889.5 | 33.00330033 |
whisper-large-v3 | FP16 | 1024 | 237.2 | 30.3 | 4883.3 | 33.00330033 |
whisper-large-v3 | FP16 | 32 | 181.4 | 30.4 | 4856 | 32.89473684 |
minicpm3-4b | INT4-MIXED | 32 | 109.4 | 30.5 | 3687.3 | 32.78688525 |
qwen2.5-coder-3b-instruct | INT8-CW | 32 | 36.6 | 30.8 | 3924.5 | 32.46753247 |
chatglm3-6b | INT4-MIXED | 1024 | 227.7 | 30.9 | 4254.3 | 32.36245955 |
minicpm3-4b | INT4-MIXED | 32 | 113.1 | 31 | 3810.8 | 32.25806452 |
chatglm3-6b | INT4-MIXED | 32 | 37.3 | 31.2 | 4421.1 | 32.05128205 |
qwen2.5-coder-3b-instruct | INT8-CW | 1024 | 143.3 | 31.6 | 4039 | 31.64556962 |
chatglm3-6b | INT4-MIXED | 1024 | 269.8 | 32.2 | 4519.8 | 31.05590062 |
llama-3.2-3b-instruct | INT8-CW | 32 | 36.4 | 32.5 | 4080.6 | 30.76923077 |
llama-2-7b-chat-hf | INT4-MIXED | 32 | 37.9 | 33.1 | 4451.3 | 30.21148036 |
qwen3-vl-4b-thinking | INT4-MIXED | 4909 | 4168.5 | 33.5 | 13927 | 29.85074627 |
qwen3-vl-4b-thinking | INT4-MIXED | 4939 | 4080.5 | 33.5 | 17375.5 | 29.85074627 |
llama-3.2-3b-instruct | INT8-CW | 1024 | 141.1 | 33.6 | 4306.7 | 29.76190476 |
phi-4-multimodal-instruct | INT4-MIXED | 578 | 313.9 | 33.9 | 6945.6 | 29.49852507 |
llama-2-7b-chat-hf | INT4-MIXED | 32 | 40.6 | 34.1 | 4709 | 29.3255132 |
phi-4-multimodal-instruct | INT4-MIXED | 786 | 379.9 | 34.1 | 6802.1 | 29.3255132 |
qwen3-vl-4b-thinking | INT4-MIXED | 4909 | 4194.1 | 34.1 | 14034 | 29.3255132 |
qwen3-vl-4b-thinking | INT4-MIXED | 4939 | 4070.9 | 34.2 | 17418.8 | 29.23976608 |
qwen3-vl-4b-thinking | INT4-MIXED | 4909 | 4329.1 | 34.4 | 14213.5 | 29.06976744 |
qwen3-vl-4b-thinking | INT4-MIXED | 4939 | 4215.9 | 34.5 | 17392.9 | 28.98550725 |
stable-diffusion-xl-1.0-inpainting-0.1 | INT8-CW | 32 | 35.1 | 34.6 | 6692.8 | 28.9017341 |
minicpm3-4b | INT4-MIXED | 1024 | 474.2 | 34.7 | 4493.6 | 28.8184438 |
biomistral-7b-slerp | INT4-MIXED | 7 | 37 | 34.8 | 4474.5 | 28.73563218 |
ltx-video | INT4-MIXED | 11 | 35.6 | 34.8 | 6461.5 | 28.73563218 |
falcon-7b-instruct | INT4-MIXED | 32 | 40.5 | 34.9 | 4289.3 | 28.65329513 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 40.9 | 34.9 | 4486.1 | 28.65329513 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 40.8 | 34.9 | 4493 | 28.65329513 |
phi-4-multimodal-instruct | INT4-MIXED | 1362 | 751.1 | 35.2 | 8514.6 | 28.40909091 |
stable-diffusion-xl-1.0-inpainting-0.1 | INT8-CW | 32 | 35.8 | 35.4 | 6838.1 | 28.24858757 |
phi-4-multimodal-instruct | INT4-MIXED | 1570 | 838.2 | 35.5 | 9375.3 | 28.16901408 |
minicpm3-4b | INT4-MIXED | 1024 | 484.5 | 35.7 | 4611.2 | 28.01120448 |
llama-2-7b-chat-hf | INT4-MIXED | 1024 | 249.9 | 35.8 | 5060.5 | 27.93296089 |
falcon-7b-instruct | INT4-MIXED | 1024 | 276.7 | 35.9 | 4436.8 | 27.8551532 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 44.1 | 36 | 4662 | 27.77777778 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 43.8 | 36 | 4663.2 | 27.77777778 |
minicpm3-4b | INT4-MIXED | 1024 | 517.5 | 36.1 | 4680.7 | 27.70083102 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 261.5 | 36.1 | 4789.7 | 27.70083102 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 262.6 | 36.2 | 4797.9 | 27.62430939 |
biomistral-7b-slerp | INT4-MIXED | 7 | 38.8 | 36.4 | 4864.7 | 27.47252747 |
mistral-7b-instruct-v0.1 | INT4-MIXED | 32 | 44.9 | 36.4 | 4782.3 | 27.47252747 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 42.4 | 36.5 | 5028.1 | 27.39726027 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 44.8 | 36.5 | 4780.7 | 27.39726027 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 44.9 | 36.5 | 4875 | 27.39726027 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 42.6 | 36.5 | 5023.5 | 27.39726027 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 32 | 42.8 | 36.5 | 5027.1 | 27.39726027 |
llama-2-7b-chat-hf | INT4-MIXED | 1024 | 259.1 | 36.8 | 5254.1 | 27.17391304 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 257.1 | 37.3 | 5232.8 | 26.80965147 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 272 | 37.3 | 4877.9 | 26.80965147 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 271.6 | 37.3 | 4885.3 | 26.80965147 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 257.9 | 37.4 | 5227.9 | 26.73796791 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 1024 | 258.2 | 37.4 | 5231.1 | 26.73796791 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 44.8 | 37.5 | 5267.4 | 26.66666667 |
qwen2-7b-instruct | INT4-MIXED | 32 | 44.7 | 37.5 | 5184 | 26.66666667 |
minicpm-v-2_6 | INT4-MIXED | 228 | 326.3 | 37.6 | 6198.1 | 26.59574468 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 44.9 | 37.6 | 5202 | 26.59574468 |
minicpm-o-2_6 | INT4-MIXED | 238 | 331.7 | 37.7 | 6299.7 | 26.52519894 |
mistral-7b-instruct-v0.1 | INT4-MIXED | 1025 | 348.9 | 37.7 | 4944.1 | 26.52519894 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 319.2 | 37.8 | 4954.6 | 26.45502646 |
qwen2-7b-instruct | INT4-MIXED | 32 | 46 | 37.9 | 5318.8 | 26.38522427 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 319.3 | 38 | 5051 | 26.31578947 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 46 | 38 | 5315.1 | 26.31578947 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 32 | 46.2 | 38 | 5415 | 26.31578947 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 46 | 38.1 | 5426.5 | 26.24671916 |
falcon-7b-instruct | INT4-MIXED | 32 | 47.1 | 38.1 | 4972.9 | 26.24671916 |
phi-3-mini-4k-instruct | INT8-CW | 32 | 43.4 | 38.1 | 4782.8 | 26.24671916 |
phi-3.5-mini-instruct | INT8-CW | 32 | 43.1 | 38.2 | 4685.1 | 26.17801047 |
ltx-video | INT8-CW | 11 | 38.7 | 38.3 | 9352.3 | 26.10966057 |
phi-3-mini-128k-instruct | INT8-CW | 32 | 43.2 | 38.3 | 4774.2 | 26.10966057 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 44.3 | 38.4 | 5346.5 | 26.04166667 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 263.8 | 38.4 | 5405.5 | 26.04166667 |
llama-3-8b-instruct | INT4-MIXED | 32 | 44.3 | 38.4 | 5260.9 | 26.04166667 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 44.1 | 38.4 | 5260.9 | 26.04166667 |
qwen2-7b-instruct | INT4-MIXED | 1024 | 266 | 38.4 | 5323.5 | 26.04166667 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 266.4 | 38.5 | 5339 | 25.97402597 |
minicpm-v-2_6 | INT4-MIXED | 228 | 331.5 | 38.6 | 6353.7 | 25.90673575 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 319.3 | 38.9 | 5493.6 | 25.70694087 |
qwen2-7b-instruct | INT4-MIXED | 1024 | 322.4 | 38.9 | 5402.6 | 25.70694087 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 325 | 39 | 5406.9 | 25.64102564 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 1024 | 323.5 | 39 | 5498.1 | 25.64102564 |
falcon-7b-instruct | INT4-MIXED | 1024 | 349 | 39.1 | 4894.4 | 25.57544757 |
minicpm-v-2_6 | INT4-MIXED | 228 | 339.8 | 39.1 | 6582.6 | 25.57544757 |
minicpm4-8b | INT4-MIXED | 32 | 48.3 | 39.1 | 5069 | 25.57544757 |
qwen3-4b | INT8-CW | 32 | 45.1 | 39.1 | 4893.9 | 25.57544757 |
lcm-dreamshaper-v7 | INT8-HYBRID | 32 | 41.4 | 39.2 | 3706.3 | 25.51020408 |
minicpm-o-2_6 | INT4-MIXED | 238 | 347.3 | 39.2 | 6570.2 | 25.51020408 |
phi-4-mini-reasoning | INT8-CW | 32 | 44.6 | 39.2 | 4704.2 | 25.51020408 |
lcm-dreamshaper-v7 | INT8-HYBRID | 1024 | 40.7 | 39.3 | 3439.2 | 25.44529262 |
phi-4-mini-instruct | INT8-CW | 32 | 44.9 | 39.3 | 4707.5 | 25.44529262 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 47.5 | 39.5 | 5420.7 | 25.3164557 |
llama-3-8b-instruct | INT4-MIXED | 32 | 47.2 | 39.5 | 5427.9 | 25.3164557 |
llama-3-8b-instruct | INT4-MIXED | 32 | 47.3 | 39.5 | 5429.2 | 25.3164557 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 47.4 | 39.5 | 5427.2 | 25.3164557 |
qwen3-8b | INT4-MIXED | 32 | 45.9 | 39.5 | 5449.1 | 25.3164557 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 265.1 | 39.6 | 5652 | 25.25252525 |
gemma-3-4b-it | INT8-CW | 32 | 54.8 | 39.6 | 6046.7 | 25.25252525 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 264.9 | 39.6 | 5564.9 | 25.25252525 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 264.9 | 39.7 | 5565.6 | 25.18891688 |
llama-3-8b-instruct | INT4-MIXED | 32 | 48.3 | 40 | 5560.4 | 25 |
minicpm4-8b | INT4-MIXED | 1024 | 292.6 | 40 | 5233.2 | 25 |
internvl2-4b | INT8-CW | 297 | 129.7 | 40.1 | 5828.4 | 24.93765586 |
minicpm4-8b | INT4-MIXED | 32 | 51.4 | 40.3 | 5349.5 | 24.81389578 |
phi-4-mini-reasoning | INT8-CW | 1024 | 211.7 | 40.3 | 4965.1 | 24.81389578 |
phi-4-mini-instruct | INT8-CW | 1024 | 210.6 | 40.4 | 4972.9 | 24.75247525 |
phi-3-mini-128k-instruct | INT8-CW | 1024 | 244.4 | 40.6 | 5262 | 24.63054187 |
phi-3-mini-4k-instruct | INT8-CW | 1024 | 244.3 | 40.6 | 5263.6 | 24.63054187 |
phi-3.5-mini-instruct | INT8-CW | 1024 | 243.8 | 40.6 | 5164.6 | 24.63054187 |
qwen3-4b | INT8-CW | 1024 | 201.3 | 40.6 | 5172.7 | 24.63054187 |
qwen3-8b | INT4-MIXED | 32 | 48.8 | 40.6 | 5687.1 | 24.63054187 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 276.5 | 40.7 | 5675.4 | 24.57002457 |
minicpm-v-4_5 | INT4-MIXED | 217 | 336.3 | 40.7 | 6746.5 | 24.57002457 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 279.1 | 40.8 | 5671.3 | 24.50980392 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 275.5 | 40.8 | 5674.5 | 24.50980392 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 274.9 | 40.8 | 5675.1 | 24.50980392 |
minicpm4-8b | INT4-MIXED | 32 | 52.5 | 40.8 | 5485.8 | 24.50980392 |
qwen3-8b | INT4-MIXED | 1024 | 279.6 | 40.9 | 5758 | 24.44987775 |
afm-4.5b | INT8-CW | 32 | 45.9 | 41.1 | 5320 | 24.33090024 |
gemma-3-4b-it | INT8-CW | 1024 | 261.3 | 41.1 | 8383.7 | 24.33090024 |
qwen3-8b | INT4-MIXED | 32 | 49.7 | 41.1 | 5830.2 | 24.33090024 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 322.2 | 41.3 | 5734 | 24.21307506 |
minicpm4-8b | INT4-MIXED | 1024 | 309.2 | 41.3 | 5421.1 | 24.21307506 |
minicpm4-8b | INT4-MIXED | 1024 | 351.9 | 41.8 | 5537.2 | 23.92344498 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 32 | 109.5 | 41.8 | 6170.8 | 23.92344498 |
phi-3.5-vision-instruct | INT8-CW | 802 | 267.3 | 41.9 | 6844.5 | 23.86634845 |
afm-4.5b | INT8-CW | 1024 | 192.3 | 42 | 5467.2 | 23.80952381 |
qwen3-8b | INT4-MIXED | 1024 | 288.5 | 42.1 | 5963.4 | 23.75296912 |
minicpm-v-4_5 | INT4-MIXED | 217 | 350 | 42.5 | 7096.6 | 23.52941176 |
internvl2-4b | INT8-CW | 1027 | 335.1 | 42.7 | 6893 | 23.41920375 |
qwen3-8b | INT4-MIXED | 1024 | 332.8 | 42.7 | 6016.9 | 23.41920375 |
phi-3.5-vision-instruct | INT8-CW | 1032 | 323.6 | 42.8 | 7183.6 | 23.36448598 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 1024 | 388.2 | 42.9 | 7458.9 | 23.31002331 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 32 | 112.8 | 43.6 | 6350.4 | 22.93577982 |
glm-edge-4b-chat | INT8-CW | 32 | 61.5 | 44 | 5182.5 | 22.72727273 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 1024 | 397.2 | 44 | 7580.2 | 22.72727273 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 1024 | 451.4 | 44.5 | 7653.3 | 22.47191011 |
gemma-7b-it | INT4-MIXED | 32 | 51.9 | 44.6 | 5585.2 | 22.42152466 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 32 | 114 | 44.7 | 6467.7 | 22.37136465 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 51.1 | 45 | 5958.7 | 22.22222222 |
glm-edge-4b-chat | INT8-CW | 1024 | 308.7 | 45.2 | 5437.9 | 22.12389381 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 54.1 | 46.1 | 6282.4 | 21.69197397 |
glm-4-9b-chat-hf | INT4-MIXED | 1024 | 328.4 | 46.2 | 6147.9 | 21.64502165 |
gemma-7b-it | INT4-MIXED | 32 | 55.1 | 46.3 | 5930.1 | 21.59827214 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 55.3 | 46.7 | 6433.4 | 21.41327623 |
gemma-7b-it | INT4-MIXED | 1024 | 323.7 | 46.8 | 6157.6 | 21.36752137 |
phi-4-multimodal-instruct | INT8-CW | 578 | 287.5 | 47.1 | 7480 | 21.23142251 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 60.4 | 47.4 | 6392.7 | 21.09704641 |
phi-4-multimodal-instruct | INT8-CW | 786 | 351.7 | 47.4 | 8414.2 | 21.09704641 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 62 | 47.5 | 6495.6 | 21.05263158 |
glm-4-9b-chat-hf | INT4-MIXED | 1024 | 345.5 | 47.5 | 6387.3 | 21.05263158 |
lcm-dreamshaper-v7 | INT8-CW | 1024 | 48.7 | 47.5 | 3673 | 21.05263158 |
lcm-dreamshaper-v7 | INT8-CW | 32 | 48.6 | 47.5 | 5312.1 | 21.05263158 |
minicpm3-4b | INT8-CW | 32 | 117.8 | 47.6 | 5462.8 | 21.00840336 |
lcm-dreamshaper-v7 | INT8-CW | 1024 | 49.1 | 47.8 | 3620.2 | 20.92050209 |
lcm-dreamshaper-v7 | INT8-CW | 32 | 49 | 47.9 | 4465.1 | 20.87682672 |
glm-4-9b-chat-hf | INT4-MIXED | 1024 | 396.1 | 48 | 6482.5 | 20.83333333 |
phi-4-multimodal-instruct | INT8-CW | 1362 | 694.2 | 48.3 | 9880.1 | 20.70393375 |
gemma-7b-it | INT4-MIXED | 1024 | 388.2 | 48.6 | 6442.1 | 20.57613169 |
phi-4-multimodal-instruct | INT8-CW | 1570 | 767.3 | 48.6 | 10504.4 | 20.57613169 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 367.5 | 48.7 | 6513 | 20.5338809 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 369.5 | 48.8 | 6609.4 | 20.49180328 |
lcm-dreamshaper-v7 | FP16 | 32 | 50 | 49.2 | 4330.4 | 20.32520325 |
lcm-dreamshaper-v7 | FP16 | 1024 | 50 | 49.3 | 4019.3 | 20.28397566 |
qwen3-vl-4b-thinking | INT8-CW | 4909 | 4313.6 | 49.4 | 15766.1 | 20.24291498 |
qwen3-vl-4b-thinking | INT8-CW | 4939 | 4201.1 | 49.6 | 18495.2 | 20.16129032 |
llava-next-video-7b-hf | INT4-MIXED | 2945 | 1895.1 | 50 | 10084.3 | 20 |
gemma-2-9b-it | INT4-MIXED | 32 | 58.3 | 50.1 | 6008.3 | 19.96007984 |
phi-2 | FP16 | 32 | 54.5 | 51 | 6524.6 | 19.60784314 |
gemma-2-9b-it | INT4-MIXED | 32 | 61.5 | 51.5 | 6236.1 | 19.41747573 |
stable-zephyr-3b-dpo | FP16 | 32 | 55.3 | 51.5 | 6550.7 | 19.41747573 |
gemma-2-9b-it | INT4-MIXED | 32 | 63 | 52.1 | 6404.7 | 19.19385797 |
gemma-2-9b-it | INT4-MIXED | 1024 | 385.3 | 52.4 | 6502.5 | 19.08396947 |
minicpm3-4b | INT8-CW | 1024 | 505.7 | 52.7 | 6396.2 | 18.97533207 |
flan-t5-xxl | INT8-CW | 33 | 66.5 | 53.1 | 22610.6 | 18.83239171 |
stable-diffusion-xl-1.0-inpainting-0.1 | FP16 | 32 | 53.9 | 53.4 | 9291.6 | 18.72659176 |
gemma-2-9b-it | INT4-MIXED | 1024 | 394.4 | 53.9 | 6735.1 | 18.5528757 |
phi-2 | FP16 | 1024 | 225.1 | 54 | 7164.3 | 18.51851852 |
gemma-2-9b-it | INT4-MIXED | 1024 | 462.9 | 54.5 | 6823.8 | 18.34862385 |
stable-zephyr-3b-dpo | FP16 | 1024 | 215.6 | 54.5 | 7201.8 | 18.34862385 |
chatglm3-6b | INT8-CW | 32 | 60 | 55.3 | 6896.8 | 18.08318264 |
ltx-video | FP16 | 11 | 55.9 | 55.4 | 15108.8 | 18.05054152 |
chatglm3-6b | INT8-CW | 1024 | 265.9 | 56.2 | 7061.8 | 17.79359431 |
llama-3.2-3b-instruct | FP4-NORMALIZED | 32 | 59.2 | 56.3 | 6748.8 | 17.76198934 |
llama-3.2-3b-instruct | FP4-NORMALIZED | 1024 | 198.1 | 57.2 | 6975.7 | 17.48251748 |
qwen2.5-coder-3b-instruct | FP16 | 32 | 61.3 | 57.6 | 6827 | 17.36111111 |
qwen2.5-coder-3b-instruct | FP16 | 1024 | 183.6 | 58.1 | 6986.4 | 17.21170396 |
llama-3.2-3b-instruct | FP16 | 32 | 62.8 | 60 | 7154.3 | 16.66666667 |
llama-3.2-3b-instruct | FP16 | 1024 | 202 | 61.5 | 7442.8 | 16.2601626 |
llama-2-13b-chat-hf | INT4-MIXED | 32 | 69.6 | 61.8 | 7747.8 | 16.18122977 |
llama-2-7b-chat-hf | INT8-CW | 32 | 67.7 | 61.9 | 7640.5 | 16.15508885 |
gemma-3-12b-it | INT4-MIXED | 32 | 87 | 64 | 9044.4 | 15.625 |
falcon-7b-instruct | INT8-CW | 32 | 70.7 | 64.5 | 7541.9 | 15.50387597 |
llama-2-7b-chat-hf | INT8-CW | 1024 | 286.7 | 64.8 | 8217.8 | 15.43209877 |
falcon-7b-instruct | INT8-CW | 1024 | 407.6 | 65.5 | 7677.6 | 15.26717557 |
biomistral-7b-slerp | INT8-CW | 7 | 68.2 | 65.7 | 7809.9 | 15.22070015 |
mistral-7b-instruct-v0.2 | INT8-CW | 32 | 72.9 | 65.9 | 7821.3 | 15.17450683 |
mistral-7b-instruct-v0.3 | INT8-CW | 32 | 72.3 | 65.9 | 7913 | 15.17450683 |
llama-2-13b-chat-hf | INT4-MIXED | 1024 | 482.8 | 66 | 8665 | 15.15151515 |
mistral-7b-instruct-v0.1 | INT8-CW | 32 | 72.4 | 66 | 7820.7 | 15.15151515 |
phi-4-mini-reasoning | FP4-NORMALIZED | 32 | 71.6 | 66.2 | 7657.3 | 15.10574018 |
deepseek-r1-distill-qwen-7b | INT8-CW | 32 | 72.7 | 66.5 | 8231.8 | 15.03759398 |
phi-4-mini-instruct | FP4-NORMALIZED | 32 | 71.3 | 66.5 | 7760.4 | 15.03759398 |
qwen2-7b-instruct | INT8-CW | 32 | 72.5 | 66.6 | 8145.1 | 15.01501502 |
qwen2.5-7b-instruct | INT8-CW | 32 | 72.8 | 66.6 | 8143.3 | 15.01501502 |
qwen2.5-7b-instruct-1m | INT8-CW | 32 | 73.1 | 66.6 | 8234.3 | 15.01501502 |
gemma-3-12b-it | INT4-MIXED | 32 | 86.5 | 66.8 | 9363.2 | 14.97005988 |
deepseek-r1-distill-qwen-7b | INT8-CW | 1024 | 298.9 | 67.3 | 8434.2 | 14.85884101 |
mistral-7b-instruct-v0.1 | INT8-CW | 1025 | 336.1 | 67.3 | 8106.1 | 14.85884101 |
mistral-7b-instruct-v0.2 | INT8-CW | 1024 | 306 | 67.3 | 8115.6 | 14.85884101 |
mistral-7b-instruct-v0.3 | INT8-CW | 1024 | 305.1 | 67.3 | 8211.7 | 14.85884101 |
phi-4-mini-reasoning | FP4-NORMALIZED | 1024 | 273.8 | 67.3 | 7918 | 14.85884101 |
qwen2-7b-instruct | INT8-CW | 1024 | 300.8 | 67.4 | 8338.2 | 14.83679525 |
qwen2.5-7b-instruct | INT8-CW | 1024 | 300.9 | 67.4 | 8342 | 14.83679525 |
qwen2.5-7b-instruct-1m | INT8-CW | 1024 | 302.4 | 67.4 | 8435.1 | 14.83679525 |
minicpm-v-2_6 | INT8-CW | 228 | 344.6 | 67.5 | 9379.3 | 14.81481481 |
gemma-3-4b-it | FP4-NORMALIZED | 32 | 74.8 | 67.6 | 9047.3 | 14.79289941 |
phi-4-mini-instruct | FP4-NORMALIZED | 1024 | 272.8 | 67.6 | 8012.8 | 14.79289941 |
minicpm-o-2_6 | INT8-CW | 238 | 353.7 | 67.7 | 9391.9 | 14.77104874 |
gemma-3-12b-it | INT4-MIXED | 1024 | 589.2 | 67.8 | 11696.5 | 14.74926254 |
phi-4 | INT4-MIXED | 32 | 79.1 | 68.6 | 8458.7 | 14.57725948 |
gemma-3-4b-it | FP4-NORMALIZED | 1024 | 314.3 | 69 | 11387 | 14.49275362 |
phi-4-reasoning | INT4-MIXED | 32 | 78.9 | 69 | 8458.2 | 14.49275362 |
llama-3-8b-instruct | INT8-CW | 32 | 76.2 | 69.4 | 8595 | 14.4092219 |
llama-3.1-8b-instruct | INT8-CW | 32 | 76.4 | 69.4 | 8594.5 | 14.4092219 |
deepseek-r1-distill-llama-8b | INT8-CW | 32 | 76.7 | 69.5 | 8696.7 | 14.38848921 |
phi-3-mini-128k-instruct | FP16 | 32 | 73.6 | 69.5 | 8556.8 | 14.38848921 |
phi-3-mini-4k-instruct | FP16 | 32 | 72.9 | 69.5 | 8544.3 | 14.38848921 |
phi-3.5-mini-instruct | FP16 | 32 | 72.9 | 69.5 | 8467.2 | 14.38848921 |
qwen1.5-14b-chat | INT4-MIXED | 32 | 80 | 69.8 | 9218.9 | 14.32664756 |
deepseek-r1-distill-qwen-14b | INT4-MIXED | 32 | 79.2 | 70 | 8788.4 | 14.28571429 |
internvl2-4b | FP4-NORMALIZED | 297 | 167.4 | 70.4 | 9063.7 | 14.20454545 |
phi-4 | INT4-MIXED | 1024 | 549 | 70.5 | 8931.9 | 14.18439716 |
qwen3-8b | INT8-CW | 32 | 77.8 | 70.5 | 8869.1 | 14.18439716 |
gemma-3-12b-it | INT4-MIXED | 1024 | 656.4 | 70.6 | 12009.2 | 14.16430595 |
phi-4-reasoning | INT4-MIXED | 1024 | 541.6 | 70.7 | 8930.9 | 14.14427157 |
phi-4-reasoning | INT4-MIXED | 32 | 83.7 | 70.7 | 8875.2 | 14.14427157 |
deepseek-r1-distill-llama-8b | INT8-CW | 1024 | 309 | 70.8 | 8984.3 | 14.12429379 |
llama-3-8b-instruct | INT8-CW | 1024 | 307.7 | 70.8 | 8892.2 | 14.12429379 |
llama-3.1-8b-instruct | INT8-CW | 1024 | 308.3 | 70.9 | 8891.9 | 14.10437236 |
stable-diffusion-v1-5 | INT8-HYBRID | 1024 | 72.1 | 70.9 | 3253.1 | 14.10437236 |
stable-diffusion-v1-5 | INT8-HYBRID | 32 | 72.2 | 70.9 | 3771.1 | 14.10437236 |
internvl2-4b | FP16 | 297 | 173.8 | 71.4 | 9576.9 | 14.00560224 |
llava-v1.6-mistral-7b-hf | INT8-CW | 2944 | 1595.6 | 71.6 | 10146.1 | 13.96648045 |
minicpm-v-4_5 | INT8-CW | 217 | 357.6 | 71.6 | 10045.4 | 13.96648045 |
phi-4 | INT4-MIXED | 32 | 85.7 | 71.8 | 9104.9 | 13.9275766 |
qwen3-8b | INT8-CW | 1024 | 323 | 71.9 | 9158.6 | 13.90820584 |
deepseek-r1-distill-qwen-14b | INT4-MIXED | 1024 | 518.4 | 72 | 9179.6 | 13.88888889 |
phi-4-mini-instruct | FP16 | 32 | 76.3 | 72 | 8385.9 | 13.88888889 |
phi-4-mini-reasoning | FP16 | 32 | 76.5 | 72 | 8391.2 | 13.88888889 |
qwen2.5-vl-7b-instruct | INT8-CW | 32 | 141.5 | 72.1 | 9374.7 | 13.86962552 |
phi-4-reasoning | INT4-MIXED | 1024 | 572.3 | 72.5 | 9243.9 | 13.79310345 |
deepseek-r1-distill-qwen-14b | INT4-MIXED | 32 | 85.5 | 72.8 | 9437 | 13.73626374 |
internvl2-4b | FP4-NORMALIZED | 1027 | 445.7 | 73 | 9942 | 13.69863014 |
phi-3.5-vision-instruct | FP16 | 802 | 353.1 | 73 | 10393 | 13.69863014 |
phi-3-mini-4k-instruct | FP16 | 1024 | 286.1 | 73.1 | 9309.7 | 13.67989056 |
qwen2.5-vl-7b-instruct | INT8-CW | 1024 | 429.7 | 73.1 | 10679 | 13.67989056 |
phi-3-mini-128k-instruct | FP16 | 1024 | 286.4 | 73.2 | 9319.8 | 13.66120219 |
phi-3.5-mini-instruct | FP16 | 1024 | 286.6 | 73.2 | 9223.8 | 13.66120219 |
gemma-3-4b-it | FP16 | 32 | 80.5 | 73.5 | 10751.3 | 13.60544218 |
phi-4-mini-instruct | FP16 | 1024 | 282 | 73.5 | 8702.3 | 13.60544218 |
phi-4-mini-reasoning | FP16 | 1024 | 282.1 | 73.6 | 8705.3 | 13.58695652 |
phi-3.5-vision-instruct | FP16 | 1032 | 452.4 | 73.7 | 10608.3 | 13.56852103 |
phi-4 | INT4-MIXED | 1024 | 662.1 | 73.9 | 9345.8 | 13.53179973 |
internvl2-4b | FP16 | 1027 | 454.3 | 74 | 10432.5 | 13.51351351 |
qwen1.5-14b-chat | INT4-MIXED | 1024 | 586.1 | 74.1 | 10122.9 | 13.49527665 |
phi-4-reasoning | INT4-MIXED | 32 | 98.2 | 74.7 | 9678.3 | 13.38688086 |
minicpm4-8b | INT8-CW | 32 | 82.3 | 74.9 | 8781.4 | 13.35113485 |
gemma-3-4b-it | FP16 | 1024 | 326.7 | 75 | 13114.3 | 13.33333333 |
qwen3-4b | FP16 | 32 | 79.5 | 75 | 8739.7 | 13.33333333 |
deepseek-r1-distill-qwen-14b | INT4-MIXED | 1024 | 636.5 | 75.1 | 9619.6 | 13.31557923 |
minicpm4-8b | INT8-CW | 1024 | 371.1 | 75.8 | 8939.7 | 13.19261214 |
sd-turbo | INT8-HYBRID | 1024 | 77.1 | 75.9 | 4010.9 | 13.17523057 |
sd-turbo | INT8-HYBRID | 32 | 76.9 | 76 | 4009.9 | 13.15789474 |
stable-diffusion-v2-1 | INT8-HYBRID | 1024 | 77 | 76.2 | 4015.7 | 13.12335958 |
stable-diffusion-v2-1 | INT8-HYBRID | 32 | 77.2 | 76.2 | 4058.2 | 13.12335958 |
phi-4-reasoning | INT4-MIXED | 1024 | 733.4 | 76.7 | 9641.7 | 13.03780965 |
llava-next-video-7b-hf | INT8-CW | 2945 | 1899.1 | 77 | 12181.7 | 12.98701299 |
qwen3-4b | FP16 | 1024 | 267.9 | 77 | 9093 | 12.98701299 |
llama-2-13b-chat-hf | INT4-MIXED | 32 | 94 | 77.3 | 9332.7 | 12.93661061 |
starcoder2-15b | INT4-MIXED | 32 | 91 | 77.8 | 9563.4 | 12.85347044 |
glm-edge-4b-chat | FP16 | 32 | 85.6 | 78.7 | 9338.3 | 12.7064803 |
starcoder2-15b | INT4-MIXED | 1024 | 733.3 | 79.6 | 9486.1 | 12.56281407 |
afm-4.5b | FP16 | 32 | 83.7 | 80.1 | 9812.3 | 12.48439451 |
glm-edge-4b-chat | FP16 | 1024 | 363.1 | 80.3 | 9714 | 12.45330012 |
gemma-7b-it | INT8-CW | 32 | 89.2 | 80.8 | 9375.9 | 12.37623762 |
glm-4-9b-chat-hf | INT8-CW | 32 | 89.2 | 81 | 9974.2 | 12.34567901 |
afm-4.5b | FP16 | 1024 | 268.8 | 81.1 | 10050.4 | 12.33045623 |
llama-2-13b-chat-hf | INT4-MIXED | 1024 | 578.1 | 81.6 | 10212.6 | 12.25490196 |
minicpm3-4b | FP4-NORMALIZED | 32 | 103 | 81.6 | 9069.4 | 12.25490196 |
phi-4-multimodal-instruct | FP16 | 578 | 367.4 | 82.6 | 12458.2 | 12.10653753 |
glm-4-9b-chat-hf | INT8-CW | 1024 | 390.1 | 82.9 | 10134.7 | 12.06272618 |
phi-4-multimodal-instruct | FP16 | 786 | 421 | 82.9 | 13020.7 | 12.06272618 |
gemma-7b-it | INT8-CW | 1024 | 394.5 | 83.1 | 9944.3 | 12.03369434 |
stable-diffusion-v1-5 | INT8-CW | 32 | 84.5 | 83.1 | 5908.8 | 12.03369434 |
stable-diffusion-v1-5 | INT8-CW | 1024 | 84.3 | 83.3 | 3610.4 | 12.00480192 |
minicpm3-4b | FP16 | 32 | 103.9 | 83.5 | 9542 | 11.9760479 |
stable-diffusion-v1-5 | INT8-CW | 1024 | 84.8 | 83.7 | 3775.1 | 11.9474313 |
stable-diffusion-v1-5 | INT8-CW | 32 | 84.9 | 83.8 | 6709.7 | 11.93317422 |
phi-4-multimodal-instruct | FP16 | 1362 | 855 | 84 | 14975.4 | 11.9047619 |
phi-4-multimodal-instruct | FP16 | 1570 | 941.1 | 84.3 | 15668.7 | 11.8623962 |
qwen3-vl-4b-thinking | FP16 | 4909 | 4934.8 | 84.6 | 20141.7 | 11.82033097 |
qwen3-vl-4b-thinking | FP16 | 4939 | 4830.7 | 84.6 | 22849.6 | 11.82033097 |
stable-diffusion-v1-5 | FP16 | 1024 | 85.5 | 84.9 | 4093.6 | 11.77856302 |
stable-diffusion-v1-5 | FP16 | 32 | 85.9 | 85 | 4604.6 | 11.76470588 |
stable-diffusion-v2-1 | INT8-CW | 1024 | 86.6 | 85.7 | 4952.6 | 11.66861144 |
sd-turbo | INT8-CW | 32 | 86.4 | 85.8 | 5353.3 | 11.65501166 |
sd-turbo | INT8-CW | 1024 | 86.4 | 85.9 | 4917.4 | 11.64144354 |
stable-diffusion-v2-1 | INT8-CW | 32 | 86.5 | 85.9 | 5455 | 11.64144354 |
sd-turbo | INT8-CW | 32 | 87 | 86.3 | 5208.7 | 11.58748552 |
stable-diffusion-v2-1 | INT8-CW | 32 | 87.3 | 86.4 | 5642 | 11.57407407 |
minicpm3-4b | FP4-NORMALIZED | 1024 | 562.5 | 86.5 | 9963.3 | 11.56069364 |
sd-turbo | INT8-CW | 1024 | 87.3 | 86.7 | 5007.1 | 11.53402537 |
stable-diffusion-v2-1 | INT8-CW | 1024 | 86.9 | 86.7 | 5251.7 | 11.53402537 |
stable-diffusion-v2-1 | FP16 | 1024 | 87.6 | 87.1 | 5429.2 | 11.48105626 |
stable-diffusion-v2-1 | FP16 | 32 | 87.9 | 87.1 | 5424.5 | 11.48105626 |
sd-turbo | FP16 | 32 | 87.3 | 87.2 | 5470.6 | 11.46788991 |
sd-turbo | FP16 | 1024 | 87.4 | 87.7 | 5476 | 11.40250855 |
gemma-2-9b-it | INT8-CW | 32 | 97.1 | 88.2 | 10079.2 | 11.33786848 |
gemma-2-9b-it | INT8-CW | 1024 | 445.2 | 90.5 | 10539.3 | 11.04972376 |
minicpm3-4b | FP16 | 1024 | 571.4 | 90.7 | 11081.7 | 11.02535832 |
All models listed here were tested with the following parameters:
Framework: PyTorch
Beam: 1
Batch size: 1
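
The last column of the tables can be reproduced from the "2nd latency (ms)" column: in every row listed, the "2nd token per sec" value equals 1000 divided by the 2nd-token latency in milliseconds. The sketch below checks that relation on three rows copied from the table above. The `Row` class and the `parse_row` and `tokens_per_sec` helpers are hypothetical names introduced only for this illustration; they are not part of OpenVINO or of the benchmark tooling.

```python
from dataclasses import dataclass

# Three rows copied verbatim from the table above.
SAMPLE_ROWS = [
    "qwen2.5-vl-7b-instruct | INT4-MIXED | 32 | 109.5 | 41.8 | 6170.8 | 23.92344498 |",
    "afm-4.5b | INT8-CW | 1024 | 192.3 | 42 | 5467.2 | 23.80952381 |",
    "llava-next-video-7b-hf | INT4-MIXED | 2945 | 1895.1 | 50 | 10084.3 | 20 |",
]


@dataclass
class Row:
    """One table row (hypothetical helper type, for illustration only)."""
    topology: str
    precision: str
    input_size: int
    first_latency_ms: float
    second_latency_ms: float
    max_rss: float  # the unit is not stated in the table
    tokens_per_sec: float


def parse_row(line: str) -> Row:
    """Split a pipe-delimited table row into typed fields."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    return Row(
        topology=cells[0],
        precision=cells[1],
        input_size=int(cells[2]),
        first_latency_ms=float(cells[3]),
        second_latency_ms=float(cells[4]),
        max_rss=float(cells[5]),
        tokens_per_sec=float(cells[6]),
    )


def tokens_per_sec(second_latency_ms: float) -> float:
    """Convert a per-token latency in milliseconds to tokens per second."""
    return 1000.0 / second_latency_ms


if __name__ == "__main__":
    for line in SAMPLE_ROWS:
        row = parse_row(line)
        derived = tokens_per_sec(row.second_latency_ms)
        # Compare the derived value with the value published in the table.
        matches = abs(derived - row.tokens_per_sec) < 1e-6
        print(f"{row.topology} ({row.precision}, input {row.input_size}): "
              f"derived {derived:.8f}, table {row.tokens_per_sec}, match={matches}")
```

Because the throughput column is derived from the latency column, sorting by either one gives the same ranking. The 1st-token latency and peak memory vary independently, so they still need to be compared separately.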