Most Efficient Large Language Models for AI PC
This page is regularly updated to help you identify the best-performing LLMs on the Intel® Core™ Ultra processor family and AI PCs. The data is current as of OpenVINO 2026.2 (28 May 2026).
The tables below list the key performance indicators for inference on built-in GPUs.
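Each table row is a pipe-delimited record, so it can be loaded into a typed structure for filtering or sorting. The following is a minimal Python sketch, not part of OpenVINO: the `BenchmarkRow` class and the `parse_row` helper are hypothetical names, and reading a latency of `-1` as "not reported" is an assumption about rows that carry no second-latency figure (the embedding and reranker models, for example).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BenchmarkRow:
    topology: str
    precision: str
    input_size: str  # kept as text: some rows use labels such as "prompt0"
    first_latency_ms: float
    second_latency_ms: Optional[float]  # None when the table shows -1 (assumed "not reported")
    tokens_per_sec: float


def parse_row(line: str) -> BenchmarkRow:
    """Parse one pipe-delimited table row into a BenchmarkRow."""
    # Drop surrounding whitespace and outer pipes, then split into cells.
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    if len(cells) != 6:
        raise ValueError(f"expected 6 cells, got {len(cells)}: {line!r}")
    topology, precision, input_size, first, second, throughput = cells
    second_ms = float(second)
    return BenchmarkRow(
        topology=topology,
        precision=precision,
        input_size=input_size,
        first_latency_ms=float(first),
        second_latency_ms=None if second_ms < 0 else second_ms,
        tokens_per_sec=float(throughput),
    )


row = parse_row("llama-3.2-1b-instruct | INT4-MIXED | 32 | 13 | 8 | 76.92 |")
print(row.topology, row.first_latency_ms, row.tokens_per_sec)
```

Keeping the input size as a string avoids failing on rows whose input is a named prompt rather than a token count.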
Topology | Precision | Input Size | 1st token latency (ms) | 2nd token latency (ms) | 2nd token throughput (tokens/s) |
|---|---|---|---|---|---|
llama-3.2-1b-instruct | INT4-MIXED | 32 | 13 | 8 | 76.92 |
llama-3.2-1b-instruct | INT4-MIXED | 32 | 13.3 | 8 | 75.19 |
minicpm4-0.5b | FP4-NORMALIZED | 32 | 14.1 | 8.7 | 70.92 |
gemma-3-270m | FP16 | 32 | 14.8 | 6.5 | 67.57 |
minicpm4-0.5b | FP16 | 32 | 14.8 | 9.4 | 67.57 |
llama-3.2-1b-instruct | INT8-CW | 32 | 15 | 12.7 | 66.67 |
minicpm4-0.5b | INT4-MIXED | 32 | 15.4 | 3.9 | 64.94 |
minicpm4-0.5b | INT4-MIXED | 32 | 15.4 | 3.9 | 64.94 |
qwen2.5-coder-0.5b-instruct | INT4-MIXED | 32 | 15.8 | 4.4 | 63.29 |
qwen2.5-coder-0.5b-instruct | INT4-MIXED | 32 | 15.9 | 4.5 | 62.89 |
qwen2.5-coder-0.5b-instruct | FP16 | 32 | 16.1 | 10.5 | 62.11 |
minicpm4-0.5b | INT4-MIXED | 32 | 16.4 | 4 | 60.98 |
gemma-3-270m | INT8-CW | 32 | 16.7 | 7.5 | 59.88 |
qwen2.5-coder-0.5b-instruct | INT4-MIXED | 32 | 16.8 | 4.5 | 59.52 |
minicpm4-0.5b | INT8-CW | 32 | 17.3 | 5.5 | 57.80 |
gemma-3-270m | INT4-MIXED | 32 | 17.4 | 5.7 | 57.47 |
qwen2.5-coder-0.5b-instruct | INT8-CW | 32 | 17.8 | 6.1 | 56.18 |
gemma-3-270m | FP16 | 1024 | 18.7 | 6.7 | 53.48 |
gemma-2b-it | INT4-MIXED | 32 | 19.1 | 15 | 52.36 |
gemma-3-270m | INT4-MIXED | 1024 | 19.1 | 4.3 | 52.36 |
qwen3-embedding-0.6b | FP16 | 32 | 19.1 | -1 | 52.36 |
qwen3-embedding-0.6b | INT4-MIXED | 32 | 19.6 | -1 | 51.02 |
gemma-2b-it | INT4-MIXED | 32 | 19.8 | 15.7 | 50.51 |
gemma-3-270m | INT8-CW | 1024 | 19.9 | 4.6 | 50.25 |
qwen2.5-coder-1.5b-instruct | INT4-MIXED | 32 | 19.9 | 9.8 | 50.25 |
qwen2.5-coder-1.5b-instruct | INT4-MIXED | 32 | 19.9 | 10 | 50.25 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 20 | 9.9 | 50.00 |
qwen3-embedding-0.6b | INT4-MIXED | 32 | 20 | -1 | 50.00 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 20.2 | 9.9 | 49.50 |
qwen2.5-coder-1.5b-instruct | INT4-MIXED | 32 | 20.7 | 10.1 | 48.31 |
qwen3-embedding-0.6b | INT4-MIXED | 32 | 20.7 | -1 | 48.31 |
qwen3-embedding-0.6b | INT8-CW | 32 | 21.3 | -1 | 46.95 |
qwen3-reranker-0.6b-seq-cls | FP16 | 22 | 21.3 | -1 | 46.95 |
gemma-3-1b-it | FP4-NORMALIZED | 32 | 21.4 | 18.3 | 46.73 |
gemma-3-1b-it | INT4-MIXED | 32 | 21.7 | 11.2 | 46.08 |
gemma-3-1b-it | INT4-MIXED | 32 | 22.1 | 11.3 | 45.25 |
gemma-3-1b-it | INT4-MIXED | 32 | 22.4 | 11.7 | 44.64 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 22.5 | 12.6 | 44.44 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 22.5 | 10.8 | 44.44 |
qwen2.5-1.5b-instruct | INT8-CW | 32 | 23.2 | 15.7 | 43.10 |
minicpm4-0.5b | INT4-MIXED | 1024 | 23.6 | 4 | 42.37 |
deepseek-r1-distill-qwen-1.5b | INT8-CW | 32 | 23.9 | 15.7 | 41.84 |
minicpm4-0.5b | INT4-MIXED | 1024 | 24 | 4.2 | 41.67 |
gemma-3-1b-it | FP16 | 32 | 24.3 | 20.8 | 41.15 |
gemma-2-2b | INT4-MIXED | 33 | 24.4 | 16.9 | 40.98 |
gemma-3-1b-it | INT8-CW | 32 | 24.5 | 13.2 | 40.82 |
llama-3.2-1b-instruct | FP16 | 32 | 24.9 | 23.2 | 40.16 |
qwen2.5-coder-1.5b-instruct | INT8-CW | 32 | 25.2 | 15.7 | 39.68 |
qwen2.5-coder-0.5b-instruct | INT4-MIXED | 1024 | 26.1 | 4.5 | 38.31 |
qwen2.5-coder-0.5b-instruct | INT4-MIXED | 1024 | 26.2 | 4.6 | 38.17 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 26.4 | 18.1 | 37.88 |
minicpm4-0.5b | INT8-CW | 1024 | 26.4 | 5.9 | 37.88 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 26.8 | 18.6 | 37.31 |
qwen3-reranker-0.6b-seq-cls | INT4-MIXED | 22 | 26.8 | -1 | 37.31 |
qwen3-reranker-0.6b-seq-cls | INT4-MIXED | 22 | 27.1 | -1 | 36.90 |
qwen3-reranker-0.6b-seq-cls | INT4-MIXED | 22 | 27.6 | -1 | 36.23 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 27.8 | 18.9 | 35.97 |
gemma-2b-it | INT8-CW | 32 | 28.3 | 25 | 35.34 |
qwen2.5-coder-0.5b-instruct | INT8-CW | 1024 | 28.6 | 6.3 | 34.97 |
minicpm4-0.5b | INT4-MIXED | 1024 | 28.9 | 4.1 | 34.60 |
phi-3-mini-128k-instruct | INT4-MIXED | 32 | 28.9 | 19.9 | 34.60 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 28.9 | 19.8 | 34.60 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 28.9 | 19.8 | 34.60 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 29.1 | 20.4 | 34.36 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 29.3 | 20.4 | 34.13 |
gemma-2-2b | INT4-MIXED | 33 | 29.6 | 17.5 | 33.78 |
phi-3-mini-128k-instruct | INT4-MIXED | 32 | 30 | 20.9 | 33.33 |
deepseek-r1-distill-qwen-1.5b | FP4-NORMALIZED | 32 | 30.3 | 27.5 | 33.00 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 30.7 | 21.7 | 32.57 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 31 | 15.7 | 32.26 |
minicpm4-0.5b | FP4-NORMALIZED | 1024 | 31.2 | 8.9 | 32.05 |
qwen3-reranker-0.6b | INT4-MIXED | 22 | 31.3 | -1 | 31.95 |
qwen3-reranker-0.6b-seq-cls | INT8-CW | 22 | 31.4 | -1 | 31.85 |
qwen2.5-coder-0.5b-instruct | INT4-MIXED | 1024 | 31.5 | 4.6 | 31.75 |
qwen3-reranker-0.6b | INT4-MIXED | 22 | 31.6 | -1 | 31.65 |
afm-4.5b | INT4-MIXED | 32 | 31.9 | 24 | 31.35 |
qwen2.5-1.5b-instruct | FP16 | 32 | 31.9 | 29.7 | 31.35 |
qwen2.5-coder-1.5b-instruct | FP16 | 32 | 32 | 29.6 | 31.25 |
deepseek-r1-distill-qwen-1.5b | FP16 | 32 | 32.1 | 29.6 | 31.15 |
minicpm4-0.5b | FP16 | 1024 | 32.2 | 9.7 | 31.06 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 32.2 | 21.9 | 31.06 |
qwen2.5-1.5b-instruct | FP4-NORMALIZED | 32 | 32.3 | 27.5 | 30.96 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 32.5 | 18.7 | 30.77 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 32.5 | 18.1 | 30.77 |
glm-edge-1.5b-chat | INT4-MIXED | 32 | 32.6 | 11.1 | 30.67 |
gemma-2-2b | INT8-CW | 33 | 32.7 | 26 | 30.58 |
qwen3-reranker-0.6b | FP16 | 22 | 32.7 | -1 | 30.58 |
minicpm-1b-sft | INT4-MIXED | 31 | 33.7 | 9.6 | 29.67 |
chatglm3-6b | INT4-MIXED | 32 | 34.3 | 29.8 | 29.15 |
qwen3-reranker-0.6b | INT4-MIXED | 22 | 34.5 | -1 | 28.99 |
stable-diffusion-xl-1.0-inpainting-0.1 | INT8-CW | 32 | 34.5 | 34 | 28.99 |
minicpm-1b-sft | INT4-MIXED | 31 | 34.6 | 9.8 | 28.90 |
glm-edge-1.5b-chat | INT8-CW | 32 | 34.8 | 16.2 | 28.74 |
stable-diffusion-xl-1.0-inpainting-0.1 | INT8-CW | 32 | 35.3 | 35.1 | 28.33 |
qwen2.5-coder-3b-instruct | INT8-CW | 32 | 35.6 | 30.5 | 28.09 |
qwen2.5-coder-0.5b-instruct | FP16 | 1024 | 35.7 | 10.7 | 28.01 |
llama-3.2-3b-instruct | INT8-CW | 32 | 35.9 | 32.3 | 27.86 |
minicpm-1b-sft | INT4-MIXED | 31 | 36.2 | 10.5 | 27.62 |
minicpm-1b-sft | FP16 | 31 | 36.3 | 27.9 | 27.55 |
biomistral-7b-slerp | INT4-MIXED | 7 | 36.4 | 34.5 | 27.47 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 36.4 | 15.8 | 27.47 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 36.5 | 19.1 | 27.40 |
glm-edge-1.5b-chat | FP16 | 32 | 36.8 | 28.7 | 27.17 |
qwen3-4b | INT4-MIXED | 32 | 36.8 | 22.7 | 27.17 |
phi-2 | INT4-MIXED | 32 | 36.9 | 15.9 | 27.10 |
chatglm3-6b | INT4-MIXED | 32 | 37 | 31.1 | 27.03 |
ltx-video | INT4-MIXED | 11 | 37 | 36.4 | 27.03 |
codellama-7b | INT4-MIXED | 32 | 37.4 | 32.8 | 26.74 |
llama-2-7b-chat-hf | INT4-MIXED | 32 | 37.4 | 32.8 | 26.74 |
qwen3-4b | INT4-MIXED | 32 | 37.6 | 23.4 | 26.60 |
phi-2 | INT4-MIXED | 32 | 38.1 | 16.9 | 26.25 |
biomistral-7b-slerp | INT4-MIXED | 7 | 38.4 | 36.2 | 26.04 |
ltx-video | INT8-CW | 11 | 38.7 | 38 | 25.84 |
qwen3-reranker-0.6b | INT8-CW | 22 | 39.4 | -1 | 25.38 |
phi-2 | INT8-CW | 32 | 39.6 | 27.1 | 25.25 |
stable-zephyr-3b-dpo | INT8-CW | 32 | 39.6 | 27.2 | 25.25 |
stablelm-3b-4e1t | INT8-CW | 32 | 39.6 | 27.3 | 25.25 |
lcm-dreamshaper-v7 | INT8-HYBRID | 32 | 39.8 | 38.6 | 25.13 |
falcon-7b-instruct | INT4-MIXED | 32 | 40.3 | 34.6 | 24.81 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 40.4 | 34.7 | 24.75 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 40.6 | 22.9 | 24.63 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 40.7 | 34.6 | 24.57 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 40.7 | 23.5 | 24.57 |
lcm-dreamshaper-v7 | INT8-HYBRID | 1024 | 40.8 | 38.8 | 24.51 |
llama-2-7b-chat-hf | INT4-MIXED | 32 | 40.8 | 33.8 | 24.51 |
phi-4-mini-instruct | INT4-MIXED | 32 | 41 | 22.9 | 24.39 |
minicpm-1b-sft | INT8-CW | 31 | 41.2 | 15.6 | 24.27 |
codellama-7b | INT4-MIXED | 32 | 41.4 | 34.3 | 24.15 |
phi-4-mini-instruct | INT4-MIXED | 32 | 41.5 | 23.5 | 24.10 |
qwen3-embedding-0.6b | INT4-MIXED | 1024 | 41.5 | -1 | 24.10 |
qwen3-reranker-0.6b-seq-cls | INT4-MIXED | 836 | 41.8 | -1 | 23.92 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 41.9 | 17.6 | 23.87 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 41.9 | 19.2 | 23.87 |
qwen3-embedding-0.6b | INT4-MIXED | 1024 | 42.1 | -1 | 23.75 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 32 | 42.2 | 36.3 | 23.70 |
phi-3-mini-4k-instruct | INT8-CW | 32 | 42.8 | 37.9 | 23.36 |
phi-3-mini-128k-instruct | INT8-CW | 32 | 42.9 | 37.9 | 23.31 |
phi-3.5-mini-instruct | INT8-CW | 32 | 43 | 38.1 | 23.26 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 43.2 | 23.8 | 23.15 |
qwen3-reranker-0.6b-seq-cls | INT4-MIXED | 836 | 43.2 | -1 | 23.15 |
llama-3.2-1b-instruct | INT4-MIXED | 1024 | 43.4 | 8.4 | 23.04 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 43.6 | 35.8 | 22.94 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 43.9 | 35.8 | 22.78 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 44 | 38.2 | 22.73 |
llama-3-8b-instruct | INT4-MIXED | 32 | 44 | 38.2 | 22.73 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 44 | 38.2 | 22.73 |
phi-4-mini-reasoning | INT8-CW | 32 | 44.3 | 39 | 22.57 |
phi-4-mini-instruct | INT8-CW | 32 | 44.4 | 39.1 | 22.52 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 44.5 | 36.3 | 22.47 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 44.6 | 36.3 | 22.42 |
qwen2-7b-instruct | INT4-MIXED | 32 | 44.6 | 37.4 | 22.42 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 44.7 | 37.4 | 22.37 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 44.8 | 37.3 | 22.32 |
mistral-7b-instruct-v0.1 | INT4-MIXED | 32 | 44.8 | 36.2 | 22.32 |
qwen3-4b | INT8-CW | 32 | 44.8 | 39 | 22.32 |
qwen3-embedding-0.6b | INT8-CW | 1024 | 44.8 | -1 | 22.32 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 44.9 | 36.3 | 22.27 |
bloomz-7b1 | INT4-MIXED | 32 | 45.2 | 38.9 | 22.12 |
phi-4-mini-instruct | INT4-MIXED | 32 | 45.2 | 24.6 | 22.12 |
qwen3-8b | INT4-MIXED | 32 | 45.2 | 39.3 | 22.12 |
qwen2-7b-instruct | INT4-MIXED | 32 | 45.5 | 37.9 | 21.98 |
afm-4.5b | INT8-CW | 32 | 45.7 | 40.9 | 21.88 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 45.8 | 37.8 | 21.83 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 32 | 45.8 | 37.8 | 21.83 |
qwen3-reranker-0.6b-seq-cls | FP16 | 836 | 45.8 | -1 | 21.83 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 45.9 | 37.8 | 21.79 |
minicpm4-8b | INT4-MIXED | 32 | 46.4 | 38.8 | 21.55 |
falcon-7b-instruct | INT4-MIXED | 32 | 47.1 | 37.9 | 21.23 |
qwen3-embedding-0.6b | FP16 | 1024 | 47.2 | -1 | 21.19 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 47.3 | 39.3 | 21.14 |
llama-3-8b-instruct | INT4-MIXED | 32 | 47.3 | 39.3 | 21.14 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 47.5 | 39.2 | 21.05 |
qwen3-reranker-0.6b-seq-cls | INT4-MIXED | 836 | 47.6 | -1 | 21.01 |
llama-3-8b-instruct | INT4-MIXED | 32 | 47.7 | 39.2 | 20.96 |
lcm-dreamshaper-v7 | INT8-CW | 1024 | 47.9 | 47.2 | 20.88 |
llama-3-8b-instruct | INT4-MIXED | 32 | 48.2 | 39.8 | 20.75 |
qwen3-8b | INT4-MIXED | 32 | 48.2 | 40.5 | 20.75 |
qwen3-reranker-0.6b-seq-cls | INT8-CW | 836 | 48.2 | -1 | 20.75 |
lcm-dreamshaper-v7 | INT8-CW | 32 | 48.3 | 47.6 | 20.70 |
lcm-dreamshaper-v7 | INT8-CW | 1024 | 48.5 | 47.7 | 20.62 |
bloomz-7b1 | INT4-MIXED | 32 | 48.9 | 40.3 | 20.45 |
lcm-dreamshaper-v7 | INT8-CW | 32 | 48.9 | 47.1 | 20.45 |
gemma-2b-it | FP16 | 32 | 49.6 | 46.3 | 20.16 |
minicpm4-8b | INT4-MIXED | 32 | 49.6 | 40 | 20.16 |
qwen3-embedding-0.6b | INT4-MIXED | 1024 | 49.6 | -1 | 20.16 |
lcm-dreamshaper-v7 | FP16 | 1024 | 49.7 | 49 | 20.12 |
lcm-dreamshaper-v7 | FP16 | 32 | 49.7 | 48.9 | 20.12 |
qwen3-8b | INT4-MIXED | 32 | 49.8 | 40.9 | 20.08 |
baichuan2-7b-chat | INT4-MIXED | 32 | 50.9 | 43.4 | 19.65 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 51.2 | 44.5 | 19.53 |
minicpm4-8b | INT4-MIXED | 32 | 51.3 | 40.6 | 19.49 |
gemma-3-1b-it | INT4-MIXED | 1024 | 51.7 | 8.4 | 19.34 |
gemma-7b-it | INT4-MIXED | 32 | 51.8 | 44.4 | 19.31 |
qwen3_8b_eagle3 | INT4-MIXED | 32 | 51.9 | 33.8 | 19.27 |
llama-3.2-1b-instruct | INT4-MIXED | 1024 | 52.1 | 8.6 | 19.19 |
gemma-3-4b-it | INT4-MIXED | 32 | 52.7 | 25.3 | 18.98 |
gemma-2-2b | FP16 | 33 | 53.3 | 50.2 | 18.76 |
gemma-3-1b-it | INT4-MIXED | 1024 | 53.8 | 8.6 | 18.59 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 54.1 | 45.8 | 18.48 |
phi-2 | FP16 | 32 | 54.1 | 50.8 | 18.48 |
stable-zephyr-3b-dpo | FP16 | 32 | 54.3 | 51 | 18.42 |
stablelm-3b-4e1t | FP16 | 32 | 54.5 | 51.1 | 18.35 |
gemma-7b-it | INT4-MIXED | 32 | 55 | 46.2 | 18.18 |
gemma-3-4b-it | INT4-MIXED | 32 | 55.1 | 25.6 | 18.15 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 55.3 | 46.4 | 18.08 |
ltx-video | FP16 | 11 | 55.4 | 54.8 | 18.05 |
gemma-3-4b-it | INT4-MIXED | 32 | 55.6 | 24.8 | 17.99 |
llama-3.2-1b-instruct | INT8-CW | 1024 | 56.3 | 13 | 17.76 |
nanollava | INT8-CW | 760 | 56.5 | 7.4 | 17.70 |
nanollava | INT4-MIXED | 760 | 57.3 | 6.5 | 17.45 |
gemma-2-9b-it | INT4-MIXED | 32 | 58 | 49.7 | 17.24 |
gemma-3-4b-it | INT8-CW | 32 | 58.5 | 39.5 | 17.09 |
glm-edge-4b-chat | INT4-MIXED | 32 | 58.5 | 24.5 | 17.09 |
llama-3.2-3b-instruct | FP4-NORMALIZED | 32 | 59.4 | 56.3 | 16.84 |
chatglm3-6b | INT8-CW | 32 | 60.3 | 55.3 | 16.58 |
gemma-3-1b-it | INT4-MIXED | 1024 | 60.3 | 9.1 | 16.58 |
nanollava | FP16 | 760 | 60.3 | 11.5 | 16.58 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 60.5 | 47.1 | 16.53 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 60.8 | 47.1 | 16.45 |
gemma-3-1b-it | INT8-CW | 1024 | 61.1 | 11.8 | 16.37 |
qwen2.5-coder-3b-instruct | FP16 | 32 | 61.1 | 57.3 | 16.37 |
glm-edge-4b-chat | INT8-CW | 32 | 61.2 | 43.3 | 16.34 |
gemma-2-9b-it | INT4-MIXED | 32 | 61.6 | 51.1 | 16.23 |
gemma-2-9b-it | INT4-MIXED | 32 | 62.5 | 51.7 | 16.00 |
llama-3.2-3b-instruct | FP16 | 32 | 62.5 | 60 | 16.00 |
whisper-small | FP16 | prompt0 | 64.1 | 8.1 | 15.60 |
whisper-small | INT4-MIXED | prompt0 | 66.1 | 6 | 15.13 |
qwen3.5-0.8b | FP16 | 85 | 66.2 | 15.8 | 15.11 |
whisper-small | INT4-MIXED | prompt0 | 66.8 | 6.2 | 14.97 |
llama-2-7b-chat-hf | INT8-CW | 32 | 67.1 | 62.1 | 14.90 |
biomistral-7b-slerp | INT8-CW | 7 | 67.5 | 65.5 | 14.81 |
codellama-7b | INT8-CW | 32 | 67.5 | 61.7 | 14.81 |
qwen3.5-0.8b | FP4-NORMALIZED | 85 | 68.1 | 13.4 | 14.68 |
lcm-sdxl | INT8-HYBRID | 32 | 68.7 | 68 | 14.56 |
llama-2-13b-chat-hf | INT4-MIXED | 32 | 69.2 | 61.5 | 14.45 |
lcm-sdxl | INT8-HYBRID | 1024 | 69.4 | 68.2 | 14.41 |
lcm-sdxl | INT8-CW | 1024 | 69.8 | 69.2 | 14.33 |
lcm-sdxl | INT8-CW | 32 | 69.9 | 69.3 | 14.31 |
whisper-small | INT8-CW | prompt0 | 70 | 6.2 | 14.29 |
falcon-7b-instruct | INT8-CW | 32 | 70.1 | 64.3 | 14.27 |
minicpm-1b-sft | INT4-MIXED | 1014 | 70.6 | 10.5 | 14.16 |
lcm-sdxl | INT8-CW | 1024 | 70.9 | 70.5 | 14.10 |
lcm-sdxl | INT8-CW | 32 | 71.1 | 70.5 | 14.06 |
phi-4-mini-reasoning | FP4-NORMALIZED | 32 | 71.1 | 66.2 | 14.06 |
phi-4-mini-instruct | FP4-NORMALIZED | 32 | 71.2 | 66.4 | 14.04 |
stable-diffusion-v1-5 | INT8-HYBRID | 1024 | 71.2 | 70.4 | 14.04 |
stable-diffusion-v1-5 | INT8-HYBRID | 32 | 71.2 | 70.2 | 14.04 |
whisper-small | INT4-MIXED | prompt0 | 71.2 | 6.4 | 14.04 |
baichuan2-7b-chat | INT8-CW | 32 | 71.6 | 65.1 | 13.97 |
dolly-v2-12b | INT4-MIXED | 32 | 71.7 | 59.7 | 13.95 |
mistral-7b-instruct-v0.1 | INT8-CW | 32 | 72 | 65.9 | 13.89 |
mistral-7b-instruct-v0.2 | INT8-CW | 32 | 72 | 65.9 | 13.89 |
mistral-7b-instruct-v0.3 | INT8-CW | 32 | 72 | 65.7 | 13.89 |
deepseek-r1-distill-qwen-7b | INT8-CW | 32 | 72.5 | 66.5 | 13.79 |
qwen2.5-7b-instruct | INT8-CW | 32 | 72.5 | 66.8 | 13.79 |
qwen2.5-7b-instruct-1m | INT8-CW | 32 | 72.6 | 66.4 | 13.77 |
qwen2-7b-instruct | INT8-CW | 32 | 72.7 | 66.4 | 13.76 |
phi-3-mini-4k-instruct | FP16 | 32 | 72.8 | 69.6 | 13.74 |
bloomz-7b1 | INT8-CW | 32 | 72.9 | 68 | 13.72 |
gemma-3-1b-it | FP4-NORMALIZED | 1024 | 72.9 | 18 | 13.72 |
phi-3-mini-128k-instruct | FP16 | 32 | 72.9 | 69.3 | 13.72 |
phi-3.5-mini-instruct | FP16 | 32 | 73.1 | 69.3 | 13.68 |
gemma-3-1b-it | FP16 | 1024 | 73.6 | 20.7 | 13.59 |
qwen3.5-0.8b | INT4-MIXED | 85 | 74 | 6.9 | 13.51 |
qwen2.5-coder-1.5b-instruct | INT4-MIXED | 1024 | 74.7 | 10.1 | 13.39 |
sd-turbo | INT8-HYBRID | 1024 | 74.7 | 74.3 | 13.39 |
sd-turbo | INT8-HYBRID | 32 | 74.7 | 74.1 | 13.39 |
stable-diffusion-v2-1 | INT8-HYBRID | 1024 | 74.8 | 74.3 | 13.37 |
stable-diffusion-v2-1 | INT8-HYBRID | 32 | 75 | 74.2 | 13.33 |
qwen2.5-1.5b-instruct | INT8-CW | 1024 | 75.5 | 16.3 | 13.25 |
llama-3.1-8b-instruct | INT8-CW | 32 | 75.6 | 69.4 | 13.23 |
deepseek-r1-distill-llama-8b | INT8-CW | 32 | 75.7 | 69.3 | 13.21 |
llama-3-8b-instruct | INT8-CW | 32 | 75.7 | 69.3 | 13.21 |
qwen2.5-coder-1.5b-instruct | INT8-CW | 1024 | 76 | 16.3 | 13.16 |
llama-3.2-1b-instruct | FP16 | 1024 | 76.2 | 23.6 | 13.12 |
phi-4-mini-instruct | FP16 | 32 | 76.4 | 71.8 | 13.09 |
phi-4-mini-reasoning | FP16 | 32 | 76.4 | 72 | 13.09 |
qwen3.5-0.8b | INT4-MIXED | 85 | 76.6 | 7.1 | 13.05 |
qwen3-reranker-0.6b | INT4-MIXED | 836 | 77.1 | -1 | 12.97 |
qwen3-8b | INT8-CW | 32 | 77.6 | 70.1 | 12.89 |
minicpm-1b-sft | INT4-MIXED | 1014 | 78.2 | 11.3 | 12.79 |
phi-4 | INT4-MIXED | 32 | 78.2 | 68.3 | 12.79 |
phi-4-reasoning | INT4-MIXED | 32 | 78.2 | 68.3 | 12.79 |
gemma-3-4b-it | FP4-NORMALIZED | 32 | 78.5 | 67.5 | 12.74 |
qwen3-reranker-0.6b | FP16 | 836 | 78.7 | -1 | 12.71 |
qwen3-4b | FP16 | 32 | 78.8 | 74.4 | 12.69 |
qwen3.5-0.8b | INT8-CW | 85 | 78.8 | 9.2 | 12.69 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 79 | 10.5 | 12.66 |
deepseek-r1-distill-qwen-1.5b | INT8-CW | 1024 | 79.1 | 16.2 | 12.64 |
qwen3.5-0.8b | INT4-MIXED | 85 | 79.8 | 7.3 | 12.53 |
qwen2.5-coder-1.5b-instruct | INT4-MIXED | 1024 | 80 | 10.4 | 12.50 |
qwen1.5-14b-chat | INT4-MIXED | 32 | 80.1 | 69.5 | 12.48 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 80.3 | 10.3 | 12.45 |
qwen3-reranker-0.6b | INT4-MIXED | 836 | 80.7 | -1 | 12.39 |
qwen3_8b_eagle3 | INT8-CW | 32 | 80.8 | 55.7 | 12.38 |
minicpm4-8b | INT8-CW | 32 | 81.5 | 75.2 | 12.27 |
phi-4-reasoning | INT4-MIXED | 32 | 82.5 | 70.4 | 12.12 |
stable-diffusion-v1-5 | INT8-CW | 1024 | 83 | 82.5 | 12.05 |
minicpm-1b-sft | INT8-CW | 1014 | 83.2 | 16.7 | 12.02 |
afm-4.5b | FP16 | 32 | 83.3 | 79.8 | 12.00 |
qwen3-reranker-0.6b | INT8-CW | 836 | 83.3 | -1 | 12.00 |
stable-diffusion-v1-5 | INT8-CW | 32 | 83.3 | 82.5 | 12.00 |
stable-diffusion-v1-5 | INT8-CW | 32 | 83.8 | 82.7 | 11.93 |
qwen2.5-coder-1.5b-instruct | INT4-MIXED | 1024 | 84.1 | 10.4 | 11.89 |
gemma-3-4b-it | FP16 | 32 | 84.2 | 73.4 | 11.88 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 84.5 | 11.2 | 11.83 |
sd-turbo | INT8-CW | 1024 | 84.5 | 84.2 | 11.83 |
stable-diffusion-v1-5 | FP16 | 1024 | 84.5 | 84.6 | 11.83 |
sd-turbo | INT8-CW | 32 | 84.6 | 84 | 11.82 |
stable-diffusion-v2-1 | INT8-CW | 32 | 84.6 | 84.1 | 11.82 |
phi-4 | INT4-MIXED | 32 | 84.8 | 71.4 | 11.79 |
glm-edge-4b-chat | FP16 | 32 | 85 | 78.3 | 11.76 |
stable-diffusion-v1-5 | FP16 | 32 | 85 | 84.3 | 11.76 |
stable-diffusion-v1-5 | INT8-CW | 1024 | 85.1 | 83 | 11.75 |
stable-diffusion-v2-1 | INT8-CW | 1024 | 85.2 | 84.1 | 11.74 |
stable-diffusion-v2-1 | INT8-CW | 32 | 85.3 | 84.5 | 11.72 |
stable-diffusion-v2-1 | INT8-CW | 1024 | 85.4 | 84.6 | 11.71 |
deepseek-r1-distill-qwen-14b | INT4-MIXED | 32 | 85.5 | 72.4 | 11.70 |
sd-turbo | INT8-CW | 1024 | 85.5 | 85.2 | 11.70 |
sd-turbo | INT8-CW | 32 | 85.5 | 84.9 | 11.70 |
sd-turbo | FP16 | 32 | 86 | 85.7 | 11.63 |
sd-turbo | FP16 | 1024 | 86.2 | 89.7 | 11.60 |
glm-4-9b-chat-hf | INT8-CW | 32 | 88 | 80.7 | 11.36 |
qwen3-reranker-0.6b | INT4-MIXED | 836 | 88 | -1 | 11.36 |
lcm-sdxl | FP16 | 32 | 88.7 | 88.3 | 11.27 |
lcm-sdxl | FP16 | 1024 | 88.8 | 88.4 | 11.26 |
qwen2.5-1.5b-instruct | FP4-NORMALIZED | 1024 | 89 | 27.9 | 11.24 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 89.1 | 13.1 | 11.22 |
deepseek-r1-distill-qwen-1.5b | FP4-NORMALIZED | 1024 | 89.3 | 27.9 | 11.20 |
gemma-7b-it | INT8-CW | 32 | 89.4 | 81 | 11.19 |
gemma-3-12b-it | INT4-MIXED | 32 | 89.5 | 63.4 | 11.17 |
starcoder2-15b | INT4-MIXED | 32 | 89.7 | 77.6 | 11.15 |
minicpm-1b-sft | INT4-MIXED | 1014 | 89.8 | 11.3 | 11.14 |
gemma-3-12b-it | INT4-MIXED | 32 | 91.6 | 66 | 10.92 |
qwen2.5-1.5b-instruct | FP16 | 1024 | 91.6 | 30.1 | 10.92 |
deepseek-r1-distill-qwen-1.5b | FP16 | 1024 | 91.9 | 30.1 | 10.88 |
qwen2.5-coder-1.5b-instruct | FP16 | 1024 | 92 | 30.1 | 10.87 |
nanollava | INT8-CW | 1752 | 92.2 | 8.5 | 10.85 |
llama-2-13b-chat-hf | INT4-MIXED | 32 | 92.8 | 76.4 | 10.78 |
gemma-2b-it | INT4-MIXED | 1024 | 93.8 | 15.7 | 10.66 |
minicpm-1b-sft | FP16 | 1014 | 95 | 28.9 | 10.53 |
gemma-2-9b-it | INT8-CW | 32 | 96.7 | 87.9 | 10.34 |
phi-4-reasoning | INT4-MIXED | 32 | 97.6 | 74.3 | 10.25 |
nanollava | INT4-MIXED | 1752 | 99.9 | 7.4 | 10.01 |
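A per-token latency in milliseconds converts to a throughput in tokens per second by dividing 1000 by the latency. The sketch below shows that arithmetic in Python. The helper name `tokens_per_second` is hypothetical, and the sample latencies (6.4 ms and 9.1 ms) are taken from rows in the table that follows, where they are listed alongside 156.25 and 109.89 tokens per second.

```python
def tokens_per_second(latency_ms: float) -> float:
    """Convert a per-token latency in milliseconds to tokens per second."""
    if latency_ms <= 0:
        # A value such as -1 in the tables is not a real latency and cannot be converted.
        raise ValueError("latency must be a positive number of milliseconds")
    return 1000.0 / latency_ms


for latency_ms in (6.4, 9.1):
    print(f"{latency_ms} ms -> {tokens_per_second(latency_ms):.2f} tokens/s")
```

The explicit check on non-positive values keeps placeholder entries from silently producing negative or infinite throughput figures.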
Topology | Precision | Input Size | 1st token latency (ms) | 2nd token latency (ms) | 2nd token throughput (tokens/s) |
|---|---|---|---|---|---|
distil-large-v2 | INT4-MIXED | prompt0 | 186.5 | 6.4 | 156.25 |
distil-large-v2 | INT8-CW | prompt0 | 174.8 | 6.4 | 156.25 |
distil-large-v2 | INT8-CW | prompt1 | 218 | 6.4 | 156.25 |
distil-large-v2 | INT4-MIXED | prompt1 | 226.6 | 6.5 | 153.85 |
whisper-large-v3-turbo | INT4-MIXED | prompt0 | 243.8 | 9.1 | 109.89 |
whisper-large-v3-turbo | INT4-MIXED | prompt1 | 311.4 | 9.2 | 108.70 |
distil-large-v2 | FP16 | prompt0 | 217.1 | 9.4 | 106.38 |
whisper-large-v3-turbo | INT8-CW | prompt1 | 310.4 | 9.4 | 106.38 |
distil-large-v2 | FP16 | prompt1 | 267.3 | 9.5 | 105.26 |
whisper-large-v3-turbo | INT8-CW | prompt0 | 226.8 | 9.8 | 102.04 |
minicpm4-0.5b | INT4-MIXED | 32 | 36.1 | 10.5 | 95.24 |
minicpm4-0.5b | INT4-MIXED | 32 | 32.6 | 10.7 | 93.46 |
minicpm4-0.5b | INT4-MIXED | 1024 | 71.1 | 11.1 | 90.09 |
minicpm4-0.5b | INT4-MIXED | 1024 | 74.5 | 11.3 | 88.50 |
minicpm4-0.5b | INT8-CW | 32 | 33.2 | 11.5 | 86.96 |
minicpm4-0.5b | INT4-MIXED | 1024 | 67.3 | 11.6 | 86.21 |
minicpm4-0.5b | INT4-MIXED | 32 | 40 | 11.9 | 84.03 |
minicpm4-0.5b | INT8-CW | 1024 | 74.7 | 12.2 | 81.97 |
qwen2.5-coder-0.5b-instruct | INT4-MIXED | 32 | 45.1 | 12.6 | 79.37 |
qwen2.5-coder-0.5b-instruct | INT4-MIXED | 1024 | 70.6 | 12.8 | 78.13 |
whisper-small | INT4-MIXED | prompt0 | 119.2 | 13.2 | 75.76 |
whisper-large-v3-turbo | FP16 | prompt0 | 277.1 | 13.3 | 75.19 |
whisper-small | INT4-MIXED | prompt0 | 118.8 | 13.4 | 74.63 |
whisper-large-v3-turbo | FP16 | prompt1 | 342.8 | 13.5 | 74.07 |
whisper-small | INT4-MIXED | prompt0 | 129.3 | 13.5 | 74.07 |
gemma-3-270m | INT4-MIXED | 32 | 40.8 | 13.7 | 72.99 |
qwen2.5-coder-0.5b-instruct | INT4-MIXED | 32 | 49.1 | 13.7 | 72.99 |
qwen2.5-coder-0.5b-instruct | INT4-MIXED | 1024 | 71.8 | 13.8 | 72.46 |
whisper-small | INT4-MIXED | prompt1 | 186.5 | 13.8 | 72.46 |
whisper-small | INT4-MIXED | prompt1 | 178.3 | 14 | 71.43 |
gemma-3-270m | INT4-MIXED | 1024 | 57.8 | 14.1 | 70.92 |
whisper-small | INT4-MIXED | prompt1 | 185.2 | 14.1 | 70.92 |
whisper-small | INT8-CW | prompt1 | 198.8 | 14.6 | 68.49 |
qwen2.5-coder-0.5b-instruct | INT4-MIXED | 32 | 48.9 | 14.7 | 68.03 |
gemma-3-270m | FP16 | 1024 | 53.5 | 14.8 | 67.57 |
gemma-3-270m | INT8-CW | 32 | 47.5 | 14.8 | 67.57 |
gemma-3-270m | FP16 | 32 | 39 | 14.9 | 67.11 |
whisper-small | INT8-CW | prompt0 | 127.1 | 14.9 | 67.11 |
qwen2.5-coder-0.5b-instruct | INT4-MIXED | 1024 | 76.3 | 15.1 | 66.23 |
gemma-3-270m | INT8-CW | 1024 | 59.1 | 15.2 | 65.79 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 139.1 | 16.8 | 59.52 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 63.7 | 17.1 | 58.48 |
llama-3.2-1b-instruct | INT4-MIXED | 32 | 41.8 | 17.2 | 58.14 |
whisper-small | FP16 | prompt0 | 128.2 | 17.2 | 58.14 |
whisper-small | FP16 | prompt1 | 174.1 | 17.2 | 58.14 |
llama-3.2-1b-instruct | INT4-MIXED | 1024 | 145.4 | 18.1 | 55.25 |
qwen2.5-coder-0.5b-instruct | INT8-CW | 32 | 64.7 | 18.1 | 55.25 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 68.9 | 18.3 | 54.64 |
qwen3.5-0.8b | INT4-MIXED | 1024 | 295.4 | 18.3 | 54.64 |
qwen2.5-coder-0.5b-instruct | INT8-CW | 1024 | 81.6 | 18.5 | 54.05 |
qwen3.5-0.8b | INT4-MIXED | 85 | 153.4 | 18.5 | 54.05 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 170.4 | 18.8 | 53.19 |
qwen3.5-0.8b | INT4-MIXED | 1024 | 298.1 | 18.9 | 52.91 |
qwen3.5-0.8b | INT4-MIXED | 85 | 166.2 | 19.3 | 51.81 |
qwen3.5-0.8b | INT4-MIXED | 85 | 178.4 | 19.9 | 50.25 |
qwen3.5-0.8b | INT4-MIXED | 1024 | 308.4 | 20 | 50.00 |
deepseek-r1-distill-qwen-1.5b | INT8-CW | 32 | 46.8 | 20.1 | 49.75 |
qwen3.5-0.8b | INT8-CW | 1024 | 320.6 | 20.6 | 48.54 |
qwen3.5-0.8b | INT8-CW | 85 | 193.3 | 20.6 | 48.54 |
deepseek-r1-distill-qwen-1.5b | INT8-CW | 1024 | 140.6 | 20.9 | 47.85 |
nanollava | INT4-MIXED | 760 | 170 | 21 | 47.62 |
llama-3.2-1b-instruct | INT4-MIXED | 32 | 38.8 | 21.2 | 47.17 |
nanollava | INT8-CW | 760 | 169.2 | 21.2 | 47.17 |
llama-3.2-1b-instruct | INT4-MIXED | 1024 | 120.7 | 21.5 | 46.51 |
llama-3.2-1b-instruct | INT8-CW | 32 | 39.1 | 21.5 | 46.51 |
qwen2.5-coder-0.5b-instruct | FP16 | 32 | 38.9 | 21.7 | 46.08 |
qwen2.5-coder-1.5b-instruct | INT4-MIXED | 32 | 78.7 | 22.4 | 44.64 |
gemma-3-1b-it | INT4-MIXED | 1024 | 127.7 | 22.5 | 44.44 |
nanollava | INT8-CW | 1752 | 281 | 22.5 | 44.44 |
qwen2.5-coder-1.5b-instruct | INT4-MIXED | 1024 | 161.3 | 22.5 | 44.44 |
llama-3.2-1b-instruct | INT8-CW | 1024 | 133 | 22.6 | 44.25 |
minicpm4-0.5b | FP16 | 32 | 50.8 | 22.9 | 43.67 |
nanollava | INT4-MIXED | 1752 | 281.8 | 22.9 | 43.67 |
qwen2.5-coder-0.5b-instruct | FP16 | 1024 | 88.1 | 23.2 | 43.10 |
minicpm4-0.5b | FP16 | 1024 | 92.2 | 23.8 | 42.02 |
qwen2.5-coder-1.5b-instruct | INT4-MIXED | 32 | 83.4 | 24.5 | 40.82 |
gemma-3-1b-it | INT4-MIXED | 1024 | 142.3 | 24.6 | 40.65 |
glm-edge-1.5b-chat | INT4-MIXED | 32 | 90.8 | 24.7 | 40.49 |
gemma-3-1b-it | INT4-MIXED | 32 | 71.5 | 24.8 | 40.32 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 73.5 | 24.9 | 40.16 |
gemma-3-1b-it | INT4-MIXED | 1024 | 148.6 | 25 | 40.00 |
gemma-3-1b-it | INT4-MIXED | 32 | 60.2 | 25.2 | 39.68 |
qwen2.5-coder-1.5b-instruct | INT4-MIXED | 1024 | 199 | 25.3 | 39.53 |
qwen2.5-coder-1.5b-instruct | INT4-MIXED | 32 | 69.4 | 25.3 | 39.53 |
gemma-3-1b-it | INT4-MIXED | 32 | 81.4 | 25.4 | 39.37 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 195.5 | 25.5 | 39.22 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 75 | 25.8 | 38.76 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 170 | 26.6 | 37.59 |
qwen2.5-coder-1.5b-instruct | INT4-MIXED | 1024 | 166.6 | 26.9 | 37.17 |
glm-edge-1.5b-chat | INT4-MIXED | 1024 | 268.1 | 27.1 | 36.90 |
qwen2.5-1.5b-instruct | INT8-CW | 32 | 74.6 | 27.5 | 36.36 |
qwen2.5-coder-1.5b-instruct | INT8-CW | 32 | 76.3 | 27.9 | 35.84 |
glm-edge-1.5b-chat | INT8-CW | 32 | 94.3 | 28.6 | 34.97 |
qwen2.5-coder-1.5b-instruct | INT8-CW | 1024 | 180.9 | 28.8 | 34.72 |
qwen2.5-1.5b-instruct | INT8-CW | 1024 | 173.6 | 29.1 | 34.36 |
gemma-3-1b-it | INT8-CW | 1024 | 152 | 29.5 | 33.90 |
glm-edge-1.5b-chat | INT8-CW | 1024 | 244.1 | 30.1 | 33.22 |
gemma-3-1b-it | INT8-CW | 32 | 93.1 | 30.2 | 33.11 |
nanollava | FP16 | 760 | 202.5 | 32.7 | 30.58 |
nanollava | FP16 | 1752 | 324 | 33.8 | 29.59 |
phi-2 | INT4-MIXED | 32 | 136.9 | 34.8 | 28.74 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 96 | 35.6 | 28.09 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 98.3 | 35.7 | 28.01 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 126.5 | 36.5 | 27.40 |
phi-2 | INT4-MIXED | 1024 | 393.7 | 36.6 | 27.32 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 123.3 | 36.7 | 27.25 |
afm-4.5b | INT4-MIXED | 32 | 85.2 | 36.9 | 27.10 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 326.2 | 37.2 | 26.88 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 295.6 | 37.3 | 26.81 |
phi-2 | INT4-MIXED | 32 | 150.6 | 37.8 | 26.46 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 96.4 | 37.9 | 26.39 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 123.5 | 37.9 | 26.39 |
qwen3.5-0.8b | FP16 | 85 | 230.3 | 38 | 26.32 |
afm-4.5b | INT4-MIXED | 1024 | 460.5 | 38.3 | 26.11 |
stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 387.3 | 38.4 | 26.04 |
phi-3-mini-128k-instruct | INT4-MIXED | 32 | 112.8 | 38.5 | 25.97 |
qwen3.5-0.8b | FP16 | 1024 | 397.7 | 38.6 | 25.91 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 106.8 | 38.7 | 25.84 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 114.7 | 38.7 | 25.84 |
stablelm-3b-4e1t | INT4-MIXED | 1024 | 384.3 | 38.8 | 25.77 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 497.2 | 39.6 | 25.25 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 140 | 39.7 | 25.19 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 417.6 | 39.9 | 25.06 |
phi-2 | INT4-MIXED | 1024 | 486.7 | 40.3 | 24.81 |
chatglm3-6b | INT4-MIXED | 32 | 73.5 | 40.6 | 24.63 |
phi-3-mini-128k-instruct | INT4-MIXED | 32 | 112.9 | 41.6 | 24.04 |
phi-3-mini-128k-instruct | INT4-MIXED | 1024 | 500.9 | 41.7 | 23.98 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 505.8 | 41.8 | 23.92 |
deepseek-r1-distill-qwen-1.5b | FP16 | 32 | 66.6 | 41.9 | 23.87 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 511.7 | 41.9 | 23.87 |
stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 462.5 | 41.9 | 23.87 |
chatglm3-6b | INT4-MIXED | 1024 | 484.7 | 42.1 | 23.75 |
gemma-3-1b-it | FP16 | 1024 | 178.3 | 42.4 | 23.58 |
deepseek-r1-distill-qwen-1.5b | FP16 | 1024 | 200.9 | 42.5 | 23.53 |
gemma-3-1b-it | FP16 | 32 | 88.6 | 42.5 | 23.53 |
chatglm3-6b | INT4-MIXED | 32 | 68.5 | 43 | 23.26 |
phi-3-mini-128k-instruct | INT4-MIXED | 1024 | 655 | 44.3 | 22.57 |
chatglm3-6b | INT4-MIXED | 1024 | 678.8 | 44.5 | 22.47 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 87 | 44.7 | 22.37 |
phi-2 | INT8-CW | 32 | 117.5 | 44.7 | 22.37 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 113.8 | 44.9 | 22.27 |
llama-3.2-1b-instruct | FP16 | 32 | 62.9 | 45.1 | 22.17 |
qwen3-4b | INT4-MIXED | 32 | 130.7 | 45.1 | 22.17 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 132.6 | 45.1 | 22.17 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 122.1 | 45.3 | 22.08 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 147.1 | 45.4 | 22.03 |
stablelm-3b-4e1t | INT8-CW | 32 | 120.9 | 45.5 | 21.98 |
phi-4-mini-instruct | INT4-MIXED | 32 | 139.3 | 45.7 | 21.88 |
stable-zephyr-3b-dpo | INT8-CW | 32 | 130 | 45.8 | 21.83 |
llama-3.2-1b-instruct | FP16 | 1024 | 202.3 | 45.9 | 21.79 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 335.3 | 46.6 | 21.46 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 430.5 | 47.2 | 21.19 |
falcon-7b-instruct | INT4-MIXED | 32 | 87.8 | 47.8 | 20.92 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 423.4 | 47.8 | 20.92 |
biomistral-7b-slerp | INT4-MIXED | 7 | 65.7 | 47.9 | 20.88 |
phi-2 | INT8-CW | 1024 | 410.1 | 48.1 | 20.79 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 643.9 | 48.2 | 20.75 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 156.7 | 48.2 | 20.75 |
qwen3-4b | INT4-MIXED | 1024 | 474.9 | 48.5 | 20.62 |
stablelm-3b-4e1t | INT4-MIXED | 1024 | 414.6 | 48.5 | 20.62 |
internvl2-4b | INT4-MIXED | 297 | 308.3 | 48.7 | 20.53 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 667.8 | 48.8 | 20.49 |
stable-zephyr-3b-dpo | INT8-CW | 1024 | 413.9 | 48.8 | 20.49 |
stablelm-3b-4e1t | INT8-CW | 1024 | 422.7 | 49 | 20.41 |
falcon-7b-instruct | INT4-MIXED | 1024 | 596.1 | 49.4 | 20.24 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 533.3 | 49.7 | 20.12 |
llama-3.2-3b-instruct | INT8-CW | 32 | 85.7 | 50.5 | 19.80 |
phi-3.5-vision-instruct | INT4-MIXED | 802 | 695.8 | 50.6 | 19.76 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 114.5 | 50.7 | 19.72 |
qwen2.5-coder-3b-instruct | INT8-CW | 32 | 128.4 | 50.8 | 19.69 |
biomistral-7b-slerp | INT4-MIXED | 7 | 74.8 | 51.1 | 19.57 |
phi-4-mini-instruct | INT4-MIXED | 32 | 161.3 | 51.1 | 19.57 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 110.4 | 51.3 | 19.49 |
afm-4.5b | INT8-CW | 32 | 76.3 | 51.5 | 19.42 |
qwen2.5-coder-3b-instruct | INT8-CW | 1024 | 412.6 | 52.1 | 19.19 |
llama-3.2-3b-instruct | INT8-CW | 1024 | 355.7 | 52.3 | 19.12 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 84.5 | 52.5 | 19.05 |
internvl2-4b | INT4-MIXED | 1027 | 814.3 | 52.7 | 18.98 |
phi-3.5-vision-instruct | INT4-MIXED | 1032 | 863.8 | 52.7 | 18.98 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 68.3 | 52.8 | 18.94 |
afm-4.5b | INT8-CW | 1024 | 344.2 | 53 | 18.87 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 80.4 | 53.8 | 18.59 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 565.7 | 54 | 18.52 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 162.2 | 54.4 | 18.38 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 315.5 | 54.4 | 18.38 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 508.3 | 54.5 | 18.35 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 502.7 | 54.6 | 18.32 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 580.6 | 54.7 | 18.28 |
phi-4-mini-instruct | INT4-MIXED | 32 | 130.5 | 55 | 18.18 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 724.8 | 55.3 | 18.08 |
whisper-large-v3 | INT4-MIXED | prompt1 | 550.9 | 55.3 | 18.08 |
whisper-large-v3 | INT4-MIXED | prompt0 | 502.8 | 55.7 | 17.95 |
whisper-large-v3 | INT8-CW | prompt1 | 526.1 | 55.7 | 17.95 |
gemma-4-e2b-it | INT4-MIXED | 274 | 861.1 | 55.8 | 17.92 |
falcon-7b-instruct | INT4-MIXED | 32 | 91.6 | 56 | 17.86 |
internvl2-4b | INT4-MIXED | 297 | 275.5 | 56.2 | 17.79 |
qwen2.5-1.5b-instruct | FP16 | 32 | 89.8 | 56.9 | 17.57 |
qwen2.5-coder-1.5b-instruct | FP16 | 32 | 91.6 | 56.9 | 17.57 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 431.9 | 57.4 | 17.42 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 442.9 | 57.4 | 17.42 |
falcon-7b-instruct | INT4-MIXED | 1024 | 795.7 | 57.5 | 17.39 |
qwen2.5-coder-1.5b-instruct | FP16 | 1024 | 254.6 | 57.6 | 17.36 |
glm-edge-4b-chat | INT4-MIXED | 32 | 206 | 57.7 | 17.33 |
qwen2.5-1.5b-instruct | FP16 | 1024 | 249.4 | 57.7 | 17.33 |
whisper-large-v3 | INT8-CW | prompt0 | 473.8 | 57.7 | 17.33 |
phi-3-mini-4k-instruct | INT8-CW | 32 | 105.6 | 57.9 | 17.27 |
phi-3-mini-128k-instruct | INT8-CW | 32 | 102.7 | 58 | 17.24 |
phi-3.5-mini-instruct | INT8-CW | 32 | 96.8 | 58.1 | 17.21 |
glm-edge-1.5b-chat | FP16 | 32 | 131.8 | 58.3 | 17.15 |
llama-2-7b-chat-hf | INT4-MIXED | 32 | 118.9 | 58.3 | 17.15 |
glm-edge-1.5b-chat | FP16 | 1024 | 315.8 | 59.2 | 16.89 |
qwen3-4b | INT4-MIXED | 32 | 124.1 | 59.4 | 16.84 |
glm-edge-4b-chat | INT4-MIXED | 1024 | 701.1 | 59.7 | 16.75 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 75.3 | 60 | 16.67 |
gemma-4-e2b-it | INT4-MIXED | 274 | 875.5 | 60.1 | 16.64 |
internvl2-4b | INT4-MIXED | 1027 | 713.8 | 61 | 16.39 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 106.4 | 61.1 | 16.37 |
gemma-3-4b-it | INT4-MIXED | 32 | 220.3 | 61.3 | 16.31 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 108.5 | 61.3 | 16.31 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 537.7 | 61.4 | 16.29 |
phi-4-mini-instruct | INT8-CW | 32 | 146.1 | 61.4 | 16.29 |
phi-4-mini-reasoning | INT8-CW | 32 | 121.7 | 61.4 | 16.29 |
phi-3-mini-128k-instruct | INT8-CW | 1024 | 541.5 | 61.5 | 16.26 |
phi-3.5-mini-instruct | INT8-CW | 1024 | 543.9 | 61.7 | 16.21 |
phi-3-mini-4k-instruct | INT8-CW | 1024 | 543 | 61.8 | 16.18 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 86.2 | 62.6 | 15.97 |
gemma-4-e2b-it | INT4-MIXED | 1024 | 1052.7 | 62.7 | 15.95 |
gemma-3-4b-it | INT4-MIXED | 32 | 213.2 | 62.9 | 15.90 |
internvl2-4b | INT8-CW | 297 | 299.9 | 63.1 | 15.85 |
qwen3-4b | INT4-MIXED | 1024 | 480.7 | 63.1 | 15.85 |
gemma-3-4b-it | INT4-MIXED | 1024 | 513 | 63.3 | 15.80 |
llama-2-7b-chat-hf | INT4-MIXED | 1024 | 715.1 | 63.4 | 15.77 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 733.2 | 63.8 | 15.67 |
phi-4-mini-instruct | INT8-CW | 1024 | 464.7 | 63.8 | 15.67 |
phi-4-mini-reasoning | INT8-CW | 1024 | 465.8 | 63.8 | 15.67 |
qwen3-vl-4b-instruct | INT4-MIXED | 4907 | 7718.6 | 64 | 15.63 |
qwen3-4b | INT8-CW | 32 | 115.3 | 64.1 | 15.60 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 731.2 | 64.2 | 15.58 |
qwen3-vl-4b-instruct | INT4-MIXED | 4937 | 7877.4 | 64.2 | 15.58 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 32 | 93.3 | 64.3 | 15.55 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 89.6 | 64.4 | 15.53 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 130.3 | 64.5 | 15.50 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 110.3 | 64.6 | 15.48 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 600 | 64.7 | 15.46 |
whisper-large-v3 | FP16 | prompt0 | 510.4 | 64.8 | 15.43 |
mistral-7b-instruct-v0.1 | INT4-MIXED | 32 | 113.8 | 65 | 15.38 |
whisper-large-v3 | FP16 | prompt1 | 566.9 | 65 | 15.38 |
gemma-4-e2b-it | INT4-MIXED | 1024 | 1082.6 | 65.1 | 15.36 |
gemma-4-e2b-it | INT8-CW | 274 | 844.5 | 65.1 | 15.36 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 645.4 | 65.4 | 15.29 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 1024 | 654.6 | 65.4 | 15.29 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 99.8 | 65.5 | 15.27 |
gemma-3-4b-it | INT4-MIXED | 1024 | 617.2 | 66.2 | 15.11 |
gpt-oss-20b | INT4-MIXED | 32 | 250.6 | 66.2 | 15.11 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 118.3 | 66.3 | 15.08 |
phi-3.5-vision-instruct | INT8-CW | 802 | 602.3 | 66.3 | 15.08 |
llama-3-8b-instruct | INT4-MIXED | 32 | 110.7 | 66.5 | 15.04 |
qwen3-4b | INT8-CW | 1024 | 502.4 | 66.9 | 14.95 |
gpt-oss-20b | INT4-MIXED | 1024 | 3204.9 | 67.2 | 14.88 |
mistral-7b-instruct-v0.1 | INT4-MIXED | 1025 | 1008.6 | 67.4 | 14.84 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 759.5 | 67.6 | 14.79 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 999.4 | 67.6 | 14.79 |
phi-3.5-vision-instruct | INT8-CW | 1032 | 775.8 | 67.6 | 14.79 |
qwen3-vl-4b-instruct | INT4-MIXED | 4937 | 8299.5 | 67.6 | 14.79 |
internvl2-4b | INT8-CW | 1027 | 732.4 | 67.7 | 14.77 |
minicpm-v-2_6 | INT4-MIXED | 228 | 726.5 | 67.7 | 14.77 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 1022.7 | 67.7 | 14.77 |
minicpm-o-2_6 | INT4-MIXED | 238 | 744 | 68 | 14.71 |
chatglm3-6b | INT8-CW | 32 | 90.8 | 68.1 | 14.68 |
minicpm4-8b | INT4-MIXED | 32 | 120.3 | 68.1 | 14.68 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 119.1 | 68.1 | 14.68 |
qwen3-vl-4b-instruct | INT4-MIXED | 4907 | 8220.5 | 68.1 | 14.68 |
qwen2-7b-instruct | INT4-MIXED | 32 | 114.5 | 68.2 | 14.66 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 32 | 109.2 | 68.3 | 14.64 |
minicpm3-4b | INT4-MIXED | 32 | 309.7 | 68.5 | 14.60 |
glm-edge-4b-chat | INT8-CW | 32 | 192.4 | 68.9 | 14.51 |
phi-4-multimodal-instruct | INT4-MIXED | 578 | 598.2 | 68.9 | 14.51 |
phi-4-multimodal-instruct | INT4-MIXED | 786 | 685.6 | 68.9 | 14.51 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 714.2 | 69.4 | 14.41 |
qwen3-8b | INT4-MIXED | 32 | 135.8 | 69.6 | 14.37 |
chatglm3-6b | INT8-CW | 1024 | 481.5 | 69.7 | 14.35 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 729 | 69.7 | 14.35 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 897.8 | 69.8 | 14.33 |
phi-4-multimodal-instruct | INT4-MIXED | 1362 | 1481.7 | 70.2 | 14.25 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 1024 | 894 | 70.2 | 14.25 |
minicpm4-8b | INT4-MIXED | 1024 | 798.7 | 70.4 | 14.20 |
phi-4-multimodal-instruct | INT4-MIXED | 1570 | 1638.5 | 70.4 | 14.20 |
qwen2-7b-instruct | INT4-MIXED | 1024 | 889.5 | 70.4 | 14.20 |
llama-3-8b-instruct | INT4-MIXED | 32 | 121.2 | 70.5 | 14.18 |
gemma-3-4b-it | INT4-MIXED | 32 | 208.8 | 71 | 14.08 |
gemma-4-e2b-it | INT8-CW | 1024 | 1063.7 | 71.7 | 13.95 |
glm-edge-4b-chat | INT8-CW | 1024 | 613.1 | 71.9 | 13.91 |
minicpm-v-2_6 | INT4-MIXED | 228 | 811.4 | 72 | 13.89 |
minicpm-o-2_6 | INT4-MIXED | 238 | 806.5 | 72.2 | 13.85 |
qwen3-8b | INT4-MIXED | 1024 | 775.9 | 72.3 | 13.83 |
minicpm4-8b | INT4-MIXED | 32 | 136.6 | 72.6 | 13.77 |
qwen3-8b | INT4-MIXED | 32 | 139.6 | 73 | 13.70 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 981.8 | 73.1 | 13.68 |
minicpm-v-4_5 | INT4-MIXED | 217 | 830.3 | 73.9 | 13.53 |
gemma-3-4b-it | INT8-CW | 32 | 205.4 | 74.4 | 13.44 |
minicpm4-8b | INT4-MIXED | 1024 | 1088.7 | 74.5 | 13.42 |
minicpm3-4b | INT4-MIXED | 32 | 355.7 | 75.5 | 13.25 |
gemma-3-4b-it | INT4-MIXED | 1024 | 523.7 | 76.5 | 13.07 |
minicpm3-4b | INT4-MIXED | 1024 | 830.4 | 76.9 | 13.00 |
minicpm-v-4_5 | INT4-MIXED | 217 | 878.1 | 77.5 | 12.90 |
minicpm3-4b | INT4-MIXED | 32 | 295 | 77.5 | 12.90 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 130 | 77.6 | 12.89 |
qwen3.5-9b | INT4-MIXED | 83 | 458.9 | 78.3 | 12.77 |
qwen3.5-9b | INT4-MIXED | 1024 | 1267.7 | 78.8 | 12.69 |
qwen3-vl-4b-instruct | INT4-MIXED | 4937 | 7956 | 78.9 | 12.67 |
gemma-3-4b-it | INT8-CW | 1024 | 536.9 | 79 | 12.66 |
gemma-7b-it | INT4-MIXED | 32 | 118 | 79 | 12.66 |
qwen3-vl-4b-instruct | INT4-MIXED | 4907 | 7691.1 | 79.3 | 12.61 |
minicpm3-4b | INT8-CW | 32 | 328.7 | 80.2 | 12.47 |
gpt-oss-20b | INT4-MIXED | 32 | 289.1 | 80.3 | 12.45 |
biomistral-7b-slerp | INT8-CW | 7 | 107.1 | 80.5 | 12.42 |
glm-4-9b-chat-hf | INT4-MIXED | 1024 | 890.3 | 80.5 | 12.42 |
deepseek-r1-distill-qwen-7b | INT8-CW | 32 | 103.2 | 81.2 | 12.32 |
llama-2-7b-chat-hf | INT4-MIXED | 32 | 101.8 | 82.1 | 12.18 |
qwen3.5-9b | INT4-MIXED | 83 | 497.7 | 82.1 | 12.18 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 140.3 | 82.5 | 12.12 |
qwen3.5-9b | INT4-MIXED | 1024 | 1532 | 82.6 | 12.11 |
phi-4-multimodal-instruct | INT8-CW | 578 | 627.9 | 82.8 | 12.08 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 32 | 314.1 | 82.8 | 12.08 |
deepseek-r1-distill-qwen-7b | INT8-CW | 1024 | 509.4 | 82.9 | 12.06 |
qwen3-vl-4b-instruct | INT8-CW | 4907 | 7760.6 | 83.1 | 12.03 |
qwen3-vl-4b-instruct | INT8-CW | 4937 | 7793.8 | 83.1 | 12.03 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 129.7 | 83.2 | 12.02 |
gemma-7b-it | INT4-MIXED | 1024 | 877 | 83.3 | 12.00 |
phi-4-multimodal-instruct | INT8-CW | 786 | 739.4 | 83.4 | 11.99 |
qwen2-7b-instruct | INT4-MIXED | 32 | 98.6 | 83.5 | 11.98 |
minicpm3-4b | INT4-MIXED | 1024 | 950.8 | 83.6 | 11.96 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 98.2 | 83.8 | 11.93 |
gemma-7b-it | INT4-MIXED | 32 | 128.3 | 83.9 | 11.92 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 104.4 | 83.9 | 11.92 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 102 | 84 | 11.90 |
phi-4-multimodal-instruct | INT8-CW | 1362 | 1565 | 84.1 | 11.89 |
phi-4-multimodal-instruct | INT8-CW | 1570 | 1759.2 | 84.1 | 11.89 |
glm-4-9b-chat-hf | INT4-MIXED | 1024 | 1176.7 | 84.2 | 11.88 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 1024 | 838.1 | 84.5 | 11.83 |
gpt-oss-20b | INT4-MIXED | 1024 | 3427.6 | 85.2 | 11.74 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 656.2 | 85.3 | 11.72 |
minicpm3-4b | INT4-MIXED | 1024 | 878 | 85.5 | 11.70 |
qwen2-7b-instruct | INT4-MIXED | 1024 | 655.1 | 85.5 | 11.70 |
deepseek-r1-distill-llama-8b | INT8-CW | 32 | 112.4 | 85.7 | 11.67 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 947.9 | 85.9 | 11.64 |
llava-next-video-7b-hf | INT4-MIXED | 2945 | 4195.8 | 86.1 | 11.61 |
phi-4-multimodal-instruct | INT4-MIXED | 578 | 750.6 | 86.2 | 11.60 |
minicpm-v-2_6 | INT4-MIXED | 228 | 705 | 86.4 | 11.57 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 700.9 | 86.4 | 11.57 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 32 | 360.6 | 86.5 | 11.56 |
fara-7b | INT4-MIXED | 32 | 411.5 | 86.6 | 11.55 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 718 | 86.6 | 11.55 |
stable-diffusion-xl-1.0-inpainting-0.1 | INT8-CW | 32 | 86 | 86.8 | 11.52 |
llama-2-7b-chat-hf | INT4-MIXED | 1024 | 717.3 | 86.9 | 11.51 |
phi-4-multimodal-instruct | INT4-MIXED | 1362 | 1827.9 | 87 | 11.49 |
gemma-2-9b-it | INT4-MIXED | 32 | 178.7 | 87.4 | 11.44 |
minicpm3-4b | INT8-CW | 1024 | 895.7 | 87.4 | 11.44 |
deepseek-r1-distill-llama-8b | INT8-CW | 1024 | 567.3 | 88 | 11.36 |
phi-4-multimodal-instruct | INT4-MIXED | 1570 | 2034.6 | 88 | 11.36 |
phi-4-multimodal-instruct | INT4-MIXED | 786 | 905.4 | 88.1 | 11.35 |
gemma-7b-it | INT4-MIXED | 1024 | 1156.8 | 88.2 | 11.34 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 1024 | 1088.4 | 88.5 | 11.30 |
fara-7b | INT4-MIXED | 1024 | 1293.2 | 88.9 | 11.25 |
llama-3-8b-instruct | INT4-MIXED | 32 | 113 | 89.4 | 11.19 |
llama-3-8b-instruct | INT4-MIXED | 32 | 109.8 | 89.6 | 11.16 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 115.1 | 89.7 | 11.15 |
phi-2 | FP16 | 32 | 135.2 | 90 | 11.11 |
stable-zephyr-3b-dpo | FP16 | 32 | 123.7 | 91.5 | 10.93 |
stablelm-3b-4e1t | FP16 | 32 | 147.6 | 92 | 10.87 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 725.3 | 92.1 | 10.86 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 723.5 | 92.2 | 10.85 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 714.3 | 92.2 | 10.85 |
gemma-2-9b-it | INT4-MIXED | 1024 | 1006.2 | 92.4 | 10.82 |
qwen3-8b | INT4-MIXED | 32 | 135.3 | 92.5 | 10.81 |
stable-diffusion-xl-1.0-inpainting-0.1 | INT8-CW | 32 | 92.5 | 92.7 | 10.79 |
minicpm4-8b | INT4-MIXED | 32 | 116.4 | 93.3 | 10.72 |
phi-2 | FP16 | 1024 | 616.7 | 94.5 | 10.58 |
minicpm4-8b | INT4-MIXED | 1024 | 798.7 | 95.3 | 10.49 |
qwen3-8b | INT4-MIXED | 1024 | 774.1 | 95.8 | 10.44 |
stable-zephyr-3b-dpo | FP16 | 1024 | 615.1 | 96.8 | 10.33 |
stablelm-3b-4e1t | FP16 | 1024 | 598.5 | 96.8 | 10.33 |
deepseek-r1-distill-qwen-14b | INT4-MIXED | 32 | 146 | 97.3 | 10.28 |
ltx-video | INT8-CW | 11 | 98.3 | 97.9 | 10.21 |
llama-2-7b-chat-hf | INT8-CW | 32 | 139.2 | 98.4 | 10.16 |
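
In the rows above, the "2nd token per sec" column is consistent with 1000 divided by the "2nd latency (ms)" value, rounded to two decimals (for example, 98.4 ms gives 10.16). The page does not state this formula, so treat it as an inference from the data. A minimal Python sketch for recomputing or checking the column is shown below; the function name `tokens_per_second` is hypothetical, and half-up rounding is an assumption chosen because it reproduces rows such as 64 ms giving 15.63.

```python
from decimal import Decimal, ROUND_HALF_UP


def tokens_per_second(latency_ms: str) -> Decimal:
    """Convert a per-token latency in milliseconds to tokens per second.

    Assumption: throughput = 1000 / latency_ms, rounded half-up to two
    decimals. This is inferred from the table rows, not documented.
    """
    latency = Decimal(latency_ms)
    if latency <= 0:
        # Non-positive latencies cannot be converted to a throughput.
        raise ValueError("latency must be positive")
    return (Decimal(1000) / latency).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )


# Values taken from rows in the tables on this page.
print(tokens_per_second("98.4"))  # 10.16
print(tokens_per_second("64"))    # 15.63
```

Passing the latency as a string keeps the decimal value exact, which avoids binary floating-point rounding surprises on values that fall exactly on a half, such as 15.625.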
Topology | Precision | Input Size | 1st latency (ms) | 2nd latency (ms) | 2nd token per sec |
|---|---|---|---|---|---|
distil-large-v2 | INT4-MIXED | prompt0 | 166.7 | 5.9 | 169.49 |
distil-large-v2 | INT4-MIXED | prompt1 | 211.5 | 5.9 | 169.49 |
distil-large-v2 | INT8-CW | prompt0 | 163.9 | 6 | 166.67 |
distil-large-v2 | INT8-CW | prompt1 | 206.9 | 6 | 166.67 |
whisper-large-v3-turbo | INT8-CW | prompt1 | 225.5 | 6.3 | 158.73 |
whisper-large-v3-turbo | INT8-CW | prompt0 | 184.1 | 6.9 | 144.93 |
whisper-large-v3-turbo | INT4-MIXED | prompt0 | 183 | 7 | 142.86 |
whisper-large-v3-turbo | INT4-MIXED | prompt1 | 232.7 | 7 | 142.86 |
minicpm4-0.5b | INT4-MIXED | 32 | 28.3 | 7.3 | 136.99 |
minicpm4-0.5b | INT4-MIXED | 1024 | 44.1 | 7.4 | 135.14 |
minicpm4-0.5b | INT4-MIXED | 32 | 28.4 | 7.4 | 135.14 |
tiny-llama-1.1b-chat | INT4-MIXED | 32 | 24.9 | 7.4 | 135.14 |
minicpm4-0.5b | INT4-MIXED | 1024 | 45.6 | 7.5 | 133.33 |
minicpm4-0.5b | INT8-CW | 32 | 30.5 | 7.6 | 131.58 |
minicpm4-0.5b | INT4-MIXED | 1024 | 53.1 | 7.7 | 129.87 |
minicpm4-0.5b | INT4-MIXED | 32 | 29.9 | 7.7 | 129.87 |
qwen2.5-coder-0.5b-instruct | INT4-MIXED | 32 | 28.4 | 7.7 | 129.87 |
gemma-3-270m | INT4-MIXED | 1024 | 33.7 | 7.8 | 128.21 |
minicpm4-0.5b | INT8-CW | 1024 | 52.4 | 7.8 | 128.21 |
qwen2.5-coder-0.5b-instruct | INT4-MIXED | 1024 | 46.2 | 7.8 | 128.21 |
qwen2.5-coder-0.5b-instruct | INT4-MIXED | 1024 | 55 | 8 | 125.00 |
qwen2.5-coder-0.5b-instruct | INT4-MIXED | 32 | 29.9 | 8 | 125.00 |
tiny-llama-1.1b-chat | INT4-MIXED | 32 | 26.4 | 8 | 125.00 |
tiny-llama-1.1b-chat | INT4-MIXED | 1024 | 87.4 | 8 | 125.00 |
gemma-3-270m | INT4-MIXED | 32 | 29.2 | 8.1 | 123.46 |
distil-large-v2 | FP16 | prompt0 | 164.7 | 8.2 | 121.95 |
distil-large-v2 | FP16 | prompt1 | 210 | 8.2 | 121.95 |
gemma-3-270m | FP16 | 32 | 25.4 | 8.2 | 121.95 |
gemma-3-270m | FP16 | 1024 | 32 | 8.3 | 120.48 |
qwen2.5-coder-0.5b-instruct | INT4-MIXED | 32 | 28.7 | 8.5 | 117.65 |
qwen2.5-coder-0.5b-instruct | INT8-CW | 1024 | 53.2 | 8.6 | 116.28 |
qwen2.5-coder-0.5b-instruct | INT4-MIXED | 1024 | 45.7 | 8.7 | 114.94 |
tiny-llama-1.1b-chat | INT4-MIXED | 1024 | 112.5 | 8.7 | 114.94 |
whisper-large-v3-turbo | FP16 | prompt0 | 195.6 | 8.7 | 114.94 |
gemma-3-270m | INT8-CW | 1024 | 34.6 | 8.8 | 113.64 |
qwen2.5-coder-0.5b-instruct | INT8-CW | 32 | 30.8 | 8.8 | 113.64 |
whisper-large-v3-turbo | FP16 | prompt1 | 251.5 | 8.9 | 112.36 |
gemma-3-270m | INT8-CW | 32 | 29.1 | 9.2 | 108.70 |
llama-3.2-1b-instruct | INT4-MIXED | 32 | 21.5 | 9.5 | 105.26 |
llama-3.2-1b-instruct | INT4-MIXED | 32 | 22 | 9.6 | 104.17 |
llama-3.2-1b-instruct | INT4-MIXED | 1024 | 84 | 10 | 100.00 |
llama-3.2-1b-instruct | INT4-MIXED | 1024 | 98.2 | 10.2 | 98.04 |
whisper-small | INT4-MIXED | prompt0 | 94.8 | 10.3 | 97.09 |
whisper-small | INT4-MIXED | prompt1 | 137.9 | 10.5 | 95.24 |
minicpm4-0.5b | FP16 | 32 | 24.8 | 10.9 | 91.74 |
whisper-small | INT4-MIXED | prompt0 | 99.6 | 10.9 | 91.74 |
whisper-small | INT4-MIXED | prompt1 | 145.1 | 10.9 | 91.74 |
whisper-small | INT8-CW | prompt0 | 99.5 | 11 | 90.91 |
whisper-small | INT8-CW | prompt1 | 143.3 | 11 | 90.91 |
minicpm4-0.5b | FP16 | 1024 | 60.3 | 11.3 | 88.50 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 34.1 | 11.6 | 86.21 |
qwen2.5-coder-1.5b-instruct | INT4-MIXED | 32 | 32.8 | 11.6 | 86.21 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 33.6 | 11.8 | 84.75 |
qwen2.5-coder-1.5b-instruct | INT4-MIXED | 32 | 33.7 | 11.8 | 84.75 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 33.8 | 11.9 | 84.03 |
gemma-3-1b-it | INT4-MIXED | 1024 | 85.5 | 12 | 83.33 |
gemma-3-1b-it | INT4-MIXED | 1024 | 87.6 | 12 | 83.33 |
nanollava | INT4-MIXED | 760 | 98.8 | 12.1 | 82.64 |
qwen2.5-coder-0.5b-instruct | FP16 | 32 | 27 | 12.1 | 82.64 |
qwen2.5-coder-1.5b-instruct | INT4-MIXED | 32 | 36 | 12.1 | 82.64 |
tiny-llama-1.1b-chat | INT8-CW | 32 | 27.6 | 12.1 | 82.64 |
whisper-small | FP16 | prompt0 | 96.9 | 12.1 | 82.64 |
whisper-small | FP16 | prompt1 | 139.5 | 12.1 | 82.64 |
nanollava | INT8-CW | 760 | 94.7 | 12.2 | 81.97 |
nanollava | INT4-MIXED | 1752 | 190.2 | 12.3 | 81.30 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 135.5 | 12.3 | 81.30 |
qwen2.5-coder-0.5b-instruct | FP16 | 1024 | 54 | 12.3 | 81.30 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 137.1 | 12.4 | 80.65 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 135.8 | 12.4 | 80.65 |
qwen2.5-coder-1.5b-instruct | INT4-MIXED | 1024 | 125.8 | 12.4 | 80.65 |
nanollava | INT8-CW | 1752 | 184.1 | 12.5 | 80.00 |
qwen2.5-coder-1.5b-instruct | INT4-MIXED | 1024 | 135.7 | 12.6 | 79.37 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 38.1 | 12.7 | 78.74 |
tiny-llama-1.1b-chat | INT8-CW | 1024 | 100.8 | 12.7 | 78.74 |
qwen2.5-coder-1.5b-instruct | INT4-MIXED | 1024 | 143.5 | 12.8 | 78.13 |
gemma-3-1b-it | INT4-MIXED | 1024 | 93.1 | 12.9 | 77.52 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 36.6 | 12.9 | 77.52 |
nanollava | FP16 | 760 | 104.7 | 13 | 76.92 |
gemma-3-1b-it | INT4-MIXED | 32 | 38.6 | 13.1 | 76.34 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 144 | 13.3 | 75.19 |
whisper-small | INT4-MIXED | prompt1 | 147.6 | 13.3 | 75.19 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 142.8 | 13.5 | 74.07 |
gemma-3-1b-it | INT4-MIXED | 32 | 37 | 13.6 | 73.53 |
whisper-small | INT4-MIXED | prompt0 | 102.7 | 13.6 | 73.53 |
gemma-3-1b-it | INT4-MIXED | 32 | 39.5 | 14.1 | 70.92 |
gemma-3-1b-it | INT8-CW | 1024 | 99.1 | 14.3 | 69.93 |
nanollava | FP16 | 1752 | 193 | 14.3 | 69.93 |
llama-3.2-1b-instruct | INT8-CW | 32 | 23.2 | 14.7 | 68.03 |
gemma-3-1b-it | INT8-CW | 32 | 43.5 | 14.9 | 67.11 |
glm-edge-1.5b-chat | INT4-MIXED | 1024 | 205 | 15 | 66.67 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 37.5 | 15.1 | 66.23 |
glm-edge-1.5b-chat | INT4-MIXED | 32 | 56.1 | 15.1 | 66.23 |
llama-3.2-1b-instruct | INT8-CW | 1024 | 101.5 | 15.5 | 64.52 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 155.6 | 15.9 | 62.89 |
minicpm-1b-sft | INT4-MIXED | 31 | 59.6 | 16.5 | 60.61 |
minicpm-1b-sft | INT4-MIXED | 1014 | 137.9 | 16.8 | 59.52 |
minicpm-1b-sft | INT4-MIXED | 31 | 60.3 | 17 | 58.82 |
minicpm-1b-sft | INT4-MIXED | 1014 | 161.3 | 17.1 | 58.48 |
qwen3.5-0.8b | INT4-MIXED | 85 | 147.3 | 17.3 | 57.80 |
qwen3.5-0.8b | INT4-MIXED | 1024 | 281 | 17.5 | 57.14 |
gemma-2b-it | INT4-MIXED | 32 | 26.9 | 17.8 | 56.18 |
deepseek-r1-distill-qwen-1.5b | INT8-CW | 32 | 38.3 | 17.9 | 55.87 |
qwen3.5-0.8b | INT4-MIXED | 1024 | 294.4 | 18 | 55.56 |
qwen2.5-1.5b-instruct | INT8-CW | 32 | 38 | 18.1 | 55.25 |
minicpm-1b-sft | INT4-MIXED | 1014 | 134 | 18.2 | 54.95 |
minicpm-1b-sft | INT4-MIXED | 31 | 60.6 | 18.2 | 54.95 |
qwen2.5-1.5b-instruct | INT8-CW | 32 | 38.3 | 18.2 | 54.95 |
qwen2.5-coder-1.5b-instruct | INT8-CW | 32 | 37.9 | 18.2 | 54.95 |
qwen3.5-0.8b | INT4-MIXED | 85 | 158 | 18.2 | 54.95 |
glm-edge-1.5b-chat | INT8-CW | 32 | 56.3 | 18.5 | 54.05 |
qwen3.5-0.8b | INT8-CW | 1024 | 296.7 | 18.5 | 54.05 |
gemma-2b-it | INT4-MIXED | 1024 | 171 | 18.6 | 53.76 |
phi-2 | INT4-MIXED | 32 | 60.2 | 18.6 | 53.76 |
gemma-2b-it | INT4-MIXED | 32 | 32.2 | 18.7 | 53.48 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 59.5 | 18.7 | 53.48 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 59.6 | 18.7 | 53.48 |
deepseek-r1-distill-qwen-1.5b | INT8-CW | 1024 | 132.6 | 19 | 52.63 |
minicpm-1b-sft | INT8-CW | 31 | 66.5 | 19.1 | 52.36 |
qwen2.5-1.5b-instruct | INT8-CW | 1024 | 139.8 | 19.1 | 52.36 |
qwen2.5-1.5b-instruct | INT8-CW | 1024 | 145.4 | 19.1 | 52.36 |
qwen2.5-coder-1.5b-instruct | INT8-CW | 1024 | 141.9 | 19.1 | 52.36 |
qwen3.5-0.8b | INT4-MIXED | 1024 | 292.5 | 19.1 | 52.36 |
qwen3.5-0.8b | INT8-CW | 85 | 158.7 | 19.3 | 51.81 |
gemma-2b-it | INT4-MIXED | 1024 | 203.6 | 19.4 | 51.55 |
glm-edge-1.5b-chat | INT8-CW | 1024 | 211.3 | 19.7 | 50.76 |
gemma-2-2b | INT4-MIXED | 33 | 39 | 20 | 50.00 |
phi-2 | INT4-MIXED | 32 | 63.3 | 20.2 | 49.50 |
minicpm-1b-sft | INT8-CW | 1014 | 150.6 | 20.4 | 49.02 |
qwen3.5-0.8b | INT4-MIXED | 85 | 150.1 | 20.5 | 48.78 |
gemma-2-2b | INT4-MIXED | 33 | 44.8 | 20.9 | 47.85 |
phi-2 | INT4-MIXED | 1024 | 309.7 | 20.9 | 47.85 |
stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 303.8 | 21 | 47.62 |
stablelm-3b-4e1t | INT4-MIXED | 1024 | 304.5 | 21 | 47.62 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 59.4 | 21.2 | 47.17 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 53 | 21.2 | 47.17 |
gemma-2-2b | INT4-MIXED | 1025 | 209.2 | 21.3 | 46.95 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 74.1 | 21.3 | 46.95 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 67.9 | 21.3 | 46.95 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 185.2 | 21.4 | 46.73 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 43.9 | 21.7 | 46.08 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 75.8 | 22.1 | 45.25 |
gemma-2-2b | INT4-MIXED | 1025 | 227.1 | 22.2 | 45.05 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 193.9 | 22.2 | 45.05 |
phi-2 | INT4-MIXED | 1024 | 383.8 | 22.5 | 44.44 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 234.8 | 22.5 | 44.44 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 43.9 | 22.6 | 44.25 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 58.4 | 22.6 | 44.25 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 240.1 | 22.6 | 44.25 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 65.9 | 22.8 | 43.86 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 44.2 | 22.9 | 43.67 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 243.1 | 23.3 | 42.92 |
stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 336.7 | 23.5 | 42.55 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 49.5 | 23.7 | 42.19 |
gemma-3-1b-it | FP16 | 1024 | 108.2 | 23.8 | 42.02 |
phi-3-mini-128k-instruct | INT4-MIXED | 32 | 49 | 23.8 | 42.02 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 50 | 23.8 | 42.02 |
qwen3.5-0.8b | FP16 | 1024 | 294 | 23.8 | 42.02 |
qwen3.5-0.8b | FP16 | 85 | 122.2 | 23.8 | 42.02 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 427.9 | 24 | 41.67 |
gemma-3-1b-it | FP16 | 32 | 40.5 | 24.1 | 41.49 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 254 | 24.3 | 41.15 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 277.5 | 24.6 | 40.65 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 49.7 | 24.7 | 40.49 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 50.2 | 24.8 | 40.32 |
stablelm-3b-4e1t | INT4-MIXED | 1024 | 332.3 | 25 | 40.00 |
qwen2.5-1.5b-instruct | INT8-CW | 32 | 80.6 | 25.1 | 39.84 |
phi-3-mini-128k-instruct | INT4-MIXED | 32 | 50.7 | 25.2 | 39.68 |
qwen2.5-1.5b-instruct | INT8-CW | 1024 | 180.9 | 25.7 | 38.91 |
tiny-llama-1.1b-chat | FP16 | 32 | 30.3 | 26.2 | 38.17 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 373.5 | 26.5 | 37.74 |
phi-3-mini-128k-instruct | INT4-MIXED | 1024 | 375.3 | 26.6 | 37.59 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 372.5 | 26.7 | 37.45 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 50.1 | 26.8 | 37.31 |
qwen3-4b | INT4-MIXED | 32 | 58.6 | 26.8 | 37.31 |
tiny-llama-1.1b-chat | FP16 | 1024 | 113.8 | 26.9 | 37.17 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 53.5 | 27.1 | 36.90 |
internvl2-4b | INT4-MIXED | 297 | 221.1 | 27.3 | 36.63 |
internvl2-4b | INT4-MIXED | 297 | 215.9 | 27.5 | 36.36 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 63.7 | 27.5 | 36.36 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 391.6 | 27.6 | 36.23 |
qwen3-4b | INT4-MIXED | 32 | 60 | 27.6 | 36.23 |
phi-4-mini-instruct | INT4-MIXED | 32 | 64.5 | 27.7 | 36.10 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 391.1 | 28 | 35.71 |
gemma-4-e2b-it | INT4-MIXED | 274 | 426.9 | 28.1 | 35.59 |
afm-4.5b | INT4-MIXED | 32 | 51.9 | 28.2 | 35.46 |
phi-3-mini-128k-instruct | INT4-MIXED | 1024 | 488.9 | 28.2 | 35.46 |
gpt-oss-20b | INT4-MIXED | 32 | 207.2 | 28.3 | 35.34 |
phi-4-mini-instruct | INT4-MIXED | 32 | 64.5 | 28.4 | 35.21 |
gemma-2b-it | INT8-CW | 32 | 36.2 | 28.5 | 35.09 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 65.6 | 28.5 | 35.09 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 66.4 | 28.7 | 34.84 |
gemma-3-4b-it | INT4-MIXED | 32 | 82.9 | 28.9 | 34.60 |
gemma-4-e2b-it | INT4-MIXED | 274 | 457.9 | 28.9 | 34.60 |
gemma-2b-it | INT8-CW | 1024 | 211.4 | 29.4 | 34.01 |
phi-3.5-vision-instruct | INT4-MIXED | 802 | 489 | 29.5 | 33.90 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 375.3 | 29.5 | 33.90 |
qwen3-4b | INT4-MIXED | 1024 | 347 | 29.5 | 33.90 |
afm-4.5b | INT4-MIXED | 1024 | 391.1 | 29.7 | 33.67 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 524.3 | 29.7 | 33.67 |
gemma-3-4b-it | INT4-MIXED | 32 | 85.3 | 29.8 | 33.56 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 548.9 | 29.8 | 33.56 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 375.6 | 29.8 | 33.56 |
llama-3.2-1b-instruct | FP16 | 32 | 35 | 29.9 | 33.44 |
whisper-large-v3 | INT4-MIXED | prompt0 | 285.2 | 29.9 | 33.44 |
gemma-3-4b-it | INT4-MIXED | 32 | 86 | 30.1 | 33.22 |
qwen3-4b | INT4-MIXED | 1024 | 384 | 30.1 | 33.22 |
internvl2-4b | INT4-MIXED | 1027 | 588.9 | 30.2 | 33.11 |
glm-edge-4b-chat | INT4-MIXED | 32 | 95 | 30.3 | 33.00 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 387.9 | 30.3 | 33.00 |
phi-3.5-vision-instruct | INT4-MIXED | 1032 | 607.6 | 30.4 | 32.89 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 391.1 | 30.4 | 32.89 |
gemma-3-4b-it | INT4-MIXED | 1024 | 370.1 | 30.5 | 32.79 |
internvl2-4b | INT4-MIXED | 1027 | 589.7 | 30.5 | 32.79 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 407.8 | 30.5 | 32.79 |
llama-3.2-1b-instruct | FP16 | 1024 | 135.5 | 30.6 | 32.68 |
gpt-oss-20b | INT4-MIXED | 1024 | 671.5 | 30.7 | 32.57 |
whisper-large-v3 | INT4-MIXED | prompt1 | 333.6 | 30.7 | 32.57 |
gemma-2-2b | INT8-CW | 33 | 47.4 | 30.8 | 32.47 |
phi-2 | INT8-CW | 32 | 66.9 | 31.5 | 31.75 |
whisper-large-v3 | INT8-CW | prompt0 | 292.8 | 31.5 | 31.75 |
stable-zephyr-3b-dpo | INT8-CW | 32 | 65.3 | 31.6 | 31.65 |
stablelm-3b-4e1t | INT8-CW | 32 | 66.5 | 31.6 | 31.65 |
gemma-3-4b-it | INT4-MIXED | 1024 | 404.9 | 31.8 | 31.45 |
whisper-large-v3 | INT8-CW | prompt1 | 343.8 | 31.9 | 31.35 |
gemma-3-4b-it | INT4-MIXED | 1024 | 406.7 | 32.1 | 31.15 |
glm-edge-4b-chat | INT4-MIXED | 1024 | 571.8 | 32.4 | 30.86 |
minicpm-1b-sft | FP16 | 31 | 60.5 | 32.5 | 30.77 |
gemma-2-2b | INT8-CW | 1025 | 240.2 | 32.6 | 30.67 |
chatglm3-6b | INT4-MIXED | 32 | 45.1 | 33.1 | 30.21 |
gemma-4-e2b-it | INT4-MIXED | 1024 | 601.1 | 33.4 | 29.94 |
deepseek-r1-distill-qwen-1.5b | FP16 | 32 | 40.1 | 33.5 | 29.85 |
qwen2.5-coder-1.5b-instruct | FP16 | 32 | 40.6 | 33.5 | 29.85 |
qwen2.5-1.5b-instruct | FP16 | 32 | 40.1 | 33.6 | 29.76 |
stablelm-3b-4e1t | INT8-CW | 1024 | 340.3 | 33.8 | 29.59 |
gemma-4-e2b-it | INT4-MIXED | 1024 | 582.4 | 33.9 | 29.50 |
qwen2.5-coder-3b-instruct | INT8-CW | 32 | 58.2 | 33.9 | 29.50 |
minicpm-1b-sft | FP16 | 1014 | 184.3 | 34 | 29.41 |
stable-zephyr-3b-dpo | INT8-CW | 1024 | 334.3 | 34 | 29.41 |
phi-2 | INT8-CW | 1024 | 343.4 | 34.2 | 29.24 |
qwen2.5-1.5b-instruct | FP16 | 1024 | 191.3 | 34.5 | 28.99 |
qwen2.5-coder-1.5b-instruct | FP16 | 1024 | 165.6 | 34.6 | 28.90 |
deepseek-r1-distill-qwen-1.5b | FP16 | 1024 | 167.1 | 34.8 | 28.74 |
gpt-oss-20b | INT4-MIXED | 32 | 218.1 | 34.8 | 28.74 |
chatglm3-6b | INT4-MIXED | 1024 | 470.8 | 35.2 | 28.41 |
qwen2.5-coder-3b-instruct | INT8-CW | 1024 | 276.6 | 35.2 | 28.41 |
chatglm3-6b | INT4-MIXED | 32 | 49.7 | 35.4 | 28.25 |
gemma-4-e2b-it | INT8-CW | 274 | 461.1 | 35.6 | 28.09 |
phi-4-multimodal-instruct | INT4-MIXED | 578 | 462.4 | 35.6 | 28.09 |
phi-4-multimodal-instruct | INT4-MIXED | 786 | 542.7 | 35.6 | 28.09 |
whisper-large-v3 | FP16 | prompt0 | 276.7 | 35.8 | 27.93 |
whisper-large-v3 | FP16 | prompt1 | 315.9 | 36.2 | 27.62 |
gpt-oss-20b | INT4-MIXED | 1024 | 701.4 | 36.3 | 27.55 |
phi-4-multimodal-instruct | INT4-MIXED | 1362 | 1130.2 | 36.3 | 27.55 |
phi-4-multimodal-instruct | INT4-MIXED | 1570 | 1249.6 | 36.3 | 27.55 |
llama-2-7b-chat-hf | INT4-MIXED | 32 | 53.2 | 36.8 | 27.17 |
codellama-7b | INT4-MIXED | 32 | 52.4 | 36.9 | 27.10 |
chatglm3-6b | INT4-MIXED | 1024 | 558 | 37.2 | 26.88 |
llama-3.2-3b-instruct | INT8-CW | 32 | 46.8 | 37.3 | 26.81 |
llama-3.2-3b-instruct | INT8-CW | 1024 | 284.3 | 38.5 | 25.97 |
biomistral-7b-slerp | INT4-MIXED | 7 | 45.4 | 38.7 | 25.84 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 54 | 38.8 | 25.77 |
zephyr-7b-beta | INT4-MIXED | 32 | 54.1 | 38.8 | 25.77 |
llama-2-7b-chat-hf | INT4-MIXED | 32 | 55.6 | 38.9 | 25.71 |
neural-chat-7b-v3-3 | INT4-MIXED | 32 | 53.7 | 39.1 | 25.58 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 54.2 | 39.2 | 25.51 |
glm-edge-1.5b-chat | FP16 | 32 | 52.9 | 39.3 | 25.45 |
codellama-7b | INT4-MIXED | 32 | 55.9 | 39.4 | 25.38 |
qwen3_8b_eagle3 | INT4-MIXED | 32 | 65.7 | 39.7 | 25.19 |
falcon-7b-instruct | INT4-MIXED | 32 | 56.1 | 40.2 | 24.88 |
glm-edge-1.5b-chat | FP16 | 1024 | 218.3 | 40.4 | 24.75 |
llama-2-7b-chat-hf | INT4-MIXED | 1024 | 530.3 | 40.6 | 24.63 |
codellama-7b | INT4-MIXED | 1024 | 523.7 | 40.7 | 24.57 |
minicpm3-4b | INT4-MIXED | 32 | 178.7 | 40.7 | 24.57 |
phi-4-multimodal-instruct | INT4-MIXED | 578 | 524.2 | 40.8 | 24.51 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 59 | 40.9 | 24.45 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 58.7 | 40.9 | 24.45 |
qwen3-vl-4b-instruct | INT4-MIXED | 4937 | 6482 | 40.9 | 24.45 |
phi-4-multimodal-instruct | INT4-MIXED | 786 | 645.3 | 41 | 24.39 |
qwen3-vl-4b-instruct | INT4-MIXED | 4907 | 6208.2 | 41 | 24.39 |
stable-diffusion-xl-1.0-inpainting-0.1 | INT8-CW | 32 | 42.5 | 41 | 24.39 |
minicpm3-4b | INT4-MIXED | 32 | 174.9 | 41.1 | 24.33 |
biomistral-7b-slerp | INT4-MIXED | 7 | 46.8 | 41.3 | 24.21 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 545 | 41.3 | 24.21 |
neural-chat-7b-v3-3 | INT4-MIXED | 1024 | 540.6 | 41.3 | 24.21 |
zephyr-7b-beta | INT4-MIXED | 1024 | 535.1 | 41.3 | 24.21 |
mistral-7b-instruct-v0.1 | INT4-MIXED | 32 | 59.2 | 41.4 | 24.15 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 59 | 41.4 | 24.15 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 58.1 | 41.4 | 24.15 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 540.4 | 41.5 | 24.10 |
neural-chat-7b-v3-3 | INT4-MIXED | 32 | 58.5 | 41.5 | 24.10 |
minicpm3-4b | INT4-MIXED | 32 | 174.7 | 41.6 | 24.04 |
qwen3-vl-4b-instruct | INT4-MIXED | 4907 | 6404.3 | 41.8 | 23.92 |
qwen3-vl-4b-instruct | INT4-MIXED | 4907 | 6597.4 | 42 | 23.81 |
falcon-7b-instruct | INT4-MIXED | 1024 | 570.2 | 42.1 | 23.75 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 32 | 69.4 | 42.1 | 23.75 |
qwen3-vl-4b-instruct | INT4-MIXED | 4937 | 6814.4 | 42.3 | 23.64 |
qwen3-vl-4b-instruct | INT4-MIXED | 4937 | 6576.2 | 42.3 | 23.64 |
phi-4-multimodal-instruct | INT4-MIXED | 1570 | 1439.4 | 42.4 | 23.58 |
llama-2-7b-chat-hf | INT4-MIXED | 1024 | 585.5 | 42.5 | 23.53 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 56 | 42.5 | 23.53 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 56.8 | 42.5 | 23.53 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 57.2 | 42.6 | 23.47 |
phi-4-multimodal-instruct | INT4-MIXED | 1362 | 1299.1 | 42.6 | 23.47 |
llama-3-8b-instruct | INT4-MIXED | 32 | 60.4 | 42.9 | 23.31 |
gemma-4-e2b-it | INT8-CW | 1024 | 651.1 | 43 | 23.26 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 58.3 | 43 | 23.26 |
codellama-7b | INT4-MIXED | 1024 | 610.4 | 43.1 | 23.20 |
phi-3-mini-128k-instruct | INT8-CW | 32 | 56.9 | 43.1 | 23.20 |
phi-3-mini-4k-instruct | INT8-CW | 32 | 56.5 | 43.1 | 23.20 |
phi-3.5-mini-instruct | INT8-CW | 32 | 56.8 | 43.1 | 23.20 |
qwen3_8b_eagle3 | INT4-MIXED | 1024 | 637.5 | 43.2 | 23.15 |
stable-diffusion-xl-1.0-inpainting-0.1 | INT8-CW | 32 | 44.4 | 43.2 | 23.15 |
minicpm-v-2_6 | INT4-MIXED | 228 | 544.7 | 43.3 | 23.09 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 576.6 | 43.3 | 23.09 |
minicpm3-4b | INT4-MIXED | 1024 | 723.6 | 43.4 | 23.04 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 576.7 | 43.4 | 23.04 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 1024 | 489.2 | 43.4 | 23.04 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 59.8 | 43.5 | 22.99 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 60 | 43.5 | 22.99 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 60.3 | 43.6 | 22.94 |
minicpm-o-2_6 | INT4-MIXED | 238 | 559.6 | 43.6 | 22.94 |
qwen2-7b-instruct | INT4-MIXED | 32 | 59.5 | 43.6 | 22.94 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 486.4 | 43.6 | 22.94 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 489.8 | 43.7 | 22.88 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 59.9 | 43.8 | 22.83 |
mistral-7b-instruct-v0.1 | INT4-MIXED | 1025 | 695.9 | 43.9 | 22.78 |
neural-chat-7b-v3-3 | INT4-MIXED | 1024 | 648 | 43.9 | 22.78 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 60.4 | 44 | 22.73 |
minicpm3-4b | INT4-MIXED | 1024 | 860.6 | 44 | 22.73 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 647.7 | 44 | 22.73 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 644.5 | 44 | 22.73 |
qwen2-7b-instruct | INT4-MIXED | 32 | 60.2 | 44.1 | 22.68 |
qwen3-8b | INT4-MIXED | 32 | 71 | 44.1 | 22.68 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 60 | 44.2 | 22.62 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 32 | 59.7 | 44.3 | 22.57 |
bloomz-7b1 | INT4-MIXED | 32 | 60.8 | 44.4 | 22.52 |
minicpm3-4b | INT4-MIXED | 1024 | 756.3 | 44.5 | 22.47 |
minicpm4-8b | INT4-MIXED | 32 | 63.8 | 44.5 | 22.47 |
falcon-7b-instruct | INT4-MIXED | 32 | 62.5 | 44.7 | 22.37 |
llama-3-8b-instruct | INT4-MIXED | 32 | 62.6 | 44.7 | 22.37 |
minicpm-v-2_6 | INT4-MIXED | 228 | 554.2 | 44.7 | 22.37 |
phi-4-mini-reasoning | INT8-CW | 32 | 68.5 | 44.7 | 22.37 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 62.4 | 44.8 | 22.32 |
gemma-3-4b-it | INT8-CW | 32 | 89.8 | 44.9 | 22.27 |
llama-3-8b-instruct | INT4-MIXED | 32 | 62.9 | 45 | 22.22 |
phi-4-mini-instruct | INT8-CW | 32 | 69.1 | 45 | 22.22 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 550.2 | 45 | 22.22 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 544.7 | 45 | 22.22 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 547 | 45.1 | 22.17 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 527.9 | 45.1 | 22.17 |
minicpm-v-2_6 | INT4-MIXED | 228 | 579.9 | 45.1 | 22.17 |
qwen2-7b-instruct | INT4-MIXED | 1024 | 522.2 | 45.1 | 22.17 |
minicpm-o-2_6 | INT4-MIXED | 238 | 570 | 45.3 | 22.08 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 583.6 | 45.4 | 22.03 |
internvl2-4b | INT8-CW | 297 | 215.6 | 45.4 | 22.03 |
llama-3-8b-instruct | INT4-MIXED | 32 | 62.6 | 45.4 | 22.03 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 580.4 | 45.4 | 22.03 |
qwen2-7b-instruct | INT4-MIXED | 1024 | 587.9 | 45.5 | 21.98 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 61.3 | 45.6 | 21.93 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 544.8 | 45.8 | 21.83 |
phi-3-mini-128k-instruct | INT8-CW | 1024 | 490.4 | 45.8 | 21.83 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 1024 | 587.3 | 45.8 | 21.83 |
phi-3-mini-4k-instruct | INT8-CW | 1024 | 484.2 | 45.9 | 21.79 |
phi-3.5-mini-instruct | INT8-CW | 1024 | 487.7 | 45.9 | 21.79 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 589 | 46 | 21.74 |
qwen3-4b | INT8-CW | 32 | 64.1 | 46 | 21.74 |
minicpm-v-4_5 | INT4-MIXED | 217 | 593.1 | 46.1 | 21.69 |
phi-4-mini-instruct | INT8-CW | 1024 | 420.8 | 46.4 | 21.55 |
phi-4-mini-reasoning | INT8-CW | 1024 | 421 | 46.4 | 21.55 |
minicpm4-8b | INT4-MIXED | 1024 | 577.6 | 46.5 | 21.51 |
qwen3-8b | INT4-MIXED | 32 | 70.6 | 46.5 | 21.51 |
bloomz-7b1 | INT4-MIXED | 32 | 65.9 | 46.7 | 21.41 |
minicpm4-8b | INT4-MIXED | 32 | 70.4 | 46.8 | 21.37 |
qwen3-8b | INT4-MIXED | 1024 | 565.3 | 46.8 | 21.37 |
falcon-7b-instruct | INT4-MIXED | 1024 | 666.3 | 46.9 | 21.32 |
qwen3-8b | INT4-MIXED | 32 | 73.4 | 46.9 | 21.32 |
gemma-3-4b-it | INT8-CW | 1024 | 418.2 | 47 | 21.28 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 578.7 | 47.2 | 21.19 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 582.2 | 47.3 | 21.14 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 577 | 47.3 | 21.14 |
phi-3.5-vision-instruct | INT8-CW | 802 | 501.9 | 47.3 | 21.14 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 549.1 | 47.6 | 21.01 |
afm-4.5b | INT8-CW | 32 | 60.1 | 47.7 | 20.96 |
bloomz-7b1 | INT4-MIXED | 1024 | 686.7 | 47.7 | 20.96 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 568.9 | 48 | 20.83 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 648.1 | 48.2 | 20.75 |
minicpm4-8b | INT4-MIXED | 32 | 69.2 | 48.3 | 20.70 |
internvl2-4b | INT8-CW | 1027 | 606.8 | 48.4 | 20.66 |
afm-4.5b | INT8-CW | 1024 | 385.7 | 48.5 | 20.62 |
minicpm-v-4_5 | INT4-MIXED | 217 | 614.8 | 48.5 | 20.62 |
qwen3-4b | INT8-CW | 1024 | 407.4 | 48.6 | 20.58 |
phi-3.5-vision-instruct | INT8-CW | 1032 | 633.2 | 48.7 | 20.53 |
minicpm4-8b | INT4-MIXED | 1024 | 620.6 | 48.9 | 20.45 |
qwen3-8b | INT4-MIXED | 1024 | 600.7 | 49.1 | 20.37 |
qwen3-8b | INT4-MIXED | 1024 | 675.1 | 49.8 | 20.08 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 67.1 | 49.9 | 20.04 |
zephyr-7b-beta | INT4-MIXED | 32 | 64.6 | 49.9 | 20.04 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 32 | 150.7 | 50.1 | 19.96 |
bloomz-7b1 | INT4-MIXED | 1024 | 797 | 50.2 | 19.92 |
glm-edge-4b-chat | INT8-CW | 32 | 96 | 50.3 | 19.88 |
minicpm4-8b | INT4-MIXED | 1024 | 713.5 | 50.5 | 19.80 |
gemma-7b-it | INT4-MIXED | 32 | 71.7 | 51 | 19.61 |
baichuan2-7b-chat | INT4-MIXED | 32 | 66.5 | 51.1 | 19.57 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 1024 | 615.4 | 51.4 | 19.46 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 73.4 | 51.9 | 19.27 |
glm-edge-4b-chat | INT8-CW | 1024 | 586.2 | 52.4 | 19.08 |
zephyr-7b-beta | INT4-MIXED | 1024 | 570.5 | 52.4 | 19.08 |
glm-4-9b-chat-hf | INT4-MIXED | 1024 | 671.6 | 52.5 | 19.05 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 73 | 53.1 | 18.83 |
fara-7b | INT4-MIXED | 32 | 174.9 | 53.5 | 18.69 |
gemma-7b-it | INT4-MIXED | 32 | 76.1 | 53.9 | 18.55 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 32 | 150.8 | 54 | 18.52 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 1024 | 610.3 | 54.1 | 18.48 |
phi-4-multimodal-instruct | INT8-CW | 578 | 497.1 | 54.4 | 18.38 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 1024 | 674.8 | 54.6 | 18.32 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 32 | 153.9 | 54.8 | 18.25 |
phi-4-multimodal-instruct | INT8-CW | 786 | 602.7 | 54.9 | 18.21 |
baichuan2-7b-chat | INT4-MIXED | 1024 | 646.9 | 55 | 18.18 |
glm-4-9b-chat-hf | INT4-MIXED | 1024 | 703.5 | 55 | 18.18 |
gemma-7b-it | INT4-MIXED | 1024 | 645 | 55.1 | 18.15 |
qwen3.5-9b | INT4-MIXED | 83 | 251.7 | 55.1 | 18.15 |
ltx-video | INT4-MIXED | 11 | 55.7 | 55.2 | 18.12 |
llava-next-video-7b-hf | INT4-MIXED | 2945 | 3184.7 | 56.1 | 17.83 |
minicpm3-4b | INT8-CW | 32 | 200.7 | 56.1 | 17.83 |
fara-7b | INT4-MIXED | 1024 | 839 | 56.2 | 17.79 |
glm-4-9b-chat-hf | INT4-MIXED | 1024 | 785.2 | 56.2 | 17.79 |
phi-4-multimodal-instruct | INT8-CW | 1362 | 1239.2 | 56.2 | 17.79 |
qwen3.5-9b | INT4-MIXED | 1024 | 1000.9 | 56.3 | 17.76 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 73.4 | 56.4 | 17.73 |
phi-4-multimodal-instruct | INT8-CW | 1570 | 1378 | 56.5 | 17.70 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 73.2 | 56.6 | 17.67 |
phi-2 | FP16 | 32 | 65.1 | 57.2 | 17.48 |
gemma-7b-it | INT4-MIXED | 1024 | 753 | 57.8 | 17.30 |
ltx-video | INT8-CW | 11 | 57.6 | 57.8 | 17.30 |
gemma-4-e2b-it | FP16 | 274 | 504.3 | 57.9 | 17.27 |
stable-zephyr-3b-dpo | FP16 | 32 | 65.1 | 58 | 17.24 |
gemma-2-9b-it | INT4-MIXED | 32 | 80.4 | 58.2 | 17.18 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 718.3 | 58.7 | 17.04 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 719.3 | 58.8 | 17.01 |
qwen3.5-9b | INT4-MIXED | 83 | 259.5 | 58.8 | 17.01 |
gemma-2-2b | FP16 | 33 | 64 | 58.9 | 16.98 |
stablelm-3b-4e1t | FP16 | 32 | 65 | 58.9 | 16.98 |
qwen3.5-9b | INT4-MIXED | 1024 | 1162.2 | 59 | 16.95 |
gemma-2-9b-it | INT4-MIXED | 32 | 82.7 | 59.8 | 16.72 |
qwen3-vl-4b-instruct | INT8-CW | 4907 | 6586.4 | 59.8 | 16.72 |
gemma-2-2b | FP16 | 1025 | 334.1 | 60.3 | 16.58 |
qwen3-vl-4b-instruct | INT8-CW | 4937 | 6840.8 | 60.4 | 16.56 |
phi-2 | FP16 | 1024 | 433.8 | 61.1 | 16.37 |
gemma-2-9b-it | INT4-MIXED | 32 | 80.7 | 61.2 | 16.34 |
stable-zephyr-3b-dpo | FP16 | 1024 | 411.6 | 61.7 | 16.21 |
minicpm3-4b | INT8-CW | 1024 | 910.9 | 61.8 | 16.18 |
stablelm-3b-4e1t | FP16 | 1024 | 411.9 | 61.8 | 16.18 |
gemma-2b-it | FP16 | 32 | 67.2 | 61.9 | 16.16 |
gemma-2-9b-it | INT4-MIXED | 1024 | 724.6 | 62.2 | 16.08 |
gemma-2b-it | FP16 | 1024 | 256.5 | 62.5 | 16.00 |
gemma-2-9b-it | INT4-MIXED | 1024 | 750.2 | 64.3 | 15.55 |
gemma-2-9b-it | INT4-MIXED | 1024 | 814.2 | 65.5 | 15.27 |
gemma-4-e2b-it | FP16 | 1024 | 739 | 65.6 | 15.24 |
lcm-dreamshaper-v7 | INT8-HYBRID | 1024 | 70.5 | 66.2 | 15.11 |
lcm-dreamshaper-v7 | INT8-HYBRID | 32 | 69.7 | 66.6 | 15.02 |
chatglm3-6b | INT8-CW | 32 | 80.5 | 66.9 | 14.95 |
llama-3.2-3b-instruct | FP16 | 32 | 72.5 | 67.6 | 14.79 |
qwen3_8b_eagle3 | INT8-CW | 1024 | 673.7 | 67.6 | 14.79 |
dolly-v2-12b | INT4-MIXED | 32 | 102.1 | 68 | 14.71 |
chatglm3-6b | INT8-CW | 1024 | 548.8 | 68.7 | 14.56 |
llama-3.2-3b-instruct | FP16 | 1024 | 403.5 | 69.4 | 14.41 |
llama-2-13b-chat-hf | INT4-MIXED | 32 | 96.6 | 69.9 | 14.31 |
qwen2.5-coder-3b-instruct | FP16 | 32 | 78 | 72.8 | 13.74 |
dolly-v2-12b | INT4-MIXED | 1024 | 1248.3 | 73 | 13.70 |
gemma-3-12b-it | INT4-MIXED | 32 | 138.4 | 73 | 13.70 |
qwen3_8b_eagle3 | INT8-CW | 32 | 105.5 | 73.1 | 13.68 |
falcon-7b-instruct | INT8-CW | 32 | 89.3 | 74.1 | 13.50 |
qwen2.5-coder-3b-instruct | FP16 | 1024 | 362.8 | 74.5 | 13.42 |
qwen2.5-7b-instruct | INT8-CW | 32 | 94.4 | 74.7 | 13.39 |
qwen2.5-7b-instruct-1m | INT8-CW | 32 | 94.4 | 74.7 | 13.39 |
qwen2.5-7b-instruct | INT8-CW | 32 | 95.2 | 74.9 | 13.35 |
qwen2-7b-instruct | INT8-CW | 32 | 94.8 | 75 | 13.33 |
llama-2-13b-chat-hf | INT4-MIXED | 1024 | 1015.5 | 75.5 | 13.25 |
deepseek-r1-distill-qwen-7b | INT8-CW | 32 | 94.9 | 75.8 | 13.19 |
minicpm-v-2_6 | INT8-CW | 228 | 551.9 | 75.9 | 13.18 |
falcon-7b-instruct | INT8-CW | 1024 | 814 | 76 | 13.16 |
codellama-7b | INT8-CW | 32 | 88.2 | 76.2 | 13.12 |
qwen2.5-7b-instruct | INT8-CW | 1024 | 581.9 | 76.2 | 13.12 |
qwen2-7b-instruct | INT8-CW | 1024 | 580.4 | 76.3 | 13.11 |
llama-2-7b-chat-hf | INT8-CW | 32 | 87.9 | 76.4 | 13.09 |
qwen2.5-7b-instruct | INT8-CW | 1024 | 581.1 | 76.4 | 13.09 |
gemma-3-12b-it | INT4-MIXED | 32 | 145.6 | 76.7 | 13.04 |
phi-3.5-mini-instruct | FP16 | 32 | 84.5 | 76.7 | 13.04 |
phi-3-mini-128k-instruct | FP16 | 32 | 84.7 | 76.8 | 13.02 |
gemma-3-12b-it | INT4-MIXED | 1024 | 1042.3 | 76.9 | 13.00 |
phi-3-mini-4k-instruct | FP16 | 32 | 84.8 | 76.9 | 13.00 |
deepseek-r1-distill-qwen-7b | INT8-CW | 1024 | 585.3 | 77.1 | 12.97 |
qwen2.5-7b-instruct-1m | INT8-CW | 1024 | 576.2 | 77.2 | 12.95 |
minicpm-o-2_6 | INT8-CW | 238 | 564.2 | 77.4 | 12.92 |
phi-4 | INT4-MIXED | 32 | 111.2 | 77.4 | 12.92 |
lcm-dreamshaper-v7 | INT8-CW | 1024 | 80.1 | 77.9 | 12.84 |
lcm-dreamshaper-v7 | INT8-CW | 32 | 79 | 78.1 | 12.80 |
phi-4-reasoning | INT4-MIXED | 32 | 114.1 | 78.9 | 12.67 |
internvl2-4b | FP16 | 297 | 283.7 | 79.1 | 12.64 |
codellama-7b | INT8-CW | 1024 | 580.4 | 79.6 | 12.56 |
qwen1.5-14b-chat | INT4-MIXED | 32 | 114.9 | 79.6 | 12.56 |
llama-2-7b-chat-hf | INT8-CW | 1024 | 593.3 | 79.7 | 12.55 |
lcm-dreamshaper-v7 | INT8-CW | 1024 | 80.4 | 79.8 | 12.53 |
lcm-dreamshaper-v7 | INT8-CW | 32 | 80.7 | 79.8 | 12.53 |
phi-4-mini-instruct | FP16 | 32 | 89.1 | 79.8 | 12.53 |
phi-4-mini-reasoning | FP16 | 32 | 89.6 | 79.9 | 12.52 |
gemma-3-12b-it | INT4-MIXED | 1024 | 1162.9 | 80.8 | 12.38 |
phi-3.5-vision-instruct | FP16 | 802 | 678.1 | 80.8 | 12.38 |
phi-4-reasoning | INT4-MIXED | 32 | 122.9 | 80.9 | 12.36 |
phi-3.5-mini-instruct | FP16 | 1024 | 575.1 | 81 | 12.35 |
mistral-7b-instruct-v0.1 | INT8-CW | 32 | 95.3 | 81.1 | 12.33 |
phi-3-mini-128k-instruct | FP16 | 1024 | 576.8 | 81.1 | 12.33 |
phi-4 | INT4-MIXED | 1024 | 1142.3 | 81.1 | 12.33 |
zephyr-7b-beta | INT8-CW | 32 | 94.4 | 81.1 | 12.33 |
neural-chat-7b-v3-3 | INT8-CW | 32 | 94.2 | 81.2 | 12.32 |
mistral-7b-instruct-v0.3 | INT8-CW | 32 | 94.1 | 81.3 | 12.30 |
mistral-7b-instruct-v0.2 | INT8-CW | 32 | 93.6 | 81.4 | 12.29 |
phi-3-mini-4k-instruct | FP16 | 1024 | 572.6 | 81.4 | 12.29 |
internvl2-4b | FP16 | 1027 | 850 | 81.6 | 12.25 |
phi-4 | INT4-MIXED | 32 | 123.8 | 81.6 | 12.25 |
phi-3.5-vision-instruct | FP16 | 1032 | 866.4 | 81.8 | 12.22 |
biomistral-7b-slerp | INT8-CW | 7 | 85.7 | 81.9 | 12.21 |
phi-4-mini-instruct | FP16 | 1024 | 552.3 | 82 | 12.20 |
phi-4-mini-reasoning | FP16 | 1024 | 557 | 82.2 | 12.17 |
phi-4-reasoning | INT4-MIXED | 1024 | 1161.8 | 82.7 | 12.09 |
baichuan2-7b-chat | INT8-CW | 32 | 93.4 | 82.8 | 12.08 |
gemma-3-4b-it | FP16 | 32 | 105.2 | 82.8 | 12.08 |
deepseek-r1-distill-qwen-14b | INT4-MIXED | 32 | 113.2 | 83.1 | 12.03 |
mistral-7b-instruct-v0.3 | INT8-CW | 1024 | 609.4 | 83.4 | 11.99 |
fara-7b | INT8-CW | 32 | 215.4 | 83.5 | 11.98 |
mistral-7b-instruct-v0.1 | INT8-CW | 1025 | 656.8 | 83.5 | 11.98 |
zephyr-7b-beta | INT8-CW | 1024 | 605.3 | 83.5 | 11.98 |
qwen2.5-vl-7b-instruct | INT8-CW | 32 | 191.2 | 83.7 | 11.95 |
mistral-7b-instruct-v0.2 | INT8-CW | 1024 | 606.4 | 83.8 | 11.93 |
neural-chat-7b-v3-3 | INT8-CW | 1024 | 603.8 | 83.8 | 11.93 |
bloomz-7b1 | INT8-CW | 32 | 96.2 | 84.3 | 11.86 |
gemma-3-4b-it | FP16 | 1024 | 550.8 | 84.3 | 11.86 |
fara-7b | INT8-CW | 1024 | 835 | 84.9 | 11.78 |
llama-3.1-8b-instruct | INT8-CW | 32 | 98.2 | 85 | 11.76 |
llama-3-8b-instruct | INT8-CW | 32 | 98.5 | 85.2 | 11.74 |
phi-4 | INT4-MIXED | 1024 | 1326.6 | 85.3 | 11.72 |
qwen1.5-14b-chat | INT4-MIXED | 1024 | 1166 | 85.4 | 11.71 |
baichuan2-7b-chat | INT8-CW | 1024 | 645.6 | 85.5 | 11.70 |
phi-4-reasoning | INT4-MIXED | 1024 | 1206.1 | 85.6 | 11.68 |
qwen3-4b | FP16 | 32 | 93.8 | 85.7 | 11.67 |
phi-4-reasoning | INT4-MIXED | 32 | 123.5 | 86.2 | 11.60 |
deepseek-r1-distill-llama-8b | INT8-CW | 32 | 98.5 | 86.3 | 11.59 |
qwen2.5-vl-7b-instruct | INT8-CW | 1024 | 697.1 | 86.7 | 11.53 |
bloomz-7b1 | INT8-CW | 1024 | 798.3 | 86.9 | 11.51 |
deepseek-r1-distill-qwen-14b | INT4-MIXED | 1024 | 1217.4 | 87.2 | 11.47 |
glm-edge-4b-chat | FP16 | 32 | 100.5 | 87.2 | 11.47 |
llama-3-8b-instruct | INT8-CW | 1024 | 619.3 | 87.6 | 11.42 |
llama-3.1-8b-instruct | INT8-CW | 1024 | 614.4 | 87.6 | 11.42 |
qwen3-4b | FP16 | 1024 | 553.4 | 88.5 | 11.30 |
deepseek-r1-distill-llama-8b | INT8-CW | 1024 | 616.3 | 88.8 | 11.26 |
llava-v1.6-mistral-7b-hf | INT8-CW | 2944 | 2916.3 | 88.8 | 11.26 |
starcoder | INT4-MIXED | 32 | 140.3 | 89.2 | 11.21 |
glm-edge-4b-chat | FP16 | 1024 | 704.6 | 89.7 | 11.15 |
llama-2-13b-chat-hf | INT4-MIXED | 32 | 120.4 | 89.7 | 11.15 |
phi-4-reasoning | INT4-MIXED | 1024 | 1371.4 | 89.9 | 11.12 |
starcoder2-15b | INT4-MIXED | 32 | 144.6 | 90.7 | 11.03 |
afm-4.5b | FP16 | 32 | 100.3 | 90.9 | 11.00 |
qwen3-8b | INT8-CW | 32 | 101.4 | 91.3 | 10.95 |
afm-4.5b | FP16 | 1024 | 498 | 91.8 | 10.89 |
minicpm-v-4_5 | INT8-CW | 217 | 586.8 | 91.9 | 10.88 |
starcoder | INT4-MIXED | 1024 | 1437.6 | 92.8 | 10.78 |
qwen3-8b | INT8-CW | 1024 | 657.4 | 92.9 | 10.76 |
llava-next-video-7b-hf | INT8-CW | 2945 | 3258.6 | 93.1 | 10.74 |
gemma-7b-it | INT8-CW | 32 | 114.5 | 93.2 | 10.73 |
llama-2-13b-chat-hf | INT4-MIXED | 1024 | 1204 | 93.9 | 10.65 |
minicpm4-8b | INT8-CW | 32 | 106.7 | 94.6 | 10.57 |
starcoder2-15b | INT4-MIXED | 1024 | 1455.4 | 94.6 | 10.57 |
phi-4-multimodal-instruct | FP16 | 578 | 629.7 | 94.7 | 10.56 |
phi-4-multimodal-instruct | FP16 | 786 | 767.6 | 95 | 10.53 |
minicpm3-4b | FP16 | 32 | 177.5 | 95.7 | 10.45 |
minicpm4-8b | INT8-CW | 1024 | 716.2 | 96.5 | 10.36 |
glm-4-9b-chat-hf | INT8-CW | 32 | 114.3 | 97.3 | 10.28 |
lcm-dreamshaper-v7 | FP16 | 32 | 86.1 | 97.3 | 10.28 |
gemma-7b-it | INT8-CW | 1024 | 757.2 | 97.7 | 10.24 |
lcm-dreamshaper-v7 | FP16 | 1024 | 94.1 | 97.7 | 10.24 |
phi-4-multimodal-instruct | FP16 | 1570 | 1675 | 97.9 | 10.21 |
phi-4-multimodal-instruct | FP16 | 1362 | 1524.4 | 98.6 | 10.14 |

Topology | Precision | Input Size | 1st latency (ms) | 2nd latency (ms) | 2nd token per sec |
|---|---|---|---|---|---|
minicpm4-0.5b | INT4-MIXED | 32 | 22 | 7.3 | 136.99 |
minicpm4-0.5b | INT4-MIXED | 32 | 21.9 | 7.5 | 133.33 |
minicpm4-0.5b | INT4-MIXED | 32 | 23.4 | 7.7 | 129.87 |
gemma-3-270m | INT4-MIXED | 1024 | 74.2 | 8.2 | 121.95 |
minicpm4-0.5b | INT4-MIXED | 1024 | 111.5 | 8.4 | 119.05 |
gemma-3-270m | INT4-MIXED | 32 | 23.7 | 8.5 | 117.65 |
minicpm4-0.5b | INT4-MIXED | 1024 | 116.9 | 8.6 | 116.28 |
minicpm4-0.5b | INT4-MIXED | 1024 | 142.4 | 8.8 | 113.64 |
gemma-3-270m | INT8-CW | 1024 | 71.1 | 9 | 111.11 |
qwen2.5-coder-0.5b-instruct | INT4-MIXED | 32 | 22.9 | 9 | 111.11 |
qwen2.5-coder-0.5b-instruct | INT4-MIXED | 32 | 24.1 | 9.1 | 109.89 |
gemma-3-270m | INT8-CW | 32 | 23.5 | 9.2 | 108.70 |
qwen2.5-coder-0.5b-instruct | INT4-MIXED | 32 | 24.1 | 9.3 | 107.53 |
qwen2.5-coder-0.5b-instruct | INT4-MIXED | 1024 | 116.7 | 9.8 | 102.04 |
qwen2.5-coder-0.5b-instruct | INT4-MIXED | 1024 | 118.7 | 10 | 100.00 |
qwen2.5-coder-0.5b-instruct | INT4-MIXED | 1024 | 147.3 | 10.1 | 99.01 |
minicpm4-0.5b | INT8-CW | 32 | 23.5 | 10.4 | 96.15 |
whisper-small | INT4-MIXED | prompt0 | 140.5 | 10.5 | 95.24 |
whisper-small | INT4-MIXED | prompt1 | 196.1 | 11 | 90.91 |
whisper-small | INT4-MIXED | prompt1 | 204 | 11.4 | 87.72 |
minicpm4-0.5b | INT8-CW | 1024 | 129.2 | 11.5 | 86.96 |
whisper-small | INT4-MIXED | prompt0 | 143.7 | 11.5 | 86.96 |
whisper-small | INT4-MIXED | prompt0 | 157.4 | 11.6 | 86.21 |
whisper-small | INT8-CW | prompt1 | 209.3 | 11.6 | 86.21 |
whisper-small | INT4-MIXED | prompt1 | 223.1 | 11.7 | 85.47 |
whisper-small | INT8-CW | prompt0 | 170 | 11.8 | 84.75 |
qwen2.5-coder-0.5b-instruct | INT8-CW | 32 | 24.9 | 12.1 | 82.64 |
gemma-3-270m | FP16 | 32 | 23.6 | 12.3 | 81.30 |
nanollava | INT4-MIXED | 760 | 250.2 | 12.4 | 80.65 |
whisper-large-v3-turbo | INT8-CW | prompt1 | 484.9 | 12.4 | 80.65 |
whisper-large-v3-turbo | INT4-MIXED | prompt1 | 508.3 | 12.7 | 78.74 |
whisper-large-v3-turbo | INT8-CW | prompt0 | 421.2 | 12.7 | 78.74 |
whisper-large-v3-turbo | INT4-MIXED | prompt0 | 452.6 | 12.8 | 78.13 |
gemma-3-270m | FP16 | 1024 | 79.8 | 12.9 | 77.52 |
qwen2.5-coder-0.5b-instruct | INT8-CW | 1024 | 135.3 | 12.9 | 77.52 |
tiny-llama-1.1b-chat | INT4-MIXED | 32 | 30 | 13.2 | 75.76 |
nanollava | INT8-CW | 760 | 239.5 | 13.7 | 72.99 |
qwen3.5-0.8b | INT4-MIXED | 85 | 92.7 | 14.3 | 69.93 |
nanollava | INT4-MIXED | 1752 | 485.2 | 14.4 | 69.44 |
qwen3.5-0.8b | INT4-MIXED | 1024 | 380.5 | 14.4 | 69.44 |
qwen3.5-0.8b | INT4-MIXED | 85 | 92.4 | 14.4 | 69.44 |
whisper-small | FP16 | prompt0 | 143.4 | 14.6 | 68.49 |
whisper-small | FP16 | prompt1 | 204.3 | 14.7 | 68.03 |
qwen3.5-0.8b | INT4-MIXED | 85 | 97.4 | 14.8 | 67.57 |
qwen3.5-0.8b | INT4-MIXED | 1024 | 384.7 | 14.8 | 67.57 |
qwen3.5-0.8b | INT4-MIXED | 1024 | 434.7 | 15.1 | 66.23 |
tiny-llama-1.1b-chat | INT4-MIXED | 1024 | 240.6 | 15.1 | 66.23 |
nanollava | INT8-CW | 1752 | 458.6 | 15.8 | 63.29 |
tiny-llama-1.1b-chat | INT4-MIXED | 32 | 34.3 | 16 | 62.50 |
llama-3.2-1b-instruct | INT4-MIXED | 32 | 33.1 | 16.4 | 60.98 |
whisper-large-v3-turbo | FP16 | prompt0 | 454.7 | 16.4 | 60.98 |
whisper-large-v3-turbo | FP16 | prompt1 | 521.8 | 16.4 | 60.98 |
llama-3.2-1b-instruct | INT4-MIXED | 32 | 33.8 | 16.6 | 60.24 |
llama-3.2-1b-instruct | INT4-MIXED | 1024 | 226.8 | 17.7 | 56.50 |
tiny-llama-1.1b-chat | INT4-MIXED | 1024 | 320.6 | 17.9 | 55.87 |
gemma-3-1b-it | INT4-MIXED | 32 | 33.3 | 18 | 55.56 |
llama-3.2-1b-instruct | INT4-MIXED | 1024 | 271 | 18 | 55.56 |
qwen3.5-0.8b | INT8-CW | 85 | 98.6 | 18 | 55.56 |
qwen3.5-0.8b | INT8-CW | 1024 | 398.2 | 18.6 | 53.76 |
gemma-3-1b-it | INT4-MIXED | 1024 | 217.3 | 18.8 | 53.19 |
gemma-3-1b-it | INT4-MIXED | 32 | 34.9 | 18.8 | 53.19 |
gemma-3-1b-it | INT4-MIXED | 32 | 35 | 19 | 52.63 |
minicpm4-0.5b | FP16 | 32 | 23.5 | 19.4 | 51.55 |
gemma-3-1b-it | INT4-MIXED | 1024 | 220.5 | 19.7 | 50.76 |
gemma-3-1b-it | INT4-MIXED | 1024 | 253.8 | 19.9 | 50.25 |
minicpm-1b-sft | INT4-MIXED | 31 | 57.6 | 20 | 50.00 |
glm-edge-1.5b-chat | INT4-MIXED | 32 | 48.5 | 20.3 | 49.26 |
minicpm4-0.5b | FP16 | 1024 | 143.7 | 20.4 | 49.02 |
qwen2.5-coder-1.5b-instruct | INT4-MIXED | 32 | 41.3 | 20.7 | 48.31 |
minicpm-1b-sft | INT4-MIXED | 31 | 45.4 | 21 | 47.62 |
minicpm-1b-sft | INT4-MIXED | 31 | 47.7 | 21.4 | 46.73 |
tiny-llama-1.1b-chat | INT8-CW | 32 | 39.9 | 21.4 | 46.73 |
qwen2.5-coder-1.5b-instruct | INT4-MIXED | 32 | 45.5 | 21.7 | 46.08 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 45.3 | 21.8 | 45.87 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 44.7 | 21.8 | 45.87 |
nanollava | FP16 | 760 | 285.1 | 21.9 | 45.66 |
qwen2.5-coder-0.5b-instruct | FP16 | 32 | 27.5 | 22 | 45.45 |
qwen2.5-coder-1.5b-instruct | INT4-MIXED | 32 | 45.4 | 22 | 45.45 |
glm-edge-1.5b-chat | INT4-MIXED | 1024 | 559.9 | 22.1 | 45.25 |
qwen2.5-coder-1.5b-instruct | INT4-MIXED | 1024 | 325.4 | 22.2 | 45.05 |
qwen2.5-coder-0.5b-instruct | FP16 | 1024 | 156.5 | 22.5 | 44.44 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 357.9 | 23.2 | 43.10 |
minicpm-1b-sft | INT4-MIXED | 1014 | 358.6 | 23.2 | 43.10 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 359.1 | 23.2 | 43.10 |
qwen2.5-coder-1.5b-instruct | INT4-MIXED | 1024 | 360 | 23.2 | 43.10 |
qwen2.5-coder-1.5b-instruct | INT4-MIXED | 1024 | 377.7 | 23.4 | 42.74 |
tiny-llama-1.1b-chat | INT8-CW | 1024 | 272.3 | 23.4 | 42.74 |
qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 47.9 | 23.5 | 42.55 |
gemma-3-1b-it | INT8-CW | 32 | 44.5 | 23.9 | 41.84 |
nanollava | FP16 | 1752 | 512.2 | 23.9 | 41.84 |
minicpm-1b-sft | INT4-MIXED | 1014 | 369.9 | 24 | 41.67 |
minicpm-1b-sft | INT4-MIXED | 1014 | 420.4 | 24.7 | 40.49 |
gemma-3-1b-it | INT8-CW | 1024 | 258.7 | 24.8 | 40.32 |
qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 381.7 | 24.8 | 40.32 |
llama-3.2-1b-instruct | INT8-CW | 32 | 44.2 | 25.2 | 39.68 |
llama-3.2-1b-instruct | INT8-CW | 1024 | 260.8 | 26.7 | 37.45 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 62.6 | 30.8 | 32.47 |
qwen3.5-0.8b | FP16 | 85 | 100.3 | 31.3 | 31.95 |
glm-edge-1.5b-chat | INT8-CW | 32 | 62.8 | 31.7 | 31.55 |
minicpm-1b-sft | INT8-CW | 31 | 59 | 31.8 | 31.45 |
qwen3.5-0.8b | FP16 | 1024 | 421.9 | 31.8 | 31.45 |
deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 369.8 | 32.2 | 31.06 |
deepseek-r1-distill-qwen-1.5b | INT8-CW | 32 | 62.2 | 33.1 | 30.21 |
qwen2.5-1.5b-instruct | INT8-CW | 32 | 61.6 | 33.1 | 30.21 |
qwen2.5-coder-1.5b-instruct | INT8-CW | 32 | 62.6 | 33.2 | 30.12 |
glm-edge-1.5b-chat | INT8-CW | 1024 | 560.5 | 33.5 | 29.85 |
gemma-2b-it | INT4-MIXED | 32 | 54.6 | 33.7 | 29.67 |
phi-2 | INT4-MIXED | 32 | 65 | 33.7 | 29.67 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 63.6 | 34 | 29.41 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 64.4 | 34 | 29.41 |
deepseek-r1-distill-qwen-1.5b | INT8-CW | 1024 | 359.4 | 34.5 | 28.99 |
qwen2.5-1.5b-instruct | INT8-CW | 1024 | 361.9 | 34.5 | 28.99 |
qwen2.5-coder-1.5b-instruct | INT8-CW | 1024 | 360.5 | 34.5 | 28.99 |
minicpm-1b-sft | INT8-CW | 1014 | 410.9 | 34.9 | 28.65 |
gemma-2b-it | INT4-MIXED | 1024 | 400.7 | 35.3 | 28.33 |
gemma-2b-it | INT4-MIXED | 32 | 66.3 | 35.4 | 28.25 |
gemma-2-2b | INT4-MIXED | 33 | 57.3 | 37 | 27.03 |
gemma-2b-it | INT4-MIXED | 1024 | 493.6 | 37 | 27.03 |
whisper-large-v3 | INT4-MIXED | prompt0 | 579.9 | 37.4 | 26.74 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 78.8 | 38 | 26.32 |
whisper-large-v3 | INT4-MIXED | prompt1 | 642.5 | 38.2 | 26.18 |
phi-2 | INT4-MIXED | 1024 | 770 | 38.4 | 26.04 |
stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 835.6 | 38.7 | 25.84 |
stablelm-3b-4e1t | INT4-MIXED | 1024 | 837.3 | 38.8 | 25.77 |
gemma-2-2b | INT4-MIXED | 33 | 58.4 | 39.3 | 25.45 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 69.1 | 39.8 | 25.13 |
phi-2 | INT4-MIXED | 32 | 84.2 | 40 | 25.00 |
gemma-2-2b | INT4-MIXED | 1025 | 539.7 | 40.1 | 24.94 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 71.5 | 40.2 | 24.88 |
tiny-llama-1.1b-chat | FP16 | 32 | 49.9 | 40.7 | 24.57 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 82 | 40.9 | 24.45 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 81.6 | 40.9 | 24.45 |
llama-3.2-3b-instruct | INT4-MIXED | 32 | 81.9 | 41.5 | 24.10 |
tiny-llama-1.1b-chat | FP16 | 1024 | 336.1 | 42.2 | 23.70 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 536.7 | 42.3 | 23.64 |
gemma-2-2b | INT4-MIXED | 1025 | 616.3 | 42.4 | 23.58 |
stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 940.7 | 42.7 | 23.42 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 589.4 | 42.8 | 23.36 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 88.9 | 42.9 | 23.31 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 556 | 43.2 | 23.15 |
gemma-3-1b-it | FP16 | 32 | 52.9 | 43.3 | 23.09 |
whisper-large-v3 | INT8-CW | prompt0 | 551.3 | 43.3 | 23.09 |
whisper-large-v3 | INT8-CW | prompt1 | 608.2 | 43.8 | 22.83 |
gemma-3-1b-it | FP16 | 1024 | 307.9 | 43.9 | 22.78 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 622.1 | 43.9 | 22.78 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 80.2 | 44.2 | 22.62 |
phi-3-mini-128k-instruct | INT4-MIXED | 32 | 81.5 | 44.3 | 22.57 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 79.1 | 44.3 | 22.57 |
llama-3.2-3b-instruct | INT4-MIXED | 1024 | 706.3 | 44.4 | 22.52 |
phi-2 | INT4-MIXED | 1024 | 1016.2 | 44.7 | 22.37 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 96.5 | 45.6 | 21.93 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 96.8 | 45.7 | 21.88 |
phi-3-mini-128k-instruct | INT4-MIXED | 32 | 101.7 | 46.3 | 21.60 |
gemma-4-e2b-it | INT4-MIXED | 274 | 816.8 | 47 | 21.28 |
stablelm-3b-4e1t | INT4-MIXED | 1024 | 952.3 | 47.6 | 21.01 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 95.5 | 48 | 20.83 |
llama-3.2-1b-instruct | FP16 | 32 | 57.7 | 48.4 | 20.66 |
internvl2-4b | INT4-MIXED | 297 | 496.7 | 48.8 | 20.49 |
gemma-4-e2b-it | INT4-MIXED | 274 | 870.6 | 48.9 | 20.45 |
internvl2-4b | INT4-MIXED | 297 | 535.7 | 49.3 | 20.28 |
phi-4-mini-instruct | INT4-MIXED | 32 | 85.9 | 49.6 | 20.16 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 88.8 | 49.7 | 20.12 |
llama-3.2-1b-instruct | FP16 | 1024 | 331.3 | 49.9 | 20.04 |
qwen3-4b | INT4-MIXED | 32 | 88.3 | 50.1 | 19.96 |
qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 1029.7 | 50.3 | 19.88 |
gemma-2b-it | INT8-CW | 32 | 88.5 | 50.5 | 19.80 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 910.5 | 50.6 | 19.76 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 911.3 | 50.6 | 19.76 |
phi-3-mini-128k-instruct | INT4-MIXED | 1024 | 911.1 | 50.7 | 19.72 |
phi-4-mini-instruct | INT4-MIXED | 32 | 102 | 50.8 | 19.69 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 100.3 | 51 | 19.61 |
phi-4-mini-reasoning | INT4-MIXED | 32 | 101.8 | 51.3 | 19.49 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 948 | 51.9 | 19.27 |
qwen3-4b | INT4-MIXED | 32 | 104.1 | 51.9 | 19.27 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 951.2 | 52.1 | 19.19 |
gemma-2b-it | INT8-CW | 1024 | 483.4 | 52.2 | 19.16 |
gemma-3-4b-it | INT4-MIXED | 32 | 96.1 | 52.3 | 19.12 |
glm-edge-4b-chat | INT4-MIXED | 32 | 119.6 | 52.6 | 19.01 |
phi-3-mini-128k-instruct | INT4-MIXED | 1024 | 1162.5 | 52.7 | 18.98 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 905.2 | 52.9 | 18.90 |
phi-3.5-vision-instruct | INT4-MIXED | 802 | 1189 | 53 | 18.87 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 905.2 | 53 | 18.87 |
gemma-3-4b-it | INT4-MIXED | 32 | 105.3 | 53.8 | 18.59 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 115.2 | 53.9 | 18.55 |
afm-4.5b | INT4-MIXED | 32 | 104.8 | 54.1 | 18.48 |
phi-3.5-mini-instruct | INT4-MIXED | 32 | 117.6 | 54.1 | 18.48 |
gemma-3-4b-it | INT4-MIXED | 32 | 106.2 | 54.2 | 18.45 |
phi-4-mini-instruct | INT4-MIXED | 1024 | 938.4 | 54.2 | 18.45 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 943.8 | 54.3 | 18.42 |
internvl2-4b | INT4-MIXED | 1027 | 1355.1 | 54.6 | 18.32 |
phi-4-mini-reasoning | INT4-MIXED | 1024 | 1019.6 | 54.7 | 18.28 |
internvl2-4b | INT4-MIXED | 1027 | 1381.9 | 54.9 | 18.21 |
phi-3.5-vision-instruct | INT4-MIXED | 1032 | 1423.9 | 54.9 | 18.21 |
qwen3-4b | INT4-MIXED | 1024 | 822.2 | 55 | 18.18 |
phi-2 | INT8-CW | 32 | 105.5 | 55.7 | 17.95 |
gemma-3-4b-it | INT4-MIXED | 1024 | 869.7 | 55.9 | 17.89 |
gemma-2-2b | INT8-CW | 33 | 106 | 56 | 17.86 |
stable-zephyr-3b-dpo | INT8-CW | 32 | 106 | 56.3 | 17.76 |
stablelm-3b-4e1t | INT8-CW | 32 | 106.1 | 56.3 | 17.76 |
qwen3-4b | INT4-MIXED | 1024 | 860.8 | 56.5 | 17.70 |
glm-edge-4b-chat | INT4-MIXED | 1024 | 1419.2 | 56.7 | 17.64 |
minicpm-1b-sft | FP16 | 31 | 68 | 56.8 | 17.61 |
afm-4.5b | INT4-MIXED | 1024 | 923.1 | 57.4 | 17.42 |
gemma-3-4b-it | INT4-MIXED | 1024 | 892.1 | 57.5 | 17.39 |
gemma-3-4b-it | INT4-MIXED | 1024 | 957.5 | 57.9 | 17.27 |
glm-edge-1.5b-chat | FP16 | 32 | 75.7 | 58.5 | 17.09 |
gemma-2-2b | INT8-CW | 1025 | 680.4 | 59 | 16.95 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 1284.3 | 60.2 | 16.61 |
minicpm-1b-sft | FP16 | 1014 | 494.4 | 60.3 | 16.58 |
minicpm3-4b | INT4-MIXED | 32 | 121.2 | 60.3 | 16.58 |
phi-2 | INT8-CW | 1024 | 889 | 60.4 | 16.56 |
phi-3.5-mini-instruct | INT4-MIXED | 1024 | 1397.6 | 60.4 | 16.56 |
stablelm-3b-4e1t | INT8-CW | 1024 | 949.6 | 60.9 | 16.42 |
stable-zephyr-3b-dpo | INT8-CW | 1024 | 961.6 | 61 | 16.39 |
glm-edge-1.5b-chat | FP16 | 1024 | 606.8 | 61.1 | 16.37 |
qwen2.5-coder-3b-instruct | INT8-CW | 32 | 112 | 61.6 | 16.23 |
deepseek-r1-distill-qwen-1.5b | FP16 | 32 | 74.6 | 61.9 | 16.16 |
qwen2.5-1.5b-instruct | FP16 | 32 | 75.3 | 61.9 | 16.16 |
qwen2.5-coder-1.5b-instruct | FP16 | 32 | 74.9 | 62 | 16.13 |
minicpm3-4b | INT4-MIXED | 32 | 131.1 | 62.4 | 16.03 |
minicpm3-4b | INT4-MIXED | 32 | 134.2 | 63 | 15.87 |
qwen2.5-1.5b-instruct | FP16 | 1024 | 452.8 | 63.2 | 15.82 |
qwen2.5-coder-1.5b-instruct | FP16 | 1024 | 455.1 | 63.2 | 15.82 |
deepseek-r1-distill-qwen-1.5b | FP16 | 1024 | 455.1 | 63.3 | 15.80 |
phi-4-multimodal-instruct | INT4-MIXED | 578 | 1062.4 | 63.6 | 15.72 |
qwen2.5-coder-3b-instruct | INT8-CW | 1024 | 668.2 | 63.7 | 15.70 |
whisper-large-v3 | FP16 | prompt0 | 588.1 | 64.1 | 15.60 |
phi-4-multimodal-instruct | INT4-MIXED | 786 | 1271.9 | 64.3 | 15.55 |
whisper-large-v3 | FP16 | prompt1 | 657.1 | 64.4 | 15.53 |
chatglm3-6b | INT4-MIXED | 32 | 112.2 | 65.4 | 15.29 |
llama-3.2-3b-instruct | INT8-CW | 32 | 114.2 | 65.7 | 15.22 |
phi-4-multimodal-instruct | INT4-MIXED | 1362 | 2567.3 | 66.4 | 15.06 |
gemma-4-e2b-it | INT4-MIXED | 1024 | 1346.4 | 66.5 | 15.04 |
phi-4-multimodal-instruct | INT4-MIXED | 1570 | 2829.5 | 67 | 14.93 |
chatglm3-6b | INT4-MIXED | 32 | 141.6 | 68.4 | 14.62 |
gemma-4-e2b-it | INT4-MIXED | 1024 | 1410 | 68.6 | 14.58 |
gemma-4-e2b-it | INT8-CW | 274 | 936.3 | 68.6 | 14.58 |
llama-3.2-3b-instruct | INT8-CW | 1024 | 740.9 | 68.6 | 14.58 |
chatglm3-6b | INT4-MIXED | 1024 | 1166.7 | 68.8 | 14.53 |
chatglm3-6b | INT4-MIXED | 1024 | 1265.5 | 71.3 | 14.03 |
minicpm3-4b | INT4-MIXED | 1024 | 2123 | 72.8 | 13.74 |
llama-2-7b-chat-hf | INT4-MIXED | 32 | 128.7 | 72.9 | 13.72 |
codellama-7b | INT4-MIXED | 32 | 124.7 | 73.2 | 13.66 |
minicpm3-4b | INT4-MIXED | 1024 | 2173.9 | 74.9 | 13.35 |
qwen3-vl-4b-instruct | INT4-MIXED | 4907 | 14047.4 | 75.4 | 13.26 |
qwen3-vl-4b-instruct | INT4-MIXED | 4937 | 14752.4 | 75.4 | 13.26 |
biomistral-7b-slerp | INT4-MIXED | 7 | 84.6 | 75.5 | 13.25 |
llama-2-7b-chat-hf | INT4-MIXED | 32 | 157.6 | 75.5 | 13.25 |
minicpm3-4b | INT4-MIXED | 1024 | 2302.7 | 75.5 | 13.25 |
qwen3_8b_eagle3 | INT4-MIXED | 32 | 192.1 | 75.9 | 13.18 |
falcon-7b-instruct | INT4-MIXED | 32 | 134.9 | 76.1 | 13.14 |
phi-4-multimodal-instruct | INT4-MIXED | 578 | 1310.8 | 76.1 | 13.14 |
stable-diffusion-xl-1.0-inpainting-0.1 | INT8-CW | 32 | 81.5 | 76.1 | 13.14 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 135.2 | 76.2 | 13.12 |
neural-chat-7b-v3-3 | INT4-MIXED | 32 | 135.7 | 76.2 | 13.12 |
zephyr-7b-beta | INT4-MIXED | 32 | 134 | 76.3 | 13.11 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 137.2 | 76.4 | 13.09 |
codellama-7b | INT4-MIXED | 32 | 158.8 | 76.5 | 13.07 |
phi-4-multimodal-instruct | INT4-MIXED | 786 | 1558.9 | 76.8 | 13.02 |
phi-3-mini-128k-instruct | INT8-CW | 32 | 142.2 | 77 | 12.99 |
qwen3-vl-4b-instruct | INT4-MIXED | 4907 | 14219.3 | 77 | 12.99 |
phi-3-mini-4k-instruct | INT8-CW | 32 | 144.4 | 77.1 | 12.97 |
phi-3.5-mini-instruct | INT8-CW | 32 | 144.1 | 77.1 | 12.97 |
qwen3-vl-4b-instruct | INT4-MIXED | 4937 | 14906.6 | 77.1 | 12.97 |
qwen3-vl-4b-instruct | INT4-MIXED | 4937 | 15704.7 | 78 | 12.82 |
qwen3-vl-4b-instruct | INT4-MIXED | 4907 | 14768.1 | 78.1 | 12.80 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 169 | 78.7 | 12.71 |
phi-4-mini-instruct | INT8-CW | 32 | 138.5 | 78.7 | 12.71 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 168.9 | 79 | 12.66 |
phi-4-mini-reasoning | INT8-CW | 32 | 139.1 | 79 | 12.66 |
neural-chat-7b-v3-3 | INT4-MIXED | 32 | 171.3 | 79.5 | 12.58 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 173.6 | 79.6 | 12.56 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 172.5 | 79.6 | 12.56 |
mistral-7b-instruct-v0.1 | INT4-MIXED | 32 | 173.5 | 79.7 | 12.55 |
llama-2-7b-chat-hf | INT4-MIXED | 1024 | 1176.4 | 79.8 | 12.53 |
phi-4-multimodal-instruct | INT4-MIXED | 1362 | 3183 | 79.9 | 12.52 |
codellama-7b | INT4-MIXED | 1024 | 1180.7 | 80 | 12.50 |
falcon-7b-instruct | INT4-MIXED | 1024 | 1270.5 | 80 | 12.50 |
gemma-3-4b-it | INT8-CW | 32 | 152.9 | 80 | 12.50 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 151.9 | 80.1 | 12.48 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 32 | 146.9 | 80.1 | 12.48 |
phi-4-multimodal-instruct | INT4-MIXED | 1570 | 3482.3 | 80.2 | 12.47 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 1270.3 | 80.4 | 12.44 |
zephyr-7b-beta | INT4-MIXED | 1024 | 1281.7 | 80.4 | 12.44 |
stable-diffusion-xl-1.0-inpainting-0.1 | INT8-CW | 32 | 83.7 | 80.5 | 12.42 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 1261.5 | 80.7 | 12.39 |
neural-chat-7b-v3-3 | INT4-MIXED | 1024 | 1267 | 80.7 | 12.39 |
qwen3-4b | INT8-CW | 32 | 147.6 | 81.4 | 12.29 |
biomistral-7b-slerp | INT4-MIXED | 7 | 91.1 | 81.7 | 12.24 |
minicpm-v-2_6 | INT4-MIXED | 228 | 1203.8 | 81.7 | 12.24 |
minicpm-o-2_6 | INT4-MIXED | 238 | 1197.9 | 81.9 | 12.21 |
internvl2-4b | INT8-CW | 297 | 624 | 82 | 12.20 |
phi-4-mini-instruct | INT8-CW | 1024 | 1055.2 | 82 | 12.20 |
llama-2-7b-chat-hf | INT4-MIXED | 1024 | 1282.3 | 82.3 | 12.15 |
phi-4-mini-reasoning | INT8-CW | 1024 | 1077 | 82.4 | 12.14 |
qwen2-7b-instruct | INT4-MIXED | 32 | 170.7 | 82.6 | 12.11 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 173.4 | 82.7 | 12.09 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 172.1 | 82.8 | 12.08 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 1360.8 | 82.9 | 12.06 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 1358.3 | 83.1 | 12.03 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 1228.6 | 83.1 | 12.03 |
phi-3-mini-128k-instruct | INT8-CW | 1024 | 1227.5 | 83.2 | 12.02 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 1024 | 1225.6 | 83.2 | 12.02 |
llama-3-8b-instruct | INT4-MIXED | 32 | 143.7 | 83.3 | 12.00 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 147.7 | 83.3 | 12.00 |
codellama-7b | INT4-MIXED | 1024 | 1412.5 | 83.4 | 11.99 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 174.2 | 83.4 | 11.99 |
phi-3-mini-4k-instruct | INT8-CW | 1024 | 1227.1 | 83.4 | 11.99 |
phi-3.5-mini-instruct | INT8-CW | 1024 | 1214.6 | 83.4 | 11.99 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 32 | 176.1 | 83.4 | 11.99 |
mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 1546.4 | 83.6 | 11.96 |
phi-3.5-vision-instruct | INT8-CW | 802 | 1269.2 | 83.7 | 11.95 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 143.7 | 83.8 | 11.93 |
gemma-3-4b-it | INT8-CW | 1024 | 1020 | 83.8 | 11.93 |
qwen2-7b-instruct | INT4-MIXED | 32 | 176 | 83.8 | 11.93 |
qwen2.5-7b-instruct | INT4-MIXED | 32 | 176.4 | 83.8 | 11.93 |
neural-chat-7b-v3-3 | INT4-MIXED | 1024 | 1544.8 | 83.9 | 11.92 |
mistral-7b-instruct-v0.1 | INT4-MIXED | 1025 | 1667.5 | 84 | 11.90 |
mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 1535.1 | 84 | 11.90 |
minicpm-v-2_6 | INT4-MIXED | 228 | 1231.1 | 84.3 | 11.86 |
qwen3_8b_eagle3 | INT4-MIXED | 1024 | 1632.2 | 84.7 | 11.81 |
phi-3.5-vision-instruct | INT8-CW | 1032 | 1567.2 | 85.3 | 11.72 |
qwen2-7b-instruct | INT4-MIXED | 1024 | 1303.6 | 85.5 | 11.70 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 1302.8 | 85.6 | 11.68 |
minicpm-v-2_6 | INT4-MIXED | 228 | 1232.2 | 85.6 | 11.68 |
minicpm-o-2_6 | INT4-MIXED | 238 | 1227.5 | 85.7 | 11.67 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 1294.4 | 85.7 | 11.67 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 175.4 | 85.9 | 11.64 |
llama-3-8b-instruct | INT4-MIXED | 32 | 178.4 | 86 | 11.63 |
qwen3-8b | INT4-MIXED | 32 | 146.8 | 86 | 11.63 |
llama-3-8b-instruct | INT4-MIXED | 32 | 175.9 | 86.1 | 11.61 |
llama-3.1-8b-instruct | INT4-MIXED | 32 | 174.7 | 86.1 | 11.61 |
qwen3-4b | INT8-CW | 1024 | 997.8 | 86.1 | 11.61 |
minicpm4-8b | INT4-MIXED | 32 | 157.6 | 86.3 | 11.59 |
deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 1480.4 | 86.4 | 11.57 |
qwen2.5-7b-instruct-1m | INT4-MIXED | 1024 | 1475.2 | 86.4 | 11.57 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 32 | 239.3 | 86.4 | 11.57 |
qwen2.5-7b-instruct | INT4-MIXED | 1024 | 1482.6 | 86.5 | 11.56 |
qwen2-7b-instruct | INT4-MIXED | 1024 | 1486.3 | 86.9 | 11.51 |
glm-edge-4b-chat | INT8-CW | 32 | 166.8 | 87 | 11.49 |
llama-3-8b-instruct | INT4-MIXED | 32 | 179 | 87 | 11.49 |
falcon-7b-instruct | INT4-MIXED | 32 | 186.2 | 87.6 | 11.42 |
minicpm-v-4_5 | INT4-MIXED | 217 | 1233.2 | 87.6 | 11.42 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 1274.9 | 87.7 | 11.40 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 1279.6 | 87.8 | 11.39 |
bloomz-7b1 | INT4-MIXED | 32 | 140.6 | 87.9 | 11.38 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 1305.5 | 87.9 | 11.38 |
qwen3-8b | INT4-MIXED | 32 | 184.7 | 88.8 | 11.26 |
gemma-4-e2b-it | INT8-CW | 1024 | 1452 | 89 | 11.24 |
minicpm4-8b | INT4-MIXED | 32 | 190.2 | 89.1 | 11.22 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 1024 | 1400.2 | 89.3 | 11.20 |
afm-4.5b | INT8-CW | 32 | 168.6 | 89.4 | 11.19 |
internvl2-4b | INT8-CW | 1027 | 1554.7 | 89.4 | 11.19 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 32 | 257.1 | 89.4 | 11.19 |
fara-7b | INT4-MIXED | 32 | 271.7 | 89.8 | 11.14 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 32 | 258.9 | 90 | 11.11 |
minicpm4-8b | INT4-MIXED | 32 | 190 | 90.1 | 11.10 |
deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 1377 | 90.3 | 11.07 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 1370.4 | 90.3 | 11.07 |
llama-3.1-8b-instruct | INT4-MIXED | 1024 | 1369.7 | 90.3 | 11.07 |
bloomz-7b1 | INT4-MIXED | 32 | 170.8 | 90.4 | 11.06 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 1481.2 | 90.4 | 11.06 |
minicpm4-8b | INT4-MIXED | 1024 | 1345.2 | 90.4 | 11.06 |
zephyr-7b-beta | INT4-MIXED | 32 | 181.7 | 90.6 | 11.04 |
glm-edge-4b-chat | INT8-CW | 1024 | 1488.2 | 91.1 | 10.98 |
llama-3-8b-instruct | INT4-MIXED | 1024 | 1562.5 | 91.3 | 10.95 |
minicpm-v-4_5 | INT4-MIXED | 217 | 1550.6 | 91.4 | 10.94 |
qwen3-8b | INT4-MIXED | 1024 | 1310.6 | 91.4 | 10.94 |
falcon-7b-instruct | INT4-MIXED | 1024 | 1604.6 | 91.6 | 10.92 |
qwen3-8b | INT4-MIXED | 32 | 189.4 | 91.9 | 10.88 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 1024 | 1465.9 | 92.2 | 10.85 |
afm-4.5b | INT8-CW | 1024 | 986.9 | 92.8 | 10.78 |
minicpm4-8b | INT4-MIXED | 1024 | 1465.9 | 93 | 10.75 |
qwen2.5-vl-7b-instruct | INT4-MIXED | 1024 | 1640 | 93.1 | 10.74 |
fara-7b | INT4-MIXED | 1024 | 1974.8 | 93.4 | 10.71 |
qwen3-8b | INT4-MIXED | 1024 | 1395.6 | 94 | 10.64 |
bloomz-7b1 | INT4-MIXED | 1024 | 1606.9 | 94.2 | 10.62 |
minicpm4-8b | INT4-MIXED | 1024 | 1677.2 | 94.3 | 10.60 |
minicpm3-4b | INT8-CW | 32 | 181 | 94.4 | 10.59 |
qwen3.5-9b | INT4-MIXED | 83 | 334 | 94.8 | 10.55 |
zephyr-7b-beta | INT4-MIXED | 1024 | 1418.2 | 94.9 | 10.54 |
phi-4-multimodal-instruct | INT8-CW | 578 | 1249.2 | 96.4 | 10.37 |
qwen3.5-9b | INT4-MIXED | 1024 | 2018 | 96.4 | 10.37 |
phi-4-multimodal-instruct | INT8-CW | 786 | 1510.7 | 96.7 | 10.34 |
qwen3-8b | INT4-MIXED | 1024 | 1622.1 | 96.8 | 10.33 |
bloomz-7b1 | INT4-MIXED | 1024 | 1927.3 | 96.9 | 10.32 |
glm-4-9b-chat-hf | INT4-MIXED | 32 | 168.3 | 98.1 | 10.19 |
gemma-2b-it | FP16 | 32 | 115.3 | 98.9 | 10.11 |
baichuan2-7b-chat | INT4-MIXED | 32 | 192.8 | 99.3 | 10.07 |
gemma-2b-it | FP16 | 1024 | 654.6 | 99.7 | 10.03 |
phi-4-multimodal-instruct | INT8-CW | 1362 | 2977 | 100 | 10.00 |

All models listed here were tested with the following parameters:

- Framework: PyTorch
- Beam: 1
- Batch size: 1
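
In the rows of this part of the table, the "2nd token per sec" value matches 1000 divided by the "2nd latency (ms)" value, rounded to two decimals. The sketch below shows that arithmetic so the two columns can be cross-checked. The function name `tokens_per_second` is hypothetical, and rejecting non-positive latencies is an assumption made for illustration, not part of the published methodology.

```python
def tokens_per_second(second_token_latency_ms: float) -> float:
    """Convert a per-token latency in milliseconds to tokens per second.

    Assumption: throughput is the reciprocal of the 2nd-token latency,
    which is consistent with the table rows above (beam 1, batch size 1).
    """
    if second_token_latency_ms <= 0:
        # Assumption: a non-positive latency is treated as "no measurement".
        raise ValueError("latency must be a positive number of milliseconds")
    return round(1000.0 / second_token_latency_ms, 2)


# Values taken from rows in the table above.
print(tokens_per_second(68.6))   # 14.58
print(tokens_per_second(80))     # 12.5
print(tokens_per_second(100))    # 10.0
```

Because the batch size is 1, this figure is the generation speed of a single stream, not the aggregate throughput of several concurrent requests.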