Most Efficient Large Language Models for AI PC
This page is regularly updated to help you identify the best-performing LLMs on the Intel® Core™ Ultra processor family and AI PCs. The data below is current as of OpenVINO 2024.4 (20 Nov. 2024).
The tables below, one per tested AI PC configuration, list the key performance indicators for inference on built-in GPUs.
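The throughput in the last column is the reciprocal of the second-token (decode) latency. A minimal sketch of that conversion (the helper name is illustrative, not part of any OpenVINO API):

```python
def tokens_per_second(second_token_latency_ms: float) -> float:
    """Convert per-token decode latency (ms) to throughput (tokens/sec)."""
    return round(1000.0 / second_token_latency_ms, 1)

# Example from the first table: a 3.9 ms second-token latency
print(tokens_per_second(3.9))  # → 256.4
```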
Topology | Precision | Input Size (tokens) | Max RSS Memory (MB) | 1st Latency (ms) | 2nd Latency (ms) | 2nd tok/sec |
---|---|---|---|---|---|---|
opt-125m-gptq | INT4-MIXED | 32 | 833.1 | 15.6 | 3.9 | 256.4 |
opt-125m-gptq | INT4-MIXED | 1024 | 955.9 | 553.8 | 4.8 | 208.3 |
bloomz-560m | INT4-MIXED | 32 | 1457.5 | 48.5 | 11.1 | 90.1 |
qwen2-0.5b | INT4-MIXED | 32 | 1167.8 | 95.7 | 11.5 | 87.0 |
qwen2-0.5b | INT4-MIXED | 1024 | 1266 | 2330.3 | 12.7 | 78.7 |
qwen2-0.5b | INT8-CW | 32 | 1496.3 | 90.5 | 12.8 | 78.1 |
bloomz-560m | INT8-CW | 32 | 1724.2 | 84 | 13.9 | 71.9 |
qwen2-0.5b | INT8-CW | 1024 | 1593 | 2370.7 | 14 | 71.4 |
bloomz-560m | INT4-MIXED | 1024 | 1691 | 2005.3 | 15.2 | 65.8 |
qwen2-0.5b | FP16 | 32 | 2989.8 | 94.6 | 15.9 | 62.9 |
bloomz-560m | INT8-CW | 1024 | 1941 | 2343.4 | 16.1 | 62.1 |
qwen2-0.5b | FP16 | 1024 | 3088.1 | 2376.8 | 17.4 | 57.5 |
bloomz-560m | FP16 | 32 | 3857 | 86.7 | 17.5 | 57.1 |
bloomz-560m | FP16 | 1024 | 4085.6 | 2373.4 | 19.8 | 50.5 |
tiny-llama-1.1b-chat | INT4-MIXED | 32 | 1738.9 | 237.4 | 20 | 50.0 |
tiny-llama-1.1b-chat | INT8-CW | 32 | 2471.2 | 224.6 | 22.6 | 44.2 |
tiny-llama-1.1b-chat | INT4-MIXED | 1024 | 1929.3 | 5993 | 22.7 | 44.1 |
tiny-llama-1.1b-chat | INT8-CW | 1024 | 2661.8 | 6238.8 | 25.2 | 39.7 |
qwen2-1.5b | INT4-MIXED | 32 | 2429 | 312.8 | 28.4 | 35.2 |
tiny-llama-1.1b-chat | FP16 | 32 | 4834.9 | 231.7 | 28.9 | 34.6 |
tiny-llama-1.1b-chat | FP16 | 1024 | 5023.2 | 6191.5 | 31.7 | 31.5 |
qwen2-1.5b | INT4-MIXED | 1024 | 2600.3 | 7597.3 | 31.8 | 31.4 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 3982.1 | 348.4 | 32.1 | 31.2 |
qwen2-1.5b | INT8-CW | 32 | 3619 | 301 | 32.7 | 30.6 |
qwen2-1.5b | INT8-CW | 1024 | 3790.3 | 7990.5 | 34.6 | 28.9 |
stablelm-3b-4e1t | INT4-MIXED | 1023 | 4455.4 | 11963.2 | 39.2 | 25.5 |
minicpm-1b-sft | INT4-MIXED | 31 | 5815.4 | 214.3 | 40.1 | 24.9 |
qwen2-1.5b | FP16 | 32 | 7582.3 | 304.4 | 42.2 | 23.7 |
minicpm-1b-sft | INT8-CW | 31 | 6609.6 | 210.6 | 43.3 | 23.1 |
qwen2-1.5b | FP16 | 1024 | 7753.4 | 7915.3 | 44.2 | 22.6 |
gemma-2b-it | INT4-MIXED | 32 | 3728.2 | 523 | 46.2 | 21.6 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 3689.3 | 656.5 | 47.4 | 21.1 |
gemma-2b-it | INT4-MIXED | 1024 | 4207.3 | 11867.9 | 47.5 | 21.1 |
minicpm-1b-sft | FP16 | 31 | 8999.8 | 222.2 | 49.1 | 20.4 |
red-pajama-incite-chat-3b-v1 | INT4-MIXED | 32 | 3448.1 | 1028.9 | 49.6 | 20.2 |
dolly-v2-3b | INT4-MIXED | 32 | 3448.4 | 714.8 | 49.9 | 20.0 |
gemma-2b-it | INT8-CW | 32 | 5423.2 | 488.8 | 51 | 19.6 |
gemma-2b-it | INT8-CW | 1024 | 5902.7 | 12434.4 | 52.3 | 19.1 |
stable-zephyr-3b-dpo | INT8-CW | 32 | 5630.3 | 694.5 | 54.4 | 18.4 |
phi-2 | INT4-MIXED | 32 | 3732.9 | 723.2 | 54.5 | 18.3 |
phi-2 | INT8-CW | 32 | 5600.4 | 747 | 55.7 | 18.0 |
dolly-v2-3b | INT8-CW | 32 | 5589.7 | 1009.8 | 55.9 | 17.9 |
red-pajama-incite-chat-3b-v1 | INT8-CW | 32 | 5590.1 | 698.9 | 55.9 | 17.9 |
stablelm-3b-4e1t | INT8-CW | 32 | 5630.1 | 660.7 | 56.1 | 17.8 |
dolly-v2-3b | INT4-MIXED | 1024 | 3984.5 | 15502.8 | 56.5 | 17.7 |
red-pajama-incite-chat-3b-v1 | INT4-MIXED | 1023 | 3915.6 | 15363.9 | 56.6 | 17.7 |
llama-2-7b-gptq | INT4-MIXED | 32 | 8618.5 | 782.9 | 56.9 | 17.6 |
phi-2 | INT4-MIXED | 1024 | 4251.3 | 15317 | 61 | 16.4 |
phi-2 | INT8-CW | 1024 | 6119.4 | 15886.6 | 62 | 16.1 |
red-pajama-incite-chat-3b-v1 | INT8-CW | 1023 | 6056.9 | 15984.9 | 62.2 | 16.1 |
dolly-v2-3b | INT8-CW | 1024 | 6124.9 | 16099.7 | 62.5 | 16.0 |
stablelm-3b-4e1t | INT8-CW | 1023 | 6097.1 | 16206.9 | 62.5 | 16.0 |
gemma-2b-it | FP16 | 32 | 12208.2 | 501.4 | 65.5 | 15.3 |
llama-3-8b | INT4-MIXED | 33 | 8741.2 | 869 | 65.7 | 15.2 |
llama-2-7b-gptq | INT4-MIXED | 1024 | 9468.1 | 26350.7 | 66.1 | 15.1 |
qwen-7b-chat-gptq | INT4-MIXED | 32 | 8561 | 773.7 | 67 | 14.9 |
gemma-2b-it | FP16 | 1024 | 12687.8 | 12168.7 | 67.1 | 14.9 |
mistral-7b-v0.1 | INT4-MIXED | 32 | 8588.7 | 1020.6 | 67.4 | 14.8 |
llama-2-7b-chat-hf | INT4-MIXED | 32 | 8626.8 | 1100 | 69.4 | 14.4 |
phi-2 | FP16 | 32 | 11385.9 | 693.8 | 70.2 | 14.2 |
dolly-v2-3b | FP16 | 32 | 11359 | 688.5 | 70.5 | 14.2 |
stable-zephyr-3b-dpo | FP16 | 32 | 11432.9 | 648.5 | 70.6 | 14.2 |
red-pajama-incite-chat-3b-v1 | FP16 | 32 | 11364 | 692.4 | 70.7 | 14.1 |
stablelm-3b-4e1t | FP16 | 32 | 11432.6 | 649 | 71.1 | 14.1 |
llama-3-8b | INT4-MIXED | 1025 | 9254.8 | 29700.3 | 71.9 | 13.9 |
mistral-7b-v0.1 | INT4-MIXED | 1024 | 9121.9 | 29492.9 | 73.3 | 13.6 |
phi-3-mini-4k-instruct | INT8-CW | 32 | 7646.1 | 952.6 | 75.7 | 13.2 |
qwen-7b-chat-gptq | INT4-MIXED | 1024 | 10458.7 | 29022.2 | 75.9 | 13.2 |
zephyr-7b-beta | INT4-MIXED | 32 | 9217.5 | 1196.6 | 76.2 | 13.1 |
phi-2 | FP16 | 1024 | 11902.2 | 15868 | 77 | 13.0 |
dolly-v2-3b | FP16 | 1024 | 11892.5 | 15987.1 | 77.1 | 13.0 |
baichuan2-7b-chat | INT4-MIXED | 32 | 9440.3 | 1118.1 | 77.3 | 12.9 |
red-pajama-incite-chat-3b-v1 | FP16 | 1023 | 11829.1 | 16008.7 | 77.3 | 12.9 |
stablelm-3b-4e1t | FP16 | 1023 | 11897.5 | 16030 | 77.7 | 12.9 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 4961.9 | 968.8 | 78.2 | 12.8 |
llama-2-7b-chat-hf | INT4-MIXED | 1024 | 9478.1 | 28958.6 | 78.6 | 12.7 |
zephyr-7b-beta | INT4-MIXED | 1024 | 9764.2 | 30982 | 82.3 | 12.2 |
phi-3-mini-4k-instruct | INT8-CW | 1024 | 8255.7 | 23200.5 | 83.1 | 12.0 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 5570.2 | 22277.1 | 85.7 | 11.7 |
baichuan2-7b-chat | INT4-MIXED | 1024 | 10305.2 | 29010 | 86.4 | 11.6 |
phi-3-mini-4k-instruct | FP16 | 32 | 15292.6 | 934.7 | 96.4 | 10.4 |
qwen-7b-chat | INT4-MIXED | 32 | 10964.7 | 1413 | 97.8 | 10.2 |
Topology | Precision | Input Size (tokens) | Max RSS Memory (MB) | 1st Latency (ms) | 2nd Latency (ms) | 2nd tok/sec |
---|---|---|---|---|---|---|
opt-125m-gptq | INT4-MIXED | 32 | 1150.2 | 35.1 | 8.2 | 122.0 |
opt-125m-gptq | INT4-MIXED | 1024 | 1228 | 67 | 8.2 | 122.0 |
qwen2-0.5b | INT4-MIXED | 1024 | 1596.2 | 83.6 | 14.4 | 69.4 |
qwen2-0.5b | INT4-MIXED | 32 | 1675.6 | 63.6 | 14.9 | 67.1 |
qwen2-0.5b | INT8-CW | 32 | 1857.5 | 56.9 | 15 | 66.7 |
qwen2-0.5b | INT8-CW | 1024 | 1663.5 | 87 | 15 | 66.7 |
bloomz-560m | INT8-CW | 32 | 1761.1 | 62.4 | 15.1 | 66.2 |
tiny-llama-1.1b-chat | INT4-MIXED | 1024 | 1687.9 | 158.7 | 15.3 | 65.4 |
bloomz-560m | INT4-MIXED | 32 | 1894.2 | 40.1 | 15.4 | 64.9 |
tiny-llama-1.1b-chat | INT4-MIXED | 32 | 1833 | 74.5 | 15.7 | 63.7 |
bloomz-560m | INT8-CW | 1024 | 1689.2 | 146.2 | 15.8 | 63.3 |
bloomz-560m | INT4-MIXED | 1024 | 1791 | 150.1 | 16.4 | 61.0 |
tiny-llama-1.1b-chat | INT8-CW | 32 | 2132.3 | 35.6 | 18.1 | 55.2 |
bloomz-560m | FP16 | 32 | 2395 | 36 | 18.4 | 54.3 |
tiny-llama-1.1b-chat | INT8-CW | 1024 | 1986.4 | 149.3 | 19.2 | 52.1 |
bloomz-560m | FP16 | 1024 | 2344.4 | 157.4 | 19.3 | 51.8 |
qwen2-1.5b | INT4-MIXED | 1024 | 2175.1 | 184.9 | 20.4 | 49.0 |
qwen2-1.5b | INT4-MIXED | 32 | 2066.2 | 94.9 | 20.6 | 48.5 |
red-pajama-incite-chat-3b-v1 | INT4-MIXED | 32 | 2599.8 | 118.1 | 25 | 40.0 |
qwen2-1.5b | INT8-CW | 32 | 2377.4 | 83.3 | 25.1 | 39.8 |
qwen2-1.5b | INT8-CW | 1024 | 2483.3 | 189.6 | 25.3 | 39.5 |
gemma-2b-it | INT4-MIXED | 32 | 2594.3 | 181.4 | 26.1 | 38.3 |
phi-2 | INT4-MIXED | 32 | 2912.4 | 77.7 | 26.8 | 37.3 |
gemma-2b-it | INT4-MIXED | 1024 | 2594.4 | 248.2 | 26.9 | 37.2 |
dolly-v2-3b | INT4-MIXED | 32 | 2610.3 | 141.3 | 27 | 37.0 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 2956.2 | 149.2 | 27.4 | 36.5 |
minicpm-1b-sft | INT4-MIXED | 31 | 2625.8 | 159.2 | 28.1 | 35.6 |
red-pajama-incite-chat-3b-v1 | INT4-MIXED | 1023 | 3069.7 | 413.5 | 28.2 | 35.5 |
minicpm-1b-sft | INT8-CW | 31 | 2868.2 | 74.1 | 28.9 | 34.6 |
dolly-v2-3b | INT4-MIXED | 1024 | 3081.5 | 386 | 29.4 | 34.0 |
phi-2 | INT4-MIXED | 1024 | 3136.2 | 340 | 29.6 | 33.8 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 3035.9 | 150.5 | 30.6 | 32.7 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 3373.2 | 57.9 | 32.6 | 30.7 |
stablelm-3b-4e1t | INT4-MIXED | 1023 | 3296.5 | 456.2 | 34.4 | 29.1 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 3707.1 | 432 | 36.1 | 27.7 |
gemma-2b-it | INT8-CW | 32 | 3370.5 | 203.8 | 36.6 | 27.3 |
minicpm-1b-sft | FP16 | 31 | 3679.6 | 80.6 | 36.9 | 27.1 |
gemma-2b-it | INT8-CW | 1024 | 3503.2 | 258.5 | 37.9 | 26.4 |
dolly-v2-3b | INT8-CW | 32 | 3893.3 | 142.9 | 39.4 | 25.4 |
red-pajama-incite-chat-3b-v1 | INT8-CW | 32 | 3760.7 | 117.2 | 39.4 | 25.4 |
phi-2 | INT8-CW | 32 | 3765.6 | 121 | 39.7 | 25.2 |
stablelm-3b-4e1t | INT8-CW | 32 | 3641.2 | 123 | 39.9 | 25.1 |
stable-zephyr-3b-dpo | INT8-CW | 32 | 3743.3 | 120.1 | 39.9 | 25.1 |
red-pajama-incite-chat-3b-v1 | INT8-CW | 1023 | 4083.1 | 422.9 | 41.9 | 23.9 |
dolly-v2-3b | INT8-CW | 1024 | 4211.5 | 384.1 | 42.2 | 23.7 |
phi-2 | INT8-CW | 1024 | 4096.8 | 367.2 | 42.5 | 23.5 |
stablelm-3b-4e1t | INT8-CW | 1023 | 4086.6 | 459.9 | 43.5 | 23.0 |
llama-2-7b-gptq | INT4-MIXED | 32 | 4754.8 | 75.1 | 46.2 | 21.6 |
codegen25-7b | INT4-MIXED | 32 | 4738.5 | 74.9 | 46.9 | 21.3 |
gpt-j-6b | INT4-MIXED | 32 | 4506.5 | 221.4 | 47.3 | 21.1 |
decilm-7b-instruct | INT4-MIXED | 36 | 4794.9 | 199.3 | 48.5 | 20.6 |
qwen-7b-chat-gptq | INT4-MIXED | 32 | 5615.8 | 100.5 | 49.8 | 20.1 |
falcon-7b-instruct | INT4-MIXED | 32 | 4738 | 79.9 | 50.7 | 19.7 |
phi-3-mini-4k-instruct | INT8-CW | 32 | 4589.9 | 83 | 50.8 | 19.7 |
llama-2-7b-gptq | INT4-MIXED | 1024 | 5246 | 640 | 52.1 | 19.2 |
llama-3-8b | INT4-MIXED | 33 | 5475.8 | 114.7 | 52.2 | 19.2 |
codegen25-7b | INT4-MIXED | 1024 | 5241.9 | 643.7 | 52.5 | 19.0 |
mistral-7b-v0.1 | INT4-MIXED | 32 | 5015.3 | 94.6 | 52.6 | 19.0 |
qwen2-7b | INT4-MIXED | 32 | 5330.7 | 86.3 | 52.7 | 19.0 |
gpt-j-6b | INT4-MIXED | 1024 | 4926.5 | 867.2 | 53.2 | 18.8 |
llama-2-7b-chat-hf | INT4-MIXED | 32 | 5100.7 | 78.7 | 54.2 | 18.5 |
llama-3-8b | INT4-MIXED | 33 | 5527.1 | 114.9 | 54.3 | 18.4 |
phi-3-mini-4k-instruct | INT8-CW | 1024 | 4959.2 | 450.6 | 54.6 | 18.3 |
falcon-7b-instruct | INT4-MIXED | 1024 | 4863.4 | 660.5 | 54.9 | 18.2 |
qwen2-7b | INT4-MIXED | 1024 | 5375.4 | 659.8 | 55.4 | 18.1 |
mistral-7b-v0.1 | INT4-MIXED | 1024 | 5286.8 | 662.8 | 55.6 | 18.0 |
llama-3-8b | INT4-MIXED | 1025 | 5601 | 992.5 | 56.1 | 17.8 |
llama-3-8b | INT4-MIXED | 1025 | 5646.8 | 1047.1 | 56.7 | 17.6 |
baichuan2-7b-chat | INT4-MIXED | 32 | 5913.7 | 86.5 | 57.2 | 17.5 |
zephyr-7b-beta | INT4-MIXED | 32 | 5339.7 | 88.5 | 58.2 | 17.2 |
qwen-7b-chat-gptq | INT4-MIXED | 1024 | 6315.8 | 664.2 | 60.1 | 16.6 |
glm-4-9b-chat | INT4-MIXED | 32 | 6349.7 | 86.5 | 60.5 | 16.5 |
llama-2-7b-chat-hf | INT4-MIXED | 1024 | 5592.7 | 856.8 | 60.9 | 16.4 |
zephyr-7b-beta | INT4-MIXED | 1024 | 5459.1 | 898.6 | 61.6 | 16.2 |
baichuan2-7b-chat | INT4-MIXED | 1024 | 6410.3 | 942.2 | 63.5 | 15.7 |
gemma-7b-it | INT4-MIXED | 32 | 5816.3 | 104.5 | 63.5 | 15.7 |
glm-4-9b-chat | INT4-MIXED | 1024 | 6368.8 | 1128.2 | 63.8 | 15.7 |
llama-3.1-8b | INT4-MIXED | 32 | 6315.3 | 97.4 | 65 | 15.4 |
llama-3.1-8b | INT4-MIXED | 1024 | 6421.8 | 902.9 | 68.2 | 14.7 |
gemma-7b-it | INT4-MIXED | 1024 | 6233.2 | 1052.7 | 68.7 | 14.6 |
qwen-7b-chat | INT4-MIXED | 32 | 7320.5 | 132.3 | 68.8 | 14.5 |
red-pajama-incite-chat-3b-v1 | FP16 | 32 | 6318.9 | 79.2 | 70.7 | 14.1 |
phi-2 | FP16 | 32 | 6330.2 | 83.2 | 70.8 | 14.1 |
dolly-v2-3b | FP16 | 32 | 6327.2 | 92.7 | 71.9 | 13.9 |
stable-zephyr-3b-dpo | FP16 | 32 | 6356.4 | 79.8 | 72.2 | 13.9 |
stablelm-3b-4e1t | FP16 | 32 | 6261.9 | 74.6 | 72.6 | 13.8 |
phi-2 | FP16 | 1024 | 6654.4 | 379.3 | 73.9 | 13.5 |
red-pajama-incite-chat-3b-v1 | FP16 | 1023 | 6640.3 | 442.6 | 74.4 | 13.4 |
dolly-v2-3b | FP16 | 1024 | 6653.9 | 441.9 | 74.9 | 13.4 |
qwen-7b-chat | INT4-MIXED | 1024 | 7814.1 | 909.4 | 75.5 | 13.2 |
stablelm-3b-4e1t | FP16 | 1023 | 6575.3 | 449.5 | 75.8 | 13.2 |
falcon-7b-instruct | INT8-CW | 32 | 7487.6 | 109.4 | 84.3 | 11.9 |
gpt-j-6b | INT8-CW | 32 | 6918.7 | 185.3 | 85.3 | 11.7 |
llama-2-7b-chat-hf | INT8-CW | 32 | 7494.7 | 110.6 | 87.9 | 11.4 |
qwen2-7b | INT8-CW | 32 | 8177.7 | 117.8 | 88.2 | 11.3 |
falcon-7b-instruct | INT8-CW | 1024 | 7621.2 | 675.4 | 88.3 | 11.3 |
codegen25-7b | INT8-CW | 32 | 7582.1 | 114.6 | 89 | 11.2 |
qwen2-7b | INT8-CW | 1024 | 8226.2 | 842 | 90.4 | 11.1 |
gpt-j-6b | INT8-CW | 1024 | 7353.1 | 1093.9 | 90.8 | 11.0 |
phi-3-medium-4k-instruct | INT4-MIXED | 38 | 8184.1 | 270.2 | 90.8 | 11.0 |
qwen-7b-chat | INT8-CW | 32 | 9223.8 | 138.4 | 91.3 | 11.0 |
baichuan2-7b-chat | INT8-CW | 32 | 8188.4 | 122.9 | 91.8 | 10.9 |
phi-3-mini-4k-instruct | FP16 | 32 | 8311.5 | 98.2 | 92 | 10.9 |
llama-2-7b-chat-hf | INT8-CW | 1024 | 7984.3 | 874.9 | 92.8 | 10.8 |
mistral-7b-v0.1 | INT8-CW | 32 | 7908.6 | 116.3 | 93.1 | 10.7 |
baichuan2-13b-chat | INT4-MIXED | 32 | 10016.5 | 165.7 | 93.2 | 10.7 |
zephyr-7b-beta | INT8-CW | 32 | 7812.6 | 117 | 93.4 | 10.7 |
codegen25-7b | INT8-CW | 1024 | 8074.3 | 870.2 | 94 | 10.6 |
decilm-7b-instruct | INT8-CW | 36 | 7885.2 | 181.4 | 94.9 | 10.5 |
mistral-7b-v0.1 | INT8-CW | 1024 | 8023.7 | 906.4 | 95.7 | 10.4 |
zephyr-7b-beta | INT8-CW | 1024 | 7930.8 | 915.2 | 96.3 | 10.4 |
phi-3-medium-4k-instruct | INT4-MIXED | 1061 | 8384.5 | 2225.7 | 96.7 | 10.3 |
baichuan2-7b-chat | INT8-CW | 1024 | 8678.3 | 956.7 | 96.8 | 10.3 |
llama-3.1-8b | INT8-CW | 32 | 8615.4 | 121.6 | 97.7 | 10.2 |
llama-3-8b | INT8-CW | 33 | 8615.1 | 131.3 | 97.7 | 10.2 |
phi-3-mini-4k-instruct | FP16 | 1024 | 8695.2 | 509 | 99.9 | 10.0 |
Topology | Precision | Input Size (tokens) | Max RSS Memory (MB) | 1st Latency (ms) | 2nd Latency (ms) | 2nd tok/sec |
---|---|---|---|---|---|---|
opt-125m-gptq | INT4-MIXED | 32 | 1116 | 25.8 | 8.1 | 123.5 |
opt-125m-gptq | INT4-MIXED | 1024 | 1187.1 | 75.2 | 8.2 | 122.0 |
qwen2-0.5b | INT4-MIXED | 32 | 1587.4 | 45.1 | 15.4 | 64.9 |
qwen2-0.5b | INT4-MIXED | 1024 | 1587.8 | 228.2 | 15.6 | 64.1 |
tiny-llama-1.1b-chat | INT4-MIXED | 32 | 1704.2 | 42.4 | 17.6 | 56.8 |
tiny-llama-1.1b-chat | INT4-MIXED | 1024 | 1616.3 | 489.2 | 18.9 | 52.9 |
qwen2-0.5b | INT8-CW | 32 | 1477.3 | 51.5 | 20.2 | 49.5 |
qwen2-0.5b | INT8-CW | 1024 | 1592 | 263.7 | 20.6 | 48.5 |
tiny-llama-1.1b-chat | INT8-CW | 32 | 1855.6 | 60.2 | 20.7 | 48.3 |
tiny-llama-1.1b-chat | INT8-CW | 1024 | 1992.6 | 618.2 | 21.7 | 46.1 |
qwen2-1.5b | INT4-MIXED | 32 | 2024.2 | 59.6 | 23.1 | 43.3 |
bloomz-560m | FP16 | 1024 | 2773.1 | 647.8 | 23.8 | 42.0 |
qwen2-1.5b | INT4-MIXED | 1024 | 2177.7 | 577.4 | 23.8 | 42.0 |
bloomz-560m | FP16 | 32 | 2582.7 | 44.2 | 25.1 | 39.8 |
dolly-v2-3b | INT4-MIXED | 32 | 2507.9 | 79.8 | 29.4 | 34.0 |
phi-2 | INT4-MIXED | 32 | 2568.9 | 74.6 | 29.7 | 33.7 |
qwen2-1.5b | INT8-CW | 32 | 2577.3 | 81.6 | 30.5 | 32.8 |
red-pajama-incite-chat-3b-v1 | INT4-MIXED | 32 | 2489.4 | 69.9 | 30.5 | 32.8 |
minicpm-1b-sft | INT4-MIXED | 31 | 2442.1 | 84.7 | 31 | 32.3 |
qwen2-1.5b | INT8-CW | 1024 | 2739.8 | 773.3 | 31.2 | 32.1 |
gemma-2b-it | INT4-MIXED | 32 | 2998.2 | 103.5 | 31.4 | 31.8 |
dolly-v2-3b | INT4-MIXED | 1024 | 2508.1 | 1396.6 | 32 | 31.3 |
gemma-2b-it | INT4-MIXED | 1024 | 3171.5 | 822.3 | 32.2 | 31.1 |
phi-2 | INT4-MIXED | 1024 | 2940.5 | 1395.3 | 32.2 | 31.1 |
red-pajama-incite-chat-3b-v1 | INT4-MIXED | 1023 | 2489.6 | 1435.5 | 33.1 | 30.2 |
minicpm-1b-sft | INT8-CW | 31 | 2818.6 | 86.9 | 33.4 | 29.9 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 2638.2 | 87.4 | 33.8 | 29.6 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 2750.5 | 89.4 | 35.6 | 28.1 |
stablelm-3b-4e1t | INT4-MIXED | 1023 | 3115.5 | 1473.1 | 38.1 | 26.2 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 3039.1 | 109.2 | 40.4 | 24.8 |
phi-2 | INT8-CW | 32 | 3599.7 | 107.5 | 42.1 | 23.8 |
gemma-2b-it | INT8-CW | 32 | 3845.4 | 111.3 | 42.2 | 23.7 |
dolly-v2-3b | INT8-CW | 32 | 3596.4 | 110.1 | 42.5 | 23.5 |
gemma-2b-it | INT8-CW | 1024 | 3844.6 | 1183 | 43 | 23.3 |
red-pajama-incite-chat-3b-v1 | INT8-CW | 32 | 3590 | 111 | 43.3 | 23.1 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 3467.6 | 1721.6 | 43.5 | 23.0 |
stablelm-3b-4e1t | INT8-CW | 32 | 3582.8 | 111 | 44.3 | 22.6 |
stable-zephyr-3b-dpo | INT8-CW | 32 | 3607.2 | 110.2 | 44.5 | 22.5 |
phi-2 | INT8-CW | 1024 | 3982 | 1508 | 44.6 | 22.4 |
dolly-v2-3b | INT8-CW | 1024 | 3596.5 | 1529.1 | 44.9 | 22.3 |
minicpm-1b-sft | FP16 | 31 | 3769.9 | 84 | 45.4 | 22.0 |
red-pajama-incite-chat-3b-v1 | INT8-CW | 1023 | 3952 | 2064.5 | 45.7 | 21.9 |
stablelm-3b-4e1t | INT8-CW | 1023 | 3934.5 | 2286.3 | 46.8 | 21.4 |
gpt-j-6b | INT4-MIXED | 32 | 4443.5 | 159.3 | 56.7 | 17.6 |
phi-3-mini-4k-instruct | INT8-CW | 32 | 4545 | 117.1 | 57.6 | 17.4 |
phi-3-mini-4k-instruct | INT8-CW | 1024 | 4810.4 | 2068.8 | 60.5 | 16.5 |
gpt-j-6b | INT4-MIXED | 1024 | 4746.4 | 2397 | 60.6 | 16.5 |
falcon-7b-instruct | INT4-MIXED | 32 | 5014 | 203.7 | 61.3 | 16.3 |
qwen2-7b | INT4-MIXED | 32 | 5269.4 | 203.8 | 62.3 | 16.1 |
codegen25-7b | INT4-MIXED | 32 | 4641.1 | 170.6 | 63.5 | 15.7 |
llama-2-7b-gptq | INT4-MIXED | 32 | 4597.3 | 172.1 | 63.5 | 15.7 |
falcon-7b-instruct | INT4-MIXED | 1024 | 5230.6 | 2695.3 | 63.6 | 15.7 |
qwen2-7b | INT4-MIXED | 1024 | 5370.8 | 2505.9 | 63.9 | 15.6 |
decilm-7b-instruct | INT4-MIXED | 36 | 4614.2 | 301.1 | 65.3 | 15.3 |
codegen25-7b | INT4-MIXED | 1024 | 4641.9 | 2629.6 | 67.4 | 14.8 |
llama-2-7b-gptq | INT4-MIXED | 1024 | 4928.1 | 2584.3 | 67.6 | 14.8 |
mistral-7b-v0.1 | INT4-MIXED | 32 | 4928.5 | 180.9 | 69.2 | 14.5 |
llama-2-7b-chat-hf | INT4-MIXED | 32 | 4985.7 | 160.3 | 69.5 | 14.4 |
qwen-7b-chat-gptq | INT4-MIXED | 32 | 5426.7 | 188.3 | 69.5 | 14.4 |
llama-3-8b | INT4-MIXED | 33 | 5473.4 | 285.7 | 70 | 14.3 |
flan-t5-xxl | INT4-MIXED | 33 | 19293.8 | 211.7 | 70.1 | 14.3 |
llama-3-8b | INT4-MIXED | 33 | 5389.2 | 281 | 70.8 | 14.1 |
mistral-7b-v0.1 | INT4-MIXED | 1024 | 5225.4 | 2713.3 | 71.8 | 13.9 |
zephyr-7b-beta | INT4-MIXED | 32 | 5306.1 | 177.9 | 72.1 | 13.9 |
llama-3-8b | INT4-MIXED | 1025 | 5615.2 | 2937.8 | 72.4 | 13.8 |
llama-3-8b | INT4-MIXED | 1025 | 5531.7 | 2815.4 | 73.2 | 13.7 |
llama-2-7b-chat-hf | INT4-MIXED | 1024 | 5319.5 | 2736.2 | 73.6 | 13.6 |
phi-2 | FP16 | 32 | 6197 | 104.6 | 74.7 | 13.4 |
zephyr-7b-beta | INT4-MIXED | 1024 | 5306.4 | 2802.3 | 74.7 | 13.4 |
qwen-7b-chat-gptq | INT4-MIXED | 1024 | 5934.9 | 2606.9 | 75 | 13.3 |
dolly-v2-3b | FP16 | 32 | 6195.1 | 105.3 | 75.3 | 13.3 |
baichuan2-7b-chat | INT4-MIXED | 32 | 5837.9 | 188.5 | 76.8 | 13.0 |
red-pajama-incite-chat-3b-v1 | FP16 | 32 | 6178.6 | 118 | 76.8 | 13.0 |
gemma-7b-it | INT4-MIXED | 32 | 6495.9 | 230.6 | 77 | 13.0 |
stablelm-3b-4e1t | FP16 | 32 | 6174.2 | 105.9 | 77.1 | 13.0 |
stable-zephyr-3b-dpo | FP16 | 32 | 6217.8 | 107.9 | 77.2 | 13.0 |
glm-4-9b-chat | INT4-MIXED | 32 | 6333.4 | 225 | 77.3 | 12.9 |
phi-2 | FP16 | 1024 | 6411.5 | 2065.2 | 77.3 | 12.9 |
dolly-v2-3b | FP16 | 1024 | 6410.1 | 2075 | 77.7 | 12.9 |
llama-3.1-8b | INT4-MIXED | 32 | 6324.6 | 182.2 | 78.8 | 12.7 |
red-pajama-incite-chat-3b-v1 | FP16 | 1023 | 6394.2 | 2752.4 | 79.2 | 12.6 |
stablelm-3b-4e1t | FP16 | 1023 | 6386.9 | 2953.3 | 79.5 | 12.6 |
glm-4-9b-chat | INT4-MIXED | 1024 | 6439.5 | 3282.2 | 80 | 12.5 |
baichuan2-7b-chat | INT4-MIXED | 1024 | 6174.1 | 2752.6 | 80.6 | 12.4 |
gemma-7b-it | INT4-MIXED | 1024 | 6795.4 | 3118.3 | 80.6 | 12.4 |
llama-3.1-8b | INT4-MIXED | 1024 | 6324.8 | 2865.7 | 81.3 | 12.3 |
gpt-j-6b | INT8-CW | 32 | 6793.2 | 167.6 | 85 | 11.8 |
qwen-7b-chat | INT4-MIXED | 32 | 7274.8 | 168.8 | 85.2 | 11.7 |
gpt-j-6b | INT8-CW | 1024 | 6793.3 | 2668.4 | 88.8 | 11.3 |
qwen-7b-chat | INT4-MIXED | 1024 | 7610.3 | 2991.9 | 90.6 | 11.0 |
flan-t5-xxl | INT4-MIXED | 1139 | 23514 | 540.8 | 94.9 | 10.5 |
falcon-7b-instruct | INT8-CW | 32 | 7764.1 | 181.3 | 95.5 | 10.5 |
llama-2-7b-chat-hf | INT8-CW | 32 | 7330.9 | 172 | 96.1 | 10.4 |
falcon-7b-instruct | INT8-CW | 1024 | 7987.4 | 3072.8 | 98.1 | 10.2 |
qwen2-7b | INT8-CW | 32 | 8175.3 | 211.3 | 99.6 | 10.0 |
All models listed here were tested with the following parameters:

- Framework: PyTorch
- Beam: 1
- Batch size: 1
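The measurement behind the two latency columns can be sketched as follows. Here `generate_token` is a hypothetical stand-in for a real model call (e.g., an inference request against a compiled model), not part of any API on this page: the first call covers prefill plus the first generated token, and subsequent single-token calls are averaged to give the second-token latency.

```python
import time

def measure_latencies(generate_token, prompt_tokens, new_tokens=8):
    """Return (1st-token latency, mean per-token decode latency), both in ms.

    generate_token: callable taking a token list; stands in for model inference.
    """
    start = time.perf_counter()
    generate_token(prompt_tokens)           # prefill + first token
    first_ms = (time.perf_counter() - start) * 1000.0

    decode_ms = []
    for _ in range(new_tokens - 1):
        t0 = time.perf_counter()
        generate_token([0])                 # one single-token decode step
        decode_ms.append((time.perf_counter() - t0) * 1000.0)
    second_ms = sum(decode_ms) / len(decode_ms)
    return first_ms, second_ms
```

With a real model, averaging the decode steps smooths out per-step jitter, which is why the tables report a single second-token latency per configuration.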