# Most Efficient Large Language Models for AI PC
This page is regularly updated to help you identify the best-performing LLMs on the Intel® Core™ Ultra processor family of AI PCs. The current data is as of OpenVINO 2025.0 (6 March 2025) for the Core Ultra 7 155H and Core Ultra 7 268V, and OpenVINO 2024.6 (13 December 2024) for the Core Ultra 9 288V.
Each of the tables below lists the key performance indicators for inference on a processor's built-in GPU: peak memory use (max RSS, in MB), latency of the first generated token (ms), average latency of each subsequent token (ms), and the resulting sustained generation rate in tokens per second (1000 divided by the 2nd latency in ms).
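You can collect the same indicators for your own model and hardware through the OpenVINO GenAI API. The snippet below is a minimal sketch, assuming the `openvino-genai` package is installed and that `model_dir` (a placeholder name) contains a model already exported to OpenVINO IR; `get_ttft` and `get_tpot` correspond to the 1st- and 2nd-latency columns, and `get_throughput` to the tokens/sec column.

```python
import openvino_genai as ov_genai

# Load an exported OpenVINO model on the built-in GPU.
pipe = ov_genai.LLMPipeline("model_dir", "GPU")

# Passing a list of prompts returns DecodedResults, which carries perf metrics.
results = pipe.generate(["The Sun is yellow because"], max_new_tokens=128)

pm = results.perf_metrics
print(f"1st token latency: {pm.get_ttft().mean:.1f} ms")        # time to first token
print(f"2nd token latency: {pm.get_tpot().mean:.1f} ms")        # time per output token
print(f"throughput:        {pm.get_throughput().mean:.1f} tok/s")
```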

Topology | Precision | Input Size | Max RSS Memory (MB) | 1st Latency (ms) | 2nd Latency (ms) | 2nd Tokens/sec |
---|---|---|---|---|---|---|
opt-125m-gptq | INT4-MIXED | 32 | 833.1 | 15.6 | 3.9 | 256.4 |
opt-125m-gptq | INT4-MIXED | 1024 | 955.9 | 553.8 | 4.8 | 208.3 |
bloomz-560m | INT4-MIXED | 32 | 1457.5 | 48.5 | 11.1 | 90.1 |
qwen2-0.5b | INT4-MIXED | 32 | 1167.8 | 95.7 | 11.5 | 87.0 |
qwen2-0.5b | INT4-MIXED | 1024 | 1266 | 2330.3 | 12.7 | 78.7 |
qwen2-0.5b | INT8-CW | 32 | 1496.3 | 90.5 | 12.8 | 78.1 |
bloomz-560m | INT8-CW | 32 | 1724.2 | 84 | 13.9 | 71.9 |
qwen2-0.5b | INT8-CW | 1024 | 1593 | 2370.7 | 14 | 71.4 |
bloomz-560m | INT4-MIXED | 1024 | 1691 | 2005.3 | 15.2 | 65.8 |
qwen2-0.5b | FP16 | 32 | 2989.8 | 94.6 | 15.9 | 62.9 |
bloomz-560m | INT8-CW | 1024 | 1941 | 2343.4 | 16.1 | 62.1 |
qwen2-0.5b | FP16 | 1024 | 3088.1 | 2376.8 | 17.4 | 57.5 |
bloomz-560m | FP16 | 32 | 3857 | 86.7 | 17.5 | 57.1 |
bloomz-560m | FP16 | 1024 | 4085.6 | 2373.4 | 19.8 | 50.5 |
tiny-llama-1.1b-chat | INT4-MIXED | 32 | 1738.9 | 237.4 | 20 | 50.0 |
tiny-llama-1.1b-chat | INT8-CW | 32 | 2471.2 | 224.6 | 22.6 | 44.2 |
tiny-llama-1.1b-chat | INT4-MIXED | 1024 | 1929.3 | 5993 | 22.7 | 44.1 |
tiny-llama-1.1b-chat | INT8-CW | 1024 | 2661.8 | 6238.8 | 25.2 | 39.7 |
qwen2-1.5b | INT4-MIXED | 32 | 2429 | 312.8 | 28.4 | 35.2 |
tiny-llama-1.1b-chat | FP16 | 32 | 4834.9 | 231.7 | 28.9 | 34.6 |
tiny-llama-1.1b-chat | FP16 | 1024 | 5023.2 | 6191.5 | 31.7 | 31.5 |
qwen2-1.5b | INT4-MIXED | 1024 | 2600.3 | 7597.3 | 31.8 | 31.4 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 3982.1 | 348.4 | 32.1 | 31.2 |
qwen2-1.5b | INT8-CW | 32 | 3619 | 301 | 32.7 | 30.6 |
qwen2-1.5b | INT8-CW | 1024 | 3790.3 | 7990.5 | 34.6 | 28.9 |
stablelm-3b-4e1t | INT4-MIXED | 1023 | 4455.4 | 11963.2 | 39.2 | 25.5 |
minicpm-1b-sft | INT4-MIXED | 31 | 5815.4 | 214.3 | 40.1 | 24.9 |
qwen2-1.5b | FP16 | 32 | 7582.3 | 304.4 | 42.2 | 23.7 |
minicpm-1b-sft | INT8-CW | 31 | 6609.6 | 210.6 | 43.3 | 23.1 |
qwen2-1.5b | FP16 | 1024 | 7753.4 | 7915.3 | 44.2 | 22.6 |
gemma-2b-it | INT4-MIXED | 32 | 3728.2 | 523 | 46.2 | 21.6 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 3689.3 | 656.5 | 47.4 | 21.1 |
gemma-2b-it | INT4-MIXED | 1024 | 4207.3 | 11867.9 | 47.5 | 21.1 |
minicpm-1b-sft | FP16 | 31 | 8999.8 | 222.2 | 49.1 | 20.4 |
red-pajama-incite-chat-3b-v1 | INT4-MIXED | 32 | 3448.1 | 1028.9 | 49.6 | 20.2 |
dolly-v2-3b | INT4-MIXED | 32 | 3448.4 | 714.8 | 49.9 | 20.0 |
gemma-2b-it | INT8-CW | 32 | 5423.2 | 488.8 | 51 | 19.6 |
gemma-2b-it | INT8-CW | 1024 | 5902.7 | 12434.4 | 52.3 | 19.1 |
stable-zephyr-3b-dpo | INT8-CW | 32 | 5630.3 | 694.5 | 54.4 | 18.4 |
phi-2 | INT4-MIXED | 32 | 3732.9 | 723.2 | 54.5 | 18.3 |
phi-2 | INT8-CW | 32 | 5600.4 | 747 | 55.7 | 18.0 |
dolly-v2-3b | INT8-CW | 32 | 5589.7 | 1009.8 | 55.9 | 17.9 |
red-pajama-incite-chat-3b-v1 | INT8-CW | 32 | 5590.1 | 698.9 | 55.9 | 17.9 |
stablelm-3b-4e1t | INT8-CW | 32 | 5630.1 | 660.7 | 56.1 | 17.8 |
dolly-v2-3b | INT4-MIXED | 1024 | 3984.5 | 15502.8 | 56.5 | 17.7 |
red-pajama-incite-chat-3b-v1 | INT4-MIXED | 1023 | 3915.6 | 15363.9 | 56.6 | 17.7 |
llama-2-7b-gptq | INT4-MIXED | 32 | 8618.5 | 782.9 | 56.9 | 17.6 |
phi-2 | INT4-MIXED | 1024 | 4251.3 | 15317 | 61 | 16.4 |
phi-2 | INT8-CW | 1024 | 6119.4 | 15886.6 | 62 | 16.1 |
red-pajama-incite-chat-3b-v1 | INT8-CW | 1023 | 6056.9 | 15984.9 | 62.2 | 16.1 |
dolly-v2-3b | INT8-CW | 1024 | 6124.9 | 16099.7 | 62.5 | 16.0 |
stablelm-3b-4e1t | INT8-CW | 1023 | 6097.1 | 16206.9 | 62.5 | 16.0 |
gemma-2b-it | FP16 | 32 | 12208.2 | 501.4 | 65.5 | 15.3 |
llama-3-8b | INT4-MIXED | 33 | 8741.2 | 869 | 65.7 | 15.2 |
llama-2-7b-gptq | INT4-MIXED | 1024 | 9468.1 | 26350.7 | 66.1 | 15.1 |
qwen-7b-chat-gptq | INT4-MIXED | 32 | 8561 | 773.7 | 67 | 14.9 |
gemma-2b-it | FP16 | 1024 | 12687.8 | 12168.7 | 67.1 | 14.9 |
mistral-7b-v0.1 | INT4-MIXED | 32 | 8588.7 | 1020.6 | 67.4 | 14.8 |
llama-2-7b-chat-hf | INT4-MIXED | 32 | 8626.8 | 1100 | 69.4 | 14.4 |
phi-2 | FP16 | 32 | 11385.9 | 693.8 | 70.2 | 14.2 |
dolly-v2-3b | FP16 | 32 | 11359 | 688.5 | 70.5 | 14.2 |
stable-zephyr-3b-dpo | FP16 | 32 | 11432.9 | 648.5 | 70.6 | 14.2 |
red-pajama-incite-chat-3b-v1 | FP16 | 32 | 11364 | 692.4 | 70.7 | 14.1 |
stablelm-3b-4e1t | FP16 | 32 | 11432.6 | 649 | 71.1 | 14.1 |
llama-3-8b | INT4-MIXED | 1025 | 9254.8 | 29700.3 | 71.9 | 13.9 |
mistral-7b-v0.1 | INT4-MIXED | 1024 | 9121.9 | 29492.9 | 73.3 | 13.6 |
phi-3-mini-4k-instruct | INT8-CW | 32 | 7646.1 | 952.6 | 75.7 | 13.2 |
qwen-7b-chat-gptq | INT4-MIXED | 1024 | 10458.7 | 29022.2 | 75.9 | 13.2 |
zephyr-7b-beta | INT4-MIXED | 32 | 9217.5 | 1196.6 | 76.2 | 13.1 |
phi-2 | FP16 | 1024 | 11902.2 | 15868 | 77 | 13.0 |
dolly-v2-3b | FP16 | 1024 | 11892.5 | 15987.1 | 77.1 | 13.0 |
baichuan2-7b-chat | INT4-MIXED | 32 | 9440.3 | 1118.1 | 77.3 | 12.9 |
red-pajama-incite-chat-3b-v1 | FP16 | 1023 | 11829.1 | 16008.7 | 77.3 | 12.9 |
stablelm-3b-4e1t | FP16 | 1023 | 11897.5 | 16030 | 77.7 | 12.9 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 4961.9 | 968.8 | 78.2 | 12.8 |
llama-2-7b-chat-hf | INT4-MIXED | 1024 | 9478.1 | 28958.6 | 78.6 | 12.7 |
zephyr-7b-beta | INT4-MIXED | 1024 | 9764.2 | 30982 | 82.3 | 12.2 |
phi-3-mini-4k-instruct | INT8-CW | 1024 | 8255.7 | 23200.5 | 83.1 | 12.0 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 5570.2 | 22277.1 | 85.7 | 11.7 |
baichuan2-7b-chat | INT4-MIXED | 1024 | 10305.2 | 29010 | 86.4 | 11.6 |
phi-3-mini-4k-instruct | FP16 | 32 | 15292.6 | 934.7 | 96.4 | 10.4 |
qwen-7b-chat | INT4-MIXED | 32 | 10964.7 | 1413 | 97.8 | 10.2 |

Topology | Precision | Input Size | Max RSS Memory (MB) | 1st Latency (ms) | 2nd Latency (ms) | 2nd Tokens/sec |
---|---|---|---|---|---|---|
tiny-llama-1.1b-chat | INT4 | 32 | 2176.5 | 31.7 | 9.6 | 104.2 |
tiny-llama-1.1b-chat | INT4 | 1024 | 2261 | 132.4 | 10.2 | 98.0 |
bloomz-560m | INT4 | 1024 | 2103 | 67.4 | 10.3 | 97.1 |
bloomz-560m | INT4 | 32 | 1880.6 | 33.7 | 10.5 | 95.2 |
qwen2-0.5b | INT4 | 1024 | 1679.5 | 63.2 | 10.8 | 92.6 |
qwen2-0.5b | INT4 | 32 | 1577.1 | 36.3 | 10.9 | 91.7 |
bloomz-560m | INT8 | 32 | 2015.6 | 30.3 | 10.9 | 91.7 |
qwen2-0.5b | INT8 | 32 | 1869.8 | 31.7 | 11 | 90.9 |
bloomz-560m | INT8 | 1024 | 2230.8 | 67.3 | 11.4 | 87.7 |
qwen2-0.5b | INT8 | 1024 | 1951.1 | 68 | 11.9 | 84.0 |
tiny-llama-1.1b-chat | INT8 | 32 | 2687.2 | 28.6 | 12.9 | 77.5 |
qwen2-1.5b | INT4 | 1024 | 2368.7 | 167.6 | 13.5 | 74.1 |
tiny-llama-1.1b-chat | INT8 | 1024 | 2530.6 | 127.8 | 13.7 | 73.0 |
qwen2-1.5b | INT4 | 32 | 2480.4 | 43.1 | 13.9 | 71.9 |
bloomz-560m | FP16 | 32 | 2654.3 | 29.6 | 14.5 | 69.0 |
bloomz-560m | FP16 | 1024 | 2880.8 | 75.8 | 15.8 | 63.3 |
qwen2-1.5b | INT8 | 32 | 2994.7 | 37.7 | 18.9 | 52.9 |
red-pajama-incite-chat-3b-v1 | INT4 | 32 | 3240.5 | 53.2 | 19.2 | 52.1 |
qwen2-1.5b | INT8 | 1024 | 2893.3 | 163.6 | 19.6 | 51.0 |
gemma-2b-it | INT4 | 32 | 3245 | 188.4 | 20.1 | 49.8 |
minicpm-1b-sft | INT4 | 31 | 3024.6 | 68.3 | 20.4 | 49.0 |
dolly-v2-3b | INT4 | 32 | 3301 | 66.3 | 20.4 | 49.0 |
gemma-2b-it | INT4 | 1024 | 3022.4 | 231.3 | 21.5 | 46.5 |
red-pajama-incite-chat-3b-v1 | INT4 | 1023 | 3400.6 | 397.9 | 22.1 | 45.2 |
minicpm-1b-sft | INT4 | 1014 | 2902.7 | 266.7 | 22.1 | 45.2 |
dolly-v2-3b | INT4 | 1024 | 3442.1 | 377.2 | 23.1 | 43.3 |
minicpm-1b-sft | INT8 | 31 | 3330 | 62.7 | 23.2 | 43.1 |
minicpm-1b-sft | INT8 | 1014 | 3259.1 | 211 | 24.8 | 40.3 |
phi-3-mini-4k-instruct | INT4 | 32 | 3662.9 | 44.3 | 26.3 | 38.0 |
phi-3-mini-4k-instruct | INT4 | 1024 | 4000.4 | 417.4 | 29.9 | 33.4 |
gemma-2b-it | INT8 | 32 | 3739.4 | 191.2 | 30.5 | 32.8 |
red-pajama-incite-chat-3b-v1 | INT8 | 32 | 4338.9 | 48.7 | 31.2 | 32.1 |
minicpm-1b-sft | FP16 | 31 | 4195.6 | 63.5 | 31.5 | 31.7 |
dolly-v2-3b | INT8 | 32 | 4438.1 | 63.2 | 32 | 31.3 |
gemma-2b-it | INT8 | 1024 | 3910.8 | 248.1 | 32 | 31.3 |
minicpm-1b-sft | FP16 | 1014 | 4123.4 | 229.6 | 33.3 | 30.0 |
red-pajama-incite-chat-3b-v1 | INT8 | 1023 | 4503.5 | 405.2 | 34.1 | 29.3 |
chatglm3-6b | INT4 | 32 | 4909.8 | 52.2 | 34.2 | 29.2 |
dolly-v2-3b | INT8 | 1024 | 4626 | 379.1 | 35.3 | 28.3 |
chatglm3-6b | INT4 | 1024 | 5049.9 | 638.2 | 36 | 27.8 |
llama-2-7b-gptq | INT4 | 32 | 5045.7 | 63.8 | 38.9 | 25.7 |
codegen25-7b | INT4 | 32 | 5366.8 | 66 | 39.2 | 25.5 |
decilm-7b-instruct | INT4 | 36 | 4964.3 | 123.3 | 39.9 | 25.1 |
chatglm3-6b-gptq | INT4 | 32 | 5344 | 70.6 | 40.7 | 24.6 |
decilm-7b-instruct | INT4 | 1091 | 4965 | 795.7 | 42.1 | 23.8 |
qwen-7b-chat-gptq | INT4 | 32 | 5958.7 | 68.5 | 42.7 | 23.4 |
chatglm3-6b-gptq | INT4 | 1024 | 6003.4 | 645.5 | 42.7 | 23.4 |
phi-3-mini-4k-instruct | INT8 | 32 | 5021.3 | 51.3 | 43 | 23.3 |
qwen2-7b | INT4 | 32 | 5653.2 | 69.4 | 43.8 | 22.8 |
llama-2-7b-gptq | INT4 | 1024 | 5767 | 752.6 | 43.9 | 22.8 |
codegen25-7b | INT4 | 1024 | 5757.8 | 811.5 | 44.2 | 22.6 |
falcon-7b-instruct | INT4 | 32 | 5062.6 | 65.2 | 44.2 | 22.6 |
llama-3-8b | INT4 | 33 | 5800.1 | 72.2 | 44.3 | 22.6 |
llama-3-8b | INT4 | 33 | 5867.8 | 73 | 45 | 22.2 |
mistral-7b-v0.1 | INT4 | 32 | 5750.6 | 72.9 | 45.4 | 22.0 |
qwen2-7b | INT4 | 1024 | 5654 | 762.2 | 45.7 | 21.9 |
llama-3-8b | INT4 | 1025 | 5800.7 | 743.1 | 46.3 | 21.6 |
phi-3-mini-4k-instruct | INT8 | 1024 | 5021.9 | 441.7 | 46.7 | 21.4 |
llama-2-7b-chat-hf | INT4 | 32 | 5726.6 | 68.6 | 47 | 21.3 |
llama-3-8b | INT4 | 1025 | 5868.4 | 761.1 | 47.5 | 21.1 |
mistral-7b-v0.1 | INT4 | 1024 | 5607.6 | 741.7 | 47.9 | 20.9 |
falcon-7b-instruct | INT4 | 1024 | 5063.2 | 645.9 | 48.2 | 20.7 |
qwen-7b-chat-gptq | INT4 | 1024 | 6647.1 | 757.2 | 48.6 | 20.6 |
zephyr-7b-beta | INT4 | 32 | 6088 | 73.6 | 49.6 | 20.2 |
baichuan2-7b-chat | INT4 | 32 | 6268.1 | 74.6 | 50.7 | 19.7 |
glm-4-9b-chat | INT4 | 32 | 6987.4 | 79.1 | 51.5 | 19.4 |
llama-2-7b-chat-hf | INT4 | 1024 | 6149.7 | 923.4 | 51.7 | 19.3 |
zephyr-7b-beta | INT4 | 1024 | 6016.3 | 745.6 | 51.9 | 19.3 |
gemma-7b-it | INT4 | 32 | 6418.3 | 89.4 | 53.1 | 18.8 |
baichuan2-7b-chat | INT4 | 1024 | 6268.6 | 734.8 | 55.7 | 18.0 |
llama-3.1-8b | INT4 | 32 | 6640.3 | 87.3 | 56 | 17.9 |
glm-4-9b-chat | INT4 | 1024 | 6831.9 | 846.7 | 56.4 | 17.7 |
llama-3.1-8b | INT4 | 1024 | 6904.8 | 933.2 | 58.9 | 17.0 |
qwen-7b-chat | INT4 | 32 | 7172.3 | 77.7 | 59.9 | 16.7 |
gemma-7b-it | INT4 | 1024 | 6821 | 749.7 | 60 | 16.7 |
red-pajama-incite-chat-3b-v1 | FP16 | 32 | 6798.6 | 66.3 | 64.6 | 15.5 |
dolly-v2-3b | FP16 | 32 | 6737.8 | 68.2 | 65.5 | 15.3 |
chatglm3-6b | INT8 | 32 | 7478.5 | 77.6 | 66.3 | 15.1 |
qwen-7b-chat | INT4 | 1024 | 7851.6 | 758.8 | 66.5 | 15.0 |
red-pajama-incite-chat-3b-v1 | FP16 | 1023 | 7121.4 | 434.8 | 67.7 | 14.8 |
dolly-v2-3b | FP16 | 1024 | 6738.7 | 402.8 | 68.5 | 14.6 |
chatglm3-6b | INT8 | 1024 | 7397.6 | 654.2 | 68.8 | 14.5 |
falcon-7b-instruct | INT8 | 32 | 7912.4 | 84.6 | 73.8 | 13.6 |
qwen2-7b | INT8 | 32 | 8567.4 | 92.1 | 75 | 13.3 |
llama-2-7b-chat-hf | INT8 | 32 | 7821.1 | 85.2 | 77.1 | 13.0 |
codegen25-7b | INT8 | 32 | 8002.4 | 86.9 | 77.7 | 12.9 |
baichuan2-13b-chat | INT4 | 32 | 10428.4 | 156.1 | 78.6 | 12.7 |
qwen2-7b | INT8 | 1024 | 8797 | 885.1 | 78.6 | 12.7 |
phi-3-medium-4k-instruct | INT4 | 38 | 8810.9 | 156.8 | 78.7 | 12.7 |
decilm-7b-instruct | INT8 | 36 | 8369.7 | 119.9 | 78.8 | 12.7 |
falcon-7b-instruct | INT8 | 1024 | 7912.4 | 969.3 | 79.7 | 12.5 |
baichuan2-7b-chat | INT8 | 32 | 8498.5 | 94.6 | 81 | 12.3 |
zephyr-7b-beta | INT8 | 32 | 8232.9 | 94.6 | 81.7 | 12.2 |
mistral-7b-v0.1 | INT8 | 32 | 8644 | 95 | 81.8 | 12.2 |
qwen-7b-chat | INT8 | 32 | 8975.1 | 92.7 | 81.8 | 12.2 |
decilm-7b-instruct | INT8 | 1091 | 8208.8 | 1025.7 | 82.3 | 12.2 |
llama-2-7b-chat-hf | INT8 | 1024 | 7821.4 | 759.4 | 82.5 | 12.1 |
codegen25-7b | INT8 | 1024 | 8003 | 923.1 | 83.5 | 12.0 |
phi-3-mini-4k-instruct | FP16 | 32 | 8751.9 | 88.8 | 84.9 | 11.8 |
mistral-7b-v0.1 | INT8 | 1024 | 8488 | 781.6 | 85.1 | 11.8 |
phi-3-medium-4k-instruct | INT4 | 1061 | 8946.2 | 2039.2 | 85.3 | 11.7 |
zephyr-7b-beta | INT8 | 1024 | 8487.4 | 826.4 | 85.3 | 11.7 |
llama-3.1-8b | INT8 | 32 | 9039 | 98.6 | 85.9 | 11.6 |
llama-3-8b | INT8 | 33 | 9040.6 | 102.9 | 86.1 | 11.6 |
baichuan2-7b-chat | INT8 | 1024 | 9331.5 | 836.8 | 86.5 | 11.6 |
qwen-7b-chat | INT8 | 1024 | 9642.8 | 740.6 | 88.7 | 11.3 |
phi-3-mini-4k-instruct | FP16 | 1024 | 9173.4 | 490.2 | 89 | 11.2 |
llama-3-8b | INT8 | 1025 | 9041 | 1025.4 | 89.4 | 11.2 |
llama-3.1-8b | INT8 | 1024 | 9302 | 885.9 | 89.6 | 11.2 |
starcoder | INT4 | 32 | 9298.4 | 142.3 | 93.6 | 10.7 |
gemma-7b-it | INT8 | 32 | 9662.7 | 114 | 94 | 10.6 |
glm-4-9b-chat | INT8 | 32 | 10381.6 | 110.6 | 98 | 10.2 |
gemma-7b-it | INT8 | 1024 | 10351.1 | 1005.5 | 99.4 | 10.1 |
glm-4-9b-chat | INT8 | 1024 | 10545.1 | 1116.4 | 101.1 | 9.9 |
lcm-dreamshaper-v7 | INT8 | 32 | 4719.6 | 117.5 | 107.7 | 9.3 |
lcm-dreamshaper-v7 | INT8 | 1024 | 5279.1 | 119.4 | 108.1 | 9.3 |
lcm-dreamshaper-v7 | FP16 | 32 | 4907.3 | 118.5 | 109.7 | 9.1 |
lcm-dreamshaper-v7 | FP16 | 1024 | 5530.5 | 122.3 | 109.9 | 9.1 |
lcm-dreamshaper-v7 | INT4 | 1024 | 5443.8 | 121.4 | 110 | 9.1 |
flan-t5-xxl | INT4 | 33 | 13636.5 | 85.8 | 110.2 | 9.1 |
lcm-dreamshaper-v7 | INT4 | 32 | 4790.7 | 120.3 | 110.9 | 9.0 |
flan-t5-xxl | INT8 | 33 | 23408.6 | 471.4 | 128.3 | 7.8 |
starcoder | INT4 | 1024 | 9716.5 | 1953.1 | 141.7 | 7.1 |
phi-3-medium-4k-instruct | INT8 | 38 | 14470.9 | 219.9 | 149.9 | 6.7 |
phi-3-medium-4k-instruct | INT8 | 1061 | 14471.2 | 2300 | 154.3 | 6.5 |
decilm-7b-instruct | FP16 | 1091 | 14729.5 | 1426.8 | 162.3 | 6.2 |
llama-3-8b | FP16 | 33 | 15955.7 | 237.9 | 168.8 | 5.9 |
llama-3-8b | FP16 | 1025 | 16095.6 | 1384 | 172.7 | 5.8 |
starcoder | INT8 | 32 | 15761.5 | 205.7 | 180.5 | 5.5 |
stable-diffusion-v2-1 | INT8 | 1024 | 5484.5 | 202.4 | 181.3 | 5.5 |
stable-diffusion-v2-1 | INT8 | 32 | 4830.1 | 194.9 | 181.9 | 5.5 |
stable-diffusion-v2-1 | FP16 | 32 | 5580 | 202.3 | 182.7 | 5.5 |
stable-diffusion-v2-1 | FP16 | 1024 | 6018 | 207.6 | 183.6 | 5.4 |
stable-diffusion-v1-5 | INT8 | 32 | 4732.6 | 218.9 | 207.8 | 4.8 |
stable-diffusion-v1-5 | INT8 | 1024 | 5212.1 | 219.8 | 208.5 | 4.8 |
stable-diffusion-v1-5 | FP16 | 1024 | 5524.4 | 227.1 | 210.8 | 4.7 |
stable-diffusion-v1-5 | FP16 | 32 | 4856.2 | 220.4 | 211.4 | 4.7 |
decilm-7b-instruct | FP16 | 36 | 15039.6 | 223.1 | 226.3 | 4.4 |
starcoder | INT8 | 1024 | 15763.6 | 2118.6 | 229.5 | 4.4 |
flan-t5-xxl | INT4 | 1139 | 16177.9 | 270.1 | 236.5 | 4.2 |
flan-t5-xxl | INT8 | 1139 | 26075.7 | 322.8 | 263.1 | 3.8 |
baichuan2-13b-chat | INT4 | 1024 | 12977.1 | 1796.4 | 279.2 | 3.6 |
baichuan2-13b-chat | INT8 | 32 | 15417 | 197.2 | 334.2 | 3.0 |
llama-3.1-8b | FP16 | 32 | 16112.3 | 853.2 | 410.5 | 2.4 |
llama-3.1-8b | FP16 | 1024 | 17452.2 | 1166.4 | 418.4 | 2.4 |
baichuan2-13b-chat | INT8 | 1024 | 15611.5 | 1891.2 | 446.3 | 2.2 |
phi-3-medium-4k-instruct | FP16 | 38 | 27161.6 | 2440.2 | 2280.7 | 0.4 |
phi-3-medium-4k-instruct | FP16 | 1061 | 27505.4 | 3536.3 | 2285.2 | 0.4 |

Topology | Precision | Input Size | Max RSS Memory (MB) | 1st Latency (ms) | 2nd Latency (ms) | 2nd Tokens/sec |
---|---|---|---|---|---|---|
bloomz-560m | INT4 | 32 | 2123 | 36.1 | 12.5 | 80.0 |
bloomz-560m | INT4 | 1024 | 2123.6 | 195 | 13.7 | 73.0 |
tiny-llama-1.1b-chat | INT4 | 32 | 2249.2 | 36.8 | 13.9 | 71.9 |
tiny-llama-1.1b-chat | INT4 | 1024 | 2249.9 | 427.8 | 15 | 66.7 |
qwen2-0.5b | INT4 | 32 | 1800.7 | 44.7 | 15.4 | 64.9 |
bloomz-560m | INT8 | 32 | 2273.5 | 39.5 | 15.4 | 64.9 |
qwen2-0.5b | INT4 | 1024 | 1801.1 | 185.9 | 15.5 | 64.5 |
bloomz-560m | INT8 | 1024 | 2471.6 | 213.3 | 15.8 | 63.3 |
qwen2-0.5b | INT8 | 32 | 2000.1 | 37.9 | 18.2 | 54.9 |
qwen2-0.5b | INT8 | 1024 | 2135.9 | 218 | 18.7 | 53.5 |
bloomz-560m | FP16 | 32 | 3069.2 | 39.1 | 19.7 | 50.8 |
qwen2-1.5b | INT4 | 32 | 2750.3 | 47.6 | 20 | 50.0 |
tiny-llama-1.1b-chat | INT8 | 32 | 2441.6 | 49.4 | 20.5 | 48.8 |
qwen2-1.5b | INT4 | 1024 | 2575.9 | 531.2 | 20.9 | 47.8 |
bloomz-560m | FP16 | 1024 | 3057.5 | 232.7 | 21 | 47.6 |
tiny-llama-1.1b-chat | INT8 | 1024 | 2431.7 | 523.6 | 21.5 | 46.5 |
dolly-v2-3b | INT4 | 32 | 3178.8 | 75.4 | 27.1 | 36.9 |
minicpm-1b-sft | INT4 | 31 | 3131.5 | 74 | 27.6 | 36.2 |
red-pajama-incite-chat-3b-v1 | INT4 | 32 | 3057.5 | 67.1 | 27.6 | 36.2 |
gemma-2b-it | INT4 | 32 | 3460.7 | 97.9 | 28.5 | 35.1 |
minicpm-1b-sft | INT4 | 1014 | 3132 | 732.4 | 29 | 34.5 |
qwen2-1.5b | INT8 | 32 | 3126.4 | 77.4 | 29.3 | 34.1 |
gemma-2b-it | INT4 | 1024 | 3461.4 | 796.3 | 29.4 | 34.0 |
qwen2-1.5b | INT8 | 1024 | 3126.8 | 660.3 | 30.1 | 33.2 |
dolly-v2-3b | INT4 | 1024 | 3179 | 1171.9 | 31.8 | 31.4 |
minicpm-1b-sft | INT8 | 31 | 3496 | 77.9 | 31.9 | 31.3 |
red-pajama-incite-chat-3b-v1 | INT4 | 1023 | 3057.7 | 1211 | 32.8 | 30.5 |
minicpm-1b-sft | INT8 | 1014 | 3433.2 | 783.7 | 33.6 | 29.8 |
phi-3-mini-4k-instruct | INT4 | 32 | 3534.8 | 96.6 | 36.6 | 27.3 |
red-pajama-incite-chat-3b-v1 | INT8 | 32 | 4099.8 | 107.3 | 42.3 | 23.6 |
gemma-2b-it | INT8 | 32 | 4478.7 | 103.1 | 42.4 | 23.6 |
minicpm-1b-sft | FP16 | 31 | 4157.5 | 75.7 | 42.7 | 23.4 |
phi-3-mini-4k-instruct | INT4 | 1024 | 3535.3 | 1521.7 | 42.8 | 23.4 |
dolly-v2-3b | INT8 | 32 | 4143.7 | 102 | 43.1 | 23.2 |
gemma-2b-it | INT8 | 1024 | 4478.9 | 936.2 | 43.3 | 23.1 |
minicpm-1b-sft | FP16 | 1014 | 4329.7 | 876.6 | 44.8 | 22.3 |
red-pajama-incite-chat-3b-v1 | INT8 | 1023 | 4412.8 | 1815.9 | 44.9 | 22.3 |
dolly-v2-3b | INT8 | 1024 | 4143.8 | 1276.4 | 45.6 | 21.9 |
chatglm3-6b | INT4 | 32 | 4746.8 | 149.6 | 50.6 | 19.8 |
chatglm3-6b | INT4 | 1024 | 4747 | 2279.1 | 52.6 | 19.0 |
flan-t5-xxl | INT4 | 33 | 13681.2 | 91.7 | 53.6 | 18.7 |
phi-3-mini-4k-instruct | INT8 | 32 | 5041.3 | 110.9 | 56.9 | 17.6 |
llama-2-7b-gptq | INT4 | 32 | 5115.9 | 168.1 | 57.8 | 17.3 |
chatglm3-6b-gptq | INT4 | 32 | 5371.4 | 159.5 | 57.8 | 17.3 |
decilm-7b-instruct | INT4 | 36 | 5415.9 | 230.5 | 58 | 17.2 |
codegen25-7b | INT4 | 32 | 5110.5 | 161 | 59.1 | 16.9 |
flan-t5-xxl | INT4 | 1139 | 16627.6 | 455.8 | 59.3 | 16.9 |
qwen2-7b | INT4 | 32 | 5802.2 | 173.2 | 60.1 | 16.6 |
phi-3-mini-4k-instruct | INT8 | 1024 | 5041.7 | 1812.4 | 60.2 | 16.6 |
chatglm3-6b-gptq | INT4 | 1024 | 5748.7 | 2236 | 60.2 | 16.6 |
falcon-7b-instruct | INT4 | 32 | 5495.1 | 181.3 | 60.3 | 16.6 |
decilm-7b-instruct | INT4 | 1091 | 5237.4 | 2995.4 | 60.9 | 16.4 |
qwen2-7b | INT4 | 1024 | 5758.2 | 2445.4 | 61.9 | 16.2 |
falcon-7b-instruct | INT4 | 1024 | 5682.7 | 2718.5 | 62.6 | 16.0 |
codegen25-7b | INT4 | 1024 | 5513.9 | 2500.7 | 63.2 | 15.8 |
mistral-7b-v0.1 | INT4 | 32 | 5475.8 | 178.5 | 64.7 | 15.5 |
qwen-7b-chat-gptq | INT4 | 32 | 6115.4 | 174.2 | 64.8 | 15.4 |
llama-3-8b | INT4 | 33 | 5964.2 | 238.4 | 65.2 | 15.3 |
llama-3-8b | INT4 | 33 | 5870.5 | 239.8 | 65.3 | 15.3 |
llama-2-7b-chat-hf | INT4 | 32 | 5493.5 | 157.4 | 65.4 | 15.3 |
llama-2-7b-gptq | INT4 | 1024 | 5802.7 | 2547.3 | 65.4 | 15.3 |
mistral-7b-v0.1 | INT4 | 1024 | 5476 | 2684.8 | 67.2 | 14.9 |
llama-3-8b | INT4 | 1025 | 6163.2 | 2842.9 | 67.6 | 14.8 |
zephyr-7b-beta | INT4 | 32 | 5739.1 | 177.4 | 67.7 | 14.8 |
llama-3-8b | INT4 | 1025 | 6069.4 | 2741.8 | 67.8 | 14.7 |
llama-2-7b-chat-hf | INT4 | 1024 | 5494 | 2500.3 | 69.5 | 14.4 |
zephyr-7b-beta | INT4 | 1024 | 5739.7 | 2671.4 | 71 | 14.1 |
qwen-7b-chat-gptq | INT4 | 1024 | 6646.3 | 2596.9 | 73 | 13.7 |
baichuan2-7b-chat | INT4 | 32 | 6385.1 | 159.5 | 73.1 | 13.7 |
gemma-7b-it | INT4 | 32 | 7297.7 | 221.9 | 73.7 | 13.6 |
dolly-v2-3b | FP16 | 32 | 6652.1 | 107.1 | 74.2 | 13.5 |
red-pajama-incite-chat-3b-v1 | FP16 | 32 | 6640.8 | 103.1 | 74.7 | 13.4 |
llama-3.1-8b | INT4 | 32 | 6797.5 | 182.7 | 76.3 | 13.1 |
glm-4-9b-chat | INT4 | 32 | 6805.1 | 215.5 | 76.4 | 13.1 |
baichuan2-7b-chat | INT4 | 1024 | 6385.5 | 2597 | 77.3 | 12.9 |
gemma-7b-it | INT4 | 1024 | 6974.7 | 3126 | 77.5 | 12.9 |
dolly-v2-3b | FP16 | 1024 | 6652.2 | 1542.4 | 78.7 | 12.7 |
red-pajama-incite-chat-3b-v1 | FP16 | 1023 | 7120.4 | 2490.4 | 79.3 | 12.6 |
llama-3.1-8b | INT4 | 1024 | 7114 | 2807.6 | 79.7 | 12.5 |
glm-4-9b-chat | INT4 | 1024 | 6805.2 | 3197 | 79.7 | 12.5 |
qwen-7b-chat | INT4 | 32 | 7255.7 | 156.2 | 81.2 | 12.3 |
chatglm3-6b | INT8 | 32 | 7308.6 | 154.4 | 85.1 | 11.8 |
qwen-7b-chat | INT4 | 1024 | 7827.7 | 2693.7 | 86.6 | 11.5 |
chatglm3-6b | INT8 | 1024 | 7308.9 | 2486 | 87.4 | 11.4 |
flan-t5-xxl | INT8 | 33 | 20923.9 | 170.5 | 91.7 | 10.9 |
llama-2-7b-chat-hf | INT8 | 32 | 7838.4 | 157.9 | 94.8 | 10.5 |
falcon-7b-instruct | INT8 | 32 | 8250 | 175.3 | 95.1 | 10.5 |
codegen25-7b | INT8 | 32 | 7996.9 | 162.7 | 95.7 | 10.4 |
falcon-7b-instruct | INT8 | 1024 | 8445.4 | 3055.4 | 97.5 | 10.3 |
flan-t5-xxl | INT8 | 1139 | 24095.3 | 571.2 | 97.6 | 10.2 |
qwen2-7b | INT8 | 32 | 8542.4 | 185.5 | 98.2 | 10.2 |
llama-2-7b-chat-hf | INT8 | 1024 | 7838.6 | 3132.1 | 98.8 | 10.1 |
qwen2-7b | INT8 | 1024 | 8543.5 | 3124.5 | 99.8 | 10.0 |
codegen25-7b | INT8 | 1024 | 8453.5 | 3136 | 99.9 | 10.0 |
decilm-7b-instruct | INT8 | 36 | 8088.5 | 244.9 | 100.7 | 9.9 |
phi-3-mini-4k-instruct | FP16 | 32 | 8592.5 | 124.5 | 102.9 | 9.7 |
decilm-7b-instruct | INT8 | 1091 | 8292.4 | 9951.9 | 103.5 | 9.7 |
qwen-7b-chat | INT8 | 32 | 8991.1 | 169.7 | 103.7 | 9.6 |
zephyr-7b-beta | INT8 | 32 | 8267.2 | 183.1 | 104.5 | 9.6 |
mistral-7b-v0.1 | INT8 | 32 | 8269.6 | 184.1 | 104.9 | 9.5 |
zephyr-7b-beta | INT8 | 1024 | 8268.1 | 3379.7 | 107 | 9.3 |
mistral-7b-v0.1 | INT8 | 1024 | 8513.8 | 3394.1 | 107.4 | 9.3 |
phi-3-mini-4k-instruct | FP16 | 1024 | 9157.2 | 2080.8 | 108.4 | 9.2 |
qwen-7b-chat | INT8 | 1024 | 8991.4 | 3137.5 | 109 | 9.2 |
llama-3-8b | INT8 | 33 | 9085.1 | 264.9 | 109.4 | 9.1 |
llama-3.1-8b | INT8 | 32 | 9070.9 | 189.1 | 110.7 | 9.0 |
baichuan2-13b-chat | INT4 | 32 | 10592.1 | 330.4 | 111.4 | 9.0 |
llama-3-8b | INT8 | 1025 | 9085.2 | 9900.1 | 111.9 | 8.9 |
llama-3.1-8b | INT8 | 1024 | 9071 | 3408.2 | 113.2 | 8.8 |
phi-3-medium-4k-instruct | INT4 | 38 | 9009.6 | 443.3 | 116 | 8.6 |
phi-3-medium-4k-instruct | INT4 | 1061 | 8935.4 | 5655.5 | 119.9 | 8.3 |
baichuan2-7b-chat | INT8 | 32 | 8633.7 | 172.7 | 120.5 | 8.3 |
baichuan2-7b-chat | INT8 | 1024 | 9135.7 | 3192.6 | 124.7 | 8.0 |
gemma-7b-it | INT8 | 32 | 10087.5 | 223.2 | 125.2 | 8.0 |
glm-4-9b-chat | INT8 | 32 | 10440 | 224.2 | 125.7 | 8.0 |
gemma-7b-it | INT8 | 1024 | 9965.1 | 3723.4 | 129.1 | 7.7 |
glm-4-9b-chat | INT8 | 1024 | 10440.1 | 4054.2 | 129.2 | 7.7 |
starcoder | INT4 | 32 | 9738.6 | 599.6 | 177.5 | 5.6 |
flan-t5-xxl | FP16 | 33 | 19273 | 553.7 | 188.1 | 5.3 |
flan-t5-xxl | FP16 | 1139 | 24887.6 | 999 | 193.1 | 5.2 |
phi-3-medium-4k-instruct | INT8 | 38 | 14453.1 | 1342.7 | 205.9 | 4.9 |
phi-3-medium-4k-instruct | INT8 | 1061 | 14287.2 | 19763.6 | 210.9 | 4.7 |
decilm-7b-instruct | FP16 | 36 | 14215.6 | 465.7 | 222 | 4.5 |
decilm-7b-instruct | FP16 | 1091 | 14332.5 | 12122.8 | 225.6 | 4.4 |
starcoder | INT8 | 32 | 8567.4 | 379.1 | 235.4 | 4.2 |
llama-3.1-8b | FP16 | 32 | 15653.3 | 319.9 | 240.7 | 4.2 |
starcoder | INT4 | 1024 | 9738.7 | 6736.5 | 241.1 | 4.1 |
llama-3.1-8b | FP16 | 1024 | 17004.9 | 4679.8 | 245.7 | 4.1 |
starcoder | INT8 | 1024 | 9829.9 | 8819.9 | 269.2 | 3.7 |
lcm-dreamshaper-v7 | INT4 | 32 | 5391.5 | 296.1 | 284.2 | 3.5 |
lcm-dreamshaper-v7 | INT4 | 1024 | 5779.1 | 305.6 | 284.3 | 3.5 |
lcm-dreamshaper-v7 | FP16 | 1024 | 5967.9 | 304.5 | 284.5 | 3.5 |
lcm-dreamshaper-v7 | FP16 | 32 | 5238.8 | 295.8 | 284.5 | 3.5 |
lcm-dreamshaper-v7 | INT8 | 32 | 4974.1 | 314.4 | 301.4 | 3.3 |
lcm-dreamshaper-v7 | INT8 | 1024 | 5622.3 | 323.9 | 301.7 | 3.3 |
stable-diffusion-v2-1 | FP16 | 1024 | 5942.7 | 475.7 | 444.7 | 2.2 |
stable-diffusion-v2-1 | FP16 | 32 | 5197.9 | 466.9 | 445.4 | 2.2 |
baichuan2-13b-chat | INT4 | 1024 | 12879 | 5213.1 | 448.6 | 2.2 |
stable-diffusion-v2-1 | INT8 | 32 | 4723.6 | 484 | 455.9 | 2.2 |
stable-diffusion-v2-1 | INT8 | 1024 | 5458.1 | 489.4 | 456.2 | 2.2 |
stable-diffusion-v1-5 | FP16 | 1024 | 6573.2 | 576.6 | 550.6 | 1.8 |
stable-diffusion-v1-5 | FP16 | 32 | 5848.9 | 570.5 | 551.4 | 1.8 |
stable-diffusion-v1-5 | INT8 | 32 | 5581 | 603.9 | 587.7 | 1.7 |
stable-diffusion-v1-5 | INT8 | 1024 | 6258.2 | 612.9 | 589.4 | 1.7 |
phi-3-medium-4k-instruct | FP16 | 38 | 27222.7 | 3293.8 | 1198.9 | 0.8 |
phi-3-medium-4k-instruct | FP16 | 1061 | 28813.8 | 32882.8 | 1199.7 | 0.8 |

All models listed here were tested with the following parameters:

- Framework: PyTorch
- Beam: 1
- Batch size: 1
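The INT4 and INT8 rows refer to weight-compressed variants of these PyTorch checkpoints. Below is a minimal sketch of producing such a variant with Optimum Intel; the model id and output directory are illustrative placeholders, and `ratio=0.8` is an assumption about what an "INT4-MIXED" recipe might look like, not the exact configuration used for these measurements.

```python
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # illustrative; any causal-LM checkpoint works

# 4-bit weight compression; ratio < 1.0 leaves a fraction of the
# weights in 8 bit, i.e. a mixed INT4/INT8 scheme (assumed recipe).
quant = OVWeightQuantizationConfig(bits=4, ratio=0.8)

# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly.
model = OVModelForCausalLM.from_pretrained(
    model_id, export=True, quantization_config=quant
)
model.save_pretrained("tiny-llama-1.1b-chat-int4")  # placeholder output directory
```

The saved directory can then be loaded directly by the `LLMPipeline` example shown at the top of this page.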