Most Efficient Large Language Models for AI PC

This page is regularly updated to help you identify the best-performing LLMs on the Intel® Core™ Ultra processor family and AI PCs. The data was collected with OpenVINO 2024.4 and is current as of 20 November 2024.

The tables below list the key performance indicators for inference on built-in GPUs.
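
To make the measured workload concrete, here is a minimal sketch (not the benchmark script behind these tables) of running one of the listed models on the built-in GPU with the OpenVINO GenAI API. The model directory name is a hypothetical placeholder for a model already exported to OpenVINO IR format.

```python
# Minimal sketch of the measured workload: greedy text generation on the
# built-in GPU with OpenVINO GenAI. The model directory is a hypothetical
# placeholder for a model already exported to OpenVINO IR format.
import openvino_genai as ov_genai

model_dir = "tiny-llama-1.1b-chat-int4-ov"     # hypothetical local IR directory
pipe = ov_genai.LLMPipeline(model_dir, "GPU")  # "GPU" selects the integrated GPU

print(pipe.generate("What is an AI PC?", max_new_tokens=128))
```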

| Topology | Precision | Input Size | max rss memory (MB) | 1st latency (ms) | 2nd latency (ms) | 2nd tok/sec |
|---|---|---|---|---|---|---|
| opt-125m-gptq | INT4-MIXED | 32 | 833.1 | 15.6 | 3.9 | 256.4 |
| opt-125m-gptq | INT4-MIXED | 1024 | 955.9 | 553.8 | 4.8 | 208.3 |
| bloomz-560m | INT4-MIXED | 32 | 1457.5 | 48.5 | 11.1 | 90.1 |
| qwen2-0.5b | INT4-MIXED | 32 | 1167.8 | 95.7 | 11.5 | 87.0 |
| qwen2-0.5b | INT4-MIXED | 1024 | 1266 | 2330.3 | 12.7 | 78.7 |
| qwen2-0.5b | INT8-CW | 32 | 1496.3 | 90.5 | 12.8 | 78.1 |
| bloomz-560m | INT8-CW | 32 | 1724.2 | 84 | 13.9 | 71.9 |
| qwen2-0.5b | INT8-CW | 1024 | 1593 | 2370.7 | 14 | 71.4 |
| bloomz-560m | INT4-MIXED | 1024 | 1691 | 2005.3 | 15.2 | 65.8 |
| qwen2-0.5b | FP16 | 32 | 2989.8 | 94.6 | 15.9 | 62.9 |
| bloomz-560m | INT8-CW | 1024 | 1941 | 2343.4 | 16.1 | 62.1 |
| qwen2-0.5b | FP16 | 1024 | 3088.1 | 2376.8 | 17.4 | 57.5 |
| bloomz-560m | FP16 | 32 | 3857 | 86.7 | 17.5 | 57.1 |
| bloomz-560m | FP16 | 1024 | 4085.6 | 2373.4 | 19.8 | 50.5 |
| tiny-llama-1.1b-chat | INT4-MIXED | 32 | 1738.9 | 237.4 | 20 | 50.0 |
| tiny-llama-1.1b-chat | INT8-CW | 32 | 2471.2 | 224.6 | 22.6 | 44.2 |
| tiny-llama-1.1b-chat | INT4-MIXED | 1024 | 1929.3 | 5993 | 22.7 | 44.1 |
| tiny-llama-1.1b-chat | INT8-CW | 1024 | 2661.8 | 6238.8 | 25.2 | 39.7 |
| qwen2-1.5b | INT4-MIXED | 32 | 2429 | 312.8 | 28.4 | 35.2 |
| tiny-llama-1.1b-chat | FP16 | 32 | 4834.9 | 231.7 | 28.9 | 34.6 |
| tiny-llama-1.1b-chat | FP16 | 1024 | 5023.2 | 6191.5 | 31.7 | 31.5 |
| qwen2-1.5b | INT4-MIXED | 1024 | 2600.3 | 7597.3 | 31.8 | 31.4 |
| stablelm-3b-4e1t | INT4-MIXED | 32 | 3982.1 | 348.4 | 32.1 | 31.2 |
| qwen2-1.5b | INT8-CW | 32 | 3619 | 301 | 32.7 | 30.6 |
| qwen2-1.5b | INT8-CW | 1024 | 3790.3 | 7990.5 | 34.6 | 28.9 |
| stablelm-3b-4e1t | INT4-MIXED | 1023 | 4455.4 | 11963.2 | 39.2 | 25.5 |
| minicpm-1b-sft | INT4-MIXED | 31 | 5815.4 | 214.3 | 40.1 | 24.9 |
| qwen2-1.5b | FP16 | 32 | 7582.3 | 304.4 | 42.2 | 23.7 |
| minicpm-1b-sft | INT8-CW | 31 | 6609.6 | 210.6 | 43.3 | 23.1 |
| qwen2-1.5b | FP16 | 1024 | 7753.4 | 7915.3 | 44.2 | 22.6 |
| gemma-2b-it | INT4-MIXED | 32 | 3728.2 | 523 | 46.2 | 21.6 |
| stable-zephyr-3b-dpo | INT4-MIXED | 32 | 3689.3 | 656.5 | 47.4 | 21.1 |
| gemma-2b-it | INT4-MIXED | 1024 | 4207.3 | 11867.9 | 47.5 | 21.1 |
| minicpm-1b-sft | FP16 | 31 | 8999.8 | 222.2 | 49.1 | 20.4 |
| red-pajama-incite-chat-3b-v1 | INT4-MIXED | 32 | 3448.1 | 1028.9 | 49.6 | 20.2 |
| dolly-v2-3b | INT4-MIXED | 32 | 3448.4 | 714.8 | 49.9 | 20.0 |
| gemma-2b-it | INT8-CW | 32 | 5423.2 | 488.8 | 51 | 19.6 |
| gemma-2b-it | INT8-CW | 1024 | 5902.7 | 12434.4 | 52.3 | 19.1 |
| stable-zephyr-3b-dpo | INT8-CW | 32 | 5630.3 | 694.5 | 54.4 | 18.4 |
| phi-2 | INT4-MIXED | 32 | 3732.9 | 723.2 | 54.5 | 18.3 |
| phi-2 | INT8-CW | 32 | 5600.4 | 747 | 55.7 | 18.0 |
| dolly-v2-3b | INT8-CW | 32 | 5589.7 | 1009.8 | 55.9 | 17.9 |
| red-pajama-incite-chat-3b-v1 | INT8-CW | 32 | 5590.1 | 698.9 | 55.9 | 17.9 |
| stablelm-3b-4e1t | INT8-CW | 32 | 5630.1 | 660.7 | 56.1 | 17.8 |
| dolly-v2-3b | INT4-MIXED | 1024 | 3984.5 | 15502.8 | 56.5 | 17.7 |
| red-pajama-incite-chat-3b-v1 | INT4-MIXED | 1023 | 3915.6 | 15363.9 | 56.6 | 17.7 |
| llama-2-7b-gptq | INT4-MIXED | 32 | 8618.5 | 782.9 | 56.9 | 17.6 |
| phi-2 | INT4-MIXED | 1024 | 4251.3 | 15317 | 61 | 16.4 |
| phi-2 | INT8-CW | 1024 | 6119.4 | 15886.6 | 62 | 16.1 |
| red-pajama-incite-chat-3b-v1 | INT8-CW | 1023 | 6056.9 | 15984.9 | 62.2 | 16.1 |
| dolly-v2-3b | INT8-CW | 1024 | 6124.9 | 16099.7 | 62.5 | 16.0 |
| stablelm-3b-4e1t | INT8-CW | 1023 | 6097.1 | 16206.9 | 62.5 | 16.0 |
| gemma-2b-it | FP16 | 32 | 12208.2 | 501.4 | 65.5 | 15.3 |
| llama-3-8b | INT4-MIXED | 33 | 8741.2 | 869 | 65.7 | 15.2 |
| llama-2-7b-gptq | INT4-MIXED | 1024 | 9468.1 | 26350.7 | 66.1 | 15.1 |
| qwen-7b-chat-gptq | INT4-MIXED | 32 | 8561 | 773.7 | 67 | 14.9 |
| gemma-2b-it | FP16 | 1024 | 12687.8 | 12168.7 | 67.1 | 14.9 |
| mistral-7b-v0.1 | INT4-MIXED | 32 | 8588.7 | 1020.6 | 67.4 | 14.8 |
| llama-2-7b-chat-hf | INT4-MIXED | 32 | 8626.8 | 1100 | 69.4 | 14.4 |
| phi-2 | FP16 | 32 | 11385.9 | 693.8 | 70.2 | 14.2 |
| dolly-v2-3b | FP16 | 32 | 11359 | 688.5 | 70.5 | 14.2 |
| stable-zephyr-3b-dpo | FP16 | 32 | 11432.9 | 648.5 | 70.6 | 14.2 |
| red-pajama-incite-chat-3b-v1 | FP16 | 32 | 11364 | 692.4 | 70.7 | 14.1 |
| stablelm-3b-4e1t | FP16 | 32 | 11432.6 | 649 | 71.1 | 14.1 |
| llama-3-8b | INT4-MIXED | 1025 | 9254.8 | 29700.3 | 71.9 | 13.9 |
| mistral-7b-v0.1 | INT4-MIXED | 1024 | 9121.9 | 29492.9 | 73.3 | 13.6 |
| phi-3-mini-4k-instruct | INT8-CW | 32 | 7646.1 | 952.6 | 75.7 | 13.2 |
| qwen-7b-chat-gptq | INT4-MIXED | 1024 | 10458.7 | 29022.2 | 75.9 | 13.2 |
| zephyr-7b-beta | INT4-MIXED | 32 | 9217.5 | 1196.6 | 76.2 | 13.1 |
| phi-2 | FP16 | 1024 | 11902.2 | 15868 | 77 | 13.0 |
| dolly-v2-3b | FP16 | 1024 | 11892.5 | 15987.1 | 77.1 | 13.0 |
| baichuan2-7b-chat | INT4-MIXED | 32 | 9440.3 | 1118.1 | 77.3 | 12.9 |
| red-pajama-incite-chat-3b-v1 | FP16 | 1023 | 11829.1 | 16008.7 | 77.3 | 12.9 |
| stablelm-3b-4e1t | FP16 | 1023 | 11897.5 | 16030 | 77.7 | 12.9 |
| phi-3-mini-4k-instruct | INT4-MIXED | 32 | 4961.9 | 968.8 | 78.2 | 12.8 |
| llama-2-7b-chat-hf | INT4-MIXED | 1024 | 9478.1 | 28958.6 | 78.6 | 12.7 |
| zephyr-7b-beta | INT4-MIXED | 1024 | 9764.2 | 30982 | 82.3 | 12.2 |
| phi-3-mini-4k-instruct | INT8-CW | 1024 | 8255.7 | 23200.5 | 83.1 | 12.0 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 5570.2 | 22277.1 | 85.7 | 11.7 |
| baichuan2-7b-chat | INT4-MIXED | 1024 | 10305.2 | 29010 | 86.4 | 11.6 |
| phi-3-mini-4k-instruct | FP16 | 32 | 15292.6 | 934.7 | 96.4 | 10.4 |
| qwen-7b-chat | INT4-MIXED | 32 | 10964.7 | 1413 | 97.8 | 10.2 |
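
The Precision column refers to the weight format of the exported model: INT4-MIXED is 4-bit mixed-precision weight compression (part of the weights stays at higher precision), INT8-CW is 8-bit channel-wise weight compression, and FP16 keeps half-precision weights. The following is a hedged sketch of producing such variants with Optimum Intel; the compression parameters shown are illustrative assumptions, not the exact settings used for these tables.

```python
# Hedged sketch: exporting a model to the weight formats listed in the Precision
# column with Optimum Intel. The compression parameters are illustrative
# assumptions, not the exact configuration used to produce the tables.
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# INT4 mixed-precision weight compression (part of the weights kept at 8 bit).
int4_cfg = OVWeightQuantizationConfig(bits=4, ratio=0.8, group_size=128)  # assumed values
OVModelForCausalLM.from_pretrained(
    model_id, export=True, quantization_config=int4_cfg
).save_pretrained("tiny-llama-1.1b-chat-int4")

# INT8 channel-wise weight compression.
OVModelForCausalLM.from_pretrained(
    model_id, export=True, load_in_8bit=True
).save_pretrained("tiny-llama-1.1b-chat-int8")
```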

| Topology | Precision | Input Size | max rss memory (MB) | 1st latency (ms) | 2nd latency (ms) | 2nd tok/sec |
|---|---|---|---|---|---|---|
| opt-125m-gptq | INT4-MIXED | 32 | 1150.2 | 35.1 | 8.2 | 122.0 |
| opt-125m-gptq | INT4-MIXED | 1024 | 1228 | 67 | 8.2 | 122.0 |
| qwen2-0.5b | INT4-MIXED | 1024 | 1596.2 | 83.6 | 14.4 | 69.4 |
| qwen2-0.5b | INT4-MIXED | 32 | 1675.6 | 63.6 | 14.9 | 67.1 |
| qwen2-0.5b | INT8-CW | 32 | 1857.5 | 56.9 | 15 | 66.7 |
| qwen2-0.5b | INT8-CW | 1024 | 1663.5 | 87 | 15 | 66.7 |
| bloomz-560m | INT8-CW | 32 | 1761.1 | 62.4 | 15.1 | 66.2 |
| tiny-llama-1.1b-chat | INT4-MIXED | 1024 | 1687.9 | 158.7 | 15.3 | 65.4 |
| bloomz-560m | INT4-MIXED | 32 | 1894.2 | 40.1 | 15.4 | 64.9 |
| tiny-llama-1.1b-chat | INT4-MIXED | 32 | 1833 | 74.5 | 15.7 | 63.7 |
| bloomz-560m | INT8-CW | 1024 | 1689.2 | 146.2 | 15.8 | 63.3 |
| bloomz-560m | INT4-MIXED | 1024 | 1791 | 150.1 | 16.4 | 61.0 |
| tiny-llama-1.1b-chat | INT8-CW | 32 | 2132.3 | 35.6 | 18.1 | 55.2 |
| bloomz-560m | FP16 | 32 | 2395 | 36 | 18.4 | 54.3 |
| tiny-llama-1.1b-chat | INT8-CW | 1024 | 1986.4 | 149.3 | 19.2 | 52.1 |
| bloomz-560m | FP16 | 1024 | 2344.4 | 157.4 | 19.3 | 51.8 |
| qwen2-1.5b | INT4-MIXED | 1024 | 2175.1 | 184.9 | 20.4 | 49.0 |
| qwen2-1.5b | INT4-MIXED | 32 | 2066.2 | 94.9 | 20.6 | 48.5 |
| red-pajama-incite-chat-3b-v1 | INT4-MIXED | 32 | 2599.8 | 118.1 | 25 | 40.0 |
| qwen2-1.5b | INT8-CW | 32 | 2377.4 | 83.3 | 25.1 | 39.8 |
| qwen2-1.5b | INT8-CW | 1024 | 2483.3 | 189.6 | 25.3 | 39.5 |
| gemma-2b-it | INT4-MIXED | 32 | 2594.3 | 181.4 | 26.1 | 38.3 |
| phi-2 | INT4-MIXED | 32 | 2912.4 | 77.7 | 26.8 | 37.3 |
| gemma-2b-it | INT4-MIXED | 1024 | 2594.4 | 248.2 | 26.9 | 37.2 |
| dolly-v2-3b | INT4-MIXED | 32 | 2610.3 | 141.3 | 27 | 37.0 |
| stable-zephyr-3b-dpo | INT4-MIXED | 32 | 2956.2 | 149.2 | 27.4 | 36.5 |
| minicpm-1b-sft | INT4-MIXED | 31 | 2625.8 | 159.2 | 28.1 | 35.6 |
| red-pajama-incite-chat-3b-v1 | INT4-MIXED | 1023 | 3069.7 | 413.5 | 28.2 | 35.5 |
| minicpm-1b-sft | INT8-CW | 31 | 2868.2 | 74.1 | 28.9 | 34.6 |
| dolly-v2-3b | INT4-MIXED | 1024 | 3081.5 | 386 | 29.4 | 34.0 |
| phi-2 | INT4-MIXED | 1024 | 3136.2 | 340 | 29.6 | 33.8 |
| stablelm-3b-4e1t | INT4-MIXED | 32 | 3035.9 | 150.5 | 30.6 | 32.7 |
| phi-3-mini-4k-instruct | INT4-MIXED | 32 | 3373.2 | 57.9 | 32.6 | 30.7 |
| stablelm-3b-4e1t | INT4-MIXED | 1023 | 3296.5 | 456.2 | 34.4 | 29.1 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 3707.1 | 432 | 36.1 | 27.7 |
| gemma-2b-it | INT8-CW | 32 | 3370.5 | 203.8 | 36.6 | 27.3 |
| minicpm-1b-sft | FP16 | 31 | 3679.6 | 80.6 | 36.9 | 27.1 |
| gemma-2b-it | INT8-CW | 1024 | 3503.2 | 258.5 | 37.9 | 26.4 |
| dolly-v2-3b | INT8-CW | 32 | 3893.3 | 142.9 | 39.4 | 25.4 |
| red-pajama-incite-chat-3b-v1 | INT8-CW | 32 | 3760.7 | 117.2 | 39.4 | 25.4 |
| phi-2 | INT8-CW | 32 | 3765.6 | 121 | 39.7 | 25.2 |
| stablelm-3b-4e1t | INT8-CW | 32 | 3641.2 | 123 | 39.9 | 25.1 |
| stable-zephyr-3b-dpo | INT8-CW | 32 | 3743.3 | 120.1 | 39.9 | 25.1 |
| red-pajama-incite-chat-3b-v1 | INT8-CW | 1023 | 4083.1 | 422.9 | 41.9 | 23.9 |
| dolly-v2-3b | INT8-CW | 1024 | 4211.5 | 384.1 | 42.2 | 23.7 |
| phi-2 | INT8-CW | 1024 | 4096.8 | 367.2 | 42.5 | 23.5 |
| stablelm-3b-4e1t | INT8-CW | 1023 | 4086.6 | 459.9 | 43.5 | 23.0 |
| llama-2-7b-gptq | INT4-MIXED | 32 | 4754.8 | 75.1 | 46.2 | 21.6 |
| codegen25-7b | INT4-MIXED | 32 | 4738.5 | 74.9 | 46.9 | 21.3 |
| gpt-j-6b | INT4-MIXED | 32 | 4506.5 | 221.4 | 47.3 | 21.1 |
| decilm-7b-instruct | INT4-MIXED | 36 | 4794.9 | 199.3 | 48.5 | 20.6 |
| qwen-7b-chat-gptq | INT4-MIXED | 32 | 5615.8 | 100.5 | 49.8 | 20.1 |
| falcon-7b-instruct | INT4-MIXED | 32 | 4738 | 79.9 | 50.7 | 19.7 |
| phi-3-mini-4k-instruct | INT8-CW | 32 | 4589.9 | 83 | 50.8 | 19.7 |
| llama-2-7b-gptq | INT4-MIXED | 1024 | 5246 | 640 | 52.1 | 19.2 |
| llama-3-8b | INT4-MIXED | 33 | 5475.8 | 114.7 | 52.2 | 19.2 |
| codegen25-7b | INT4-MIXED | 1024 | 5241.9 | 643.7 | 52.5 | 19.0 |
| mistral-7b-v0.1 | INT4-MIXED | 32 | 5015.3 | 94.6 | 52.6 | 19.0 |
| qwen2-7b | INT4-MIXED | 32 | 5330.7 | 86.3 | 52.7 | 19.0 |
| gpt-j-6b | INT4-MIXED | 1024 | 4926.5 | 867.2 | 53.2 | 18.8 |
| llama-2-7b-chat-hf | INT4-MIXED | 32 | 5100.7 | 78.7 | 54.2 | 18.5 |
| llama-3-8b | INT4-MIXED | 33 | 5527.1 | 114.9 | 54.3 | 18.4 |
| phi-3-mini-4k-instruct | INT8-CW | 1024 | 4959.2 | 450.6 | 54.6 | 18.3 |
| falcon-7b-instruct | INT4-MIXED | 1024 | 4863.4 | 660.5 | 54.9 | 18.2 |
| qwen2-7b | INT4-MIXED | 1024 | 5375.4 | 659.8 | 55.4 | 18.1 |
| mistral-7b-v0.1 | INT4-MIXED | 1024 | 5286.8 | 662.8 | 55.6 | 18.0 |
| llama-3-8b | INT4-MIXED | 1025 | 5601 | 992.5 | 56.1 | 17.8 |
| llama-3-8b | INT4-MIXED | 1025 | 5646.8 | 1047.1 | 56.7 | 17.6 |
| baichuan2-7b-chat | INT4-MIXED | 32 | 5913.7 | 86.5 | 57.2 | 17.5 |
| zephyr-7b-beta | INT4-MIXED | 32 | 5339.7 | 88.5 | 58.2 | 17.2 |
| qwen-7b-chat-gptq | INT4-MIXED | 1024 | 6315.8 | 664.2 | 60.1 | 16.6 |
| glm-4-9b-chat | INT4-MIXED | 32 | 6349.7 | 86.5 | 60.5 | 16.5 |
| llama-2-7b-chat-hf | INT4-MIXED | 1024 | 5592.7 | 856.8 | 60.9 | 16.4 |
| zephyr-7b-beta | INT4-MIXED | 1024 | 5459.1 | 898.6 | 61.6 | 16.2 |
| baichuan2-7b-chat | INT4-MIXED | 1024 | 6410.3 | 942.2 | 63.5 | 15.7 |
| gemma-7b-it | INT4-MIXED | 32 | 5816.3 | 104.5 | 63.5 | 15.7 |
| glm-4-9b-chat | INT4-MIXED | 1024 | 6368.8 | 1128.2 | 63.8 | 15.7 |
| llama-3.1-8b | INT4-MIXED | 32 | 6315.3 | 97.4 | 65 | 15.4 |
| llama-3.1-8b | INT4-MIXED | 1024 | 6421.8 | 902.9 | 68.2 | 14.7 |
| gemma-7b-it | INT4-MIXED | 1024 | 6233.2 | 1052.7 | 68.7 | 14.6 |
| qwen-7b-chat | INT4-MIXED | 32 | 7320.5 | 132.3 | 68.8 | 14.5 |
| red-pajama-incite-chat-3b-v1 | FP16 | 32 | 6318.9 | 79.2 | 70.7 | 14.1 |
| phi-2 | FP16 | 32 | 6330.2 | 83.2 | 70.8 | 14.1 |
| dolly-v2-3b | FP16 | 32 | 6327.2 | 92.7 | 71.9 | 13.9 |
| stable-zephyr-3b-dpo | FP16 | 32 | 6356.4 | 79.8 | 72.2 | 13.9 |
| stablelm-3b-4e1t | FP16 | 32 | 6261.9 | 74.6 | 72.6 | 13.8 |
| phi-2 | FP16 | 1024 | 6654.4 | 379.3 | 73.9 | 13.5 |
| red-pajama-incite-chat-3b-v1 | FP16 | 1023 | 6640.3 | 442.6 | 74.4 | 13.4 |
| dolly-v2-3b | FP16 | 1024 | 6653.9 | 441.9 | 74.9 | 13.4 |
| qwen-7b-chat | INT4-MIXED | 1024 | 7814.1 | 909.4 | 75.5 | 13.2 |
| stablelm-3b-4e1t | FP16 | 1023 | 6575.3 | 449.5 | 75.8 | 13.2 |
| falcon-7b-instruct | INT8-CW | 32 | 7487.6 | 109.4 | 84.3 | 11.9 |
| gpt-j-6b | INT8-CW | 32 | 6918.7 | 185.3 | 85.3 | 11.7 |
| llama-2-7b-chat-hf | INT8-CW | 32 | 7494.7 | 110.6 | 87.9 | 11.4 |
| qwen2-7b | INT8-CW | 32 | 8177.7 | 117.8 | 88.2 | 11.3 |
| falcon-7b-instruct | INT8-CW | 1024 | 7621.2 | 675.4 | 88.3 | 11.3 |
| codegen25-7b | INT8-CW | 32 | 7582.1 | 114.6 | 89 | 11.2 |
| qwen2-7b | INT8-CW | 1024 | 8226.2 | 842 | 90.4 | 11.1 |
| gpt-j-6b | INT8-CW | 1024 | 7353.1 | 1093.9 | 90.8 | 11.0 |
| phi-3-medium-4k-instruct | INT4-MIXED | 38 | 8184.1 | 270.2 | 90.8 | 11.0 |
| qwen-7b-chat | INT8-CW | 32 | 9223.8 | 138.4 | 91.3 | 11.0 |
| baichuan2-7b-chat | INT8-CW | 32 | 8188.4 | 122.9 | 91.8 | 10.9 |
| phi-3-mini-4k-instruct | FP16 | 32 | 8311.5 | 98.2 | 92 | 10.9 |
| llama-2-7b-chat-hf | INT8-CW | 1024 | 7984.3 | 874.9 | 92.8 | 10.8 |
| mistral-7b-v0.1 | INT8-CW | 32 | 7908.6 | 116.3 | 93.1 | 10.7 |
| baichuan2-13b-chat | INT4-MIXED | 32 | 10016.5 | 165.7 | 93.2 | 10.7 |
| zephyr-7b-beta | INT8-CW | 32 | 7812.6 | 117 | 93.4 | 10.7 |
| codegen25-7b | INT8-CW | 1024 | 8074.3 | 870.2 | 94 | 10.6 |
| decilm-7b-instruct | INT8-CW | 36 | 7885.2 | 181.4 | 94.9 | 10.5 |
| mistral-7b-v0.1 | INT8-CW | 1024 | 8023.7 | 906.4 | 95.7 | 10.4 |
| zephyr-7b-beta | INT8-CW | 1024 | 7930.8 | 915.2 | 96.3 | 10.4 |
| phi-3-medium-4k-instruct | INT4-MIXED | 1061 | 8384.5 | 2225.7 | 96.7 | 10.3 |
| baichuan2-7b-chat | INT8-CW | 1024 | 8678.3 | 956.7 | 96.8 | 10.3 |
| llama-3.1-8b | INT8-CW | 32 | 8615.4 | 121.6 | 97.7 | 10.2 |
| llama-3-8b | INT8-CW | 33 | 8615.1 | 131.3 | 97.7 | 10.2 |
| phi-3-mini-4k-instruct | FP16 | 1024 | 8695.2 | 509 | 99.9 | 10.0 |

| Topology | Precision | Input Size | max rss memory (MB) | 1st latency (ms) | 2nd latency (ms) | 2nd tok/sec |
|---|---|---|---|---|---|---|
| opt-125m-gptq | INT4-MIXED | 32 | 1116 | 25.8 | 8.1 | 123.5 |
| opt-125m-gptq | INT4-MIXED | 1024 | 1187.1 | 75.2 | 8.2 | 122.0 |
| qwen2-0.5b | INT4-MIXED | 32 | 1587.4 | 45.1 | 15.4 | 64.9 |
| qwen2-0.5b | INT4-MIXED | 1024 | 1587.8 | 228.2 | 15.6 | 64.1 |
| tiny-llama-1.1b-chat | INT4-MIXED | 32 | 1704.2 | 42.4 | 17.6 | 56.8 |
| tiny-llama-1.1b-chat | INT4-MIXED | 1024 | 1616.3 | 489.2 | 18.9 | 52.9 |
| qwen2-0.5b | INT8-CW | 32 | 1477.3 | 51.5 | 20.2 | 49.5 |
| qwen2-0.5b | INT8-CW | 1024 | 1592 | 263.7 | 20.6 | 48.5 |
| tiny-llama-1.1b-chat | INT8-CW | 32 | 1855.6 | 60.2 | 20.7 | 48.3 |
| tiny-llama-1.1b-chat | INT8-CW | 1024 | 1992.6 | 618.2 | 21.7 | 46.1 |
| qwen2-1.5b | INT4-MIXED | 32 | 2024.2 | 59.6 | 23.1 | 43.3 |
| bloomz-560m | FP16 | 1024 | 2773.1 | 647.8 | 23.8 | 42.0 |
| qwen2-1.5b | INT4-MIXED | 1024 | 2177.7 | 577.4 | 23.8 | 42.0 |
| bloomz-560m | FP16 | 32 | 2582.7 | 44.2 | 25.1 | 39.8 |
| dolly-v2-3b | INT4-MIXED | 32 | 2507.9 | 79.8 | 29.4 | 34.0 |
| phi-2 | INT4-MIXED | 32 | 2568.9 | 74.6 | 29.7 | 33.7 |
| qwen2-1.5b | INT8-CW | 32 | 2577.3 | 81.6 | 30.5 | 32.8 |
| red-pajama-incite-chat-3b-v1 | INT4-MIXED | 32 | 2489.4 | 69.9 | 30.5 | 32.8 |
| minicpm-1b-sft | INT4-MIXED | 31 | 2442.1 | 84.7 | 31 | 32.3 |
| qwen2-1.5b | INT8-CW | 1024 | 2739.8 | 773.3 | 31.2 | 32.1 |
| gemma-2b-it | INT4-MIXED | 32 | 2998.2 | 103.5 | 31.4 | 31.8 |
| dolly-v2-3b | INT4-MIXED | 1024 | 2508.1 | 1396.6 | 32 | 31.3 |
| gemma-2b-it | INT4-MIXED | 1024 | 3171.5 | 822.3 | 32.2 | 31.1 |
| phi-2 | INT4-MIXED | 1024 | 2940.5 | 1395.3 | 32.2 | 31.1 |
| red-pajama-incite-chat-3b-v1 | INT4-MIXED | 1023 | 2489.6 | 1435.5 | 33.1 | 30.2 |
| minicpm-1b-sft | INT8-CW | 31 | 2818.6 | 86.9 | 33.4 | 29.9 |
| stable-zephyr-3b-dpo | INT4-MIXED | 32 | 2638.2 | 87.4 | 33.8 | 29.6 |
| stablelm-3b-4e1t | INT4-MIXED | 32 | 2750.5 | 89.4 | 35.6 | 28.1 |
| stablelm-3b-4e1t | INT4-MIXED | 1023 | 3115.5 | 1473.1 | 38.1 | 26.2 |
| phi-3-mini-4k-instruct | INT4-MIXED | 32 | 3039.1 | 109.2 | 40.4 | 24.8 |
| phi-2 | INT8-CW | 32 | 3599.7 | 107.5 | 42.1 | 23.8 |
| gemma-2b-it | INT8-CW | 32 | 3845.4 | 111.3 | 42.2 | 23.7 |
| dolly-v2-3b | INT8-CW | 32 | 3596.4 | 110.1 | 42.5 | 23.5 |
| gemma-2b-it | INT8-CW | 1024 | 3844.6 | 1183 | 43 | 23.3 |
| red-pajama-incite-chat-3b-v1 | INT8-CW | 32 | 3590 | 111 | 43.3 | 23.1 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 3467.6 | 1721.6 | 43.5 | 23.0 |
| stablelm-3b-4e1t | INT8-CW | 32 | 3582.8 | 111 | 44.3 | 22.6 |
| stable-zephyr-3b-dpo | INT8-CW | 32 | 3607.2 | 110.2 | 44.5 | 22.5 |
| phi-2 | INT8-CW | 1024 | 3982 | 1508 | 44.6 | 22.4 |
| dolly-v2-3b | INT8-CW | 1024 | 3596.5 | 1529.1 | 44.9 | 22.3 |
| minicpm-1b-sft | FP16 | 31 | 3769.9 | 84 | 45.4 | 22.0 |
| red-pajama-incite-chat-3b-v1 | INT8-CW | 1023 | 3952 | 2064.5 | 45.7 | 21.9 |
| stablelm-3b-4e1t | INT8-CW | 1023 | 3934.5 | 2286.3 | 46.8 | 21.4 |
| gpt-j-6b | INT4-MIXED | 32 | 4443.5 | 159.3 | 56.7 | 17.6 |
| phi-3-mini-4k-instruct | INT8-CW | 32 | 4545 | 117.1 | 57.6 | 17.4 |
| phi-3-mini-4k-instruct | INT8-CW | 1024 | 4810.4 | 2068.8 | 60.5 | 16.5 |
| gpt-j-6b | INT4-MIXED | 1024 | 4746.4 | 2397 | 60.6 | 16.5 |
| falcon-7b-instruct | INT4-MIXED | 32 | 5014 | 203.7 | 61.3 | 16.3 |
| qwen2-7b | INT4-MIXED | 32 | 5269.4 | 203.8 | 62.3 | 16.1 |
| codegen25-7b | INT4-MIXED | 32 | 4641.1 | 170.6 | 63.5 | 15.7 |
| llama-2-7b-gptq | INT4-MIXED | 32 | 4597.3 | 172.1 | 63.5 | 15.7 |
| falcon-7b-instruct | INT4-MIXED | 1024 | 5230.6 | 2695.3 | 63.6 | 15.7 |
| qwen2-7b | INT4-MIXED | 1024 | 5370.8 | 2505.9 | 63.9 | 15.6 |
| decilm-7b-instruct | INT4-MIXED | 36 | 4614.2 | 301.1 | 65.3 | 15.3 |
| codegen25-7b | INT4-MIXED | 1024 | 4641.9 | 2629.6 | 67.4 | 14.8 |
| llama-2-7b-gptq | INT4-MIXED | 1024 | 4928.1 | 2584.3 | 67.6 | 14.8 |
| mistral-7b-v0.1 | INT4-MIXED | 32 | 4928.5 | 180.9 | 69.2 | 14.5 |
| llama-2-7b-chat-hf | INT4-MIXED | 32 | 4985.7 | 160.3 | 69.5 | 14.4 |
| qwen-7b-chat-gptq | INT4-MIXED | 32 | 5426.7 | 188.3 | 69.5 | 14.4 |
| llama-3-8b | INT4-MIXED | 33 | 5473.4 | 285.7 | 70 | 14.3 |
| flan-t5-xxl | INT4-MIXED | 33 | 19293.8 | 211.7 | 70.1 | 14.3 |
| llama-3-8b | INT4-MIXED | 33 | 5389.2 | 281 | 70.8 | 14.1 |
| mistral-7b-v0.1 | INT4-MIXED | 1024 | 5225.4 | 2713.3 | 71.8 | 13.9 |
| zephyr-7b-beta | INT4-MIXED | 32 | 5306.1 | 177.9 | 72.1 | 13.9 |
| llama-3-8b | INT4-MIXED | 1025 | 5615.2 | 2937.8 | 72.4 | 13.8 |
| llama-3-8b | INT4-MIXED | 1025 | 5531.7 | 2815.4 | 73.2 | 13.7 |
| llama-2-7b-chat-hf | INT4-MIXED | 1024 | 5319.5 | 2736.2 | 73.6 | 13.6 |
| phi-2 | FP16 | 32 | 6197 | 104.6 | 74.7 | 13.4 |
| zephyr-7b-beta | INT4-MIXED | 1024 | 5306.4 | 2802.3 | 74.7 | 13.4 |
| qwen-7b-chat-gptq | INT4-MIXED | 1024 | 5934.9 | 2606.9 | 75 | 13.3 |
| dolly-v2-3b | FP16 | 32 | 6195.1 | 105.3 | 75.3 | 13.3 |
| baichuan2-7b-chat | INT4-MIXED | 32 | 5837.9 | 188.5 | 76.8 | 13.0 |
| red-pajama-incite-chat-3b-v1 | FP16 | 32 | 6178.6 | 118 | 76.8 | 13.0 |
| gemma-7b-it | INT4-MIXED | 32 | 6495.9 | 230.6 | 77 | 13.0 |
| stablelm-3b-4e1t | FP16 | 32 | 6174.2 | 105.9 | 77.1 | 13.0 |
| stable-zephyr-3b-dpo | FP16 | 32 | 6217.8 | 107.9 | 77.2 | 13.0 |
| glm-4-9b-chat | INT4-MIXED | 32 | 6333.4 | 225 | 77.3 | 12.9 |
| phi-2 | FP16 | 1024 | 6411.5 | 2065.2 | 77.3 | 12.9 |
| dolly-v2-3b | FP16 | 1024 | 6410.1 | 2075 | 77.7 | 12.9 |
| llama-3.1-8b | INT4-MIXED | 32 | 6324.6 | 182.2 | 78.8 | 12.7 |
| red-pajama-incite-chat-3b-v1 | FP16 | 1023 | 6394.2 | 2752.4 | 79.2 | 12.6 |
| stablelm-3b-4e1t | FP16 | 1023 | 6386.9 | 2953.3 | 79.5 | 12.6 |
| glm-4-9b-chat | INT4-MIXED | 1024 | 6439.5 | 3282.2 | 80 | 12.5 |
| baichuan2-7b-chat | INT4-MIXED | 1024 | 6174.1 | 2752.6 | 80.6 | 12.4 |
| gemma-7b-it | INT4-MIXED | 1024 | 6795.4 | 3118.3 | 80.6 | 12.4 |
| llama-3.1-8b | INT4-MIXED | 1024 | 6324.8 | 2865.7 | 81.3 | 12.3 |
| gpt-j-6b | INT8-CW | 32 | 6793.2 | 167.6 | 85 | 11.8 |
| qwen-7b-chat | INT4-MIXED | 32 | 7274.8 | 168.8 | 85.2 | 11.7 |
| gpt-j-6b | INT8-CW | 1024 | 6793.3 | 2668.4 | 88.8 | 11.3 |
| qwen-7b-chat | INT4-MIXED | 1024 | 7610.3 | 2991.9 | 90.6 | 11.0 |
| flan-t5-xxl | INT4-MIXED | 1139 | 23514 | 540.8 | 94.9 | 10.5 |
| falcon-7b-instruct | INT8-CW | 32 | 7764.1 | 181.3 | 95.5 | 10.5 |
| llama-2-7b-chat-hf | INT8-CW | 32 | 7330.9 | 172 | 96.1 | 10.4 |
| falcon-7b-instruct | INT8-CW | 1024 | 7987.4 | 3072.8 | 98.1 | 10.2 |
| qwen2-7b | INT8-CW | 32 | 8175.3 | 211.3 | 99.6 | 10.0 |

All models listed here were tested with the following parameters:

  • Framework: PyTorch

  • Beam: 1

  • Batch size: 1
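
The latency columns follow the usual definitions: 1st latency is the time to the first generated token, and 2nd latency is the average time per subsequent token (2nd tok/sec is its reciprocal). A minimal sketch of approximating such figures with an OpenVINO GenAI streamer callback, under the same greedy, batch-1 settings, follows; it is not the tooling used to produce the tables, and the model directory is a hypothetical placeholder.

```python
# Minimal sketch (assumptions: OpenVINO GenAI streamer callback, greedy decoding,
# batch size 1, hypothetical model directory) of how first-token and
# subsequent-token latency could be approximated. Not the official benchmark.
import time
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("tiny-llama-1.1b-chat-int4-ov", "GPU")

token_times = []

def streamer(subword: str) -> bool:
    token_times.append(time.perf_counter())  # timestamp every streamed chunk
    return False                              # False = keep generating

start = time.perf_counter()
pipe.generate("Explain what an AI PC is.", max_new_tokens=64, streamer=streamer)

first_ms = (token_times[0] - start) * 1000
second_ms = (token_times[-1] - token_times[0]) / (len(token_times) - 1) * 1000
print(f"1st latency: {first_ms:.1f} ms, 2nd latency: {second_ms:.1f} ms "
      f"({1000 / second_ms:.1f} tok/s)")
```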