Most Efficient Large Language Models for AI PC

The table below lists key performance indicators for a selection of Large Language Models running on a system based on the Intel® Core™ Ultra 7 processor 165H.



| Model name | Throughput, 2nd token (tokens/s) | 1st token latency (ms) | Max RSS memory used (MB) | Input tokens | Output tokens | Model precision | Beam | Batch size | Framework |
|---|---|---|---|---|---|---|---|---|---|
| OPT-2.7b | 20.2 | 2757 | 7084 | 937 | 128 | INT4 | 1 | 1 | PT |
| Phi-3-mini-4k-instruct | 19.9 | 2776 | 7028 | 1062 | 128 | INT4 | 1 | 1 | PT |
| Orca-mini-3b | 19.2 | 2966 | 7032 | 1024 | 128 | INT4 | 1 | 1 | PT |
| Phi-2 | 17.8 | 2162 | 7032 | 1024 | 128 | INT4 | 1 | 1 | PT |
| Stable-Zephyr-3b-dpo | 17.0 | 1791 | 7007 | 946 | 128 | INT4 | 1 | 1 | PT |
| ChatGLM3-6b | 16.5 | 3569 | 6741 | 1024 | 128 | INT4 | 1 | 1 | PT |
| Dolly-v2-3b | 15.8 | 6891 | 6731 | 1024 | 128 | INT4 | 1 | 1 | PT |
| Stablelm-3b-4e1t | 15.7 | 2051 | 7018 | 1024 | 128 | INT4 | 1 | 1 | PT |
| Red-Pajama-Incite-Chat-3b-V1 | 14.8 | 6582 | 7028 | 1020 | 128 | INT4 | 1 | 1 | PT |
| Falcon-7b-instruct | 14.5 | 4552 | 7033 | 1049 | 128 | INT4 | 1 | 1 | PT |
| Codegen25-7b | 13.3 | 3982 | 6732 | 1024 | 128 | INT4 | 1 | 1 | PT |
| GPT-j-6b | 13.2 | 7213 | 6882 | 1024 | 128 | INT4 | 1 | 1 | PT |
| Stablelm-7b | 12.8 | 6339 | 7013 | 1020 | 128 | INT4 | 1 | 1 | PT |
| Llama-3-8b | 12.8 | 4356 | 6953 | 1024 | 128 | INT4 | 1 | 1 | PT |
| Llama-2-7b-chat | 12.3 | 4205 | 6906 | 1024 | 128 | INT4 | 1 | 1 | PT |
| Llama-7b | 11.7 | 4315 | 6927 | 1024 | 128 | INT4 | 1 | 1 | PT |
| Mistral-7b-v0.1 | 10.5 | 4462 | 7242 | 1007 | 128 | INT4 | 1 | 1 | PT |
| Zephyr-7b-beta | 10.5 | 4500 | 7039 | 1024 | 128 | INT4 | 1 | 1 | PT |
| Qwen1.5-7b-chat | 9.9 | 4318 | 7034 | 1024 | 128 | INT4 | 1 | 1 | PT |
| Baichuan2-7b-chat | 9.8 | 4668 | 6724 | 1024 | 128 | INT4 | 1 | 1 | PT |
| Qwen-7b-chat | 9.0 | 5141 | 6996 | 1024 | 128 | INT4 | 1 | 1 | PT |
| Vicuna-7b-v1.5 | 0.0 | 3982 | 7022 | 1024 | 128 | INT4 | 1 | 1 | PT |

All models were run with INT4 weight precision, a beam size of 1, and a batch size of 1; in the Framework column, PT denotes PyTorch.
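The split between 1st-token latency and 2nd-token throughput reflects the two phases of LLM inference: the 1st token includes processing the full input prompt (prefill), while subsequent tokens measure the steady-state decode rate. The sketch below illustrates, under stated assumptions, how these metrics and peak memory can be measured with a streaming generation loop. It is not the harness used to produce the table above; the model ID and prompt are placeholders, and the Hugging Face transformers API is assumed purely for illustration.

```python
# Minimal sketch of measuring 1st-token latency, 2nd+ token throughput,
# and peak memory. Assumptions: Hugging Face transformers is installed,
# the model ID and prompt are placeholders, and this is NOT the harness
# behind the table above.
import resource  # Unix-only; used here for peak RSS
import time
from threading import Thread

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_id = "microsoft/phi-2"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("An example prompt of roughly 1024 tokens ...", return_tensors="pt")
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)

# Run generation in a background thread so tokens can be timed as they stream.
thread = Thread(
    target=model.generate,
    kwargs=dict(**inputs, max_new_tokens=128, do_sample=False, streamer=streamer),
)
start = time.perf_counter()
thread.start()

arrival_times = []
for _ in streamer:  # one iteration per decoded chunk (roughly one token)
    arrival_times.append(time.perf_counter())
thread.join()

first_token_latency_ms = (arrival_times[0] - start) * 1000
decode_tokens = len(arrival_times) - 1          # tokens 2..N
decode_seconds = arrival_times[-1] - arrival_times[0]
peak_rss_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss  # KiB on Linux

print(f"1st token latency: {first_token_latency_ms:.0f} ms")
print(f"2nd token throughput: {decode_tokens / decode_seconds:.1f} tokens/s")
print(f"Max RSS: {peak_rss_kb / 1024:.0f} MB")
```

Because the 1st token includes prefill over the ~1000-token input, its latency runs in the seconds, while the decode rate in the Throughput column is dominated by per-token memory bandwidth, which is why the smaller 3B-class models top the table.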

This page is regularly updated to help you identify the best-performing LLMs on the Intel® Core™ Ultra processor family and AI PCs.

For complete information on the system configuration, see: Hardware Platforms [PDF]