Most Efficient Large Language Models for AI PC

This page is regularly updated to help you identify the best-performing LLMs on the Intel® Core™ Ultra processor family and AI PCs.

The tables below list the key performance indicators for a selection of Large Language Models, running on an Intel® Core™ Ultra 7-165H based system, on built-in GPUs.

| Model name | Throughput (tokens/sec, 2nd token) | 1st token latency (msec) | Max RSS memory used (MB) | Input tokens | Output tokens | Model precision | Beam | Batch size | Framework |
|---|---|---|---|---|---|---|---|---|---|
| OPT-2.7b | 20.2 | 2757 | 7084 | 937 | 128 | INT4 | 1 | 1 | PT |
| Phi-3-mini-4k-instruct | 19.9 | 2776 | 7028 | 1062 | 128 | INT4 | 1 | 1 | PT |
| Orca-mini-3b | 19.2 | 2966 | 7032 | 1024 | 128 | INT4 | 1 | 1 | PT |
| Phi-2 | 17.8 | 2162 | 7032 | 1024 | 128 | INT4 | 1 | 1 | PT |
| Stable-Zephyr-3b-dpo | 17.0 | 1791 | 7007 | 946 | 128 | INT4 | 1 | 1 | PT |
| ChatGLM3-6b | 16.5 | 3569 | 6741 | 1024 | 128 | INT4 | 1 | 1 | PT |
| Dolly-v2-3b | 15.8 | 6891 | 6731 | 1024 | 128 | INT4 | 1 | 1 | PT |
| Stablelm-3b-4e1t | 15.7 | 2051 | 7018 | 1024 | 128 | INT4 | 1 | 1 | PT |
| Red-Pajama-Incite-Chat-3b-V1 | 14.8 | 6582 | 7028 | 1020 | 128 | INT4 | 1 | 1 | PT |
| Falcon-7b-instruct | 14.5 | 4552 | 7033 | 1049 | 128 | INT4 | 1 | 1 | PT |
| Codegen25-7b | 13.3 | 3982 | 6732 | 1024 | 128 | INT4 | 1 | 1 | PT |
| GPT-j-6b | 13.2 | 7213 | 6882 | 1024 | 128 | INT4 | 1 | 1 | PT |
| Stablelm-7b | 12.8 | 6339 | 7013 | 1020 | 128 | INT4 | 1 | 1 | PT |
| Llama-3-8b | 12.8 | 4356 | 6953 | 1024 | 128 | INT4 | 1 | 1 | PT |
| Llama-2-7b-chat | 12.3 | 4205 | 6906 | 1024 | 128 | INT4 | 1 | 1 | PT |
| Llama-7b | 11.7 | 4315 | 6927 | 1024 | 128 | INT4 | 1 | 1 | PT |
| Mistral-7b-v0.1 | 10.5 | 4462 | 7242 | 1007 | 128 | INT4 | 1 | 1 | PT |
| Zephyr-7b-beta | 10.5 | 4500 | 7039 | 1024 | 128 | INT4 | 1 | 1 | PT |
| Qwen1.5-7b-chat | 9.9 | 4318 | 7034 | 1024 | 128 | INT4 | 1 | 1 | PT |
| Baichuan2-7b-chat | 9.8 | 4668 | 6724 | 1024 | 128 | INT4 | 1 | 1 | PT |
| Qwen-7b-chat | 9.0 | 5141 | 6996 | 1024 | 128 | INT4 | 1 | 1 | PT |
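The throughput and first-token latency columns can be combined into a rough end-to-end generation time. The sketch below is illustrative only: it assumes the "2nd token" throughput holds constant for every token after the first, and the helper name `estimate_generation_ms` is hypothetical, not part of any benchmark tooling.

```python
def estimate_generation_ms(first_token_ms: float,
                           tokens_per_sec: float,
                           output_tokens: int) -> float:
    """Rough total generation time in milliseconds.

    Assumption: every token after the first is produced at the
    steady-state rate reported in the throughput column.
    """
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    if output_tokens < 1:
        raise ValueError("output_tokens must be at least 1")
    remaining_tokens = output_tokens - 1
    return first_token_ms + remaining_tokens / tokens_per_sec * 1000.0


# Values taken from the OPT-2.7b row: 2757 msec, 20.2 tokens/sec, 128 output tokens.
total_ms = estimate_generation_ms(2757, 20.2, 128)
print(f"Estimated total: {total_ms / 1000:.1f} s")
```

Under that assumption, a model with a lower first-token latency can finish sooner overall than one with slightly higher throughput, so both columns matter when comparing rows.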

| Product | Model | Framework | Precision | Node | Request Rate | Throughput [tok/s] | TPOT Mean Latency |
|---|---|---|---|---|---|---|---|
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8380 | 0.2 | 92.75 | 75.75 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8380 | 0.3 | 137.89 | 98.6 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8380 | 0.4 | 182.68 | 144.36 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8380 | 0.5 | 227.02 | 238.54 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8380 | 0.6 | 259.06 | 679.07 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8380 | 0.7 | 267.24 | 785.75 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8380 | 0.8 | 267.77 | 815.11 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8380 | 0.9 | 270.01 | 827.09 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8380 | 1.0 | 268.92 | 840.1 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8380 | 2.0 | 269.6 | 847.81 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8380 | inf | 270.55 | 839.37 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8480+ | 0.2 | 92.63 | 63.23 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8480+ | 0.4 | 183.51 | 105.0 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8480+ | 0.6 | 272.59 | 95.34 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8480+ | 0.8 | 359.28 | 126.61 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8480+ | 1.0 | 442.69 | 169.24 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8480+ | 1.2 | 521.61 | 195.94 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8480+ | 1.4 | 589.34 | 267.43 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8480+ | 1.6 | 650.25 | 291.68 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8480+ | 1.8 | 655.39 | 308.64 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8480+ | 2.0 | 680.45 | 302.09 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8480+ | inf | 702.42 | 307.82 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8580 | 0.2 | 92.89 | 54.69 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8580 | 0.4 | 184.37 | 77.0 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8580 | 0.6 | 273.06 | 101.81 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8580 | 0.8 | 360.22 | 135.38 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8580 | 1.0 | 442.46 | 170.65 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8580 | 1.2 | 519.5 | 208.44 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8580 | 1.4 | 590.11 | 252.86 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8580 | 1.6 | 651.09 | 286.93 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8580 | 1.8 | 670.74 | 298.02 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8580 | 2.0 | 684.4 | 299.41 |
| ovms | meta-llama/Llama-2-7b-chat-hf | PT | INT8-CW | Xeon Platinum 8580 | inf | 701.91 | 305.9 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8380 | 0.2 | 79.24 | 73.06 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8380 | 0.3 | 118.42 | 90.31 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8380 | 0.4 | 157.04 | 113.23 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8380 | 0.5 | 193.85 | 203.97 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8380 | 0.6 | 232.36 | 253.17 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8380 | 0.7 | 260.56 | 581.45 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8380 | 0.8 | 271.97 | 761.05 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8380 | 0.9 | 273.36 | 787.74 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8380 | 1.0 | 272.54 | 811.37 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8380 | 2.0 | 278.07 | 809.3 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8380 | inf | 275.71 | 810.89 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8480+ | 0.2 | 78.3 | 60.37 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8480+ | 0.4 | 156.42 | 69.27 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8480+ | 0.6 | 232.27 | 77.79 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8480+ | 0.8 | 307.37 | 90.07 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8480+ | 1.0 | 380.61 | 104.71 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8480+ | 1.2 | 452.18 | 127.36 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8480+ | 1.4 | 519.44 | 156.18 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8480+ | 1.6 | 587.62 | 169.44 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8480+ | 1.8 | 649.94 | 198.44 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8480+ | 2.0 | 707.46 | 234.44 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8480+ | inf | 799.46 | 265.5 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8580 | 0.2 | 78.61 | 54.12 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8580 | 0.4 | 156.19 | 70.38 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8580 | 0.6 | 232.36 | 81.83 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8580 | 0.8 | 307.01 | 101.66 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8580 | 1.0 | 376.36 | 139.62 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8580 | 1.2 | 447.75 | 158.53 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8580 | 1.4 | 519.74 | 160.26 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8580 | 1.6 | 582.37 | 190.22 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8580 | 1.8 | 635.46 | 231.31 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8580 | 2.0 | 698.38 | 247.77 |
| ovms | meta-llama/Meta-Llama-3-8B-Instruct | PT | INT8-CW | Xeon Platinum 8580 | inf | 843.51 | 252.12 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8380 | 0.2 | 87.18 | 74.96 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8380 | 0.3 | 130.74 | 92.67 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8380 | 0.4 | 172.94 | 117.03 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8380 | 0.5 | 214.71 | 172.69 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8380 | 0.6 | 255.45 | 282.74 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8380 | 0.7 | 280.38 | 629.68 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8380 | 0.8 | 280.55 | 765.16 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8380 | 0.9 | 289.65 | 765.65 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8380 | 1.0 | 290.67 | 783.47 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8380 | 2.0 | 284.14 | 815.09 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8380 | inf | 290.39 | 793.52 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8480+ | 0.2 | 88.9 | 60.04 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8480+ | 0.4 | 176.5 | 70.24 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8480+ | 0.6 | 262.04 | 77.01 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8480+ | 0.8 | 346.01 | 95.29 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8480+ | 1.0 | 427.37 | 114.16 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8480+ | 1.2 | 507.86 | 138.56 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8480+ | 1.4 | 582.58 | 150.72 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8480+ | 1.6 | 655.61 | 166.64 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8480+ | 1.8 | 717.9 | 216.76 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8480+ | 2.0 | 774.3 | 233.49 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8480+ | inf | 873.93 | 245.31 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8580 | 0.2 | 88.92 | 56.33 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8580 | 0.4 | 175.99 | 72.72 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8580 | 0.6 | 261.96 | 84.24 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8580 | 0.8 | 346.78 | 101.67 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8580 | 1.0 | 427.85 | 128.33 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8580 | 1.2 | 506.17 | 150.01 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8580 | 1.4 | 581.72 | 167.61 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8580 | 1.6 | 651.97 | 190.91 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8580 | 1.8 | 713.2 | 222.56 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8580 | 2.0 | 771.17 | 232.08 |
| ovms | mistralai/Mistral-7B-v0.1 | PT | INT8-CW | Xeon Platinum 8580 | inf | 839.74 | 253.74 |
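One way to read the request-rate sweep is to pick the highest-throughput setting whose mean TPOT still fits a latency budget. The sketch below is an illustration, not part of the benchmark methodology: the rows are copied from the meta-llama/Llama-2-7b-chat-hf results on Xeon Platinum 8480+ (the `inf` row is left out), the function name `best_rate_within_budget` is hypothetical, and the budget of 200 is an arbitrary value expressed in the same units as the TPOT Mean Latency column.

```python
from typing import List, Optional, Tuple

# (request rate, throughput in tok/s, TPOT mean latency), copied from the
# Llama-2-7b-chat-hf rows for Xeon Platinum 8480+.
Row = Tuple[float, float, float]
ROWS: List[Row] = [
    (0.2, 92.63, 63.23),
    (0.4, 183.51, 105.0),
    (0.6, 272.59, 95.34),
    (0.8, 359.28, 126.61),
    (1.0, 442.69, 169.24),
    (1.2, 521.61, 195.94),
    (1.4, 589.34, 267.43),
    (1.6, 650.25, 291.68),
    (1.8, 655.39, 308.64),
    (2.0, 680.45, 302.09),
]


def best_rate_within_budget(rows: List[Row], tpot_budget: float) -> Optional[Row]:
    """Return the row with the highest throughput whose TPOT is within budget.

    Returns None when no row satisfies the budget.
    """
    eligible = [row for row in rows if row[2] <= tpot_budget]
    if not eligible:
        return None
    return max(eligible, key=lambda row: row[1])


# Arbitrary example budget, in the same units as the TPOT column.
best = best_rate_within_budget(ROWS, 200.0)
print(best)
```

Filtering on latency first and then maximizing throughput keeps the selection valid even where latency is not strictly monotonic in request rate, as between the 0.4 and 0.6 rows above.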

For complete information on the system configuration, see Hardware Platforms [PDF].

To view the data in an editable form, you can download the .csv files here: