Most Efficient Large Language Models for AI PC

This page is regularly updated to help you identify the best-performing LLMs on the Intel® Core™ Ultra processor family and AI PCs. The data below was collected with OpenVINO 2025.2 and is current as of 30 June 2025.

The tables below list the key performance indicators for inference on built-in GPUs.
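To sanity-check comparable figures on your own AI PC, a minimal sketch is shown below. It assumes the OpenVINO GenAI Python API (`openvino_genai.LLMPipeline` and its `perf_metrics`) and a placeholder model directory; it illustrates how the reported metrics relate to each other and is not the exact harness used to produce these tables.

```python
import openvino_genai as ov_genai

# Placeholder model directory: any of the listed topologies exported to
# OpenVINO IR (for example with optimum-intel) can be used here.
pipe = ov_genai.LLMPipeline("./phi-2-int4-ov", "GPU")  # built-in GPU device

config = ov_genai.GenerationConfig()
config.max_new_tokens = 128

result = pipe.generate(["The Sun is yellow because"], config)
metrics = result.perf_metrics

ttft_ms = metrics.get_ttft().mean  # "1st latency": time to the first generated token, ms
tpot_ms = metrics.get_tpot().mean  # "2nd latency": mean time per subsequent token, ms

# The last table column is simply the reciprocal of the 2nd latency:
# e.g. 3.2 ms per token -> 1000 / 3.2 = 312.5 tokens per second.
print(f"1st latency: {ttft_ms:.1f} ms")
print(f"2nd latency: {tpot_ms:.1f} ms")
print(f"2nd tokens per sec: {1000.0 / tpot_ms:.2f}")
```

Note that the 1st latency grows with the input size, since the whole prompt must be processed before the first token is produced; this is why the 1024-token rows below show much higher first-token latencies than the 32-token rows for the same model.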

| Topology | Precision | Input Size | 1st latency (ms) | 2nd latency (ms) | Max RSS memory (MB) | 2nd tokens per sec (1000 / 2nd latency) |
|---|---|---|---|---|---|---|
| opt-125m-gptq | INT4-MIXED | 32 | 13.2 | 3.2 | 968.7 | 312.50 |
| opt-125m-gptq | INT4-MIXED | 1024 | 22.2 | 3.4 | 1084.8 | 294.12 |
| red-pajama-incite-chat-3b-v1 | INT4-MIXED | 32 | 99.1 | 23.6 | 3231.4 | 42.37 |
| red-pajama-incite-chat-3b-v1 | INT4-MIXED | 1024 | 436.7 | 27.9 | 3491.7 | 35.84 |
| stable-zephyr-3b-dpo | INT4-MIXED | 32 | 154.3 | 28.5 | 3295.2 | 35.09 |
| phi-2 | INT4-MIXED | 32 | 156 | 30.6 | 3174.3 | 32.68 |
| red-pajama-incite-chat-3b-v1 | INT8-CW | 32 | 66.8 | 33.3 | 4340.5 | 30.03 |
| stable-zephyr-3b-dpo | INT8-CW | 32 | 58.5 | 34.1 | 4370.6 | 29.33 |
| phi-3-mini-4k-instruct | INT4-MIXED | 32 | 90.7 | 34.5 | 3631.2 | 28.99 |
| stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 387.2 | 34.5 | 3677.8 | 28.99 |
| stablelm-3b-4e1t | INT8-CW | 32 | 84.4 | 34.6 | 4364.6 | 28.90 |
| phi-2 | INT4-MIXED | 1024 | 386.9 | 35.8 | 3430 | 27.93 |
| phi-2 | INT8-CW | 32 | 95.4 | 35.9 | 4284.6 | 27.86 |
| stablelm-3b-4e1t | INT4-MIXED | 32 | 116.1 | 36.1 | 3425.3 | 27.70 |
| red-pajama-incite-chat-3b-v1 | INT8-CW | 1024 | 376.2 | 36.5 | 4606.7 | 27.40 |
| stable-zephyr-3b-dpo | INT8-CW | 1024 | 332.9 | 36.9 | 4645.4 | 27.10 |
| stablelm-3b-4e1t | INT8-CW | 1024 | 345 | 37.8 | 4629.7 | 26.46 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 539.5 | 38 | 4073.2 | 26.32 |
| phi-2 | INT8-CW | 1024 | 329.1 | 38.2 | 4525.7 | 26.18 |
| stablelm-3b-4e1t | INT4-MIXED | 1024 | 330.2 | 39.5 | 3634.9 | 25.32 |
| chatglm3-6b | INT4-MIXED | 32 | 71.7 | 44.3 | 5139.6 | 22.57 |
| flan-t5-xxl | INT4-MIXED | 33 | 100.2 | 44.8 | 13574.6 | 22.32 |
| phi-3-mini-4k-instruct | INT8-CW | 32 | 74.8 | 45.3 | 5378.8 | 22.08 |
| chatglm3-6b | INT4-MIXED | 1024 | 720.4 | 46.1 | 4696.4 | 21.69 |
| phi-3-mini-4k-instruct | INT8-CW | 1024 | 445.9 | 48.7 | 5674.8 | 20.53 |
| flan-t5-xxl | INT4-MIXED | 1139 | 367.4 | 52.5 | 15343.1 | 19.05 |
| gpt-j-6b | INT4-MIXED | 32 | 191.1 | 52.9 | 5170.3 | 18.90 |
| gpt-j-6b | INT4-MIXED | 1024 | 774.2 | 56.6 | 6185.7 | 17.67 |
| falcon-7b-instruct | INT4-MIXED | 32 | 110.7 | 57.5 | 5449.4 | 17.39 |
| codegen25-7b | INT4-MIXED | 32 | 122.5 | 58 | 5387.4 | 17.24 |
| falcon-7b-instruct | INT4-MIXED | 1024 | 789.3 | 60 | 5045.2 | 16.67 |
| codegen25-7b | INT4-MIXED | 1024 | 878.6 | 62.4 | 5775.7 | 16.03 |
| gemma-7b-it | INT4-MIXED | 32 | 132.3 | 68.8 | 6467.9 | 14.53 |
| red-pajama-incite-chat-3b-v1 | FP16 | 32 | 76.1 | 69.5 | 6687.4 | 14.39 |
| chatglm3-6b | INT8-CW | 32 | 97.2 | 69.8 | 7527.1 | 14.33 |
| llama-2-7b-gptq | INT4-MIXED | 32 | 107.8 | 70.1 | 5019.9 | 14.27 |
| phi-2 | FP16 | 32 | 110.2 | 71.4 | 6930 | 14.01 |
| mistral-7b-v0.1 | INT4-MIXED | 32 | 119.6 | 71.4 | 5795.7 | 14.01 |
| chatglm3-6b | INT8-CW | 1024 | 510.8 | 72 | 7336.2 | 13.89 |
| gemma-7b-it | INT4-MIXED | 1024 | 920.7 | 72.2 | 6815 | 13.85 |
| stable-zephyr-3b-dpo | FP16 | 32 | 93.5 | 73.3 | 6941.9 | 13.64 |
| stablelm-3b-4e1t | FP16 | 32 | 127.2 | 73.5 | 6932.8 | 13.61 |
| red-pajama-incite-chat-3b-v1 | FP16 | 1024 | 519.7 | 73.6 | 7502.4 | 13.59 |
| flan-t5-xxl | INT8-CW | 33 | 325.1 | 73.7 | 23418.2 | 13.57 |
| gpt-j-6b | INT8-CW | 32 | 132.8 | 74.1 | 7406.1 | 13.50 |
| mistral-7b-v0.1 | INT4-MIXED | 1024 | 601.5 | 74.4 | 5570.8 | 13.44 |
| phi-2 | FP16 | 1024 | 477.2 | 76.3 | 7507.7 | 13.11 |
| chatglm3-6b-gptq | INT4-MIXED | 32 | 83.8 | 77 | 5596.8 | 12.99 |
| qwen-7b-chat-gptq | INT4-MIXED | 32 | 192 | 77.2 | 6241.4 | 12.95 |
| stable-zephyr-3b-dpo | FP16 | 1024 | 521.4 | 77.6 | 7520.5 | 12.89 |
| llama-2-7b-gptq | INT4-MIXED | 1024 | 568.4 | 77.6 | 6556 | 12.89 |
| gpt-j-6b | INT8-CW | 1024 | 627.9 | 78 | 8648.4 | 12.82 |
| stablelm-3b-4e1t | FP16 | 1024 | 517.9 | 78.4 | 7519.9 | 12.76 |
| chatglm3-6b-gptq | INT4-MIXED | 1024 | 544.5 | 78.9 | 5419.5 | 12.67 |
| codegen25-7b | INT8-CW | 32 | 110.9 | 79.7 | 8283.6 | 12.55 |
| qwen-7b-chat-gptq | INT4-MIXED | 1024 | 762.3 | 81.5 | 6633.9 | 12.27 |
| mistral-7b-v0.1 | INT8-CW | 32 | 120.4 | 84 | 8586 | 11.90 |
| codegen25-7b | INT8-CW | 1024 | 589.2 | 84 | 8754.4 | 11.90 |
| flan-t5-xxl | INT8-CW | 1139 | 528.4 | 84.9 | 25288 | 11.78 |
| falcon-7b-instruct | INT8-CW | 32 | 124 | 86.4 | 8149.2 | 11.57 |
| mistral-7b-v0.1 | INT8-CW | 1024 | 603.5 | 86.8 | 8421.3 | 11.52 |
| falcon-7b-instruct | INT8-CW | 1024 | 594.1 | 89 | 7890.9 | 11.24 |
| baichuan2-13b-chat | INT4-MIXED | 32 | 178.4 | 93.9 | 9320.9 | 10.65 |
| gemma-7b-it | INT8-CW | 32 | 149.3 | 99.7 | 9596.8 | 10.03 |

| Topology | Precision | Input Size | 1st latency (ms) | 2nd latency (ms) | Max RSS memory (MB) | 2nd tokens per sec (1000 / 2nd latency) |
|---|---|---|---|---|---|---|
| opt-125m-gptq | INT4-MIXED | 32 | 13.6 | 3.4 | 1184.3 | 294.12 |
| opt-125m-gptq | INT4-MIXED | 1024 | 17.3 | 3.8 | 1302.2 | 263.16 |
| gemma-2b-it | INT4-MIXED | 32 | 31.4 | 17.7 | 3364.6 | 56.50 |
| red-pajama-incite-chat-3b-v1 | INT4-MIXED | 32 | 49.1 | 17.8 | 3382.2 | 56.18 |
| dolly-v2-3b | INT4-MIXED | 32 | 50.8 | 18.3 | 3402.1 | 54.64 |
| gemma-2b-it | INT4-MIXED | 1024 | 189.1 | 18.6 | 3294.5 | 53.76 |
| phi-2 | INT4-MIXED | 32 | 57 | 19.3 | 3326.9 | 51.81 |
| red-pajama-incite-chat-3b-v1 | INT4-MIXED | 1024 | 359.9 | 19.7 | 3717 | 50.76 |
| stable-zephyr-3b-dpo | INT4-MIXED | 32 | 59.7 | 20.2 | 3410.6 | 49.50 |
| dolly-v2-3b | INT4-MIXED | 1024 | 383.1 | 20.3 | 3773.8 | 49.26 |
| phi-2 | INT4-MIXED | 1024 | 326.1 | 21.3 | 3845.6 | 46.95 |
| stablelm-3b-4e1t | INT4-MIXED | 32 | 60.8 | 21.9 | 3566.3 | 45.66 |
| stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 330 | 22.2 | 3929.3 | 45.05 |
| stablelm-3b-4e1t | INT4-MIXED | 1024 | 299.8 | 23.8 | 4032.5 | 42.02 |
| phi-3-mini-4k-instruct | INT4-MIXED | 32 | 44.8 | 25.5 | 3772.6 | 39.22 |
| gemma-2b-it | INT8-CW | 32 | 35.5 | 28 | 3737.4 | 35.71 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 457.5 | 28.1 | 4361.9 | 35.59 |
| gemma-2b-it | INT8-CW | 1024 | 208.9 | 28.9 | 3878.2 | 34.60 |
| red-pajama-incite-chat-3b-v1 | INT8-CW | 32 | 39.8 | 29.8 | 4317.7 | 33.56 |
| dolly-v2-3b | INT8-CW | 32 | 46.6 | 30.5 | 4426.2 | 32.79 |
| stablelm-3b-4e1t | INT8-CW | 32 | 48.9 | 30.8 | 4444.1 | 32.47 |
| phi-2 | INT8-CW | 32 | 54.1 | 30.9 | 4429.9 | 32.36 |
| stable-zephyr-3b-dpo | INT8-CW | 32 | 49.4 | 30.9 | 4456.8 | 32.36 |
| red-pajama-incite-chat-3b-v1 | INT8-CW | 1024 | 324 | 31.9 | 4713.6 | 31.35 |
| dolly-v2-3b | INT8-CW | 1024 | 365.7 | 32.6 | 4826.6 | 30.67 |
| phi-2 | INT8-CW | 1024 | 305.9 | 32.9 | 4817.9 | 30.40 |
| stable-zephyr-3b-dpo | INT8-CW | 1024 | 328.1 | 32.9 | 4844.5 | 30.40 |
| stablelm-3b-4e1t | INT8-CW | 1024 | 330.6 | 32.9 | 4818.1 | 30.40 |
| chatglm3-6b | INT4-MIXED | 32 | 52.5 | 33.7 | 5202.4 | 29.67 |
| flan-t5-xxl | INT4-MIXED | 33 | 57.7 | 34.8 | 13591.1 | 28.74 |
| chatglm3-6b | INT4-MIXED | 1024 | 467.2 | 35.8 | 5187.5 | 27.93 |
| codegen25-7b | INT4-MIXED | 32 | 63.8 | 37.7 | 5518 | 26.53 |
| llama-2-7b-gptq | INT4-MIXED | 32 | 61.8 | 37.8 | 5352.1 | 26.46 |
| gpt-j-6b | INT4-MIXED | 32 | 76.5 | 39.5 | 5071.7 | 25.32 |
| flan-t5-xxl | INT4-MIXED | 1139 | 252.9 | 39.7 | 15513.3 | 25.19 |
| chatglm3-6b-gptq | INT4-MIXED | 32 | 69.3 | 40.5 | 5781.1 | 24.69 |
| codegen25-7b | INT4-MIXED | 1024 | 539.9 | 41.4 | 5994.8 | 24.15 |
| phi-3-mini-4k-instruct | INT8-CW | 32 | 53.3 | 42.2 | 5452.9 | 23.70 |
| chatglm3-6b-gptq | INT4-MIXED | 1024 | 427.9 | 42.8 | 5642.5 | 23.36 |
| gpt-j-6b | INT4-MIXED | 1024 | 565.8 | 43 | 6405.8 | 23.26 |
| qwen-7b-chat-gptq | INT4-MIXED | 32 | 77.1 | 43.1 | 6371 | 23.20 |
| falcon-7b-instruct | INT4-MIXED | 32 | 66.7 | 43.3 | 5357.6 | 23.09 |
| llama-2-7b-gptq | INT4-MIXED | 1024 | 438.6 | 43.6 | 6815.2 | 22.94 |
| mistral-7b-v0.1 | INT4-MIXED | 32 | 70.7 | 44.5 | 5862.7 | 22.47 |
| phi-3-mini-4k-instruct | INT8-CW | 1024 | 453.4 | 44.9 | 5810.4 | 22.27 |
| falcon-7b-instruct | INT4-MIXED | 1024 | 568.1 | 45.2 | 5293.6 | 22.12 |
| qwen-7b-chat-gptq | INT4-MIXED | 1024 | 712.2 | 46.8 | 6839.2 | 21.37 |
| mistral-7b-v0.1 | INT4-MIXED | 1024 | 467 | 47 | 5757.8 | 21.28 |
| zephyr-7b-beta | INT4-MIXED | 32 | 72.1 | 48.4 | 6074.7 | 20.66 |
| baichuan2-7b-chat | INT4-MIXED | 32 | 73.5 | 49.7 | 6579.6 | 20.12 |
| zephyr-7b-beta | INT4-MIXED | 1024 | 509.8 | 51 | 6049.1 | 19.61 |
| gemma-7b-it | INT4-MIXED | 32 | 88.7 | 51.5 | 6571.3 | 19.42 |
| baichuan2-7b-chat | INT4-MIXED | 1024 | 1547.3 | 54.5 | 7293.8 | 18.35 |
| gemma-7b-it | INT4-MIXED | 1024 | 714.8 | 57 | 7044.8 | 17.54 |
| qwen-7b-chat | INT4-MIXED | 32 | 85.7 | 60 | 7335.9 | 16.67 |
| gemma-2b-it | FP16 | 32 | 66 | 62.1 | 6218.5 | 16.10 |
| gemma-2b-it | FP16 | 1024 | 250.7 | 62.8 | 6370.7 | 15.92 |
| red-pajama-incite-chat-3b-v1 | FP16 | 32 | 64.9 | 63.8 | 6906.6 | 15.67 |
| dolly-v2-3b | FP16 | 32 | 65.6 | 64.4 | 6920.6 | 15.53 |
| phi-2 | FP16 | 32 | 66.3 | 64.6 | 6938.9 | 15.48 |
| qwen-7b-chat | INT4-MIXED | 1024 | 776.9 | 64.8 | 7960.4 | 15.43 |
| stablelm-3b-4e1t | FP16 | 32 | 62.3 | 65.5 | 7146.8 | 15.27 |
| stable-zephyr-3b-dpo | FP16 | 32 | 62.7 | 65.6 | 7157.5 | 15.24 |
| chatglm3-6b | INT8-CW | 32 | 77.8 | 66 | 7573.1 | 15.15 |
| red-pajama-incite-chat-3b-v1 | FP16 | 1024 | 387.5 | 67.6 | 7741.9 | 14.79 |
| phi-2 | FP16 | 1024 | 364.6 | 68.3 | 7748.2 | 14.64 |
| chatglm3-6b | INT8-CW | 1024 | 465.1 | 68.4 | 7521.8 | 14.62 |
| dolly-v2-3b | FP16 | 1024 | 430.9 | 68.4 | 7731.2 | 14.62 |
| stable-zephyr-3b-dpo | FP16 | 1024 | 348.8 | 69.3 | 7768.6 | 14.43 |
| stablelm-3b-4e1t | FP16 | 1024 | 354.9 | 69.3 | 7752 | 14.43 |
| flan-t5-xxl | INT8-CW | 33 | 300 | 69.4 | 23449.5 | 14.41 |
| falcon-7b-instruct | INT8-CW | 32 | 88 | 73.5 | 8224.5 | 13.61 |
| gpt-j-6b | INT8-CW | 32 | 86.8 | 75.1 | 7450.1 | 13.32 |
| flan-t5-xxl | INT8-CW | 1139 | 368.4 | 75.4 | 25452.7 | 13.26 |
| falcon-7b-instruct | INT8-CW | 1024 | 625.2 | 76.3 | 8166.2 | 13.11 |
| codegen25-7b | INT8-CW | 32 | 89.2 | 76.5 | 8423.7 | 13.07 |
| baichuan2-13b-chat | INT4-MIXED | 32 | 121.9 | 77.1 | 9488 | 12.97 |
| gpt-j-6b | INT8-CW | 1024 | 597.9 | 78.7 | 8861.5 | 12.71 |
| baichuan2-7b-chat | INT8-CW | 32 | 89.2 | 79.9 | 8620.5 | 12.52 |
| mistral-7b-v0.1 | INT8-CW | 32 | 94.3 | 80.5 | 8715.1 | 12.42 |
| zephyr-7b-beta | INT8-CW | 32 | 95.4 | 80.6 | 8622.5 | 12.41 |
| codegen25-7b | INT8-CW | 1024 | 554.4 | 81 | 8948.7 | 12.35 |
| qwen-7b-chat | INT8-CW | 32 | 100 | 82.7 | 8975.7 | 12.09 |
| baichuan2-13b-chat | INT4-MIXED | 1024 | 2739.6 | 83.7 | 10552.9 | 11.95 |
| mistral-7b-v0.1 | INT8-CW | 1024 | 598.2 | 83.8 | 8597.9 | 11.93 |
| zephyr-7b-beta | INT8-CW | 1024 | 615.2 | 83.8 | 8512.3 | 11.93 |
| baichuan2-7b-chat | INT8-CW | 1024 | 1681.7 | 84.2 | 9540.5 | 11.88 |
| qwen-7b-chat | INT8-CW | 1024 | 942.8 | 86.9 | 9861.6 | 11.51 |
| starcoder | INT4-MIXED | 32 | 145.4 | 88.2 | 9617.2 | 11.34 |
| starcoder | INT4-MIXED | 1024 | 1499 | 91.9 | 9452.9 | 10.88 |
| gemma-7b-it | INT8-CW | 32 | 113.9 | 92.5 | 9744.4 | 10.81 |
| gemma-7b-it | INT8-CW | 1024 | 778.6 | 96.1 | 10562 | 10.41 |

| Topology | Precision | Input Size | 1st latency (ms) | 2nd latency (ms) | Max RSS memory (MB) | 2nd tokens per sec (1000 / 2nd latency) |
|---|---|---|---|---|---|---|
| dolly-v2-3b | INT4-MIXED | 32 | 71.8 | 24 | 3331.5 | 41.67 |
| phi-2 | INT4-MIXED | 32 | 69 | 24.5 | 3233.9 | 40.82 |
| gemma-2b-it | INT4-MIXED | 32 | 64.5 | 24.7 | 3797.4 | 40.49 |
| red-pajama-incite-chat-3b-v1 | INT4-MIXED | 32 | 70.1 | 24.7 | 3260.3 | 40.49 |
| gemma-2b-it | INT4-MIXED | 1024 | 765.3 | 25.6 | 3730.1 | 39.06 |
| stable-zephyr-3b-dpo | INT4-MIXED | 32 | 82.4 | 26.6 | 3334.7 | 37.59 |
| dolly-v2-3b | INT4-MIXED | 1024 | 1107.7 | 27 | 3750 | 37.04 |
| phi-2 | INT4-MIXED | 1024 | 1088.2 | 27.4 | 3635.7 | 36.50 |
| red-pajama-incite-chat-3b-v1 | INT4-MIXED | 1024 | 1089.9 | 27.8 | 3651.1 | 35.97 |
| stablelm-3b-4e1t | INT4-MIXED | 32 | 76.2 | 29.1 | 3514.4 | 34.36 |
| stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 1119.8 | 29.6 | 3712.9 | 33.78 |
| stablelm-3b-4e1t | INT4-MIXED | 1024 | 1095.1 | 32.1 | 3823.9 | 31.15 |
| phi-3-mini-4k-instruct | INT4-MIXED | 32 | 97.5 | 34.9 | 3850 | 28.65 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 1440.6 | 38.2 | 4169.5 | 26.18 |
| gemma-2b-it | INT8-CW | 32 | 91.4 | 40.8 | 4488.2 | 24.51 |
| red-pajama-incite-chat-3b-v1 | INT8-CW | 32 | 120.5 | 41.3 | 4216.6 | 24.21 |
| stable-zephyr-3b-dpo | INT8-CW | 32 | 91.6 | 41.5 | 4233.2 | 24.10 |
| phi-2 | INT8-CW | 32 | 102 | 41.7 | 4250.5 | 23.98 |
| gemma-2b-it | INT8-CW | 1024 | 814.8 | 41.8 | 4469.3 | 23.92 |
| stablelm-3b-4e1t | INT8-CW | 32 | 93.4 | 41.8 | 4216 | 23.92 |
| dolly-v2-3b | INT8-CW | 32 | 103.9 | 42.3 | 4232.8 | 23.64 |
| red-pajama-incite-chat-3b-v1 | INT8-CW | 1024 | 1175.3 | 44.1 | 4622.9 | 22.68 |
| stable-zephyr-3b-dpo | INT8-CW | 1024 | 1201.8 | 44.4 | 4650.5 | 22.52 |
| stablelm-3b-4e1t | INT8-CW | 1024 | 1192 | 44.7 | 4621.4 | 22.37 |
| phi-2 | INT8-CW | 1024 | 1164.9 | 44.8 | 4669 | 22.32 |
| dolly-v2-3b | INT8-CW | 1024 | 1198.1 | 45.4 | 4669.7 | 22.03 |
| flan-t5-xxl | INT4-MIXED | 33 | 87.6 | 47.9 | 13562.5 | 20.88 |
| gpt-j-6b | INT4-MIXED | 32 | 137 | 49.7 | 5273.5 | 20.12 |
| chatglm3-6b | INT4-MIXED | 32 | 141 | 50.1 | 5035.2 | 19.96 |
| chatglm3-6b | INT4-MIXED | 1024 | 2069.7 | 51.6 | 4837.8 | 19.38 |
| flan-t5-xxl | INT4-MIXED | 1139 | 464.4 | 53.5 | 15673 | 18.69 |
| gpt-j-6b | INT4-MIXED | 1024 | 2300.8 | 53.7 | 6412.9 | 18.62 |
| codegen25-7b | INT4-MIXED | 32 | 160.8 | 54.4 | 5357.6 | 18.38 |
| llama-2-7b-gptq | INT4-MIXED | 32 | 158.3 | 54.8 | 5420.1 | 18.25 |
| chatglm3-6b-gptq | INT4-MIXED | 32 | 160.3 | 55.2 | 5814.9 | 18.12 |
| falcon-7b-instruct | INT4-MIXED | 32 | 182.6 | 55.8 | 5630 | 17.92 |
| phi-3-mini-4k-instruct | INT8-CW | 32 | 114.1 | 56.8 | 5372.2 | 17.61 |
| chatglm3-6b-gptq | INT4-MIXED | 1024 | 2007.3 | 57.3 | 5519.6 | 17.45 |
| falcon-7b-instruct | INT4-MIXED | 1024 | 2569.8 | 57.3 | 5567.5 | 17.45 |
| codegen25-7b | INT4-MIXED | 1024 | 2422.8 | 58.7 | 5911.4 | 17.04 |
| phi-3-mini-4k-instruct | INT8-CW | 1024 | 1582.7 | 60.1 | 5715.3 | 16.64 |
| qwen-7b-chat-gptq | INT4-MIXED | 32 | 169.8 | 61 | 6144.3 | 16.39 |
| llama-2-7b-gptq | INT4-MIXED | 1024 | 2388.8 | 62.8 | 6655 | 15.92 |
| mistral-7b-v0.1 | INT4-MIXED | 32 | 178.6 | 62.8 | 5675.7 | 15.92 |
| qwen-7b-chat-gptq | INT4-MIXED | 1024 | 2463.5 | 65.1 | 6735.6 | 15.36 |
| mistral-7b-v0.1 | INT4-MIXED | 1024 | 2563.3 | 65.2 | 5739.7 | 15.34 |
| zephyr-7b-beta | INT4-MIXED | 32 | 178.3 | 66.9 | 5928.6 | 14.95 |
| baichuan2-7b-chat | INT4-MIXED | 32 | 156.7 | 68 | 6595.2 | 14.71 |
| zephyr-7b-beta | INT4-MIXED | 1024 | 2553 | 69.2 | 6112.7 | 14.45 |
| gemma-7b-it | INT4-MIXED | 32 | 219.2 | 69.8 | 7239.4 | 14.33 |
| gemma-2b-it | FP16 | 32 | 94.8 | 70.1 | 7443.3 | 14.27 |
| gemma-2b-it | FP16 | 1024 | 1399.2 | 70.9 | 7254.5 | 14.10 |
| baichuan2-7b-chat | INT4-MIXED | 1024 | 2979.5 | 72.3 | 7276.7 | 13.83 |
| gemma-7b-it | INT4-MIXED | 1024 | 3057.5 | 73.8 | 7809.5 | 13.55 |
| phi-2 | FP16 | 32 | 104.6 | 74 | 6848.1 | 13.51 |
| red-pajama-incite-chat-3b-v1 | FP16 | 32 | 109.9 | 74.3 | 6824.9 | 13.46 |
| dolly-v2-3b | FP16 | 32 | 104.8 | 74.4 | 6831.8 | 13.44 |
| stable-zephyr-3b-dpo | FP16 | 32 | 100.9 | 74.8 | 6876.2 | 13.37 |
| stablelm-3b-4e1t | FP16 | 32 | 103.1 | 74.9 | 6962.5 | 13.35 |
| phi-2 | FP16 | 1024 | 1467.3 | 78.6 | 7643.7 | 12.72 |
| dolly-v2-3b | FP16 | 1024 | 1508.2 | 79 | 7644.4 | 12.66 |
| qwen-7b-chat | INT4-MIXED | 32 | 158.1 | 79 | 7303.8 | 12.66 |
| red-pajama-incite-chat-3b-v1 | FP16 | 1024 | 1488.9 | 79 | 7617.8 | 12.66 |
| stable-zephyr-3b-dpo | FP16 | 1024 | 1460.4 | 79.5 | 7673.9 | 12.58 |
| stablelm-3b-4e1t | FP16 | 1024 | 1449 | 79.6 | 7643.6 | 12.56 |
| qwen-7b-chat | INT4-MIXED | 1024 | 2607.4 | 83.3 | 8072.1 | 12.00 |
| gpt-j-6b | INT8-CW | 32 | 152 | 84 | 7357.5 | 11.90 |
| chatglm3-6b | INT8-CW | 32 | 154.9 | 85.7 | 7465.7 | 11.67 |
| chatglm3-6b | INT8-CW | 1024 | 2664.8 | 87.1 | 7432 | 11.48 |
| gpt-j-6b | INT8-CW | 1024 | 2376.3 | 87.9 | 8775.2 | 11.38 |
| falcon-7b-instruct | INT8-CW | 32 | 186.8 | 94.7 | 8518 | 10.56 |
| flan-t5-xxl | INT8-CW | 33 | 174.8 | 94.7 | 19917.6 | 10.56 |
| falcon-7b-instruct | INT8-CW | 1024 | 2775.5 | 96.1 | 8317.8 | 10.41 |
| codegen25-7b | INT8-CW | 32 | 161.8 | 96.2 | 8295.9 | 10.40 |

All models listed here were tested with the following parameters (see the export sketch after this list):

  • Framework: PyTorch

  • Beam: 1

  • Batch size: 1
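
The models start out as PyTorch checkpoints and are converted to OpenVINO IR at the precision shown in the tables; beam 1 and batch size 1 correspond to greedy decoding of a single prompt at a time, as in the metrics sketch earlier on this page. Below is a rough export sketch, assuming the optimum-intel Python API (`OVModelForCausalLM`, `OVWeightQuantizationConfig`); the model id, output path, and 4-bit setting are illustrative and not necessarily the exact conversion recipe behind the published numbers.

```python
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig
from transformers import AutoTokenizer

# Illustrative model id; 4-bit weight compression is comparable to
# (though not necessarily identical with) the INT4-MIXED rows above.
model_id = "microsoft/phi-2"
model = OVModelForCausalLM.from_pretrained(
    model_id,
    export=True,  # convert the PyTorch checkpoint to OpenVINO IR
    quantization_config=OVWeightQuantizationConfig(bits=4),
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Save the compressed IR so it can be loaded by an OpenVINO GenAI pipeline.
model.save_pretrained("./phi-2-int4-ov")
tokenizer.save_pretrained("./phi-2-int4-ov")
```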