Most Efficient Large Language Models for AI PC

This page is regularly updated to help you identify the best-performing LLMs on the Intel® Core™ Ultra processor family and AI PCs. The current data is as of OpenVINO 2025.3, 3 September 2025.

The tables below list the key performance indicators for inference on built-in GPUs.
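For context on how such figures can be produced: numbers comparable to the "1st latency (ms)", "2nd latency (ms)", and "2nd token per sec" columns can be collected with the OpenVINO GenAI Python API, as in the minimal sketch below. This is not the exact benchmarking harness used for this page; the model directory name is a placeholder and assumes an LLM already exported to OpenVINO IR with compressed weights.

```python
# Minimal sketch (not the official benchmark harness) of measuring the KPIs
# reported in the tables below with OpenVINO GenAI. "gemma-2b-it-int4-ov" is a
# placeholder directory containing a model already exported to OpenVINO IR.
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("gemma-2b-it-int4-ov", "GPU")  # built-in GPU device

result = pipe.generate(["What is OpenVINO?"], max_new_tokens=128)
metrics = result.perf_metrics

print(f"1st token latency: {metrics.get_ttft().mean:.1f} ms")           # "1st latency (ms)"
print(f"2nd token latency: {metrics.get_tpot().mean:.1f} ms")           # "2nd latency (ms)"
print(f"Throughput:        {metrics.get_throughput().mean:.2f} tok/s")  # "2nd token per sec"
```

Note that the "2nd token per sec" column is the reciprocal of the second-token latency (1000 / "2nd latency (ms)"), so the two columns carry the same information in different units.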

| Topology | Precision | Input Size (tokens) | 1st latency (ms) | 2nd latency (ms) | max rss memory (MB) | 2nd token per sec |
|---|---|---|---|---|---|---|
| gemma-2b-it | INT4-MIXED | 32 | 55.3 | 30 | 3315.5 | 33.33 |
| gemma-2b-it | INT4-MIXED | 1024 | 272 | 30.8 | 3106.4 | 32.47 |
| stable-zephyr-3b-dpo | INT8-CW | 32 | 53.2 | 33.1 | 4303.3 | 30.21 |
| dolly-v2-3b | INT4-MIXED | 32 | 147.5 | 35.7 | 3304.3 | 28.01 |
| stable-zephyr-3b-dpo | INT8-CW | 1024 | 322.1 | 35.8 | 4605 | 27.93 |
| phi-2 | INT4-MIXED | 32 | 130.3 | 36.6 | 3251.2 | 27.32 |
| stable-zephyr-3b-dpo | INT4-MIXED | 32 | 131.9 | 38.5 | 3282.8 | 25.97 |
| dolly-v2-3b | INT4-MIXED | 1024 | 503.3 | 38.7 | 3672.2 | 25.84 |
| phi-3-mini-4k-instruct | INT4-MIXED | 32 | 95 | 39 | 3638.7 | 25.64 |
| phi-2 | INT4-MIXED | 1024 | 473.2 | 39.8 | 3553.1 | 25.13 |
| gemma-2b-it | INT8-CW | 32 | 56.6 | 40.4 | 3711.3 | 24.75 |
| stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 458.7 | 41.6 | 3558.9 | 24.04 |
| gemma-2b-it | INT8-CW | 1024 | 236.5 | 41.9 | 3732.8 | 23.87 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 574 | 42.5 | 4161.3 | 23.53 |
| stablelm-3b-4e1t | INT4-MIXED | 32 | 125.5 | 43 | 3363.8 | 23.26 |
| dolly-v2-3b | INT8-CW | 32 | 105.8 | 43.7 | 4273.8 | 22.88 |
| phi-2 | INT8-CW | 32 | 113.5 | 44.3 | 4174.4 | 22.57 |
| stablelm-3b-4e1t | INT8-CW | 32 | 102.6 | 46.2 | 4300.8 | 21.65 |
| stablelm-3b-4e1t | INT4-MIXED | 1024 | 412.2 | 47.2 | 3620.4 | 21.19 |
| dolly-v2-3b | INT8-CW | 1024 | 428.4 | 47.9 | 4617.3 | 20.88 |
| phi-2 | INT8-CW | 1024 | 398.4 | 48.1 | 4480.6 | 20.79 |
| flan-t5-xxl | INT4-MIXED | 33 | 92.1 | 48.3 | 13700.3 | 20.70 |
| chatglm3-6b | INT4-MIXED | 32 | 79.3 | 49 | 4991.2 | 20.41 |
| gpt-j-6b | INT4-MIXED | 32 | 140.4 | 49.2 | 5042.7 | 20.33 |
| stablelm-3b-4e1t | INT8-CW | 1024 | 407.3 | 49.4 | 4600.8 | 20.24 |
| chatglm3-6b | INT4-MIXED | 1024 | 795.6 | 50.6 | 4623.4 | 19.76 |
| gpt-j-6b | INT4-MIXED | 1024 | 722 | 52.7 | 6260.2 | 18.98 |
| flan-t5-xxl | INT4-MIXED | 1139 | 261.1 | 53.9 | 15237.8 | 18.55 |
| phi-3-mini-4k-instruct | INT8-CW | 32 | 80 | 56.7 | 5305.6 | 17.64 |
| phi-3-mini-4k-instruct | INT8-CW | 1024 | 524 | 60.4 | 5629.9 | 16.56 |
| chatglm3-6b | INT8-CW | 32 | 88.6 | 67.9 | 7536.3 | 14.73 |
| chatglm3-6b | INT8-CW | 1024 | 479.2 | 69.8 | 7330.4 | 14.33 |
| gpt-j-6b | INT8-CW | 32 | 99.5 | 71 | 7422.3 | 14.08 |
| falcon-7b-instruct | INT4-MIXED | 32 | 113.6 | 71.5 | 5295.8 | 13.99 |
| falcon-7b-instruct | INT4-MIXED | 1024 | 943.5 | 73 | 5040.6 | 13.70 |
| gpt-j-6b | INT8-CW | 1024 | 557.3 | 75 | 8734.2 | 13.33 |
| baichuan2-7b-chat | INT4-MIXED | 32 | 152.7 | 75.5 | 6518.7 | 13.25 |
| mistral-7b-v0.1 | INT4-MIXED | 32 | 137.3 | 77.6 | 5731.8 | 12.89 |
| baichuan2-7b-chat | INT4-MIXED | 1024 | 1583.3 | 79.1 | 7189.8 | 12.64 |
| mistral-7b-v0.1 | INT4-MIXED | 1024 | 732.1 | 79.3 | 5595.9 | 12.61 |
| zephyr-7b-beta | INT4-MIXED | 32 | 118.5 | 80.4 | 5987 | 12.44 |
| zephyr-7b-beta | INT4-MIXED | 1024 | 724.4 | 82.4 | 5829.6 | 12.14 |
| gemma-2b-it | FP16 | 32 | 91.4 | 83.1 | 6021.3 | 12.03 |
| gemma-7b-it | INT4-MIXED | 32 | 127.6 | 83.7 | 6323.6 | 11.95 |
| gemma-2b-it | FP16 | 1024 | 395.4 | 83.8 | 6137.8 | 11.93 |
| dolly-v2-3b | FP16 | 32 | 105.6 | 86 | 6710.9 | 11.63 |
| phi-2 | FP16 | 32 | 111.3 | 86.3 | 6934.2 | 11.59 |
| gemma-7b-it | INT4-MIXED | 1024 | 1112.1 | 86.8 | 6894.6 | 11.52 |
| stable-zephyr-3b-dpo | FP16 | 32 | 145.3 | 90.2 | 6954 | 11.09 |
| dolly-v2-3b | FP16 | 1024 | 602 | 90.8 | 7525.8 | 11.01 |
| phi-2 | FP16 | 1024 | 600.2 | 92 | 7523.2 | 10.87 |
| stablelm-3b-4e1t | FP16 | 32 | 119.4 | 92.1 | 6861.9 | 10.86 |
| qwen-7b-chat | INT4-MIXED | 32 | 133.3 | 93.7 | 7386.6 | 10.67 |
| stable-zephyr-3b-dpo | FP16 | 1024 | 604.7 | 94.5 | 7539.5 | 10.58 |
| stablelm-3b-4e1t | FP16 | 1024 | 610.1 | 96.2 | 7450.4 | 10.40 |
| qwen-7b-chat | INT4-MIXED | 1024 | 736.9 | 98.1 | 7898 | 10.19 |

| Topology | Precision | Input Size (tokens) | 1st latency (ms) | 2nd latency (ms) | max rss memory (MB) | 2nd token per sec |
|---|---|---|---|---|---|---|
| gemma-2b-it | INT4-MIXED | 32 | 28.6 | 17.7 | 3378 | 56.50 |
| dolly-v2-3b | INT4-MIXED | 32 | 53.6 | 18.5 | 3424.4 | 54.05 |
| gemma-2b-it | INT4-MIXED | 1024 | 196.7 | 18.6 | 3390.2 | 53.76 |
| dolly-v2-3b | INT4-MIXED | 1024 | 392.5 | 21 | 3844.7 | 47.62 |
| gemma-2b-it | INT8-CW | 32 | 35 | 28.3 | 3749.2 | 35.34 |
| gemma-2b-it | INT8-CW | 1024 | 187.5 | 29.2 | 3893.1 | 34.25 |
| dolly-v2-3b | INT8-CW | 32 | 49.8 | 30.9 | 4453 | 32.36 |
| dolly-v2-3b | INT8-CW | 1024 | 363.7 | 33 | 4904.5 | 30.30 |
| chatglm3-6b | INT4-MIXED | 32 | 49 | 33.9 | 5220.1 | 29.50 |
| flan-t5-xxl | INT4-MIXED | 33 | 61.6 | 34.9 | 13722.1 | 28.65 |
| chatglm3-6b | INT4-MIXED | 1024 | 465.7 | 36 | 5222.1 | 27.78 |
| flan-t5-xxl | INT4-MIXED | 1139 | 223.5 | 39.1 | 15435.4 | 25.58 |
| gpt-j-6b | INT4-MIXED | 32 | 72.4 | 39.6 | 5314 | 25.25 |
| gpt-j-6b | INT4-MIXED | 1024 | 563.9 | 43.7 | 6473.2 | 22.88 |
| zephyr-7b-beta | INT4-MIXED | 32 | 67 | 48.9 | 6141.4 | 20.45 |
| baichuan2-7b-chat | INT4-MIXED | 32 | 65.7 | 50 | 6553.6 | 20.00 |
| zephyr-7b-beta | INT4-MIXED | 1024 | 461 | 51.6 | 6114.5 | 19.38 |
| baichuan2-7b-chat | INT4-MIXED | 1024 | 1605.2 | 54.4 | 7411.5 | 18.38 |
| qwen-7b-chat | INT4-MIXED | 32 | 77.7 | 58.6 | 7451.7 | 17.06 |
| gemma-2b-it | FP16 | 32 | 66.9 | 62.4 | 6240 | 16.03 |
| gemma-2b-it | FP16 | 1024 | 242 | 63 | 6373.7 | 15.87 |
| qwen-7b-chat | INT4-MIXED | 1024 | 476.6 | 63 | 8100.3 | 15.87 |
| dolly-v2-3b | FP16 | 32 | 66 | 65.1 | 6938 | 15.36 |
| chatglm3-6b | INT8-CW | 32 | 78.6 | 66.2 | 7591.9 | 15.11 |
| chatglm3-6b | INT8-CW | 1024 | 430.1 | 68.6 | 7526.1 | 14.58 |
| dolly-v2-3b | FP16 | 1024 | 433.2 | 68.8 | 7754 | 14.53 |
| gpt-j-6b | INT8-CW | 32 | 85.9 | 75.1 | 7469.5 | 13.32 |
| gpt-j-6b | INT8-CW | 1024 | 562.3 | 79.1 | 8937 | 12.64 |

| Topology | Precision | Input Size (tokens) | 1st latency (ms) | 2nd latency (ms) | max rss memory (MB) | 2nd token per sec |
|---|---|---|---|---|---|---|
| dolly-v2-3b | INT4-MIXED | 32 | 70.8 | 23.6 | 3580.5 | 42.37 |
| gemma-2b-it | INT4-MIXED | 32 | 58.8 | 24 | 4086.6 | 41.67 |
| phi-2 | INT4-MIXED | 32 | 67.9 | 24.1 | 3782.8 | 41.49 |
| gemma-2b-it | INT4-MIXED | 1024 | 712.9 | 24.8 | 4186.5 | 40.32 |
| gemma-2b-it | INT4-MIXED | 4096 | 3135.2 | 25.3 | 4931.5 | 39.53 |
| stable-zephyr-3b-dpo | INT4-MIXED | 32 | 76.3 | 25.4 | 3810.1 | 39.37 |
| gemma-2b-it | INT4-MIXED | 2048 | 1440 | 25.5 | 4250 | 39.22 |
| dolly-v2-3b | INT4-MIXED | 1024 | 1068.1 | 26.4 | 4119.8 | 37.88 |
| phi-2 | INT4-MIXED | 1024 | 1043 | 27 | 4328.3 | 37.04 |
| stablelm-3b-4e1t | INT4-MIXED | 32 | 68.5 | 27.9 | 3439.8 | 35.84 |
| stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 1051.4 | 28.3 | 4325.2 | 35.34 |
| stable-zephyr-3b-dpo | INT4-MIXED | 2076 | 2677.9 | 31.1 | 4992.6 | 32.15 |
| stablelm-3b-4e1t | INT4-MIXED | 1024 | 1057.3 | 31.1 | 4182.4 | 32.15 |
| phi-3-mini-4k-instruct | INT4-MIXED | 32 | 94.5 | 32.3 | 4263.6 | 30.96 |
| stablelm-3b-4e1t | INT4-MIXED | 2048 | 2564 | 33.7 | 5129.6 | 29.67 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 1366 | 35.9 | 4642.7 | 27.86 |
| phi-3-mini-4k-instruct | INT4-MIXED | 2048 | 3183 | 38.9 | 5685.2 | 25.71 |
| stable-zephyr-3b-dpo | INT8-CW | 32 | 84.6 | 39.2 | 4402.8 | 25.51 |
| stablelm-3b-4e1t | INT4-MIXED | 4096 | 6621.6 | 39.2 | 7302 | 25.51 |
| stablelm-3b-4e1t | INT8-CW | 32 | 82.9 | 39.3 | 4315.5 | 25.45 |
| gemma-2b-it | INT8-CW | 32 | 85.4 | 39.5 | 4886.5 | 25.32 |
| phi-2 | INT8-CW | 32 | 95 | 40.1 | 4447.6 | 24.94 |
| gemma-2b-it | INT8-CW | 1024 | 791.5 | 40.2 | 4847.7 | 24.88 |
| dolly-v2-3b | INT8-CW | 32 | 93.6 | 40.4 | 4614.8 | 24.75 |
| gemma-2b-it | INT8-CW | 4096 | 3371.3 | 40.6 | 5799.9 | 24.63 |
| gemma-2b-it | INT8-CW | 2048 | 1594.2 | 40.9 | 5115.9 | 24.45 |
| stable-zephyr-3b-dpo | INT8-CW | 1024 | 1177.2 | 42.1 | 5124.4 | 23.75 |
| stablelm-3b-4e1t | INT8-CW | 1024 | 1174.4 | 42.3 | 5037 | 23.64 |
| phi-2 | INT8-CW | 1024 | 1145.9 | 43 | 5183.6 | 23.26 |
| dolly-v2-3b | INT8-CW | 1024 | 1171.3 | 43.3 | 5172.3 | 23.09 |
| stable-zephyr-3b-dpo | INT8-CW | 2076 | 4070 | 44.8 | 5927.9 | 22.32 |
| stablelm-3b-4e1t | INT8-CW | 2048 | 2797.1 | 44.8 | 5950 | 22.32 |
| phi-3-mini-4k-instruct | INT4-MIXED | 4096 | 8047.2 | 45.1 | 8019.5 | 22.17 |
| gpt-j-6b | INT4-MIXED | 32 | 136 | 48.1 | 5306.2 | 20.79 |
| flan-t5-xxl | INT4-MIXED | 33 | 79.3 | 48.2 | 14071 | 20.75 |
| chatglm3-6b | INT4-MIXED | 32 | 147.4 | 48.7 | 5063.6 | 20.53 |
| chatglm3-6b | INT4-MIXED | 1024 | 1877.2 | 50.1 | 5267.6 | 19.96 |
| stablelm-3b-4e1t | INT8-CW | 4096 | 7094.2 | 50.1 | 8047.9 | 19.96 |
| chatglm3-6b | INT4-MIXED | 2048 | 4009.4 | 51.4 | 5677 | 19.46 |
| chatglm3-6b | INT4-MIXED | 4096 | 8779.6 | 51.5 | 7111.5 | 19.42 |
| flan-t5-xxl | INT4-MIXED | 1139 | 376.5 | 52 | 15851.9 | 19.23 |
| gpt-j-6b | INT4-MIXED | 1024 | 2112.5 | 52.2 | 6915.4 | 19.16 |
| phi-3-mini-4k-instruct | INT8-CW | 32 | 101.5 | 53 | 5272.3 | 18.87 |
| falcon-7b-instruct | INT4-MIXED | 32 | 172.6 | 55.7 | 5732.4 | 17.95 |
| gpt-j-6b | INT4-MIXED | 2048 | 4759.6 | 56.2 | 9203.4 | 17.79 |
| phi-3-mini-4k-instruct | INT8-CW | 1024 | 1556.7 | 56.4 | 6118 | 17.73 |
| flan-t5-xxl | INT4-MIXED | 2048 | 664 | 56.7 | 20616.4 | 17.64 |
| falcon-7b-instruct | INT4-MIXED | 1024 | 2316 | 57.1 | 6052.5 | 17.51 |
| phi-3-mini-4k-instruct | INT8-CW | 2048 | 3593.9 | 59.3 | 7165.8 | 16.86 |
| mistral-7b-v0.1 | INT4-MIXED | 32 | 180 | 61 | 5781.8 | 16.39 |
| mistral-7b-v0.1 | INT4-MIXED | 1024 | 2311.4 | 63.2 | 6227 | 15.82 |
| mistral-7b-v0.1 | INT4-MIXED | 2048 | 4931.2 | 65.2 | 6883.6 | 15.34 |
| phi-3-mini-4k-instruct | INT8-CW | 4096 | 8851 | 65.4 | 9741.2 | 15.29 |
| zephyr-7b-beta | INT4-MIXED | 32 | 172 | 65.7 | 6387.4 | 15.22 |
| mistral-7b-v0.1 | INT4-MIXED | 4096 | 10822.1 | 66.1 | 8790.2 | 15.13 |
| baichuan2-7b-chat | INT4-MIXED | 32 | 155.4 | 66.7 | 6847.8 | 14.99 |
| flan-t5-xxl | INT4-MIXED | 4096 | 1368.2 | 67.5 | 30669.4 | 14.81 |
| zephyr-7b-beta | INT4-MIXED | 1024 | 2313.7 | 67.8 | 6530.2 | 14.75 |
| gemma-7b-it | INT4-MIXED | 32 | 231.5 | 68.8 | 7265.1 | 14.53 |
| gemma-2b-it | FP16 | 32 | 91.8 | 69.1 | 7829.9 | 14.47 |
| gemma-2b-it | FP16 | 1024 | 1398.2 | 69.9 | 7656.5 | 14.31 |
| gemma-2b-it | FP16 | 2048 | 2977.5 | 70.7 | 7982.2 | 14.14 |
| baichuan2-7b-chat | INT4-MIXED | 1024 | 2771.3 | 71 | 7753.8 | 14.08 |
| phi-2 | FP16 | 32 | 99.8 | 71.1 | 7066.5 | 14.06 |
| gemma-2b-it | FP16 | 4096 | 6309.8 | 71.2 | 8607.5 | 14.04 |
| stable-zephyr-3b-dpo | FP16 | 32 | 94.7 | 71.3 | 7072.5 | 14.03 |
| stablelm-3b-4e1t | FP16 | 32 | 94.5 | 71.3 | 6980.7 | 14.03 |
| dolly-v2-3b | FP16 | 32 | 98.8 | 71.6 | 7211 | 13.97 |
| gemma-7b-it | INT4-MIXED | 1024 | 2832 | 72.8 | 8247.3 | 13.74 |
| baichuan2-7b-chat | INT4-MIXED | 2048 | 5351.1 | 75.1 | 9136.2 | 13.32 |
| phi-2 | FP16 | 1024 | 1376.9 | 75.7 | 8097.9 | 13.21 |
| stable-zephyr-3b-dpo | FP16 | 1024 | 1375.4 | 75.8 | 8106.4 | 13.19 |
| stablelm-3b-4e1t | FP16 | 1024 | 1382.8 | 75.8 | 8020.3 | 13.19 |
| dolly-v2-3b | FP16 | 1024 | 1405.3 | 76.3 | 8093.4 | 13.11 |
| qwen-7b-chat | INT4-MIXED | 32 | 145.9 | 76.4 | 7648.4 | 13.09 |
| gemma-7b-it | INT4-MIXED | 2048 | 6079.5 | 77 | 9493.9 | 12.99 |
| stablelm-3b-4e1t | FP16 | 2048 | 3170.1 | 80.3 | 9339.1 | 12.45 |
| stable-zephyr-3b-dpo | FP16 | 2076 | 9419.9 | 80.4 | 9101 | 12.44 |
| qwen-7b-chat | INT4-MIXED | 1024 | 2405.7 | 80.7 | 8483.9 | 12.39 |
| gpt-j-6b | INT8-CW | 32 | 142.1 | 81.9 | 7614 | 12.21 |
| baichuan2-7b-chat | INT4-MIXED | 3968 | 12508.6 | 83.2 | 9938 | 12.02 |
| flan-t5-xxl | INT8-CW | 33 | 123.9 | 83.2 | 23446.4 | 12.02 |
| gemma-7b-it | INT4-MIXED | 4096 | 14292.9 | 84.9 | 11672.5 | 11.78 |
| qwen-7b-chat | INT4-MIXED | 2048 | 5429.1 | 85.2 | 9822.4 | 11.74 |
| chatglm3-6b | INT8-CW | 32 | 143 | 85.4 | 7618.1 | 11.71 |
| gpt-j-6b | INT8-CW | 1024 | 2307.9 | 86.1 | 9248.2 | 11.61 |
| chatglm3-6b | INT8-CW | 1024 | 2472.9 | 86.8 | 7824.2 | 11.52 |
| flan-t5-xxl | INT8-CW | 1139 | 471 | 86.8 | 25692.8 | 11.52 |
| chatglm3-6b | INT8-CW | 2048 | 5232.6 | 88 | 8354.6 | 11.36 |
| chatglm3-6b | INT8-CW | 4096 | 11328.1 | 88.1 | 9675.8 | 11.35 |
| stablelm-3b-4e1t | FP16 | 4096 | 7736.9 | 89.1 | 12203.8 | 11.22 |
| gpt-j-6b | INT8-CW | 2048 | 5140 | 90.2 | 11542.2 | 11.09 |
| flan-t5-xxl | INT8-CW | 2048 | 804.4 | 91.4 | 30371.1 | 10.94 |
| falcon-7b-instruct | INT8-CW | 32 | 171.9 | 93.7 | 8415.5 | 10.67 |
| qwen-7b-chat | INT4-MIXED | 4096 | 13230.6 | 93.7 | 12824.2 | 10.67 |
| falcon-7b-instruct | INT8-CW | 1024 | 2564.1 | 95.1 | 8741.2 | 10.52 |
| phi-3-mini-4k-instruct | FP16 | 32 | 121.4 | 98.8 | 9057.3 | 10.12 |
| baichuan2-7b-chat | INT8-CW | 32 | 152 | 99.3 | 9002 | 10.07 |

All models listed here were tested with the following parameters:

  • Framework: PyTorch

  • Beam: 1

  • Batch size: 1
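
The INT4-MIXED and INT8-CW entries in the Precision column refer to OpenVINO models exported from the original PyTorch checkpoints with compressed weights. A hedged sketch of preparing such a model with Optimum Intel is shown below; the model ID, output directory, and compression settings are illustrative and may differ from the exact configuration used to produce the numbers above.

```python
# Hedged sketch: export a PyTorch checkpoint to OpenVINO IR with mixed 4-bit
# weight compression, roughly corresponding to the INT4-MIXED rows above.
# The model ID and output directory are illustrative; requires optimum-intel
# with OpenVINO support installed.
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# ratio=0.8 quantizes ~80% of the weights to INT4 and keeps the rest in INT8
quant_config = OVWeightQuantizationConfig(bits=4, ratio=0.8)

model = OVModelForCausalLM.from_pretrained(
    "google/gemma-2b-it",             # original PyTorch checkpoint (illustrative)
    export=True,                      # convert to OpenVINO IR during loading
    quantization_config=quant_config,
)
model.save_pretrained("gemma-2b-it-int4-ov")  # tokenizer conversion is a separate step
```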