Most Efficient Large Language Models for AI PC

This page is regularly updated to help you identify the best-performing LLMs on the Intel® Core™ Ultra processor family and AI PCs. The current data is as of OpenVINO 2024.4 (24 Oct. 2024).

The tables below list the key performance indicators for inference on built-in GPUs.
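
The indicators are directly related: the second-token throughput is the reciprocal of the second-token latency (for example, a 2nd latency of 9.4 ms corresponds to 1000 / 9.4 ≈ 106.38 tok/sec). The snippet below is a minimal sketch of how such indicators can be measured, assuming the OpenVINO GenAI LLMPipeline API with a streamer callback; the model directory, device, and prompt are placeholders, and this is not the exact harness used to produce the tables.

```python
# Minimal sketch (not the exact benchmark harness used for this page):
# timing first- and second-token latency with OpenVINO GenAI.
# The model directory and prompt below are placeholders.
import time

import openvino_genai

pipe = openvino_genai.LLMPipeline("TinyLlama-1.1B-Chat-int4-ov", "GPU")

token_times = []

def streamer(subword: str) -> bool:
    token_times.append(time.perf_counter())  # arrival time of each streamed chunk
    return False  # False tells the pipeline to keep generating

start = time.perf_counter()
pipe.generate("What is an AI PC?", max_new_tokens=128, streamer=streamer)

# "1st latency": time from the request to the first generated token.
first_latency_ms = (token_times[0] - start) * 1000
# "2nd latency": mean gap between subsequent tokens (approximated per callback).
gaps = [b - a for a, b in zip(token_times, token_times[1:])]
second_latency_ms = 1000 * sum(gaps) / max(len(gaps), 1)

print(f"1st latency: {first_latency_ms:.1f} ms")
print(f"2nd latency: {second_latency_ms:.1f} ms "
      f"-> {1000 / second_latency_ms:.2f} tok/sec")
```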

| Topology | Precision | Input Size (tokens) | max rss memory (MB) | 1st latency (ms) | 2nd latency (ms) | 2nd tok/sec |
|---|---|---|---|---|---|---|
| opt-125m-gptq | INT4-MIXED | 1024 | 1610.2 | 146 | 9.4 | 106.38 |
| opt-125m-gptq | INT4-MIXED | 32 | 1087.6 | 60.8 | 9.5 | 105.26 |
| tiny-llama-1.1b-chat | INT4-MIXED | 32 | 1977 | 85.7 | 20.2 | 49.50 |
| tiny-llama-1.1b-chat | INT4-MIXED | 1024 | 1940.8 | 367.7 | 20.3 | 49.26 |
| tiny-llama-1.1b-chat | INT8-CW | 32 | 1855.2 | 70.2 | 21.8 | 45.87 |
| qwen2-0.5b | INT4-MIXED | 1024 | 3029.3 | 226.4 | 22.3 | 44.84 |
| qwen2-0.5b | INT8-CW | 1024 | 3093 | 222 | 22.3 | 44.84 |
| qwen2-0.5b | FP16 | 1024 | 2509.5 | 234.3 | 22.4 | 44.64 |
| qwen2-0.5b | FP16 | 32 | 1933.8 | 146.4 | 22.4 | 44.64 |
| tiny-llama-1.1b-chat | INT8-CW | 1024 | 2288.3 | 368.6 | 22.9 | 43.67 |
| qwen2-0.5b | INT4-MIXED | 32 | 2670.9 | 115.1 | 23 | 43.48 |
| qwen2-0.5b | INT8-CW | 32 | 2530 | 157.9 | 24.3 | 41.15 |
| red-pajama-incite-chat-3b-v1 | INT4-MIXED | 32 | 2677.3 | 186.1 | 27.9 | 35.84 |
| qwen2-1.5b | INT4-MIXED | 32 | 4515.1 | 179.8 | 28.7 | 34.84 |
| qwen2-1.5b | INT4-MIXED | 1024 | 4927.5 | 254.3 | 29.1 | 34.36 |
| dolly-v2-3b | INT4-MIXED | 32 | 2420.9 | 245.6 | 30.8 | 32.47 |
| qwen2-1.5b | INT8-CW | 32 | 4824.9 | 165.1 | 31.2 | 32.05 |
| phi-2 | INT4-MIXED | 32 | 2523.5 | 233.9 | 31.5 | 31.75 |
| qwen2-1.5b | INT8-CW | 1024 | 5401.8 | 331.1 | 32 | 31.25 |
| stable-zephyr-3b-dpo | INT4-MIXED | 30 | 2816.2 | 151.3 | 32.9 | 30.40 |
| red-pajama-incite-chat-3b-v1 | INT4-MIXED | 1020 | 2646.7 | 860.6 | 33 | 30.30 |
| opt-2.7b | INT4-MIXED | 31 | 2814.5 | 174.7 | 33.1 | 30.21 |
| phi-2 | INT4-MIXED | 32 | 2363.6 | 236.6 | 34 | 29.41 |
| stablelm-3b-4e1t | INT4-MIXED | 32 | 3079.1 | 220 | 34 | 29.41 |
| minicpm-1b-sft | INT4-MIXED | 31 | 2971 | 185.1 | 34.1 | 29.33 |
| minicpm-1b-sft | INT8-CW | 31 | 3103.6 | 233.5 | 34.3 | 29.15 |
| dolly-v2-3b | INT4-MIXED | 1024 | 2152.3 | 876.6 | 34.7 | 28.82 |
| phi-3-mini-4k-instruct | INT4-MIXED | 38 | 2951 | 155.4 | 35.9 | 27.86 |
| phi-2 | INT4-MIXED | 1024 | 2689.9 | 971.7 | 36.5 | 27.40 |
| stablelm-3b-4e1t | INT4-MIXED | 1024 | 3335.9 | 519.3 | 37.3 | 26.81 |
| opt-2.7b | INT4-MIXED | 937 | 3227.5 | 639.5 | 37.7 | 26.53 |
| phi-3-mini-4k-instruct | INT4-MIXED | 38 | 3289.7 | 161 | 37.9 | 26.39 |
| gemma-2b-it | INT4-MIXED | 32 | 4099.6 | 258.6 | 38 | 26.32 |
| tiny-llama-1.1b-chat | FP16 | 32 | 3098.7 | 143.9 | 38.2 | 26.18 |
| stable-zephyr-3b-dpo | INT4-MIXED | 946 | 3548.5 | 453.9 | 38.8 | 25.77 |
| tiny-llama-1.1b-chat | FP16 | 1024 | 3388.6 | 523 | 39 | 25.64 |
| phi-2 | INT4-MIXED | 1024 | 2594.7 | 964.2 | 39.1 | 25.58 |
| minicpm-1b-sft | FP16 | 31 | 3597.7 | 164.8 | 39.8 | 25.13 |
| gemma-2b-it | INT4-MIXED | 1024 | 5059.1 | 669.1 | 40.5 | 24.69 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1061 | 3431.8 | 840.1 | 40.6 | 24.63 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1061 | 3555.6 | 836.3 | 41.8 | 23.92 |
| qwen2-1.5b | FP16 | 32 | 3979.4 | 111.8 | 42.5 | 23.53 |
| red-pajama-incite-chat-3b-v1 | INT8-CW | 32 | 3639.9 | 199.1 | 43.6 | 22.94 |
| qwen2-1.5b | FP16 | 1024 | 4569.8 | 250.5 | 44.1 | 22.68 |
| dolly-v2-3b | INT8-CW | 32 | 3727 | 248.2 | 44.5 | 22.47 |
| opt-2.7b | INT8-CW | 31 | 3746.3 | 175.6 | 44.6 | 22.42 |
| stablelm-3b-4e1t | INT8-CW | 32 | 3651.3 | 178 | 45.4 | 22.03 |
| chatglm3-6b | INT4-MIXED | 32 | 4050.3 | 88.1 | 47.4 | 21.10 |
| phi-2 | INT8-CW | 32 | 3608.7 | 232 | 48.3 | 20.70 |
| red-pajama-incite-chat-3b-v1 | INT8-CW | 1020 | 2951 | 816.6 | 48.4 | 20.66 |
| stablelm-3b-4e1t | INT8-CW | 1024 | 4142.8 | 658.7 | 48.5 | 20.62 |
| opt-2.7b | INT8-CW | 937 | 4019 | 640.7 | 48.8 | 20.49 |
| stable-zephyr-3b-dpo | INT8-CW | 30 | 3264.5 | 150.7 | 48.8 | 20.49 |
| gemma-2b-it | INT8-CW | 32 | 4874.7 | 249.4 | 48.9 | 20.45 |
| chatglm3-6b | INT4-MIXED | 32 | 3902.1 | 84.9 | 49.5 | 20.20 |
| dolly-v2-3b | INT8-CW | 1024 | 2931.4 | 865.2 | 49.7 | 20.12 |
| gemma-2b-it | INT8-CW | 1024 | 5834 | 545.4 | 50.7 | 19.72 |
| vicuna-7b-v1.5 | INT4-MIXED | 32 | 4560.3 | 119.4 | 50.7 | 19.72 |
| chatglm3-6b | INT4-MIXED | 1024 | 4070.1 | 895.9 | 50.9 | 19.65 |
| chatglm3-6b | INT4-MIXED | 1024 | 3832.1 | 854.4 | 52 | 19.23 |
| orca-mini-3b | INT4-MIXED | 32 | 2345.5 | 132.8 | 52.2 | 19.16 |
| phi-2 | INT8-CW | 1024 | 3511.6 | 989.7 | 53.1 | 18.83 |
| chatglm2-6b | INT4-MIXED | 32 | 4960.2 | 91.5 | 54.2 | 18.45 |
| qwen1.5-7b-chat | INT4-MIXED | 32 | 5936.5 | 195.7 | 54.8 | 18.25 |
| stable-zephyr-3b-dpo | INT8-CW | 946 | 3700.5 | 677.9 | 54.8 | 18.25 |
| llama-2-7b-chat-hf | INT4-MIXED | 32 | 4010.5 | 113.7 | 55.6 | 17.99 |
| qwen-7b-chat | INT4-MIXED | 32 | 7393 | 132.7 | 56.1 | 17.83 |
| chatglm2-6b | INT4-MIXED | 1024 | 5234.5 | 747.3 | 56.2 | 17.79 |
| qwen2-7b | INT4-MIXED | 32 | 7086.2 | 183 | 56.3 | 17.76 |
| phi-3-mini-4k-instruct | INT8-CW | 38 | 4574.4 | 132.9 | 56.9 | 17.57 |
| llama-2-7b-gptq | INT4-MIXED | 32 | 4134.1 | 120 | 58 | 17.24 |
| chatglm3-6b-gptq | INT4-MIXED | 32 | 4288.1 | 99.4 | 58.1 | 17.21 |
| qwen2-7b | INT4-MIXED | 1024 | 7716.4 | 734.9 | 58.3 | 17.15 |
| mistral-7b-v0.1 | INT4-MIXED | 31 | 4509.3 | 115 | 58.6 | 17.06 |
| codegen25-7b | INT4-MIXED | 32 | 4211.8 | 136.5 | 59 | 16.95 |
| qwen1.5-7b-chat | INT4-MIXED | 1024 | 7007.2 | 792.7 | 60.6 | 16.50 |
| chatglm3-6b-gptq | INT4-MIXED | 1024 | 4545.4 | 860.3 | 60.9 | 16.42 |
| phi-3-mini-4k-instruct | INT8-CW | 1061 | 5087.2 | 1029.5 | 60.9 | 16.42 |
| gpt-j-6b | INT4-MIXED | 32 | 4013.5 | 316.1 | 61.1 | 16.37 |
| mistral-7b-v0.1 | INT4-MIXED | 1007 | 876.5 | 984.4 | 61.7 | 16.21 |
| llama-3-8b | INT4-MIXED | 32 | 4357.1 | 132.8 | 62 | 16.13 |
| llama-2-7b-chat-hf | INT4-MIXED | 1024 | 3564.8 | 1163.7 | 62.5 | 16.00 |
| qwen-7b-chat-gptq | INT4-MIXED | 32 | 7384.1 | 217.8 | 62.9 | 15.90 |
| zephyr-7b-beta | INT4-MIXED | 32 | 5331.6 | 125 | 62.9 | 15.90 |
| qwen-7b-chat | INT4-MIXED | 32 | 6545.8 | 218.7 | 63 | 15.87 |
| llama-3.1-8b | INT4-MIXED | 31 | 5076.3 | 110.4 | 63.4 | 15.77 |
| llama-3.1-8b | INT4-MIXED | 31 | 4419 | 145.6 | 63.5 | 15.75 |
| llama-2-7b-gptq | INT4-MIXED | 1024 | 3434.2 | 921.6 | 64.4 | 15.53 |
| llama-3-8b | INT4-MIXED | 32 | 4886.7 | 132.3 | 65.4 | 15.29 |
| stablelm-7b | INT4-MIXED | 32 | 4768.4 | 132.1 | 65.5 | 15.27 |
| codegen25-7b | INT4-MIXED | 1024 | 1429.7 | 967.5 | 65.7 | 15.22 |
| zephyr-7b-beta | INT4-MIXED | 1024 | 5575.6 | 837.2 | 65.7 | 15.22 |
| llama-3-8b | INT4-MIXED | 32 | 4888.3 | 161.8 | 66.2 | 15.11 |
| mistral-7b-v0.1 | INT4-MIXED | 31 | 4401.4 | 142.7 | 66.2 | 15.11 |
| llama-3-8b | INT4-MIXED | 1024 | 3782.4 | 1091.5 | 66.8 | 14.97 |
| llama-3.1-8b | INT4-MIXED | 31 | 4781.4 | 159.4 | 67 | 14.93 |
| glm-4-9b | INT4-MIXED | 33 | 6392.6 | 298.7 | 67.2 | 14.88 |
| qwen-7b-chat | INT4-MIXED | 1024 | 8472.8 | 1331.2 | 67.4 | 14.84 |
| gpt-j-6b | INT4-MIXED | 1024 | 1237.8 | 1638.8 | 68.1 | 14.68 |
| llama-2-7b-chat-hf | INT4-MIXED | 32 | 4497.4 | 153.2 | 68.7 | 14.56 |
| llama-3-8b | INT4-MIXED | 1024 | 4526.9 | 1060.3 | 69.8 | 14.33 |
| mistral-7b-v0.1 | INT4-MIXED | 1007 | 3968.7 | 1033.1 | 69.9 | 14.31 |
| llama-3-8b | INT4-MIXED | 1024 | 4297.9 | 1041.7 | 70 | 14.29 |
| orca-mini-3b | INT8-CW | 32 | 3744.3 | 174 | 70.5 | 14.18 |
| stablelm-7b | INT4-MIXED | 1020 | 4402.1 | 1186.4 | 70.5 | 14.18 |
| gemma-2b-it | FP16 | 32 | 5806.3 | 117.6 | 71.8 | 13.93 |
| glm-4-9b | INT4-MIXED | 1025 | 7003.5 | 1354.2 | 72.5 | 13.79 |
| gemma-2b-it | FP16 | 1024 | 6804.7 | 490.6 | 73.4 | 13.62 |
| stablelm-3b-4e1t | FP16 | 32 | 6217 | 207.5 | 75.2 | 13.30 |
| llama-2-7b-chat-hf | INT4-MIXED | 1024 | 4320.9 | 1247.7 | 75.8 | 13.19 |
| gemma-7b-it | INT4-MIXED | 32 | 8050.6 | 134.6 | 76.1 | 13.14 |
| gemma-7b-it | INT4-MIXED | 32 | 7992.6 | 146.4 | 76.1 | 13.14 |
| qwen-7b-chat | INT4-MIXED | 1024 | 5712.7 | 1144.4 | 77.1 | 12.97 |
| stablelm-3b-4e1t | FP16 | 1024 | 6722.9 | 491.4 | 77.7 | 12.87 |
| chatglm2-6b | INT8-CW | 32 | 6856.2 | 111.6 | 78.9 | 12.67 |
| opt-2.7b | FP16 | 31 | 5377.5 | 138 | 79.6 | 12.56 |
| chatglm2-6b | INT8-CW | 1024 | 7133.8 | 1012.1 | 81 | 12.35 |
| red-pajama-incite-chat-3b-v1 | FP16 | 32 | 5672.5 | 211 | 81.2 | 12.32 |
| gemma-7b-it | INT4-MIXED | 1024 | 9399.5 | 1726.7 | 82.2 | 12.17 |
| dolly-v2-3b | FP16 | 32 | 5573 | 230.6 | 82.5 | 12.12 |
| gemma-7b-it | INT4-MIXED | 1024 | 9460 | 1241.2 | 82.7 | 12.09 |
| opt-2.7b | FP16 | 937 | 4727.8 | 618.8 | 84.6 | 11.82 |
| baichuan2-7b-chat | INT4-MIXED | 32 | 5782.4 | 274.1 | 84.8 | 11.79 |
| phi-2 | FP16 | 32 | 5497.3 | 244.9 | 85 | 11.76 |
| stable-zephyr-3b-dpo | FP16 | 30 | 5714.8 | 173.1 | 86 | 11.63 |
| red-pajama-incite-chat-3b-v1 | FP16 | 1020 | 5262.2 | 817.4 | 86.2 | 11.60 |
| dolly-v2-3b | FP16 | 1024 | 2376.1 | 935.5 | 87 | 11.49 |
| qwen-7b-chat | INT4-MIXED | 32 | 8597.4 | 226.2 | 87.7 | 11.40 |
| phi-2 | FP16 | 1024 | 4063.9 | 969.8 | 89.7 | 11.15 |
| chatglm3-6b | INT8-CW | 32 | 6158.8 | 123.4 | 89.8 | 11.14 |
| stable-zephyr-3b-dpo | FP16 | 946 | 5337.1 | 781.4 | 90.5 | 11.05 |
| baichuan2-7b-chat | INT4-MIXED | 1024 | 807.4 | 1725.7 | 91.8 | 10.89 |
| vicuna-7b-v1.5 | INT8-CW | 32 | 7391 | 171.3 | 92.5 | 10.81 |
| chatglm3-6b | INT8-CW | 1024 | 550.7 | 1210.9 | 93.3 | 10.72 |
| phi-3-mini-4k-instruct | FP16 | 38 | 8299.3 | 142 | 94.1 | 10.63 |
| qwen2-7b | INT8-CW | 32 | 9941.1 | 139.1 | 94.9 | 10.54 |
| qwen-7b-chat-gptq | INT4-MIXED | 1024 | 6545 | 1103.9 | 95.8 | 10.44 |
| qwen2-7b | INT8-CW | 1024 | 10575.1 | 1183 | 96.7 | 10.34 |
| qwen-7b-chat | INT4-MIXED | 1024 | 6777.4 | 1309.6 | 96.9 | 10.32 |
| vicuna-7b-v1.5 | INT8-CW | 1024 | 8013.7 | 1154.6 | 96.9 | 10.32 |
| phi-3-medium-4k-instruct | INT4-MIXED | 38 | 8212.8 | 448.3 | 97 | 10.31 |
| zephyr-7b-beta | INT8-CW | 32 | 7888 | 144.8 | 97.4 | 10.27 |
| phi-3-mini-4k-instruct | FP16 | 1061 | 8814.8 | 1195.7 | 98.7 | 10.13 |
| zephyr-7b-beta | INT8-CW | 1024 | 8136.7 | 1191.6 | 99.4 | 10.06 |
| llama-2-13b-chat-hf | INT4-MIXED | 32 | 6927.5 | 165.3 | 99.9 | 10.01 |

| Topology | Precision | Input Size (tokens) | max rss memory (MB) | 1st latency (ms) | 2nd latency (ms) | 2nd tok/sec |
|---|---|---|---|---|---|---|
| opt-125m-gptq | INT4-MIXED | 1024 | 1513.6 | 81.9 | 7.8 | 128.21 |
| opt-125m-gptq | INT4-MIXED | 32 | 979.9 | 50.4 | 7.9 | 126.58 |
| tiny-llama-1.1b-chat | INT4-MIXED | 1024 | 1943.3 | 176.3 | 16.8 | 59.52 |
| tiny-llama-1.1b-chat | INT4-MIXED | 32 | 1982.2 | 59.5 | 17.1 | 58.48 |
| qwen2-0.5b | INT4-MIXED | 32 | 2678 | 117.3 | 18.7 | 53.48 |
| tiny-llama-1.1b-chat | INT8-CW | 32 | 2080.9 | 59.4 | 19 | 52.63 |
| qwen2-0.5b | INT4-MIXED | 1024 | 3036.1 | 165.5 | 19.2 | 52.08 |
| tiny-llama-1.1b-chat | INT8-CW | 1024 | 2287 | 241.4 | 19.6 | 51.02 |
| qwen2-0.5b | INT8-CW | 1024 | 3084.9 | 172.1 | 20 | 50.00 |
| qwen2-0.5b | INT8-CW | 32 | 2518 | 105.5 | 21.4 | 46.73 |
| red-pajama-incite-chat-3b-v1 | INT4-MIXED | 32 | 2793.6 | 141.8 | 23.9 | 41.84 |
| qwen2-1.5b | INT4-MIXED | 32 | 4515.4 | 118.7 | 24 | 41.67 |
| qwen2-1.5b | INT4-MIXED | 1024 | 4930.1 | 229.6 | 24.3 | 41.15 |
| dolly-v2-3b | INT4-MIXED | 32 | 2486.1 | 174 | 25.4 | 39.37 |
| phi-2 | INT4-MIXED | 32 | 2552.9 | 210.6 | 26.9 | 37.17 |
| red-pajama-incite-chat-3b-v1 | INT4-MIXED | 1020 | 2934.1 | 464.5 | 27.5 | 36.36 |
| qwen2-1.5b | INT8-CW | 32 | 4813.4 | 119.1 | 27.8 | 35.97 |
| opt-2.7b | INT4-MIXED | 31 | 3172.5 | 131.9 | 28.5 | 35.09 |
| red-pajama-incite-chat-3b-v1 | INT4-MIXED | 1024 | 3038.2 | 447.1 | 28.6 | 34.97 |
| dolly-v2-3b | INT4-MIXED | 1024 | 2947.4 | 409 | 28.8 | 34.72 |
| qwen2-1.5b | INT8-CW | 1024 | 5394.8 | 327.9 | 29.3 | 34.13 |
| stable-zephyr-3b-dpo | INT4-MIXED | 30 | 2728.1 | 131.2 | 29.8 | 33.56 |
| phi-2 | INT4-MIXED | 32 | 2805.1 | 208.3 | 30.2 | 33.11 |
| minicpm-1b-sft | INT8-CW | 31 | 3104.2 | 147.8 | 30.9 | 32.36 |
| phi-2 | INT4-MIXED | 1024 | 3058.9 | 602.9 | 31.1 | 32.15 |
| minicpm-1b-sft | INT4-MIXED | 31 | 2970.1 | 183.7 | 31.1 | 32.15 |
| stablelm-3b-4e1t | INT4-MIXED | 32 | 3077.1 | 183.2 | 31.6 | 31.65 |
| opt-2.7b | INT4-MIXED | 937 | 3416.7 | 429.4 | 31.6 | 31.65 |
| stable-zephyr-3b-dpo | INT4-MIXED | 946 | 3211.8 | 428.8 | 32.3 | 30.96 |
| phi-3-mini-4k-instruct | INT4-MIXED | 31 | 3014.5 | 116 | 32.5 | 30.77 |
| phi-3-mini-4k-instruct | INT4-MIXED | 38 | 2957.4 | 153.9 | 32.5 | 30.77 |
| phi-2 | INT4-MIXED | 1024 | 3278.9 | 613.3 | 33.4 | 29.94 |
| phi-3-mini-4k-instruct | INT4-MIXED | 38 | 3288.5 | 152.9 | 33.4 | 29.94 |
| phi-3-mini-4k-instruct | INT4-MIXED | 31 | 3265.1 | 123.6 | 34.1 | 29.33 |
| gemma-2b-it | INT4-MIXED | 32 | 4162.1 | 208.8 | 34.2 | 29.24 |
| stablelm-3b-4e1t | INT4-MIXED | 1024 | 3525.8 | 524.5 | 35 | 28.57 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1061 | 3427.8 | 777.5 | 36.5 | 27.40 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1023 | 3405.4 | 554.1 | 36.7 | 27.25 |
| gemma-2b-it | INT4-MIXED | 1024 | 5053.1 | 354.8 | 36.9 | 27.10 |
| minicpm-1b-sft | FP16 | 31 | 3595.5 | 124.9 | 36.9 | 27.10 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1061 | 3547.2 | 755.8 | 37.1 | 26.95 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1023 | 3528.4 | 536.4 | 37.4 | 26.74 |
| red-pajama-incite-chat-3b-v1 | INT8-CW | 32 | 3747.7 | 189.9 | 38.1 | 26.25 |
| opt-2.7b | INT8-CW | 31 | 3810.7 | 145.7 | 38.5 | 25.97 |
| chatglm3-6b | INT4-MIXED | 32 | 4120.7 | 67.3 | 38.7 | 25.84 |
| dolly-v2-3b | INT8-CW | 32 | 3747 | 188.4 | 39.2 | 25.51 |
| chatglm3-6b | INT4-MIXED | 32 | 4482.9 | 69.9 | 40.7 | 24.57 |
| chatglm3-6b | INT4-MIXED | 1024 | 4146 | 606.8 | 41 | 24.39 |
| opt-2.7b | INT8-CW | 937 | 4458.9 | 587.8 | 41.8 | 23.92 |
| red-pajama-incite-chat-3b-v1 | INT8-CW | 1024 | 4088.4 | 634.1 | 41.9 | 23.87 |
| red-pajama-incite-chat-3b-v1 | INT8-CW | 1020 | 4086.8 | 653.4 | 42 | 23.81 |
| phi-2 | INT8-CW | 32 | 3794.6 | 202.7 | 42.1 | 23.75 |
| chatglm3-6b | INT4-MIXED | 1024 | 4446.7 | 598.6 | 42.3 | 23.64 |
| stablelm-3b-4e1t | INT8-CW | 32 | 3652.5 | 146 | 42.6 | 23.47 |
| stable-zephyr-3b-dpo | INT8-CW | 30 | 3768.6 | 151.9 | 42.6 | 23.47 |
| dolly-v2-3b | INT8-CW | 1024 | 4092 | 603.1 | 42.9 | 23.31 |
| stablelm-3b-4e1t | INT8-CW | 1024 | 4143.2 | 671.7 | 45.2 | 22.12 |
| gemma-2b-it | INT8-CW | 32 | 4878.4 | 221.6 | 45.6 | 21.93 |
| phi-2 | INT8-CW | 1024 | 4153.6 | 810.3 | 46 | 21.74 |
| llama-2-7b-chat-hf | INT4-MIXED | 32 | 4394.6 | 109.7 | 46.2 | 21.65 |
| chatglm3-6b-gptq | INT4-MIXED | 32 | 5218.9 | 79.7 | 46.7 | 21.41 |
| stable-zephyr-3b-dpo | INT8-CW | 946 | 4360.1 | 627.8 | 46.8 | 21.37 |
| vicuna-7b-v1.5 | INT4-MIXED | 32 | 4482.3 | 101.2 | 47.2 | 21.19 |
| gemma-2b-it | INT8-CW | 1024 | 5837.1 | 507.1 | 48 | 20.83 |
| llama-2-7b-gptq | INT4-MIXED | 32 | 4734.3 | 102.8 | 48.1 | 20.79 |
| orca-mini-3b | INT4-MIXED | 32 | 2720.1 | 132 | 48.1 | 20.79 |
| qwen-7b-chat | INT4-MIXED | 32 | 7803.7 | 178.5 | 48.3 | 20.70 |
| mistral-7b-v0.1 | INT4-MIXED | 31 | 4537.5 | 99 | 48.5 | 20.62 |
| codegen25-7b | INT4-MIXED | 32 | 4723.3 | 108.5 | 48.5 | 20.62 |
| chatglm3-6b-gptq | INT4-MIXED | 1024 | 5150.8 | 614.2 | 48.8 | 20.49 |
| mistral-7b-v0.1 | INT4-MIXED | 32 | 4572 | 102.9 | 48.8 | 20.49 |
| llama-3-8b | INT4-MIXED | 33 | 4991.2 | 252.2 | 50.9 | 19.65 |
| qwen-7b-chat-gptq | INT4-MIXED | 32 | 8088.4 | 212.6 | 51 | 19.61 |
| chatglm2-6b | INT4-MIXED | 32 | 4960.6 | 105.5 | 51.2 | 19.53 |
| gpt-j-6b | INT4-MIXED | 32 | 4699.5 | 259.2 | 51.4 | 19.46 |
| llama-3.1-8b | INT4-MIXED | 31 | 4897.8 | 106.9 | 51.5 | 19.42 |
| llama-3-8b | INT4-MIXED | 32 | 4999.7 | 105.9 | 51.6 | 19.38 |
| qwen-7b-chat | INT4-MIXED | 32 | 8085.9 | 193.5 | 51.7 | 19.34 |
| falcon-7b-instruct | INT4-MIXED | 32 | 5416.2 | 175 | 52.5 | 19.05 |
| mistral-7b-v0.1 | INT4-MIXED | 1007 | 4772.6 | 803 | 52.6 | 19.01 |
| qwen1.5-7b-chat | INT4-MIXED | 32 | 6027.3 | 174.9 | 53 | 18.87 |
| mistral-7b-v0.1 | INT4-MIXED | 1024 | 4775 | 717.6 | 53 | 18.87 |
| llama-2-7b-chat-hf | INT4-MIXED | 1024 | 4976.5 | 992.1 | 53.1 | 18.83 |
| qwen2-7b | INT4-MIXED | 32 | 7087.1 | 138.1 | 53.3 | 18.76 |
| llama-2-7b-gptq | INT4-MIXED | 1024 | 5351.2 | 711.6 | 53.7 | 18.62 |
| llama-3-8b | INT4-MIXED | 32 | 5472.8 | 109.4 | 53.7 | 18.62 |
| phi-3-mini-4k-instruct | INT8-CW | 38 | 4575.3 | 115.9 | 53.7 | 18.62 |
| stablelm-7b | INT4-MIXED | 32 | 5213.7 | 128.5 | 53.8 | 18.59 |
| phi-3-mini-4k-instruct | INT8-CW | 31 | 4571.8 | 118.9 | 53.8 | 18.59 |
| llama-3-8b | INT4-MIXED | 33 | 5480.4 | 246.8 | 53.9 | 18.55 |
| llama-3-8b | INT4-MIXED | 32 | 5528.2 | 144.9 | 54.3 | 18.42 |
| llama-3.1-8b | INT4-MIXED | 31 | 5377.3 | 112.8 | 54.3 | 18.42 |
| chatglm2-6b | INT4-MIXED | 1024 | 5232.3 | 759.6 | 54.6 | 18.32 |
| llama-3.1-8b | INT4-MIXED | 31 | 5440.4 | 126.4 | 54.8 | 18.25 |
| llama-3-8b | INT4-MIXED | 33 | 5532.8 | 248.2 | 54.9 | 18.21 |
| codegen25-7b | INT4-MIXED | 1024 | 5412.9 | 714.8 | 55 | 18.18 |
| mistral-7b-v0.1 | INT4-MIXED | 32 | 4998.5 | 117.3 | 55.2 | 18.12 |
| mistral-7b-v0.1 | INT4-MIXED | 31 | 5000.2 | 122.4 | 55.6 | 17.99 |
| llama-3-8b | INT4-MIXED | 1024 | 5594 | 953.5 | 56.6 | 17.67 |
| gpt-j-6b | INT4-MIXED | 1024 | 5323.8 | 1254 | 56.8 | 17.61 |
| llama-3-8b | INT4-MIXED | 1025 | 5596.7 | 1192.3 | 56.8 | 17.61 |
| qwen2-7b | INT4-MIXED | 1024 | 7722.1 | 714.2 | 57 | 17.54 |
| phi-3-mini-4k-instruct | INT8-CW | 1023 | 5067.1 | 818.5 | 57.4 | 17.42 |
| phi-3-mini-4k-instruct | INT8-CW | 1061 | 5086.1 | 975.1 | 57.4 | 17.42 |
| llama-2-7b-chat-hf | INT4-MIXED | 32 | 5087.7 | 126.2 | 57.9 | 17.27 |
| stablelm-7b | INT4-MIXED | 1020 | 5780.5 | 1248.4 | 59 | 16.95 |
| llama-3-8b | INT4-MIXED | 1025 | 6088.9 | 1381.5 | 59 | 16.95 |
| llama-3-8b | INT4-MIXED | 1024 | 6084.8 | 931.2 | 59.2 | 16.89 |
| llama-3-8b | INT4-MIXED | 1025 | 6141.2 | 1494.3 | 59.4 | 16.84 |
| llama-3-8b | INT4-MIXED | 1024 | 6133.8 | 1075.2 | 59.6 | 16.78 |
| mistral-7b-v0.1 | INT4-MIXED | 1024 | 5472.6 | 794.3 | 59.7 | 16.75 |
| zephyr-7b-beta | INT4-MIXED | 32 | 5328.5 | 103.5 | 59.8 | 16.72 |
| falcon-7b-instruct | INT4-MIXED | 1024 | 5677.5 | 686.2 | 59.8 | 16.72 |
| mistral-7b-v0.1 | INT4-MIXED | 1007 | 5243.5 | 1074 | 59.9 | 16.69 |
| qwen1.5-7b-chat | INT4-MIXED | 1024 | 7096.7 | 1132.7 | 60 | 16.67 |
| qwen-7b-chat | INT4-MIXED | 1024 | 8872.6 | 792.8 | 61 | 16.39 |
| qwen-7b-chat | INT4-MIXED | 1024 | 9164.4 | 822.6 | 63.3 | 15.80 |
| orca-mini-3b | INT8-CW | 32 | 4221.7 | 170.6 | 63.5 | 15.75 |
| llama-2-7b-chat-hf | INT4-MIXED | 1024 | 5708.1 | 1397.9 | 63.6 | 15.72 |
| glm-4-9b | INT4-MIXED | 33 | 6402.9 | 307.1 | 63.8 | 15.67 |
| zephyr-7b-beta | INT4-MIXED | 1024 | 5572.4 | 1156.4 | 64.3 | 15.55 |
| glm-4-9b | INT4-MIXED | 32 | 6383.1 | 256.2 | 64.5 | 15.50 |
| baichuan2-7b-chat | INT4-MIXED | 32 | 5926.3 | 191.8 | 65.8 | 15.20 |
| opt-2.7b | FP16 | 31 | 5886 | 112.2 | 68 | 14.71 |
| dolly-v2-3b | FP16 | 32 | 6161.5 | 147.5 | 69.5 | 14.39 |
| red-pajama-incite-chat-3b-v1 | FP16 | 32 | 6265.4 | 146.2 | 69.6 | 14.37 |
| glm-4-9b | INT4-MIXED | 1024 | 6994.5 | 1013.7 | 69.8 | 14.33 |
| opt-2.7b | FP16 | 937 | 6345 | 379.5 | 71.6 | 13.97 |
| glm-4-9b | INT4-MIXED | 1025 | 7014.9 | 1416.8 | 72.5 | 13.79 |
| phi-2 | FP16 | 32 | 6204.7 | 189.2 | 72.9 | 13.72 |
| stable-zephyr-3b-dpo | FP16 | 30 | 6221.4 | 159.7 | 73 | 13.70 |
| dolly-v2-3b | FP16 | 1024 | 6669.9 | 424.3 | 73.3 | 13.64 |
| red-pajama-incite-chat-3b-v1 | FP16 | 1020 | 6658.8 | 484.7 | 73.4 | 13.62 |
| stablelm-3b-4e1t | FP16 | 32 | 6216.3 | 145.4 | 73.5 | 13.61 |
| qwen-7b-chat | INT4-MIXED | 32 | 9294.9 | 144.4 | 73.8 | 13.55 |
| red-pajama-incite-chat-3b-v1 | FP16 | 1024 | 6755.1 | 469.1 | 73.9 | 13.53 |
| qwen-7b-chat-gptq | INT4-MIXED | 1024 | 9152.1 | 827.2 | 75.1 | 13.32 |
| gemma-7b-it | INT4-MIXED | 32 | 7991.4 | 128.6 | 75.8 | 13.19 |
| chatglm2-6b | INT8-CW | 32 | 6854.4 | 110.2 | 76.3 | 13.11 |
| chatglm3-6b | INT8-CW | 32 | 6754.8 | 112.3 | 76.4 | 13.09 |
| stable-zephyr-3b-dpo | FP16 | 946 | 6940 | 428.6 | 76.7 | 13.04 |
| baichuan2-7b-chat | INT4-MIXED | 1024 | 6930.2 | 1229.5 | 76.7 | 13.04 |
| gemma-7b-it | INT4-MIXED | 32 | 8061.5 | 125.6 | 76.7 | 13.04 |
| stablelm-3b-4e1t | FP16 | 1024 | 6722.9 | 480.8 | 77 | 12.99 |
| phi-2 | FP16 | 1024 | 6709.4 | 624.1 | 77.2 | 12.95 |
| chatglm2-6b | INT8-CW | 1024 | 7132.9 | 1361.9 | 78.7 | 12.71 |
| chatglm3-6b | INT8-CW | 1024 | 7037.5 | 1389.2 | 78.7 | 12.71 |
| qwen-7b-chat | INT4-MIXED | 1024 | 10374.1 | 1357.5 | 81.1 | 12.33 |
| gemma-7b-it | INT4-MIXED | 1024 | 9398 | 1268.5 | 82.7 | 12.09 |
| gemma-7b-it | INT4-MIXED | 1024 | 9469.5 | 1268 | 83.2 | 12.02 |
| gpt-j-6b | INT8-CW | 32 | 7126.5 | 255.2 | 87.2 | 11.47 |
| falcon-7b-instruct | INT8-CW | 32 | 8287.6 | 131.1 | 88.4 | 11.31 |
| llama-2-7b-chat-hf | INT8-CW | 32 | 7474.9 | 139.5 | 89.7 | 11.15 |
| codegen25-7b | INT8-CW | 32 | 7559.4 | 138 | 90.8 | 11.01 |
| vicuna-7b-v1.5 | INT8-CW | 32 | 7390.8 | 136.6 | 90.8 | 11.01 |
| falcon-7b-instruct | INT8-CW | 1024 | 8546.8 | 1205.9 | 92.2 | 10.85 |
| stablelm-7b | INT8-CW | 32 | 8356.4 | 143 | 92.4 | 10.82 |
| qwen2-7b | INT8-CW | 32 | 9940.7 | 132 | 92.5 | 10.81 |
| baichuan2-13b-chat | INT4-MIXED | 32 | 9879.2 | 184.9 | 93.3 | 10.72 |
| phi-3-mini-4k-instruct | FP16 | 38 | 8290 | 125.2 | 93.4 | 10.71 |
| phi-3-mini-4k-instruct | FP16 | 31 | 8290.5 | 109.5 | 93.5 | 10.70 |
| gpt-j-6b | INT8-CW | 1024 | 7759 | 1996.8 | 93.9 | 10.65 |
| llama-2-7b-chat-hf | INT8-CW | 1024 | 8097.8 | 1701.6 | 94.7 | 10.56 |
| phi-3-medium-4k-instruct | INT4-MIXED | 38 | 8210.4 | 527 | 95.1 | 10.52 |
| mistral-7b-v0.1 | INT8-CW | 31 | 7882.4 | 128.6 | 95.1 | 10.52 |
| vicuna-7b-v1.5 | INT8-CW | 1024 | 8013.2 | 1558.1 | 95.1 | 10.52 |
| mistral-7b-v0.1 | INT8-CW | 32 | 7886.9 | 140.6 | 95.2 | 10.50 |
| qwen2-7b | INT8-CW | 1024 | 10573.1 | 1564.5 | 95.3 | 10.49 |
| codegen25-7b | INT8-CW | 1024 | 8253.1 | 1526.3 | 95.7 | 10.45 |
| zephyr-7b-beta | INT8-CW | 32 | 7785.3 | 144.4 | 95.8 | 10.44 |
| stablelm-7b | INT8-CW | 1020 | 8921.9 | 1845 | 96.9 | 10.32 |
| mistral-7b-v0.1 | INT8-CW | 1007 | 8127.4 | 1648.4 | 97.4 | 10.27 |
| qwen-7b-chat | INT8-CW | 32 | 11083.2 | 140.6 | 97.7 | 10.24 |
| qwen1.5-7b-chat | INT8-CW | 32 | 8870 | 156.4 | 98.1 | 10.19 |
| llama-3.1-8b | INT8-CW | 31 | 8600.3 | 189.2 | 98.4 | 10.16 |
| mistral-7b-v0.1 | INT8-CW | 1024 | 8134.7 | 1554.1 | 98.4 | 10.16 |
| qwen-14b-chat | INT4-MIXED | 32 | 9876.2 | 192.3 | 98.6 | 10.14 |
| zephyr-7b-beta | INT8-CW | 1024 | 8035.2 | 1580.4 | 98.8 | 10.12 |
| llama-3-8b | INT8-CW | 32 | 8694.2 | 150.7 | 99.5 | 10.05 |
| llama-3-8b | INT8-CW | 33 | 8700.4 | 175.4 | 99.8 | 10.02 |
| phi-3-mini-4k-instruct | FP16 | 1023 | 8795.2 | 601.3 | 99.9 | 10.01 |

| Topology | Precision | Input Size (tokens) | max rss memory (MB) | 1st latency (ms) | 2nd latency (ms) | 2nd tok/sec |
|---|---|---|---|---|---|---|
| opt-125m-gptq | INT4-MIXED | 32 | 965.9 | 29 | 7.7 | 129.87 |
| opt-125m-gptq | INT4-MIXED | 1024 | 1507.9 | 113.1 | 7.8 | 128.21 |
| tiny-llama-1.1b-chat | INT4-MIXED | 32 | 1831.8 | 46.5 | 16.7 | 59.88 |
| tiny-llama-1.1b-chat | INT4-MIXED | 1024 | 1806.3 | 635 | 17.8 | 56.18 |
| qwen2-0.5b | INT4-MIXED | 32 | 2551.7 | 61.4 | 18.3 | 54.64 |
| qwen2-0.5b | INT4-MIXED | 1024 | 2976.6 | 356.1 | 19.2 | 52.08 |
| tiny-llama-1.1b-chat | INT8-CW | 32 | 1987.4 | 56 | 21.6 | 46.30 |
| tiny-llama-1.1b-chat | INT8-CW | 1024 | 2209.1 | 772.7 | 22.6 | 44.25 |
| qwen2-0.5b | INT8-CW | 32 | 2484.9 | 57.3 | 22.8 | 43.86 |
| qwen2-0.5b | INT8-CW | 1024 | 3102.5 | 407.1 | 23.9 | 41.84 |
| qwen2-1.5b | INT4-MIXED | 32 | 4265.2 | 71.7 | 25.5 | 39.22 |
| qwen2-1.5b | INT4-MIXED | 1024 | 4884.5 | 862.4 | 26.8 | 37.31 |
| dolly-v2-3b | INT4-MIXED | 32 | 2401.3 | 89.6 | 27.5 | 36.36 |
| red-pajama-incite-chat-3b-v1 | INT4-MIXED | 32 | 2511.5 | 78.6 | 28.2 | 35.46 |
| phi-2 | INT4-MIXED | 32 | 2279.5 | 95.7 | 29.1 | 34.36 |
| minicpm-1b-sft | INT4-MIXED | 31 | 2759.9 | 104.4 | 30.9 | 32.36 |
| phi-2 | INT4-MIXED | 32 | 2620.1 | 100.8 | 31 | 32.26 |
| stable-zephyr-3b-dpo | INT4-MIXED | 30 | 2636.5 | 86.8 | 31.7 | 31.55 |
| dolly-v2-3b | INT4-MIXED | 1024 | 3137.1 | 1782.9 | 32.2 | 31.06 |
| red-pajama-incite-chat-3b-v1 | INT4-MIXED | 1020 | 3118.5 | 1831.7 | 33.3 | 30.03 |
| red-pajama-incite-chat-3b-v1 | INT4-MIXED | 1024 | 2862.7 | 1821.1 | 33.5 | 29.85 |
| qwen2-1.5b | INT8-CW | 32 | 4831.2 | 87 | 33.8 | 29.59 |
| opt-2.7b | INT4-MIXED | 31 | 2898.3 | 73.2 | 33.9 | 29.50 |
| phi-2 | INT4-MIXED | 1024 | 2797.4 | 1887 | 34 | 29.41 |
| orca-mini-3b | INT4-MIXED | 32 | 2877.8 | 100.3 | 35 | 28.57 |
| stablelm-3b-4e1t | INT4-MIXED | 32 | 2669.4 | 94.7 | 35.3 | 28.33 |
| qwen2-1.5b | INT8-CW | 1024 | 5455.8 | 1047.6 | 35.3 | 28.33 |
| minicpm-1b-sft | INT8-CW | 31 | 3104.1 | 103.5 | 35.3 | 28.33 |
| phi-2 | INT4-MIXED | 1024 | 3039.8 | 1917.4 | 35.9 | 27.86 |
| stable-zephyr-3b-dpo | INT4-MIXED | 946 | 3411.4 | 1695 | 37 | 27.03 |
| gemma-2b-it | INT4-MIXED | 32 | 3991.7 | 116.1 | 37.9 | 26.39 |
| opt-2.7b | INT4-MIXED | 937 | 3617.5 | 1764.9 | 38.2 | 26.18 |
| phi-3-mini-4k-instruct | INT4-MIXED | 31 | 2935.3 | 111.6 | 38.2 | 26.18 |
| phi-3-mini-4k-instruct | INT4-MIXED | 38 | 3102.4 | 134 | 38.4 | 26.04 |
| phi-3-mini-4k-instruct | INT4-MIXED | 31 | 2986.1 | 114.1 | 38.9 | 25.71 |
| phi-3-mini-4k-instruct | INT4-MIXED | 38 | 2977.4 | 131.1 | 39 | 25.64 |
| gemma-2b-it | INT4-MIXED | 1024 | 4973.3 | 1249.2 | 39.7 | 25.19 |
| stablelm-3b-4e1t | INT4-MIXED | 1024 | 3196.9 | 2045.4 | 39.9 | 25.06 |
| dolly-v2-3b | INT8-CW | 32 | 3490.2 | 107.4 | 41.5 | 24.10 |
| red-pajama-incite-chat-3b-v1 | INT8-CW | 32 | 3457.9 | 105 | 42.5 | 23.53 |
| opt-2.7b | INT8-CW | 31 | 3686.8 | 107.5 | 44.1 | 22.68 |
| phi-2 | INT8-CW | 32 | 3554.9 | 116.6 | 44.1 | 22.68 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1023 | 3390.7 | 2277.1 | 44.2 | 22.62 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1061 | 3643.6 | 2485 | 44.4 | 22.52 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1023 | 3516.4 | 2280.9 | 44.5 | 22.47 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1061 | 3537.2 | 2522.4 | 44.7 | 22.37 |
| orca-mini-3b | INT4-MIXED | 1024 | 3557.3 | 1898.9 | 45 | 22.22 |
| minicpm-1b-sft | FP16 | 31 | 3814.4 | 97.9 | 45.4 | 22.03 |
| stablelm-3b-4e1t | INT8-CW | 32 | 3486.9 | 100.5 | 46.1 | 21.69 |
| stable-zephyr-3b-dpo | INT8-CW | 30 | 3516.7 | 101.9 | 46.1 | 21.69 |
| dolly-v2-3b | INT8-CW | 1024 | 4265.9 | 2178.6 | 46.2 | 21.65 |
| red-pajama-incite-chat-3b-v1 | INT8-CW | 1020 | 3979.1 | 2219.7 | 47.2 | 21.19 |
| red-pajama-incite-chat-3b-v1 | INT8-CW | 1024 | 3975.5 | 2199.7 | 47.3 | 21.14 |
| opt-2.7b | INT8-CW | 937 | 4358.6 | 1981.8 | 48.4 | 20.66 |
| phi-2 | INT8-CW | 1024 | 4058.1 | 2280.1 | 48.9 | 20.45 |
| gemma-2b-it | INT8-CW | 32 | 4786.8 | 119.8 | 49.4 | 20.24 |
| chatglm3-6b | INT4-MIXED | 32 | 4141.5 | 166.6 | 49.7 | 20.12 |
| stablelm-3b-4e1t | INT8-CW | 1024 | 4054.8 | 2243.5 | 50.7 | 19.72 |
| stable-zephyr-3b-dpo | INT8-CW | 946 | 4521.8 | 1816.4 | 51.3 | 19.49 |
| gemma-2b-it | INT8-CW | 1024 | 5810.7 | 1580 | 51.3 | 19.49 |
| chatglm3-6b | INT4-MIXED | 32 | 4651.4 | 164.7 | 51.6 | 19.38 |
| chatglm3-6b | INT4-MIXED | 1024 | 4235.1 | 2818.7 | 52.3 | 19.12 |
| orca-mini-3b | INT8-CW | 32 | 4162 | 109.2 | 53.3 | 18.76 |
| chatglm3-6b | INT4-MIXED | 1024 | 4783.8 | 2869 | 54.4 | 18.38 |
| gpt-j-6b | INT4-MIXED | 32 | 4667.3 | 176.7 | 56.3 | 17.76 |
| chatglm3-6b-gptq | INT4-MIXED | 32 | 5369.4 | 173.9 | 58.9 | 16.98 |
| llama-2-7b-chat-hf | INT4-MIXED | 32 | 4280 | 173.2 | 60.1 | 16.64 |
| phi-3-mini-4k-instruct | INT8-CW | 31 | 4585.1 | 123 | 60.5 | 16.53 |
| phi-3-mini-4k-instruct | INT8-CW | 38 | 4597 | 152 | 60.5 | 16.53 |
| chatglm2-6b | INT4-MIXED | 32 | 4847.8 | 158.7 | 60.6 | 16.50 |
| vicuna-7b-v1.5 | INT4-MIXED | 32 | 4476.9 | 178.2 | 61.2 | 16.34 |
| chatglm3-6b-gptq | INT4-MIXED | 1024 | 5217.6 | 2863.7 | 61.3 | 16.31 |
| mistral-7b-v0.1 | INT4-MIXED | 31 | 4413.6 | 194 | 61.7 | 16.21 |
| qwen2-7b | INT4-MIXED | 32 | 7044.7 | 184.4 | 61.7 | 16.21 |
| mistral-7b-v0.1 | INT4-MIXED | 32 | 4427.6 | 193.3 | 61.8 | 16.18 |
| orca-mini-3b | INT8-CW | 1024 | 4821.6 | 2239.1 | 62 | 16.13 |
| codegen25-7b | INT4-MIXED | 32 | 4687.2 | 176.2 | 62.7 | 15.95 |
| chatglm2-6b | INT4-MIXED | 1024 | 5165.9 | 3148 | 63 | 15.87 |
| llama-2-7b-gptq | INT4-MIXED | 32 | 4632.8 | 175.2 | 63.4 | 15.77 |
| stablelm-7b | INT4-MIXED | 32 | 5219.5 | 206.3 | 63.4 | 15.77 |
| qwen-7b-chat | INT4-MIXED | 32 | 7805.6 | 193.8 | 63.6 | 15.72 |
| gpt-j-6b | INT4-MIXED | 1024 | 5314.9 | 3111.8 | 63.6 | 15.72 |
| qwen2-7b | INT4-MIXED | 1024 | 7716.2 | 3548.3 | 64.1 | 15.60 |
| llama-3-8b | INT4-MIXED | 32 | 4910.9 | 204.8 | 64.7 | 15.46 |
| mistral-7b-v0.1 | INT4-MIXED | 1024 | 4720.8 | 3667.1 | 64.8 | 15.43 |
| mistral-7b-v0.1 | INT4-MIXED | 1007 | 4704.7 | 3685.4 | 64.9 | 15.41 |
| llama-3.1-8b | INT4-MIXED | 31 | 4850.3 | 211.5 | 64.9 | 15.41 |
| phi-3-mini-4k-instruct | INT8-CW | 1023 | 5128.6 | 2815.2 | 65.7 | 15.22 |
| phi-3-mini-4k-instruct | INT8-CW | 1061 | 5155 | 3407.9 | 65.9 | 15.17 |
| mistral-7b-v0.1 | INT4-MIXED | 32 | 4939.3 | 192 | 66.5 | 15.04 |
| llama-3-8b | INT4-MIXED | 33 | 4919.4 | 261.9 | 67.2 | 14.88 |
| llama-2-7b-chat-hf | INT4-MIXED | 1024 | 4948.2 | 3811 | 67.3 | 14.86 |
| qwen1.5-7b-chat | INT4-MIXED | 32 | 5943.1 | 180.5 | 67.7 | 14.77 |
| qwen-7b-chat-gptq | INT4-MIXED | 32 | 8057 | 187 | 68.1 | 14.68 |
| llama-3-8b | INT4-MIXED | 32 | 5503.5 | 198.4 | 68.1 | 14.68 |
| qwen-7b-chat | INT4-MIXED | 32 | 8091.6 | 185.9 | 68.1 | 14.68 |
| llama-3-8b | INT4-MIXED | 1024 | 5569.1 | 3920.5 | 68.2 | 14.66 |
| llama-3.1-8b | INT4-MIXED | 31 | 5358.6 | 201 | 68.2 | 14.66 |
| stablelm-7b | INT4-MIXED | 1020 | 5804.4 | 3726.6 | 68.8 | 14.53 |
| llama-3.1-8b | INT4-MIXED | 31 | 5452.6 | 202.9 | 68.8 | 14.53 |
| llama-2-7b-chat-hf | INT4-MIXED | 32 | 5023 | 165.7 | 69 | 14.49 |
| llama-3-8b | INT4-MIXED | 32 | 5413.6 | 202 | 69.1 | 14.47 |
| llama-3-8b | INT4-MIXED | 33 | 5440.4 | 262.1 | 69.2 | 14.45 |
| codegen25-7b | INT4-MIXED | 1024 | 5434.6 | 3513.2 | 69.9 | 14.31 |
| mistral-7b-v0.1 | INT4-MIXED | 1024 | 5614.9 | 3819.1 | 70 | 14.29 |
| mistral-7b-v0.1 | INT4-MIXED | 31 | 4927.8 | 205 | 70.5 | 14.18 |
| llama-3-8b | INT4-MIXED | 33 | 5498.9 | 270.7 | 70.6 | 14.16 |
| llama-3-8b | INT4-MIXED | 1025 | 5577.4 | 4271.2 | 70.6 | 14.16 |
| llama-2-7b-gptq | INT4-MIXED | 1024 | 5302.2 | 3529.4 | 70.7 | 14.14 |
| zephyr-7b-beta | INT4-MIXED | 32 | 5212.4 | 190.6 | 71.2 | 14.04 |
| llama-3-8b | INT4-MIXED | 1024 | 6161.1 | 3918 | 71.5 | 13.99 |
| llama-3-8b | INT4-MIXED | 1025 | 6098 | 4441.8 | 72.3 | 13.83 |
| llama-3-8b | INT4-MIXED | 1024 | 6071.7 | 3972.2 | 72.4 | 13.81 |
| mistral-7b-v0.1 | INT4-MIXED | 1007 | 5224.1 | 4153.4 | 73.8 | 13.55 |
| llama-3-8b | INT4-MIXED | 1025 | 6156.9 | 4357 | 73.9 | 13.53 |
| zephyr-7b-beta | INT4-MIXED | 1024 | 5511.6 | 3978 | 74.4 | 13.44 |
| opt-2.7b | FP16 | 31 | 9220.3 | 107.8 | 74.7 | 13.39 |
| dolly-v2-3b | FP16 | 32 | 6058.9 | 109.9 | 74.7 | 13.39 |
| qwen1.5-7b-chat | INT4-MIXED | 1024 | 7063.2 | 3791.7 | 75 | 13.33 |
| qwen-7b-chat | INT4-MIXED | 1024 | 8919.5 | 3763.9 | 75 | 13.33 |
| red-pajama-incite-chat-3b-v1 | FP16 | 32 | 6036.5 | 107.5 | 75.9 | 13.18 |
| llama-2-7b-chat-hf | INT4-MIXED | 1024 | 5716.8 | 4231.7 | 76.2 | 13.12 |
| phi-2 | FP16 | 32 | 6090.1 | 115.2 | 77.1 | 12.97 |
| stable-zephyr-3b-dpo | FP16 | 30 | 6113.1 | 112.1 | 78.6 | 12.72 |
| qwen-7b-chat | INT4-MIXED | 1024 | 9212.9 | 3857.4 | 78.6 | 12.72 |
| stablelm-3b-4e1t | FP16 | 32 | 6065.4 | 110.2 | 78.7 | 12.71 |
| opt-2.7b | FP16 | 937 | 9733.8 | 3750.8 | 78.8 | 12.69 |
| dolly-v2-3b | FP16 | 1024 | 6615.2 | 2230.9 | 79.1 | 12.64 |
| red-pajama-incite-chat-3b-v1 | FP16 | 1020 | 6588.3 | 2259.4 | 80.2 | 12.47 |
| glm-4-9b | INT4-MIXED | 33 | 6386.2 | 328 | 80.4 | 12.44 |
| red-pajama-incite-chat-3b-v1 | FP16 | 1024 | 6570.3 | 2268.7 | 80.4 | 12.44 |
| baichuan2-7b-chat | INT4-MIXED | 32 | 5977.9 | 201.7 | 81 | 12.35 |
| glm-4-9b | INT4-MIXED | 32 | 6389.7 | 248.1 | 81 | 12.35 |
| phi-2 | FP16 | 1024 | 6646.2 | 2406.7 | 81.4 | 12.29 |
| stable-zephyr-3b-dpo | FP16 | 946 | 6875.7 | 1868.2 | 82.9 | 12.06 |
| stablelm-3b-4e1t | FP16 | 1024 | 6636.1 | 2036.9 | 83 | 12.05 |
| chatglm2-6b | INT8-CW | 32 | 6731.8 | 159.2 | 84.4 | 11.85 |
| glm-4-9b | INT4-MIXED | 1025 | 7061.4 | 4939.2 | 85.2 | 11.74 |
| qwen-7b-chat-gptq | INT4-MIXED | 1024 | 9175.3 | 3898 | 85.3 | 11.72 |
| gemma-7b-it | INT4-MIXED | 32 | 7883.9 | 230.5 | 86 | 11.63 |
| gemma-7b-it | INT4-MIXED | 32 | 8002.6 | 235 | 86.1 | 11.61 |
| glm-4-9b | INT4-MIXED | 1024 | 7064.9 | 4411.2 | 86.2 | 11.60 |
| gpt-j-6b | INT8-CW | 32 | 7009.2 | 176.8 | 86.4 | 11.57 |
| chatglm2-6b | INT8-CW | 1024 | 7050.5 | 3871.6 | 86.8 | 11.52 |
| chatglm3-6b | INT8-CW | 32 | 6755.9 | 159 | 86.8 | 11.52 |
| baichuan2-7b-chat | INT4-MIXED | 1024 | 7033.3 | 4049 | 88.8 | 11.26 |
| chatglm3-6b | INT8-CW | 1024 | 7076.5 | 3865.9 | 89.2 | 11.21 |
| qwen-7b-chat | INT4-MIXED | 32 | 9245.7 | 176.3 | 90 | 11.11 |
| gemma-7b-it | INT4-MIXED | 1024 | 9449.4 | 4305.8 | 93.2 | 10.73 |
| gpt-j-6b | INT8-CW | 1024 | 7672.3 | 4181.1 | 93.5 | 10.70 |
| gemma-7b-it | INT4-MIXED | 1024 | 9330.5 | 4222.5 | 93.7 | 10.67 |
| orca-mini-3b | FP16 | 32 | 7416.5 | 122.3 | 94.7 | 10.56 |
| codegen25-7b | INT8-CW | 32 | 7557.6 | 170.7 | 98.4 | 10.16 |
| qwen-7b-chat | INT4-MIXED | 1024 | 10371.1 | 4271.7 | 98.9 | 10.11 |
| llama-2-7b-chat-hf | INT8-CW | 32 | 7390.6 | 171.6 | 99.9 | 10.01 |

All models listed here were tested with the following parameters:

  • Framework: PyTorch

  • Beam: 1

  • Batch size: 1
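
The Precision column refers to the weight format of the exported OpenVINO model: FP16 is the plain half-precision export, while in OpenVINO's naming INT8-CW corresponds to channel-wise 8-bit weight compression and INT4-MIXED to mixed 4-/8-bit weight compression. As a minimal sketch, assuming the optimum-intel API, a PyTorch model could be exported with 4-bit weight compression as shown below; the model ID and output directory are placeholders, and the exact compression settings behind each row are not listed on this page.

```python
# Hedged sketch: exporting a Hugging Face PyTorch model to OpenVINO with
# 4-bit weight compression via optimum-intel. The model ID and save path
# are placeholders; the settings used for this page's results may differ.
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

quant_config = OVWeightQuantizationConfig(bits=4)  # mixed INT4/INT8 weight compression
model = OVModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder model ID
    export=True,                           # convert from PyTorch to OpenVINO IR
    quantization_config=quant_config,
)
model.save_pretrained("TinyLlama-1.1B-Chat-int4-ov")  # placeholder output dir
```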