Most Efficient Large Language Models for AI PC

This page is regularly updated to help you identify the best-performing LLMs on the Intel® Core™ Ultra processor family and AI PCs. The current data was collected with OpenVINO 2025.0 on 6 March 2025 (Intel® Core™ Ultra 7 155H and Core™ Ultra 7 268V) and with OpenVINO 2024.6 on 13 December 2024 (Intel® Core™ Ultra 9 288V).

The tables below list the key performance indicators for inference on the processors' built-in GPUs: peak memory use (max rss memory, in MB), time to the first generated token (1st latency, in ms), average time per subsequent token (2nd latency, in ms), and the resulting throughput (2nd tok/sec, the reciprocal of the 2nd latency).
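As a quick illustration of how the last two columns relate, the throughput is simply 1000 divided by the per-token latency in milliseconds. The short Python check below uses the values from the first row of the first table:

```python
# Illustrative only: the "2nd tok/sec" column is the reciprocal of "2nd latency (ms)".
# Example values taken from the first row of the first table
# (opt-125m-gptq, INT4-MIXED, input size 32).
second_token_latency_ms = 3.9
tokens_per_second = 1000.0 / second_token_latency_ms
print(f"{tokens_per_second:.1f} tok/sec")  # prints 256.4, matching the table
```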

Topology | Precision | Input Size | max rss memory (MB) | 1st latency (ms) | 2nd latency (ms) | 2nd tok/sec
--- | --- | --- | --- | --- | --- | ---
opt-125m-gptq | INT4-MIXED | 32 | 833.1 | 15.6 | 3.9 | 256.4
opt-125m-gptq | INT4-MIXED | 1024 | 955.9 | 553.8 | 4.8 | 208.3
bloomz-560m | INT4-MIXED | 32 | 1457.5 | 48.5 | 11.1 | 90.1
qwen2-0.5b | INT4-MIXED | 32 | 1167.8 | 95.7 | 11.5 | 87.0
qwen2-0.5b | INT4-MIXED | 1024 | 1266 | 2330.3 | 12.7 | 78.7
qwen2-0.5b | INT8-CW | 32 | 1496.3 | 90.5 | 12.8 | 78.1
bloomz-560m | INT8-CW | 32 | 1724.2 | 84 | 13.9 | 71.9
qwen2-0.5b | INT8-CW | 1024 | 1593 | 2370.7 | 14 | 71.4
bloomz-560m | INT4-MIXED | 1024 | 1691 | 2005.3 | 15.2 | 65.8
qwen2-0.5b | FP16 | 32 | 2989.8 | 94.6 | 15.9 | 62.9
bloomz-560m | INT8-CW | 1024 | 1941 | 2343.4 | 16.1 | 62.1
qwen2-0.5b | FP16 | 1024 | 3088.1 | 2376.8 | 17.4 | 57.5
bloomz-560m | FP16 | 32 | 3857 | 86.7 | 17.5 | 57.1
bloomz-560m | FP16 | 1024 | 4085.6 | 2373.4 | 19.8 | 50.5
tiny-llama-1.1b-chat | INT4-MIXED | 32 | 1738.9 | 237.4 | 20 | 50.0
tiny-llama-1.1b-chat | INT8-CW | 32 | 2471.2 | 224.6 | 22.6 | 44.2
tiny-llama-1.1b-chat | INT4-MIXED | 1024 | 1929.3 | 5993 | 22.7 | 44.1
tiny-llama-1.1b-chat | INT8-CW | 1024 | 2661.8 | 6238.8 | 25.2 | 39.7
qwen2-1.5b | INT4-MIXED | 32 | 2429 | 312.8 | 28.4 | 35.2
tiny-llama-1.1b-chat | FP16 | 32 | 4834.9 | 231.7 | 28.9 | 34.6
tiny-llama-1.1b-chat | FP16 | 1024 | 5023.2 | 6191.5 | 31.7 | 31.5
qwen2-1.5b | INT4-MIXED | 1024 | 2600.3 | 7597.3 | 31.8 | 31.4
stablelm-3b-4e1t | INT4-MIXED | 32 | 3982.1 | 348.4 | 32.1 | 31.2
qwen2-1.5b | INT8-CW | 32 | 3619 | 301 | 32.7 | 30.6
qwen2-1.5b | INT8-CW | 1024 | 3790.3 | 7990.5 | 34.6 | 28.9
stablelm-3b-4e1t | INT4-MIXED | 1023 | 4455.4 | 11963.2 | 39.2 | 25.5
minicpm-1b-sft | INT4-MIXED | 31 | 5815.4 | 214.3 | 40.1 | 24.9
qwen2-1.5b | FP16 | 32 | 7582.3 | 304.4 | 42.2 | 23.7
minicpm-1b-sft | INT8-CW | 31 | 6609.6 | 210.6 | 43.3 | 23.1
qwen2-1.5b | FP16 | 1024 | 7753.4 | 7915.3 | 44.2 | 22.6
gemma-2b-it | INT4-MIXED | 32 | 3728.2 | 523 | 46.2 | 21.6
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 3689.3 | 656.5 | 47.4 | 21.1
gemma-2b-it | INT4-MIXED | 1024 | 4207.3 | 11867.9 | 47.5 | 21.1
minicpm-1b-sft | FP16 | 31 | 8999.8 | 222.2 | 49.1 | 20.4
red-pajama-incite-chat-3b-v1 | INT4-MIXED | 32 | 3448.1 | 1028.9 | 49.6 | 20.2
dolly-v2-3b | INT4-MIXED | 32 | 3448.4 | 714.8 | 49.9 | 20.0
gemma-2b-it | INT8-CW | 32 | 5423.2 | 488.8 | 51 | 19.6
gemma-2b-it | INT8-CW | 1024 | 5902.7 | 12434.4 | 52.3 | 19.1
stable-zephyr-3b-dpo | INT8-CW | 32 | 5630.3 | 694.5 | 54.4 | 18.4
phi-2 | INT4-MIXED | 32 | 3732.9 | 723.2 | 54.5 | 18.3
phi-2 | INT8-CW | 32 | 5600.4 | 747 | 55.7 | 18.0
dolly-v2-3b | INT8-CW | 32 | 5589.7 | 1009.8 | 55.9 | 17.9
red-pajama-incite-chat-3b-v1 | INT8-CW | 32 | 5590.1 | 698.9 | 55.9 | 17.9
stablelm-3b-4e1t | INT8-CW | 32 | 5630.1 | 660.7 | 56.1 | 17.8
dolly-v2-3b | INT4-MIXED | 1024 | 3984.5 | 15502.8 | 56.5 | 17.7
red-pajama-incite-chat-3b-v1 | INT4-MIXED | 1023 | 3915.6 | 15363.9 | 56.6 | 17.7
llama-2-7b-gptq | INT4-MIXED | 32 | 8618.5 | 782.9 | 56.9 | 17.6
phi-2 | INT4-MIXED | 1024 | 4251.3 | 15317 | 61 | 16.4
phi-2 | INT8-CW | 1024 | 6119.4 | 15886.6 | 62 | 16.1
red-pajama-incite-chat-3b-v1 | INT8-CW | 1023 | 6056.9 | 15984.9 | 62.2 | 16.1
dolly-v2-3b | INT8-CW | 1024 | 6124.9 | 16099.7 | 62.5 | 16.0
stablelm-3b-4e1t | INT8-CW | 1023 | 6097.1 | 16206.9 | 62.5 | 16.0
gemma-2b-it | FP16 | 32 | 12208.2 | 501.4 | 65.5 | 15.3
llama-3-8b | INT4-MIXED | 33 | 8741.2 | 869 | 65.7 | 15.2
llama-2-7b-gptq | INT4-MIXED | 1024 | 9468.1 | 26350.7 | 66.1 | 15.1
qwen-7b-chat-gptq | INT4-MIXED | 32 | 8561 | 773.7 | 67 | 14.9
gemma-2b-it | FP16 | 1024 | 12687.8 | 12168.7 | 67.1 | 14.9
mistral-7b-v0.1 | INT4-MIXED | 32 | 8588.7 | 1020.6 | 67.4 | 14.8
llama-2-7b-chat-hf | INT4-MIXED | 32 | 8626.8 | 1100 | 69.4 | 14.4
phi-2 | FP16 | 32 | 11385.9 | 693.8 | 70.2 | 14.2
dolly-v2-3b | FP16 | 32 | 11359 | 688.5 | 70.5 | 14.2
stable-zephyr-3b-dpo | FP16 | 32 | 11432.9 | 648.5 | 70.6 | 14.2
red-pajama-incite-chat-3b-v1 | FP16 | 32 | 11364 | 692.4 | 70.7 | 14.1
stablelm-3b-4e1t | FP16 | 32 | 11432.6 | 649 | 71.1 | 14.1
llama-3-8b | INT4-MIXED | 1025 | 9254.8 | 29700.3 | 71.9 | 13.9
mistral-7b-v0.1 | INT4-MIXED | 1024 | 9121.9 | 29492.9 | 73.3 | 13.6
phi-3-mini-4k-instruct | INT8-CW | 32 | 7646.1 | 952.6 | 75.7 | 13.2
qwen-7b-chat-gptq | INT4-MIXED | 1024 | 10458.7 | 29022.2 | 75.9 | 13.2
zephyr-7b-beta | INT4-MIXED | 32 | 9217.5 | 1196.6 | 76.2 | 13.1
phi-2 | FP16 | 1024 | 11902.2 | 15868 | 77 | 13.0
dolly-v2-3b | FP16 | 1024 | 11892.5 | 15987.1 | 77.1 | 13.0
baichuan2-7b-chat | INT4-MIXED | 32 | 9440.3 | 1118.1 | 77.3 | 12.9
red-pajama-incite-chat-3b-v1 | FP16 | 1023 | 11829.1 | 16008.7 | 77.3 | 12.9
stablelm-3b-4e1t | FP16 | 1023 | 11897.5 | 16030 | 77.7 | 12.9
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 4961.9 | 968.8 | 78.2 | 12.8
llama-2-7b-chat-hf | INT4-MIXED | 1024 | 9478.1 | 28958.6 | 78.6 | 12.7
zephyr-7b-beta | INT4-MIXED | 1024 | 9764.2 | 30982 | 82.3 | 12.2
phi-3-mini-4k-instruct | INT8-CW | 1024 | 8255.7 | 23200.5 | 83.1 | 12.0
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 5570.2 | 22277.1 | 85.7 | 11.7
baichuan2-7b-chat | INT4-MIXED | 1024 | 10305.2 | 29010 | 86.4 | 11.6
phi-3-mini-4k-instruct | FP16 | 32 | 15292.6 | 934.7 | 96.4 | 10.4
qwen-7b-chat | INT4-MIXED | 32 | 10964.7 | 1413 | 97.8 | 10.2

Topology | Precision | Input Size | max rss memory (MB) | 1st latency (ms) | 2nd latency (ms) | 2nd tok/sec
--- | --- | --- | --- | --- | --- | ---
tiny-llama-1.1b-chat | INT4 | 32 | 2176.5 | 31.7 | 9.6 | 104.2
tiny-llama-1.1b-chat | INT4 | 1024 | 2261 | 132.4 | 10.2 | 98.0
bloomz-560m | INT4 | 1024 | 2103 | 67.4 | 10.3 | 97.1
bloomz-560m | INT4 | 32 | 1880.6 | 33.7 | 10.5 | 95.2
qwen2-0.5b | INT4 | 1024 | 1679.5 | 63.2 | 10.8 | 92.6
qwen2-0.5b | INT4 | 32 | 1577.1 | 36.3 | 10.9 | 91.7
bloomz-560m | INT8 | 32 | 2015.6 | 30.3 | 10.9 | 91.7
qwen2-0.5b | INT8 | 32 | 1869.8 | 31.7 | 11 | 90.9
bloomz-560m | INT8 | 1024 | 2230.8 | 67.3 | 11.4 | 87.7
qwen2-0.5b | INT8 | 1024 | 1951.1 | 68 | 11.9 | 84.0
tiny-llama-1.1b-chat | INT8 | 32 | 2687.2 | 28.6 | 12.9 | 77.5
qwen2-1.5b | INT4 | 1024 | 2368.7 | 167.6 | 13.5 | 74.1
tiny-llama-1.1b-chat | INT8 | 1024 | 2530.6 | 127.8 | 13.7 | 73.0
qwen2-1.5b | INT4 | 32 | 2480.4 | 43.1 | 13.9 | 71.9
bloomz-560m | FP16 | 32 | 2654.3 | 29.6 | 14.5 | 69.0
bloomz-560m | FP16 | 1024 | 2880.8 | 75.8 | 15.8 | 63.3
qwen2-1.5b | INT8 | 32 | 2994.7 | 37.7 | 18.9 | 52.9
red-pajama-incite-chat-3b-v1 | INT4 | 32 | 3240.5 | 53.2 | 19.2 | 52.1
qwen2-1.5b | INT8 | 1024 | 2893.3 | 163.6 | 19.6 | 51.0
gemma-2b-it | INT4 | 32 | 3245 | 188.4 | 20.1 | 49.8
minicpm-1b-sft | INT4 | 31 | 3024.6 | 68.3 | 20.4 | 49.0
dolly-v2-3b | INT4 | 32 | 3301 | 66.3 | 20.4 | 49.0
gemma-2b-it | INT4 | 1024 | 3022.4 | 231.3 | 21.5 | 46.5
red-pajama-incite-chat-3b-v1 | INT4 | 1023 | 3400.6 | 397.9 | 22.1 | 45.2
minicpm-1b-sft | INT4 | 1014 | 2902.7 | 266.7 | 22.1 | 45.2
dolly-v2-3b | INT4 | 1024 | 3442.1 | 377.2 | 23.1 | 43.3
minicpm-1b-sft | INT8 | 31 | 3330 | 62.7 | 23.2 | 43.1
minicpm-1b-sft | INT8 | 1014 | 3259.1 | 211 | 24.8 | 40.3
phi-3-mini-4k-instruct | INT4 | 32 | 3662.9 | 44.3 | 26.3 | 38.0
phi-3-mini-4k-instruct | INT4 | 1024 | 4000.4 | 417.4 | 29.9 | 33.4
gemma-2b-it | INT8 | 32 | 3739.4 | 191.2 | 30.5 | 32.8
red-pajama-incite-chat-3b-v1 | INT8 | 32 | 4338.9 | 48.7 | 31.2 | 32.1
minicpm-1b-sft | FP16 | 31 | 4195.6 | 63.5 | 31.5 | 31.7
dolly-v2-3b | INT8 | 32 | 4438.1 | 63.2 | 32 | 31.3
gemma-2b-it | INT8 | 1024 | 3910.8 | 248.1 | 32 | 31.3
minicpm-1b-sft | FP16 | 1014 | 4123.4 | 229.6 | 33.3 | 30.0
red-pajama-incite-chat-3b-v1 | INT8 | 1023 | 4503.5 | 405.2 | 34.1 | 29.3
chatglm3-6b | INT4 | 32 | 4909.8 | 52.2 | 34.2 | 29.2
dolly-v2-3b | INT8 | 1024 | 4626 | 379.1 | 35.3 | 28.3
chatglm3-6b | INT4 | 1024 | 5049.9 | 638.2 | 36 | 27.8
llama-2-7b-gptq | INT4 | 32 | 5045.7 | 63.8 | 38.9 | 25.7
codegen25-7b | INT4 | 32 | 5366.8 | 66 | 39.2 | 25.5
decilm-7b-instruct | INT4 | 36 | 4964.3 | 123.3 | 39.9 | 25.1
chatglm3-6b-gptq | INT4 | 32 | 5344 | 70.6 | 40.7 | 24.6
decilm-7b-instruct | INT4 | 1091 | 4965 | 795.7 | 42.1 | 23.8
qwen-7b-chat-gptq | INT4 | 32 | 5958.7 | 68.5 | 42.7 | 23.4
chatglm3-6b-gptq | INT4 | 1024 | 6003.4 | 645.5 | 42.7 | 23.4
phi-3-mini-4k-instruct | INT8 | 32 | 5021.3 | 51.3 | 43 | 23.3
qwen2-7b | INT4 | 32 | 5653.2 | 69.4 | 43.8 | 22.8
llama-2-7b-gptq | INT4 | 1024 | 5767 | 752.6 | 43.9 | 22.8
codegen25-7b | INT4 | 1024 | 5757.8 | 811.5 | 44.2 | 22.6
falcon-7b-instruct | INT4 | 32 | 5062.6 | 65.2 | 44.2 | 22.6
llama-3-8b | INT4 | 33 | 5800.1 | 72.2 | 44.3 | 22.6
llama-3-8b | INT4 | 33 | 5867.8 | 73 | 45 | 22.2
mistral-7b-v0.1 | INT4 | 32 | 5750.6 | 72.9 | 45.4 | 22.0
qwen2-7b | INT4 | 1024 | 5654 | 762.2 | 45.7 | 21.9
llama-3-8b | INT4 | 1025 | 5800.7 | 743.1 | 46.3 | 21.6
phi-3-mini-4k-instruct | INT8 | 1024 | 5021.9 | 441.7 | 46.7 | 21.4
llama-2-7b-chat-hf | INT4 | 32 | 5726.6 | 68.6 | 47 | 21.3
llama-3-8b | INT4 | 1025 | 5868.4 | 761.1 | 47.5 | 21.1
mistral-7b-v0.1 | INT4 | 1024 | 5607.6 | 741.7 | 47.9 | 20.9
falcon-7b-instruct | INT4 | 1024 | 5063.2 | 645.9 | 48.2 | 20.7
qwen-7b-chat-gptq | INT4 | 1024 | 6647.1 | 757.2 | 48.6 | 20.6
zephyr-7b-beta | INT4 | 32 | 6088 | 73.6 | 49.6 | 20.2
baichuan2-7b-chat | INT4 | 32 | 6268.1 | 74.6 | 50.7 | 19.7
glm-4-9b-chat | INT4 | 32 | 6987.4 | 79.1 | 51.5 | 19.4
llama-2-7b-chat-hf | INT4 | 1024 | 6149.7 | 923.4 | 51.7 | 19.3
zephyr-7b-beta | INT4 | 1024 | 6016.3 | 745.6 | 51.9 | 19.3
gemma-7b-it | INT4 | 32 | 6418.3 | 89.4 | 53.1 | 18.8
baichuan2-7b-chat | INT4 | 1024 | 6268.6 | 734.8 | 55.7 | 18.0
llama-3.1-8b | INT4 | 32 | 6640.3 | 87.3 | 56 | 17.9
glm-4-9b-chat | INT4 | 1024 | 6831.9 | 846.7 | 56.4 | 17.7
llama-3.1-8b | INT4 | 1024 | 6904.8 | 933.2 | 58.9 | 17.0
qwen-7b-chat | INT4 | 32 | 7172.3 | 77.7 | 59.9 | 16.7
gemma-7b-it | INT4 | 1024 | 6821 | 749.7 | 60 | 16.7
red-pajama-incite-chat-3b-v1 | FP16 | 32 | 6798.6 | 66.3 | 64.6 | 15.5
dolly-v2-3b | FP16 | 32 | 6737.8 | 68.2 | 65.5 | 15.3
chatglm3-6b | INT8 | 32 | 7478.5 | 77.6 | 66.3 | 15.1
qwen-7b-chat | INT4 | 1024 | 7851.6 | 758.8 | 66.5 | 15.0
red-pajama-incite-chat-3b-v1 | FP16 | 1023 | 7121.4 | 434.8 | 67.7 | 14.8
dolly-v2-3b | FP16 | 1024 | 6738.7 | 402.8 | 68.5 | 14.6
chatglm3-6b | INT8 | 1024 | 7397.6 | 654.2 | 68.8 | 14.5
falcon-7b-instruct | INT8 | 32 | 7912.4 | 84.6 | 73.8 | 13.6
qwen2-7b | INT8 | 32 | 8567.4 | 92.1 | 75 | 13.3
llama-2-7b-chat-hf | INT8 | 32 | 7821.1 | 85.2 | 77.1 | 13.0
codegen25-7b | INT8 | 32 | 8002.4 | 86.9 | 77.7 | 12.9
baichuan2-13b-chat | INT4 | 32 | 10428.4 | 156.1 | 78.6 | 12.7
qwen2-7b | INT8 | 1024 | 8797 | 885.1 | 78.6 | 12.7
phi-3-medium-4k-instruct | INT4 | 38 | 8810.9 | 156.8 | 78.7 | 12.7
decilm-7b-instruct | INT8 | 36 | 8369.7 | 119.9 | 78.8 | 12.7
falcon-7b-instruct | INT8 | 1024 | 7912.4 | 969.3 | 79.7 | 12.5
baichuan2-7b-chat | INT8 | 32 | 8498.5 | 94.6 | 81 | 12.3
zephyr-7b-beta | INT8 | 32 | 8232.9 | 94.6 | 81.7 | 12.2
mistral-7b-v0.1 | INT8 | 32 | 8644 | 95 | 81.8 | 12.2
qwen-7b-chat | INT8 | 32 | 8975.1 | 92.7 | 81.8 | 12.2
decilm-7b-instruct | INT8 | 1091 | 8208.8 | 1025.7 | 82.3 | 12.2
llama-2-7b-chat-hf | INT8 | 1024 | 7821.4 | 759.4 | 82.5 | 12.1
codegen25-7b | INT8 | 1024 | 8003 | 923.1 | 83.5 | 12.0
phi-3-mini-4k-instruct | FP16 | 32 | 8751.9 | 88.8 | 84.9 | 11.8
mistral-7b-v0.1 | INT8 | 1024 | 8488 | 781.6 | 85.1 | 11.8
phi-3-medium-4k-instruct | INT4 | 1061 | 8946.2 | 2039.2 | 85.3 | 11.7
zephyr-7b-beta | INT8 | 1024 | 8487.4 | 826.4 | 85.3 | 11.7
llama-3.1-8b | INT8 | 32 | 9039 | 98.6 | 85.9 | 11.6
llama-3-8b | INT8 | 33 | 9040.6 | 102.9 | 86.1 | 11.6
baichuan2-7b-chat | INT8 | 1024 | 9331.5 | 836.8 | 86.5 | 11.6
qwen-7b-chat | INT8 | 1024 | 9642.8 | 740.6 | 88.7 | 11.3
phi-3-mini-4k-instruct | FP16 | 1024 | 9173.4 | 490.2 | 89 | 11.2
llama-3-8b | INT8 | 1025 | 9041 | 1025.4 | 89.4 | 11.2
llama-3.1-8b | INT8 | 1024 | 9302 | 885.9 | 89.6 | 11.2
starcoder | INT4 | 32 | 9298.4 | 142.3 | 93.6 | 10.7
gemma-7b-it | INT8 | 32 | 9662.7 | 114 | 94 | 10.6
glm-4-9b-chat | INT8 | 32 | 10381.6 | 110.6 | 98 | 10.2
gemma-7b-it | INT8 | 1024 | 10351.1 | 1005.5 | 99.4 | 10.1
glm-4-9b-chat | INT8 | 1024 | 10545.1 | 1116.4 | 101.1 | 9.9
lcm-dreamshaper-v7 | INT8 | 32 | 4719.6 | 117.5 | 107.7 | 9.3
lcm-dreamshaper-v7 | INT8 | 1024 | 5279.1 | 119.4 | 108.1 | 9.3
lcm-dreamshaper-v7 | FP16 | 32 | 4907.3 | 118.5 | 109.7 | 9.1
lcm-dreamshaper-v7 | FP16 | 1024 | 5530.5 | 122.3 | 109.9 | 9.1
lcm-dreamshaper-v7 | INT4 | 1024 | 5443.8 | 121.4 | 110 | 9.1
flan-t5-xxl | INT4 | 33 | 13636.5 | 85.8 | 110.2 | 9.1
lcm-dreamshaper-v7 | INT4 | 32 | 4790.7 | 120.3 | 110.9 | 9.0
flan-t5-xxl | INT8 | 33 | 23408.6 | 471.4 | 128.3 | 7.8
starcoder | INT4 | 1024 | 9716.5 | 1953.1 | 141.7 | 7.1
phi-3-medium-4k-instruct | INT8 | 38 | 14470.9 | 219.9 | 149.9 | 6.7
phi-3-medium-4k-instruct | INT8 | 1061 | 14471.2 | 2300 | 154.3 | 6.5
decilm-7b-instruct | FP16 | 1091 | 14729.5 | 1426.8 | 162.3 | 6.2
llama-3-8b | FP16 | 33 | 15955.7 | 237.9 | 168.8 | 5.9
llama-3-8b | FP16 | 1025 | 16095.6 | 1384 | 172.7 | 5.8
starcoder | INT8 | 32 | 15761.5 | 205.7 | 180.5 | 5.5
stable-diffusion-v2-1 | INT8 | 1024 | 5484.5 | 202.4 | 181.3 | 5.5
stable-diffusion-v2-1 | INT8 | 32 | 4830.1 | 194.9 | 181.9 | 5.5
stable-diffusion-v2-1 | FP16 | 32 | 5580 | 202.3 | 182.7 | 5.5
stable-diffusion-v2-1 | FP16 | 1024 | 6018 | 207.6 | 183.6 | 5.4
stable-diffusion-v1-5 | INT8 | 32 | 4732.6 | 218.9 | 207.8 | 4.8
stable-diffusion-v1-5 | INT8 | 1024 | 5212.1 | 219.8 | 208.5 | 4.8
stable-diffusion-v1-5 | FP16 | 1024 | 5524.4 | 227.1 | 210.8 | 4.7
stable-diffusion-v1-5 | FP16 | 32 | 4856.2 | 220.4 | 211.4 | 4.7
decilm-7b-instruct | FP16 | 36 | 15039.6 | 223.1 | 226.3 | 4.4
starcoder | INT8 | 1024 | 15763.6 | 2118.6 | 229.5 | 4.4
flan-t5-xxl | INT4 | 1139 | 16177.9 | 270.1 | 236.5 | 4.2
flan-t5-xxl | INT8 | 1139 | 26075.7 | 322.8 | 263.1 | 3.8
baichuan2-13b-chat | INT4 | 1024 | 12977.1 | 1796.4 | 279.2 | 3.6
baichuan2-13b-chat | INT8 | 32 | 15417 | 197.2 | 334.2 | 3.0
llama-3.1-8b | FP16 | 32 | 16112.3 | 853.2 | 410.5 | 2.4
llama-3.1-8b | FP16 | 1024 | 17452.2 | 1166.4 | 418.4 | 2.4
baichuan2-13b-chat | INT8 | 1024 | 15611.5 | 1891.2 | 446.3 | 2.2
phi-3-medium-4k-instruct | FP16 | 38 | 27161.6 | 2440.2 | 2280.7 | 0.4
phi-3-medium-4k-instruct | FP16 | 1061 | 27505.4 | 3536.3 | 2285.2 | 0.4

Topology | Precision | Input Size | max rss memory (MB) | 1st latency (ms) | 2nd latency (ms) | 2nd tok/sec
--- | --- | --- | --- | --- | --- | ---
bloomz-560m | INT4 | 32 | 2123 | 36.1 | 12.5 | 80.0
bloomz-560m | INT4 | 1024 | 2123.6 | 195 | 13.7 | 73.0
tiny-llama-1.1b-chat | INT4 | 32 | 2249.2 | 36.8 | 13.9 | 71.9
tiny-llama-1.1b-chat | INT4 | 1024 | 2249.9 | 427.8 | 15 | 66.7
qwen2-0.5b | INT4 | 32 | 1800.7 | 44.7 | 15.4 | 64.9
bloomz-560m | INT8 | 32 | 2273.5 | 39.5 | 15.4 | 64.9
qwen2-0.5b | INT4 | 1024 | 1801.1 | 185.9 | 15.5 | 64.5
bloomz-560m | INT8 | 1024 | 2471.6 | 213.3 | 15.8 | 63.3
qwen2-0.5b | INT8 | 32 | 2000.1 | 37.9 | 18.2 | 54.9
qwen2-0.5b | INT8 | 1024 | 2135.9 | 218 | 18.7 | 53.5
bloomz-560m | FP16 | 32 | 3069.2 | 39.1 | 19.7 | 50.8
qwen2-1.5b | INT4 | 32 | 2750.3 | 47.6 | 20 | 50.0
tiny-llama-1.1b-chat | INT8 | 32 | 2441.6 | 49.4 | 20.5 | 48.8
qwen2-1.5b | INT4 | 1024 | 2575.9 | 531.2 | 20.9 | 47.8
bloomz-560m | FP16 | 1024 | 3057.5 | 232.7 | 21 | 47.6
tiny-llama-1.1b-chat | INT8 | 1024 | 2431.7 | 523.6 | 21.5 | 46.5
dolly-v2-3b | INT4 | 32 | 3178.8 | 75.4 | 27.1 | 36.9
minicpm-1b-sft | INT4 | 31 | 3131.5 | 74 | 27.6 | 36.2
red-pajama-incite-chat-3b-v1 | INT4 | 32 | 3057.5 | 67.1 | 27.6 | 36.2
gemma-2b-it | INT4 | 32 | 3460.7 | 97.9 | 28.5 | 35.1
minicpm-1b-sft | INT4 | 1014 | 3132 | 732.4 | 29 | 34.5
qwen2-1.5b | INT8 | 32 | 3126.4 | 77.4 | 29.3 | 34.1
gemma-2b-it | INT4 | 1024 | 3461.4 | 796.3 | 29.4 | 34.0
qwen2-1.5b | INT8 | 1024 | 3126.8 | 660.3 | 30.1 | 33.2
dolly-v2-3b | INT4 | 1024 | 3179 | 1171.9 | 31.8 | 31.4
minicpm-1b-sft | INT8 | 31 | 3496 | 77.9 | 31.9 | 31.3
red-pajama-incite-chat-3b-v1 | INT4 | 1023 | 3057.7 | 1211 | 32.8 | 30.5
minicpm-1b-sft | INT8 | 1014 | 3433.2 | 783.7 | 33.6 | 29.8
phi-3-mini-4k-instruct | INT4 | 32 | 3534.8 | 96.6 | 36.6 | 27.3
red-pajama-incite-chat-3b-v1 | INT8 | 32 | 4099.8 | 107.3 | 42.3 | 23.6
gemma-2b-it | INT8 | 32 | 4478.7 | 103.1 | 42.4 | 23.6
minicpm-1b-sft | FP16 | 31 | 4157.5 | 75.7 | 42.7 | 23.4
phi-3-mini-4k-instruct | INT4 | 1024 | 3535.3 | 1521.7 | 42.8 | 23.4
dolly-v2-3b | INT8 | 32 | 4143.7 | 102 | 43.1 | 23.2
gemma-2b-it | INT8 | 1024 | 4478.9 | 936.2 | 43.3 | 23.1
minicpm-1b-sft | FP16 | 1014 | 4329.7 | 876.6 | 44.8 | 22.3
red-pajama-incite-chat-3b-v1 | INT8 | 1023 | 4412.8 | 1815.9 | 44.9 | 22.3
dolly-v2-3b | INT8 | 1024 | 4143.8 | 1276.4 | 45.6 | 21.9
chatglm3-6b | INT4 | 32 | 4746.8 | 149.6 | 50.6 | 19.8
chatglm3-6b | INT4 | 1024 | 4747 | 2279.1 | 52.6 | 19.0
flan-t5-xxl | INT4 | 33 | 13681.2 | 91.7 | 53.6 | 18.7
phi-3-mini-4k-instruct | INT8 | 32 | 5041.3 | 110.9 | 56.9 | 17.6
llama-2-7b-gptq | INT4 | 32 | 5115.9 | 168.1 | 57.8 | 17.3
chatglm3-6b-gptq | INT4 | 32 | 5371.4 | 159.5 | 57.8 | 17.3
decilm-7b-instruct | INT4 | 36 | 5415.9 | 230.5 | 58 | 17.2
codegen25-7b | INT4 | 32 | 5110.5 | 161 | 59.1 | 16.9
flan-t5-xxl | INT4 | 1139 | 16627.6 | 455.8 | 59.3 | 16.9
qwen2-7b | INT4 | 32 | 5802.2 | 173.2 | 60.1 | 16.6
phi-3-mini-4k-instruct | INT8 | 1024 | 5041.7 | 1812.4 | 60.2 | 16.6
chatglm3-6b-gptq | INT4 | 1024 | 5748.7 | 2236 | 60.2 | 16.6
falcon-7b-instruct | INT4 | 32 | 5495.1 | 181.3 | 60.3 | 16.6
decilm-7b-instruct | INT4 | 1091 | 5237.4 | 2995.4 | 60.9 | 16.4
qwen2-7b | INT4 | 1024 | 5758.2 | 2445.4 | 61.9 | 16.2
falcon-7b-instruct | INT4 | 1024 | 5682.7 | 2718.5 | 62.6 | 16.0
codegen25-7b | INT4 | 1024 | 5513.9 | 2500.7 | 63.2 | 15.8
mistral-7b-v0.1 | INT4 | 32 | 5475.8 | 178.5 | 64.7 | 15.5
qwen-7b-chat-gptq | INT4 | 32 | 6115.4 | 174.2 | 64.8 | 15.4
llama-3-8b | INT4 | 33 | 5964.2 | 238.4 | 65.2 | 15.3
llama-3-8b | INT4 | 33 | 5870.5 | 239.8 | 65.3 | 15.3
llama-2-7b-chat-hf | INT4 | 32 | 5493.5 | 157.4 | 65.4 | 15.3
llama-2-7b-gptq | INT4 | 1024 | 5802.7 | 2547.3 | 65.4 | 15.3
mistral-7b-v0.1 | INT4 | 1024 | 5476 | 2684.8 | 67.2 | 14.9
llama-3-8b | INT4 | 1025 | 6163.2 | 2842.9 | 67.6 | 14.8
zephyr-7b-beta | INT4 | 32 | 5739.1 | 177.4 | 67.7 | 14.8
llama-3-8b | INT4 | 1025 | 6069.4 | 2741.8 | 67.8 | 14.7
llama-2-7b-chat-hf | INT4 | 1024 | 5494 | 2500.3 | 69.5 | 14.4
zephyr-7b-beta | INT4 | 1024 | 5739.7 | 2671.4 | 71 | 14.1
qwen-7b-chat-gptq | INT4 | 1024 | 6646.3 | 2596.9 | 73 | 13.7
baichuan2-7b-chat | INT4 | 32 | 6385.1 | 159.5 | 73.1 | 13.7
gemma-7b-it | INT4 | 32 | 7297.7 | 221.9 | 73.7 | 13.6
dolly-v2-3b | FP16 | 32 | 6652.1 | 107.1 | 74.2 | 13.5
red-pajama-incite-chat-3b-v1 | FP16 | 32 | 6640.8 | 103.1 | 74.7 | 13.4
llama-3.1-8b | INT4 | 32 | 6797.5 | 182.7 | 76.3 | 13.1
glm-4-9b-chat | INT4 | 32 | 6805.1 | 215.5 | 76.4 | 13.1
baichuan2-7b-chat | INT4 | 1024 | 6385.5 | 2597 | 77.3 | 12.9
gemma-7b-it | INT4 | 1024 | 6974.7 | 3126 | 77.5 | 12.9
dolly-v2-3b | FP16 | 1024 | 6652.2 | 1542.4 | 78.7 | 12.7
red-pajama-incite-chat-3b-v1 | FP16 | 1023 | 7120.4 | 2490.4 | 79.3 | 12.6
llama-3.1-8b | INT4 | 1024 | 7114 | 2807.6 | 79.7 | 12.5
glm-4-9b-chat | INT4 | 1024 | 6805.2 | 3197 | 79.7 | 12.5
qwen-7b-chat | INT4 | 32 | 7255.7 | 156.2 | 81.2 | 12.3
chatglm3-6b | INT8 | 32 | 7308.6 | 154.4 | 85.1 | 11.8
qwen-7b-chat | INT4 | 1024 | 7827.7 | 2693.7 | 86.6 | 11.5
chatglm3-6b | INT8 | 1024 | 7308.9 | 2486 | 87.4 | 11.4
flan-t5-xxl | INT8 | 33 | 20923.9 | 170.5 | 91.7 | 10.9
llama-2-7b-chat-hf | INT8 | 32 | 7838.4 | 157.9 | 94.8 | 10.5
falcon-7b-instruct | INT8 | 32 | 8250 | 175.3 | 95.1 | 10.5
codegen25-7b | INT8 | 32 | 7996.9 | 162.7 | 95.7 | 10.4
falcon-7b-instruct | INT8 | 1024 | 8445.4 | 3055.4 | 97.5 | 10.3
flan-t5-xxl | INT8 | 1139 | 24095.3 | 571.2 | 97.6 | 10.2
qwen2-7b | INT8 | 32 | 8542.4 | 185.5 | 98.2 | 10.2
llama-2-7b-chat-hf | INT8 | 1024 | 7838.6 | 3132.1 | 98.8 | 10.1
qwen2-7b | INT8 | 1024 | 8543.5 | 3124.5 | 99.8 | 10.0
codegen25-7b | INT8 | 1024 | 8453.5 | 3136 | 99.9 | 10.0
decilm-7b-instruct | INT8 | 36 | 8088.5 | 244.9 | 100.7 | 9.9
phi-3-mini-4k-instruct | FP16 | 32 | 8592.5 | 124.5 | 102.9 | 9.7
decilm-7b-instruct | INT8 | 1091 | 8292.4 | 9951.9 | 103.5 | 9.7
qwen-7b-chat | INT8 | 32 | 8991.1 | 169.7 | 103.7 | 9.6
zephyr-7b-beta | INT8 | 32 | 8267.2 | 183.1 | 104.5 | 9.6
mistral-7b-v0.1 | INT8 | 32 | 8269.6 | 184.1 | 104.9 | 9.5
zephyr-7b-beta | INT8 | 1024 | 8268.1 | 3379.7 | 107 | 9.3
mistral-7b-v0.1 | INT8 | 1024 | 8513.8 | 3394.1 | 107.4 | 9.3
phi-3-mini-4k-instruct | FP16 | 1024 | 9157.2 | 2080.8 | 108.4 | 9.2
qwen-7b-chat | INT8 | 1024 | 8991.4 | 3137.5 | 109 | 9.2
llama-3-8b | INT8 | 33 | 9085.1 | 264.9 | 109.4 | 9.1
llama-3.1-8b | INT8 | 32 | 9070.9 | 189.1 | 110.7 | 9.0
baichuan2-13b-chat | INT4 | 32 | 10592.1 | 330.4 | 111.4 | 9.0
llama-3-8b | INT8 | 1025 | 9085.2 | 9900.1 | 111.9 | 8.9
llama-3.1-8b | INT8 | 1024 | 9071 | 3408.2 | 113.2 | 8.8
phi-3-medium-4k-instruct | INT4 | 38 | 9009.6 | 443.3 | 116 | 8.6
phi-3-medium-4k-instruct | INT4 | 1061 | 8935.4 | 5655.5 | 119.9 | 8.3
baichuan2-7b-chat | INT8 | 32 | 8633.7 | 172.7 | 120.5 | 8.3
baichuan2-7b-chat | INT8 | 1024 | 9135.7 | 3192.6 | 124.7 | 8.0
gemma-7b-it | INT8 | 32 | 10087.5 | 223.2 | 125.2 | 8.0
glm-4-9b-chat | INT8 | 32 | 10440 | 224.2 | 125.7 | 8.0
gemma-7b-it | INT8 | 1024 | 9965.1 | 3723.4 | 129.1 | 7.7
glm-4-9b-chat | INT8 | 1024 | 10440.1 | 4054.2 | 129.2 | 7.7
starcoder | INT4 | 32 | 9738.6 | 599.6 | 177.5 | 5.6
flan-t5-xxl | FP16 | 33 | 19273 | 553.7 | 188.1 | 5.3
flan-t5-xxl | FP16 | 1139 | 24887.6 | 999 | 193.1 | 5.2
phi-3-medium-4k-instruct | INT8 | 38 | 14453.1 | 1342.7 | 205.9 | 4.9
phi-3-medium-4k-instruct | INT8 | 1061 | 14287.2 | 19763.6 | 210.9 | 4.7
decilm-7b-instruct | FP16 | 36 | 14215.6 | 465.7 | 222 | 4.5
decilm-7b-instruct | FP16 | 1091 | 14332.5 | 12122.8 | 225.6 | 4.4
starcoder | INT8 | 32 | 8567.4 | 379.1 | 235.4 | 4.2
llama-3.1-8b | FP16 | 32 | 15653.3 | 319.9 | 240.7 | 4.2
starcoder | INT4 | 1024 | 9738.7 | 6736.5 | 241.1 | 4.1
llama-3.1-8b | FP16 | 1024 | 17004.9 | 4679.8 | 245.7 | 4.1
starcoder | INT8 | 1024 | 9829.9 | 8819.9 | 269.2 | 3.7
lcm-dreamshaper-v7 | INT4 | 32 | 5391.5 | 296.1 | 284.2 | 3.5
lcm-dreamshaper-v7 | INT4 | 1024 | 5779.1 | 305.6 | 284.3 | 3.5
lcm-dreamshaper-v7 | FP16 | 1024 | 5967.9 | 304.5 | 284.5 | 3.5
lcm-dreamshaper-v7 | FP16 | 32 | 5238.8 | 295.8 | 284.5 | 3.5
lcm-dreamshaper-v7 | INT8 | 32 | 4974.1 | 314.4 | 301.4 | 3.3
lcm-dreamshaper-v7 | INT8 | 1024 | 5622.3 | 323.9 | 301.7 | 3.3
stable-diffusion-v2-1 | FP16 | 1024 | 5942.7 | 475.7 | 444.7 | 2.2
stable-diffusion-v2-1 | FP16 | 32 | 5197.9 | 466.9 | 445.4 | 2.2
baichuan2-13b-chat | INT4 | 1024 | 12879 | 5213.1 | 448.6 | 2.2
stable-diffusion-v2-1 | INT8 | 32 | 4723.6 | 484 | 455.9 | 2.2
stable-diffusion-v2-1 | INT8 | 1024 | 5458.1 | 489.4 | 456.2 | 2.2
stable-diffusion-v1-5 | FP16 | 1024 | 6573.2 | 576.6 | 550.6 | 1.8
stable-diffusion-v1-5 | FP16 | 32 | 5848.9 | 570.5 | 551.4 | 1.8
stable-diffusion-v1-5 | INT8 | 32 | 5581 | 603.9 | 587.7 | 1.7
stable-diffusion-v1-5 | INT8 | 1024 | 6258.2 | 612.9 | 589.4 | 1.7
phi-3-medium-4k-instruct | FP16 | 38 | 27222.7 | 3293.8 | 1198.9 | 0.8
phi-3-medium-4k-instruct | FP16 | 1061 | 28813.8 | 32882.8 | 1199.7 | 0.8

All models listed here were tested with the following parameters (a minimal measurement sketch follows the list):

  • Framework: PyTorch
  • Beam: 1
  • Batch size: 1
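The published figures come from Intel's own benchmarking setup and are not reproduced by the sketch below. Still, for a rough local sanity check of the latency metrics, something like the following could be used, assuming OpenVINO GenAI (the `openvino_genai` package) is installed and the model has already been converted to an OpenVINO IR folder. The model directory, prompt, and token count are illustrative placeholders; timing tokens via the streamer callback only approximates the official measurement, and peak memory (max rss) is not captured here at all.

```python
# A minimal, illustrative sketch (not the official benchmarking harness) of how
# first-token and per-token latency could be measured on the built-in GPU with
# OpenVINO GenAI. A single prompt with greedy decoding roughly mirrors the
# "Beam: 1, Batch size: 1" setup listed above.
import time
import openvino_genai as ov_genai

model_dir = "./tiny-llama-1.1b-chat-int4-ov"   # hypothetical pre-converted model folder
pipe = ov_genai.LLMPipeline(model_dir, "GPU")  # run on the integrated GPU

timestamps = []

def streamer(subword: str) -> bool:
    # Called for each generated token; record the arrival time.
    timestamps.append(time.perf_counter())
    return False  # False means "keep generating"

start = time.perf_counter()
pipe.generate("What is OpenVINO?", max_new_tokens=128, streamer=streamer)

first_token_ms = (timestamps[0] - start) * 1000.0                    # ~ "1st latency (ms)"
gaps_ms = [(b - a) * 1000.0 for a, b in zip(timestamps, timestamps[1:])]
second_token_ms = sum(gaps_ms) / len(gaps_ms)                        # ~ "2nd latency (ms)"

print(f"1st latency: {first_token_ms:.1f} ms")
print(f"2nd latency: {second_token_ms:.1f} ms")
print(f"2nd tok/sec: {1000.0 / second_token_ms:.1f}")
```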