Most Efficient Large Language Models for AI PC

This page is regularly updated to help you identify the best-performing LLMs on the Intel® Core™ Ultra processor family and AI PCs. The data below is current as of OpenVINO 2026.1 (7 April 2026).

The tables below list the key performance indicators for inference on built-in GPUs.
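The throughput column ("2nd tokens/sec") is simply the reciprocal of the second-token latency. As a quick sanity check of how the two columns relate, a minimal sketch (not part of the benchmark suite; the function name is illustrative):

```python
def tokens_per_sec(second_token_latency_ms: float) -> float:
    """Convert per-token (2nd token) latency in milliseconds to throughput in tokens/sec."""
    return 1000.0 / second_token_latency_ms

# e.g. t5-small INT4-MIXED at 5.3 ms/token in the first table:
print(round(tokens_per_sec(5.3), 2))  # 188.68
```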

| Topology | Precision | Input Size | 1st latency (ms) | 2nd latency (ms) | Max RSS memory (MB) | 2nd tokens/sec |
|---|---|---|---|---|---|---|
| t5-small | INT4-MIXED | 32 | 11 | 5.3 | 1161.7 | 188.68 |
| t5-small | INT4-MIXED | 1024 | 14.1 | 5.4 | 1032.5 | 185.19 |
| t5-small | INT8-CW | 32 | 11.3 | 5.5 | 1219.6 | 181.82 |
| t5-small | INT4-MIXED | 32 | 11.5 | 5.6 | 1131 | 178.57 |
| t5-small | INT8-CW | 1024 | 13.5 | 5.7 | 1097.2 | 175.44 |
| t5-small | INT4-MIXED | 32 | 11.2 | 5.8 | 1149 | 172.41 |
| t5-small | INT4-MIXED | 1024 | 18.2 | 6.1 | 1039.8 | 163.93 |
| t5-small | FP16 | 32 | 10.5 | 6.2 | 1138.4 | 161.29 |
| t5-small | INT4-MIXED | 1024 | 11.5 | 6.2 | 1020.7 | 161.29 |
| t5-small | FP16 | 1024 | 14.5 | 6.8 | 1233.5 | 147.06 |
| distil-large-v2 | INT4-MIXED | 32 | 239.7 | 8.9 | 2170.7 | 112.36 |
| distil-large-v2 | INT4-MIXED | 1024 | 305.4 | 8.9 | 1662.6 | 112.36 |
| distil-large-v2 | INT8-CW | 1024 | 281.3 | 9.2 | 2160.5 | 108.70 |
| distil-large-v2 | INT8-CW | 32 | 237.5 | 9.3 | 2451.7 | 107.53 |
| whisper-large-v3-turbo | INT4-MIXED | 1024 | 312.5 | 9.6 | 1980.9 | 104.17 |
| whisper-large-v3-turbo | INT8-CW | 1024 | 309.9 | 9.6 | 2167.9 | 104.17 |
| whisper-large-v3-turbo | INT4-MIXED | 32 | 263.5 | 9.7 | 2240.8 | 103.09 |
| whisper-large-v3-turbo | INT8-CW | 32 | 247.2 | 9.7 | 2482.7 | 103.09 |
| codet5-base-sum | INT8-CW | 291 | 22.2 | 9.8 | 1505.7 | 102.04 |
| codet5-base-sum | INT4-MIXED | 26 | 22.9 | 9.9 | 1453.6 | 101.01 |
| codet5-base-sum | INT4-MIXED | 291 | 21.3 | 9.9 | 1276 | 101.01 |
| codet5-base-sum | INT4-MIXED | 291 | 21.2 | 9.9 | 1232 | 101.01 |
| codet5-base-sum | INT8-CW | 26 | 20 | 10 | 1666.4 | 100.00 |
| codet5-base-sum | FP16 | 26 | 17.7 | 10.3 | 1975.1 | 97.09 |
| codet5-base-sum | INT4-MIXED | 26 | 27.1 | 10.3 | 1398.8 | 97.09 |
| minicpm4-0.5b | INT4-MIXED | 32 | 37.2 | 10.3 | 1588.9 | 97.09 |
| codet5-base-sum | FP16 | 291 | 21.1 | 10.4 | 2012.4 | 96.15 |
| minicpm4-0.5b | INT4-MIXED | 32 | 41.7 | 11 | 1633.8 | 90.91 |
| minicpm4-0.5b | INT4-MIXED | 1024 | 62.1 | 11 | 1253.4 | 90.91 |
| minicpm4-0.5b | INT4-MIXED | 32 | 38.9 | 11 | 1492.6 | 90.91 |
| minicpm4-0.5b | INT4-MIXED | 1024 | 67.3 | 11.1 | 1283.8 | 90.09 |
| minicpm4-0.5b | INT4-MIXED | 1024 | 75.9 | 11.3 | 1444.5 | 88.50 |
| minicpm4-0.5b | INT8-CW | 32 | 38.9 | 11.6 | 1816.9 | 86.21 |
| gemma-3-270m | INT4-MIXED | 32 | 34.9 | 12.5 | 1624.4 | 80.00 |
| minicpm4-0.5b | INT8-CW | 1024 | 73.8 | 12.6 | 1408.4 | 79.37 |
| gemma-3-270m | INT4-MIXED | 1024 | 65.7 | 12.8 | 1224.2 | 78.13 |
| whisper-small | INT4-MIXED | 32 | 116.7 | 12.8 | 1776.5 | 78.13 |
| whisper-small | INT4-MIXED | 32 | 124.9 | 13 | 1667.8 | 76.92 |
| whisper-small | INT4-MIXED | 1024 | 185.8 | 13.2 | 1582.8 | 75.76 |
| whisper-small | INT4-MIXED | 1024 | 172.7 | 13.2 | 1454.7 | 75.76 |
| distil-large-v2 | FP16 | 1024 | 321.4 | 13.4 | 2642.5 | 74.63 |
| distil-large-v2 | FP16 | 32 | 278.9 | 13.5 | 2610.8 | 74.07 |
| whisper-large-v3-turbo | FP16 | 32 | 287.6 | 13.6 | 3008.2 | 73.53 |
| whisper-large-v3-turbo | FP16 | 1024 | 348.9 | 13.6 | 2937.9 | 73.53 |
| whisper-small | INT4-MIXED | 32 | 125.1 | 13.9 | 1748 | 71.94 |
| whisper-small | INT4-MIXED | 1024 | 189.3 | 14 | 1552 | 71.43 |
| whisper-small | INT8-CW | 32 | 124.7 | 14.5 | 1916.3 | 68.97 |
| whisper-small | INT8-CW | 1024 | 184.6 | 14.5 | 1717.8 | 68.97 |
| gemma-3-270m | INT8-CW | 32 | 45.7 | 14.6 | 1804 | 68.49 |
| gemma-3-270m | INT8-CW | 1024 | 60.1 | 15.4 | 1458.1 | 64.94 |
| gemma-3-270m | FP16 | 1024 | 54.2 | 16.4 | 1452 | 60.98 |
| tiny-llama-1.1b-chat | INT4-MIXED | 32 | 64.5 | 16.6 | 2258.2 | 60.24 |
| whisper-small | FP16 | 32 | 126.7 | 16.6 | 1929.7 | 60.24 |
| tiny-llama-1.1b-chat | INT4-MIXED | 1024 | 152.9 | 16.8 | 1566.4 | 59.52 |
| gemma-3-270m | FP16 | 32 | 44.2 | 17.1 | 1522.4 | 58.48 |
| whisper-small | FP16 | 1024 | 182.5 | 17.1 | 1956.8 | 58.48 |
| llama-3.2-1b-instruct | INT4-MIXED | 32 | 40.5 | 17.2 | 2166.2 | 58.14 |
| tiny-llama-1.1b-chat | INT4-MIXED | 32 | 43.1 | 17.5 | 2039.9 | 57.14 |
| llama-3.2-1b-instruct | INT4-MIXED | 1024 | 140 | 18.1 | 1627.3 | 55.25 |
| flan-t5-large-grammar-synthesis | INT4-MIXED | 32 | 43.8 | 18.3 | 2426.6 | 54.64 |
| flan-t5-large-grammar-synthesis | INT4-MIXED | 1024 | 63.6 | 18.8 | 2701.5 | 53.19 |
| flan-t5-large-grammar-synthesis | INT8-CW | 32 | 48.7 | 18.8 | 3181.1 | 53.19 |
| tiny-llama-1.1b-chat | INT4-MIXED | 1024 | 123.3 | 18.9 | 1430.6 | 52.91 |
| tiny-llama-1.1b-chat | INT8-CW | 32 | 47.8 | 19.3 | 2541.3 | 51.81 |
| flan-t5-large-grammar-synthesis | INT8-CW | 1024 | 73.4 | 19.8 | 3283.8 | 50.51 |
| tiny-llama-1.1b-chat | INT8-CW | 1024 | 131.6 | 19.9 | 1866.9 | 50.25 |
| nanollava | INT8-CW | 760 | 243.4 | 20.4 | 3539.5 | 49.02 |
| llama-3.2-1b-instruct | INT4-MIXED | 32 | 31 | 20.5 | 2182.6 | 48.78 |
| llama-3.2-1b-instruct | INT4-MIXED | 1024 | 124.9 | 21.5 | 1592 | 46.51 |
| nanollava | INT4-MIXED | 760 | 260.6 | 22 | 3484.7 | 45.45 |
| gemma-3-1b-it | INT4-MIXED | 32 | 75 | 22.3 | 2216.5 | 44.84 |
| llama-3.2-1b-instruct | INT8-CW | 32 | 43.1 | 22.3 | 2633.5 | 44.84 |
| deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 78.2 | 22.4 | 2451.9 | 44.64 |
| gemma-3-1b-it | INT4-MIXED | 1024 | 121.8 | 22.7 | 1773.4 | 44.05 |
| llama-3.2-1b-instruct | INT8-CW | 1024 | 132.6 | 22.7 | 2185 | 44.05 |
| flan-t5-large-grammar-synthesis | FP16 | 32 | 41.4 | 23 | 4519.4 | 43.48 |
| deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 163.9 | 23.1 | 1971.3 | 43.29 |
| minicpm4-0.5b | FP16 | 32 | 71.5 | 23.1 | 2318.3 | 43.29 |
| gemma-3-1b-it | INT4-MIXED | 32 | 66.5 | 23.6 | 2243.9 | 42.37 |
| nanollava | INT8-CW | 1752 | 362.3 | 23.6 | 4985.1 | 42.37 |
| minicpm4-0.5b | FP16 | 1024 | 94.7 | 23.7 | 1728.9 | 42.19 |
| gemma-3-1b-it | INT4-MIXED | 32 | 81.6 | 23.9 | 2374.2 | 41.84 |
| smolvlm2-256m-video-instruct | FP16 | 1141 | 682.9 | 23.9 | 3507.4 | 41.84 |
| nanollava | INT4-MIXED | 1752 | 364.1 | 24 | 4754.7 | 41.67 |
| gemma-3-1b-it | INT4-MIXED | 1024 | 147.8 | 24.5 | 1854.2 | 40.82 |
| gemma-3-1b-it | INT4-MIXED | 1024 | 129.4 | 24.6 | 1713.5 | 40.65 |
| qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 89.4 | 25.2 | 2440.5 | 39.68 |
| qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 199.2 | 25.5 | 1931.8 | 39.22 |
| qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 63 | 26 | 2312.8 | 38.46 |
| glm-edge-1.5b-chat | INT4-MIXED | 32 | 107.8 | 26.3 | 2460.5 | 38.02 |
| deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 77.1 | 26.4 | 2569 | 37.88 |
| qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 171.7 | 26.4 | 1781.7 | 37.88 |
| minicpm-1b-sft | INT4-MIXED | 31 | 102.4 | 26.5 | 2317 | 37.74 |
| deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 88.7 | 26.6 | 2875.8 | 37.59 |
| deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 172.6 | 26.7 | 2003 | 37.45 |
| deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 213 | 26.8 | 2414 | 37.31 |
| smolvlm2-256m-video-instruct | INT8-CW | 1141 | 626.9 | 26.9 | 3014.8 | 37.17 |
| glm-edge-1.5b-chat | INT4-MIXED | 1024 | 263.2 | 27.6 | 1879.7 | 36.23 |
| gemma-2b-it | INT4-MIXED | 32 | 66.3 | 27.8 | 2841.5 | 35.97 |
| minicpm-1b-sft | INT4-MIXED | 1014 | 188.2 | 27.9 | 1773.4 | 35.84 |
| qwen2.5-1.5b-instruct | INT8-CW | 32 | 74.7 | 28 | 2863.4 | 35.71 |
| minicpm-1b-sft | INT4-MIXED | 31 | 111.1 | 28.3 | 2359.9 | 35.34 |
| deepseek-r1-distill-qwen-1.5b | INT8-CW | 32 | 74 | 28.5 | 3355.1 | 35.09 |
| deepseek-r1-distill-qwen-1.5b | INT8-CW | 1024 | 175.5 | 28.6 | 2723.3 | 34.97 |
| gemma-2b-it | INT4-MIXED | 1024 | 228.3 | 28.8 | 2393.4 | 34.72 |
| gemma-3-1b-it | INT8-CW | 32 | 97.5 | 28.8 | 2555.6 | 34.72 |
| minicpm-1b-sft | INT4-MIXED | 31 | 114.6 | 28.9 | 2509.7 | 34.60 |
| minicpm-1b-sft | INT8-CW | 31 | 94.2 | 28.9 | 2936.2 | 34.60 |
| qwen2.5-1.5b-instruct | INT8-CW | 1024 | 175.8 | 28.9 | 2404.6 | 34.60 |
| gemma-3-1b-it | INT8-CW | 1024 | 147.6 | 29.4 | 2049.8 | 34.01 |
| glm-edge-1.5b-chat | INT8-CW | 32 | 95.9 | 29.4 | 3031.8 | 34.01 |
| smolvlm2-256m-video-instruct | INT4-MIXED | 1141 | 630.4 | 29.5 | 2988.6 | 33.90 |
| gemma-2b-it | INT4-MIXED | 32 | 55.9 | 29.6 | 3051.8 | 33.78 |
| flan-t5-large-grammar-synthesis | FP16 | 1024 | 67.9 | 29.7 | 4481.7 | 33.67 |
| gemma-2b-it | INT4-MIXED | 1024 | 293.7 | 30.1 | 2546.2 | 33.22 |
| minicpm-1b-sft | INT4-MIXED | 1014 | 230.9 | 30.5 | 1922.6 | 32.79 |
| glm-edge-1.5b-chat | INT8-CW | 1024 | 241.4 | 30.8 | 2409.1 | 32.47 |
| minicpm-1b-sft | INT4-MIXED | 1014 | 199.6 | 31.6 | 1801.1 | 31.65 |
| minicpm-1b-sft | INT8-CW | 1014 | 202.9 | 31.7 | 2356.2 | 31.55 |
| nanollava | FP16 | 760 | 301.8 | 32.4 | 4296.9 | 30.86 |
| nanollava | FP16 | 1752 | 422.3 | 34 | 5973.7 | 29.41 |
| phi-2 | INT4-MIXED | 32 | 141.6 | 34.1 | 2842.2 | 29.33 |
| gemma-2-2b | INT4-MIXED | 33 | 98.1 | 34.7 | 2957.5 | 28.82 |
| gemma-2-2b | INT4-MIXED | 1025 | 291.6 | 35.9 | 2661.5 | 27.86 |
| llama-3.2-3b-instruct | INT4-MIXED | 32 | 93.7 | 35.9 | 3091.4 | 27.86 |
| qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 106.3 | 36.3 | 3112.6 | 27.55 |
| phi-2 | INT4-MIXED | 1024 | 391.1 | 36.7 | 2865.2 | 27.25 |
| gemma-2-2b | INT4-MIXED | 33 | 100.8 | 36.8 | 3181 | 27.17 |
| stablelm-3b-4e1t | INT4-MIXED | 32 | 143.2 | 36.8 | 2848.2 | 27.17 |
| llama-3.2-3b-instruct | INT4-MIXED | 1024 | 324.7 | 36.9 | 2797.1 | 27.10 |
| stable-zephyr-3b-dpo | INT4-MIXED | 32 | 149.4 | 36.9 | 2834.2 | 27.10 |
| qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 303.1 | 37.2 | 2657.7 | 26.88 |
| gemma-2-2b | INT4-MIXED | 1025 | 332.8 | 37.6 | 2846.1 | 26.60 |
| phi-2 | INT4-MIXED | 32 | 146.6 | 37.9 | 3067.2 | 26.39 |
| llama-3.2-3b-instruct | INT4-MIXED | 32 | 98.8 | 38.3 | 3178.1 | 26.11 |
| qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 122 | 38.5 | 3303.3 | 25.97 |
| stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 394.5 | 38.6 | 2858.5 | 25.91 |
| stablelm-3b-4e1t | INT4-MIXED | 1024 | 389.8 | 38.6 | 2849.7 | 25.91 |
| tiny-llama-1.1b-chat | FP16 | 32 | 60.3 | 39 | 3427.6 | 25.64 |
| llama-3.2-3b-instruct | INT4-MIXED | 1024 | 421.4 | 39.1 | 2873.9 | 25.58 |
| phi-3-mini-4k-instruct | INT4-MIXED | 32 | 122.8 | 39.3 | 3421.7 | 25.45 |
| phi-3.5-mini-instruct | INT4-MIXED | 32 | 110.1 | 39.4 | 3426.2 | 25.38 |
| qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 486.2 | 39.4 | 2807.8 | 25.38 |
| tiny-llama-1.1b-chat | FP16 | 1024 | 184.4 | 39.4 | 3013.6 | 25.38 |
| stable-zephyr-3b-dpo | INT4-MIXED | 32 | 156.8 | 39.7 | 3157 | 25.19 |
| phi-3-mini-128k-instruct | INT4-MIXED | 32 | 123.6 | 39.8 | 3501 | 25.13 |
| phi-2 | INT4-MIXED | 1024 | 487 | 40.1 | 3098.8 | 24.94 |
| gemma-2b-it | INT8-CW | 32 | 67.4 | 40.3 | 3814 | 24.81 |
| gemma-2b-it | INT8-CW | 1024 | 246.4 | 40.9 | 3367.4 | 24.45 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 504.6 | 41.4 | 3370 | 24.15 |
| phi-3.5-mini-instruct | INT4-MIXED | 1024 | 510 | 41.8 | 3370 | 23.92 |
| stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 474.8 | 41.8 | 3190.4 | 23.92 |
| phi-3-mini-128k-instruct | INT4-MIXED | 1024 | 510.3 | 41.9 | 3466.9 | 23.87 |
| phi-3-mini-128k-instruct | INT4-MIXED | 32 | 113.7 | 42 | 3457 | 23.81 |
| gemma-3-1b-it | FP16 | 32 | 92.3 | 42.1 | 3227.1 | 23.75 |
| gemma-3-1b-it | FP16 | 1024 | 178.4 | 42.4 | 2997.7 | 23.58 |
| phi-3-mini-128k-instruct | INT4-MIXED | 1024 | 653.4 | 44.2 | 3552.8 | 22.62 |
| gemma-2-2b | INT8-CW | 33 | 92 | 44.3 | 3957 | 22.57 |
| llama-3.2-3b-instruct | INT4-MIXED | 32 | 89.2 | 45.1 | 3154.7 | 22.17 |
| gemma-2-2b | INT8-CW | 1025 | 297.7 | 45.5 | 3657.9 | 21.98 |
| phi-3-mini-4k-instruct | INT4-MIXED | 32 | 121.1 | 45.5 | 3535.2 | 21.98 |
| phi-4-mini-reasoning | INT4-MIXED | 32 | 131.2 | 45.7 | 3512.5 | 21.88 |
| llama-3.2-1b-instruct | FP16 | 32 | 54.5 | 45.8 | 3557.3 | 21.83 |
| phi-2 | INT8-CW | 32 | 127.4 | 45.8 | 4154.6 | 21.83 |
| stablelm-3b-4e1t | INT4-MIXED | 32 | 139.6 | 46 | 3287 | 21.74 |
| qwen3-4b | INT4-MIXED | 32 | 150 | 46.1 | 3522.3 | 21.69 |
| llama-3.2-1b-instruct | FP16 | 1024 | 202.1 | 46.4 | 3332.1 | 21.55 |
| llama-3.2-3b-instruct | INT4-MIXED | 1024 | 334.3 | 46.4 | 2843.4 | 21.55 |
| phi-4-mini-instruct | INT4-MIXED | 32 | 141.4 | 46.4 | 3537.8 | 21.55 |
| stable-zephyr-3b-dpo | INT8-CW | 32 | 135.1 | 46.5 | 4243.2 | 21.51 |
| stablelm-3b-4e1t | INT8-CW | 32 | 137.2 | 46.5 | 4144.2 | 21.51 |
| phi-3.5-mini-instruct | INT4-MIXED | 32 | 140.1 | 46.7 | 3446.2 | 21.41 |
| phi-4-mini-instruct | INT4-MIXED | 1024 | 425.9 | 47.1 | 3287.3 | 21.23 |
| phi-4-mini-reasoning | INT4-MIXED | 1024 | 423.4 | 47.1 | 3286 | 21.23 |
| phi-2 | INT8-CW | 1024 | 407.7 | 48.1 | 4077.8 | 20.79 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 655.9 | 48.2 | 3637.5 | 20.75 |
| afm-4.5b | INT4-MIXED | 32 | 127.4 | 48.3 | 4145.2 | 20.70 |
| qwen3-4b | INT4-MIXED | 1024 | 479.8 | 48.5 | 3308.2 | 20.62 |
| phi-4-mini-reasoning | INT4-MIXED | 32 | 159.6 | 48.7 | 3705.2 | 20.53 |
| phi-3.5-mini-instruct | INT4-MIXED | 1024 | 671.4 | 48.8 | 3559.8 | 20.49 |
| stablelm-3b-4e1t | INT4-MIXED | 1024 | 419.7 | 48.8 | 3309.3 | 20.49 |
| stablelm-3b-4e1t | INT8-CW | 1024 | 415.1 | 48.9 | 4092.6 | 20.45 |
| afm-4.5b | INT4-MIXED | 1024 | 558 | 49.2 | 3735.3 | 20.33 |
| stable-zephyr-3b-dpo | INT8-CW | 1024 | 412.6 | 49.3 | 4223.7 | 20.28 |
| internvl2-4b | INT4-MIXED | 297 | 317.6 | 49.4 | 4276.1 | 20.24 |
| phi-4-mini-reasoning | INT4-MIXED | 1024 | 535.1 | 50.5 | 3452.1 | 19.80 |
| llama-3.2-3b-instruct | INT8-CW | 32 | 90.2 | 51 | 4561.4 | 19.61 |
| qwen2.5-coder-3b-instruct | INT8-CW | 32 | 113.6 | 51.4 | 4461.1 | 19.46 |
| phi-3.5-mini-instruct | INT4-MIXED | 32 | 105.2 | 51.6 | 3345.5 | 19.38 |
| phi-3-mini-4k-instruct | INT4-MIXED | 32 | 106.8 | 51.7 | 3355.2 | 19.34 |
| phi-4-mini-instruct | INT4-MIXED | 32 | 160.3 | 52 | 3864.3 | 19.23 |
| llama-3.2-3b-instruct | INT8-CW | 1024 | 354.2 | 52.3 | 4271.7 | 19.12 |
| phi-3.5-vision-instruct | INT4-MIXED | 802 | 698.5 | 52.3 | 5042.9 | 19.12 |
| qwen2.5-coder-3b-instruct | INT8-CW | 1024 | 405.8 | 52.3 | 4001.2 | 19.12 |
| phi-3.5-vision-instruct | INT4-MIXED | 1032 | 872.8 | 52.4 | 5254.7 | 19.08 |
| qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 66.8 | 52.7 | 3056.7 | 18.98 |
| chatglm3-6b | INT4-MIXED | 32 | 90.9 | 52.9 | 4587.8 | 18.90 |
| internvl2-4b | INT4-MIXED | 1027 | 812.3 | 53.1 | 5112 | 18.83 |
| phi-4-mini-instruct | INT4-MIXED | 1024 | 563 | 53.6 | 3554.3 | 18.66 |
| chatglm3-6b | INT4-MIXED | 1024 | 607.2 | 53.9 | 4178.8 | 18.55 |
| qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 308 | 54 | 2611.4 | 18.52 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 503 | 54.4 | 3426.3 | 18.38 |
| phi-3.5-mini-instruct | INT4-MIXED | 1024 | 503.1 | 54.5 | 3412.1 | 18.35 |
| minicpm-1b-sft | FP16 | 31 | 152.5 | 55.1 | 3885.2 | 18.15 |
| chatglm3-6b | INT4-MIXED | 32 | 92.1 | 55.8 | 4827.9 | 17.92 |
| minicpm-1b-sft | FP16 | 1014 | 280.5 | 55.8 | 3849.4 | 17.92 |
| phi-4-mini-instruct | INT4-MIXED | 32 | 138.5 | 55.9 | 3641.7 | 17.89 |
| phi-4-mini-reasoning | INT4-MIXED | 32 | 144.5 | 56 | 3657 | 17.86 |
| deepseek-r1-distill-qwen-1.5b | FP16 | 32 | 86.8 | 56.7 | 4630.7 | 17.64 |
| internvl2-4b | INT4-MIXED | 297 | 306.6 | 56.9 | 4261 | 17.57 |
| phi-4-mini-reasoning | INT4-MIXED | 1024 | 443.5 | 57.1 | 3335 | 17.51 |
| qwen2.5-1.5b-instruct | FP16 | 32 | 84.3 | 57.1 | 4096 | 17.51 |
| deepseek-r1-distill-qwen-1.5b | FP16 | 1024 | 253.8 | 57.2 | 4365.2 | 17.48 |
| phi-4-mini-instruct | INT4-MIXED | 1024 | 443.2 | 57.2 | 3331.3 | 17.48 |
| glm-edge-4b-chat | INT4-MIXED | 32 | 230.1 | 57.3 | 3892.8 | 17.45 |
| chatglm3-6b | INT4-MIXED | 1024 | 848.8 | 57.4 | 4402.1 | 17.42 |
| qwen2.5-1.5b-instruct | FP16 | 1024 | 247.6 | 57.4 | 3827.1 | 17.42 |
| whisper-large-v3 | INT4-MIXED | 32 | 485 | 57.7 | 3586 | 17.33 |
| whisper-large-v3 | INT8-CW | 1024 | 511.6 | 57.8 | 3761.8 | 17.30 |
| glm-edge-1.5b-chat | FP16 | 32 | 137.2 | 57.9 | 4261.6 | 17.27 |
| whisper-large-v3 | INT8-CW | 32 | 500.5 | 58 | 4232.8 | 17.24 |
| codellama-7b | INT4-MIXED | 32 | 102.6 | 58.2 | 4712.1 | 17.18 |
| whisper-large-v3 | INT4-MIXED | 1024 | 514.1 | 58.4 | 3295.1 | 17.12 |
| flan-t5-xxl | INT4-MIXED | 33 | 78.2 | 58.6 | 13267.6 | 17.06 |
| phi-3-mini-4k-instruct | INT8-CW | 32 | 107 | 58.6 | 5175.1 | 17.06 |
| llama-2-7b-chat-hf | INT4-MIXED | 32 | 112.7 | 58.7 | 4714.6 | 17.04 |
| phi-3-mini-128k-instruct | INT8-CW | 32 | 108.5 | 58.8 | 5100.8 | 17.01 |
| baichuan2-7b-chat | INT4-MIXED | 32 | 98.1 | 58.9 | 6354.6 | 16.98 |
| glm-edge-4b-chat | INT4-MIXED | 1024 | 701.5 | 58.9 | 3598 | 16.98 |
| phi-3.5-mini-instruct | INT8-CW | 32 | 106.9 | 58.9 | 5147 | 16.98 |
| glm-edge-1.5b-chat | FP16 | 1024 | 310.9 | 59.7 | 3917.2 | 16.75 |
| qwen3-4b | INT4-MIXED | 32 | 122.6 | 60.3 | 3691.7 | 16.58 |
| neural-chat-7b-v3-3 | INT4-MIXED | 32 | 113.6 | 61.1 | 4937.8 | 16.37 |
| zephyr-7b-beta | INT4-MIXED | 32 | 111.8 | 61.1 | 4949.6 | 16.37 |
| codellama-7b | INT4-MIXED | 32 | 116.8 | 61.2 | 4968.6 | 16.34 |
| biomistral-7b-slerp | INT4-MIXED | 7 | 109.2 | 61.4 | 4935.6 | 16.29 |
| phi-3-mini-128k-instruct | INT8-CW | 1024 | 545.4 | 61.6 | 5112 | 16.23 |
| phi-3.5-mini-instruct | INT8-CW | 1024 | 540.3 | 61.7 | 5120.9 | 16.21 |
| gemma-3-4b-it | INT4-MIXED | 32 | 178.8 | 61.9 | 4932.9 | 16.16 |
| phi-3-mini-4k-instruct | INT8-CW | 1024 | 543.1 | 61.9 | 5106.8 | 16.16 |
| internvl2-4b | INT4-MIXED | 1027 | 729.9 | 62 | 4958.7 | 16.13 |
| mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 120.6 | 62 | 4942.4 | 16.13 |
| mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 109 | 62.2 | 4954 | 16.08 |
| phi-4-mini-instruct | INT8-CW | 32 | 146 | 62.4 | 5081.5 | 16.03 |
| phi-4-mini-reasoning | INT8-CW | 32 | 127.8 | 62.4 | 5060.8 | 16.03 |
| qwen3-4b | INT4-MIXED | 1024 | 482.2 | 62.4 | 3442.7 | 16.03 |
| codellama-7b | INT4-MIXED | 1024 | 709.7 | 62.6 | 4967.8 | 15.97 |
| baichuan2-7b-chat | INT4-MIXED | 1024 | 768 | 62.7 | 6614.9 | 15.95 |
| falcon-7b-instruct | INT4-MIXED | 32 | 118.6 | 62.8 | 4801.7 | 15.92 |
| neural-chat-7b-v3-3 | INT4-MIXED | 1024 | 725.8 | 62.8 | 4726.6 | 15.92 |
| llama-2-7b-chat-hf | INT4-MIXED | 1024 | 708.8 | 63 | 4968.6 | 15.87 |
| zephyr-7b-beta | INT4-MIXED | 1024 | 734.9 | 63.2 | 4726.9 | 15.82 |
| gemma-3-4b-it | INT4-MIXED | 32 | 211 | 63.5 | 5118.4 | 15.75 |
| mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 729.2 | 63.5 | 4729.9 | 15.75 |
| mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 733.7 | 63.5 | 4732.8 | 15.75 |
| falcon-7b-instruct | INT4-MIXED | 1024 | 733.3 | 63.6 | 4375.9 | 15.72 |
| phi-4-mini-instruct | INT8-CW | 1024 | 463.9 | 63.7 | 4840.9 | 15.70 |
| phi-4-mini-reasoning | INT8-CW | 1024 | 461.9 | 63.7 | 4843.8 | 15.70 |
| gemma-3-4b-it | INT4-MIXED | 1024 | 572.5 | 64.7 | 6744.2 | 15.46 |
| deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 101.1 | 64.8 | 5522.2 | 15.43 |
| qwen2.5-7b-instruct-1m | INT4-MIXED | 32 | 100.3 | 64.8 | 5537.9 | 15.43 |
| qwen3-4b | INT8-CW | 32 | 122.8 | 64.9 | 5276.8 | 15.41 |
| internvl2-4b | INT8-CW | 297 | 303.2 | 65 | 5784.3 | 15.38 |
| qwen2.5-7b-instruct | INT4-MIXED | 32 | 102.1 | 65 | 5537.8 | 15.38 |
| neural-chat-7b-v3-3 | INT4-MIXED | 32 | 115.7 | 65.1 | 5118 | 15.36 |
| mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 113.8 | 65.2 | 5184.6 | 15.34 |
| biomistral-7b-slerp | INT4-MIXED | 7 | 101.9 | 65.3 | 5257.9 | 15.31 |
| deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 652.1 | 65.4 | 5168.2 | 15.29 |
| codellama-7b | INT4-MIXED | 1024 | 972 | 65.6 | 5129.2 | 15.24 |
| mistral-7b-instruct-v0.1 | INT4-MIXED | 32 | 125.4 | 65.6 | 5121.9 | 15.24 |
| qwen2.5-7b-instruct | INT4-MIXED | 1024 | 645.9 | 65.6 | 5180.9 | 15.24 |
| qwen2.5-7b-instruct-1m | INT4-MIXED | 1024 | 661.6 | 65.6 | 5178.1 | 15.24 |
| mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 125.8 | 65.7 | 5097.2 | 15.22 |
| bloomz-7b1 | INT4-MIXED | 32 | 107.8 | 65.9 | 5255.9 | 15.17 |
| qwen3-vl-4b-thinking | INT4-MIXED | 4909 | 8353.3 | 66 | 14334.2 | 15.15 |
| gemma-3-4b-it | INT4-MIXED | 1024 | 687.7 | 66.8 | 6933.3 | 14.97 |
| deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 111.6 | 66.9 | 5709.2 | 14.95 |
| neural-chat-7b-v3-3 | INT4-MIXED | 1024 | 988.8 | 67 | 4890.5 | 14.93 |
| qwen3-4b | INT8-CW | 1024 | 502.8 | 67 | 5087.2 | 14.93 |
| qwen3-vl-4b-thinking | INT4-MIXED | 4939 | 8254.8 | 67 | 16940.5 | 14.93 |
| llama-3-8b-instruct | INT4-MIXED | 32 | 115.3 | 67.1 | 5736.4 | 14.90 |
| phi-3.5-vision-instruct | INT8-CW | 802 | 594.4 | 67.1 | 6514.1 | 14.90 |
| afm-4.5b | INT8-CW | 32 | 108 | 67.3 | 5825.3 | 14.86 |
| mistral-7b-instruct-v0.1 | INT4-MIXED | 1025 | 1004 | 67.3 | 4869.9 | 14.86 |
| llama-3.1-8b-instruct | INT4-MIXED | 32 | 122.2 | 67.4 | 5718.4 | 14.84 |
| mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 1000.3 | 67.5 | 4889.5 | 14.81 |
| mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 1019.4 | 67.7 | 4988.9 | 14.77 |
| minicpm-v-2_6 | INT4-MIXED | 228 | 743.2 | 68.2 | 6749.1 | 14.66 |
| whisper-large-v3 | FP16 | 32 | 507.2 | 68.2 | 5213.9 | 14.66 |
| whisper-large-v3 | FP16 | 1024 | 596.5 | 68.3 | 4894.2 | 14.64 |
| phi-4-multimodal-instruct | INT4-MIXED | 578 | 693.4 | 68.4 | 6162.3 | 14.62 |
| deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 115.1 | 68.5 | 5778.3 | 14.60 |
| afm-4.5b | INT8-CW | 1024 | 440.7 | 68.6 | 5471.2 | 14.58 |
| phi-3.5-vision-instruct | INT8-CW | 1032 | 793.2 | 68.6 | 6591.6 | 14.58 |
| qwen2.5-7b-instruct-1m | INT4-MIXED | 32 | 109.9 | 68.6 | 5777.1 | 14.58 |
| internvl2-4b | INT8-CW | 1027 | 740 | 68.7 | 6491.5 | 14.56 |
| qwen2.5-7b-instruct | INT4-MIXED | 32 | 122.1 | 68.7 | 5697.7 | 14.56 |
| qwen2-7b-instruct | INT4-MIXED | 32 | 116.3 | 68.8 | 5685 | 14.53 |
| deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 736.3 | 69.1 | 5509.9 | 14.47 |
| minicpm4-8b | INT4-MIXED | 32 | 124 | 69.1 | 5587.8 | 14.47 |
| llama-3.1-8b-instruct | INT4-MIXED | 1024 | 726.1 | 69.2 | 5513.2 | 14.45 |
| bloomz-7b1 | INT4-MIXED | 32 | 109.1 | 69.3 | 5436.7 | 14.43 |
| minicpm-o-2_6 | INT4-MIXED | 238 | 743.6 | 69.3 | 6851.2 | 14.43 |
| llama-3-8b-instruct | INT4-MIXED | 1024 | 729.9 | 69.4 | 5504.4 | 14.41 |
| qwen3-vl-4b-thinking | INT4-MIXED | 4909 | 8859 | 69.4 | 14496.9 | 14.41 |
| minicpm3-4b | INT4-MIXED | 32 | 340.9 | 69.5 | 3629.7 | 14.39 |
| qwen3-vl-4b-thinking | INT4-MIXED | 4939 | 8708.8 | 69.5 | 17092.5 | 14.39 |
| phi-4-multimodal-instruct | INT4-MIXED | 786 | 753 | 69.6 | 6185.6 | 14.37 |
| qwen2-7b-instruct | INT4-MIXED | 1024 | 903.6 | 69.7 | 5342.7 | 14.35 |
| qwen2.5-7b-instruct | INT4-MIXED | 1024 | 892.9 | 69.7 | 5342.4 | 14.35 |
| deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 882.9 | 69.8 | 5438.3 | 14.33 |
| glm-edge-4b-chat | INT8-CW | 32 | 212.4 | 69.8 | 5580.3 | 14.33 |
| qwen2.5-7b-instruct-1m | INT4-MIXED | 1024 | 892.6 | 69.8 | 5434.8 | 14.33 |
| bloomz-7b1 | INT4-MIXED | 1024 | 784.1 | 70.1 | 5458.3 | 14.27 |
| minicpm4-8b | INT4-MIXED | 1024 | 794.5 | 70.1 | 5178.5 | 14.27 |
| qwen3-8b | INT4-MIXED | 32 | 148.3 | 70.1 | 5895.8 | 14.27 |
| phi-4-multimodal-instruct | INT4-MIXED | 1362 | 1573.2 | 70.2 | 8029.9 | 14.25 |
| phi-4-multimodal-instruct | INT4-MIXED | 1570 | 1774.1 | 70.5 | 8405.6 | 14.18 |
| llama-3-8b-instruct | INT4-MIXED | 32 | 124.8 | 71.1 | 5899.1 | 14.06 |
| qwen3-8b | INT4-MIXED | 1024 | 778 | 71.9 | 5712.7 | 13.91 |
| glm-edge-4b-chat | INT8-CW | 1024 | 613.2 | 72.1 | 5334.6 | 13.87 |
| falcon-7b-instruct | INT4-MIXED | 32 | 122 | 72.3 | 5254.9 | 13.83 |
| gemma-3-4b-it | INT4-MIXED | 32 | 180.5 | 72.3 | 5068.8 | 13.83 |
| minicpm-v-2_6 | INT4-MIXED | 228 | 802 | 73 | 6984.7 | 13.70 |
| llama-3-8b-instruct | INT4-MIXED | 1024 | 979.9 | 73.1 | 5678.3 | 13.68 |
| falcon-7b-instruct | INT4-MIXED | 1024 | 978.3 | 73.2 | 4786.3 | 13.66 |
| minicpm4-8b | INT4-MIXED | 32 | 133.8 | 73.2 | 5864.7 | 13.66 |
| bloomz-7b1 | INT4-MIXED | 1024 | 1024.4 | 73.4 | 5601.2 | 13.62 |
| minicpm-o-2_6 | INT4-MIXED | 238 | 804.1 | 73.5 | 6734.8 | 13.61 |
| qwen3-8b | INT4-MIXED | 32 | 136 | 73.8 | 6176.3 | 13.55 |
| minicpm4-8b | INT4-MIXED | 1024 | 1078.9 | 74.4 | 5444.7 | 13.44 |
| minicpm-v-4_5 | INT4-MIXED | 217 | 792.8 | 75.7 | 7180.4 | 13.21 |
| qwen3-8b | INT4-MIXED | 1024 | 1034 | 75.7 | 5959.2 | 13.21 |
| minicpm3-4b | INT4-MIXED | 32 | 349.3 | 76.2 | 3819.4 | 13.12 |
| gemma-3-4b-it | INT8-CW | 32 | 182.1 | 76.3 | 6465.5 | 13.11 |
| gemma-3-4b-it | INT4-MIXED | 1024 | 594.6 | 77.6 | 6893.1 | 12.89 |
| glm-4-9b-chat-hf | INT4-MIXED | 32 | 124.4 | 78.2 | 6492 | 12.79 |
| minicpm3-4b | INT4-MIXED | 1024 | 834.6 | 78.4 | 4424.7 | 12.76 |
| minicpm-v-4_5 | INT4-MIXED | 217 | 831.8 | 79 | 7412.9 | 12.66 |
| minicpm3-4b | INT4-MIXED | 32 | 301.7 | 79.6 | 3738.4 | 12.56 |
| gemma-7b-it | INT4-MIXED | 32 | 108.8 | 79.8 | 5895.4 | 12.53 |
| glm-4-9b-chat-hf | INT4-MIXED | 1024 | 890.5 | 79.9 | 6112.2 | 12.52 |
| gemma-3-4b-it | INT8-CW | 1024 | 611 | 80.6 | 8310.2 | 12.41 |
| qwen3-vl-4b-thinking | INT4-MIXED | 4939 | 8401.5 | 81.1 | 17067.5 | 12.33 |
| baichuan2-7b-chat | INT8-CW | 32 | 103.3 | 81.3 | 8607.3 | 12.30 |
| qwen3-vl-4b-thinking | INT4-MIXED | 4909 | 8568.3 | 81.4 | 14372.9 | 12.29 |
| gemma-2b-it | FP16 | 32 | 95.6 | 81.9 | 6044.5 | 12.21 |
| minicpm3-4b | INT8-CW | 32 | 325.9 | 81.9 | 5515.4 | 12.21 |
| gemma-2b-it | FP16 | 1024 | 398.3 | 82.1 | 5798.5 | 12.18 |
| glm-4-9b-chat-hf | INT4-MIXED | 32 | 149.3 | 82.4 | 6789.5 | 12.14 |
| gemma-7b-it | INT4-MIXED | 1024 | 842.4 | 82.5 | 6108.5 | 12.12 |
| llama-2-7b-chat-hf | INT4-MIXED | 32 | 101.7 | 82.9 | 4815.8 | 12.06 |
| deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 95.8 | 83.4 | 5645.9 | 11.99 |
| zephyr-7b-beta | INT4-MIXED | 32 | 113.1 | 83.5 | 5829 | 11.98 |
| deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 138.7 | 83.6 | 6778.1 | 11.96 |
| llama-3.1-8b-instruct | INT4-MIXED | 32 | 141.3 | 83.8 | 6682.7 | 11.93 |
| minicpm3-4b | INT4-MIXED | 1024 | 959.9 | 83.8 | 4656.1 | 11.93 |
| phi-4-multimodal-instruct | INT8-CW | 578 | 725.5 | 83.8 | 8060.8 | 11.93 |
| phi-4-multimodal-instruct | INT8-CW | 786 | 798.8 | 83.8 | 8200.9 | 11.93 |
| qwen2.5-vl-7b-instruct | INT4-MIXED | 32 | 328.8 | 83.8 | 6553.8 | 11.93 |
| gemma-7b-it | INT4-MIXED | 32 | 113.4 | 84.2 | 6166.6 | 11.88 |
| glm-4-9b-chat-hf | INT4-MIXED | 1024 | 1210.2 | 84.2 | 6401.6 | 11.88 |
| qwen2-7b-instruct | INT4-MIXED | 32 | 104.9 | 84.2 | 5623.8 | 11.88 |
| qwen2.5-7b-instruct | INT4-MIXED | 32 | 102.8 | 84.2 | 5623.9 | 11.88 |
| mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 103.4 | 84.4 | 5032 | 11.85 |
| mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 101.1 | 84.5 | 5071.2 | 11.83 |
| deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 660.1 | 84.6 | 5283.4 | 11.82 |
| qwen3-vl-4b-thinking | INT8-CW | 4939 | 8513.4 | 84.7 | 18797.4 | 11.81 |
| qwen3-vl-4b-thinking | INT8-CW | 4909 | 8608.3 | 84.8 | 16095 | 11.79 |
| phi-4-multimodal-instruct | INT8-CW | 1362 | 1679 | 85 | 9867 | 11.76 |
| baichuan2-7b-chat | INT8-CW | 1024 | 603.2 | 85.1 | 8894.7 | 11.75 |
| phi-4-multimodal-instruct | INT8-CW | 1570 | 1861.7 | 85.1 | 10623.9 | 11.75 |
| deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 942 | 85.5 | 6584.5 | 11.70 |
| qwen2.5-7b-instruct | INT4-MIXED | 1024 | 657.5 | 85.6 | 5285.4 | 11.68 |
| qwen2.5-vl-7b-instruct | INT4-MIXED | 1024 | 915.2 | 85.6 | 7543.6 | 11.68 |
| llama-3.1-8b-instruct | INT4-MIXED | 1024 | 946.4 | 85.7 | 6474.4 | 11.67 |
| qwen2-7b-instruct | INT4-MIXED | 1024 | 670.7 | 85.7 | 5283.7 | 11.67 |
| zephyr-7b-beta | INT4-MIXED | 1024 | 733.1 | 85.7 | 5665.4 | 11.67 |
| mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 720.1 | 86.6 | 4829.1 | 11.55 |
| llava-next-video-7b-hf | INT4-MIXED | 2945 | 4277.4 | 86.8 | 9040.3 | 11.52 |
| mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 726.9 | 86.8 | 4835.9 | 11.52 |
| llama-2-7b-chat-hf | INT4-MIXED | 1024 | 706.3 | 86.9 | 5065.4 | 11.51 |
| minicpm3-4b | INT4-MIXED | 1024 | 865.2 | 87 | 4566.5 | 11.49 |
| minicpm-v-2_6 | INT4-MIXED | 228 | 717 | 87.3 | 6844.7 | 11.45 |
| phi-4-multimodal-instruct | INT4-MIXED | 578 | 791.6 | 87.4 | 6823.2 | 11.44 |
| qwen2.5-vl-7b-instruct | INT4-MIXED | 32 | 350.5 | 87.4 | 6701.9 | 11.44 |
| gemma-7b-it | INT4-MIXED | 1024 | 1139.7 | 87.6 | 6374.4 | 11.42 |
| chatglm3-6b | INT8-CW | 32 | 120.9 | 88 | 7413.5 | 11.36 |
| gemma-2-2b | FP16 | 33 | 115.7 | 88.3 | 7447.3 | 11.33 |
| phi-4-multimodal-instruct | INT4-MIXED | 1362 | 1920.2 | 88.3 | 8544.4 | 11.33 |
| phi-4-multimodal-instruct | INT4-MIXED | 786 | 971.2 | 88.5 | 7181.3 | 11.30 |
| gemma-2-9b-it | INT4-MIXED | 32 | 170.4 | 88.6 | 6361.2 | 11.29 |
| deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 102 | 88.9 | 5812.6 | 11.25 |
| phi-4-multimodal-instruct | INT4-MIXED | 1570 | 2152.8 | 88.9 | 9329.7 | 11.25 |
| chatglm3-6b | INT8-CW | 1024 | 620.4 | 89.5 | 7040.1 | 11.17 |
| gemma-2-2b | FP16 | 1025 | 464.7 | 89.7 | 7430.9 | 11.15 |
| qwen2.5-vl-7b-instruct | INT4-MIXED | 1024 | 1123.4 | 89.7 | 7703.3 | 11.15 |
| minicpm3-4b | INT8-CW | 1024 | 893.1 | 89.8 | 6342.4 | 11.14 |
| llama-3-8b-instruct | INT4-MIXED | 32 | 106.2 | 90.2 | 5827.8 | 11.09 |
| deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 733.8 | 90.4 | 5608.9 | 11.06 |
| llama-3-8b-instruct | INT4-MIXED | 32 | 130 | 90.4 | 5816.9 | 11.06 |
| llama-3.1-8b-instruct | INT4-MIXED | 32 | 113.5 | 90.4 | 5839.3 | 11.06 |
| phi-2 | FP16 | 32 | 125.9 | 90.8 | 6745.8 | 11.01 |
| gemma-2-9b-it | INT4-MIXED | 1024 | 997.1 | 92.2 | 6452.3 | 10.85 |
| llama-3-8b-instruct | INT4-MIXED | 1024 | 724.9 | 92.3 | 5606.4 | 10.83 |
| llama-3-8b-instruct | INT4-MIXED | 1024 | 737.2 | 92.4 | 5612.2 | 10.82 |
| llama-3.1-8b-instruct | INT4-MIXED | 1024 | 728.5 | 92.4 | 5608 | 10.82 |
| stable-diffusion-xl-1.0-inpainting-0.1 | INT8-CW | 32 | 92 | 92.4 | 6785.5 | 10.82 |
| qwen3-8b | INT4-MIXED | 32 | 149.9 | 93 | 6076.2 | 10.75 |
| stable-zephyr-3b-dpo | FP16 | 32 | 153.5 | 93.6 | 6664.6 | 10.68 |
| gemma-2-9b-it | INT4-MIXED | 32 | 173.4 | 93.7 | 6653.9 | 10.67 |
| minicpm4-8b | INT4-MIXED | 32 | 133.8 | 94.4 | 5777.2 | 10.59 |
| phi-2 | FP16 | 1024 | 599.9 | 94.9 | 7085.8 | 10.54 |
| qwen3-8b | INT4-MIXED | 1024 | 774 | 95.5 | 5884.6 | 10.47 |
| minicpm4-8b | INT4-MIXED | 1024 | 798 | 95.6 | 5376 | 10.46 |
| stable-diffusion-xl-1.0-inpainting-0.1 | INT8-CW | 32 | 100.5 | 96.4 | 6971.5 | 10.37 |
| codellama-7b | INT8-CW | 32 | 127.9 | 96.6 | 7829.5 | 10.35 |
| stable-zephyr-3b-dpo | FP16 | 1024 | 613.4 | 97.1 | 7105 | 10.30 |
| gemma-2-9b-it | INT4-MIXED | 1024 | 1322.4 | 97.7 | 6768.1 | 10.24 |
| ltx-video | INT8-CW | 11 | 95.7 | 98.4 | 9779.2 | 10.16 |
| llama-2-7b-chat-hf | INT8-CW | 32 | 136.3 | 98.8 | 7925.9 | 10.12 |

| Topology | Precision | Input Size | 1st latency (ms) | 2nd latency (ms) | Max RSS memory (MB) | 2nd tokens/sec |
|---|---|---|---|---|---|---|
| distil-large-v2 | INT4-MIXED | 32 | 165.2 | 5.9 | 1578.7 | 169.49 |
| distil-large-v2 | INT4-MIXED | 1024 | 209.7 | 6 | 1609.8 | 166.67 |
| distil-large-v2 | INT8-CW | 1024 | 209 | 6.1 | 1868.3 | 163.93 |
| distil-large-v2 | INT8-CW | 32 | 164.3 | 6.1 | 1835.8 | 163.93 |
| gemma-3-270m | INT4-MIXED | 32 | 29 | 7.2 | 1257.8 | 138.89 |
| minicpm4-0.5b | INT4-MIXED | 32 | 28.5 | 7.2 | 1117.5 | 138.89 |
| gemma-3-270m | INT4-MIXED | 1024 | 34 | 7.3 | 1326.7 | 136.99 |
| minicpm4-0.5b | INT4-MIXED | 1024 | 44.2 | 7.3 | 1166.3 | 136.99 |
| gemma-3-270m | INT8-CW | 32 | 30.1 | 7.3 | 1310.4 | 136.99 |
| minicpm4-0.5b | INT4-MIXED | 32 | 27.9 | 7.3 | 1131.4 | 136.99 |
| gemma-3-270m | INT8-CW | 1024 | 34.8 | 7.4 | 1351.6 | 135.14 |
| tiny-llama-1.1b-chat | INT4-MIXED | 32 | 25.3 | 7.4 | 1557.4 | 135.14 |
| minicpm4-0.5b | INT4-MIXED | 1024 | 47.3 | 7.6 | 1157.1 | 131.58 |
| minicpm4-0.5b | INT8-CW | 32 | 30.8 | 7.6 | 1283.3 | 131.58 |
| minicpm4-0.5b | INT4-MIXED | 1024 | 53.8 | 7.8 | 1312.8 | 128.21 |
| minicpm4-0.5b | INT8-CW | 1024 | 51 | 7.8 | 1331.8 | 128.21 |
| minicpm4-0.5b | INT4-MIXED | 32 | 30 | 7.8 | 1247.3 | 128.21 |
| tiny-llama-1.1b-chat | INT4-MIXED | 1024 | 88.3 | 8 | 1606.4 | 125.00 |
| gemma-3-270m | FP16 | 1024 | 31 | 8.1 | 1626.7 | 123.46 |
| gemma-3-270m | FP16 | 32 | 24.8 | 8.1 | 1526.7 | 123.46 |
| tiny-llama-1.1b-chat | INT4-MIXED | 32 | 26.6 | 8.1 | 1638 | 123.46 |
| distil-large-v2 | FP16 | 1024 | 209.3 | 8.2 | 2639.9 | 121.95 |
| distil-large-v2 | FP16 | 32 | 164.5 | 8.2 | 2607.5 | 121.95 |
| tiny-llama-1.1b-chat | INT4-MIXED | 1024 | 110.9 | 8.7 | 1736.2 | 114.94 |
| codet5-base-sum | INT4-MIXED | 291 | 20.4 | 8.9 | 1235.6 | 112.36 |
| codet5-base-sum | FP16 | 26 | 15.9 | 8.9 | 1914.2 | 112.36 |
| codet5-base-sum | FP16 | 291 | 16.1 | 9 | 1970.1 | 111.11 |
| llama-3.2-1b-instruct | INT4-MIXED | 32 | 22 | 9.4 | 1611.4 | 106.38 |
| llama-3.2-1b-instruct | INT4-MIXED | 32 | 22.7 | 9.5 | 1626.4 | 105.26 |
| codet5-base-sum | INT4-MIXED | 26 | 19.4 | 9.5 | 1326.7 | 105.26 |
| codet5-base-sum | INT8-CW | 26 | 19.9 | 9.5 | 1471.8 | 105.26 |
| codet5-base-sum | INT4-MIXED | 26 | 18.9 | 9.6 | 1183.9 | 104.17 |
| codet5-base-sum | INT4-MIXED | 291 | 21.1 | 9.8 | 1365.9 | 102.04 |
| codet5-base-sum | INT8-CW | 291 | 20.8 | 9.9 | 1522 | 101.01 |
| llama-3.2-1b-instruct | INT4-MIXED | 1024 | 85.4 | 10 | 1695.3 | 100.00 |
| llama-3.2-1b-instruct | INT4-MIXED | 1024 | 98.2 | 10.1 | 1760.9 | 99.01 |
| minicpm4-0.5b | FP16 | 32 | 25.1 | 10.8 | 1765.2 | 92.59 |
| minicpm4-0.5b | FP16 | 1024 | 61 | 11.1 | 1813.2 | 90.09 |
| deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 33.9 | 11.1 | 1958.6 | 90.09 |
| gemma-3-1b-it | INT4-MIXED | 32 | 37.8 | 11.3 | 1595.3 | 88.50 |
| gemma-3-1b-it | INT4-MIXED | 32 | 38 | 11.3 | 1692.9 | 88.50 |
| gemma-3-1b-it | INT4-MIXED | 1024 | 88.9 | 11.5 | 1700.7 | 86.96 |
| gemma-3-1b-it | INT4-MIXED | 1024 | 91.8 | 11.5 | 1799.7 | 86.96 |
| deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 34.3 | 11.5 | 2012.5 | 86.96 |
| qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 34.2 | 11.5 | 1790.4 | 86.96 |
| qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 34.5 | 11.5 | 1791.8 | 86.96 |
| gemma-3-1b-it | INT4-MIXED | 32 | 39.2 | 11.7 | 1778.1 | 85.47 |
| deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 126.5 | 11.8 | 2055.8 | 84.75 |
| gemma-3-1b-it | INT4-MIXED | 1024 | 97.1 | 11.8 | 1868 | 84.75 |
| nanollava | INT4-MIXED | 760 | 133.1 | 11.8 | 2985.7 | 84.75 |
| tiny-llama-1.1b-chat | INT8-CW | 32 | 27.9 | 11.8 | 1980.6 | 84.75 |
| nanollava | INT8-CW | 760 | 134.6 | 12 | 3084.5 | 83.33 |
| nanollava | INT4-MIXED | 1752 | 230.1 | 12.1 | 4775.2 | 82.64 |
| deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 136.9 | 12.1 | 2206 | 82.64 |
| qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 137.3 | 12.2 | 1977.8 | 81.97 |
| qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 136.1 | 12.2 | 1959.4 | 81.97 |
| nanollava | INT8-CW | 1752 | 223.3 | 12.3 | 4759.4 | 81.30 |
| qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 38.1 | 12.4 | 1902.1 | 80.65 |
| tiny-llama-1.1b-chat | INT8-CW | 1024 | 100.2 | 12.5 | 2061.5 | 80.00 |
| qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 37.8 | 12.5 | 1900.7 | 80.00 |
| qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 145.1 | 13 | 2057 | 76.92 |
| qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 145.7 | 13.1 | 2060.4 | 76.34 |
| nanollava | FP16 | 760 | 129.1 | 13.1 | 4024.5 | 76.34 |
| gemma-3-1b-it | INT8-CW | 32 | 42 | 13.5 | 2023.4 | 74.07 |
| flan-t5-large-grammar-synthesis | FP16 | 32 | 35.6 | 13.9 | 4197.8 | 71.94 |
| gemma-3-1b-it | INT8-CW | 1024 | 99.8 | 14.2 | 2119.6 | 70.42 |
| flan-t5-large-grammar-synthesis | FP16 | 1024 | 33.9 | 14.3 | 4685.5 | 69.93 |
| llama-3.2-1b-instruct | INT8-CW | 32 | 25.3 | 14.4 | 2129.5 | 69.44 |
| nanollava | FP16 | 1752 | 236.1 | 14.5 | 5889.7 | 68.97 |
| llama-3.2-1b-instruct | INT8-CW | 1024 | 105.2 | 14.9 | 2248.8 | 67.11 |
| deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 38.4 | 14.9 | 2444.3 | 67.11 |
| flan-t5-large-grammar-synthesis | INT4-MIXED | 32 | 38.9 | 15.1 | 2105.4 | 66.23 |

flan-t5-large-grammar-synthesis

INT8-CW

32

42.9

15.4

2706.9

64.93506494

flan-t5-large-grammar-synthesis

INT4-MIXED

1024

41.5

15.5

2614.1

64.51612903

flan-t5-large-grammar-synthesis

INT8-CW

1024

42.9

15.5

3215.6

64.51612903

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

1024

153.5

15.6

2511.5

64.1025641

minicpm-1b-sft

INT4-MIXED

31

61.9

16

1709

62.5

minicpm-1b-sft

INT4-MIXED

31

60.5

16.3

1819.8

61.34969325

minicpm-1b-sft

INT4-MIXED

1014

136

16.4

1891.8

60.97560976

minicpm-1b-sft

INT4-MIXED

1014

140

16.7

1986.7

59.88023952

minicpm-1b-sft

INT4-MIXED

31

66

16.8

1875.8

59.52380952

minicpm-1b-sft

INT4-MIXED

1014

159.6

17.1

2006.7

58.47953216

gemma-2b-it

INT4-MIXED

32

29.3

17.1

2359.4

58.47953216

smolvlm2-256m-video-instruct

FP16

1141

384.7

17.2

3353.8

58.13953488

smolvlm2-256m-video-instruct

INT8-CW

1141

391.5

17.6

2935.6

56.81818182

qwen2.5-1.5b-instruct

INT8-CW

32

38.7

17.8

2372.7

56.17977528

qwen2.5-1.5b-instruct

INT8-CW

32

39.8

17.8

2374.1

56.17977528

deepseek-r1-distill-qwen-1.5b

INT8-CW

32

41.7

17.9

2696.1

55.86592179

gemma-2b-it

INT4-MIXED

32

30.9

17.9

2561.3

55.86592179

smolvlm2-256m-video-instruct

INT4-MIXED

1141

407.1

18.3

2839

54.64480874

gemma-2b-it

INT4-MIXED

1024

151.3

18.4

2497.3

54.34782609

deepseek-r1-distill-qwen-1.5b

INT8-CW

1024

133.3

18.5

2792.6

54.05405405

phi-2

INT4-MIXED

32

58.9

18.5

2522.9

54.05405405

qwen2.5-1.5b-instruct

INT8-CW

1024

133

18.6

2474.9

53.76344086

qwen2.5-1.5b-instruct

INT8-CW

1024

133.5

18.6

2477.1

53.76344086

minicpm-1b-sft

INT8-CW

31

72.1

18.6

2416.8

53.76344086

gemma-2b-it

INT4-MIXED

1024

207.6

19.2

2659.7

52.08333333

gemma-2-2b

INT4-MIXED

33

49.7

19.3

2536.8

51.8134715

minicpm-1b-sft

INT8-CW

1014

150.3

20.1

2581.6

49.75124378

phi-2

INT4-MIXED

32

56

20.2

2873

49.5049505

gemma-2-2b

INT4-MIXED

33

44.4

20.6

2771.3

48.54368932

phi-2

INT4-MIXED

1024

310.8

20.9

3058.4

47.84688995

qwen2.5-coder-3b-instruct

INT4-MIXED

32

62.7

21

2553.1

47.61904762

qwen2.5-coder-3b-instruct

INT4-MIXED

32

47.9

21

2640.8

47.61904762

gemma-2-2b

INT4-MIXED

1025

190.4

21.3

2802.8

46.94835681

qwen2.5-1.5b-instruct

INT4-MIXED

32

85.2

21.3

2145.9

46.94835681

qwen2.5-1.5b-instruct

INT4-MIXED

1024

186.6

21.5

2345.2

46.51162791

qwen2.5-1.5b-instruct

INT4-MIXED

32

78.8

21.5

2242.5

46.51162791

llama-3.2-3b-instruct

INT4-MIXED

32

47.1

21.6

2670.1

46.2962963

qwen2.5-1.5b-instruct

INT4-MIXED

1024

191.6

21.8

2437

45.87155963

gemma-2-2b

INT4-MIXED

1025

230

22.2

2975.9

45.04504505

qwen2.5-coder-3b-instruct

INT4-MIXED

1024

231.9

22.4

2679.1

44.64285714

qwen2.5-coder-3b-instruct

INT4-MIXED

1024

239.9

22.5

2725.3

44.44444444

llama-3.2-3b-instruct

INT4-MIXED

32

43.3

22.5

2754.8

44.44444444

qwen2.5-coder-3b-instruct

INT4-MIXED

32

55.1

22.5

2910.8

44.44444444

phi-2

INT4-MIXED

1024

374.6

22.6

3268.8

44.24778761

llama-3.2-3b-instruct

INT4-MIXED

32

40.5

22.9

2805.1

43.66812227

llama-3.2-3b-instruct

INT4-MIXED

1024

243.6

23.3

2936.5

42.91845494

gemma-3-1b-it

FP16

32

46.4

23.3

2963.8

42.91845494

gemma-3-1b-it

FP16

1024

112.7

23.7

3126.1

42.19409283

qwen2.5-coder-3b-instruct

INT4-MIXED

1024

428.9

23.8

2906.6

42.01680672

phi-3.5-mini-instruct

INT4-MIXED

32

57.9

23.9

2995.1

41.84100418

phi-3-mini-128k-instruct

INT4-MIXED

32

57.7

23.9

3086.4

41.84100418

phi-3-mini-4k-instruct

INT4-MIXED

32

59.9

23.9

2993.6

41.84100418

llama-3.2-3b-instruct

INT4-MIXED

1024

252

24.2

2955.4

41.32231405

qwen2.5-1.5b-instruct

INT8-CW

32

91.6

24.3

2813.1

41.15226337

llama-3.2-3b-instruct

INT4-MIXED

1024

275.4

24.5

2994.2

40.81632653

phi-3-mini-4k-instruct

INT4-MIXED

32

51.4

24.9

3103.8

40.16064257

qwen2.5-1.5b-instruct

INT8-CW

1024

187.9

25

2947.1

40

phi-3.5-mini-instruct

INT4-MIXED

32

51.9

25

3102.4

40

phi-3-mini-128k-instruct

INT4-MIXED

32

52.6

25.3

3255.6

39.5256917

tiny-llama-1.1b-chat

FP16

32

30.4

25.6

3022.4

39.0625

tiny-llama-1.1b-chat

FP16

1024

116.2

26.3

3125.8

38.02281369

qwen3-4b

INT4-MIXED

32

71

26.6

3111.2

37.59398496

phi-3.5-mini-instruct

INT4-MIXED

1024

382.2

26.8

3577.1

37.31343284

phi-3-mini-128k-instruct

INT4-MIXED

1024

378.8

26.8

3671.6

37.31343284

phi-3-mini-4k-instruct

INT4-MIXED

1024

382.5

26.8

3576.1

37.31343284

phi-3-mini-4k-instruct

INT4-MIXED

32

50.8

26.9

3417.1

37.17472119

phi-3.5-mini-instruct

INT4-MIXED

32

55

27.2

3331.6

36.76470588

internvl2-4b

INT4-MIXED

297

224

27.5

4421.1

36.36363636

phi-4-mini-instruct

INT4-MIXED

32

74.8

27.6

3104.1

36.23188406

phi-4-mini-reasoning

INT4-MIXED

32

75.2

27.6

3105.9

36.23188406

qwen3-4b

INT4-MIXED

32

61.6

27.7

3306.9

36.10108303

phi-3-mini-4k-instruct

INT4-MIXED

1024

389.8

27.8

3623.5

35.97122302

internvl2-4b

INT4-MIXED

297

221.9

27.8

4435.7

35.97122302

phi-3.5-mini-instruct

INT4-MIXED

1024

387.1

27.9

3608.9

35.84229391

afm-4.5b

INT4-MIXED

32

54

28.2

3743.2

35.46099291

phi-3-mini-128k-instruct

INT4-MIXED

1024

476.3

28.3

3744.1

35.33568905

phi-4-mini-instruct

INT4-MIXED

32

67.8

28.5

3203.6

35.0877193

phi-4-mini-reasoning

INT4-MIXED

32

68.8

28.5

3201

35.0877193

gemma-2b-it

INT8-CW

32

39.7

28.6

3402.1

34.96503497

phi-4-mini-reasoning

INT4-MIXED

32

68.1

28.8

3353.2

34.72222222

gemma-3-4b-it

INT4-MIXED

32

92.6

29.1

4554.7

34.36426117

phi-4-mini-instruct

INT4-MIXED

1024

378.2

29.4

3422.6

34.01360544

phi-4-mini-reasoning

INT4-MIXED

1024

378.3

29.4

3423.3

34.01360544

qwen3-4b

INT4-MIXED

1024

353

29.4

3445

34.01360544

gemma-2b-it

INT8-CW

1024

217.2

29.5

3536.8

33.89830508

afm-4.5b

INT4-MIXED

1024

387.2

29.7

3830.9

33.67003367

phi-3-mini-4k-instruct

INT4-MIXED

1024

506

29.8

3829

33.55704698

phi-3.5-vision-instruct

INT4-MIXED

802

507.2

29.8

5366.7

33.55704698

llama-3.2-1b-instruct

FP16

32

35.9

29.9

3306.9

33.44481605

phi-3.5-mini-instruct

INT4-MIXED

1024

545.3

30

3761.1

33.33333333

gemma-3-4b-it

INT4-MIXED

32

90.9

30.1

4690.4

33.22259136

phi-4-mini-reasoning

INT4-MIXED

1024

380.9

30.2

3446.4

33.11258278

internvl2-4b

INT4-MIXED

1027

590

30.3

5622.2

33.00330033

phi-4-mini-instruct

INT4-MIXED

1024

384.7

30.3

3478.8

33.00330033

gemma-3-4b-it

INT4-MIXED

32

94.1

30.3

4742.4

33.00330033

qwen3-4b

INT4-MIXED

1024

362.9

30.4

3593.4

32.89473684

phi-4-mini-instruct

INT4-MIXED

32

74.3

30.4

3501.2

32.89473684

llama-3.2-1b-instruct

FP16

1024

136.4

30.5

3445.6

32.78688525

phi-4-mini-reasoning

INT4-MIXED

1024

403

30.6

3585.7

32.67973856

phi-3.5-vision-instruct

INT4-MIXED

1032

616.5

30.7

5853.6

32.5732899

gemma-2-2b

INT8-CW

33

55.4

30.7

3596.8

32.5732899

internvl2-4b

INT4-MIXED

1027

596.5

30.8

5620.7

32.46753247

gemma-3-4b-it

INT4-MIXED

1024

410.6

30.8

6846.8

32.46753247

phi-2

INT8-CW

32

78.8

31.7

3736.3

31.54574132

gemma-3-4b-it

INT4-MIXED

1024

427.8

31.8

6962.5

31.44654088

gemma-3-4b-it

INT4-MIXED

1024

448.9

32

7037.8

31.25

gemma-2-2b

INT8-CW

1025

245

32.2

3825.7

31.05590062

phi-4-mini-instruct

INT4-MIXED

1024

472.2

32.2

3645.3

31.05590062

minicpm-1b-sft

FP16

31

62

32.7

3677.6

30.58103976

chatglm3-6b

INT4-MIXED

32

49.3

33.3

4117.6

30.03003003

qwen2.5-1.5b-instruct

FP16

32

40.7

33.5

3812.1

29.85074627

deepseek-r1-distill-qwen-1.5b

FP16

32

41.8

33.6

4351.5

29.76190476

qwen2.5-coder-3b-instruct

INT8-CW

32

71.4

33.9

3890.2

29.49852507

phi-2

INT8-CW

1024

342.2

34

4182.9

29.41176471

minicpm-1b-sft

FP16

1014

186.7

34.2

3974.8

29.23976608

qwen2.5-1.5b-instruct

FP16

1024

166.7

34.3

3926.1

29.15451895

deepseek-r1-distill-qwen-1.5b

FP16

1024

165

34.4

4461.5

29.06976744

flan-t5-xxl

INT4-MIXED

33

58.2

34.5

12873.2

28.98550725

phi-4-multimodal-instruct

INT4-MIXED

578

484.1

34.8

5784.4

28.73563218

gpt-oss-20b

INT4-MIXED

32

228.1

35.1

13065.5

28.49002849

chatglm3-6b

INT4-MIXED

32

50

35.2

4471.1

28.40909091

chatglm3-6b

INT4-MIXED

1024

464.1

35.3

4286.6

28.3286119

qwen2.5-coder-3b-instruct

INT8-CW

1024

281.7

35.3

3997.4

28.3286119

phi-4-multimodal-instruct

INT4-MIXED

786

577

36.1

6726.3

27.70083102

gpt-oss-20b

INT4-MIXED

1024

716

36.8

13273.7

27.17391304

llama-2-7b-chat-hf

INT4-MIXED

32

62.5

36.8

4493.9

27.17391304

codellama-7b

INT4-MIXED

32

65.2

36.9

4495.3

27.100271

phi-4-multimodal-instruct

INT4-MIXED

1570

1350.2

37

8869.1

27.02702703

llama-3.2-3b-instruct

INT8-CW

32

55.6

37

4116

27.02702703

phi-4-multimodal-instruct

INT4-MIXED

1362

1206

37.1

8188.8

26.9541779

chatglm3-6b

INT4-MIXED

1024

540.3

37.2

4520.4

26.88172043

llama-3.2-3b-instruct

INT8-CW

1024

283.4

38.6

4343.2

25.90673575

llama-2-7b-chat-hf

INT4-MIXED

32

55.5

38.6

4671

25.90673575

minicpm3-4b

INT4-MIXED

32

204.2

38.6

3565.6

25.90673575

biomistral-7b-slerp

INT4-MIXED

7

47.3

38.6

4512.9

25.90673575

zephyr-7b-beta

INT4-MIXED

32

62.8

38.8

4526.7

25.77319588

mistral-7b-instruct-v0.2

INT4-MIXED

32

62.9

38.9

4525.4

25.70694087

mistral-7b-instruct-v0.3

INT4-MIXED

32

61.9

39

4531.9

25.64102564

neural-chat-7b-v3-3

INT4-MIXED

32

62.7

39

4522.5

25.64102564

codellama-7b

INT4-MIXED

32

56.7

39.5

4871.4

25.3164557

minicpm3-4b

INT4-MIXED

32

206.9

39.5

3632.2

25.3164557

falcon-7b-instruct

INT4-MIXED

32

64.2

40.3

4332.1

24.81389578

minicpm3-4b

INT4-MIXED

32

206.6

40.4

3854.5

24.75247525

llama-2-7b-chat-hf

INT4-MIXED

1024

534.2

40.6

5094.6

24.63054187

codellama-7b

INT4-MIXED

1024

530.8

40.8

5096.4

24.50980392

mistral-7b-instruct-v0.2

INT4-MIXED

32

59.1

40.9

4704

24.44987775

mistral-7b-instruct-v0.3

INT4-MIXED

32

59.1

40.9

4700.7

24.44987775

phi-4-multimodal-instruct

INT4-MIXED

578

559.7

41.1

6677.2

24.33090024

stable-diffusion-xl-1.0-inpainting-0.1

INT8-CW

32

42.6

41.1

6862.7

24.33090024

phi-4-multimodal-instruct

INT4-MIXED

786

674.1

41.2

7218.8

24.27184466

zephyr-7b-beta

INT4-MIXED

1024

543.5

41.4

4833.1

24.15458937

biomistral-7b-slerp

INT4-MIXED

7

47.1

41.4

4912.6

24.15458937

mistral-7b-instruct-v0.2

INT4-MIXED

1024

545.5

41.5

4831.8

24.09638554

mistral-7b-instruct-v0.1

INT4-MIXED

32

59.8

41.5

4829.2

24.09638554

mistral-7b-instruct-v0.2

INT4-MIXED

32

58.7

41.5

4826.9

24.09638554

neural-chat-7b-v3-3

INT4-MIXED

32

58.5

41.5

4825.9

24.09638554

mistral-7b-instruct-v0.3

INT4-MIXED

1024

545

41.6

4838.2

24.03846154

neural-chat-7b-v3-3

INT4-MIXED

1024

556.6

41.6

4829.7

24.03846154

mistral-7b-instruct-v0.3

INT4-MIXED

32

58.8

41.6

4836.5

24.03846154

deepseek-r1-distill-qwen-7b

INT4-MIXED

32

60.8

41.9

5072.2

23.86634845

qwen2.5-7b-instruct

INT4-MIXED

32

60.8

41.9

5069.7

23.86634845

qwen2.5-7b-instruct

INT4-MIXED

32

59.8

41.9

5068.4

23.86634845

qwen2.5-7b-instruct-1m

INT4-MIXED

32

60.2

41.9

5069.1

23.86634845

falcon-7b-instruct

INT4-MIXED

1024

572.1

42.2

4476.9

23.69668246

llama-2-7b-chat-hf

INT4-MIXED

1024

557.5

42.5

5161.9

23.52941176

phi-4-multimodal-instruct

INT4-MIXED

1570

1535

42.6

9687.5

23.4741784

stable-diffusion-xl-1.0-inpainting-0.1

INT8-CW

32

44.1

42.6

6896.5

23.4741784

minicpm3-4b

INT4-MIXED

1024

756.2

42.7

4524.9

23.41920375

llama-3-8b-instruct

INT4-MIXED

32

67

42.7

5305.2

23.41920375

deepseek-r1-distill-llama-8b

INT4-MIXED

32

68.4

42.8

5303.7

23.36448598

llama-3.1-8b-instruct

INT4-MIXED

32

68

42.8

5302.2

23.36448598

phi-4-multimodal-instruct

INT4-MIXED

1362

1388

43

8972.4

23.25581395

codellama-7b

INT4-MIXED

1024

623.6

43.2

5334.6

23.14814815

minicpm3-4b

INT4-MIXED

1024

828.5

43.2

4587.8

23.14814815

minicpm-v-2_6

INT4-MIXED

228

547.3

43.3

6302.9

23.09468822

phi-3-mini-4k-instruct

INT8-CW

32

62.9

43.3

4722.9

23.09468822

mistral-7b-instruct-v0.3

INT4-MIXED

1024

565.5

43.4

4913.5

23.04147465

deepseek-r1-distill-qwen-7b

INT4-MIXED

32

60

43.4

5252.5

23.04147465

phi-3.5-mini-instruct

INT8-CW

32

62

43.4

4724.1

23.04147465

qwen2.5-7b-instruct

INT4-MIXED

32

59.2

43.4

5254.5

23.04147465

deepseek-r1-distill-qwen-7b

INT4-MIXED

1024

500.6

43.5

5277

22.98850575

mistral-7b-instruct-v0.2

INT4-MIXED

1024

560.9

43.5

4909.1

22.98850575

qwen2.5-7b-instruct

INT4-MIXED

1024

486.3

43.5

5275.5

22.98850575

qwen2.5-7b-instruct-1m

INT4-MIXED

1024

487

43.5

5274

22.98850575

qwen2.5-7b-instruct

INT4-MIXED

32

59.3

43.5

5252.6

22.98850575

qwen2-7b-instruct

INT4-MIXED

32

59.4

43.5

5255.5

22.98850575

qwen2.5-7b-instruct

INT4-MIXED

1024

489.2

43.6

5273.3

22.93577982

minicpm-o-2_6

INT4-MIXED

238

539.4

43.6

6409.8

22.93577982

minicpm3-4b

INT4-MIXED

1024

867

43.8

4743.3

22.83105023

mistral-7b-instruct-v0.1

INT4-MIXED

1025

694

43.9

4969.7

22.77904328

neural-chat-7b-v3-3

INT4-MIXED

1024

653.5

43.9

4980.3

22.77904328

deepseek-r1-distill-qwen-7b

INT4-MIXED

32

59

43.9

5458.8

22.77904328

qwen2.5-7b-instruct

INT4-MIXED

32

59.2

43.9

5370.3

22.77904328

qwen2.5-7b-instruct

INT4-MIXED

32

59.6

43.9

5373.5

22.77904328

qwen2.5-7b-instruct-1m

INT4-MIXED

32

59.2

43.9

5367.9

22.77904328

qwen2-7b-instruct

INT4-MIXED

32

59.4

43.9

5370.4

22.77904328

mistral-7b-instruct-v0.2

INT4-MIXED

1024

646

44

4978.7

22.72727273

mistral-7b-instruct-v0.3

INT4-MIXED

1024

646.2

44.1

4988.5

22.67573696

bloomz-7b1

INT4-MIXED

32

62.8

44.4

5010.2

22.52252252

falcon-7b-instruct

INT4-MIXED

32

62.3

44.6

5026.1

22.42152466

minicpm4-8b

INT4-MIXED

32

65.9

44.7

5111.5

22.37136465

minicpm-v-2_6

INT4-MIXED

228

570.2

44.8

6456.6

22.32142857

llama-3-8b-instruct

INT4-MIXED

32

64.5

44.8

5461.5

22.32142857

llama-3-8b-instruct

INT4-MIXED

32

63.5

44.8

5464

22.32142857

deepseek-r1-distill-llama-8b

INT4-MIXED

32

63.4

44.9

5464.9

22.27171492

llama-3.1-8b-instruct

INT4-MIXED

32

63.6

44.9

5462.2

22.27171492

phi-4-mini-instruct

INT8-CW

32

81.9

44.9

4649.1

22.27171492

deepseek-r1-distill-qwen-7b

INT4-MIXED

1024

520.8

45.1

5369.9

22.172949

qwen2.5-7b-instruct

INT4-MIXED

1024

509.2

45.1

5371.1

22.172949

qwen2-7b-instruct

INT4-MIXED

1024

518.6

45.1

5354.4

22.172949

qwen2.5-7b-instruct

INT4-MIXED

1024

513.8

45.2

5354.4

22.12389381

phi-4-mini-reasoning

INT8-CW

32

79.8

45.2

4648.3

22.12389381

llama-3-8b-instruct

INT4-MIXED

1024

545.6

45.3

5611.1

22.07505519

deepseek-r1-distill-llama-8b

INT4-MIXED

1024

547.8

45.4

5610.8

22.02643172

llama-3.1-8b-instruct

INT4-MIXED

1024

549.3

45.4

5609.5

22.02643172

minicpm-v-2_6

INT4-MIXED

228

585.5

45.4

6679.8

22.02643172

llama-3-8b-instruct

INT4-MIXED

32

63.6

45.4

5574.1

22.02643172

deepseek-r1-distill-qwen-7b

INT4-MIXED

1024

589.6

45.5

5529.9

21.97802198

minicpm-o-2_6

INT4-MIXED

238

566.7

45.5

6672.9

21.97802198

gemma-3-4b-it

INT8-CW

32

98

45.5

6144.7

21.97802198

phi-3-mini-128k-instruct

INT8-CW

32

84.4

45.5

4725.1

21.97802198

qwen2.5-7b-instruct

INT4-MIXED

1024

587.6

45.6

5438.2

21.92982456

qwen2.5-7b-instruct

INT4-MIXED

1024

589

45.6

5435.3

21.92982456

qwen2.5-7b-instruct-1m

INT4-MIXED

1024

587.6

45.6

5433

21.92982456

qwen2-7b-instruct

INT4-MIXED

1024

588.8

45.6

5433.9

21.92982456

internvl2-4b

INT8-CW

297

219

45.9

5858.2

21.78649237

minicpm-v-4_5

INT4-MIXED

217

567.6

46

6820.1

21.73913043

phi-3-mini-4k-instruct

INT8-CW

1024

489.4

46.1

5207.2

21.69197397

phi-3.5-mini-instruct

INT8-CW

1024

499

46.2

5211.1

21.64502165

qwen3-4b

INT8-CW

32

77.5

46.2

4949.6

21.64502165

qwen3-8b

INT4-MIXED

32

74.5

46.6

5736

21.45922747

falcon-7b-instruct

INT4-MIXED

1024

673.9

46.7

4919.7

21.41327623

minicpm4-8b

INT4-MIXED

1024

574.6

46.7

5276.5

21.41327623

phi-4-mini-instruct

INT8-CW

1024

437.7

46.7

4925.6

21.41327623

afm-4.5b

INT8-CW

32

65.5

46.9

5361.7

21.32196162

bloomz-7b1

INT4-MIXED

32

68

46.9

5258

21.32196162

minicpm4-8b

INT4-MIXED

32

70.4

46.9

5324.8

21.32196162

phi-4-mini-reasoning

INT8-CW

1024

431

47

4924.2

21.27659574

qwen3-8b

INT4-MIXED

1024

631.5

47.2

5812.6

21.18644068

qwen3-8b

INT4-MIXED

32

72.9

47.2

5760.9

21.18644068

deepseek-r1-distill-llama-8b

INT4-MIXED

1024

568.7

47.3

5685

21.14164905

gemma-3-4b-it

INT8-CW

1024

462.4

47.3

8462.5

21.14164905

llama-3-8b-instruct

INT4-MIXED

1024

575.1

47.3

5683.6

21.14164905

llama-3-8b-instruct

INT4-MIXED

1024

571.7

47.3

5685

21.14164905

llama-3.1-8b-instruct

INT4-MIXED

1024

571.8

47.4

5699.6

21.09704641

minicpm4-8b

INT4-MIXED

32

70.4

47.5

5535.6

21.05263158

phi-3.5-vision-instruct

INT8-CW

802

521.6

47.8

6783.7

20.92050209

bloomz-7b1

INT4-MIXED

1024

678.6

48

5571.2

20.83333333

llama-3-8b-instruct

INT4-MIXED

1024

642.2

48.1

5755.2

20.79002079

qwen3-8b

INT4-MIXED

32

101.2

48.1

5503.4

20.79002079

phi-3.5-vision-instruct

INT8-CW

1032

627.9

48.5

7122.5

20.6185567

afm-4.5b

INT8-CW

1024

381.9

48.5

5519.4

20.6185567

phi-3-mini-128k-instruct

INT8-CW

1024

519.2

48.5

5210.2

20.6185567

minicpm-v-4_5

INT4-MIXED

217

591

48.8

7147.1

20.49180328

internvl2-4b

INT8-CW

1027

627.6

48.9

6898.9

20.44989775

qwen3-4b

INT8-CW

1024

402.4

48.9

5232.3

20.44989775

qwen3-8b

INT4-MIXED

1024

592.8

49.2

5997.3

20.32520325

minicpm4-8b

INT4-MIXED

1024

617.5

49.3

5363.9

20.28397566

minicpm4-8b

INT4-MIXED

1024

691.7

49.6

5565.2

20.16129032

qwen3-8b

INT4-MIXED

1024

668.1

49.9

5954.1

20.04008016

zephyr-7b-beta

INT4-MIXED

32

67

49.9

5438.5

20.04008016

glm-4-9b-chat-hf

INT4-MIXED

32

77.2

50.1

6038.3

19.96007984

bloomz-7b1

INT4-MIXED

1024

786.7

50.3

5796

19.88071571

qwen2.5-vl-7b-instruct

INT4-MIXED

32

185.1

50.3

6220.1

19.88071571

gemma-7b-it

INT4-MIXED

32

74.4

51.1

5633.1

19.56947162

baichuan2-7b-chat

INT4-MIXED

32

67.5

51.5

6147.4

19.41747573

qwen2.5-vl-7b-instruct

INT4-MIXED

1024

643.1

52.1

7603.7

19.19385797

glm-4-9b-chat-hf

INT4-MIXED

32

74.3

52.1

6334.3

19.19385797

zephyr-7b-beta

INT4-MIXED

1024

571.9

52.4

5700

19.08396947

gpt-oss-20b

INT4-MIXED

32

273.3

52.4

12343.5

19.08396947

glm-4-9b-chat-hf

INT4-MIXED

32

74.1

52.7

6482.8

18.97533207

glm-4-9b-chat-hf

INT4-MIXED

1024

674.3

52.9

6220.4

18.90359168

qwen2.5-vl-7b-instruct

INT4-MIXED

32

186.8

53

6434.9

18.86792453

qwen2.5-vl-7b-instruct

INT4-MIXED

32

187.3

53.3

6550.4

18.76172608

gemma-7b-it

INT4-MIXED

32

78.7

53.6

5931.9

18.65671642

gpt-oss-20b

INT4-MIXED

1024

691.4

53.8

12528.5

18.58736059

qwen2.5-vl-7b-instruct

INT4-MIXED

1024

681.6

54.1

7682

18.48428835

qwen2.5-vl-7b-instruct

INT4-MIXED

1024

746.2

54.3

7768.1

18.41620626

ltx-video

INT4-MIXED

11

55.5

54.5

6160.4

18.34862385

glm-4-9b-chat-hf

INT4-MIXED

1024

692.1

54.8

6413.3

18.24817518

phi-4-multimodal-instruct

INT8-CW

578

524.6

54.9

7564.4

18.21493625

gemma-7b-it

INT4-MIXED

1024

640.3

55

6203.8

18.18181818

phi-4-multimodal-instruct

INT8-CW

786

631.8

55.2

8482

18.11594203

baichuan2-7b-chat

INT4-MIXED

1024

650

55.4

6694.2

18.05054152

glm-4-9b-chat-hf

INT4-MIXED

1024

778.2

55.5

6481.4

18.01801802

llava-next-video-7b-hf

INT4-MIXED

2945

3136.4

56.3

9258

17.76198934

minicpm3-4b

INT8-CW

32

197

56.3

5459.5

17.76198934

phi-4-multimodal-instruct

INT8-CW

1570

1471.2

56.5

10810.3

17.69911504

llama-3.1-8b-instruct

INT4-MIXED

32

75.6

56.5

6452.1

17.69911504

phi-4-multimodal-instruct

INT8-CW

1362

1318.5

56.6

9941.4

17.66784452

deepseek-r1-distill-llama-8b

INT4-MIXED

32

75

56.6

6541.9

17.66784452

ltx-video

INT8-CW

11

57.3

56.7

9508.1

17.6366843

phi-2

FP16

32

66.7

57.6

6547.1

17.36111111

gemma-2-9b-it

INT4-MIXED

32

91.9

57.7

6056.6

17.33102253

gemma-7b-it

INT4-MIXED

1024

749.6

57.8

6465.9

17.30103806

llama-3.1-8b-instruct

INT4-MIXED

1024

711.3

58.8

6550.5

17.00680272

gemma-2-2b

FP16

33

66.4

58.8

7241.3

17.00680272

deepseek-r1-distill-llama-8b

INT4-MIXED

1024

718.4

58.9

6642.4

16.97792869

gemma-2-9b-it

INT4-MIXED

32

83.1

59.8

6276.2

16.72240803

gemma-2-2b

FP16

1025

335.5

60.4

7549.3

16.55629139

gemma-2-9b-it

INT4-MIXED

32

84.3

60.5

6434.5

16.52892562

gemma-2b-it

FP16

32

68.5

60.9

5774

16.42036125

phi-2

FP16

1024

423.8

61.6

7173

16.23376623

gemma-2b-it

FP16

1024

260.4

61.7

5915.1

16.20745543

gemma-2-9b-it

INT4-MIXED

1024

717.1

62.2

6555.7

16.07717042

minicpm3-4b

INT8-CW

1024

902.9

62.3

6404.7

16.05136437

stable-diffusion-xl-1.0-inpainting-0.1

FP16

32

64.1

63.1

9346.9

15.84786054

gemma-2-9b-it

INT4-MIXED

1024

741.3

64.6

6761

15.47987616

gemma-2-9b-it

INT4-MIXED

1024

806.7

65.2

6850

15.33742331

lcm-dreamshaper-v7

INT8-HYBRID

32

69.5

66.3

3567.4

15.08295626

lcm-dreamshaper-v7

INT8-HYBRID

1024

67.9

66.7

3273.3

14.99250375

chatglm3-6b

INT8-CW

32

81.8

67.1

6940.8

14.90312966

llama-3.2-3b-instruct

FP16

32

78

67.4

7170.1

14.83679525

dolly-v2-12b

INT4-MIXED

32

104.2

68.2

7600.2

14.6627566

chatglm3-6b

INT8-CW

1024

530.4

69

7102.7

14.49275362

llama-3.2-3b-instruct

FP16

1024

410.6

69.2

7458.2

14.45086705

llama-2-13b-chat-hf

INT4-MIXED

32

102.9

70.3

7696.4

14.22475107

qwen2.5-coder-3b-instruct

FP16

32

80

72.7

6876.7

13.75515818

dolly-v2-12b

INT4-MIXED

1024

1257.7

73.3

8405.5

13.6425648

gemma-3-12b-it

INT4-MIXED

32

168

73.5

9114.4

13.60544218

flan-t5-xxl

INT8-CW

33

247.2

73.7

22627.8

13.56852103

qwen2.5-coder-3b-instruct

FP16

1024

317.4

74.3

7032

13.4589502

falcon-7b-instruct

INT8-CW

32

93.2

74.7

7580.7

13.38688086

qwen2.5-7b-instruct-1m

INT8-CW

32

100.5

74.9

8186

13.35113485

qwen2.5-7b-instruct

INT8-CW

32

99

75

8186.7

13.33333333

qwen2.5-7b-instruct

INT8-CW

32

98.3

75

8185.9

13.33333333

qwen2-7b-instruct

INT8-CW

32

99.6

75

8187.4

13.33333333

deepseek-r1-distill-qwen-7b

INT8-CW

32

99.3

75.1

8187.5

13.31557923

llama-2-13b-chat-hf

INT4-MIXED

1024

1016.5

75.9

8610.4

13.17523057

falcon-7b-instruct

INT8-CW

1024

830.2

76.5

7740.6

13.07189542

qwen2.5-7b-instruct

INT8-CW

1024

577.1

76.5

8395.8

13.07189542

minicpm-o-2_6

INT8-CW

238

550

76.5

9494

13.07189542

minicpm-v-2_6

INT8-CW

228

557.2

76.5

9481.8

13.07189542

deepseek-r1-distill-qwen-7b

INT8-CW

1024

581.4

76.6

8395.8

13.05483029

qwen2-7b-instruct

INT8-CW

1024

575.5

76.6

8395.1

13.05483029

qwen2.5-7b-instruct

INT8-CW

1024

575.5

76.7

8395.9

13.03780965

qwen2.5-7b-instruct-1m

INT8-CW

1024

577.2

76.7

8397

13.03780965

llama-2-7b-chat-hf

INT8-CW

32

92.2

76.7

7681.5

13.03780965

gemma-3-12b-it

INT4-MIXED

32

174.6

77.3

9390.6

12.93661061

phi-3-mini-128k-instruct

FP16

32

86.9

77.5

8582.3

12.90322581

gemma-3-12b-it

INT4-MIXED

1024

1105.7

77.6

11768.6

12.88659794

codellama-7b

INT8-CW

32

93.6

77.6

7680.1

12.88659794

phi-4

INT4-MIXED

32

114.5

77.6

8505.8

12.88659794

phi-4-reasoning

INT4-MIXED

32

114.9

77.6

8506.8

12.88659794

phi-3.5-mini-instruct

FP16

32

90.1

77.7

8487

12.87001287

phi-3-mini-4k-instruct

FP16

32

88.5

77.7

8486.7

12.87001287

lcm-dreamshaper-v7

INT8-CW

1024

81.5

78.3

3490.1

12.77139208

lcm-dreamshaper-v7

INT8-CW

32

79.8

78.6

5041

12.72264631

deepseek-r1-distill-qwen-14b

INT4-MIXED

32

109.6

79.2

8841.3

12.62626263

lcm-dreamshaper-v7

FP16

32

80

79.3

4024.1

12.61034048

lcm-dreamshaper-v7

FP16

1024

80.2

79.4

3930.4

12.59445844

lcm-dreamshaper-v7

INT8-CW

1024

83.4

79.5

3493.8

12.57861635

lcm-dreamshaper-v7

INT8-CW

32

80.6

79.6

5194.2

12.56281407

qwen1.5-14b-chat

INT4-MIXED

32

116.6

79.7

9202.9

12.54705144

internvl2-4b

FP16

297

287.7

79.8

9597.4

12.53132832

phi-4-mini-reasoning

FP16

32

91.2

80.1

8410.1

12.48439451

llama-2-7b-chat-hf

INT8-CW

1024

592.1

80.4

8262.7

12.43781095

phi-4-mini-instruct

FP16

32

92.1

80.7

8406.9

12.39157373

phi-4-reasoning

INT4-MIXED

32

122.8

80.7

8926

12.39157373

baichuan2-7b-chat

INT8-CW

32

97.9

80.9

8456.2

12.36093943

codellama-7b

INT8-CW

1024

682.5

81.2

8263.5

12.31527094

mistral-7b-instruct-v0.2

INT8-CW

32

102.7

81.2

7862.6

12.31527094

phi-4-reasoning

INT4-MIXED

1024

1107.8

81.3

8982.9

12.300123

zephyr-7b-beta

INT8-CW

32

99.3

81.3

7861.6

12.300123

gemma-3-12b-it

INT4-MIXED

1024

1232.1

81.4

12032.5

12.28501229

phi-4

INT4-MIXED

1024

1122.2

81.5

8982.4

12.26993865

phi-4

INT4-MIXED

32

123

81.5

9155.6

12.26993865

phi-3-mini-128k-instruct

FP16

1024

567

81.6

9341.5

12.25490196

biomistral-7b-slerp

INT8-CW

7

88.5

81.7

7852.1

12.23990208

phi-3.5-mini-instruct

FP16

1024

565.9

81.9

9247.1

12.21001221

mistral-7b-instruct-v0.1

INT8-CW

32

100.6

82

7860.3

12.19512195

neural-chat-7b-v3-3

INT8-CW

32

98

82

7858.8

12.19512195

phi-3.5-vision-instruct

FP16

802

699.6

82.1

10357.6

12.18026797

phi-3-mini-4k-instruct

FP16

1024

569.5

82.2

9247

12.16545012

mistral-7b-instruct-v0.3

INT8-CW

32

98.6

82.2

7868.9

12.16545012

phi-4-mini-reasoning

FP16

1024

554.2

82.4

8729.5

12.13592233

internvl2-4b

FP16

1027

849.8

82.7

10403.4

12.09189843

phi-4-mini-instruct

FP16

1024

558.5

82.8

8730.7

12.07729469

phi-3.5-vision-instruct

FP16

1032

866.9

83

10576.5

12.04819277

deepseek-r1-distill-qwen-14b

INT4-MIXED

1024

1028.7

83.3

9242.7

12.00480192

deepseek-r1-distill-qwen-14b

INT4-MIXED

32

114.8

83.4

9494.7

11.99040767

mistral-7b-instruct-v0.2

INT8-CW

1024

639.4

83.7

8168.7

11.9474313

qwen2.5-vl-7b-instruct

INT8-CW

32

218

83.7

9415.1

11.9474313

zephyr-7b-beta

INT8-CW

1024

612.5

83.8

8163.9

11.93317422

mistral-7b-instruct-v0.1

INT8-CW

1025

660.7

84.3

8155.5

11.8623962

neural-chat-7b-v3-3

INT8-CW

1024

613.9

84.5

8161.2

11.83431953

phi-4-reasoning

INT4-MIXED

1024

1158.4

84.5

9270.2

11.83431953

baichuan2-7b-chat

INT8-CW

1024

647.5

84.6

9043.3

11.82033097

mistral-7b-instruct-v0.3

INT8-CW

1024

616.3

84.6

8170.7

11.82033097

llama-3.1-8b-instruct

INT8-CW

32

104.5

84.7

8640.4

11.80637544

bloomz-7b1

INT8-CW

32

95.2

84.9

7994.2

11.77856302

qwen2.5-vl-7b-instruct

INT8-CW

1024

752

85

10806.4

11.76470588

llama-3-8b-instruct

INT8-CW

32

102.9

85.1

8640

11.75088132

qwen1.5-14b-chat

INT4-MIXED

1024

1154.6

85.3

10086.1

11.72332943

phi-4

INT4-MIXED

1024

1306.3

85.4

9376.9

11.70960187

deepseek-r1-distill-llama-8b

INT8-CW

32

104.1

85.6

8726.7

11.68224299

qwen3-4b

FP16

32

96.4

85.7

8773.1

11.66861144

phi-4-reasoning

INT4-MIXED

32

123.9

86.3

9505.1

11.58748552

llama-3.1-8b-instruct

INT8-CW

1024

618.6

87

8944.8

11.49425287

deepseek-r1-distill-qwen-14b

INT4-MIXED

1024

1205.8

87.4

9653.2

11.4416476

llama-3-8b-instruct

INT8-CW

1024

618.6

87.4

8944.1

11.4416476

deepseek-r1-distill-llama-8b

INT8-CW

1024

622.9

87.7

9040

11.40250855

gemma-3-4b-it

FP16

32

127.8

87.9

10778.6

11.37656428

llama-2-13b-chat-hf

INT4-MIXED

32

121.4

87.9

9388.1

11.37656428

bloomz-7b1

INT8-CW

1024

792.8

88.3

8547.2

11.32502831

llava-v1.6-mistral-7b-hf

INT8-CW

2944

2891.8

88.8

10236.5

11.26126126

qwen3-4b

FP16

1024

496.2

88.9

9122.1

11.24859393

afm-4.5b

FP16

32

101.1

89.1

9829.9

11.22334456

gemma-3-4b-it

FP16

1024

605

89.3

13129.3

11.19820829

starcoder

INT4-MIXED

32

141.2

89.3

9249.4

11.19820829

phi-4-reasoning

INT4-MIXED

1024

1365.3

90.2

9669.2

11.0864745

afm-4.5b

FP16

1024

496.9

90.9

10076.8

11.00110011

qwen3-8b

INT8-CW

32

105.3

90.9

8916.7

11.00110011

minicpm-v-4_5

INT8-CW

217

571.7

92.3

10151.6

10.83423619

starcoder

INT4-MIXED

1024

1426.3

92.7

9069.2

10.78748652

minicpm4-8b

INT8-CW

32

111.1

93

8735.4

10.75268817

qwen3-8b

INT8-CW

1024

664.4

93.4

9222.7

10.70663812

gemma-7b-it

INT8-CW

32

119.4

93.4

9334.7

10.70663812

llama-2-13b-chat-hf

INT4-MIXED

1024

1164.9

93.5

10230

10.69518717

llava-next-video-7b-hf

INT8-CW

2945

3277.1

93.6

12069.4

10.68376068

minicpm4-8b

INT8-CW

1024

727.2

95.2

8900.9

10.50420168

minicpm3-4b

FP16

32

176.5

95.5

9596.7

10.47120419

glm-4-9b-chat-hf

INT8-CW

32

118.5

96.9

10027.2

10.31991744

gemma-7b-it

INT8-CW

1024

764.7

97.2

9901.3

10.28806584

phi-4-multimodal-instruct

FP16

578

659.8

99.3

12503.4

10.07049345

glm-4-9b-chat-hf

INT8-CW

1024

782.8

99.8

10198.1

10.02004008
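The "2nd token per sec" column is derived directly from the second-token latency: throughput = 1000 ms / (2nd latency in ms). A minimal sketch of that conversion, spot-checked against two rows from the table above:

```python
# Throughput in these tables is the reciprocal of the per-token decode latency:
# tokens/sec = 1000 ms / (2nd latency in ms).

def tokens_per_sec(second_token_latency_ms: float) -> float:
    """Convert second-token (decode) latency in milliseconds to tokens/sec."""
    return 1000.0 / second_token_latency_ms

# Spot-check against table rows: (model/precision, 2nd latency ms, reported tokens/sec)
rows = [
    ("minicpm4-0.5b INT4-MIXED", 7.6, 131.5789474),
    ("glm-4-9b-chat-hf INT8-CW", 99.8, 10.02004008),
]
for name, latency_ms, reported in rows:
    assert abs(tokens_per_sec(latency_ms) - reported) < 1e-3, name
```

The first-token latency is reported separately because prefill cost scales with input size, while decode throughput is largely independent of it.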

| Topology | Precision | Input Size | 1st latency (ms) | 2nd latency (ms) | max rss memory | 2nd token per sec |
| --- | --- | --- | --- | --- | --- | --- |

| t5-small | FP16 | 1024 | 6.7 | 3.1 | 1301.9 | 322.5806452 |
| t5-small | FP16 | 32 | 6.3 | 3.1 | 1180.4 | 322.5806452 |
| t5-small | INT4-MIXED | 1024 | 7.2 | 3.2 | 1129.4 | 312.5 |
| t5-small | INT4-MIXED | 1024 | 7.3 | 3.2 | 1022.7 | 312.5 |
| t5-small | INT4-MIXED | 1024 | 7.2 | 3.2 | 1125.6 | 312.5 |
| t5-small | INT8-CW | 1024 | 7.5 | 3.2 | 1169.2 | 312.5 |
| t5-small | INT4-MIXED | 32 | 6.8 | 3.3 | 1019.4 | 303.030303 |
| t5-small | INT4-MIXED | 32 | 6.8 | 3.3 | 909.2 | 303.030303 |
| t5-small | INT4-MIXED | 32 | 7 | 3.3 | 1009.1 | 303.030303 |
| t5-small | INT8-CW | 32 | 6.8 | 3.4 | 1053.2 | 294.1176471 |
| minicpm4-0.5b | INT4-MIXED | 32 | 16.3 | 4.1 | 1056.3 | 243.902439 |
| minicpm4-0.5b | INT4-MIXED | 32 | 17.6 | 4.2 | 1195.6 | 238.0952381 |
| minicpm4-0.5b | INT4-MIXED | 1024 | 23.6 | 4.2 | 1109.1 | 238.0952381 |
| minicpm4-0.5b | INT4-MIXED | 32 | 16.4 | 4.2 | 1155.1 | 238.0952381 |
| minicpm4-0.5b | INT4-MIXED | 1024 | 29.1 | 4.4 | 1259.2 | 227.2727273 |
| minicpm4-0.5b | INT4-MIXED | 1024 | 24.4 | 4.4 | 1190.9 | 227.2727273 |
| gemma-3-270m | INT4-MIXED | 32 | 17.4 | 4.5 | 1207 | 222.2222222 |
| gemma-3-270m | INT8-CW | 1024 | 21.2 | 4.5 | 1285.8 | 222.2222222 |
| gemma-3-270m | INT8-CW | 32 | 18 | 4.5 | 1241.3 | 222.2222222 |
| whisper-large-v3-turbo | INT4-MIXED | 1024 | 173.9 | 4.5 | 1711.3 | 222.2222222 |
| distil-large-v2 | INT4-MIXED | 1024 | 163.7 | 4.6 | 1639.3 | 217.3913043 |
| gemma-3-270m | INT4-MIXED | 1024 | 20.2 | 4.6 | 1256.1 | 217.3913043 |
| distil-large-v2 | INT4-MIXED | 32 | 111.3 | 4.7 | 1615.2 | 212.7659574 |
| whisper-large-v3-turbo | INT4-MIXED | 32 | 117.8 | 4.7 | 1678.5 | 212.7659574 |
| distil-large-v2 | INT8-CW | 1024 | 155.4 | 4.8 | 1899 | 208.3333333 |
| whisper-large-v3-turbo | INT8-CW | 1024 | 166.9 | 4.8 | 1996 | 208.3333333 |
| distil-large-v2 | INT8-CW | 32 | 105.3 | 4.9 | 1866 | 204.0816327 |
| whisper-large-v3-turbo | INT8-CW | 32 | 112 | 4.9 | 1963.7 | 204.0816327 |
| minicpm4-0.5b | INT8-CW | 32 | 18.4 | 5.7 | 1315.8 | 175.4385965 |
| whisper-small | INT4-MIXED | 32 | 66.5 | 6 | 1291.6 | 166.6666667 |
| minicpm4-0.5b | INT8-CW | 1024 | 26.4 | 6.1 | 1364.4 | 163.9344262 |
| whisper-small | INT4-MIXED | 1024 | 118.7 | 6.2 | 1324.4 | 161.2903226 |
| whisper-small | INT4-MIXED | 32 | 71.1 | 6.3 | 1444 | 158.7301587 |
| whisper-small | INT4-MIXED | 1024 | 123.2 | 6.3 | 1477.7 | 158.7301587 |
| whisper-small | INT8-CW | 1024 | 122.4 | 6.3 | 1531.2 | 158.7301587 |
| whisper-small | INT4-MIXED | 1024 | 120.2 | 6.5 | 1427.5 | 153.8461538 |
| whisper-small | INT8-CW | 32 | 69.2 | 6.5 | 1497.2 | 153.8461538 |
| whisper-large-v3-turbo | FP16 | 1024 | 175 | 6.6 | 2752.3 | 151.5151515 |
| whisper-large-v3-turbo | FP16 | 32 | 119.9 | 6.7 | 2721.3 | 149.2537313 |
| distil-large-v2 | FP16 | 32 | 113.5 | 6.8 | 2682.2 | 147.0588235 |
| distil-large-v2 | FP16 | 1024 | 163.8 | 6.8 | 2714.8 | 147.0588235 |
| gemma-3-270m | FP16 | 1024 | 19 | 6.8 | 1565.7 | 147.0588235 |
| gemma-3-270m | FP16 | 32 | 15.6 | 6.8 | 1481 | 147.0588235 |
| whisper-small | INT4-MIXED | 32 | 69.6 | 6.8 | 1396.7 | 147.0588235 |
| nanollava | INT4-MIXED | 760 | 87.9 | 7.1 | 2898.9 | 140.8450704 |
| nanollava | INT4-MIXED | 1752 | 133.7 | 7.7 | 4601 | 129.8701299 |
| nanollava | INT8-CW | 760 | 91 | 7.7 | 2983.1 | 129.8701299 |

llama-3.2-1b-instruct

INT4-MIXED

32

13.8

8.1

1662.4

123.4567901

llama-3.2-1b-instruct

INT4-MIXED

32

13.5

8.1

1639.2

123.4567901

whisper-small

FP16

1024

114.8

8.1

1722.2

123.4567901

whisper-small

FP16

32

62.4

8.2

1691.4

121.9512195

gemma-3-1b-it

INT4-MIXED

32

22.8

8.3

1521.2

120.4819277

gemma-3-1b-it

INT4-MIXED

32

23

8.5

1646.4

117.6470588

llama-3.2-1b-instruct

INT4-MIXED

1024

43.6

8.5

1730.3

117.6470588

gemma-3-1b-it

INT4-MIXED

32

23.9

8.6

1704.5

116.2790698

llama-3.2-1b-instruct

INT4-MIXED

1024

51.2

8.6

1787.3

116.2790698

gemma-3-1b-it

INT4-MIXED

1024

48.9

8.7

1615.7

114.9425287

gemma-3-1b-it

INT4-MIXED

1024

49

8.8

1737.7

113.6363636

minicpm4-0.5b

FP4-NORMALIZED

32

14.8

8.8

1645.9

113.6363636

nanollava

INT8-CW

1752

132.5

8.8

4709

113.6363636

minicpm4-0.5b

FP4-NORMALIZED

1024

31.5

9

1704.9

111.1111111

gemma-3-1b-it

INT4-MIXED

1024

55.7

9.1

1800

109.8901099

minicpm4-0.5b

FP16

32

15.7

9.6

1717.5

104.1666667

minicpm4-0.5b

FP16

1024

32.5

9.8

1790.4

102.0408163

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

32

20.9

9.9

2001.7

101.010101

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

32

20.8

10.1

2049.5

99.00990099

qwen2.5-1.5b-instruct

INT4-MIXED

32

21.3

10.2

1729.2

98.03921569

smolvlm2-256m-video-instruct

INT8-CW

1141

220.7

10.2

2645.7

98.03921569

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

1024

75.7

10.5

2097.6

95.23809524

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

1024

79.8

10.5

2241.5

95.23809524

glm-edge-1.5b-chat

INT4-MIXED

32

33.3

10.5

1809.5

95.23809524

qwen2.5-1.5b-instruct

INT4-MIXED

1024

80.4

10.6

1908

94.33962264

glm-edge-1.5b-chat

INT4-MIXED

1024

116.8

11

1922.2

90.90909091

qwen2.5-1.5b-instruct

INT4-MIXED

32

25.5

11.1

1940.2

90.09009009

qwen2.5-1.5b-instruct

INT4-MIXED

1024

85.7

11.3

2143.6

88.49557522

gemma-3-1b-it

INT8-CW

32

25.5

11.6

1972.8

86.20689655

nanollava

FP16

760

94

11.7

3970.7

85.47008547

gemma-3-1b-it

INT8-CW

1024

61.1

11.9

2052.9

84.03361345

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

32

24.4

12.7

2399

78.74015748

llama-3.2-1b-instruct

INT8-CW

32

15.3

12.8

2121.6

78.125

nanollava

FP16

1752

140.1

12.9

6028.1

77.51937984

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

1024

90.7

13.2

2463

75.75757576

llama-3.2-1b-instruct

INT8-CW

1024

56.7

13.2

2228

75.75757576

smolvlm2-256m-video-instruct

INT4-MIXED

1141

244.6

14.5

2797.1

68.96551724

smolvlm2-256m-video-instruct

FP16

1141

225

15.1

3104.3

66.22516556

deepseek-r1-distill-qwen-1.5b

INT8-CW

32

24.6

15.9

2650.1

62.89308176

qwen2.5-1.5b-instruct

INT8-CW

32

25

16

2336.6

62.5

stable-zephyr-3b-dpo

INT4-MIXED

32

37.1

16

2574.8

62.5

stablelm-3b-4e1t

INT4-MIXED

32

37.3

16

2489.1

62.5

phi-2

INT4-MIXED

32

33

16.2

2475.4

61.72839506

deepseek-r1-distill-qwen-1.5b

INT8-CW

1024

75.2

16.3

2749.2

61.34969325

glm-edge-1.5b-chat

INT8-CW

32

36.1

16.5

2395.9

60.60606061

qwen2.5-1.5b-instruct

INT8-CW

1024

75.3

16.5

2433.3

60.60606061

phi-2

INT4-MIXED

32

33.1

17.2

2829.3

58.13953488

glm-edge-1.5b-chat

INT8-CW

1024

121.6

17.3

2533

57.80346821

whisper-large-v3

INT4-MIXED

32

187.9

17.3

2997.6

57.80346821

gemma-3-1b-it

FP4-NORMALIZED

32

22.1

17.7

2613.4

56.49717514

stable-zephyr-3b-dpo

INT4-MIXED

32

41.4

17.9

2822.7

55.86592179

stablelm-3b-4e1t

INT4-MIXED

1024

163

17.9

2999

55.86592179

whisper-large-v3

INT4-MIXED

1024

245

17.9

3027.1

55.86592179

gemma-3-1b-it

FP4-NORMALIZED

1024

70.8

18

2722.8

55.55555556

stable-zephyr-3b-dpo

INT4-MIXED

1024

163.3

18

3096.8

55.55555556

phi-2

INT4-MIXED

1024

170.4

18.1

3010.8

55.24861878

llama-3.2-3b-instruct

INT4-MIXED

32

25.4

18.3

2632.2

54.64480874

qwen2.5-coder-3b-instruct

INT4-MIXED

32

33

18.4

2593

54.34782609

llama-3.2-3b-instruct

INT4-MIXED

32

27.3

18.8

2686.8

53.19148936

qwen2.5-coder-3b-instruct

INT4-MIXED

32

29.5

18.8

2586.7

53.19148936

llama-3.2-3b-instruct

INT4-MIXED

32

27.6

19

2828.6

52.63157895

qwen2.5-coder-3b-instruct

INT4-MIXED

1024

124.5

19.1

2683.7

52.35602094

phi-2

INT4-MIXED

1024

217

19.2

3241.9

52.08333333

llama-3.2-3b-instruct

INT4-MIXED

1024

126.2

19.3

2896.1

51.8134715

qwen2.5-coder-3b-instruct

INT4-MIXED

32

40.1

19.3

2859.2

51.8134715

stablelm-3b-4e1t

INT4-MIXED

32

41.7

19.4

2943.9

51.54639175

qwen2.5-coder-3b-instruct

INT4-MIXED

1024

124.6

19.5

2711.6

51.28205128

llama-3.2-3b-instruct

INT4-MIXED

1024

128.2

19.9

2929.4

50.25125628

stable-zephyr-3b-dpo

INT4-MIXED

1024

186.3

19.9

3302.6

50.25125628

llama-3.2-3b-instruct

INT4-MIXED

1024

147.6

20

3062.1

50

qwen2.5-coder-3b-instruct

INT4-MIXED

1024

216.2

20

2867

50

phi-3-mini-128k-instruct

INT4-MIXED

32

29

20.1

3060.5

49.75124378

phi-3-mini-4k-instruct

INT4-MIXED

32

29.2

20.1

2957.9

49.75124378

phi-3.5-mini-instruct

INT4-MIXED

32

29.5

20.1

2953.5

49.75124378

gemma-3-1b-it

FP16

32

24.8

20.4

2912.9

49.01960784

whisper-large-v3

INT8-CW

32

188.3

20.4

3603.5

49.01960784

phi-3-mini-4k-instruct

INT4-MIXED

32

29.2

20.7

3033.6

48.30917874

phi-3.5-mini-instruct

INT4-MIXED

32

29.7

20.7

3037.1

48.30917874

qwen3-30b-a3b

INT4-MIXED

32

147.7

20.7

16556.5

48.30917874

whisper-large-v3

INT8-CW

1024

245

20.7

3637.6

48.30917874

qwen3-30b-a3b

INT4-MIXED

32

148.1

20.8

16457.2

48.07692308

gemma-3-1b-it

FP16

1024

73.5

20.9

3062.3

47.84688995

phi-3-mini-128k-instruct

INT4-MIXED

32

30.7

21.1

3214.2

47.39336493

stablelm-3b-4e1t

INT4-MIXED

1024

182.5

21.1

3436.5

47.39336493

phi-3-mini-4k-instruct

INT4-MIXED

32

30.9

22

3367.2

45.45454545

phi-3.5-mini-instruct

INT4-MIXED

32

33

22.2

3370.1

45.04504505

phi-3-mini-128k-instruct

INT4-MIXED

1024

190.3

22.4

3666.8

44.64285714

qwen3-30b-a3b

INT4-MIXED

1024

553

22.4

16809.1

44.64285714

phi-3-mini-4k-instruct

INT4-MIXED

1024

190.8

22.5

3564.4

44.44444444

phi-3.5-mini-instruct

INT4-MIXED

1024

189.8

22.5

3558.3

44.44444444

qwen3-30b-a3b

INT4-MIXED

1024

554.7

22.5

16688.5

44.44444444

internvl2-4b

INT4-MIXED

297

124.2

22.9

4369.1

43.66812227

qwen3-4b

INT4-MIXED

32

36.8

22.9

3065.8

43.66812227

phi-3-mini-4k-instruct

INT4-MIXED

1024

196.2

23.1

3613.7

43.29004329

phi-3.5-mini-instruct

INT4-MIXED

1024

195.4

23.1

3600.9

43.29004329

internvl2-4b

INT4-MIXED

297

125.8

23.2

4420.1

43.10344828

phi-4-mini-instruct

INT4-MIXED

32

41

23.2

3159.8

43.10344828

phi-4-mini-reasoning

INT4-MIXED

32

41.1

23.2

3063.7

43.10344828

llama-3.2-1b-instruct

FP16

32

25.3

23.4

3257.7

42.73504274

phi-3-mini-128k-instruct

INT4-MIXED

1024

252.1

23.5

3736.6

42.55319149

qwen3-4b

INT4-MIXED

32

36.8

23.5

3237.2

42.55319149

llama-3.2-1b-instruct

FP16

1024

78.2

23.8

3416.3

42.01680672

phi-4-mini-instruct

INT4-MIXED

32

41.1

23.8

3231.5

42.01680672

phi-4-mini-reasoning

INT4-MIXED

32

41.4

23.8

3143.8

42.01680672

phi-4-mini-reasoning

INT4-MIXED

32

42.7

24.1

3298

41.49377593

afm-4.5b

INT4-MIXED

32

32.1

24.2

3702.7

41.32231405

phi-3-mini-4k-instruct

INT4-MIXED

1024

271.3

24.3

3802

41.15226337

phi-4-mini-instruct

INT4-MIXED

1024

191.9

24.3

3473.7

41.15226337

phi-4-mini-reasoning

INT4-MIXED

1024

193.1

24.3

3376.6

41.15226337

qwen3-4b

INT4-MIXED

1024

183.6

24.4

3402.5

40.98360656

phi-3.5-mini-instruct

INT4-MIXED

1024

291.4

24.5

3819.3

40.81632653

phi-4-mini-instruct

INT4-MIXED

32

45.5

24.9

3450.1

40.16064257

phi-4-mini-instruct

INT4-MIXED

1024

196.8

24.9

3526.9

40.16064257

phi-4-mini-reasoning

INT4-MIXED

1024

197.5

24.9

3440.4

40.16064257

afm-4.5b

INT4-MIXED

1024

200

25

3803.5

40

gemma-3-4b-it

INT4-MIXED

32

50.8

25

4448.2

40

phi-3.5-vision-instruct

INT4-MIXED

802

270.2

25

5425.9

40

qwen3-4b

INT4-MIXED

1024

186.9

25.1

3554

39.84063745

glm-edge-4b-chat

INT4-MIXED

32

58.9

25.2

3470.7

39.68253968

phi-4-mini-reasoning

INT4-MIXED

1024

217.7

25.2

3551.2

39.68253968

internvl2-4b

INT4-MIXED

1027

319

25.5

5736.7

39.21568627

gemma-3-4b-it

INT4-MIXED

32

52.3

25.7

4631.8

38.91050584

phi-3.5-vision-instruct

INT4-MIXED

1032

331.7

25.7

5981.5

38.91050584

internvl2-4b

INT4-MIXED

1027

333.2

25.8

5794.7

38.75968992

gemma-3-4b-it

INT4-MIXED

32

51.5

25.9

4690.2

38.61003861

phi-4-mini-instruct

INT4-MIXED

1024

253.5

26

3653.5

38.46153846

gemma-3-4b-it

INT4-MIXED

1024

246.5

26.5

6784.8

37.73584906

glm-edge-4b-chat

INT4-MIXED

1024

318

26.6

3781.7

37.59398496

gpt-oss-20b

INT4-MIXED

32

124.9

26.6

13007.7

37.59398496

gemma-3-4b-it

INT4-MIXED

1024

253.5

27.1

6959.3

36.900369

phi-2

INT8-CW

32

40.6

27.3

3773.7

36.63003663

gemma-3-4b-it

INT4-MIXED

1024

269.3

27.4

7015

36.49635036

gpt-oss-20b

INT4-MIXED

1024

401.4

27.5

13243.6

36.36363636

gpt-oss-20b

INT4-MIXED

32

130.6

27.6

12064

36.23188406

stable-zephyr-3b-dpo

INT8-CW

32

40.3

27.6

3802.1

36.23188406

stablelm-3b-4e1t

INT8-CW

32

39.6

27.6

3703.5

36.23188406

deepseek-r1-distill-qwen-1.5b

FP4-NORMALIZED

32

30.6

27.7

3850

36.10108303

qwen2.5-1.5b-instruct

FP4-NORMALIZED

32

31

27.7

3531.7

36.10108303

deepseek-r1-distill-qwen-1.5b

FP4-NORMALIZED

1024

89.5

28.1

3927

35.58718861

qwen2.5-1.5b-instruct

FP4-NORMALIZED

1024

90

28.1

3615.4

35.58718861

gpt-oss-20b

INT4-MIXED

1024

388.2

28.6

12273.9

34.96503497

phi-4-multimodal-instruct

INT4-MIXED

578

270

28.7

5685.6

34.84320557

glm-edge-1.5b-chat

FP16

32

36

29.1

3772.5

34.36426117

phi-4-multimodal-instruct

INT4-MIXED

786

322.4

29.1

6599.5

34.36426117

phi-2

INT8-CW

1024

183.1

29.2

4289.2

34.24657534

stable-zephyr-3b-dpo

INT8-CW

1024

173.9

29.4

4292.4

34.01360544

stablelm-3b-4e1t

INT8-CW

1024

173.5

29.4

4195.2

34.01360544

minicpm3-4b

INT4-MIXED

32

107.4

29.5

3501.2

33.89830508

glm-edge-1.5b-chat

FP16

1024

128

29.7

3989.8

33.67003367

chatglm3-6b

INT4-MIXED

32

34.4

29.9

4072.9

33.44481605

deepseek-r1-distill-qwen-1.5b

FP16

32

32.2

29.9

4315.2

33.44481605

flan-t5-xxl

INT4-MIXED

33

41.3

29.9

12873.3

33.44481605

qwen2.5-1.5b-instruct

FP16

32

35.5

29.9

3759.1

33.44481605

phi-4-multimodal-instruct

INT4-MIXED

1362

654.6

30

8108.2

33.33333333

deepseek-r1-distill-qwen-1.5b

FP16

1024

92.4

30.3

4450.4

33.00330033

phi-4-multimodal-instruct

INT4-MIXED

1570

728.1

30.3

9127.2

33.00330033

qwen2.5-1.5b-instruct

FP16

1024

92

30.3

3889.5

33.00330033

whisper-large-v3

FP16

1024

237.2

30.3

4883.3

33.00330033

whisper-large-v3

FP16

32

181.4

30.4

4856

32.89473684

minicpm3-4b

INT4-MIXED

32

109.4

30.5

3687.3

32.78688525

qwen2.5-coder-3b-instruct

INT8-CW

32

36.6

30.8

3924.5

32.46753247

chatglm3-6b

INT4-MIXED

1024

227.7

30.9

4254.3

32.36245955

minicpm3-4b

INT4-MIXED

32

113.1

31

3810.8

32.25806452

chatglm3-6b

INT4-MIXED

32

37.3

31.2

4421.1

32.05128205

qwen2.5-coder-3b-instruct

INT8-CW

1024

143.3

31.6

4039

31.64556962

chatglm3-6b

INT4-MIXED

1024

269.8

32.2

4519.8

31.05590062

llama-3.2-3b-instruct

INT8-CW

32

36.4

32.5

4080.6

30.76923077

llama-2-7b-chat-hf

INT4-MIXED

32

37.9

33.1

4451.3

30.21148036

qwen3-vl-4b-thinking

INT4-MIXED

4909

4168.5

33.5

13927

29.85074627

qwen3-vl-4b-thinking

INT4-MIXED

4939

4080.5

33.5

17375.5

29.85074627

llama-3.2-3b-instruct

INT8-CW

1024

141.1

33.6

4306.7

29.76190476

phi-4-multimodal-instruct

INT4-MIXED

578

313.9

33.9

6945.6

29.49852507

llama-2-7b-chat-hf

INT4-MIXED

32

40.6

34.1

4709

29.3255132

phi-4-multimodal-instruct

INT4-MIXED

786

379.9

34.1

6802.1

29.3255132

qwen3-vl-4b-thinking

INT4-MIXED

4909

4194.1

34.1

14034

29.3255132

qwen3-vl-4b-thinking

INT4-MIXED

4939

4070.9

34.2

17418.8

29.23976608

qwen3-vl-4b-thinking

INT4-MIXED

4909

4329.1

34.4

14213.5

29.06976744

qwen3-vl-4b-thinking

INT4-MIXED

4939

4215.9

34.5

17392.9

28.98550725

stable-diffusion-xl-1.0-inpainting-0.1

INT8-CW

32

35.1

34.6

6692.8

28.9017341

minicpm3-4b

INT4-MIXED

1024

474.2

34.7

4493.6

28.8184438

biomistral-7b-slerp

INT4-MIXED

7

37

34.8

4474.5

28.73563218

ltx-video

INT4-MIXED

11

35.6

34.8

6461.5

28.73563218

falcon-7b-instruct

INT4-MIXED

32

40.5

34.9

4289.3

28.65329513

mistral-7b-instruct-v0.2

INT4-MIXED

32

40.9

34.9

4486.1

28.65329513

mistral-7b-instruct-v0.3

INT4-MIXED

32

40.8

34.9

4493

28.65329513

phi-4-multimodal-instruct

INT4-MIXED

1362

751.1

35.2

8514.6

28.40909091

stable-diffusion-xl-1.0-inpainting-0.1

INT8-CW

32

35.8

35.4

6838.1

28.24858757

phi-4-multimodal-instruct

INT4-MIXED

1570

838.2

35.5

9375.3

28.16901408

minicpm3-4b

INT4-MIXED

1024

484.5

35.7

4611.2

28.01120448

llama-2-7b-chat-hf

INT4-MIXED

1024

249.9

35.8

5060.5

27.93296089

falcon-7b-instruct

INT4-MIXED

1024

276.7

35.9

4436.8

27.8551532

mistral-7b-instruct-v0.2

INT4-MIXED

32

44.1

36

4662

27.77777778

mistral-7b-instruct-v0.3

INT4-MIXED

32

43.8

36

4663.2

27.77777778

minicpm3-4b

INT4-MIXED

1024

517.5

36.1

4680.7

27.70083102

mistral-7b-instruct-v0.2

INT4-MIXED

1024

261.5

36.1

4789.7

27.70083102

mistral-7b-instruct-v0.3

INT4-MIXED

1024

262.6

36.2

4797.9

27.62430939

biomistral-7b-slerp

INT4-MIXED

7

38.8

36.4

4864.7

27.47252747

mistral-7b-instruct-v0.1

INT4-MIXED

32

44.9

36.4

4782.3

27.47252747

deepseek-r1-distill-qwen-7b

INT4-MIXED

32

42.4

36.5

5028.1

27.39726027

mistral-7b-instruct-v0.2

INT4-MIXED

32

44.8

36.5

4780.7

27.39726027

mistral-7b-instruct-v0.3

INT4-MIXED

32

44.9

36.5

4875

27.39726027

qwen2.5-7b-instruct

INT4-MIXED

32

42.6

36.5

5023.5

27.39726027

qwen2.5-7b-instruct-1m

INT4-MIXED

32

42.8

36.5

5027.1

27.39726027

llama-2-7b-chat-hf

INT4-MIXED

1024

259.1

36.8

5254.1

27.17391304

deepseek-r1-distill-qwen-7b

INT4-MIXED

1024

257.1

37.3

5232.8

26.80965147

mistral-7b-instruct-v0.2

INT4-MIXED

1024

272

37.3

4877.9

26.80965147

mistral-7b-instruct-v0.3

INT4-MIXED

1024

271.6

37.3

4885.3

26.80965147

qwen2.5-7b-instruct

INT4-MIXED

1024

257.9

37.4

5227.9

26.73796791

qwen2.5-7b-instruct-1m

INT4-MIXED

1024

258.2

37.4

5231.1

26.73796791

deepseek-r1-distill-qwen-7b

INT4-MIXED

32

44.8

37.5

5267.4

26.66666667

qwen2-7b-instruct

INT4-MIXED

32

44.7

37.5

5184

26.66666667

minicpm-v-2_6

INT4-MIXED

228

326.3

37.6

6198.1

26.59574468

qwen2.5-7b-instruct

INT4-MIXED

32

44.9

37.6

5202

26.59574468

minicpm-o-2_6

INT4-MIXED

238

331.7

37.7

6299.7

26.52519894

mistral-7b-instruct-v0.1

INT4-MIXED

1025

348.9

37.7

4944.1

26.52519894

mistral-7b-instruct-v0.2

INT4-MIXED

1024

319.2

37.8

4954.6

26.45502646

qwen2-7b-instruct

INT4-MIXED

32

46

37.9

5318.8

26.38522427

mistral-7b-instruct-v0.3

INT4-MIXED

1024

319.3

38

5051

26.31578947

qwen2.5-7b-instruct

INT4-MIXED

32

46

38

5315.1

26.31578947

qwen2.5-7b-instruct-1m

INT4-MIXED

32

46.2

38

5415

26.31578947

deepseek-r1-distill-qwen-7b

INT4-MIXED

32

46

38.1

5426.5

26.24671916

falcon-7b-instruct

INT4-MIXED

32

47.1

38.1

4972.9

26.24671916

phi-3-mini-4k-instruct

INT8-CW

32

43.4

38.1

4782.8

26.24671916

phi-3.5-mini-instruct

INT8-CW

32

43.1

38.2

4685.1

26.17801047

ltx-video

INT8-CW

11

38.7

38.3

9352.3

26.10966057

phi-3-mini-128k-instruct

INT8-CW

32

43.2

38.3

4774.2

26.10966057

deepseek-r1-distill-llama-8b

INT4-MIXED

32

44.3

38.4

5346.5

26.04166667

deepseek-r1-distill-qwen-7b

INT4-MIXED

1024

263.8

38.4

5405.5

26.04166667

llama-3-8b-instruct

INT4-MIXED

32

44.3

38.4

5260.9

26.04166667

llama-3.1-8b-instruct

INT4-MIXED

32

44.1

38.4

5260.9

26.04166667

qwen2-7b-instruct

INT4-MIXED

1024

266

38.4

5323.5

26.04166667

qwen2.5-7b-instruct

INT4-MIXED

1024

266.4

38.5

5339

25.97402597

minicpm-v-2_6

INT4-MIXED

228

331.5

38.6

6353.7

25.90673575

deepseek-r1-distill-qwen-7b

INT4-MIXED

1024

319.3

38.9

5493.6

25.70694087

qwen2-7b-instruct

INT4-MIXED

1024

322.4

38.9

5402.6

25.70694087

qwen2.5-7b-instruct

INT4-MIXED

1024

325

39

5406.9

25.64102564

qwen2.5-7b-instruct-1m

INT4-MIXED

1024

323.5

39

5498.1

25.64102564

falcon-7b-instruct

INT4-MIXED

1024

349

39.1

4894.4

25.57544757

minicpm-v-2_6

INT4-MIXED

228

339.8

39.1

6582.6

25.57544757

minicpm4-8b

INT4-MIXED

32

48.3

39.1

5069

25.57544757

qwen3-4b

INT8-CW

32

45.1

39.1

4893.9

25.57544757

lcm-dreamshaper-v7

INT8-HYBRID

32

41.4

39.2

3706.3

25.51020408

minicpm-o-2_6

INT4-MIXED

238

347.3

39.2

6570.2

25.51020408

phi-4-mini-reasoning

INT8-CW

32

44.6

39.2

4704.2

25.51020408

lcm-dreamshaper-v7

INT8-HYBRID

1024

40.7

39.3

3439.2

25.44529262

phi-4-mini-instruct

INT8-CW

32

44.9

39.3

4707.5

25.44529262

deepseek-r1-distill-llama-8b

INT4-MIXED

32

47.5

39.5

5420.7

25.3164557

llama-3-8b-instruct

INT4-MIXED

32

47.2

39.5

5427.9

25.3164557

llama-3-8b-instruct

INT4-MIXED

32

47.3

39.5

5429.2

25.3164557

llama-3.1-8b-instruct

INT4-MIXED

32

47.4

39.5

5427.2

25.3164557

qwen3-8b

INT4-MIXED

32

45.9

39.5

5449.1

25.3164557

deepseek-r1-distill-llama-8b

INT4-MIXED

1024

265.1

39.6

5652

25.25252525

gemma-3-4b-it

INT8-CW

32

54.8

39.6

6046.7

25.25252525

llama-3-8b-instruct

INT4-MIXED

1024

264.9

39.6

5564.9

25.25252525

llama-3.1-8b-instruct

INT4-MIXED

1024

264.9

39.7

5565.6

25.18891688

llama-3-8b-instruct

INT4-MIXED

32

48.3

40

5560.4

25

minicpm4-8b

INT4-MIXED

1024

292.6

40

5233.2

25

internvl2-4b

INT8-CW

297

129.7

40.1

5828.4

24.93765586

minicpm4-8b

INT4-MIXED

32

51.4

40.3

5349.5

24.81389578

phi-4-mini-reasoning

INT8-CW

1024

211.7

40.3

4965.1

24.81389578

phi-4-mini-instruct

INT8-CW

1024

210.6

40.4

4972.9

24.75247525

phi-3-mini-128k-instruct

INT8-CW

1024

244.4

40.6

5262

24.63054187

phi-3-mini-4k-instruct

INT8-CW

1024

244.3

40.6

5263.6

24.63054187

phi-3.5-mini-instruct

INT8-CW

1024

243.8

40.6

5164.6

24.63054187

qwen3-4b

INT8-CW

1024

201.3

40.6

5172.7

24.63054187

qwen3-8b

INT4-MIXED

32

48.8

40.6

5687.1

24.63054187

llama-3-8b-instruct

INT4-MIXED

1024

276.5

40.7

5675.4

24.57002457

minicpm-v-4_5

INT4-MIXED

217

336.3

40.7

6746.5

24.57002457

deepseek-r1-distill-llama-8b

INT4-MIXED

1024

279.1

40.8

5671.3

24.50980392

llama-3-8b-instruct

INT4-MIXED

1024

275.5

40.8

5674.5

24.50980392

llama-3.1-8b-instruct

INT4-MIXED

1024

274.9

40.8

5675.1

24.50980392

minicpm4-8b

INT4-MIXED

32

52.5

40.8

5485.8

24.50980392

qwen3-8b

INT4-MIXED

1024

279.6

40.9

5758

24.44987775

afm-4.5b

INT8-CW

32

45.9

41.1

5320

24.33090024

gemma-3-4b-it

INT8-CW

1024

261.3

41.1

8383.7

24.33090024

qwen3-8b

INT4-MIXED

32

49.7

41.1

5830.2

24.33090024

llama-3-8b-instruct

INT4-MIXED

1024

322.2

41.3

5734

24.21307506

minicpm4-8b

INT4-MIXED

1024

309.2

41.3

5421.1

24.21307506

minicpm4-8b

INT4-MIXED

1024

351.9

41.8

5537.2

23.92344498

qwen2.5-vl-7b-instruct

INT4-MIXED

32

109.5

41.8

6170.8

23.92344498

phi-3.5-vision-instruct

INT8-CW

802

267.3

41.9

6844.5

23.86634845

afm-4.5b

INT8-CW

1024

192.3

42

5467.2

23.80952381

qwen3-8b

INT4-MIXED

1024

288.5

42.1

5963.4

23.75296912

minicpm-v-4_5

INT4-MIXED

217

350

42.5

7096.6

23.52941176

internvl2-4b

INT8-CW

1027

335.1

42.7

6893

23.41920375

qwen3-8b

INT4-MIXED

1024

332.8

42.7

6016.9

23.41920375

phi-3.5-vision-instruct

INT8-CW

1032

323.6

42.8

7183.6

23.36448598

qwen2.5-vl-7b-instruct

INT4-MIXED

1024

388.2

42.9

7458.9

23.31002331

qwen2.5-vl-7b-instruct

INT4-MIXED

32

112.8

43.6

6350.4

22.93577982

glm-edge-4b-chat

INT8-CW

32

61.5

44

5182.5

22.72727273

qwen2.5-vl-7b-instruct

INT4-MIXED

1024

397.2

44

7580.2

22.72727273

qwen2.5-vl-7b-instruct

INT4-MIXED

1024

451.4

44.5

7653.3

22.47191011

gemma-7b-it

INT4-MIXED

32

51.9

44.6

5585.2

22.42152466

qwen2.5-vl-7b-instruct

INT4-MIXED

32

114

44.7

6467.7

22.37136465

glm-4-9b-chat-hf

INT4-MIXED

32

51.1

45

5958.7

22.22222222

glm-edge-4b-chat

INT8-CW

1024

308.7

45.2

5437.9

22.12389381

glm-4-9b-chat-hf

INT4-MIXED

32

54.1

46.1

6282.4

21.69197397

glm-4-9b-chat-hf

INT4-MIXED

1024

328.4

46.2

6147.9

21.64502165

gemma-7b-it

INT4-MIXED

32

55.1

46.3

5930.1

21.59827214

glm-4-9b-chat-hf

INT4-MIXED

32

55.3

46.7

6433.4

21.41327623

gemma-7b-it

INT4-MIXED

1024

323.7

46.8

6157.6

21.36752137

phi-4-multimodal-instruct

INT8-CW

578

287.5

47.1

7480

21.23142251

llama-3.1-8b-instruct

INT4-MIXED

32

60.4

47.4

6392.7

21.09704641

phi-4-multimodal-instruct

INT8-CW

786

351.7

47.4

8414.2

21.09704641

deepseek-r1-distill-llama-8b

INT4-MIXED

32

62

47.5

6495.6

21.05263158

glm-4-9b-chat-hf

INT4-MIXED

1024

345.5

47.5

6387.3

21.05263158

lcm-dreamshaper-v7

INT8-CW

1024

48.7

47.5

3673

21.05263158

lcm-dreamshaper-v7

INT8-CW

32

48.6

47.5

5312.1

21.05263158

minicpm3-4b

INT8-CW

32

117.8

47.6

5462.8

21.00840336

lcm-dreamshaper-v7

INT8-CW

1024

49.1

47.8

3620.2

20.92050209

lcm-dreamshaper-v7

INT8-CW

32

49

47.9

4465.1

20.87682672

glm-4-9b-chat-hf

INT4-MIXED

1024

396.1

48

6482.5

20.83333333

phi-4-multimodal-instruct

INT8-CW

1362

694.2

48.3

9880.1

20.70393375

gemma-7b-it

INT4-MIXED

1024

388.2

48.6

6442.1

20.57613169

phi-4-multimodal-instruct

INT8-CW

1570

767.3

48.6

10504.4

20.57613169

llama-3.1-8b-instruct

INT4-MIXED

1024

367.5

48.7

6513

20.5338809

deepseek-r1-distill-llama-8b

INT4-MIXED

1024

369.5

48.8

6609.4

20.49180328

lcm-dreamshaper-v7

FP16

32

50

49.2

4330.4

20.32520325

lcm-dreamshaper-v7

FP16

1024

50

49.3

4019.3

20.28397566

qwen3-vl-4b-thinking

INT8-CW

4909

4313.6

49.4

15766.1

20.24291498

qwen3-vl-4b-thinking

INT8-CW

4939

4201.1

49.6

18495.2

20.16129032

llava-next-video-7b-hf

INT4-MIXED

2945

1895.1

50

10084.3

20

gemma-2-9b-it

INT4-MIXED

32

58.3

50.1

6008.3

19.96007984

phi-2

FP16

32

54.5

51

6524.6

19.60784314

gemma-2-9b-it

INT4-MIXED

32

61.5

51.5

6236.1

19.41747573

stable-zephyr-3b-dpo

FP16

32

55.3

51.5

6550.7

19.41747573

gemma-2-9b-it

INT4-MIXED

32

63

52.1

6404.7

19.19385797

gemma-2-9b-it

INT4-MIXED

1024

385.3

52.4

6502.5

19.08396947

minicpm3-4b

INT8-CW

1024

505.7

52.7

6396.2

18.97533207

flan-t5-xxl

INT8-CW

33

66.5

53.1

22610.6

18.83239171

stable-diffusion-xl-1.0-inpainting-0.1

FP16

32

53.9

53.4

9291.6

18.72659176

gemma-2-9b-it

INT4-MIXED

1024

394.4

53.9

6735.1

18.5528757

phi-2

FP16

1024

225.1

54

7164.3

18.51851852

gemma-2-9b-it

INT4-MIXED

1024

462.9

54.5

6823.8

18.34862385

stable-zephyr-3b-dpo

FP16

1024

215.6

54.5

7201.8

18.34862385

chatglm3-6b

INT8-CW

32

60

55.3

6896.8

18.08318264

ltx-video

FP16

11

55.9

55.4

15108.8

18.05054152

chatglm3-6b

INT8-CW

1024

265.9

56.2

7061.8

17.79359431

llama-3.2-3b-instruct

FP4-NORMALIZED

32

59.2

56.3

6748.8

17.76198934

llama-3.2-3b-instruct

FP4-NORMALIZED

1024

198.1

57.2

6975.7

17.48251748

qwen2.5-coder-3b-instruct

FP16

32

61.3

57.6

6827

17.36111111

qwen2.5-coder-3b-instruct

FP16

1024

183.6

58.1

6986.4

17.21170396

llama-3.2-3b-instruct

FP16

32

62.8

60

7154.3

16.66666667

llama-3.2-3b-instruct

FP16

1024

202

61.5

7442.8

16.2601626

llama-2-13b-chat-hf

INT4-MIXED

32

69.6

61.8

7747.8

16.18122977

llama-2-7b-chat-hf

INT8-CW

32

67.7

61.9

7640.5

16.15508885

gemma-3-12b-it

INT4-MIXED

32

87

64

9044.4

15.625

falcon-7b-instruct

INT8-CW

32

70.7

64.5

7541.9

15.50387597

llama-2-7b-chat-hf

INT8-CW

1024

286.7

64.8

8217.8

15.43209877

falcon-7b-instruct

INT8-CW

1024

407.6

65.5

7677.6

15.26717557

biomistral-7b-slerp

INT8-CW

7

68.2

65.7

7809.9

15.22070015

mistral-7b-instruct-v0.2

INT8-CW

32

72.9

65.9

7821.3

15.17450683

mistral-7b-instruct-v0.3

INT8-CW

32

72.3

65.9

7913

15.17450683

llama-2-13b-chat-hf

INT4-MIXED

1024

482.8

66

8665

15.15151515

mistral-7b-instruct-v0.1

INT8-CW

32

72.4

66

7820.7

15.15151515

phi-4-mini-reasoning

FP4-NORMALIZED

32

71.6

66.2

7657.3

15.10574018

deepseek-r1-distill-qwen-7b

INT8-CW

32

72.7

66.5

8231.8

15.03759398

phi-4-mini-instruct

FP4-NORMALIZED

32

71.3

66.5

7760.4

15.03759398

qwen2-7b-instruct

INT8-CW

32

72.5

66.6

8145.1

15.01501502

qwen2.5-7b-instruct

INT8-CW

32

72.8

66.6

8143.3

15.01501502

qwen2.5-7b-instruct-1m

INT8-CW

32

73.1

66.6

8234.3

15.01501502

gemma-3-12b-it

INT4-MIXED

32

86.5

66.8

9363.2

14.97005988

deepseek-r1-distill-qwen-7b

INT8-CW

1024

298.9

67.3

8434.2

14.85884101

mistral-7b-instruct-v0.1

INT8-CW

1025

336.1

67.3

8106.1

14.85884101

mistral-7b-instruct-v0.2

INT8-CW

1024

306

67.3

8115.6

14.85884101

mistral-7b-instruct-v0.3

INT8-CW

1024

305.1

67.3

8211.7

14.85884101

phi-4-mini-reasoning

FP4-NORMALIZED

1024

273.8

67.3

7918

14.85884101

qwen2-7b-instruct

INT8-CW

1024

300.8

67.4

8338.2

14.83679525

qwen2.5-7b-instruct

INT8-CW

1024

300.9

67.4

8342

14.83679525

qwen2.5-7b-instruct-1m

INT8-CW

1024

302.4

67.4

8435.1

14.83679525

minicpm-v-2_6

INT8-CW

228

344.6

67.5

9379.3

14.81481481

gemma-3-4b-it

FP4-NORMALIZED

32

74.8

67.6

9047.3

14.79289941

phi-4-mini-instruct

FP4-NORMALIZED

1024

272.8

67.6

8012.8

14.79289941

minicpm-o-2_6

INT8-CW

238

353.7

67.7

9391.9

14.77104874

gemma-3-12b-it

INT4-MIXED

1024

589.2

67.8

11696.5

14.74926254

phi-4

INT4-MIXED

32

79.1

68.6

8458.7

14.57725948

gemma-3-4b-it

FP4-NORMALIZED

1024

314.3

69

11387

14.49275362

phi-4-reasoning

INT4-MIXED

32

78.9

69

8458.2

14.49275362

llama-3-8b-instruct

INT8-CW

32

76.2

69.4

8595

14.4092219

llama-3.1-8b-instruct

INT8-CW

32

76.4

69.4

8594.5

14.4092219

deepseek-r1-distill-llama-8b

INT8-CW

32

76.7

69.5

8696.7

14.38848921

phi-3-mini-128k-instruct

FP16

32

73.6

69.5

8556.8

14.38848921

phi-3-mini-4k-instruct

FP16

32

72.9

69.5

8544.3

14.38848921

phi-3.5-mini-instruct

FP16

32

72.9

69.5

8467.2

14.38848921

qwen1.5-14b-chat

INT4-MIXED

32

80

69.8

9218.9

14.32664756

deepseek-r1-distill-qwen-14b

INT4-MIXED

32

79.2

70

8788.4

14.28571429

internvl2-4b

FP4-NORMALIZED

297

167.4

70.4

9063.7

14.20454545

phi-4

INT4-MIXED

1024

549

70.5

8931.9

14.18439716

qwen3-8b

INT8-CW

32

77.8

70.5

8869.1

14.18439716

gemma-3-12b-it

INT4-MIXED

1024

656.4

70.6

12009.2

14.16430595

phi-4-reasoning

INT4-MIXED

1024

541.6

70.7

8930.9

14.14427157

phi-4-reasoning

INT4-MIXED

32

83.7

70.7

8875.2

14.14427157

deepseek-r1-distill-llama-8b

INT8-CW

1024

309

70.8

8984.3

14.12429379

llama-3-8b-instruct

INT8-CW

1024

307.7

70.8

8892.2

14.12429379

llama-3.1-8b-instruct

INT8-CW

1024

308.3

70.9

8891.9

14.10437236

stable-diffusion-v1-5

INT8-HYBRID

1024

72.1

70.9

3253.1

14.10437236

stable-diffusion-v1-5

INT8-HYBRID

32

72.2

70.9

3771.1

14.10437236

internvl2-4b

FP16

297

173.8

71.4

9576.9

14.00560224

llava-v1.6-mistral-7b-hf

INT8-CW

2944

1595.6

71.6

10146.1

13.96648045

minicpm-v-4_5

INT8-CW

217

357.6

71.6

10045.4

13.96648045

phi-4

INT4-MIXED

32

85.7

71.8

9104.9

13.9275766

qwen3-8b

INT8-CW

1024

323

71.9

9158.6

13.90820584

deepseek-r1-distill-qwen-14b

INT4-MIXED

1024

518.4

72

9179.6

13.88888889

phi-4-mini-instruct

FP16

32

76.3

72

8385.9

13.88888889

phi-4-mini-reasoning

FP16

32

76.5

72

8391.2

13.88888889

qwen2.5-vl-7b-instruct

INT8-CW

32

141.5

72.1

9374.7

13.86962552

phi-4-reasoning

INT4-MIXED

1024

572.3

72.5

9243.9

13.79310345

| Topology | Precision | Input Size | 1st latency (ms) | 2nd latency (ms) | Max RSS memory (MB) | 2nd token per sec |
|---|---|---|---|---|---|---|
| deepseek-r1-distill-qwen-14b | INT4-MIXED | 32 | 85.5 | 72.8 | 9437 | 13.73626374 |
| internvl2-4b | FP4-NORMALIZED | 1027 | 445.7 | 73 | 9942 | 13.69863014 |
| phi-3.5-vision-instruct | FP16 | 802 | 353.1 | 73 | 10393 | 13.69863014 |
| phi-3-mini-4k-instruct | FP16 | 1024 | 286.1 | 73.1 | 9309.7 | 13.67989056 |
| qwen2.5-vl-7b-instruct | INT8-CW | 1024 | 429.7 | 73.1 | 10679 | 13.67989056 |
| phi-3-mini-128k-instruct | FP16 | 1024 | 286.4 | 73.2 | 9319.8 | 13.66120219 |
| phi-3.5-mini-instruct | FP16 | 1024 | 286.6 | 73.2 | 9223.8 | 13.66120219 |
| gemma-3-4b-it | FP16 | 32 | 80.5 | 73.5 | 10751.3 | 13.60544218 |
| phi-4-mini-instruct | FP16 | 1024 | 282 | 73.5 | 8702.3 | 13.60544218 |
| phi-4-mini-reasoning | FP16 | 1024 | 282.1 | 73.6 | 8705.3 | 13.58695652 |
| phi-3.5-vision-instruct | FP16 | 1032 | 452.4 | 73.7 | 10608.3 | 13.56852103 |
| phi-4 | INT4-MIXED | 1024 | 662.1 | 73.9 | 9345.8 | 13.53179973 |
| internvl2-4b | FP16 | 1027 | 454.3 | 74 | 10432.5 | 13.51351351 |
| qwen1.5-14b-chat | INT4-MIXED | 1024 | 586.1 | 74.1 | 10122.9 | 13.49527665 |
| phi-4-reasoning | INT4-MIXED | 32 | 98.2 | 74.7 | 9678.3 | 13.38688086 |
| minicpm4-8b | INT8-CW | 32 | 82.3 | 74.9 | 8781.4 | 13.35113485 |
| gemma-3-4b-it | FP16 | 1024 | 326.7 | 75 | 13114.3 | 13.33333333 |
| qwen3-4b | FP16 | 32 | 79.5 | 75 | 8739.7 | 13.33333333 |
| deepseek-r1-distill-qwen-14b | INT4-MIXED | 1024 | 636.5 | 75.1 | 9619.6 | 13.31557923 |
| minicpm4-8b | INT8-CW | 1024 | 371.1 | 75.8 | 8939.7 | 13.19261214 |
| sd-turbo | INT8-HYBRID | 1024 | 77.1 | 75.9 | 4010.9 | 13.17523057 |
| sd-turbo | INT8-HYBRID | 32 | 76.9 | 76 | 4009.9 | 13.15789474 |
| stable-diffusion-v2-1 | INT8-HYBRID | 1024 | 77 | 76.2 | 4015.7 | 13.12335958 |
| stable-diffusion-v2-1 | INT8-HYBRID | 32 | 77.2 | 76.2 | 4058.2 | 13.12335958 |
| phi-4-reasoning | INT4-MIXED | 1024 | 733.4 | 76.7 | 9641.7 | 13.03780965 |
| llava-next-video-7b-hf | INT8-CW | 2945 | 1899.1 | 77 | 12181.7 | 12.98701299 |
| qwen3-4b | FP16 | 1024 | 267.9 | 77 | 9093 | 12.98701299 |
| llama-2-13b-chat-hf | INT4-MIXED | 32 | 94 | 77.3 | 9332.7 | 12.93661061 |
| starcoder2-15b | INT4-MIXED | 32 | 91 | 77.8 | 9563.4 | 12.85347044 |
| glm-edge-4b-chat | FP16 | 32 | 85.6 | 78.7 | 9338.3 | 12.7064803 |
| starcoder2-15b | INT4-MIXED | 1024 | 733.3 | 79.6 | 9486.1 | 12.56281407 |
| afm-4.5b | FP16 | 32 | 83.7 | 80.1 | 9812.3 | 12.48439451 |
| glm-edge-4b-chat | FP16 | 1024 | 363.1 | 80.3 | 9714 | 12.45330012 |
| gemma-7b-it | INT8-CW | 32 | 89.2 | 80.8 | 9375.9 | 12.37623762 |
| glm-4-9b-chat-hf | INT8-CW | 32 | 89.2 | 81 | 9974.2 | 12.34567901 |
| afm-4.5b | FP16 | 1024 | 268.8 | 81.1 | 10050.4 | 12.33045623 |
| llama-2-13b-chat-hf | INT4-MIXED | 1024 | 578.1 | 81.6 | 10212.6 | 12.25490196 |
| minicpm3-4b | FP4-NORMALIZED | 32 | 103 | 81.6 | 9069.4 | 12.25490196 |
| phi-4-multimodal-instruct | FP16 | 578 | 367.4 | 82.6 | 12458.2 | 12.10653753 |
| glm-4-9b-chat-hf | INT8-CW | 1024 | 390.1 | 82.9 | 10134.7 | 12.06272618 |
| phi-4-multimodal-instruct | FP16 | 786 | 421 | 82.9 | 13020.7 | 12.06272618 |
| gemma-7b-it | INT8-CW | 1024 | 394.5 | 83.1 | 9944.3 | 12.03369434 |
| stable-diffusion-v1-5 | INT8-CW | 32 | 84.5 | 83.1 | 5908.8 | 12.03369434 |
| stable-diffusion-v1-5 | INT8-CW | 1024 | 84.3 | 83.3 | 3610.4 | 12.00480192 |
| minicpm3-4b | FP16 | 32 | 103.9 | 83.5 | 9542 | 11.9760479 |
| stable-diffusion-v1-5 | INT8-CW | 1024 | 84.8 | 83.7 | 3775.1 | 11.9474313 |
| stable-diffusion-v1-5 | INT8-CW | 32 | 84.9 | 83.8 | 6709.7 | 11.93317422 |
| phi-4-multimodal-instruct | FP16 | 1362 | 855 | 84 | 14975.4 | 11.9047619 |
| phi-4-multimodal-instruct | FP16 | 1570 | 941.1 | 84.3 | 15668.7 | 11.8623962 |
| qwen3-vl-4b-thinking | FP16 | 4909 | 4934.8 | 84.6 | 20141.7 | 11.82033097 |
| qwen3-vl-4b-thinking | FP16 | 4939 | 4830.7 | 84.6 | 22849.6 | 11.82033097 |
| stable-diffusion-v1-5 | FP16 | 1024 | 85.5 | 84.9 | 4093.6 | 11.77856302 |
| stable-diffusion-v1-5 | FP16 | 32 | 85.9 | 85 | 4604.6 | 11.76470588 |
| stable-diffusion-v2-1 | INT8-CW | 1024 | 86.6 | 85.7 | 4952.6 | 11.66861144 |
| sd-turbo | INT8-CW | 32 | 86.4 | 85.8 | 5353.3 | 11.65501166 |
| sd-turbo | INT8-CW | 1024 | 86.4 | 85.9 | 4917.4 | 11.64144354 |
| stable-diffusion-v2-1 | INT8-CW | 32 | 86.5 | 85.9 | 5455 | 11.64144354 |
| sd-turbo | INT8-CW | 32 | 87 | 86.3 | 5208.7 | 11.58748552 |
| stable-diffusion-v2-1 | INT8-CW | 32 | 87.3 | 86.4 | 5642 | 11.57407407 |
| minicpm3-4b | FP4-NORMALIZED | 1024 | 562.5 | 86.5 | 9963.3 | 11.56069364 |
| sd-turbo | INT8-CW | 1024 | 87.3 | 86.7 | 5007.1 | 11.53402537 |
| stable-diffusion-v2-1 | INT8-CW | 1024 | 86.9 | 86.7 | 5251.7 | 11.53402537 |
| stable-diffusion-v2-1 | FP16 | 1024 | 87.6 | 87.1 | 5429.2 | 11.48105626 |
| stable-diffusion-v2-1 | FP16 | 32 | 87.9 | 87.1 | 5424.5 | 11.48105626 |
| sd-turbo | FP16 | 32 | 87.3 | 87.2 | 5470.6 | 11.46788991 |
| sd-turbo | FP16 | 1024 | 87.4 | 87.7 | 5476 | 11.40250855 |
| gemma-2-9b-it | INT8-CW | 32 | 97.1 | 88.2 | 10079.2 | 11.33786848 |
| gemma-2-9b-it | INT8-CW | 1024 | 445.2 | 90.5 | 10539.3 | 11.04972376 |
| minicpm3-4b | FP16 | 1024 | 571.4 | 90.7 | 11081.7 | 11.02535832 |

All models listed here were tested with the following parameters:

  • Framework: PyTorch

  • Beam width: 1

  • Batch size: 1
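The two throughput-related columns in the table are directly linked: with batch size 1, the "2nd token per sec" value is the reciprocal of the second-token (per-token decode) latency in milliseconds. A minimal sketch of that conversion, using the deepseek-r1-distill-qwen-14b row as an example (the helper name is illustrative, not part of any benchmark tooling):

```python
def tokens_per_sec(second_token_latency_ms: float) -> float:
    """Convert per-token decode latency (ms) into decode throughput (tokens/s)."""
    return 1000.0 / second_token_latency_ms

# deepseek-r1-distill-qwen-14b, INT4-MIXED, input size 32: 72.8 ms/token
print(round(tokens_per_sec(72.8), 2))  # → 13.74
```

This is why the table's last column tracks the second-latency column exactly; first-token latency affects only time to first token, not steady-state decode throughput.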