Most Efficient Large Language Models for AI PC

This page is regularly updated to help you identify the best-performing LLMs on the Intel® Core™ Ultra processor family and AI PCs. The current data is as of OpenVINO 2026.0, 26 February 2026.

The tables below list the key performance indicators for inference on built-in GPUs. Rows in each table are ordered by 2nd latency, fastest first.
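Judging from the values in the tables, the "2nd token per sec" column appears to be 1000 divided by the "2nd latency (ms)" value (for example, 4.3 ms corresponds to 232.56). The sketch below shows that conversion, plus a rough end-to-end estimate. The function names are hypothetical, and the estimate assumes that every token after the first takes about the reported 2nd latency, which this page does not state.

```python
def tokens_per_second(latency_ms: float) -> float:
    """Convert a per-token latency in milliseconds to tokens per second."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


def estimated_generation_ms(first_ms: float, second_ms: float, new_tokens: int) -> float:
    """Rough end-to-end time for generating `new_tokens` tokens.

    Assumption (not stated on this page): every token after the first
    takes about the reported 2nd latency.
    """
    if new_tokens < 1:
        raise ValueError("new_tokens must be at least 1")
    return first_ms + (new_tokens - 1) * second_ms


if __name__ == "__main__":
    # Values from the first row of the first table:
    # minicpm4-0.5b, INT4-MIXED, input size 32.
    print(round(tokens_per_second(4.3), 2))                    # 232.56
    print(round(estimated_generation_ms(16.5, 4.3, 128), 1))   # 562.6
```

Under these assumptions, a short prompt is dominated by the 2nd latency, while for long inputs the 1st latency can contribute a noticeable share of the total.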

| Topology | Precision | Input Size | 1st latency (ms) | 2nd latency (ms) | max rss memory | 2nd token per sec |
|---|---|---|---|---|---|---|
| minicpm4-0.5b | INT4-MIXED | 32 | 16.5 | 4.3 | 1012.4 | 232.56 |
| minicpm4-0.5b | INT4-MIXED | 32 | 17 | 4.4 | 1143.7 | 227.27 |
| minicpm4-0.5b | INT4-MIXED | 32 | 16.6 | 4.4 | 1025.8 | 227.27 |
| minicpm4-0.5b | INT4-MIXED | 1024 | 39.6 | 4.5 | 1065.7 | 222.22 |
| minicpm4-0.5b | INT4-MIXED | 1024 | 56.5 | 4.6 | 1201.1 | 217.39 |
| gemma-3-270m | INT4-MIXED | 1024 | 28.6 | 4.8 | 1116.8 | 208.33 |
| minicpm4-0.5b | INT4-MIXED | 1024 | 29 | 4.8 | 1059.7 | 208.33 |
| gemma-3-270m | INT4-MIXED | 32 | 17.8 | 4.8 | 1072.3 | 208.33 |
| gemma-3-270m | INT8-CW | 1024 | 27.2 | 4.8 | 1148.3 | 208.33 |
| gemma-3-270m | INT8-CW | 32 | 16.9 | 4.8 | 1091.9 | 208.33 |
| distil-large-v2 | INT4-MIXED | prompt0 | 122.3 | 4.9 | 1560.7 | 204.08 |
| distil-large-v2 | INT4-MIXED | prompt1 | 178.4 | 5 | 1592.4 | 200.00 |
| distil-large-v2 | INT8-CW | prompt1 | 175 | 5.1 | 1847 | 196.08 |
| distil-large-v2 | INT8-CW | prompt0 | 118.2 | 5.2 | 1814.4 | 192.31 |
| minicpm4-0.5b | INT8-CW | 32 | 18.2 | 5.7 | 1276.6 | 175.44 |
| minicpm4-0.5b | INT8-CW | 1024 | 60.6 | 6.2 | 1321.6 | 161.29 |
| gemma-3-270m | FP16 | 32 | 15.5 | 6.9 | 1345.6 | 144.93 |
| distil-large-v2 | FP16 | prompt1 | 194.1 | 7.1 | 2574.1 | 140.85 |
| distil-large-v2 | FP16 | prompt0 | 133.5 | 7.2 | 2543 | 138.89 |
| gemma-3-270m | FP16 | 1024 | 39.8 | 7.5 | 1433.4 | 133.33 |
| llama-3.2-1b-instruct | INT4-MIXED | 32 | 13.5 | 8.1 | 1501.3 | 123.46 |
| llama-3.2-1b-instruct | INT4-MIXED | 32 | 13.9 | 8.2 | 1522.4 | 121.95 |
| gemma-3-1b-it | INT4-MIXED | 32 | 22 | 8.4 | 1474.7 | 119.05 |
| llama-3.2-1b-instruct | INT4-MIXED | 1024 | 60.1 | 8.5 | 1599.7 | 117.65 |
| gemma-3-1b-it | INT4-MIXED | 1024 | 64 | 8.7 | 1567.9 | 114.94 |
| gemma-3-1b-it | INT4-MIXED | 32 | 23.1 | 8.7 | 1536.3 | 114.94 |
| minicpm4-0.5b | FP4-NORMALIZED | 32 | 14.5 | 9 | 1615.2 | 111.11 |
| gemma-3-1b-it | INT4-MIXED | 1024 | 73.2 | 9 | 1625.1 | 111.11 |
| llama-3.2-1b-instruct | INT4-MIXED | 1024 | 82.2 | 9.1 | 1626.2 | 109.89 |
| minicpm4-0.5b | FP4-NORMALIZED | 1024 | 60.9 | 9.3 | 1651.2 | 107.53 |
| minicpm4-0.5b | FP16 | 32 | 15.2 | 9.7 | 1573.9 | 103.09 |
| minicpm4-0.5b | FP16 | 1024 | 61.9 | 10.1 | 1618.8 | 99.01 |
| qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 22.5 | 11 | 1790.1 | 90.91 |
| qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 101.4 | 11.6 | 1952.5 | 86.21 |
| gemma-3-1b-it | INT8-CW | 32 | 25.5 | 11.6 | 1840.8 | 86.21 |
| gemma-3-1b-it | INT8-CW | 1024 | 96.9 | 11.9 | 1915.2 | 84.03 |
| llama-3.2-1b-instruct | INT8-CW | 32 | 15.1 | 12.8 | 2065.2 | 78.13 |
| deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 23.1 | 13.3 | 2345.4 | 75.19 |
| llama-3.2-1b-instruct | INT8-CW | 1024 | 85.3 | 13.4 | 2168.9 | 74.63 |
| deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 116 | 14 | 2474.2 | 71.43 |
| llama-3.2-3b-instruct | INT4-MIXED | 32 | 22.5 | 18 | 2646.4 | 55.56 |
| qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 28.7 | 18.2 | 2534 | 54.95 |
| qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 28.7 | 18.4 | 2538.7 | 54.35 |
| llama-3.2-3b-instruct | INT4-MIXED | 32 | 24.1 | 18.5 | 2628.6 | 54.05 |
| llama-3.2-3b-instruct | INT4-MIXED | 32 | 23.7 | 18.8 | 2685 | 53.19 |
| qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 127.1 | 19.1 | 2662.8 | 52.36 |
| qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 30.7 | 19.1 | 2806.4 | 52.36 |
| llama-3.2-3b-instruct | INT4-MIXED | 1024 | 175.4 | 19.2 | 2878 | 52.08 |
| phi-3-mini-128k-instruct | INT4-MIXED | 32 | 25.4 | 19.4 | 2876 | 51.55 |
| phi-3-mini-4k-instruct | INT4-MIXED | 32 | 24.7 | 19.4 | 2787.3 | 51.55 |
| phi-3.5-mini-instruct | INT4-MIXED | 32 | 24.7 | 19.4 | 2790 | 51.55 |
| llama-3.2-3b-instruct | INT4-MIXED | 1024 | 150.4 | 19.6 | 2830.9 | 51.02 |
| qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 130 | 19.6 | 2664.2 | 51.02 |
| phi-3-mini-4k-instruct | INT4-MIXED | 32 | 26.1 | 19.9 | 2862.8 | 50.25 |
| qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 283.3 | 20 | 2924.2 | 50.00 |
| phi-3.5-mini-instruct | INT4-MIXED | 32 | 26.6 | 20 | 2878.3 | 50.00 |
| llama-3.2-3b-instruct | INT4-MIXED | 1024 | 206.4 | 20.2 | 2895.9 | 49.50 |
| gemma-3-1b-it | FP16 | 32 | 24.4 | 20.4 | 2743.9 | 49.02 |
| phi-3-mini-128k-instruct | INT4-MIXED | 32 | 27.8 | 20.4 | 3029.5 | 49.02 |
| gemma-3-1b-it | FP16 | 1024 | 124.8 | 20.8 | 2881.1 | 48.08 |
| phi-3-mini-4k-instruct | INT4-MIXED | 32 | 31.5 | 21.3 | 3091.5 | 46.95 |
| phi-3.5-mini-instruct | INT4-MIXED | 32 | 32.2 | 21.4 | 3095.2 | 46.73 |
| phi-3-mini-128k-instruct | INT4-MIXED | 1024 | 235.7 | 21.7 | 3511.9 | 46.08 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 237.9 | 21.8 | 3418.8 | 45.87 |
| phi-3.5-mini-instruct | INT4-MIXED | 1024 | 237.7 | 21.8 | 3423.5 | 45.87 |
| internvl2-4b | INT4-MIXED | 297 | 163.2 | 22.2 | 4246.4 | 45.05 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 202.5 | 22.4 | 3487.2 | 44.64 |
| phi-3.5-mini-instruct | INT4-MIXED | 1024 | 204.7 | 22.4 | 3509.2 | 44.64 |
| phi-3-mini-128k-instruct | INT4-MIXED | 1024 | 313 | 22.8 | 3647.4 | 43.86 |
| phi-4-mini-instruct | INT4-MIXED | 32 | 35.3 | 22.9 | 3091 | 43.67 |
| phi-4-mini-reasoning | INT4-MIXED | 32 | 34.2 | 22.9 | 2990.2 | 43.67 |
| internvl2-4b | INT4-MIXED | 297 | 170.8 | 23.3 | 4417 | 42.92 |
| llama-3.2-1b-instruct | FP16 | 32 | 25.2 | 23.4 | 3104.9 | 42.74 |
| phi-4-mini-instruct | INT4-MIXED | 32 | 37.7 | 23.5 | 3166.5 | 42.55 |
| phi-4-mini-reasoning | INT4-MIXED | 32 | 35.7 | 23.5 | 3165 | 42.55 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 341.7 | 23.7 | 3720.1 | 42.19 |
| phi-4-mini-reasoning | INT4-MIXED | 32 | 35.2 | 23.7 | 3212.8 | 42.19 |
| llama-3.2-1b-instruct | FP16 | 1024 | 134.4 | 23.8 | 3241.8 | 42.02 |
| phi-3.5-mini-instruct | INT4-MIXED | 1024 | 366 | 23.8 | 3739.5 | 42.02 |
| afm-4.5b | INT4-MIXED | 32 | 29.4 | 23.8 | 3626.8 | 42.02 |
| phi-4-mini-instruct | INT4-MIXED | 1024 | 219.8 | 24.3 | 3365.5 | 41.15 |
| phi-4-mini-reasoning | INT4-MIXED | 1024 | 229.4 | 24.3 | 3264.4 | 41.15 |
| phi-4-mini-instruct | INT4-MIXED | 32 | 38.1 | 24.6 | 3275.2 | 40.65 |
| internvl2-4b | INT4-MIXED | 1027 | 367.3 | 24.8 | 5781.4 | 40.32 |
| phi-4-mini-instruct | INT4-MIXED | 1024 | 210.1 | 24.8 | 3443.2 | 40.32 |
| phi-4-mini-reasoning | INT4-MIXED | 1024 | 208.8 | 24.8 | 3440.8 | 40.32 |
| phi-3.5-vision-instruct | INT4-MIXED | 802 | 334.3 | 25 | 5540.8 | 40.00 |
| afm-4.5b | INT4-MIXED | 1024 | 265.4 | 25.2 | 3797.6 | 39.68 |
| phi-4-mini-reasoning | INT4-MIXED | 1024 | 289.5 | 25.2 | 3471.2 | 39.68 |
| gemma-3-4b-it | INT4-MIXED | 32 | 51.3 | 25.2 | 4406.1 | 39.68 |
| gemma-3-4b-it | INT4-MIXED | 32 | 51 | 25.7 | 4528.7 | 38.91 |
| phi-3.5-vision-instruct | INT4-MIXED | 1032 | 375 | 25.9 | 5829.4 | 38.61 |
| internvl2-4b | INT4-MIXED | 1027 | 389.5 | 25.9 | 5870.4 | 38.61 |
| gemma-3-4b-it | INT4-MIXED | 32 | 53.8 | 25.9 | 4595.5 | 38.61 |
| phi-4-mini-instruct | INT4-MIXED | 1024 | 328.4 | 26 | 3553.5 | 38.46 |
| gemma-3-4b-it | INT4-MIXED | 1024 | 266.6 | 26.6 | 6761.2 | 37.59 |
| gemma-3-4b-it | INT4-MIXED | 1024 | 275.3 | 27.1 | 6874.8 | 36.90 |
| gemma-3-4b-it | INT4-MIXED | 1024 | 348.3 | 27.4 | 6940 | 36.50 |
| gpt-oss-20b | INT4-MIXED | 32 | 135.9 | 27.4 | 12961.2 | 36.50 |
| gpt-oss-20b | INT4-MIXED | 1024 | 465.3 | 28.5 | 13189.8 | 35.09 |
| minicpm3-4b | INT4-MIXED | 32 | 92 | 28.5 | 3193.6 | 35.09 |
| minicpm3-4b | INT4-MIXED | 32 | 93.6 | 29.2 | 3317.6 | 34.25 |
| minicpm3-4b | INT4-MIXED | 32 | 99.2 | 29.7 | 3511.1 | 33.67 |
| deepseek-r1-distill-qwen-1.5b | FP16 | 32 | 32.4 | 29.8 | 4152.9 | 33.56 |
| qwen2.5-1.5b-instruct | FP16 | 32 | 32.3 | 29.8 | 3709.2 | 33.56 |
| deepseek-r1-distill-qwen-1.5b | FP16 | 1024 | 161.2 | 30.3 | 4295.8 | 33.00 |
| qwen2.5-1.5b-instruct | FP16 | 1024 | 161 | 30.3 | 3851.8 | 33.00 |
| gpt-j-6b | INT4-MIXED | 32 | 42 | 30.5 | 4024.6 | 32.79 |
| qwen2.5-coder-3b-instruct | INT8-CW | 32 | 35.1 | 30.5 | 3782.1 | 32.79 |
| flan-t5-xxl | INT4-MIXED | 33 | 46.7 | 30.6 | 12828.9 | 32.68 |
| qwen2.5-coder-3b-instruct | INT8-CW | 1024 | 205.1 | 31.6 | 3900.3 | 31.65 |
| llama-3.2-3b-instruct | INT8-CW | 32 | 37.2 | 31.6 | 4003.7 | 31.65 |
| llama-2-7b-chat-hf | INT4-MIXED | 32 | 38.1 | 32.5 | 4240 | 30.77 |
| llama-3.2-3b-instruct | INT8-CW | 1024 | 222.9 | 32.9 | 4222.2 | 30.40 |
| gpt-j-6b | INT4-MIXED | 1024 | 300.4 | 33.1 | 5174.8 | 30.21 |
| gpt-j-6b | INT4-MIXED | 32 | 49.9 | 33.1 | 4507.4 | 30.21 |
| minicpm3-4b | INT4-MIXED | 1024 | 527.3 | 33.4 | 4380.6 | 29.94 |
| llama-2-7b-chat-hf | INT4-MIXED | 32 | 41.1 | 33.5 | 4409.2 | 29.85 |
| minicpm3-4b | INT4-MIXED | 1024 | 493.8 | 34 | 4502.7 | 29.41 |
| falcon-7b-instruct | INT4-MIXED | 32 | 41.3 | 34.3 | 4342.1 | 29.15 |
| chatglm3-6b | FP16 | 7 | 36.8 | 34.4 | 4391.2 | 29.07 |
| biomistral-7b-slerp | INT4-MIXED | 7 | 36.9 | 34.4 | 4391.2 | 29.07 |
| chatglm3-6b | INT4-MIXED | 7 | 36.8 | 34.4 | 4391.2 | 29.07 |
| chatglm3-6b | INT4-MIXED | 7 | 36.8 | 34.4 | 4391.2 | 29.07 |
| chatglm3-6b | INT8-CW | 7 | 36.8 | 34.4 | 4391.2 | 29.07 |
| minicpm3-4b | INT4-MIXED | 1024 | 600.1 | 34.5 | 4688.6 | 28.99 |
| mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 41.1 | 34.5 | 4400.7 | 28.99 |
| mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 41.7 | 34.5 | 4408.8 | 28.99 |
| flan-t5-xxl | INT4-MIXED | 1139 | 289.8 | 35 | 15033.6 | 28.57 |
| falcon-7b-instruct | INT4-MIXED | 1024 | 329.8 | 35.5 | 4489.3 | 28.17 |
| llama-2-7b-chat-hf | INT4-MIXED | 1024 | 291.9 | 35.5 | 4959.2 | 28.17 |
| gpt-j-6b | INT4-MIXED | 1024 | 437.4 | 35.6 | 5663.3 | 28.09 |
| mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 44.7 | 35.6 | 4589.8 | 28.09 |
| mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 45.1 | 35.7 | 4678.4 | 28.01 |
| deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 42.9 | 36 | 5057.1 | 27.78 |
| qwen2-7b-instruct | INT4-MIXED | 32 | 42.9 | 36 | 4970.5 | 27.78 |
| qwen2.5-7b-instruct | INT4-MIXED | 32 | 43.7 | 36 | 4971.2 | 27.78 |
| biomistral-7b-slerp | INT4-MIXED | 7 | 38.6 | 36 | 4779.3 | 27.78 |
| mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 45.5 | 36.1 | 4794.1 | 27.70 |
| qwen2.5-7b-instruct-1m | INT4-MIXED | 32 | 43.4 | 36.1 | 4971.9 | 27.70 |
| mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 299 | 36.2 | 4713.1 | 27.62 |
| mistral-7b-instruct-v0.1 | INT4-MIXED | 32 | 45.7 | 36.2 | 4695.9 | 27.62 |
| mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 45.6 | 36.2 | 4799.9 | 27.62 |
| mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 301.3 | 36.3 | 4705.4 | 27.55 |
| llama-2-7b-chat-hf | INT4-MIXED | 1024 | 294.5 | 36.5 | 5113.9 | 27.40 |
| phi-3-mini-128k-instruct | INT8-CW | 32 | 43.4 | 36.7 | 4517 | 27.25 |
| phi-3-mini-4k-instruct | INT8-CW | 32 | 43.4 | 36.9 | 4520.5 | 27.10 |
| phi-3.5-mini-instruct | INT8-CW | 32 | 43.8 | 36.9 | 4517.5 | 27.10 |
| deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 45.8 | 37 | 5150.4 | 27.03 |
| qwen2.5-7b-instruct | INT4-MIXED | 32 | 45.7 | 37 | 5160.2 | 27.03 |
| deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 252 | 37.3 | 5279.3 | 26.81 |
| mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 286.6 | 37.3 | 4874.3 | 26.81 |
| qwen2-7b-instruct | INT4-MIXED | 1024 | 258.9 | 37.3 | 5190.9 | 26.81 |
| qwen2.5-7b-instruct | INT4-MIXED | 1024 | 253.9 | 37.3 | 5191.7 | 26.81 |
| mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 276.7 | 37.4 | 4970 | 26.74 |
| qwen2.5-7b-instruct-1m | INT4-MIXED | 1024 | 262.4 | 37.4 | 5192.5 | 26.74 |
| falcon-7b-instruct | INT4-MIXED | 32 | 48.2 | 37.4 | 4927 | 26.74 |
| qwen2-7b-instruct | INT4-MIXED | 32 | 46.7 | 37.5 | 5365.9 | 26.67 |
| deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 46.7 | 37.6 | 5366.6 | 26.60 |
| qwen2.5-7b-instruct | INT4-MIXED | 32 | 46.5 | 37.6 | 5275.8 | 26.60 |
| qwen2.5-7b-instruct-1m | INT4-MIXED | 32 | 46.7 | 37.6 | 5274.2 | 26.60 |
| mistral-7b-instruct-v0.1 | INT4-MIXED | 1025 | 416.4 | 37.9 | 4989.3 | 26.39 |
| mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 389.7 | 37.9 | 5089.3 | 26.39 |
| mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 400.4 | 38 | 5093.7 | 26.32 |
| llama-3-8b-instruct | INT4-MIXED | 32 | 44.9 | 38 | 5176.7 | 26.32 |
| llama-3.1-8b-instruct | INT4-MIXED | 32 | 44.7 | 38 | 5177.8 | 26.32 |
| deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 44.5 | 38.1 | 5177.6 | 26.25 |
| deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 272.5 | 38.3 | 5376.8 | 26.11 |
| qwen2.5-7b-instruct | INT4-MIXED | 1024 | 258.6 | 38.3 | 5383.1 | 26.11 |
| phi-4-mini-instruct | INT8-CW | 32 | 45.8 | 38.3 | 4626.1 | 26.11 |
| phi-4-mini-reasoning | INT8-CW | 32 | 45 | 38.3 | 4531.1 | 26.11 |
| falcon-7b-instruct | INT4-MIXED | 1024 | 414 | 38.7 | 5117.2 | 25.84 |
| qwen2-7b-instruct | INT4-MIXED | 1024 | 382.5 | 38.8 | 5571.3 | 25.77 |
| qwen2.5-7b-instruct | INT4-MIXED | 1024 | 373.5 | 38.8 | 5483.2 | 25.77 |
| deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 376.1 | 38.9 | 5572.7 | 25.71 |
| minicpm4-8b | INT4-MIXED | 32 | 46.2 | 38.9 | 5022.6 | 25.71 |
| qwen2.5-7b-instruct-1m | INT4-MIXED | 1024 | 371.4 | 39 | 5479.5 | 25.64 |
| llama-3-8b-instruct | INT4-MIXED | 32 | 49.8 | 39.1 | 5378.2 | 25.58 |
| llama-3.1-8b-instruct | INT4-MIXED | 32 | 47.9 | 39.1 | 5456.7 | 25.58 |
| phi-3-mini-128k-instruct | INT8-CW | 1024 | 318.3 | 39.2 | 5135.6 | 25.51 |
| phi-3-mini-4k-instruct | INT8-CW | 1024 | 324.2 | 39.2 | 5138.6 | 25.51 |
| phi-3.5-mini-instruct | INT8-CW | 1024 | 320.9 | 39.2 | 5135.3 | 25.51 |
| llama-3-8b-instruct | INT4-MIXED | 32 | 48.9 | 39.5 | 5366 | 25.32 |
| internvl2-4b | INT8-CW | 297 | 162.9 | 39.5 | 5778.2 | 25.32 |
| llama-3-8b-instruct | INT4-MIXED | 1024 | 304.1 | 39.7 | 5482.5 | 25.19 |
| llama-3.1-8b-instruct | INT4-MIXED | 1024 | 309.8 | 39.7 | 5482.1 | 25.19 |
| llama-3-8b-instruct | INT4-MIXED | 32 | 50.1 | 39.7 | 5481.4 | 25.19 |
| gemma-3-4b-it | INT8-CW | 32 | 56.2 | 39.7 | 6003.9 | 25.19 |
| deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 307.8 | 39.8 | 5481.1 | 25.13 |
| phi-4-mini-instruct | INT8-CW | 1024 | 282.1 | 39.8 | 4882.7 | 25.13 |
| phi-4-mini-reasoning | INT8-CW | 1024 | 286.2 | 39.8 | 4790.1 | 25.13 |
| minicpm4-8b | INT4-MIXED | 32 | 51 | 40.2 | 5239.7 | 24.88 |
| minicpm4-8b | INT4-MIXED | 1024 | 332.4 | 40.3 | 5196.3 | 24.81 |
| minicpm4-8b | INT4-MIXED | 32 | 54.1 | 40.6 | 5452.6 | 24.63 |
| afm-4.5b | INT8-CW | 32 | 45.9 | 40.6 | 5349 | 24.63 |
| llama-3-8b-instruct | INT4-MIXED | 1024 | 301.9 | 40.8 | 5683.6 | 24.51 |
| llama-3.1-8b-instruct | INT4-MIXED | 1024 | 295.1 | 40.9 | 5764.8 | 24.45 |
| llama-3-8b-instruct | INT4-MIXED | 1024 | 293.3 | 41 | 5674.6 | 24.39 |
| gemma-3-4b-it | INT8-CW | 1024 | 407.2 | 41.2 | 8371.7 | 24.27 |
| llama-3-8b-instruct | INT4-MIXED | 1024 | 400.4 | 41.4 | 5774.6 | 24.15 |
| phi-3.5-vision-instruct | INT8-CW | 802 | 298 | 41.4 | 6797.3 | 24.15 |
| minicpm4-8b | INT4-MIXED | 1024 | 327.8 | 41.5 | 5414.6 | 24.10 |
| afm-4.5b | INT8-CW | 1024 | 269.3 | 41.9 | 5519.4 | 23.87 |
| minicpm4-8b | INT4-MIXED | 1024 | 427.3 | 42.1 | 5614.6 | 23.75 |
| internvl2-4b | INT8-CW | 1027 | 399.5 | 42.1 | 6911.6 | 23.75 |
| phi-3.5-vision-instruct | INT8-CW | 1032 | 360.5 | 42.2 | 7100.3 | 23.70 |
| gemma-7b-it | INT4-MIXED | 32 | 52.5 | 44 | 5402.3 | 22.73 |
| gemma-7b-it | INT4-MIXED | 32 | 56.3 | 45.7 | 5833 | 21.88 |
| glm-4-9b-chat-hf | INT4-MIXED | 32 | 58.1 | 46.4 | 6082 | 21.55 |
| gemma-7b-it | INT4-MIXED | 1024 | 351.8 | 46.5 | 6112.7 | 21.51 |
| minicpm3-4b | INT8-CW | 32 | 102.1 | 46.5 | 5179.6 | 21.51 |
| deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 62.2 | 47 | 6448.8 | 21.28 |
| llama-3.1-8b-instruct | INT4-MIXED | 32 | 61.2 | 47 | 6356.4 | 21.28 |
| glm-4-9b-chat-hf | INT4-MIXED | 32 | 62.6 | 47.6 | 6295.5 | 21.01 |
| glm-4-9b-chat-hf | INT4-MIXED | 1024 | 453.8 | 48 | 6639.5 | 20.83 |
| gemma-7b-it | INT4-MIXED | 1024 | 465.8 | 48.2 | 6526.7 | 20.75 |
| llama-3.1-8b-instruct | INT4-MIXED | 1024 | 449.2 | 48.7 | 6653.7 | 20.53 |
| llava-next-video-7b-hf | INT4-MIXED | 2945 | 1900 | 48.8 | 8941 | 20.49 |
| deepseek-r1-distill-llama-8b | INT4-MIXED | 1024 | 448.3 | 48.8 | 6747.6 | 20.49 |
| glm-4-9b-chat-hf | INT4-MIXED | 32 | 64.9 | 48.8 | 6448.3 | 20.49 |
| gemma-2-9b-it | INT4-MIXED | 32 | 60.2 | 49.1 | 5851.1 | 20.37 |
| glm-4-9b-chat-hf | INT4-MIXED | 1024 | 479 | 49.5 | 6854.1 | 20.20 |
| glm-4-9b-chat-hf | INT4-MIXED | 1024 | 542.7 | 50.1 | 6993 | 19.96 |
| gemma-2-9b-it | INT4-MIXED | 32 | 64.7 | 50.4 | 6093.8 | 19.84 |
| llava-next-video-7b-hf | INT4-MIXED | 2945 | 2132.4 | 50.5 | 9169.6 | 19.80 |
| gemma-2-9b-it | INT4-MIXED | 32 | 65.5 | 51 | 6338.5 | 19.61 |
| ltx-video | INT8-CW | 11 | 52.1 | 51.4 | 9317.9 | 19.46 |
| minicpm3-4b | INT8-CW | 1024 | 605.6 | 51.8 | 6340 | 19.31 |
| gemma-2-9b-it | INT4-MIXED | 1024 | 408.5 | 52 | 6453.4 | 19.23 |
| gemma-2-9b-it | INT4-MIXED | 1024 | 398.2 | 53.3 | 6701.5 | 18.76 |
| gemma-2-9b-it | INT4-MIXED | 1024 | 521.5 | 54 | 6937.4 | 18.52 |
| flan-t5-xxl | INT8-CW | 33 | 128.5 | 54.4 | 22574 | 18.38 |
| ltx-video | INT4-MIXED | 11 | 56.5 | 55.4 | 6459 | 18.05 |
| llama-3.2-3b-instruct | FP4-NORMALIZED | 32 | 59.5 | 56.4 | 6668.8 | 17.73 |
| qwen2.5-coder-3b-instruct | FP16 | 32 | 60.4 | 57.3 | 6762.7 | 17.45 |
| gpt-j-6b | INT8-CW | 32 | 68 | 57.3 | 6813.4 | 17.45 |
| llama-3.2-3b-instruct | FP4-NORMALIZED | 1024 | 265.8 | 57.6 | 6893.8 | 17.36 |
| qwen2.5-coder-3b-instruct | FP16 | 1024 | 311.8 | 57.9 | 6936.2 | 17.27 |
| llama-3.2-3b-instruct | FP16 | 32 | 65.6 | 59.7 | 6954.9 | 16.75 |
| flan-t5-xxl | INT8-CW | 1139 | 287.3 | 59.7 | 24753.2 | 16.75 |
| gpt-j-6b | INT8-CW | 1024 | 416.7 | 59.9 | 7948.1 | 16.69 |
| lcm-dreamshaper-v7 | INT8-CW | 1024 | 63.5 | 60.6 | 5206.3 | 16.50 |
| llama-2-13b-chat-hf | INT4-MIXED | 32 | 72.3 | 60.7 | 7354.6 | 16.47 |
| lcm-dreamshaper-v7 | INT8-CW | 1024 | 63.1 | 60.8 | 4943.5 | 16.45 |
| llama-2-7b-chat-hf | INT8-CW | 32 | 67.8 | 60.8 | 7432.6 | 16.45 |
| llama-3.2-3b-instruct | FP16 | 1024 | 270.6 | 61.3 | 7279.4 | 16.31 |
| lcm-dreamshaper-v7 | INT8-CW | 32 | 61.4 | 62.6 | 5030 | 15.97 |
| lcm-dreamshaper-v7 | INT8-CW | 32 | 66.5 | 63 | 5204.7 | 15.87 |
| gemma-3-12b-it | INT4-MIXED | 32 | 85.9 | 64 | 8912.2 | 15.63 |
| llama-2-7b-chat-hf | INT8-CW | 1024 | 365.5 | 64 | 8138.2 | 15.63 |
| biomistral-7b-slerp | INT8-CW | 7 | 67.4 | 65 | 7821.6 | 15.38 |
| mistral-7b-instruct-v0.1 | INT8-CW | 32 | 73.7 | 65.1 | 7737.2 | 15.36 |
| mistral-7b-instruct-v0.2 | INT8-CW | 32 | 73.8 | 65.2 | 7736.5 | 15.34 |
| mistral-7b-instruct-v0.3 | INT8-CW | 32 | 73.8 | 65.2 | 7840.8 | 15.34 |
| llama-2-13b-chat-hf | INT4-MIXED | 1024 | 556.8 | 65.9 | 8448.5 | 15.17 |
| qwen2-7b-instruct | INT8-CW | 32 | 75.1 | 65.9 | 8087.4 | 15.17 |
| qwen2.5-7b-instruct | INT8-CW | 32 | 75.1 | 65.9 | 8088.8 | 15.17 |
| qwen2.5-7b-instruct-1m | INT8-CW | 32 | 75.2 | 66 | 8182.4 | 15.15 |
| phi-4-mini-instruct | FP4-NORMALIZED | 32 | 71.9 | 66.1 | 7585 | 15.13 |
| deepseek-r1-distill-qwen-7b | INT8-CW | 32 | 74.9 | 66.1 | 8175.9 | 15.13 |
| phi-4-mini-reasoning | FP4-NORMALIZED | 32 | 71.8 | 66.3 | 7685.1 | 15.08 |
| gemma-3-12b-it | INT4-MIXED | 32 | 116.6 | 66.6 | 9357.9 | 15.02 |
| mistral-7b-instruct-v0.2 | INT8-CW | 1024 | 383.2 | 67 | 8035.9 | 14.93 |
| mistral-7b-instruct-v0.1 | INT8-CW | 1025 | 403.5 | 67.1 | 8023.1 | 14.90 |
| mistral-7b-instruct-v0.3 | INT8-CW | 1024 | 385 | 67.2 | 8136.8 | 14.88 |
| deepseek-r1-distill-qwen-7b | INT8-CW | 1024 | 368.2 | 67.5 | 8391.1 | 14.81 |
| phi-4-mini-reasoning | FP4-NORMALIZED | 1024 | 338.2 | 67.6 | 7933.1 | 14.79 |
| qwen2-7b-instruct | INT8-CW | 1024 | 372.1 | 67.6 | 8299.1 | 14.79 |
| qwen2.5-7b-instruct | INT8-CW | 1024 | 373.5 | 67.6 | 8298.4 | 14.79 |
| phi-4-mini-instruct | FP4-NORMALIZED | 1024 | 363.4 | 67.8 | 7841.4 | 14.75 |
| qwen2.5-7b-instruct-1m | INT8-CW | 1024 | 370.1 | 67.8 | 8392.6 | 14.75 |
| gemma-3-12b-it | INT4-MIXED | 1024 | 653.3 | 68 | 11641 | 14.71 |
| phi-4 | INT4-MIXED | 32 | 79.7 | 68.3 | 8359.4 | 14.64 |
| phi-4-reasoning | INT4-MIXED | 32 | 79.5 | 68.3 | 8358.1 | 14.64 |
| llama-3.1-8b-instruct | INT8-CW | 32 | 75.9 | 68.6 | 8521.5 | 14.58 |
| deepseek-r1-distill-llama-8b | INT8-CW | 32 | 76.1 | 68.7 | 8611.3 | 14.56 |
| llama-3-8b-instruct | INT8-CW | 32 | 76.9 | 68.7 | 8614.3 | 14.56 |
| phi-3-mini-128k-instruct | FP16 | 32 | 73.4 | 68.8 | 8305 | 14.53 |
| phi-3-mini-4k-instruct | FP16 | 32 | 75.5 | 68.9 | 8211.3 | 14.51 |
| phi-3.5-mini-instruct | FP16 | 32 | 73.6 | 68.9 | 8191 | 14.51 |
| baichuan2-13b-chat | INT4-MIXED | 32 | 87.3 | 68.9 | 8815.4 | 14.51 |
| qwen1.5-14b-chat | INT4-MIXED | 32 | 84.4 | 69 | 9086.1 | 14.49 |
| deepseek-r1-distill-qwen-14b | INT4-MIXED | 32 | 80.3 | 69.4 | 8779.6 | 14.41 |
| gemma-3-12b-it | INT4-MIXED | 1024 | 743 | 70.7 | 12037.7 | 14.14 |
| llama-3.1-8b-instruct | INT8-CW | 1024 | 383 | 70.8 | 8810.4 | 14.12 |
| deepseek-r1-distill-llama-8b | INT8-CW | 1024 | 376.6 | 70.9 | 8903.4 | 14.10 |
| llama-3-8b-instruct | INT8-CW | 1024 | 427.1 | 70.9 | 8902.9 | 14.10 |
| phi-4-mini-instruct | FP16 | 32 | 77.3 | 71.4 | 8169.8 | 14.01 |
| phi-4 | INT4-MIXED | 1024 | 619.7 | 71.4 | 8829.3 | 14.01 |
| phi-4-reasoning | INT4-MIXED | 1024 | 615.3 | 71.4 | 8826.8 | 14.01 |
| internvl2-4b | FP16 | 297 | 216.4 | 71.6 | 9512.2 | 13.97 |
| phi-4-mini-reasoning | FP16 | 32 | 77.1 | 71.6 | 8171.4 | 13.97 |
| phi-4 | INT4-MIXED | 32 | 89.4 | 71.9 | 9057 | 13.91 |
| deepseek-r1-distill-qwen-14b | INT4-MIXED | 32 | 90.6 | 72.4 | 9400.5 | 13.81 |
| phi-3.5-mini-instruct | FP16 | 1024 | 389.2 | 72.7 | 9095.2 | 13.76 |
| phi-3-mini-4k-instruct | FP16 | 1024 | 386.1 | 72.8 | 9094 | 13.74 |
| deepseek-r1-distill-qwen-14b | INT4-MIXED | 1024 | 592.9 | 72.9 | 9169.9 | 13.72 |
| phi-3-mini-128k-instruct | FP16 | 1024 | 345 | 73.1 | 9186.6 | 13.68 |
| phi-4-mini-instruct | FP16 | 1024 | 367.6 | 73.2 | 8562.5 | 13.66 |
| phi-3.5-vision-instruct | FP16 | 802 | 406.7 | 73.2 | 10330.5 | 13.66 |
| phi-4-mini-reasoning | FP16 | 1024 | 342.5 | 73.4 | 8560.5 | 13.62 |
| baichuan2-13b-chat | INT4-MIXED | 1024 | 1837.1 | 73.9 | 13059.7 | 13.53 |
| minicpm4-8b | INT8-CW | 32 | 83 | 73.9 | 8743 | 13.53 |
| phi-3.5-vision-instruct | FP16 | 1032 | 518.5 | 74.1 | 10638.2 | 13.50 |
| qwen1.5-14b-chat | INT4-MIXED | 1024 | 664.3 | 74.1 | 10171.1 | 13.50 |
| internvl2-4b | FP16 | 1027 | 543.4 | 74.3 | 10380.3 | 13.46 |
| phi-4-reasoning | INT4-MIXED | 32 | 103.3 | 74.7 | 9650.4 | 13.39 |
| phi-4 | INT4-MIXED | 1024 | 738.8 | 74.9 | 9509.4 | 13.35 |
| gemma-3-4b-it | FP16 | 32 | 84 | 75 | 10702.8 | 13.33 |
| minicpm4-8b | INT8-CW | 1024 | 428 | 75.7 | 8906.9 | 13.21 |
| deepseek-r1-distill-qwen-14b | INT4-MIXED | 1024 | 722.9 | 75.8 | 9786 | 13.19 |
| gemma-3-4b-it | FP16 | 1024 | 399.2 | 76.7 | 13070.1 | 13.04 |
| llava-next-video-7b-hf | INT8-CW | 2945 | 2147.7 | 77 | 11933.6 | 12.99 |
| phi-4-reasoning | INT4-MIXED | 1024 | 823.7 | 77.7 | 9782.8 | 12.87 |
| gemma-7b-it | INT8-CW | 32 | 92.3 | 79.9 | 9104.5 | 12.52 |
| afm-4.5b | FP16 | 32 | 84.4 | 80 | 9635.6 | 12.50 |
| afm-4.5b | FP16 | 1024 | 314.6 | 81.2 | 9865.1 | 12.32 |
| glm-4-9b-chat-hf | INT8-CW | 32 | 95.2 | 82.6 | 9980.1 | 12.11 |
| gemma-7b-it | INT8-CW | 1024 | 446 | 82.7 | 9807.5 | 12.09 |
| glm-4-9b-chat-hf | INT8-CW | 1024 | 530.4 | 84.5 | 10503.6 | 11.83 |
| phi-4-multimodal-instruct | FP16 | 578 | 472.2 | 85.2 | 12425.3 | 11.74 |
| phi-4-multimodal-instruct | FP16 | 786 | 550.6 | 85.5 | 13352.9 | 11.70 |
| phi-4-multimodal-instruct | FP16 | 1362 | 1114.8 | 86.5 | 14960.9 | 11.56 |
| phi-4-multimodal-instruct | FP16 | 1570 | 1135.4 | 86.8 | 15711.6 | 11.52 |
| gemma-2-9b-it | INT8-CW | 32 | 99.4 | 87.6 | 9830.1 | 11.42 |
| gemma-2-9b-it | INT8-CW | 1024 | 503.3 | 90.7 | 10407.2 | 11.03 |

| Topology | Precision | Input Size | 1st latency (ms) | 2nd latency (ms) | max rss memory | 2nd token per sec |
|---|---|---|---|---|---|---|
| t5-small | INT4-MIXED | 32 | 10 | 5 | 1052.1 | 200.00 |
| t5-small | INT8-CW | 32 | 11.7 | 5.2 | 1218.7 | 192.31 |
| t5-small | INT4-MIXED | 1024 | 14.1 | 5.3 | 1010.3 | 188.68 |
| t5-small | INT4-MIXED | 32 | 10.9 | 5.7 | 1142.7 | 175.44 |
| t5-small | INT8-CW | 1024 | 16.1 | 5.7 | 1216.8 | 175.44 |
| t5-small | INT4-MIXED | 1024 | 17.4 | 5.8 | 1114.4 | 172.41 |
| t5-small | INT4-MIXED | 32 | 16.1 | 5.9 | 1155.4 | 169.49 |
| t5-small | FP16 | 32 | 18.5 | 6.1 | 1201.4 | 163.93 |
| t5-small | INT4-MIXED | 1024 | 17 | 6.1 | 1115 | 163.93 |
| t5-small | FP16 | 1024 | 12.5 | 6.6 | 1327.6 | 151.52 |
| distil-large-v2 | INT8-CW | 32 | 237.5 | 9 | 2643.4 | 111.11 |
| minicpm4-0.5b | INT4-MIXED | 32 | 28.2 | 9.1 | 1397.6 | 109.89 |
| minicpm4-0.5b | INT4-MIXED | 32 | 32.1 | 9.7 | 1445.1 | 103.09 |
| minicpm4-0.5b | INT4-MIXED | 1024 | 65.6 | 10.2 | 1274.1 | 98.04 |
| whisper-large-v3-turbo | INT8-CW | 32 | 257.8 | 10.5 | 2680.7 | 95.24 |
| distil-large-v2 | INT4-MIXED | 32 | 260.5 | 10.8 | 2128.4 | 92.59 |
| minicpm4-0.5b | INT4-MIXED | 32 | 36.4 | 10.8 | 1414 | 92.59 |
| whisper-large-v3-turbo | INT8-CW | 1024 | 336.3 | 11.1 | 2463.5 | 90.09 |
| distil-large-v2 | INT4-MIXED | 1024 | 348 | 11.2 | 1976 | 89.29 |
| distil-large-v2 | INT8-CW | 1024 | 327.1 | 11.2 | 2441.6 | 89.29 |
| whisper-large-v3-turbo | INT4-MIXED | 1024 | 355.6 | 11.3 | 2082.5 | 88.50 |
| minicpm4-0.5b | INT8-CW | 32 | 35.8 | 11.4 | 1736.1 | 87.72 |
| minicpm4-0.5b | INT4-MIXED | 1024 | 73.1 | 11.5 | 1314.9 | 86.96 |
| whisper-large-v3-turbo | INT4-MIXED | 32 | 282 | 12 | 2229.2 | 83.33 |
| minicpm4-0.5b | INT4-MIXED | 1024 | 69.8 | 12.2 | 1295.2 | 81.97 |
| whisper-small | INT4-MIXED | 32 | 126.3 | 12.9 | 1554.1 | 77.52 |
| gemma-3-270m | INT4-MIXED | 32 | 42.5 | 13 | 1614.4 | 76.92 |
| minicpm4-0.5b | INT8-CW | 1024 | 65.3 | 13 | 1605.3 | 76.92 |
| whisper-small | INT4-MIXED | 1024 | 180.8 | 13.1 | 1410.2 | 76.34 |
| gemma-3-270m | INT4-MIXED | 1024 | 53.6 | 13.8 | 1470.5 | 72.46 |
| whisper-small | INT4-MIXED | 32 | 129.6 | 14.4 | 1671.9 | 69.44 |
| gemma-3-270m | INT8-CW | 1024 | 59.1 | 14.9 | 1581.3 | 67.11 |
| gemma-3-270m | INT8-CW | 32 | 40.3 | 14.9 | 1696.7 | 67.11 |
| whisper-small | INT4-MIXED | 1024 | 197.6 | 15.2 | 1533.7 | 65.79 |
| gemma-3-270m | FP16 | 32 | 42.2 | 15.6 | 1586.8 | 64.10 |
| whisper-small | INT4-MIXED | 32 | 141.8 | 15.6 | 1652.4 | 64.10 |
| whisper-small | INT4-MIXED | 1024 | 184.6 | 15.7 | 1513.1 | 63.69 |
| whisper-small | INT8-CW | 1024 | 202.4 | 16.1 | 1702.5 | 62.11 |
| whisper-small | INT8-CW | 32 | 151.8 | 16.2 | 1817.9 | 61.73 |
| whisper-large-v3-turbo | FP16 | 32 | 320.5 | 16.4 | 3183.7 | 60.98 |
| whisper-large-v3-turbo | FP16 | 1024 | 418.2 | 16.5 | 3213.9 | 60.61 |
| gemma-3-270m | FP16 | 1024 | 51.8 | 16.6 | 1661.4 | 60.24 |
| distil-large-v2 | FP16 | 1024 | 380.6 | 16.9 | 2693.5 | 59.17 |
| llama-3.2-1b-instruct | INT4-MIXED | 32 | 43.2 | 17.1 | 2470.4 | 58.48 |
| distil-large-v2 | FP16 | 32 | 304.1 | 17.2 | 2662.8 | 58.14 |
| whisper-small | FP16 | 32 | 136 | 18 | 1888.7 | 55.56 |
| llama-3.2-1b-instruct | INT4-MIXED | 1024 | 143 | 18.4 | 2273.5 | 54.35 |
| whisper-small | FP16 | 1024 | 203.4 | 18.5 | 1920.6 | 54.05 |
| llama-3.2-1b-instruct | INT4-MIXED | 32 | 32.2 | 21 | 2443.1 | 47.62 |
| llama-3.2-1b-instruct | INT8-CW | 32 | 48.1 | 21.8 | 2549.1 | 45.87 |
| llama-3.2-1b-instruct | INT4-MIXED | 1024 | 114.3 | 22 | 2240.9 | 45.45 |
| gemma-3-1b-it | INT4-MIXED | 32 | 75.1 | 22.6 | 2123.2 | 44.25 |
| deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 75.9 | 22.8 | 2662.2 | 43.86 |
| qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 79.2 | 22.9 | 2610.7 | 43.67 |
| gemma-3-1b-it | INT4-MIXED | 1024 | 119.2 | 23 | 2013.9 | 43.48 |
| llama-3.2-1b-instruct | INT8-CW | 1024 | 133.5 | 23 | 2501.4 | 43.48 |
| deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 155.3 | 23.2 | 2381.6 | 43.10 |
| qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 159.5 | 23.8 | 2321.7 | 42.02 |
| gemma-3-1b-it | INT4-MIXED | 1024 | 138.9 | 24.1 | 2182.3 | 41.49 |
| gemma-3-1b-it | INT4-MIXED | 32 | 81.6 | 24.1 | 2299.9 | 41.49 |
| deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 69.6 | 24.8 | 2959.9 | 40.32 |
| qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 88 | 25.1 | 2713.8 | 39.84 |
| deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 163.8 | 26.1 | 2634.8 | 38.31 |
| qwen2.5-1.5b-instruct | INT4-MIXED | 32 | 70.9 | 26.1 | 2434.7 | 38.31 |
| qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 203.7 | 26.4 | 2563.3 | 37.88 |
| qwen2.5-1.5b-instruct | INT4-MIXED | 1024 | 165.5 | 27 | 2263.4 | 37.04 |
| deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 32 | 90.3 | 27 | 3127.3 | 37.04 |
| deepseek-r1-distill-qwen-1.5b | INT4-MIXED | 1024 | 203.9 | 27.4 | 2954.8 | 36.50 |
| smolvlm2-256m-video-instruct | INT4-MIXED | 1141 | 618.3 | 27.7 | 2952.6 | 36.10 |
| deepseek-r1-distill-qwen-1.5b | INT8-CW | 32 | 75.3 | 28 | 3540.4 | 35.71 |
| gemma-3-1b-it | INT8-CW | 32 | 90.7 | 28.6 | 2469.1 | 34.97 |
| deepseek-r1-distill-qwen-1.5b | INT8-CW | 1024 | 166.9 | 28.7 | 3240.5 | 34.84 |
| qwen2.5-1.5b-instruct | INT8-CW | 32 | 78.8 | 28.7 | 3004.4 | 34.84 |
| gemma-3-1b-it | INT8-CW | 1024 | 133.9 | 29.3 | 2368.4 | 34.13 |
| qwen2.5-1.5b-instruct | INT8-CW | 1024 | 169 | 29.5 | 2726.9 | 33.90 |
| smolvlm2-256m-video-instruct | INT8-CW | 1141 | 624.1 | 34 | 2964.9 | 29.41 |
| smolvlm2-256m-video-instruct | FP16 | 1141 | 701.4 | 34.6 | 3688.7 | 28.90 |
| phi-2 | INT4-MIXED | 32 | 134.8 | 34.6 | 2997.2 | 28.90 |
| llama-3.2-3b-instruct | INT4-MIXED | 32 | 103.1 | 36.2 | 3009.3 | 27.62 |
| qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 107.5 | 36.3 | 3018.4 | 27.55 |
| afm-4.5b | INT4-MIXED | 32 | 84.2 | 36.7 | 4387 | 27.25 |
| stablelm-3b-4e1t | INT4-MIXED | 32 | 133.2 | 36.7 | 2990.3 | 27.25 |
| phi-2 | INT4-MIXED | 1024 | 386.3 | 36.8 | 3174.5 | 27.17 |
| stable-zephyr-3b-dpo | INT4-MIXED | 32 | 137.2 | 37.1 | 2990 | 26.95 |
| llama-3.2-3b-instruct | INT4-MIXED | 32 | 103.6 | 37.3 | 3486.2 | 26.81 |
| llama-3.2-3b-instruct | INT4-MIXED | 1024 | 327.5 | 37.5 | 3097.5 | 26.67 |
| qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 294.1 | 37.5 | 2987 | 26.67 |
| phi-3-mini-128k-instruct | INT4-MIXED | 32 | 111.1 | 37.9 | 3660.5 | 26.39 |
| phi-3-mini-4k-instruct | INT4-MIXED | 32 | 117.6 | 38.1 | 3653.5 | 26.25 |
| qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 141.4 | 38.2 | 3618.9 | 26.18 |
| afm-4.5b | INT4-MIXED | 1024 | 460.3 | 38.5 | 4232 | 25.97 |
| llama-3.2-3b-instruct | INT4-MIXED | 1024 | 424.5 | 38.8 | 3315.4 | 25.77 |
| phi-3.5-mini-instruct | INT4-MIXED | 32 | 113.9 | 38.8 | 3654.6 | 25.77 |
| phi-2 | INT4-MIXED | 32 | 137.2 | 38.9 | 3268.1 | 25.71 |
| stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 387.4 | 39.1 | 3157.7 | 25.58 |
| stablelm-3b-4e1t | INT4-MIXED | 1024 | 389.9 | 39.3 | 3153.9 | 25.45 |
| stable-zephyr-3b-dpo | INT4-MIXED | 32 | 152.4 | 39.8 | 3352.9 | 25.13 |
| qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 493.3 | 40.3 | 3249.6 | 24.81 |
| phi-3-mini-128k-instruct | INT4-MIXED | 1024 | 514 | 40.8 | 3714.6 | 24.51 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 513.5 | 41.1 | 3715.5 | 24.33 |
| stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 473.3 | 41.2 | 3488.4 | 24.27 |
| phi-3-mini-128k-instruct | INT4-MIXED | 32 | 122.8 | 41.3 | 3632 | 24.21 |
| phi-2 | INT4-MIXED | 1024 | 485.9 | 41.5 | 3414.5 | 24.10 |
| phi-3.5-mini-instruct | INT4-MIXED | 1024 | 510.5 | 41.7 | 3720.9 | 23.98 |
| gemma-3-1b-it | FP16 | 32 | 99.7 | 42.2 | 3200.1 | 23.70 |
| gemma-3-1b-it | FP16 | 1024 | 182.7 | 42.4 | 3328.3 | 23.58 |
| phi-3-mini-128k-instruct | INT4-MIXED | 1024 | 653.9 | 43.4 | 3804 | 23.04 |
| biomistral-7b-slerp | INT4-MIXED | 7 | 62.4 | 43.4 | 5201.3 | 23.04 |
| phi-3-mini-4k-instruct | INT4-MIXED | 32 | 106.1 | 43.7 | 3649 | 22.88 |
| llama-3.2-3b-instruct | INT4-MIXED | 32 | 91.8 | 44.4 | 3332.6 | 22.52 |
| phi-3.5-mini-instruct | INT4-MIXED | 32 | 139.2 | 44.7 | 3677.5 | 22.37 |
| phi-4-mini-reasoning | INT4-MIXED | 32 | 137.8 | 45.1 | 3531.1 | 22.17 |
| stablelm-3b-4e1t | INT4-MIXED | 32 | 134.2 | 45.4 | 3506.4 | 22.03 |
| phi-2 | INT8-CW | 32 | 134.2 | 45.8 | 4346.6 | 21.83 |
| stablelm-3b-4e1t | INT8-CW | 32 | 121.3 | 46.1 | 4345.1 | 21.69 |
| llama-3.2-3b-instruct | INT4-MIXED | 1024 | 328.6 | 46.2 | 3141.2 | 21.65 |
| stable-zephyr-3b-dpo | INT8-CW | 32 | 123.9 | 46.2 | 4443.6 | 21.65 |
| biomistral-7b-slerp | INT4-MIXED | 7 | 64.3 | 46.5 | 5410.6 | 21.51 |
| internvl2-4b | INT4-MIXED | 297 | 358.2 | 47 | 4322.1 | 21.28 |
| phi-4-mini-instruct | INT4-MIXED | 32 | 158.5 | 47.1 | 3432.6 | 21.23 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 667.6 | 47.4 | 3885 | 21.10 |
| phi-4-mini-instruct | INT4-MIXED | 1024 | 431.6 | 47.6 | 3574.9 | 21.01 |
| phi-3.5-mini-instruct | INT4-MIXED | 1024 | 675.8 | 48.1 | 3908.7 | 20.79 |
| phi-4-mini-reasoning | INT4-MIXED | 1024 | 427.8 | 48.3 | 3675 | 20.70 |
| stablelm-3b-4e1t | INT4-MIXED | 1024 | 413.4 | 48.5 | 3617.1 | 20.62 |
| phi-4-mini-reasoning | INT4-MIXED | 32 | 166.6 | 48.7 | 3856.7 | 20.53 |
| phi-2 | INT8-CW | 1024 | 415.4 | 48.7 | 4383.8 | 20.53 |
| internvl2-4b | INT4-MIXED | 297 | 397.2 | 48.9 | 4351.7 | 20.45 |
| stable-zephyr-3b-dpo | INT8-CW | 1024 | 421 | 48.9 | 4539.5 | 20.45 |
| stablelm-3b-4e1t | INT8-CW | 1024 | 414.7 | 48.9 | 4398.9 | 20.45 |
| internvl2-4b | INT4-MIXED | 1027 | 804.1 | 49.1 | 4896 | 20.37 |
| phi-4-mini-reasoning | INT4-MIXED | 1024 | 535.8 | 50.2 | 3751.2 | 19.92 |
| phi-3.5-vision-instruct | INT4-MIXED | 802 | 707.5 | 50.3 | 5136.7 | 19.88 |
| phi-3.5-mini-instruct | INT4-MIXED | 32 | 110.8 | 50.9 | 3521.5 | 19.65 |
| afm-4.5b | INT8-CW | 32 | 71.5 | 50.9 | 5726.2 | 19.65 |
| phi-3-mini-4k-instruct | INT4-MIXED | 32 | 116.9 | 51.2 | 3512.7 | 19.53 |
| llama-3.2-3b-instruct | INT8-CW | 32 | 90.7 | 51.2 | 4467 | 19.53 |
| qwen2.5-coder-3b-instruct | INT8-CW | 32 | 136 | 51.5 | 4367.4 | 19.42 |
| internvl2-4b | INT4-MIXED | 1027 | 884.3 | 51.7 | 4965.2 | 19.34 |
| phi-4-mini-instruct | INT4-MIXED | 32 | 169.7 | 51.8 | 3998.2 | 19.31 |
| phi-3.5-vision-instruct | INT4-MIXED | 1032 | 891.5 | 52.3 | 5107.8 | 19.12 |
| llama-3.2-3b-instruct | INT8-CW | 1024 | 347.6 | 52.4 | 4581.6 | 19.08 |
| qwen2.5-coder-3b-instruct | INT8-CW | 1024 | 407.7 | 52.6 | 4327 | 19.01 |
| qwen2.5-coder-3b-instruct | INT4-MIXED | 32 | 63.6 | 52.9 | 3224.7 | 18.90 |
| afm-4.5b | INT8-CW | 1024 | 346.2 | 53 | 5770.5 | 18.87 |
| phi-4-mini-instruct | INT4-MIXED | 1024 | 571.5 | 53.5 | 3749.6 | 18.69 |
| phi-3.5-mini-instruct | INT4-MIXED | 1024 | 509.1 | 54.2 | 3751 | 18.45 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 500 | 54.4 | 3742.7 | 18.38 |
| qwen2.5-coder-3b-instruct | INT4-MIXED | 1024 | 311.9 | 54.4 | 2949.9 | 18.38 |
| phi-4-mini-instruct | INT4-MIXED | 32 | 148.9 | 55.1 | 3964.9 | 18.15 |
| phi-4-mini-reasoning | INT4-MIXED | 32 | 136.9 | 55.3 | 3968.1 | 18.08 |
| deepseek-r1-distill-qwen-1.5b | FP16 | 32 | 89.6 | 55.7 | 4594.4 | 17.95 |
| deepseek-r1-distill-qwen-1.5b | FP16 | 1024 | 250.5 | 56 | 4698.2 | 17.86 |
| whisper-large-v3 | INT4-MIXED | 1024 | 566.5 | 57.1 | 3372.1 | 17.51 |
| qwen2.5-1.5b-instruct | FP16 | 32 | 96.7 | 57.2 | 4045.2 | 17.48 |
| whisper-large-v3 | INT8-CW | 1024 | 536.1 | 57.4 | 4018.2 | 17.42 |
| phi-4-mini-reasoning | INT4-MIXED | 1024 | 437.4 | 57.5 | 3627.7 | 17.39 |
| phi-4-mini-instruct | INT4-MIXED | 1024 | 433.8 | 57.7 | 3625.9 | 17.33 |
| qwen2.5-1.5b-instruct | FP16 | 1024 | 259.3 | 57.8 | 4158.7 | 17.30 |
| llama-2-7b-chat-hf | INT4-MIXED | 32 | 109.5 | 57.8 | 5118.3 | 17.30 |
| phi-3-mini-128k-instruct | INT8-CW | 32 | 123.6 | 57.9 | 5357.2 | 17.27 |
| whisper-large-v3 | INT8-CW | 32 | 480.2 | 58 | 4230.1 | 17.24 |
| phi-3-mini-4k-instruct | INT8-CW | 32 | 101.2 | 58 | 5355.9 | 17.24 |
| phi-3.5-mini-instruct | INT8-CW | 32 | 108.6 | 58.2 | 5357.3 | 17.18 |
| whisper-large-v3 | INT4-MIXED | 32 | 504.5 | 58.3 | 3718.6 | 17.15 |
| gpt-j-6b | INT4-MIXED | 32 | 200 | 58.3 | 4869.3 | 17.15 |
| gemma-3-4b-it | INT4-MIXED | 32 | 182.8 | 60.3 | 4828.2 | 16.58 |
| phi-3.5-mini-instruct | INT8-CW | 1024 | 535.8 | 61.1 | 5325.4 | 16.37 |
| phi-3-mini-128k-instruct | INT8-CW | 1024 | 541.2 | 61.2 | 5299 | 16.34 |
| phi-3-mini-4k-instruct | INT8-CW | 1024 | 550.5 | 61.2 | 5286.9 | 16.34 |
| mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 114.6 | 61.3 | 5300.5 | 16.31 |
| mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 104.9 | 61.9 | 5231.7 | 16.16 |
| gpt-j-6b | INT4-MIXED | 1024 | 715.2 | 62 | 5443.4 | 16.13 |
| falcon-7b-instruct | INT4-MIXED | 32 | 114.4 | 62.1 | 5113.3 | 16.10 |
| phi-4-mini-reasoning | INT8-CW | 32 | 137.9 | 62.4 | 4985 | 16.03 |
| llama-2-7b-chat-hf | INT4-MIXED | 1024 | 718.7 | 62.7 | 5154.4 | 15.95 |
| phi-4-mini-instruct | INT8-CW | 32 | 140.5 | 62.7 | 4986.5 | 15.95 |
| falcon-7b-instruct | INT4-MIXED | 1024 | 740.1 | 63.2 | 4690.1 | 15.82 |
| gemma-3-4b-it | INT4-MIXED | 1024 | 577.1 | 63.4 | 6729.3 | 15.77 |
| gemma-3-4b-it | INT4-MIXED | 32 | 206.3 | 63.4 | 5024.3 | 15.77 |
| phi-4-mini-reasoning | INT8-CW | 1024 | 458.7 | 63.7 | 5035.9 | 15.70 |
| phi-4-mini-instruct | INT8-CW | 1024 | 458.9 | 63.8 | 5060.4 | 15.67 |
| mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 709.5 | 63.9 | 4931.5 | 15.65 |
| flan-t5-xxl | INT4-MIXED | 33 | 296.5 | 63.9 | 13708.9 | 15.65 |
| mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 725.1 | 64 | 5022.1 | 15.63 |
| internvl2-4b | INT8-CW | 297 | 371.8 | 64.1 | 6073.4 | 15.60 |
| qwen2.5-7b-instruct-1m | INT4-MIXED | 32 | 101 | 64.6 | 5443.8 | 15.48 |
| deepseek-r1-distill-qwen-7b | INT4-MIXED | 32 | 98.2 | 64.9 | 5442.1 | 15.41 |
| qwen2-7b-instruct | INT4-MIXED | 32 | 118.1 | 65 | 5537.5 | 15.38 |
| qwen2.5-7b-instruct | INT4-MIXED | 32 | 104.9 | 65 | 5447 | 15.38 |
| mistral-7b-instruct-v0.1 | INT4-MIXED | 32 | 113.7 | 65.3 | 5306.7 | 15.31 |
| qwen2.5-7b-instruct | INT4-MIXED | 1024 | 653.2 | 65.6 | 5474.1 | 15.24 |
| gpt-j-6b | INT4-MIXED | 32 | 190 | 65.8 | 5243.2 | 15.20 |
| deepseek-r1-distill-qwen-7b | INT4-MIXED | 1024 | 652.2 | 65.9 | 5468.6 | 15.17 |
| qwen2.5-7b-instruct-1m | INT4-MIXED | 1024 | 640.4 | 65.9 | 5469.1 | 15.17 |
| mistral-7b-instruct-v0.2 | INT4-MIXED | 32 | 118.1 | 65.9 | 5271.6 | 15.17 |
| mistral-7b-instruct-v0.3 | INT4-MIXED | 32 | 128.5 | 65.9 | 5425.7 | 15.17 |
| gemma-3-4b-it | INT4-MIXED | 1024 | 701.4 | 66.2 | 6928.2 | 15.11 |
| whisper-large-v3 | FP16 | 1024 | 605.4 | 66.3 | 5200.1 | 15.08 |
| llama-3.1-8b-instruct | INT4-MIXED | 32 | 108.8 | 66.4 | 5640.1 | 15.06 |
| deepseek-r1-distill-llama-8b | INT4-MIXED | 32 | 108 | 66.5 | 5636.3 | 15.04 |
| llama-3-8b-instruct | INT4-MIXED | 32 | 112.8 | 66.6 | 5637.1 | 15.02 |
| phi-3.5-vision-instruct | INT8-CW | 802 | 611.7 | 66.6 | 6908.6 | 15.02 |
| qwen2-7b-instruct | INT4-MIXED | 1024 | 656.8 | 66.8 | 5562.2 | 14.97 |
| mistral-7b-instruct-v0.1 | INT4-MIXED | 1025 | 998.7 | 67.4 | 5059.1 | 14.84 |
| mistral-7b-instruct-v0.3 | INT4-MIXED | 1024 | 984.9 | 67.5 | 5183.8 | 14.81 |
| mistral-7b-instruct-v0.2 | INT4-MIXED | 1024 | 1003.8 | 67.6 | | |

5090.7

14.79

minicpm-v-2_6

INT4-MIXED

228

911.7

67.6

6679.1

14.79

deepseek-r1-distill-qwen-7b

INT4-MIXED

32

112.1

67.9

6095.5

14.73

minicpm-o-2_6

INT4-MIXED

238

910

68.1

6699.5

14.68

phi-3.5-vision-instruct

INT8-CW

1032

791.4

68.2

6580.4

14.66

gpt-j-6b

INT4-MIXED

1024

951.4

68.4

5791.6

14.62

qwen2.5-7b-instruct-1m

INT4-MIXED

32

113.3

68.4

6062

14.62

internvl2-4b

INT8-CW

1027

795

68.4

6426.4

14.62

qwen2.5-7b-instruct

INT4-MIXED

32

110.7

68.5

5993

14.60

gemma-3-4b-it

INT4-MIXED

32

194.9

68.6

4971.6

14.58

llama-3.1-8b-instruct

INT4-MIXED

1024

722.8

68.7

5710.2

14.56

minicpm4-8b

INT4-MIXED

32

110.5

68.7

5503.7

14.56

qwen2-7b-instruct

INT4-MIXED

32

91.6

69

5996

14.49

deepseek-r1-distill-llama-8b

INT4-MIXED

1024

721.8

69.3

5707.8

14.43

llama-3-8b-instruct

INT4-MIXED

1024

727.5

69.3

5706.1

14.43

qwen3-8b

INT4-MIXED

32

146.4

69.3

5828.2

14.43

phi-4-multimodal-instruct

INT4-MIXED

578

689

69.4

6106

14.41

minicpm3-4b

INT4-MIXED

32

317.7

69.5

3764.5

14.39

phi-4-multimodal-instruct

INT4-MIXED

786

755.7

69.6

6478.7

14.37

minicpm4-8b

INT4-MIXED

1024

787.4

69.7

5498.9

14.35

qwen2.5-7b-instruct-1m

INT4-MIXED

1024

900.3

69.9

5722.3

14.31

deepseek-r1-distill-qwen-7b

INT4-MIXED

1024

886.8

70

5734.2

14.29

qwen2-7b-instruct

INT4-MIXED

1024

896.1

70

5624.3

14.29

flan-t5-xxl

INT4-MIXED

1139

321.1

70.2

14929

14.25

qwen2.5-7b-instruct

INT4-MIXED

1024

905.2

70.3

5631.7

14.22

llama-3-8b-instruct

INT4-MIXED

32

122.7

70.5

6107.2

14.18

phi-4-multimodal-instruct

INT4-MIXED

1362

1590.6

70.6

8154

14.16

whisper-large-v3

FP16

32

551.2

70.7

5248.6

14.14

phi-4-multimodal-instruct

INT4-MIXED

1570

1790

70.9

8750.6

14.10

falcon-7b-instruct

INT4-MIXED

32

119

71.5

5468

13.99

qwen3-8b

INT4-MIXED

1024

773.6

71.7

5889.9

13.95

minicpm-v-2_6

INT4-MIXED

228

992

71.9

6931.3

13.91

minicpm-o-2_6

INT4-MIXED

238

997.4

72.7

6860.1

13.76

falcon-7b-instruct

INT4-MIXED

1024

976.5

72.8

5105.8

13.74

llama-3-8b-instruct

INT4-MIXED

1024

990.2

73.2

5868.2

13.66

qwen3-8b

INT4-MIXED

32

136.5

73.4

6403.3

13.62

gemma-3-4b-it

INT4-MIXED

1024

589.2

73.9

6873.6

13.53

minicpm4-8b

INT4-MIXED

32

129.9

74

6026.1

13.51

minicpm-v-4_5

INT4-MIXED

217

987.9

74.3

7092.6

13.46

gemma-3-4b-it

INT8-CW

32

195.4

74.5

6371.2

13.42

minicpm4-8b

INT4-MIXED

1024

1056.3

74.7

5670.6

13.39

qwen3-8b

INT4-MIXED

1024

1038.2

75.9

6136.9

13.18

deepseek-r1-distill-qwen-7b

INT4-MIXED

32

95.8

76.2

5917.6

13.12

minicpm3-4b

INT4-MIXED

32

360.4

76.4

3890.8

13.09

deepseek-r1-distill-qwen-7b

INT4-MIXED

1024

672

77.6

5651.2

12.89

minicpm-v-4_5

INT4-MIXED

217

1033.9

77.7

7334.1

12.87

minicpm3-4b

INT4-MIXED

32

321.5

78.2

4062.2

12.79

gemma-7b-it

INT4-MIXED

32

113.7

78.6

5804.4

12.72

minicpm3-4b

INT4-MIXED

1024

832.4

78.7

4366.5

12.71

gemma-3-4b-it

INT8-CW

1024

607.1

79.7

8300

12.55

biomistral-7b-slerp

INT8-CW

7

110.1

80.5

8559.4

12.42

llama-2-7b-chat-hf

INT4-MIXED

32

114.8

81.6

5220.3

12.25

gemma-7b-it

INT4-MIXED

1024

854.1

82

6099.7

12.20

minicpm3-4b

INT8-CW

32

346.1

82.2

5664.8

12.17

llava-next-video-7b-hf

INT4-MIXED

2945

3976.6

83.2

8722.1

12.02

deepseek-r1-distill-llama-8b

INT4-MIXED

32

133.8

83.2

7029.4

12.02

qwen2.5-vl-7b-instruct

INT4-MIXED

32

355.2

83.3

6747

12.00

gemma-7b-it

INT4-MIXED

32

122.3

83.4

6492.6

11.99

phi-4-multimodal-instruct

INT8-CW

786

797.7

83.7

8287.3

11.95

phi-4-multimodal-instruct

INT8-CW

578

711.3

83.7

7914.3

11.95

llama-3.1-8b-instruct

INT4-MIXED

32

145.3

83.9

7063.8

11.92

mistral-7b-instruct-v0.2

INT4-MIXED

32

99.9

84.2

5326.9

11.88

qwen2.5-7b-instruct

INT4-MIXED

32

90.9

84.2

5831.2

11.88

mistral-7b-instruct-v0.3

INT4-MIXED

32

105

84.3

5431.5

11.86

phi-4-multimodal-instruct

INT8-CW

1362

1655.9

84.6

10029.6

11.82

qwen2.5-vl-7b-instruct

INT4-MIXED

1024

904.2

84.7

7499.3

11.81

phi-4-multimodal-instruct

INT4-MIXED

786

960.4

85

7092.1

11.76

deepseek-r1-distill-llama-8b

INT4-MIXED

1024

926.6

85.2

6609.2

11.74

phi-4-multimodal-instruct

INT8-CW

1570

1853.7

85.2

10678

11.74

glm-4-9b-chat-hf

INT4-MIXED

32

288.7

85.5

6469.5

11.70

llama-3.1-8b-instruct

INT4-MIXED

1024

952.7

85.7

6557.9

11.67

phi-4-multimodal-instruct

INT4-MIXED

578

783.1

85.8

6684.5

11.66

glm-4-9b-chat-hf

INT4-MIXED

1024

952.9

86.1

6706.6

11.61

llama-2-7b-chat-hf

INT4-MIXED

1024

700.5

86.2

5083.9

11.60

qwen2.5-7b-instruct

INT4-MIXED

1024

650.9

86.2

5441.7

11.60

llava-next-video-7b-hf

INT4-MIXED

2945

4458.6

86.4

9034.3

11.57

phi-4-multimodal-instruct

INT4-MIXED

1570

2150.6

86.7

9445.3

11.53

phi-4-multimodal-instruct

INT4-MIXED

1362

1946

86.7

8733.7

11.53

minicpm-v-2_6

INT4-MIXED

228

878.4

86.7

6772.8

11.53

minicpm3-4b

INT4-MIXED

1024

944.6

86.8

4420.1

11.52

mistral-7b-instruct-v0.2

INT4-MIXED

1024

715.1

86.9

4909.6

11.51

minicpm3-4b

INT4-MIXED

1024

859.7

87.1

4331.2

11.48

mistral-7b-instruct-v0.3

INT4-MIXED

1024

716

87.1

4908

11.48

gemma-7b-it

INT4-MIXED

1024

1126.4

87.5

6346.7

11.43

qwen2.5-vl-7b-instruct

INT4-MIXED

32

351.1

87.6

6938.1

11.42

glm-4-9b-chat-hf

INT4-MIXED

32

293.3

87.7

7160

11.40

gemma-2-9b-it

INT4-MIXED

32

166.7

87.9

6266.3

11.38

qwen2.5-vl-7b-instruct

INT4-MIXED

1024

1113.9

89.1

7658.6

11.22

stable-diffusion-xl-1.0-inpainting-0.1

INT8-CW

32

90.4

89.1

6910.7

11.22

llama-3-8b-instruct

INT4-MIXED

32

112.5

89.3

6208.5

11.20

llama-3.1-8b-instruct

INT4-MIXED

32

109.5

89.4

5976.8

11.19

llama-3-8b-instruct

INT4-MIXED

32

110.5

89.9

6199.6

11.12

phi-2

FP16

32

127.7

90

7000.7

11.11

minicpm3-4b

INT8-CW

1024

892.6

90.3

6200.6

11.07

glm-4-9b-chat-hf

INT4-MIXED

1024

1273

91.4

7041.1

10.94

stablelm-3b-4e1t

FP16

32

136

91.7

6671.3

10.91

stable-zephyr-3b-dpo

FP16

32

148.2

92

7007.2

10.87

gemma-2-9b-it

INT4-MIXED

1024

982.1

92

6411.7

10.87

llama-3-8b-instruct

INT4-MIXED

1024

724.3

92.5

5694.5

10.81

llama-3.1-8b-instruct

INT4-MIXED

1024

721.3

92.5

5689.5

10.81

llama-3-8b-instruct

INT4-MIXED

1024

724.7

92.7

5695.4

10.79

baichuan2-13b-chat

INT4-MIXED

32

140.4

92.7

9190.6

10.79

gemma-2-9b-it

INT4-MIXED

32

173.3

93

7051.7

10.75

gpt-j-6b

INT8-CW

32

177.4

93

7473.9

10.75

minicpm4-8b

INT4-MIXED

32

107

93.9

5970.6

10.65

phi-2

FP16

1024

605.6

94.5

6967.8

10.58

stable-diffusion-xl-1.0-inpainting-0.1

INT8-CW

32

96.3

94.8

6988.7

10.55

minicpm4-8b

INT4-MIXED

1024

791.8

96.1

5588

10.41

stable-zephyr-3b-dpo

FP16

1024

623.4

96.3

6975.3

10.38

stablelm-3b-4e1t

FP16

1024

616.7

96.3

6882.2

10.38

gpt-j-6b

INT8-CW

1024

749.8

96.7

7964

10.34

gemma-2-9b-it

INT4-MIXED

1024

1282.7

96.8

6753.7

10.33

llama-2-7b-chat-hf

INT8-CW

32

133.7

97.8

8213.7

10.22

baichuan2-13b-chat

INT4-MIXED

1024

2432.6

99.8

12861.6

10.02
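
The "2nd token per sec" column appears to be derived from the "2nd latency (ms)" column: 1000 ms divided by the per-token latency. The sketch below checks that reading against three rows of the table above. The relationship is an assumption inferred from the published values, not a documented formula, and the helper name `tokens_per_second` is hypothetical.

```python
def tokens_per_second(second_token_latency_ms: float) -> float:
    """Convert a per-token latency in milliseconds to tokens per second.

    Assumes throughput = 1000 / latency, rounded to two decimals as in the table.
    """
    if second_token_latency_ms <= 0:
        raise ValueError("latency must be positive")
    return round(1000.0 / second_token_latency_ms, 2)


# Rows copied from the table above: (2nd latency in ms, reported 2nd token per sec)
rows = [(53.5, 18.69), (57.8, 17.30), (99.8, 10.02)]

for latency_ms, reported in rows:
    derived = tokens_per_second(latency_ms)
    print(f"{latency_ms} ms -> derived {derived:.2f}, reported {reported:.2f}")
```

If the assumption holds, the two columns carry the same information, so sorting by either one gives the same ranking.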

Topology

Precision

Input Size

1st latency (ms)

2nd latency (ms)

max rss memory

2nd token per sec

t5-small

FP16

1024

10.1

5.2

1241.1

192.31

t5-small

INT4-MIXED

1024

11.5

5.2

1083.7

192.31

t5-small

INT4-MIXED

1024

11.8

5.3

1071.5

188.68

t5-small

INT8-CW

1024

12

5.3

1120.9

188.68

t5-small

INT4-MIXED

1024

11.8

5.4

1086.6

185.19

t5-small

INT4-MIXED

32

10.7

5.5

933.7

181.82

t5-small

INT4-MIXED

32

11.2

5.6

946.9

178.57

t5-small

INT4-MIXED

32

11

5.6

922.1

178.57

t5-small

INT8-CW

32

11.4

5.6

971.3

178.57

t5-small

FP16

32

10.3

6

1090.7

166.67

distil-large-v2

INT4-MIXED

1024

210.2

6

1573.9

166.67

distil-large-v2

INT4-MIXED

32

163.7

6

1542.2

166.67

whisper-large-v3-turbo

INT8-CW

1024

222.8

6.3

1945.7

158.73

whisper-large-v3-turbo

INT4-MIXED

1024

223.2

6.4

1672.6

156.25

whisper-large-v3-turbo

INT4-MIXED

32

176

6.4

1640.5

156.25

whisper-large-v3-turbo

INT8-CW

32

175.1

6.4

1913.2

156.25

minicpm4-0.5b

INT4-MIXED

32

28.2

7.2

1075.4

138.89

gemma-3-270m

INT4-MIXED

32

28.2

7.3

1130.4

136.99

minicpm4-0.5b

INT4-MIXED

32

28

7.3

1172.6

136.99

distil-large-v2

INT8-CW

1024

217.9

7.3

1829.7

136.99

minicpm4-0.5b

INT4-MIXED

1024

44.1

7.4

1127.7

135.14

gemma-3-270m

INT8-CW

32

30.4

7.4

1164.5

135.14

gemma-3-270m

INT4-MIXED

1024

33.6

7.5

1209

133.33

minicpm4-0.5b

INT4-MIXED

1024

47

7.5

1200.7

133.33

gemma-3-270m

INT8-CW

1024

34.6

7.5

1224.4

133.33

distil-large-v2

INT8-CW

32

168.2

7.5

1797.2

133.33

minicpm4-0.5b

INT8-CW

32

29.8

7.6

1338.9

131.58

minicpm4-0.5b

INT4-MIXED

32

29.4

7.7

1207.3

129.87

minicpm4-0.5b

INT4-MIXED

1024

52.9

7.9

1260.4

126.58

minicpm4-0.5b

INT8-CW

1024

51.6

7.9

1387.5

126.58

distil-large-v2

FP16

1024

207.7

8.2

2612.5

121.95

whisper-large-v3-turbo

FP16

1024

219.6

8.2

2647.9

121.95

distil-large-v2

FP16

32

161.4

8.2

2580.9

121.95

whisper-large-v3-turbo

FP16

32

170.7

8.2

2617.9

121.95

gemma-3-270m

FP16

32

24.6

8.3

1406.2

120.48

gemma-3-270m

FP16

1024

30.5

8.4

1496.9

119.05

llama-3.2-1b-instruct

INT4-MIXED

32

20.7

9.4

1565.7

106.38

llama-3.2-1b-instruct

INT4-MIXED

32

23.1

9.6

1585.4

104.17

llama-3.2-1b-instruct

INT4-MIXED

1024

83.9

10

1657.4

100

llama-3.2-1b-instruct

INT4-MIXED

1024

94.9

10.1

1687.6

99.01

whisper-small

INT4-MIXED

32

93.3

10.2

1342.4

98.04

whisper-small

INT4-MIXED

1024

138.1

10.4

1374.8

96.15

whisper-small

INT4-MIXED

1024

136.8

10.5

1390.8

95.24

whisper-small

INT8-CW

1024

141.5

10.6

1481

94.34

whisper-small

FP16

1024

128.8

10.8

1625.5

92.59

whisper-small

FP16

32

86

10.8

1593.2

92.59

whisper-small

INT4-MIXED

32

96.1

10.8

1359.1

92.59

whisper-small

INT4-MIXED

1024

141.2

10.9

1424.6

91.74

whisper-small

INT8-CW

32

97.2

10.9

1448.7

91.74

whisper-small

INT4-MIXED

32

96.8

11.2

1394.3

89.29

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

32

35.1

11.2

1921.3

89.29

qwen2.5-1.5b-instruct

INT4-MIXED

32

35.1

11.2

1787.9

89.29

gemma-3-1b-it

INT4-MIXED

32

37.1

11.3

1552.3

88.5

minicpm4-0.5b

FP16

32

25.2

11.4

1740.4

87.72

qwen2.5-1.5b-instruct

INT4-MIXED

32

33.5

11.5

1747.7

86.96

gemma-3-1b-it

INT4-MIXED

1024

83

11.6

1658

86.21

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

32

33.8

11.6

2062.1

86.21

gemma-3-1b-it

INT4-MIXED

32

40.5

11.7

1689.6

85.47

minicpm4-0.5b

FP16

1024

56.6

11.8

1787.2

84.75

qwen2.5-1.5b-instruct

INT4-MIXED

1024

123.8

11.9

1879.8

84.03

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

1024

123

12

2009.9

83.33

gemma-3-1b-it

INT4-MIXED

1024

90.7

12

1800.9

83.33

qwen2.5-1.5b-instruct

INT4-MIXED

1024

134.5

12.2

1901.6

81.97

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

1024

134.8

12.3

2221.6

81.3

qwen2.5-1.5b-instruct

INT4-MIXED

32

36.5

12.5

1858.5

80

qwen2.5-1.5b-instruct

INT4-MIXED

1024

141.2

13.2

2023.2

75.76

gemma-3-1b-it

INT8-CW

32

40

13.5

1890.8

74.07

gemma-3-1b-it

INT8-CW

1024

93.7

14.3

1981.7

69.93

llama-3.2-1b-instruct

INT8-CW

32

22.9

14.4

2083.3

69.44

smolvlm2-256m-video-instruct

INT4-MIXED

1141

385.3

14.7

2877.5

68.03

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

32

37.2

14.9

2405

67.11

llama-3.2-1b-instruct

INT8-CW

1024

101

15.2

2199.2

65.79

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

1024

150

15.6

2542

64.1

smolvlm2-256m-video-instruct

FP16

1141

376.9

16

3507.1

62.5

smolvlm2-256m-video-instruct

INT8-CW

1141

385.3

17.1

2955.8

58.48

phi-2

INT4-MIXED

32

51.1

17.6

2380.5

56.82

deepseek-r1-distill-qwen-1.5b

INT8-CW

32

37.6

17.8

2557.2

56.18

qwen2.5-1.5b-instruct

INT8-CW

32

38

17.8

2333

56.18

stable-zephyr-3b-dpo

INT4-MIXED

32

51

17.9

2373.7

55.87

stablelm-3b-4e1t

INT4-MIXED

32

50.9

17.9

2374.2

55.87

qwen2.5-1.5b-instruct

INT8-CW

32

37.5

18.1

2326.4

55.25

deepseek-r1-distill-qwen-1.5b

INT8-CW

1024

129

18.6

2652.9

53.76

qwen2.5-1.5b-instruct

INT8-CW

1024

129.5

18.6

2426.7

53.76

qwen2.5-1.5b-instruct

INT8-CW

1024

130.6

19

2429.7

52.63

phi-2

INT4-MIXED

32

53.1

19.4

2637.3

51.55

phi-2

INT4-MIXED

1024

269.4

19.8

2915.8

50.51

qwen2.5-coder-3b-instruct

INT4-MIXED

32

43.9

19.8

2499.2

50.51

stable-zephyr-3b-dpo

INT4-MIXED

1024

263.6

19.9

2892.7

50.25

stablelm-3b-4e1t

INT4-MIXED

1024

264.6

20

2893.3

50

stable-zephyr-3b-dpo

INT4-MIXED

32

59.1

20.3

2648.9

49.26

qwen2.5-coder-3b-instruct

INT4-MIXED

32

44.7

20.4

2593.3

49.02

llama-3.2-3b-instruct

INT4-MIXED

32

35.4

20.9

2688.6

47.85

qwen2.5-coder-3b-instruct

INT4-MIXED

1024

191.1

21.3

2630.8

46.95

phi-2

INT4-MIXED

1024

334.4

21.5

3167.1

46.51

llama-3.2-3b-instruct

INT4-MIXED

32

36

21.6

2682.9

46.3

qwen2.5-coder-3b-instruct

INT4-MIXED

32

49.9

21.6

2863.6

46.3

qwen2.5-coder-3b-instruct

INT4-MIXED

1024

203.2

21.7

2729.2

46.08

llama-3.2-3b-instruct

INT4-MIXED

32

37.6

21.8

2825.1

45.87

stablelm-3b-4e1t

INT4-MIXED

32

59.2

21.9

2768.3

45.66

llama-3.2-3b-instruct

INT4-MIXED

1024

204.6

22.4

2922.2

44.64

stable-zephyr-3b-dpo

INT4-MIXED

1024

303.2

22.5

3163.8

44.44

phi-3-mini-128k-instruct

INT4-MIXED

32

43

22.6

2924.7

44.25

phi-3-mini-4k-instruct

INT4-MIXED

32

40.2

22.6

2826.3

44.25

phi-3.5-mini-instruct

INT4-MIXED

32

40.6

22.6

2826

44.25

qwen2.5-coder-3b-instruct

INT4-MIXED

1024

347.2

23

2992.1

43.48

llama-3.2-3b-instruct

INT4-MIXED

1024

214.6

23.1

2892

43.29

gemma-3-1b-it

FP16

32

38.6

23.3

2804

42.92

llama-3.2-3b-instruct

INT4-MIXED

1024

243.3

23.4

3045.1

42.74

phi-3-mini-4k-instruct

INT4-MIXED

32

39.3

23.4

3019

42.74

phi-3.5-mini-instruct

INT4-MIXED

32

39.7

23.4

2934.4

42.74

gemma-3-1b-it

FP16

1024

101.9

23.8

2955.3

42.02

phi-3-mini-128k-instruct

INT4-MIXED

32

40.9

23.8

3095.1

42.02

stablelm-3b-4e1t

INT4-MIXED

1024

293.3

24.2

3302.8

41.32

qwen2.5-1.5b-instruct

INT8-CW

32

77.9

24.3

2772.6

41.15

qwen2.5-1.5b-instruct

INT8-CW

1024

179

25.2

2905.3

39.68

phi-3-mini-128k-instruct

INT4-MIXED

1024

312.2

25.3

3555.4

39.53

phi-3-mini-4k-instruct

INT4-MIXED

1024

313.9

25.4

3456.9

39.37

phi-3.5-mini-instruct

INT4-MIXED

1024

313.2

25.4

3456.4

39.37

phi-3-mini-4k-instruct

INT4-MIXED

32

40.6

25.4

3249.5

39.37

phi-3.5-mini-instruct

INT4-MIXED

32

44.4

25.6

3258.6

39.06

phi-3-mini-4k-instruct

INT4-MIXED

1024

321.1

26

3630.2

38.46

phi-3.5-mini-instruct

INT4-MIXED

1024

320.3

26.2

3568

38.17

internvl2-4b

INT4-MIXED

297

257.2

26.5

4284.6

37.74

phi-3-mini-128k-instruct

INT4-MIXED

1024

410.7

26.6

3710.3

37.59

phi-4-mini-reasoning

INT4-MIXED

32

56.1

26.8

3033.1

37.31

afm-4.5b

INT4-MIXED

32

45.8

26.9

3687.3

37.17

phi-4-mini-instruct

INT4-MIXED

32

54.6

26.9

3132.7

37.17

phi-4-mini-instruct

INT4-MIXED

32

56

27.5

3217

36.36

phi-4-mini-reasoning

INT4-MIXED

32

54.8

27.6

3214.8

36.23

internvl2-4b

INT4-MIXED

297

255.6

27.8

4409.4

35.97

phi-4-mini-reasoning

INT4-MIXED

32

57

27.8

3273.6

35.97

phi-3-mini-4k-instruct

INT4-MIXED

1024

446.2

28.2

3864.6

35.46

phi-3.5-mini-instruct

INT4-MIXED

1024

475.6

28.2

3892.1

35.46

afm-4.5b

INT4-MIXED

1024

318.2

28.5

3878.3

35.09

phi-4-mini-reasoning

INT4-MIXED

1024

321.8

28.6

3314.2

34.97

phi-4-mini-instruct

INT4-MIXED

1024

319.6

28.8

3413.5

34.72

whisper-large-v3

INT4-MIXED

32

281.9

29

2986.4

34.48

gemma-3-4b-it

INT4-MIXED

32

91.1

29.2

4496.5

34.25

phi-4-mini-instruct

INT4-MIXED

1024

327

29.3

3503.6

34.13

phi-4-mini-reasoning

INT4-MIXED

1024

328

29.3

3499.2

34.13

internvl2-4b

INT4-MIXED

1027

611.1

29.4

5603.2

34.01

phi-4-mini-instruct

INT4-MIXED

32

62.4

29.4

3431.4

34.01

llama-3.2-1b-instruct

FP16

32

34.3

29.5

3157.9

33.9

phi-4-mini-reasoning

INT4-MIXED

1024

352.5

29.6

3543.1

33.78

whisper-large-v3

INT4-MIXED

1024

331.9

29.8

3018.8

33.56

llama-3.2-1b-instruct

FP16

1024

129.4

30.1

3282.3

33.22

phi-3.5-vision-instruct

INT4-MIXED

802

525.8

30.1

5356

33.22

gemma-3-4b-it

INT4-MIXED

32

91.4

30.2

4559

33.11

gemma-3-4b-it

INT4-MIXED

32

94.7

30.4

4648.3

32.89

whisper-large-v3

INT8-CW

32

285.1

30.4

3575.6

32.89

internvl2-4b

INT4-MIXED

1027

619.4

30.7

5564.1

32.57

phi-2

INT8-CW

32

57.3

30.7

3590.7

32.57

stable-zephyr-3b-dpo

INT8-CW

32

57.3

30.8

3598.6

32.47

stablelm-3b-4e1t

INT8-CW

32

56.8

30.8

3598

32.47

phi-3.5-vision-instruct

INT4-MIXED

1032

616.4

30.9

5747.6

32.36

gemma-3-4b-it

INT4-MIXED

1024

418.3

30.9

6833.3

32.36

whisper-large-v3

INT8-CW

1024

335.6

31

3608.1

32.26

phi-4-mini-instruct

INT4-MIXED

1024

412.1

31.2

3719.6

32.05

gemma-3-4b-it

INT4-MIXED

1024

433.9

31.9

6870.8

31.35

gemma-3-4b-it

INT4-MIXED

1024

456.1

32.2

6944.3

31.06

deepseek-r1-distill-qwen-1.5b

FP16

32

36.8

32.7

4206.2

30.58

qwen2.5-1.5b-instruct

FP16

32

37.4

32.7

3760.8

30.58

phi-2

INT8-CW

1024

295.4

33

4124.2

30.3

stable-zephyr-3b-dpo

INT8-CW

1024

300.6

33

4109.1

30.3

stablelm-3b-4e1t

INT8-CW

1024

299.4

33

4108.8

30.3

qwen2.5-coder-3b-instruct

INT8-CW

32

49.2

33

3938.6

30.3

deepseek-r1-distill-qwen-1.5b

FP16

1024

142.9

33.4

4330.3

29.94

qwen2.5-1.5b-instruct

FP16

1024

142.3

33.4

3902.2

29.94

gpt-j-6b

INT4-MIXED

32

68.8

34.3

4086.5

29.15

qwen2.5-coder-3b-instruct

INT8-CW

1024

232.3

34.5

4056.5

28.99

gpt-oss-20b

INT4-MIXED

32

205.1

34.8

13043.3

28.74

phi-4-multimodal-instruct

INT4-MIXED

578

488.3

35

5791.7

28.57

llama-2-7b-chat-hf

INT4-MIXED

32

44.6

35

4386.6

28.57

phi-4-multimodal-instruct

INT4-MIXED

786

583

35.2

6813

28.41

flan-t5-xxl

INT4-MIXED

33

58.5

35.7

12841.3

28.01

whisper-large-v3

FP16

32

264.3

35.8

4801.7

27.93

whisper-large-v3

FP16

1024

313.1

36

4831.8

27.78

llama-3.2-3b-instruct

INT8-CW

32

44.6

36.3

4041.3

27.55

llama-2-7b-chat-hf

INT4-MIXED

32

53.2

36.4

4482

27.47

gpt-oss-20b

INT4-MIXED

1024

617.9

36.5

13270.2

27.4

phi-4-multimodal-instruct

INT4-MIXED

1362

1215.4

36.7

8440

27.25

phi-4-multimodal-instruct

INT4-MIXED

1570

1355.5

36.9

9162.2

27.1

biomistral-7b-slerp

INT4-MIXED

7

40.9

37

4532.7

27.03

mistral-7b-instruct-v0.3

INT4-MIXED

32

48.6

37.1

4547.3

26.95

mistral-7b-instruct-v0.2

INT4-MIXED

32

48.7

37.5

4445.1

26.67

llama-3.2-3b-instruct

INT8-CW

1024

253.8

37.8

4271.3

26.46

gpt-j-6b

INT4-MIXED

1024

437.3

38

5244.8

26.32

minicpm3-4b

INT4-MIXED

32

166

38.3

3267.7

26.11

llama-2-7b-chat-hf

INT4-MIXED

1024

395.1

38.5

5103.6

25.97

mistral-7b-instruct-v0.3

INT4-MIXED

32

57.3

38.5

4656.4

25.97

falcon-7b-instruct

INT4-MIXED

32

51.1

38.6

4287.7

25.91

mistral-7b-instruct-v0.2

INT4-MIXED

32

56.4

38.6

4741.8

25.91

minicpm3-4b

INT4-MIXED

32

170.1

38.9

3440.6

25.71

biomistral-7b-slerp

INT4-MIXED

7

43.5

39.3

4837.6

25.45

mistral-7b-instruct-v0.3

INT4-MIXED

1024

409.5

39.4

4852.2

25.38

gpt-j-6b

INT4-MIXED

32

85.3

39.4

4591.2

25.38

mistral-7b-instruct-v0.2

INT4-MIXED

32

56.9

39.5

4762.1

25.32

mistral-7b-instruct-v0.3

INT4-MIXED

32

59

39.5

4860.3

25.32

mistral-7b-instruct-v0.1

INT4-MIXED

32

58.7

39.8

4855

25.13

flan-t5-xxl

INT4-MIXED

1139

182.2

40

14929.6

25

mistral-7b-instruct-v0.2

INT4-MIXED

1024

420.1

40

4750.5

25

minicpm3-4b

INT4-MIXED

1024

716.5

40.2

4449.7

24.88

falcon-7b-instruct

INT4-MIXED

1024

437.6

40.3

4435.2

24.81

llama-2-7b-chat-hf

INT4-MIXED

1024

421.6

40.4

5209.4

24.75

minicpm3-4b

INT4-MIXED

32

168.7

40.4

3554.8

24.75

deepseek-r1-distill-qwen-7b

INT4-MIXED

32

52.1

40.5

5017.2

24.69

qwen2.5-7b-instruct

INT4-MIXED

32

52.3

40.7

5017.3

24.57

qwen2.5-7b-instruct-1m

INT4-MIXED

32

51.6

40.7

5017.7

24.57

minicpm3-4b

INT4-MIXED

1024

735.7

40.8

4643.9

24.51

deepseek-r1-distill-llama-8b

INT4-MIXED

32

52.6

40.9

5222.6

24.45

llama-3-8b-instruct

INT4-MIXED

32

56.3

40.9

5224.6

24.45

phi-4-multimodal-instruct

INT4-MIXED

578

552.2

41

6789.7

24.39

qwen2-7b-instruct

INT4-MIXED

32

51.3

41

5016.3

24.39

llama-3.1-8b-instruct

INT4-MIXED

32

55.4

41.1

5224.9

24.33

stable-diffusion-xl-1.0-inpainting-0.1

INT8-CW

32

43.1

41.1

6711.7

24.33

phi-4-multimodal-instruct

INT4-MIXED

786

682

41.2

7263.8

24.27

mistral-7b-instruct-v0.3

INT4-MIXED

1024

432.9

41.6

4938.8

24.04

mistral-7b-instruct-v0.2

INT4-MIXED

1024

449.3

41.7

5024.4

23.98

qwen2.5-7b-instruct

INT4-MIXED

32

56.1

41.8

5213.4

23.92

mistral-7b-instruct-v0.2

INT4-MIXED

1024

507.1

41.9

5058.1

23.87

deepseek-r1-distill-qwen-7b

INT4-MIXED

32

56.4

41.9

5213.8

23.87

deepseek-r1-distill-qwen-7b

INT4-MIXED

1024

391.5

42

5241.3

23.81

minicpm3-4b

INT4-MIXED

1024

764.2

42

4754.4

23.81

phi-3.5-mini-instruct

INT8-CW

32

55

42

4559.3

23.81

qwen2.5-7b-instruct

INT4-MIXED

1024

383.3

42.1

5239.5

23.75

phi-3-mini-128k-instruct

INT8-CW

32

55

42.1

4651.2

23.75

mistral-7b-instruct-v0.3

INT4-MIXED

1024

503.9

42.2

5154.9

23.7

qwen2.5-7b-instruct-1m

INT4-MIXED

1024

390.3

42.2

5240.3

23.7

qwen2.5-7b-instruct-1m

INT4-MIXED

32

58.9

42.2

5417.2

23.7

phi-3-mini-4k-instruct

INT8-CW

32

55.1

42.2

4646.7

23.7

deepseek-r1-distill-qwen-7b

INT4-MIXED

32

56.6

42.3

5422.3

23.64

qwen2-7b-instruct

INT4-MIXED

32

56.7

42.3

5330.2

23.64

llama-3.1-8b-instruct

INT4-MIXED

32

60.8

42.4

5519.2

23.58

qwen2.5-7b-instruct

INT4-MIXED

32

56.9

42.4

5419.4

23.58

llama-3-8b-instruct

INT4-MIXED

32

60.8

42.5

5427.2

23.53

llama-3-8b-instruct

INT4-MIXED

32

60

42.5

5426.2

23.53

qwen3-8b

INT4-MIXED

32

58

42.5

5415.5

23.53

phi-4-multimodal-instruct

INT4-MIXED

1570

1542.7

42.7

9587.1

23.42

qwen2-7b-instruct

INT4-MIXED

1024

388.5

42.7

5238.5

23.42

minicpm4-8b

INT4-MIXED

32

62.6

42.7

5166.6

23.42

phi-4-multimodal-instruct

INT4-MIXED

1362

1380.1

43

8862.7

23.26

gpt-j-6b

INT4-MIXED

1024

608.5

43

5754.2

23.26

falcon-7b-instruct

INT4-MIXED

32

59.4

43

5001.5

23.26

deepseek-r1-distill-llama-8b

INT4-MIXED

1024

413.6

43.2

5529.3

23.15

mistral-7b-instruct-v0.1

INT4-MIXED

1025

552.4

43.3

5139.7

23.09

deepseek-r1-distill-qwen-7b

INT4-MIXED

1024

404.3

43.3

5414.2

23.09

qwen2.5-7b-instruct

INT4-MIXED

1024

410.4

43.3

5440.4

23.09

llama-3-8b-instruct

INT4-MIXED

32

61.3

43.3

5538.2

23.09

llama-3-8b-instruct

INT4-MIXED

1024

418.6

43.4

5529.9

23.04

minicpm-o-2_6

INT4-MIXED

238

622.9

43.4

6276.5

23.04

llama-3.1-8b-instruct

INT4-MIXED

1024

415.7

43.5

5530

22.99

deepseek-r1-distill-qwen-7b

INT4-MIXED

1024

477

43.8

5634.8

22.83

qwen2-7b-instruct

INT4-MIXED

1024

472.5

43.8

5542.9

22.83

qwen2.5-7b-instruct-1m

INT4-MIXED

1024

469.8

43.9

5630.2

22.78

qwen2.5-7b-instruct

INT4-MIXED

1024

500

44

5632.1

22.73

phi-4-mini-instruct

INT8-CW

32

63.2

44

4667.7

22.73

stable-diffusion-xl-1.0-inpainting-0.1

INT8-CW

32

44.3

44.2

7286

22.62

minicpm-v-2_6

INT4-MIXED

228

629

44.4

6264.2

22.52

phi-4-mini-reasoning

INT8-CW

32

62.1

44.4

4580.8

22.52

minicpm4-8b

INT4-MIXED

32

69

44.7

5379.5

22.37

phi-3.5-mini-instruct

INT8-CW

1024

431.5

44.7

5178

22.37

minicpm4-8b

INT4-MIXED

1024

455.2

44.8

5343

22.32

minicpm-v-2_6

INT4-MIXED

228

639.4

44.8

6443.9

22.32

qwen3-8b

INT4-MIXED

32

64.4

44.8

5816.3

22.32

phi-3-mini-128k-instruct

INT8-CW

1024

430.6

44.9

5274.5

22.27

phi-3-mini-4k-instruct

INT8-CW

1024

432

45

5270.3

22.22

falcon-7b-instruct

INT4-MIXED

1024

528

45.2

5172.9

22.12

qwen3-8b

INT4-MIXED

1024

445.9

45.3

5724.8

22.08

minicpm-o-2_6

INT4-MIXED

238

639.6

45.4

6695.4

22.03

minicpm-v-2_6

INT4-MIXED

228

658.4

45.5

6561.8

21.98

gemma-3-4b-it

INT8-CW

32

98.9

45.5

6007.1

21.98

minicpm4-8b

INT4-MIXED

32

69.1

45.6

5512.7

21.93

afm-4.5b

INT8-CW

32

57.5

45.7

5297.5

21.88

phi-4-mini-instruct

INT8-CW

1024

376.2

45.8

4942.6

21.83

internvl2-4b

INT8-CW

297

251.2

46

5803.2

21.74

phi-4-mini-reasoning

INT8-CW

1024

375.8

46.1

4848.5

21.69

llama-3-8b-instruct

INT4-MIXED

1024

446

46.2

5737.6

21.65

minicpm-v-4_5

INT4-MIXED

217

647.9

46.2

6768.9

21.65

llama-3.1-8b-instruct

INT4-MIXED

1024

442.3

46.3

5829.4

21.6

llama-3-8b-instruct

INT4-MIXED

1024

439.7

46.8

5739.3

21.37

llama-3-8b-instruct

INT4-MIXED

1024

527.7

47

5834.6

21.28

gemma-3-4b-it

INT8-CW

1024

464.4

47.2

8348.9

21.19

qwen3-8b

INT4-MIXED

1024

537.1

47.5

6111.9

21.05

afm-4.5b

INT8-CW

1024

326.4

47.5

5483

21.05

phi-3.5-vision-instruct

INT8-CW

802

532

47.8

6689.2

20.92

minicpm4-8b

INT4-MIXED

1024

504.6

48.4

5532.1

20.66

phi-3.5-vision-instruct

INT8-CW

1032

634.5

48.7

6969.2

20.53

minicpm4-8b

INT4-MIXED

1024

569.1

48.8

5676.8

20.49

internvl2-4b

INT8-CW

1027

646.8

48.8

6905.5

20.49

minicpm-v-4_5

INT4-MIXED

217

662.3

49

7055.3

20.41

gemma-7b-it

INT4-MIXED

32

70.1

49.1

5455.1

20.37

glm-4-9b-chat-hf

INT4-MIXED

32

110.9

50.7

6147.9

19.72

qwen2.5-vl-7b-instruct

INT4-MIXED

32

182.9

51

6161.5

19.61

gemma-7b-it

INT4-MIXED

32

75

51.5

5890.3

19.42

qwen2.5-vl-7b-instruct

INT4-MIXED

1024

650.6

52.2

7559.2

19.16

glm-4-9b-chat-hf

INT4-MIXED

32

115.8

52.9

6278.6

18.9

llava-next-video-7b-hf

INT4-MIXED

2945

3044.9

53.5

8946.5

18.69

glm-4-9b-chat-hf

INT4-MIXED

32

114.4

53.8

6507.7

18.59

qwen2.5-vl-7b-instruct

INT4-MIXED

32

188.1

54.1

6460.2

18.48

gemma-7b-it

INT4-MIXED

1024

523.7

54.2

6170.3

18.45

qwen2.5-vl-7b-instruct

INT4-MIXED

1024

737.6

54.6

7727.8

18.32

deepseek-r1-distill-llama-8b

INT4-MIXED

32

75.4

54.8

6507.3

18.25

llama-3.1-8b-instruct

INT4-MIXED

32

75.2

54.8

6504

18.25

phi-4-multimodal-instruct

INT8-CW

578

533.9

54.8

7575.7

18.25

minicpm3-4b

INT8-CW

32

184.2

55

5258.3

18.18

phi-4-multimodal-instruct

INT8-CW

786

641.3

55.1

8485.8

18.15

gemma-2-9b-it

INT4-MIXED

32

71.9

55.3

5914.8

18.08

phi-2

FP16

32

67.2

55.5

6239.6

18.02

phi-4-multimodal-instruct

INT8-CW

1362

1311.7

56.2

10174.5

17.79

stablelm-3b-4e1t

FP16

32

67.5

56.3

6256.7

17.76

stable-zephyr-3b-dpo

FP16

32

66.4

56.4

6267.4

17.73

glm-4-9b-chat-hf

INT4-MIXED

1024

620.7

56.4

6729.9

17.73

llava-next-video-7b-hf

INT4-MIXED

2945

3257.8

56.5

9086.1

17.7

phi-4-multimodal-instruct

INT8-CW

1570

1466.7

56.5

10906.7

17.7

gemma-7b-it

INT4-MIXED

1024

748

57.4

6593.8

17.42

glm-4-9b-chat-hf

INT4-MIXED

1024

804.7

57.4

6867.4

17.42

glm-4-9b-chat-hf

INT4-MIXED

1024

888.6

58.4

7093.1

17.12

gemma-2-9b-it

INT4-MIXED

32

79.1

58.7

6155.2

17.04

gemma-2-9b-it

INT4-MIXED

32

79.6

58.8

6397.3

17.01

deepseek-r1-distill-llama-8b

INT4-MIXED

1024

602.5

58.9

6810.2

16.98

llama-3.1-8b-instruct

INT4-MIXED

1024

619.5

58.9

6806.8

16.98

phi-2

FP16

1024

395.5

60.1

6981.6

16.64

stable-zephyr-3b-dpo

FP16

1024

347.4

60.1

6991.6

16.64

stablelm-3b-4e1t

FP16

1024

342.9

60.3

7010.2

16.58

minicpm3-4b

INT8-CW

1024

768.7

60.8

6424.3

16.45

gemma-2-9b-it

INT4-MIXED

1024

703.7

61.8

6523.4

16.18

stable-diffusion-xl-1.0-inpainting-0.1

FP16

32

64.8

63

9176.7

15.87

gemma-2-9b-it

INT4-MIXED

1024

808.1

64.6

7013.1

15.48

gemma-2-9b-it

INT4-MIXED

1024

780.9

65

6773.2

15.38

llama-3.2-3b-instruct

FP16

32

74

65.8

6979.5

15.2

llama-3.2-3b-instruct

FP16

1024

362.3

68.6

7308.8

14.58

llama-2-13b-chat-hf

INT4-MIXED

32

95.7

69.2

7415.2

14.45

qwen2.5-coder-3b-instruct

FP16

32

78.3

72

6728.3

13.89

flan-t5-xxl

INT8-CW

33

180.7

72.9

22597.6

13.72

qwen2.5-coder-3b-instruct

FP16

1024

284.5

73.3

6902.6

13.64

gemma-3-12b-it

INT4-MIXED

32

149.3

73.5

8987.5

13.61

falcon-7b-instruct

INT8-CW

32

91.3

73.6

7547.6

13.59

qwen2.5-7b-instruct

INT8-CW

32

95.3

74.3

8139.5

13.46

qwen2.5-7b-instruct-1m

INT8-CW

32

94.9

74.3

8139.8

13.46

deepseek-r1-distill-qwen-7b

INT8-CW

32

95.8

74.4

8228.3

13.44

qwen2-7b-instruct

INT8-CW

32

96.8

74.4

8140.1

13.44

gpt-j-6b

INT8-CW

32

94.8

74.7

6886.4

13.39

ltx-video

INT4-MIXED

11

75.4

74.8

6586.6

13.37

qwen2.5-7b-instruct

INT8-CW

32

95.9

74.8

8229.7

13.37

llama-2-13b-chat-hf

INT4-MIXED

1024

992.4

75.1

8509.7

13.32

llama-2-7b-chat-hf

INT8-CW

32

90.2

75.4

7490.1

13.26

phi-3-mini-4k-instruct

FP16

32

86.3

76.3

8325.4

13.11

falcon-7b-instruct

INT8-CW

1024

819.8

76.4

7692.2

13.09

qwen2.5-7b-instruct-1m

INT8-CW

1024

576.6

76.4

8355.9

13.09

deepseek-r1-distill-qwen-7b

INT8-CW

1024

584.8

76.5

8447.6

13.07

qwen2.5-7b-instruct

INT8-CW

1024

596.2

76.5

8353.7

13.07

minicpm-o-2_6

INT8-CW

238

632.3

76.6

9462.5

13.05

minicpm-v-2_6

INT8-CW

228

686.2

76.6

9359.2

13.05

qwen2-7b-instruct

INT8-CW

1024

579

76.7

8355.8

13.04

qwen2.5-7b-instruct

INT8-CW

1024

564.7

76.7

8447.7

13.04

phi-3-mini-128k-instruct

FP16

32

85.9

76.9

8342.4

13

phi-3.5-mini-instruct

FP16

32

86.8

77.1

8244.7

12.97

phi-4

INT4-MIXED

32

112.7

77.1

8501.3

12.97

phi-4-reasoning

INT4-MIXED

32

112.1

77.1

8412.9

12.97

ltx-video

INT8-CW

11

77.8

77.1

9485.6

12.97

gemma-3-12b-it

INT4-MIXED

32

178.1

77.3

9319.5

12.94

gemma-3-12b-it

INT4-MIXED

1024

1159.1

77.7

11709.2

12.87

gpt-j-6b

INT8-CW

1024

601.5

78.2

8019.7

12.79

flan-t5-xxl

INT8-CW

1139

332.7

78.4

24785.2

12.76

deepseek-r1-distill-qwen-14b

INT4-MIXED

32

106.8

78.8

8753.7

12.69

qwen1.5-14b-chat

INT4-MIXED

32

115.9

79.3

9158

12.61

phi-4-mini-instruct

FP16

32

92.3

79.7

8296.6

12.55

phi-4-mini-reasoning

FP16

32

91.6

79.7

8200.7

12.55

llama-2-7b-chat-hf

INT8-CW

1024

586

79.7

8192.9

12.55

baichuan2-13b-chat

INT4-MIXED

32

112.9

80.2

8899.9

12.47

mistral-7b-instruct-v0.3

INT8-CW

32

96.9

80.3

7790.5

12.45

mistral-7b-instruct-v0.1

INT8-CW

32

100.6

80.7

7788.6

12.39

internvl2-4b

FP16

297

323.6

80.8

9541.4

12.38

phi-3-mini-128k-instruct

FP16

1024

570.3

81

9213.8

12.35

phi-3-mini-4k-instruct

FP16

1024

576.1

81

9206.4

12.35

biomistral-7b-slerp

INT8-CW

7

88.1

81

7773.4

12.35

phi-4-reasoning

INT4-MIXED

1024

1100.5

81.1

8885.2

12.33

phi-4

INT4-MIXED

32

118.2

81.1

9114

12.33

phi-4

INT4-MIXED

1024

1108.5

81.2

8973.9

12.32

phi-3.5-mini-instruct

FP16

1024

564.6

81.4

9112.8

12.29

phi-4-mini-instruct

FP16

1024

555.4

81.7

8689.8

12.24

mistral-7b-instruct-v0.2

INT8-CW

32

97.3

81.7

7883.7

12.24

phi-3.5-vision-instruct

FP16

802

740.6

81.8

10290.1

12.22

gemma-3-12b-it

INT4-MIXED

1024

1244.9

81.8

11963.2

12.22

phi-4-mini-reasoning

FP16

1024

548.7

82.1

8597.9

12.18

deepseek-r1-distill-qwen-14b

INT4-MIXED

32

115.1

82.9

9447.4

12.06

phi-3.5-vision-instruct

FP16

1032

924.7

83.1

10549.5

12.03

deepseek-r1-distill-qwen-14b

INT4-MIXED

1024

1022.6

83.1

9148.2

12.03

mistral-7b-instruct-v0.3

INT8-CW

1024

644.3

83.5

8096.8

11.98

llama-3-8b-instruct

INT8-CW

32

102.1

83.6

8564.3

11.96

internvl2-4b

FP16

1027

884.1

83.7

10393.5

11.95

qwen2.5-vl-7b-instruct

INT8-CW

32

215.2

83.8

9272.9

11.93

mistral-7b-instruct-v0.1

INT8-CW

1025

661.9

83.9

8080.1

11.92

deepseek-r1-distill-llama-8b

INT8-CW

32

101

84.3

8563.4

11.86

mistral-7b-instruct-v0.2

INT8-CW

1024

620

85.1

8181.9

11.75

llama-3.1-8b-instruct

INT8-CW

32

101.9

85.1

8564.2

11.75

phi-4

INT4-MIXED

1024

1282.1

85.3

9432.2

11.72

qwen2.5-vl-7b-instruct

INT8-CW

1024

771.8

85.3

10690.7

11.72

phi-4-reasoning

INT4-MIXED

32

124.2

85.8

9629.6

11.66

baichuan2-13b-chat

INT4-MIXED

1024

2744.9

85.9

13038.5

11.64

llama-3-8b-instruct

INT8-CW

1024

604.5

86.3

8868.6

11.59

qwen1.5-14b-chat

INT4-MIXED

1024

1154.1

86.4

10208.4

11.57

llama-2-13b-chat-hf

INT4-MIXED

32

121.4

86.9

9355.6

11.51

deepseek-r1-distill-llama-8b

INT8-CW

1024

611.9

87

8868.4

11.49

gemma-3-4b-it

FP16

32

130.6

87.8

10641.4

11.39

deepseek-r1-distill-qwen-14b

INT4-MIXED

1024

1187.6

88.2

9700.6

11.34

llama-3.1-8b-instruct

INT8-CW

1024

606.2

88.2

8870.4

11.34

lcm-dreamshaper-v7

INT8-HYBRID

1024

90.3

88.4

3746

11.31

lcm-dreamshaper-v7

INT8-HYBRID

32

90.8

88.4

3745.2

11.31

afm-4.5b

FP16

32

101.5

88.6

9661.4

11.29

gemma-3-4b-it

FP16

1024

612

89.5

13009.8

11.17

qwen3-8b

INT8-CW

32

105.1

89.9

8745.6

11.12

phi-4-reasoning

INT4-MIXED

1024

1359.7

90

9690.3

11.11

starcoder2-15b

INT4-MIXED

32

142.5

90.4

9573.1

11.06

lcm-dreamshaper-v7

INT8-CW

1024

93.9

90.5

4963.7

11.05

afm-4.5b

FP16

1024

503.6

90.6

9899.6

11.04

lcm-dreamshaper-v7

INT8-CW

32

92.5

90.7

4962.7

11.03

lcm-dreamshaper-v7

INT8-CW

1024

93.6

91.2

5210.7

10.96

lcm-dreamshaper-v7

INT8-CW

32

95.1

91.2

5209.2

10.96

lcm-dreamshaper-v7

FP16

1024

92.3

91.7

4875.5

10.91

lcm-dreamshaper-v7

FP16

32

92.4

91.7

4870.8

10.91

minicpm-v-4_5

INT8-CW

217

645.4

92.2

10004.9

10.85

llama-2-13b-chat-hf

INT4-MIXED

1024

1142.6

92.8

10186.6

10.78

gemma-7b-it

INT8-CW

32

118.8

92.9

9163.2

10.76

llava-next-video-7b-hf

INT8-CW

2945

3284.5

93

11912.2

10.75

qwen3-8b

INT8-CW

1024

657.3

93.3

9046.8

10.72

minicpm3-4b

FP16

32

173.1

93.7

9130.6

10.67

minicpm4-8b

INT8-CW

32

111.5

94.4

8797.4

10.59

starcoder2-15b

INT4-MIXED

1024

1447.5

94.5

9751.8

10.58

gemma-7b-it

INT8-CW

1024

762.4

96.8

9864.9

10.33

minicpm4-8b

INT8-CW

1024

674.7

97

8963.8

10.31

phi-4-multimodal-instruct

FP16

578

703.1

99.4

12472.6

10.06

phi-4-multimodal-instruct

FP16

786

845.9

99.6

13307

10.04

glm-4-9b-chat-hf

INT8-CW

32

138.7

99.6

9953.3

10.04
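
In the rows above, the "2nd token per sec" column appears to be the reciprocal of "2nd latency (ms)", that is, 1000 divided by the latency and rounded to two decimals. The source does not state this relation; it is inferred from the published figures. The sketch below checks it against three rows copied from the table. The function name `tokens_per_second` is hypothetical and is not part of any OpenVINO API.

```python
def tokens_per_second(latency_ms: float) -> float:
    """Convert a per-token latency in milliseconds to tokens per second.

    Assumption: throughput = 1000 / latency, rounded to two decimals,
    which matches the figures published in the table.
    """
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return round(1000.0 / latency_ms, 2)


# (2nd latency in ms, published 2nd token per sec) pairs copied from the table.
published_rows = [
    (68.6, 14.58),
    (72.0, 13.89),
    (99.6, 10.04),
]

for latency_ms, published in published_rows:
    derived = tokens_per_second(latency_ms)
    # Allow for rounding in the published values.
    assert abs(derived - published) < 0.005, (latency_ms, derived, published)
```

If the relation holds, the two columns carry the same information, so sorting by either one gives the same ranking of models.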

Topology

Precision

Input Size

1st latency (ms)

2nd latency (ms)

max rss memory

2nd token per sec

t5-small

FP16

1024

12.6

5.4

1447

185.19

t5-small

FP16

32

11.8

5.5

1312.9

181.82

t5-small

INT4-MIXED

1024

14.5

5.9

1315.2

169.49

t5-small

INT4-MIXED

1024

14.6

5.9

1310.9

169.49

t5-small

INT4-MIXED

32

13.1

5.9

1168.6

169.49

t5-small

INT4-MIXED

1024

14.7

6

1302.5

166.67

t5-small

INT4-MIXED

32

13.3

6

1154.1

166.67

t5-small

INT4-MIXED

32

13.5

6

1162.4

166.67

t5-small

INT8-CW

1024

15.5

6.2

1320.6

161.29

t5-small

INT8-CW

32

13.9

6.4

1170.8

156.25

minicpm4-0.5b

INT4-MIXED

32

27.4

7.1

1554.1

140.85

minicpm4-0.5b

INT4-MIXED

32

26.2

7.1

1513.2

140.85

minicpm4-0.5b

INT4-MIXED

32

25.3

7.1

1513.1

140.85

minicpm4-0.5b

INT4-MIXED

1024

173.6

7.3

1596.8

136.99

minicpm4-0.5b

INT4-MIXED

1024

167.1

7.3

1555.5

136.99

minicpm4-0.5b

INT4-MIXED

1024

167.3

7.5

1555.4

133.33

minicpm4-0.5b

INT4-MIXED

4097

920.9

7.6

1758.4

131.58

minicpm4-0.5b

INT4-MIXED

4097

898.8

7.6

1717.5

131.58

minicpm4-0.5b

INT4-MIXED

2049

390

7.6

1656.4

131.58

minicpm4-0.5b

INT4-MIXED

4097

910.6

7.7

1717

129.87

minicpm4-0.5b

INT4-MIXED

2049

372.9

7.7

1614.8

129.87

minicpm4-0.5b

INT4-MIXED

2049

382.8

7.8

1615.1

128.21

gemma-3-270m

INT4-MIXED

32

27.4

8.3

1727

120.48

gemma-3-270m

INT4-MIXED

4096

396.7

8.4

2028

119.05

gemma-3-270m

INT4-MIXED

2048

166.8

8.4

1833.6

119.05

gemma-3-270m

INT4-MIXED

1024

78.9

8.4

1763.5

119.05

gemma-3-270m

INT8-CW

32

27.4

10.4

1786.2

96.15

gemma-3-270m

INT8-CW

1024

87.3

10.8

1826.5

92.59

minicpm4-0.5b

INT8-CW

32

33.2

10.9

1835.3

91.74

gemma-3-270m

INT8-CW

4096

419.6

11.2

2101.6

89.29

gemma-3-270m

INT8-CW

2048

176.5

11.2

1903.4

89.29

minicpm4-0.5b

INT8-CW

1024

193.2

11.4

1905.7

87.72

minicpm4-0.5b

INT8-CW

4097

1132.7

11.8

2138.1

84.75

minicpm4-0.5b

INT8-CW

2049

487.1

11.8

1990.4

84.75

gemma-3-270m

FP16

32

30

11.9

1990.8

84.03

llama-3.2-1b-instruct

INT4-MIXED

32

33.1

12.3

2662.8

81.30

gemma-3-270m

FP16

1024

78.8

12.4

2086.2

80.65

llama-3.2-1b-instruct

INT4-MIXED

32

31.8

12.5

2690.2

80.00

gemma-3-270m

FP16

2048

162.7

12.7

2208.5

78.74

llama-3.2-1b-instruct

INT4-MIXED

1024

374

12.9

2758.1

77.52

llama-3.2-1b-instruct

INT4-MIXED

1024

379.3

13

2785.6

76.92

llama-3.2-1b-instruct

INT4-MIXED

2048

782.2

13.4

2920.1

74.63

llama-3.2-1b-instruct

INT4-MIXED

4096

1751.3

13.5

3078.8

74.07

llama-3.2-1b-instruct

INT4-MIXED

2048

770.9

13.5

2918.6

74.07

distil-large-v2

INT4-MIXED

prompt0

592.2

13.6

2160.2

73.53

gemma-3-1b-it

INT4-MIXED

32

41.4

13.6

2522.6

73.53

gemma-3-270m

FP16

4096

379.8

13.7

2375.1

72.99

llama-3.2-1b-instruct

INT4-MIXED

4096

1759.4

13.7

3079.4

72.99

gemma-3-1b-it

INT4-MIXED

32

39.5

13.8

2433.3

72.46

distil-large-v2

INT4-MIXED

prompt1

636.8

13.9

2193.5

71.94

gemma-3-1b-it

INT4-MIXED

1024

314.4

13.9

2605.3

71.94

minicpm4-0.5b

FP16

32

30.3

14.2

2713

70.42

gemma-3-1b-it

INT4-MIXED

1024

309

14.2

2517

70.42

whisper-small

FP16

prompt1

251

14.4

2074.9

69.44

gemma-3-1b-it

INT4-MIXED

4096

1378

14.4

3070.9

69.44

gemma-3-1b-it

INT4-MIXED

2048

644.5

14.5

2743.6

68.97

whisper-small

INT4-MIXED

prompt0

191.3

14.6

1670.1

68.49

whisper-small

FP16

prompt0

184

14.7

2041.4

68.03

whisper-small

INT4-MIXED

prompt1

249.2

14.7

1706.1

68.03

whisper-small

INT8-CW

prompt1

258.6

14.7

1816.9

68.03

minicpm4-0.5b

FP16

1024

208.5

14.8

2771.2

67.57

gemma-3-1b-it

INT4-MIXED

4096

1326.3

14.8

2979.8

67.57

whisper-small

INT4-MIXED

prompt0

193.8

14.9

1668.1

67.11

gemma-3-1b-it

INT4-MIXED

2048

623.1

14.9

2655.5

67.11

minicpm4-0.5b

FP16

4097

1554.8

15.1

2897.4

66.23

minicpm4-0.5b

FP16

2049

2371.1

15.1

2835.5

66.23

whisper-small

INT4-MIXED

prompt1

248.2

15.1

1738.4

66.23

whisper-small

INT4-MIXED

prompt0

195.4

15.1

1705.4

66.23

whisper-small

INT8-CW

prompt0

199.1

15.1

1782.7

66.23

whisper-large-v3-turbo

INT4-MIXED

prompt1

669.7

15.2

2295.6

65.79

whisper-large-v3-turbo

INT4-MIXED

prompt0

601.4

15.3

2262.8

65.36

whisper-small

INT4-MIXED

prompt1

261.9

15.4

1701.2

64.94

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

32

46.8

15.4

3047.6

64.94

qwen2.5-1.5b-instruct

INT4-MIXED

32

44.8

15.4

2879.6

64.94

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

32

50

15.5

3124.2

64.52

qwen2.5-1.5b-instruct

INT4-MIXED

32

49

15.5

2953.1

64.52

distil-large-v2

INT8-CW

prompt1

744.7

15.9

2664.8

62.89

distil-large-v2

INT8-CW

prompt0

686.3

15.9

2631

62.89

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

1024

483.7

16

3112.5

62.50

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

1024

487.2

16.1

3189.1

62.11

qwen2.5-1.5b-instruct

INT4-MIXED

1024

481

16.1

2962.1

62.11

qwen2.5-1.5b-instruct

INT4-MIXED

1024

478.5

16.1

3035.2

62.11

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

2048

1002

16.4

3131.3

60.98

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

2048

992.1

16.6

3161.8

60.24

qwen2.5-1.5b-instruct

INT4-MIXED

2048

985.3

16.6

3040.1

60.24

qwen2.5-1.5b-instruct

INT4-MIXED

2048

985.4

16.6

3113.4

60.24

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

4097

2190

16.8

3089.9

59.52

whisper-large-v3-turbo

INT8-CW

prompt1

781.2

16.8

2798.4

59.52

whisper-large-v3-turbo

INT8-CW

prompt0

704.3

16.8

2765.2

59.52

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

4097

2207.8

16.9

3119.9

59.17

qwen2.5-1.5b-instruct

INT4-MIXED

4096

2151.3

16.9

3143.3

59.17

qwen2.5-1.5b-instruct

INT4-MIXED

4096

2166.2

16.9

3169.3

59.17

qwen2.5-1.5b-instruct

INT4-MIXED

32

55.3

17.3

3209.2

57.80

qwen2.5-1.5b-instruct

INT4-MIXED

1024

503.6

17.9

3334.7

55.87

qwen2.5-1.5b-instruct

INT4-MIXED

2048

1025.7

18.3

3279

54.64

qwen2.5-1.5b-instruct

INT4-MIXED

4096

2266.5

18.7

3376.4

53.48

distil-large-v2

FP16

prompt1

837

19.6

2813.9

51.02

distil-large-v2

FP16

prompt0

773.2

19.6

2785.4

51.02

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

32

51.7

19.8

3658.9

50.51

llama-3.2-1b-instruct

INT8-CW

32

47

20.2

3512.6

49.50

whisper-large-v3-turbo

FP16

prompt1

839.8

20.4

3435.3

49.02

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

1024

539.4

20.4

3571.7

49.02

llama-3.2-1b-instruct

INT8-CW

1024

432.3

20.8

3469.5

48.08

whisper-large-v3-turbo

FP16

prompt0

785.1

20.9

3548.1

47.85

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

2048

1085.1

20.9

3504.4

47.85

llama-3.2-1b-instruct

INT8-CW

2048

904.2

21.2

3427.6

47.17

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

4097

2747.4

21.3

3660.7

46.95

gemma-3-1b-it

INT8-CW

32

57.4

21.3

3072.7

46.95

llama-3.2-1b-instruct

INT8-CW

4096

2031.7

21.5

3137.4

46.51

gemma-3-1b-it

INT8-CW

1024

358.3

22

3169.9

45.45

gemma-3-1b-it

INT8-CW

4096

1518.6

22.6

3322.3

44.25

gemma-3-1b-it

INT8-CW

2048

697.6

22.7

3313.1

44.05

smolvlm2-256m-video-instruct

INT4-MIXED

1141

1779.7

22.8

2991.3

43.86

phi-2

INT4-MIXED

32

63.8

23.7

3715.7

42.19

stablelm-3b-4e1t

INT4-MIXED

32

77.9

24

3555.1

41.67

stable-zephyr-3b-dpo

INT4-MIXED

32

76.2

24.1

3680.2

41.49

phi-2

INT4-MIXED

32

70

24.4

3857.5

40.98

stable-zephyr-3b-dpo

INT4-MIXED

32

82.1

25.7

3926.3

38.91

phi-2

INT4-MIXED

1024

1022.7

26.3

3790.8

38.02

stablelm-3b-4e1t

INT4-MIXED

1024

998.8

26.7

3607.1

37.45

stable-zephyr-3b-dpo

INT4-MIXED

1024

1009.3

26.8

3731.5

37.31

phi-2

INT4-MIXED

1024

1075.8

26.9

3961

37.17

smolvlm2-256m-video-instruct

INT8-CW

1141

1799.3

26.9

3079.4

37.17

deepseek-r1-distill-qwen-1.5b

INT8-CW

32

73.3

27.3

3600.3

36.63

qwen2.5-1.5b-instruct

INT8-CW

32

73.2

27.4

3835.2

36.50

stablelm-3b-4e1t

INT4-MIXED

32

70.7

28

3879.5

35.71

deepseek-r1-distill-qwen-1.5b

INT8-CW

1024

584.4

28

3482.2

35.71

qwen2.5-1.5b-instruct

INT8-CW

1024

575.7

28.2

3714.3

35.46

stable-zephyr-3b-dpo

INT4-MIXED

1024

1091

28.3

4033.8

35.34

llama-3.2-3b-instruct

INT4-MIXED

32

81.1

28.4

4160.3

35.21

qwen2.5-coder-3b-instruct

INT4-MIXED

32

83.3

28.4

3958.9

35.21

qwen2.5-coder-3b-instruct

INT4-MIXED

32

88.9

28.5

4076.6

35.09

deepseek-r1-distill-qwen-1.5b

INT8-CW

2048

1207.4

28.5

3603.7

35.09

llama-3.2-3b-instruct

INT4-MIXED

32

74.1

28.7

4288.5

34.84

qwen2.5-1.5b-instruct

INT8-CW

2048

1208.2

28.7

3630.8

34.84

deepseek-r1-distill-qwen-1.5b

INT8-CW

4097

3213.6

28.9

3465.1

34.60

llama-3.2-3b-instruct

INT4-MIXED

32

78.1

29

4419.6

34.48

qwen2.5-1.5b-instruct

INT8-CW

4096

2567.8

29

3400.7

34.48

stablelm-3b-4e1t

INT4-MIXED

2048

2439.9

29.3

3836.7

34.13

stable-zephyr-3b-dpo

INT4-MIXED

2076

2517.2

29.4

3962.2

34.01

qwen2.5-coder-3b-instruct

INT4-MIXED

1024

967.3

29.6

3792.9

33.78

qwen2.5-coder-3b-instruct

INT4-MIXED

1024

960.2

29.7

3933.8

33.67

qwen2.5-coder-3b-instruct

INT4-MIXED

32

86.1

29.7

4253.9

33.67

llama-3.2-3b-instruct

INT4-MIXED

1024

941.3

29.8

4050.6

33.56

llama-3.2-3b-instruct

INT4-MIXED

1024

955.6

30

4199.7

33.33

gemma-3-1b-it

FP16

32

55.9

30.3

4485.5

33.00

llama-3.2-3b-instruct

INT4-MIXED

1024

962

30.4

4318

32.89

phi-3.5-mini-instruct

INT4-MIXED

32

98.1

30.4

3793.9

32.89

phi-3-mini-128k-instruct

INT4-MIXED

32

100

30.5

4004.3

32.79

phi-3-mini-4k-instruct

INT4-MIXED

32

93.2

30.5

4025.5

32.79

phi-3.5-mini-instruct

INT4-MIXED

32

89.1

30.6

3997.4

32.68

stablelm-3b-4e1t

INT4-MIXED

1024

1090.9

30.7

4166.4

32.57

phi-3-mini-4k-instruct

INT4-MIXED

32

91.7

30.7

4191.5

32.57

qwen2.5-coder-3b-instruct

INT4-MIXED

1024

959.5

30.8

4084.5

32.47

phi-3-mini-128k-instruct

INT4-MIXED

32

96.4

30.9

4083.7

32.36

gemma-3-1b-it

FP16

1024

383.8

31

4364.1

32.26

stable-zephyr-3b-dpo

INT4-MIXED

2076

2719.9

31

4150.2

32.26

llama-3.2-3b-instruct

INT4-MIXED

2048

1969.2

31.2

4132.8

32.05

llama-3.2-3b-instruct

INT4-MIXED

2048

2007.6

31.3

4175.5

31.95

gemma-3-1b-it

FP16

2048

761.7

31.6

4223.7

31.65

llama-3.2-3b-instruct

INT4-MIXED

2048

2042.6

31.7

4235.8

31.55

phi-3.5-mini-instruct

INT4-MIXED

32

96

32.1

4133.1

31.15

phi-3-mini-4k-instruct

INT4-MIXED

32

100

32.2

4148

31.06

gemma-3-1b-it

FP16

4096

1585.6

32.9

4100.9

30.40

llama-3.2-3b-instruct

INT4-MIXED

4096

4821.1

33

4043.3

30.30

llama-3.2-3b-instruct

INT4-MIXED

4096

4922.5

33.1

3963.9

30.21

stablelm-3b-4e1t

INT4-MIXED

2048

2552.6

33.2

4227.6

30.12

llama-3.2-3b-instruct

INT4-MIXED

4096

4998.5

33.4

4110.3

29.94

phi-3-mini-128k-instruct

INT4-MIXED

1024

1314.6

33.5

4252.9

29.85

phi-3.5-mini-instruct

INT4-MIXED

1024

1295.6

33.5

4136.7

29.85

internvl2-4b

INT4-MIXED

297

768

33.5

4638.9

29.85

phi-3-mini-4k-instruct

INT4-MIXED

1024

1309.2

33.6

4248.2

29.76

phi-3.5-mini-instruct

INT4-MIXED

1024

1328.5

33.6

4298.7

29.76

phi-3-mini-4k-instruct

INT4-MIXED

1024

1332.1

33.7

4313.3

29.67

llama-3.2-1b-instruct

FP16

32

51

33.9

4721.8

29.50

internvl2-4b

INT4-MIXED

297

770.1

34

4758.2

29.41

phi-3-mini-128k-instruct

INT4-MIXED

1024

1394.3

34.1

4400.6

29.33

stablelm-3b-4e1t

INT4-MIXED

4096

6399

34.6

4207.4

28.90

llama-3.2-1b-instruct

FP16

1024

518.6

34.7

4588.6

28.82

phi-3.5-mini-instruct

INT4-MIXED

1024

1412

35.2

4518.7

28.41

phi-3-mini-4k-instruct

INT4-MIXED

1024

1376.8

35.3

4489.6

28.33

phi-3.5-vision-instruct

INT4-MIXED

802

1990.8

35.4

5138.7

28.25

llama-3.2-1b-instruct

FP16

2048

1008.7

35.5

4202.8

28.17

internvl2-4b

INT4-MIXED

1027

1821.6

35.5

4614.3

28.17

phi-4-mini-instruct

INT4-MIXED

32

94.3

35.8

4678.1

27.93

phi-4-mini-reasoning

INT4-MIXED

32

90.8

35.8

4663.3

27.93

phi-3.5-vision-instruct

INT4-MIXED

1032

2367.9

35.9

4843.9

27.86

llama-3.2-1b-instruct

FP16

4096

2210.1

36

4276.6

27.78

internvl2-4b

INT4-MIXED

1027

1859.4

36

4752.7

27.78

phi-4-mini-reasoning

INT4-MIXED

32

90.7

36.1

4810.6

27.70

phi-4-mini-instruct

INT4-MIXED

32

93.5

36.2

4818.1

27.62

phi-4-mini-reasoning

INT4-MIXED

32

93.5

36.5

4998

27.40

phi-3-mini-4k-instruct

INT4-MIXED

2048

3027.4

36.6

4172.4

27.32

phi-3.5-mini-instruct

INT4-MIXED

2048

3034.8

36.6

4092.8

27.32

smolvlm2-256m-video-instruct

FP16

1141

1559.7

36.7

3683

27.25

phi-3-mini-4k-instruct

INT4-MIXED

2048

3076.2

36.7

4225.3

27.25

phi-3.5-mini-instruct

INT4-MIXED

2048

3080.2

36.7

4179.3

27.25

phi-4-mini-instruct

INT4-MIXED

1024

1107.7

37.2

4679.9

26.88

phi-4-mini-reasoning

INT4-MIXED

1024

1109.1

37.3

4663.9

26.81

afm-4.5b

INT4-MIXED

32

108.5

37.3

4699.2

26.81

phi-4-mini-instruct

INT4-MIXED

32

86.4

37.6

5075.3

26.60

phi-4-mini-instruct

INT4-MIXED

1024

1126.9

37.7

4730.9

26.53

phi-4-mini-reasoning

INT4-MIXED

1024

1133.1

37.7

4735.9

26.53

phi-4-mini-reasoning

INT4-MIXED

1024

1158.7

38

4893.5

26.32

phi-3.5-mini-instruct

INT4-MIXED

2048

3270.2

38.2

4615.1

26.18

stablelm-3b-4e1t

INT4-MIXED

4096

6583.4

38.3

4965.4

26.11

internvl2-4b

INT4-MIXED

2051

3624.3

38.3

5584.2

26.11

phi-3-mini-4k-instruct

INT4-MIXED

2048

3193.1

38.3

4409.5

26.11

afm-4.5b

INT4-MIXED

1024

1402.7

38.5

4577.7

25.97

phi-3.5-vision-instruct

INT4-MIXED

2056

4151.7

38.6

5847.1

25.91

phi-4-mini-instruct

INT4-MIXED

2048

2354.3

38.7

4332.9

25.84

internvl2-4b

INT4-MIXED

2051

3681.3

38.8

5718.9

25.77

phi-4-mini-reasoning

INT4-MIXED

1986

2337.2

38.8

4338.9

25.77

phi-4-mini-instruct

INT4-MIXED

1024

1214.1

39

5044.5

25.64

phi-4-mini-instruct

INT4-MIXED

2048

2406.2

39.2

4441.2

25.51

phi-4-mini-reasoning

INT4-MIXED

1986

2398.7

39.2

4462

25.51

phi-4-mini-reasoning

INT4-MIXED

1986

2418.9

39.5

4641.6

25.32

stable-zephyr-3b-dpo

INT8-CW

32

87

39.5

4873.1

25.32

stablelm-3b-4e1t

INT8-CW

32

95.7

39.6

4767.2

25.25

afm-4.5b

INT4-MIXED

2048

2971.5

39.7

4591

25.19

phi-2

INT8-CW

32

97.2

40

4904.3

25.00

afm-4.5b

INT4-MIXED

4096

6608.7

40.5

4582.5

24.69

phi-4-mini-instruct

INT4-MIXED

2048

2597.4

40.5

4866.9

24.69

qwen3-4b

INT4-MIXED

2048

2736.2

40.6

4943.8

24.63

qwen3-4b

INT4-MIXED

2048

2762.9

41

4809.3

24.39

phi-4-mini-instruct

INT4-MIXED

3623

4742

41.1

4536.6

24.33

phi-4-mini-reasoning

INT4-MIXED

3623

4759.7

41.3

4519.1

24.21

qwen3-4b

INT4-MIXED

4096

6472.1

41.6

5090.7

24.04

phi-4-mini-instruct

INT4-MIXED

3623

4836.9

41.6

4582.5

24.04

phi-4-mini-reasoning

INT4-MIXED

3623

4894.3

41.7

4592.2

23.98

phi-4-mini-reasoning

INT4-MIXED

3623

4935.8

42

4632.5

23.81

stable-zephyr-3b-dpo

INT8-CW

1024

1190.6

42

4746.2

23.81

qwen3-4b

INT4-MIXED

4096

6568.5

42.1

5168.6

23.75

stablelm-3b-4e1t

INT8-CW

1024

1193.5

42.1

4650.6

23.75

deepseek-r1-distill-qwen-1.5b

FP16

32

71.3

42.3

5040.7

23.64

phi-2

INT8-CW

1024

1216

42.3

4751.1

23.64

qwen2.5-1.5b-instruct

FP16

32

70.7

42.4

5460.8

23.58

whisper-large-v3

INT4-MIXED

prompt0

772.9

42.6

3854.8

23.47

phi-3.5-mini-instruct

INT4-MIXED

4096

7629.4

42.6

5027

23.47

phi-3-mini-4k-instruct

INT4-MIXED

4096

7602.8

42.7

5129.5

23.42

phi-3.5-mini-instruct

INT4-MIXED

4096

7747.3

42.7

5163.1

23.42

phi-3-mini-4k-instruct

INT4-MIXED

4096

7734.1

42.8

5164.7

23.36

phi-4-mini-instruct

INT4-MIXED

3623

5172.4

42.9

5043.3

23.31

deepseek-r1-distill-qwen-1.5b

FP16

1024

651

43.2

5124.9

23.15

qwen2.5-1.5b-instruct

FP16

1024

651.4

43.3

5144.8

23.09

deepseek-r1-distill-qwen-1.5b

FP16

2048

1309.9

43.8

4815

22.83

internvl2-4b

INT4-MIXED

4099

8950.9

43.8

7377.6

22.83

qwen2.5-1.5b-instruct

FP16

2048

1304.7

43.9

4726.6

22.78

deepseek-r1-distill-qwen-1.5b

FP16

4097

8260.8

44.3

4881.6

22.57

qwen2.5-1.5b-instruct

FP16

4096

2788.9

44.3

4898.5

22.57

internvl2-4b

INT4-MIXED

4099

9058.1

44.3

7365.2

22.57

phi-3.5-mini-instruct

INT4-MIXED

4096

8269.2

44.3

5492.1

22.57

phi-3-mini-4k-instruct

INT4-MIXED

4096

8098.8

44.4

5317.7

22.52

stable-zephyr-3b-dpo

INT8-CW

2076

4103.6

44.6

4797.2

22.42

stablelm-3b-4e1t

INT8-CW

2048

2821

44.6

4750.7

22.42

gpt-j-6b

INT4-MIXED

32

140.6

45.3

5178.9

22.08

whisper-large-v3

INT4-MIXED

prompt1

855.2

45.4

3886.9

22.03

llama-3.2-3b-instruct

INT8-CW

32

93.7

46.4

5553

21.55

phi-3.5-vision-instruct

INT4-MIXED

5239

13020.6

47.4

8087

21.10

minicpm3-4b

INT4-MIXED

32

181.5

47.6

4621.1

21.01

whisper-large-v3

INT8-CW

prompt0

884.2

47.6

4683.3

21.01

flan-t5-xxl

INT4-MIXED

33

468.3

47.7

13996.1

20.96

llama-3.2-3b-instruct

INT8-CW

1024

1159.5

47.7

5326.8

20.96

whisper-large-v3

INT8-CW

prompt1

947.3

48.1

4594

20.79

gpt-j-6b

INT4-MIXED

32

153.2

48.4

5508.1

20.66

minicpm3-4b

INT4-MIXED

32

215.5

48.5

4426.7

20.62

gpt-j-6b

INT4-MIXED

1024

2092.6

48.6

6045.2

20.58

minicpm3-4b

INT4-MIXED

32

220

48.7

4385.5

20.53

llama-3.2-3b-instruct

INT8-CW

2048

2401.4

49

4945.1

20.41

stablelm-3b-4e1t

INT8-CW

4096

7153.3

49.7

5660.4

20.12

llama-2-7b-chat-hf

INT4-MIXED

32

165.3

50.2

5377.6

19.92

llama-3.2-3b-instruct

INT8-CW

4096

5655.6

50.8

5450.1

19.69

qwen2.5-coder-3b-instruct

INT8-CW

32

116.4

50.9

5415.6

19.65

llama-2-7b-chat-hf

INT4-MIXED

32

172.9

51.1

5508

19.57

gpt-j-6b

INT4-MIXED

1024

2252.9

51.9

6377.5

19.27

falcon-7b-instruct

INT4-MIXED

32

166.4

51.9

5731.2

19.27

qwen2.5-coder-3b-instruct

INT8-CW

1024

1014.5

52

5106.1

19.23

flan-t5-xxl

INT4-MIXED

1139

422.9

52.1

15612.4

19.19

gpt-j-6b

INT4-MIXED

2057

4835.9

52.8

7320.2

18.94

falcon-7b-instruct

INT4-MIXED

1024

2194

53.2

5561.1

18.80

phi-3-mini-128k-instruct

INT8-CW

32

108

53.4

5721.8

18.73

phi-3-mini-4k-instruct

INT8-CW

32

113.9

53.4

5778.1

18.73

phi-3.5-mini-instruct

INT8-CW

32

109.2

53.4

5703.6

18.73

mistral-7b-instruct-v0.2

INT4-MIXED

32

175.5

53.7

5621.5

18.62

minicpm3-4b

INT4-MIXED

1024

1758.4

53.9

5081.9

18.55

chatglm3-6b

FP16

7

74.1

54.3

5655

18.42

llama-2-7b-chat-hf

INT4-MIXED

1024

2213.7

54.3

5448.1

18.42

mistral-7b-instruct-v0.3

INT4-MIXED

32

175.3

54.3

5652.8

18.42

biomistral-7b-slerp

INT4-MIXED

7

74.1

54.3

5655

18.42

chatglm3-6b

INT4-MIXED

7

74.1

54.3

5655

18.42

chatglm3-6b

INT4-MIXED

7

74.1

54.3

5655

18.42

chatglm3-6b

INT8-CW

7

74.1

54.3

5655

18.42

phi-4-multimodal-instruct

INT4-MIXED

578

4897.7

54.6

6524.5

18.32

mistral-7b-instruct-v0.2

INT4-MIXED

32

186.2

54.6

5833.8

18.32

phi-4-multimodal-instruct

INT4-MIXED

786

6613.3

54.8

6722

18.25

minicpm3-4b

INT4-MIXED

1024

1757.6

55.1

4980.6

18.15

llama-2-7b-chat-hf

INT4-MIXED

1024

2216.8

55.3

5530.5

18.08

minicpm3-4b

INT4-MIXED

1024

1727.3

55.3

4874.5

18.08

mistral-7b-instruct-v0.3

INT4-MIXED

32

188.3

55.4

5748.5

18.05

phi-4-multimodal-instruct

INT4-MIXED

1362

18175.5

55.5

8029.9

18.02

falcon-7b-instruct

INT4-MIXED

32

181.9

55.6

6288.2

17.99

mistral-7b-instruct-v0.3

INT4-MIXED

32

188.3

55.7

5965.6

17.95

phi-4-multimodal-instruct

INT4-MIXED

1570

13503

55.8

8935.9

17.92

biomistral-7b-slerp

INT4-MIXED

7

78.6

55.8

5854.6

17.92

phi-4-multimodal-instruct

INT4-MIXED

1685

13140.2

55.9

9393.9

17.89

mistral-7b-instruct-v0.2

INT4-MIXED

1024

2233

55.9

5438.2

17.89

deepseek-r1-distill-qwen-7b

INT4-MIXED

32

174.3

56

5720.8

17.86

mistral-7b-instruct-v0.1

INT4-MIXED

32

189.1

56

5958.5

17.86

mistral-7b-instruct-v0.2

INT4-MIXED

32

187.4

56

5768.1

17.86

qwen2.5-7b-instruct

INT4-MIXED

32

181.6

56

5721.6

17.86

qwen2.5-7b-instruct-1m

INT4-MIXED

32

177.2

56

5811.4

17.86

phi-4-mini-instruct

INT8-CW

32

111.2

56

6409.4

17.86

gpt-j-6b

INT4-MIXED

2057

5028.8

56.1

7740.4

17.83

qwen2-7b-instruct

INT4-MIXED

32

178.4

56.1

5838.1

17.83

phi-4-mini-reasoning

INT8-CW

32

113.7

56.1

6280.5

17.83

internvl2-4b

INT8-CW

297

981.1

56.3

6407.6

17.76

whisper-large-v3

FP16

prompt0

933.8

56.4

5766.8

17.73

phi-3-mini-128k-instruct

INT8-CW

1024

1604.9

56.5

5732.6

17.70

phi-3-mini-4k-instruct

INT8-CW

1024

1590.8

56.5

5734.7

17.70

phi-3.5-mini-instruct

INT8-CW

1024

1592.8

56.5

5625.6

17.70

flan-t5-xxl

INT4-MIXED

2048

690.7

56.6

19603.4

17.67

mistral-7b-instruct-v0.3

INT4-MIXED

1024

2234.4

56.6

5426.5

17.67

mistral-7b-instruct-v0.2

INT4-MIXED

1024

2239.3

56.8

5532.3

17.61

whisper-large-v3

FP16

prompt1

1002.7

56.9

5545.9

17.57

falcon-7b-instruct

INT4-MIXED

1024

2323.2

57

6004.5

17.54

deepseek-r1-distill-qwen-7b

INT4-MIXED

32

181.1

57.3

6223

17.45

deepseek-r1-distill-qwen-7b

INT4-MIXED

1024

2038.7

57.4

5859.6

17.42

qwen2-7b-instruct

INT4-MIXED

1024

2104.6

57.4

5977

17.42

qwen2.5-7b-instruct

INT4-MIXED

1024

2025.4

57.4

5860.3

17.42

qwen2.5-7b-instruct-1m

INT4-MIXED

1024

2028.1

57.4

5950.1

17.42

qwen2.5-7b-instruct

INT4-MIXED

32

183.3

57.4

6420.8

17.42

phi-4-mini-instruct

INT8-CW

1024

1341.4

57.5

6156

17.39

mistral-7b-instruct-v0.3

INT4-MIXED

1024

2245.2

57.6

5553.9

17.36

phi-4-mini-reasoning

INT8-CW

1024

1350.9

57.7

6162.4

17.33

phi-3.5-vision-instruct

INT8-CW

802

7401.7

57.7

6893.7

17.33

mistral-7b-instruct-v0.3

INT4-MIXED

1024

2324.5

57.9

5647.1

17.27

minicpm-v-2_6

INT4-MIXED

228

2896.3

57.9

7147.8

17.27

mistral-7b-instruct-v0.1

INT4-MIXED

1025

2444.7

58

5670.2

17.24

mistral-7b-instruct-v0.2

INT4-MIXED

1024

2315.8

58.1

5646.2

17.21

deepseek-r1-distill-qwen-7b

INT4-MIXED

32

182.3

58.1

6177.7

17.21

qwen2.5-7b-instruct-1m

INT4-MIXED

32

179.8

58.1

6333.6

17.21

minicpm-o-2_6

INT4-MIXED

238

2962.2

58.2

7165.8

17.18

qwen2-7b-instruct

INT4-MIXED

32

188.1

58.2

6525.8

17.18

qwen2.5-7b-instruct

INT4-MIXED

32

180.6

58.2

6352

17.18

phi-3.5-vision-instruct

INT8-CW

1032

3862.9

58.2

6296.3

17.18

internvl2-4b

INT8-CW

1027

4017

58.4

6261.8

17.12

llama-2-7b-chat-hf

INT4-MIXED

2048

4980

58.6

5792.1

17.06

qwen2-7b-instruct

INT4-MIXED

2048

4371.3

58.6

5559.7

17.06

deepseek-r1-distill-qwen-7b

INT4-MIXED

2048

4364.6

58.7

5441.3

17.04

deepseek-r1-distill-qwen-7b

INT4-MIXED

1024

2064

58.7

5969.2

17.04

qwen2.5-7b-instruct

INT4-MIXED

1024

2072.6

58.8

6084.8

17.01

phi-4-multimodal-instruct

INT4-MIXED

578

5157.9

58.8

6919.7

17.01

phi-4-multimodal-instruct

INT4-MIXED

786

6791.8

59

7372.8

16.95

minicpm-v-2_6

INT4-MIXED

228

3002.8

59.1

7262

16.92

phi-4-mini-instruct

INT8-CW

2048

2819.2

59.1

5817.7

16.92

phi-4-mini-reasoning

INT8-CW

1986

5427.8

59.2

5825.9

16.89

deepseek-r1-distill-qwen-7b

INT4-MIXED

4097

9472.8

59.3

5805.7

16.86

qwen2-7b-instruct

INT4-MIXED

4096

9372.7

59.3

5918.5

16.86

deepseek-r1-distill-qwen-7b

INT4-MIXED

1024

2135.2

59.5

6060.1

16.81

qwen2.5-7b-instruct

INT4-MIXED

1024

2135.9

59.5

6062.5

16.81

qwen2.5-7b-instruct-1m

INT4-MIXED

1024

2125.7

59.5

6163.5

16.81

llama-2-7b-chat-hf

INT4-MIXED

2048

4978.1

59.6

5871.6

16.78

qwen2-7b-instruct

INT4-MIXED

1024

2221

59.6

6183.6

16.78

llama-3-8b-instruct

INT4-MIXED

32

180.6

59.6

5936.8

16.78

phi-3-mini-4k-instruct

INT8-CW

2048

3646.6

59.6

5851.8

16.78

phi-3.5-mini-instruct

INT8-CW

2048

3619.2

59.6

5736.2

16.78

phi-4-multimodal-instruct

INT4-MIXED

1362

18669

59.7

8758.1

16.75

deepseek-r1-distill-llama-8b

INT4-MIXED

32

184.6

59.7

6054.9

16.75

deepseek-r1-distill-qwen-7b

INT4-MIXED

2048

4460

59.8

5549.2

16.72

phi-4-multimodal-instruct

INT4-MIXED

1570

13974.8

60

9654.1

16.67

minicpm-v-2_6

INT4-MIXED

228

3130.7

60

7387

16.67

phi-4-multimodal-instruct

INT4-MIXED

1685

13655.4

60.1

10068.6

16.64

minicpm-o-2_6

INT4-MIXED

238

3188.1

60.1

6894.6

16.64

phi-4-multimodal-instruct

INT4-MIXED

4506

36268.1

60.2

14572.5

16.61

minicpm3-4b

INT4-MIXED

2049

4566.4

60.2

5834.6

16.61

llama-3.1-8b-instruct

INT4-MIXED

32

180.2

60.2

5936.8

16.61

llama-3-8b-instruct

INT4-MIXED

32

191.1

60.5

6454.4

16.53

deepseek-r1-distill-qwen-7b

INT4-MIXED

4097

9634.9

60.6

5915.3

16.50

deepseek-r1-distill-qwen-7b

INT4-MIXED

2048

4523.8

60.6

5639.5

16.50

qwen2-7b-instruct

INT4-MIXED

2048

4595.8

60.8

5778.2

16.45

minicpm4-8b

INT4-MIXED

32

198.2

60.8

5897.5

16.45

llama-3-8b-instruct

INT4-MIXED

32

193.6

60.9

6496.6

16.42

phi-3.5-vision-instruct

INT8-CW

2056

7298.7

60.9

7394.5

16.42

internvl2-4b

INT8-CW

2051

7364.9

61.1

7237.1

16.37

llama-3.1-8b-instruct

INT4-MIXED

32

193.3

61.3

6417.7

16.31

deepseek-r1-distill-qwen-7b

INT4-MIXED

4097

9847.7

61.4

6003.4

16.29

minicpm3-4b

INT4-MIXED

2049

4560.3

61.4

5723.3

16.29

qwen3-8b

INT4-MIXED

32

180.5

61.4

6242.9

16.29

phi-4-mini-instruct

INT8-CW

3623

10432.1

61.4

6278.4

16.29

qwen2-7b-instruct

INT4-MIXED

4096

9798.4

61.5

6123.9

16.26

gpt-j-6b

INT4-MIXED

4112

11653.7

61.6

10386.6

16.23

minicpm4-8b

INT4-MIXED

32

211.3

61.6

6455.4

16.23

phi-4-mini-reasoning

INT8-CW

3623

10471.7

61.6

6268

16.23

minicpm3-4b

INT4-MIXED

2049

4495.4

61.7

5694.7

16.21

llama-3-8b-instruct

INT4-MIXED

1024

2256.5

61.7

6114.2

16.21

deepseek-r1-distill-llama-8b

INT4-MIXED

1024

2247.8

61.8

6232.2

16.18

llama-3-8b-instruct

INT4-MIXED

32

190.3

62

6490.2

16.13

llava-next-video-7b-hf

INT4-MIXED

2945

9332.9

62.4

7936.1

16.03

llama-3.1-8b-instruct

INT4-MIXED

1024

2258.1

62.4

6114.1

16.03

afm-4.5b

INT8-CW

32

159.6

62.4

6132.1

16.03

minicpm4-8b

INT4-MIXED

1024

2364.2

62.5

6017.6

16.00

minicpm4-8b

INT4-MIXED

32

214.6

62.5

6605.5

16.00

llama-3-8b-instruct

INT4-MIXED

1024

2236.8

62.7

6203

15.95

qwen3-8b

INT4-MIXED

32

179.3

62.8

6763

15.92

qwen3-4b

INT8-CW

2048

3273.7

63.1

6609.6

15.85

llama-3-8b-instruct

INT4-MIXED

1024

2228.7

63.2

6349.2

15.82

minicpm4-8b

INT4-MIXED

1024

2439.5

63.4

6153.7

15.77

llama-3.1-8b-instruct

INT4-MIXED

1024

2244.5

63.5

6199.3

15.75

afm-4.5b

INT8-CW

1024

1517

63.5

5963.3

15.75

minicpm4-8b

INT4-MIXED

4097

11075.9

63.7

5863.4

15.70

llama-3-8b-instruct

INT4-MIXED

2048

4755.4

63.9

5770.4

15.65

qwen3-8b

INT4-MIXED

1024

2222.3

63.9

6427.4

15.65

llava-next-video-7b-hf

INT4-MIXED

2945

9696.9

64

8038.1

15.63

minicpm4-8b

INT4-MIXED

2049

5198.3

64

5502.8

15.63

deepseek-r1-distill-llama-8b

INT4-MIXED

2048

4771.4

64

5889.3

15.63

minicpm4-8b

INT4-MIXED

1024

2548.5

64.1

6259.1

15.60

qwen3-4b

INT8-CW

4096

7519

64.1

6479.4

15.60

llama-3-8b-instruct

INT4-MIXED

1024

2332.4

64.2

6443.5

15.58

phi-4-multimodal-instruct

INT4-MIXED

4506

36990.5

64.4

15329.2

15.53

minicpm-v-4_5

INT4-MIXED

217

3015.3

64.4

7566.2

15.53

llama-3.1-8b-instruct

INT4-MIXED

2048

4745.3

64.5

5770.8

15.50

minicpm4-8b

INT4-MIXED

4097

11347.1

64.6

5994.3

15.48

afm-4.5b

INT8-CW

2048

3187.1

64.7

5907.2

15.46

gpt-j-6b

INT4-MIXED

4112

12121.1

64.8

10729.4

15.43

llama-3-8b-instruct

INT4-MIXED

4096

10403.5

64.8

6298.9

15.43

minicpm4-8b

INT4-MIXED

2049

5350.1

64.9

5628.1

15.41

llama-3-8b-instruct

INT4-MIXED

2048

4802.5

64.9

5856.1

15.41

deepseek-r1-distill-llama-8b

INT4-MIXED

4086

10391.6

65.1

6425

15.36

minicpm-v-4_5

INT4-MIXED

217

3132.4

65.2

7794.5

15.34

llama-3-8b-instruct

INT4-MIXED

2048

4783.3

65.3

6000.9

15.31

qwen3-8b

INT4-MIXED

1024

2353.2

65.3

6654.6

15.31

minicpm4-8b

INT4-MIXED

4097

11649.3

65.4

6099.8

15.29

llama-3.1-8b-instruct

INT4-MIXED

4096

10431

65.4

6299.9

15.29

minicpm4-8b

INT4-MIXED

2049

5509

65.6

5751.1

15.24

afm-4.5b

INT8-CW

4096

7029.2

65.6

6288.2

15.24

llama-3.1-8b-instruct

INT4-MIXED

2048

4820.6

65.7

5851.1

15.22

phi-3-mini-4k-instruct

INT8-CW

4096

8952.2

65.7

6953.5

15.22

phi-3.5-mini-instruct

INT8-CW

4096

8942.6

65.7

6837.4

15.22

llama-3-8b-instruct

INT4-MIXED

4096

10507.7

65.8

6385.2

15.20

llama-3-8b-instruct

INT4-MIXED

4096

10486.7

66.1

6529.5

15.13

llama-3-8b-instruct

INT4-MIXED

2048

4936.2

66.3

6095.7

15.08

qwen3-8b

INT4-MIXED

2048

4806.1

66.3

6107.6

15.08

llama-3.1-8b-instruct

INT4-MIXED

4096

10521.9

66.5

6379.9

15.04

qwen2.5-vl-7b-instruct

INT4-MIXED

32

303.8

66.6

6965.7

15.02

internvl2-4b

INT8-CW

4099

10469.2

66.7

8614.4

14.99

gemma-7b-it

INT4-MIXED

32

224.2

67

7202.4

14.93

llama-2-7b-chat-hf

INT4-MIXED

4096

12428.9

67.2

7167.6

14.88

llama-3-8b-instruct

INT4-MIXED

4096

10823.9

67.2

6624

14.88

qwen3-8b

INT4-MIXED

4096

10581.4

67.2

6661.2

14.88

flan-t5-xxl

INT4-MIXED

4096

1377.3

67.5

28697.9

14.81

qwen2.5-vl-7b-instruct

INT4-MIXED

1024

2426.5

67.6

7483.5

14.79

qwen3-8b

INT4-MIXED

2048

5027.4

67.7

6324.6

14.77

llama-2-7b-chat-hf

INT4-MIXED

4096

12334.6

68.1

7247.6

14.68

qwen2.5-vl-7b-instruct

INT4-MIXED

32

309.6

68.5

7452.8

14.60

gemma-7b-it

INT4-MIXED

32

233.5

68.6

7523.4

14.58

qwen3-8b

INT4-MIXED

4096

11033

68.7

6883.5

14.56

qwen2.5-vl-7b-instruct

INT4-MIXED

2048

5190.3

68.7

9568.6

14.56

qwen2.5-vl-7b-instruct

INT4-MIXED

1024

2540.2

69.6

7690

14.37

phi-3.5-vision-instruct

INT8-CW

5239

24507.7

69.7

9417.7

14.35

qwen2.5-vl-7b-instruct

INT4-MIXED

2048

5344.7

70.7

9787.8

14.14

qwen2.5-vl-7b-instruct

INT4-MIXED

4096

11437.7

70.9

13142.7

14.10

gemma-7b-it

INT4-MIXED

1024

2750.2

71

7115.5

14.08

stable-zephyr-3b-dpo

FP16

32

97.5

71.4

7250.1

14.01

stablelm-3b-4e1t

FP16

32

97.4

71.4

7405.6

14.01

phi-2

FP16

32

101.4

71.5

7239.9

13.99

deepseek-r1-distill-llama-8b

INT4-MIXED

32

176.9

71.9

7503.1

13.91

llama-3.1-8b-instruct

INT4-MIXED

32

181

72.1

7448.5

13.87

gemma-7b-it

INT4-MIXED

1024

2824.2

72.5

7554.3

13.79

glm-4-9b-chat-hf

INT4-MIXED

32

218.6

72.5

6896

13.79

minicpm3-4b

INT4-MIXED

4097

12436.8

72.9

7833.4

13.72

qwen2.5-vl-7b-instruct

INT4-MIXED

4096

11899.3

72.9

13335.9

13.72

glm-4-9b-chat-hf

INT4-MIXED

32

236.7

73.6

7400.7

13.59

minicpm3-4b

INT4-MIXED

4097

12490.3

74

7750.3

13.51

deepseek-r1-distill-llama-8b

INT4-MIXED

1024

2473.8

74.1

7115.6

13.50

llama-3.1-8b-instruct

INT4-MIXED

1024

2471.7

74.1

7329.9

13.50

minicpm3-4b

INT4-MIXED

4097

12312.3

74.3

7713.8

13.46

glm-4-9b-chat-hf

INT4-MIXED

1024

2689.9

74.3

7123.8

13.46

glm-4-9b-chat-hf

INT4-MIXED

32

230.4

74.6

7402.4

13.40

gemma-7b-it

INT4-MIXED

2048

5975.4

75

7910.5

13.33

glm-4-9b-chat-hf

INT4-MIXED

1024

2770.3

75.3

7226.6

13.28

stable-zephyr-3b-dpo

FP16

1024

1419.6

75.9

7137

13.18

stablelm-3b-4e1t

FP16

1024

1413.1

75.9

7015.3

13.18

glm-4-9b-chat-hf

INT4-MIXED

4096

12262.4

75.9

8870.3

13.18

phi-2

FP16

1024

1409.1

76

7088.2

13.16

glm-4-9b-chat-hf

INT4-MIXED

2048

5683.9

76

7506.1

13.16

deepseek-r1-distill-llama-8b

INT4-MIXED

2048

5281

76.2

7093.4

13.12

glm-4-9b-chat-hf

INT4-MIXED

1024

2810.4

76.2

7362

13.12

llama-3.1-8b-instruct

INT4-MIXED

2048

5307.2

76.4

7049.5

13.09

gemma-2-9b-it

INT4-MIXED

32

200.4

76.4

7933.9

13.09

gemma-7b-it

INT4-MIXED

2048

6109.3

76.5

8115.3

13.07

minicpm3-4b

INT8-CW

32

182.7

76.7

6235.7

13.04

glm-4-9b-chat-hf

INT4-MIXED

4096

12495.3

77.1

8983.8

12.97

llama-3.1-8b-instruct

INT4-MIXED

4096

11469.5

77.2

7713.4

12.95

glm-4-9b-chat-hf

INT4-MIXED

2048

5761.7

77.2

7678.9

12.95

deepseek-r1-distill-llama-8b

INT4-MIXED

4086

11479.9

77.3

7762

12.94

gemma-2-9b-it

INT4-MIXED

32

214.2

77.3

7913.9

12.94

gemma-2-9b-it

INT4-MIXED

32

213.7

77.5

8270.5

12.90

phi-4-multimodal-instruct

INT8-CW

578

5176.6

77.8

8310.9

12.85

glm-4-9b-chat-hf

INT4-MIXED

2048

5924.4

78

7742.2

12.82

phi-4-multimodal-instruct

INT8-CW

786

6915.9

78

8445.9

12.82

glm-4-9b-chat-hf

INT4-MIXED

4096

12806.4

78.1

9116.7

12.80

phi-4-multimodal-instruct

INT8-CW

1362

18870.5

78.8

9847.9

12.69

phi-4-multimodal-instruct

INT8-CW

1570

14309.2

79

10734.4

12.66

phi-4-multimodal-instruct

INT8-CW

1685

13971.7

79.2

11213.2

12.63

stablelm-3b-4e1t

FP16

2048

3162.5

80.5

7823.7

12.42

stable-zephyr-3b-dpo

FP16

2076

9447.5

80.6

7836.8

12.41

gemma-2-9b-it

INT4-MIXED

1024

2851.6

80.6

7715.2

12.41

gemma-2-9b-it

INT4-MIXED

1024

2942.9

81.5

7841.6

12.27

gemma-2-9b-it

INT4-MIXED

1024

3018.3

81.8

8017

12.22

flan-t5-xxl

INT8-CW

33

310.9

81.9

23709.8

12.21

qwen2.5-coder-3b-instruct

FP16

32

126

82.7

8611.3

12.09

phi-4-multimodal-instruct

INT8-CW

4506

37716.5

82.8

16439.1

12.08

minicpm3-4b

INT8-CW

1024

2028.6

83

6596.6

12.05

gpt-j-6b

INT8-CW

32

158.4

83

7982.6

12.05

gemma-7b-it

INT4-MIXED

4096

14072.9

83.1

9060.3

12.03

qwen2.5-coder-3b-instruct

FP16

1024

1394.7

84.3

7990.9

11.86

gemma-7b-it

INT4-MIXED

4096

14448.1

84.7

9245

11.81

gemma-2-9b-it

INT4-MIXED

2048

6150.2

84.9

8125.5

11.78

llama-3.2-3b-instruct

FP16

32

108.2

85.4

9000.5

11.71

gemma-2-9b-it

INT4-MIXED

2048

6334.1

85.8

8257.3

11.66

gemma-2-9b-it

INT4-MIXED

2048

6513.8

86

8408.9

11.63

gpt-j-6b

INT8-CW

1024

2465.2

87.1

8211.5

11.48

flan-t5-xxl

INT8-CW

1139

509.5

87.2

25123.7

11.47

llama-3.2-3b-instruct

FP16

1024

1413.1

87.5

8422.3

11.43

gemma-2-9b-it

INT4-MIXED

4096

13749.8

88.4

9292.1

11.31

gemma-2-9b-it

INT4-MIXED

4096

14169.9

89.3

9423.6

11.20

minicpm3-4b

INT8-CW

2049

7300.5

89.3

7623.8

11.20

stablelm-3b-4e1t

FP16

4096

7749

89.4

9238.6

11.19

gemma-2-9b-it

INT4-MIXED

4096

14397.3

89.5

9574.8

11.17

llama-3.2-3b-instruct

FP16

2048

2892.4

89.6

8470.6

11.16

gpt-j-6b

INT8-CW

2057

7011.4

91.3

9570.4

10.95

llama-2-7b-chat-hf

INT8-CW

32

156.2

91.6

8478.1

10.92

llama-3.2-3b-instruct

FP16

4096

6578.9

92.1

9073.1

10.86

flan-t5-xxl

INT8-CW

2048

836.5

92.3

29363

10.83

llama-2-13b-chat-hf

INT4-MIXED

32

315.3

93.9

8271.6

10.65

falcon-7b-instruct

INT8-CW

32

178.6

94.9

8853.9

10.54

llama-2-7b-chat-hf

INT8-CW

1024

2675.6

95.6

8222.6

10.46

falcon-7b-instruct

INT8-CW

1024

2619.8

96.2

8208.9

10.40

deepseek-r1-distill-qwen-7b

INT8-CW

32

183

97.9

8857.8

10.21

qwen2.5-7b-instruct

INT8-CW

32

193.6

97.9

8859.7

10.21

qwen2.5-7b-instruct-1m

INT8-CW

32

188.6

97.9

8941.8

10.21

qwen2-7b-instruct

INT8-CW

32

184.8

98

8969.5

10.20

mistral-7b-instruct-v0.2

INT8-CW

32

181.2

99

8740.7

10.10

mistral-7b-instruct-v0.3

INT8-CW

32

176.1

99

8756.2

10.10

phi-3-mini-4k-instruct

FP16

32

127.9

99.1

9385.8

10.09

phi-3-mini-128k-instruct

FP16

32

126.5

99.2

9397.8

10.08

deepseek-r1-distill-qwen-7b

INT8-CW

1024

2504.8

99.3

8385.8

10.07

qwen2.5-7b-instruct

INT8-CW

1024

2505.2

99.4

8386.7

10.06

qwen2.5-7b-instruct-1m

INT8-CW

1024

2518.2

99.4

8470.4

10.06

phi-3.5-mini-instruct

FP16

32

128.7

99.5

9291.7

10.05

qwen2-7b-instruct

INT8-CW

1024

2566.8

99.5

8504.9

10.05

biomistral-7b-slerp

INT8-CW

7

117.9

99.5

8777.8

10.05

minicpm-v-2_6

INT8-CW

228

3132.9

99.7

10104.6

10.03

llama-2-7b-chat-hf

INT8-CW

2048

5933.5

99.9

8960.9

10.01

minicpm-o-2_6

INT8-CW

238

3081.2

100

10113.2

10.00

All models listed here were tested with the following parameters:

  • Framework: PyTorch

  • Beam: 1

  • Batch size: 1
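
The last column of the table appears to follow directly from the 2nd latency column: tokens per second is 1000 divided by the per-token latency in milliseconds. The sketch below checks that relationship against a few rows from the table. The function name `tokens_per_second` is a hypothetical helper chosen for illustration and is not part of any OpenVINO tooling.

```python
def tokens_per_second(latency_ms: float) -> float:
    """Convert a per-token latency in milliseconds to tokens per second."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


# (2nd latency in ms, reported 2nd token per sec), copied from rows of the table.
rows = [(61.8, 16.18), (62.5, 16.00), (100.0, 10.00)]

for latency_ms, reported in rows:
    computed = round(tokens_per_second(latency_ms), 2)
    print(f"{latency_ms} ms -> {computed:.2f} tokens/s (table: {reported:.2f})")
```

The same conversion can be applied to the 1st latency column to estimate how long a user waits for the first token, but that figure grows with input size and is not a steady-state throughput.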