Most Efficient Large Language Models for AI PC

This page is regularly updated to help you identify the best-performing LLMs on the Intel® Core™ Ultra processor family and AI PCs. The current data is as of OpenVINO 2026.2, 28 May 2026.

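The tables below list the key performance indicators for inference on built-in GPUs. Each row gives the model (topology), its weight precision, the input size, the latency of the first and second token in milliseconds, and the second-token throughput in tokens per second. A 2nd latency of -1 appears for embedding and reranker models, for which no second-token latency is reported.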
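The throughput figures in the tables appear to be the reciprocal of a per-token latency: a latency of 6.4 ms, for example, corresponds to 156.25 tokens per second. The following is a minimal Python sketch of that conversion, not part of any OpenVINO API. The helper name `tokens_per_second` is hypothetical, and the sample rows are copied from the tables on this page.

```python
def tokens_per_second(latency_ms: float) -> float:
    """Convert a per-token latency in milliseconds to tokens per second."""
    if latency_ms <= 0:
        # The tables use -1 where no latency is reported.
        raise ValueError("latency must be positive; -1 marks a missing value")
    return 1000.0 / latency_ms


# (model, latency in ms, throughput as reported on this page)
sample_rows = [
    ("distil-large-v2", 6.4, 156.25),
    ("whisper-large-v3-turbo", 9.1, 109.89),
    ("minicpm4-0.5b", 10.5, 95.24),
]

for model, latency_ms, reported in sample_rows:
    computed = round(tokens_per_second(latency_ms), 2)
    print(f"{model}: computed {computed:.2f}, reported {reported:.2f}")
```

The sample rows come from the second table, where the throughput matches the reciprocal of the second latency value. In the first table the throughput figures match the reciprocal of the first listed latency instead, so check which latency column a throughput value corresponds to before comparing rows across tables.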

Topology

Precision

Input Size

1st token latency (ms)

2nd token latency (ms)

2nd token throughput (tokens/s)

llama-3.2-1b-instruct

INT4-MIXED

32

13

8

76.92

llama-3.2-1b-instruct

INT4-MIXED

32

13.3

8

75.19

minicpm4-0.5b

FP4-NORMALIZED

32

14.1

8.7

70.92

gemma-3-270m

FP16

32

14.8

6.5

67.57

minicpm4-0.5b

FP16

32

14.8

9.4

67.57

llama-3.2-1b-instruct

INT8-CW

32

15

12.7

66.67

minicpm4-0.5b

INT4-MIXED

32

15.4

3.9

64.94

minicpm4-0.5b

INT4-MIXED

32

15.4

3.9

64.94

qwen2.5-coder-0.5b-instruct

INT4-MIXED

32

15.8

4.4

63.29

qwen2.5-coder-0.5b-instruct

INT4-MIXED

32

15.9

4.5

62.89

qwen2.5-coder-0.5b-instruct

FP16

32

16.1

10.5

62.11

minicpm4-0.5b

INT4-MIXED

32

16.4

4

60.98

gemma-3-270m

INT8-CW

32

16.7

7.5

59.88

qwen2.5-coder-0.5b-instruct

INT4-MIXED

32

16.8

4.5

59.52

minicpm4-0.5b

INT8-CW

32

17.3

5.5

57.80

gemma-3-270m

INT4-MIXED

32

17.4

5.7

57.47

qwen2.5-coder-0.5b-instruct

INT8-CW

32

17.8

6.1

56.18

gemma-3-270m

FP16

1024

18.7

6.7

53.48

gemma-2b-it

INT4-MIXED

32

19.1

15

52.36

gemma-3-270m

INT4-MIXED

1024

19.1

4.3

52.36

qwen3-embedding-0.6b

FP16

32

19.1

-1

52.36

qwen3-embedding-0.6b

INT4-MIXED

32

19.6

-1

51.02

gemma-2b-it

INT4-MIXED

32

19.8

15.7

50.51

gemma-3-270m

INT8-CW

1024

19.9

4.6

50.25

qwen2.5-coder-1.5b-instruct

INT4-MIXED

32

19.9

9.8

50.25

qwen2.5-coder-1.5b-instruct

INT4-MIXED

32

19.9

10

50.25

qwen2.5-1.5b-instruct

INT4-MIXED

32

20

9.9

50.00

qwen3-embedding-0.6b

INT4-MIXED

32

20

-1

50.00

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

32

20.2

9.9

49.50

qwen2.5-coder-1.5b-instruct

INT4-MIXED

32

20.7

10.1

48.31

qwen3-embedding-0.6b

INT4-MIXED

32

20.7

-1

48.31

qwen3-embedding-0.6b

INT8-CW

32

21.3

-1

46.95

qwen3-reranker-0.6b-seq-cls

FP16

22

21.3

-1

46.95

gemma-3-1b-it

FP4-NORMALIZED

32

21.4

18.3

46.73

gemma-3-1b-it

INT4-MIXED

32

21.7

11.2

46.08

gemma-3-1b-it

INT4-MIXED

32

22.1

11.3

45.25

gemma-3-1b-it

INT4-MIXED

32

22.4

11.7

44.64

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

32

22.5

12.6

44.44

qwen2.5-1.5b-instruct

INT4-MIXED

32

22.5

10.8

44.44

qwen2.5-1.5b-instruct

INT8-CW

32

23.2

15.7

43.10

minicpm4-0.5b

INT4-MIXED

1024

23.6

4

42.37

deepseek-r1-distill-qwen-1.5b

INT8-CW

32

23.9

15.7

41.84

minicpm4-0.5b

INT4-MIXED

1024

24

4.2

41.67

gemma-3-1b-it

FP16

32

24.3

20.8

41.15

gemma-2-2b

INT4-MIXED

33

24.4

16.9

40.98

gemma-3-1b-it

INT8-CW

32

24.5

13.2

40.82

llama-3.2-1b-instruct

FP16

32

24.9

23.2

40.16

qwen2.5-coder-1.5b-instruct

INT8-CW

32

25.2

15.7

39.68

qwen2.5-coder-0.5b-instruct

INT4-MIXED

1024

26.1

4.5

38.31

qwen2.5-coder-0.5b-instruct

INT4-MIXED

1024

26.2

4.6

38.17

llama-3.2-3b-instruct

INT4-MIXED

32

26.4

18.1

37.88

minicpm4-0.5b

INT8-CW

1024

26.4

5.9

37.88

llama-3.2-3b-instruct

INT4-MIXED

32

26.8

18.6

37.31

qwen3-reranker-0.6b-seq-cls

INT4-MIXED

22

26.8

-1

37.31

qwen3-reranker-0.6b-seq-cls

INT4-MIXED

22

27.1

-1

36.90

qwen3-reranker-0.6b-seq-cls

INT4-MIXED

22

27.6

-1

36.23

llama-3.2-3b-instruct

INT4-MIXED

32

27.8

18.9

35.97

gemma-2b-it

INT8-CW

32

28.3

25

35.34

qwen2.5-coder-0.5b-instruct

INT8-CW

1024

28.6

6.3

34.97

minicpm4-0.5b

INT4-MIXED

1024

28.9

4.1

34.60

phi-3-mini-128k-instruct

INT4-MIXED

32

28.9

19.9

34.60

phi-3-mini-4k-instruct

INT4-MIXED

32

28.9

19.8

34.60

phi-3.5-mini-instruct

INT4-MIXED

32

28.9

19.8

34.60

phi-3.5-mini-instruct

INT4-MIXED

32

29.1

20.4

34.36

phi-3-mini-4k-instruct

INT4-MIXED

32

29.3

20.4

34.13

gemma-2-2b

INT4-MIXED

33

29.6

17.5

33.78

phi-3-mini-128k-instruct

INT4-MIXED

32

30

20.9

33.33

deepseek-r1-distill-qwen-1.5b

FP4-NORMALIZED

32

30.3

27.5

33.00

phi-3-mini-4k-instruct

INT4-MIXED

32

30.7

21.7

32.57

stable-zephyr-3b-dpo

INT4-MIXED

32

31

15.7

32.26

minicpm4-0.5b

FP4-NORMALIZED

1024

31.2

8.9

32.05

qwen3-reranker-0.6b

INT4-MIXED

22

31.3

-1

31.95

qwen3-reranker-0.6b-seq-cls

INT8-CW

22

31.4

-1

31.85

qwen2.5-coder-0.5b-instruct

INT4-MIXED

1024

31.5

4.6

31.75

qwen3-reranker-0.6b

INT4-MIXED

22

31.6

-1

31.65

afm-4.5b

INT4-MIXED

32

31.9

24

31.35

qwen2.5-1.5b-instruct

FP16

32

31.9

29.7

31.35

qwen2.5-coder-1.5b-instruct

FP16

32

32

29.6

31.25

deepseek-r1-distill-qwen-1.5b

FP16

32

32.1

29.6

31.15

minicpm4-0.5b

FP16

1024

32.2

9.7

31.06

phi-3.5-mini-instruct

INT4-MIXED

32

32.2

21.9

31.06

qwen2.5-1.5b-instruct

FP4-NORMALIZED

32

32.3

27.5

30.96

qwen2.5-coder-3b-instruct

INT4-MIXED

32

32.5

18.7

30.77

qwen2.5-coder-3b-instruct

INT4-MIXED

32

32.5

18.1

30.77

glm-edge-1.5b-chat

INT4-MIXED

32

32.6

11.1

30.67

gemma-2-2b

INT8-CW

33

32.7

26

30.58

qwen3-reranker-0.6b

FP16

22

32.7

-1

30.58

minicpm-1b-sft

INT4-MIXED

31

33.7

9.6

29.67

chatglm3-6b

INT4-MIXED

32

34.3

29.8

29.15

qwen3-reranker-0.6b

INT4-MIXED

22

34.5

-1

28.99

stable-diffusion-xl-1.0-inpainting-0.1

INT8-CW

32

34.5

34

28.99

minicpm-1b-sft

INT4-MIXED

31

34.6

9.8

28.90

glm-edge-1.5b-chat

INT8-CW

32

34.8

16.2

28.74

stable-diffusion-xl-1.0-inpainting-0.1

INT8-CW

32

35.3

35.1

28.33

qwen2.5-coder-3b-instruct

INT8-CW

32

35.6

30.5

28.09

qwen2.5-coder-0.5b-instruct

FP16

1024

35.7

10.7

28.01

llama-3.2-3b-instruct

INT8-CW

32

35.9

32.3

27.86

minicpm-1b-sft

INT4-MIXED

31

36.2

10.5

27.62

minicpm-1b-sft

FP16

31

36.3

27.9

27.55

biomistral-7b-slerp

INT4-MIXED

7

36.4

34.5

27.47

stablelm-3b-4e1t

INT4-MIXED

32

36.4

15.8

27.47

qwen2.5-coder-3b-instruct

INT4-MIXED

32

36.5

19.1

27.40

glm-edge-1.5b-chat

FP16

32

36.8

28.7

27.17

qwen3-4b

INT4-MIXED

32

36.8

22.7

27.17

phi-2

INT4-MIXED

32

36.9

15.9

27.10

chatglm3-6b

INT4-MIXED

32

37

31.1

27.03

ltx-video

INT4-MIXED

11

37

36.4

27.03

codellama-7b

INT4-MIXED

32

37.4

32.8

26.74

llama-2-7b-chat-hf

INT4-MIXED

32

37.4

32.8

26.74

qwen3-4b

INT4-MIXED

32

37.6

23.4

26.60

phi-2

INT4-MIXED

32

38.1

16.9

26.25

biomistral-7b-slerp

INT4-MIXED

7

38.4

36.2

26.04

ltx-video

INT8-CW

11

38.7

38

25.84

qwen3-reranker-0.6b

INT8-CW

22

39.4

-1

25.38

phi-2

INT8-CW

32

39.6

27.1

25.25

stable-zephyr-3b-dpo

INT8-CW

32

39.6

27.2

25.25

stablelm-3b-4e1t

INT8-CW

32

39.6

27.3

25.25

lcm-dreamshaper-v7

INT8-HYBRID

32

39.8

38.6

25.13

falcon-7b-instruct

INT4-MIXED

32

40.3

34.6

24.81

mistral-7b-instruct-v0.3

INT4-MIXED

32

40.4

34.7

24.75

phi-4-mini-reasoning

INT4-MIXED

32

40.6

22.9

24.63

mistral-7b-instruct-v0.2

INT4-MIXED

32

40.7

34.6

24.57

phi-4-mini-reasoning

INT4-MIXED

32

40.7

23.5

24.57

lcm-dreamshaper-v7

INT8-HYBRID

1024

40.8

38.8

24.51

llama-2-7b-chat-hf

INT4-MIXED

32

40.8

33.8

24.51

phi-4-mini-instruct

INT4-MIXED

32

41

22.9

24.39

minicpm-1b-sft

INT8-CW

31

41.2

15.6

24.27

codellama-7b

INT4-MIXED

32

41.4

34.3

24.15

phi-4-mini-instruct

INT4-MIXED

32

41.5

23.5

24.10

qwen3-embedding-0.6b

INT4-MIXED

1024

41.5

-1

24.10

qwen3-reranker-0.6b-seq-cls

INT4-MIXED

836

41.8

-1

23.92

stable-zephyr-3b-dpo

INT4-MIXED

32

41.9

17.6

23.87

stablelm-3b-4e1t

INT4-MIXED

32

41.9

19.2

23.87

qwen3-embedding-0.6b

INT4-MIXED

1024

42.1

-1

23.75

qwen2.5-7b-instruct-1m

INT4-MIXED

32

42.2

36.3

23.70

phi-3-mini-4k-instruct

INT8-CW

32

42.8

37.9

23.36

phi-3-mini-128k-instruct

INT8-CW

32

42.9

37.9

23.31

phi-3.5-mini-instruct

INT8-CW

32

43

38.1

23.26

phi-4-mini-reasoning

INT4-MIXED

32

43.2

23.8

23.15

qwen3-reranker-0.6b-seq-cls

INT4-MIXED

836

43.2

-1

23.15

llama-3.2-1b-instruct

INT4-MIXED

1024

43.4

8.4

23.04

mistral-7b-instruct-v0.2

INT4-MIXED

32

43.6

35.8

22.94

mistral-7b-instruct-v0.3

INT4-MIXED

32

43.9

35.8

22.78

deepseek-r1-distill-llama-8b

INT4-MIXED

32

44

38.2

22.73

llama-3-8b-instruct

INT4-MIXED

32

44

38.2

22.73

llama-3.1-8b-instruct

INT4-MIXED

32

44

38.2

22.73

phi-4-mini-reasoning

INT8-CW

32

44.3

39

22.57

phi-4-mini-instruct

INT8-CW

32

44.4

39.1

22.52

qwen2.5-7b-instruct

INT4-MIXED

32

44.5

36.3

22.47

mistral-7b-instruct-v0.3

INT4-MIXED

32

44.6

36.3

22.42

qwen2-7b-instruct

INT4-MIXED

32

44.6

37.4

22.42

qwen2.5-7b-instruct

INT4-MIXED

32

44.7

37.4

22.37

deepseek-r1-distill-qwen-7b

INT4-MIXED

32

44.8

37.3

22.32

mistral-7b-instruct-v0.1

INT4-MIXED

32

44.8

36.2

22.32

qwen3-4b

INT8-CW

32

44.8

39

22.32

qwen3-embedding-0.6b

INT8-CW

1024

44.8

-1

22.32

mistral-7b-instruct-v0.2

INT4-MIXED

32

44.9

36.3

22.27

bloomz-7b1

INT4-MIXED

32

45.2

38.9

22.12

phi-4-mini-instruct

INT4-MIXED

32

45.2

24.6

22.12

qwen3-8b

INT4-MIXED

32

45.2

39.3

22.12

qwen2-7b-instruct

INT4-MIXED

32

45.5

37.9

21.98

afm-4.5b

INT8-CW

32

45.7

40.9

21.88

deepseek-r1-distill-qwen-7b

INT4-MIXED

32

45.8

37.8

21.83

qwen2.5-7b-instruct-1m

INT4-MIXED

32

45.8

37.8

21.83

qwen3-reranker-0.6b-seq-cls

FP16

836

45.8

-1

21.83

qwen2.5-7b-instruct

INT4-MIXED

32

45.9

37.8

21.79

minicpm4-8b

INT4-MIXED

32

46.4

38.8

21.55

falcon-7b-instruct

INT4-MIXED

32

47.1

37.9

21.23

qwen3-embedding-0.6b

FP16

1024

47.2

-1

21.19

deepseek-r1-distill-llama-8b

INT4-MIXED

32

47.3

39.3

21.14

llama-3-8b-instruct

INT4-MIXED

32

47.3

39.3

21.14

llama-3.1-8b-instruct

INT4-MIXED

32

47.5

39.2

21.05

qwen3-reranker-0.6b-seq-cls

INT4-MIXED

836

47.6

-1

21.01

llama-3-8b-instruct

INT4-MIXED

32

47.7

39.2

20.96

lcm-dreamshaper-v7

INT8-CW

1024

47.9

47.2

20.88

llama-3-8b-instruct

INT4-MIXED

32

48.2

39.8

20.75

qwen3-8b

INT4-MIXED

32

48.2

40.5

20.75

qwen3-reranker-0.6b-seq-cls

INT8-CW

836

48.2

-1

20.75

lcm-dreamshaper-v7

INT8-CW

32

48.3

47.6

20.70

lcm-dreamshaper-v7

INT8-CW

1024

48.5

47.7

20.62

bloomz-7b1

INT4-MIXED

32

48.9

40.3

20.45

lcm-dreamshaper-v7

INT8-CW

32

48.9

47.1

20.45

gemma-2b-it

FP16

32

49.6

46.3

20.16

minicpm4-8b

INT4-MIXED

32

49.6

40

20.16

qwen3-embedding-0.6b

INT4-MIXED

1024

49.6

-1

20.16

lcm-dreamshaper-v7

FP16

1024

49.7

49

20.12

lcm-dreamshaper-v7

FP16

32

49.7

48.9

20.12

qwen3-8b

INT4-MIXED

32

49.8

40.9

20.08

baichuan2-7b-chat

INT4-MIXED

32

50.9

43.4

19.65

glm-4-9b-chat-hf

INT4-MIXED

32

51.2

44.5

19.53

minicpm4-8b

INT4-MIXED

32

51.3

40.6

19.49

gemma-3-1b-it

INT4-MIXED

1024

51.7

8.4

19.34

gemma-7b-it

INT4-MIXED

32

51.8

44.4

19.31

qwen3_8b_eagle3

INT4-MIXED

32

51.9

33.8

19.27

llama-3.2-1b-instruct

INT4-MIXED

1024

52.1

8.6

19.19

gemma-3-4b-it

INT4-MIXED

32

52.7

25.3

18.98

gemma-2-2b

FP16

33

53.3

50.2

18.76

gemma-3-1b-it

INT4-MIXED

1024

53.8

8.6

18.59

glm-4-9b-chat-hf

INT4-MIXED

32

54.1

45.8

18.48

phi-2

FP16

32

54.1

50.8

18.48

stable-zephyr-3b-dpo

FP16

32

54.3

51

18.42

stablelm-3b-4e1t

FP16

32

54.5

51.1

18.35

gemma-7b-it

INT4-MIXED

32

55

46.2

18.18

gemma-3-4b-it

INT4-MIXED

32

55.1

25.6

18.15

glm-4-9b-chat-hf

INT4-MIXED

32

55.3

46.4

18.08

ltx-video

FP16

11

55.4

54.8

18.05

gemma-3-4b-it

INT4-MIXED

32

55.6

24.8

17.99

llama-3.2-1b-instruct

INT8-CW

1024

56.3

13

17.76

nanollava

INT8-CW

760

56.5

7.4

17.70

nanollava

INT4-MIXED

760

57.3

6.5

17.45

gemma-2-9b-it

INT4-MIXED

32

58

49.7

17.24

gemma-3-4b-it

INT8-CW

32

58.5

39.5

17.09

glm-edge-4b-chat

INT4-MIXED

32

58.5

24.5

17.09

llama-3.2-3b-instruct

FP4-NORMALIZED

32

59.4

56.3

16.84

chatglm3-6b

INT8-CW

32

60.3

55.3

16.58

gemma-3-1b-it

INT4-MIXED

1024

60.3

9.1

16.58

nanollava

FP16

760

60.3

11.5

16.58

deepseek-r1-distill-llama-8b

INT4-MIXED

32

60.5

47.1

16.53

llama-3.1-8b-instruct

INT4-MIXED

32

60.8

47.1

16.45

gemma-3-1b-it

INT8-CW

1024

61.1

11.8

16.37

qwen2.5-coder-3b-instruct

FP16

32

61.1

57.3

16.37

glm-edge-4b-chat

INT8-CW

32

61.2

43.3

16.34

gemma-2-9b-it

INT4-MIXED

32

61.6

51.1

16.23

gemma-2-9b-it

INT4-MIXED

32

62.5

51.7

16.00

llama-3.2-3b-instruct

FP16

32

62.5

60

16.00

whisper-small

FP16

prompt0

64.1

8.1

15.60

whisper-small

INT4-MIXED

prompt0

66.1

6

15.13

qwen3.5-0.8b

FP16

85

66.2

15.8

15.11

whisper-small

INT4-MIXED

prompt0

66.8

6.2

14.97

llama-2-7b-chat-hf

INT8-CW

32

67.1

62.1

14.90

biomistral-7b-slerp

INT8-CW

7

67.5

65.5

14.81

codellama-7b

INT8-CW

32

67.5

61.7

14.81

qwen3.5-0.8b

FP4-NORMALIZED

85

68.1

13.4

14.68

lcm-sdxl

INT8-HYBRID

32

68.7

68

14.56

llama-2-13b-chat-hf

INT4-MIXED

32

69.2

61.5

14.45

lcm-sdxl

INT8-HYBRID

1024

69.4

68.2

14.41

lcm-sdxl

INT8-CW

1024

69.8

69.2

14.33

lcm-sdxl

INT8-CW

32

69.9

69.3

14.31

whisper-small

INT8-CW

prompt0

70

6.2

14.29

falcon-7b-instruct

INT8-CW

32

70.1

64.3

14.27

minicpm-1b-sft

INT4-MIXED

1014

70.6

10.5

14.16

lcm-sdxl

INT8-CW

1024

70.9

70.5

14.10

lcm-sdxl

INT8-CW

32

71.1

70.5

14.06

phi-4-mini-reasoning

FP4-NORMALIZED

32

71.1

66.2

14.06

phi-4-mini-instruct

FP4-NORMALIZED

32

71.2

66.4

14.04

stable-diffusion-v1-5

INT8-HYBRID

1024

71.2

70.4

14.04

stable-diffusion-v1-5

INT8-HYBRID

32

71.2

70.2

14.04

whisper-small

INT4-MIXED

prompt0

71.2

6.4

14.04

baichuan2-7b-chat

INT8-CW

32

71.6

65.1

13.97

dolly-v2-12b

INT4-MIXED

32

71.7

59.7

13.95

mistral-7b-instruct-v0.1

INT8-CW

32

72

65.9

13.89

mistral-7b-instruct-v0.2

INT8-CW

32

72

65.9

13.89

mistral-7b-instruct-v0.3

INT8-CW

32

72

65.7

13.89

deepseek-r1-distill-qwen-7b

INT8-CW

32

72.5

66.5

13.79

qwen2.5-7b-instruct

INT8-CW

32

72.5

66.8

13.79

qwen2.5-7b-instruct-1m

INT8-CW

32

72.6

66.4

13.77

qwen2-7b-instruct

INT8-CW

32

72.7

66.4

13.76

phi-3-mini-4k-instruct

FP16

32

72.8

69.6

13.74

bloomz-7b1

INT8-CW

32

72.9

68

13.72

gemma-3-1b-it

FP4-NORMALIZED

1024

72.9

18

13.72

phi-3-mini-128k-instruct

FP16

32

72.9

69.3

13.72

phi-3.5-mini-instruct

FP16

32

73.1

69.3

13.68

gemma-3-1b-it

FP16

1024

73.6

20.7

13.59

qwen3.5-0.8b

INT4-MIXED

85

74

6.9

13.51

qwen2.5-coder-1.5b-instruct

INT4-MIXED

1024

74.7

10.1

13.39

sd-turbo

INT8-HYBRID

1024

74.7

74.3

13.39

sd-turbo

INT8-HYBRID

32

74.7

74.1

13.39

stable-diffusion-v2-1

INT8-HYBRID

1024

74.8

74.3

13.37

stable-diffusion-v2-1

INT8-HYBRID

32

75

74.2

13.33

qwen2.5-1.5b-instruct

INT8-CW

1024

75.5

16.3

13.25

llama-3.1-8b-instruct

INT8-CW

32

75.6

69.4

13.23

deepseek-r1-distill-llama-8b

INT8-CW

32

75.7

69.3

13.21

llama-3-8b-instruct

INT8-CW

32

75.7

69.3

13.21

qwen2.5-coder-1.5b-instruct

INT8-CW

1024

76

16.3

13.16

llama-3.2-1b-instruct

FP16

1024

76.2

23.6

13.12

phi-4-mini-instruct

FP16

32

76.4

71.8

13.09

phi-4-mini-reasoning

FP16

32

76.4

72

13.09

qwen3.5-0.8b

INT4-MIXED

85

76.6

7.1

13.05

qwen3-reranker-0.6b

INT4-MIXED

836

77.1

-1

12.97

qwen3-8b

INT8-CW

32

77.6

70.1

12.89

minicpm-1b-sft

INT4-MIXED

1014

78.2

11.3

12.79

phi-4

INT4-MIXED

32

78.2

68.3

12.79

phi-4-reasoning

INT4-MIXED

32

78.2

68.3

12.79

gemma-3-4b-it

FP4-NORMALIZED

32

78.5

67.5

12.74

qwen3-reranker-0.6b

FP16

836

78.7

-1

12.71

qwen3-4b

FP16

32

78.8

74.4

12.69

qwen3.5-0.8b

INT8-CW

85

78.8

9.2

12.69

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

1024

79

10.5

12.66

deepseek-r1-distill-qwen-1.5b

INT8-CW

1024

79.1

16.2

12.64

qwen3.5-0.8b

INT4-MIXED

85

79.8

7.3

12.53

qwen2.5-coder-1.5b-instruct

INT4-MIXED

1024

80

10.4

12.50

qwen1.5-14b-chat

INT4-MIXED

32

80.1

69.5

12.48

qwen2.5-1.5b-instruct

INT4-MIXED

1024

80.3

10.3

12.45

qwen3-reranker-0.6b

INT4-MIXED

836

80.7

-1

12.39

qwen3_8b_eagle3

INT8-CW

32

80.8

55.7

12.38

minicpm4-8b

INT8-CW

32

81.5

75.2

12.27

phi-4-reasoning

INT4-MIXED

32

82.5

70.4

12.12

stable-diffusion-v1-5

INT8-CW

1024

83

82.5

12.05

minicpm-1b-sft

INT8-CW

1014

83.2

16.7

12.02

afm-4.5b

FP16

32

83.3

79.8

12.00

qwen3-reranker-0.6b

INT8-CW

836

83.3

-1

12.00

stable-diffusion-v1-5

INT8-CW

32

83.3

82.5

12.00

stable-diffusion-v1-5

INT8-CW

32

83.8

82.7

11.93

qwen2.5-coder-1.5b-instruct

INT4-MIXED

1024

84.1

10.4

11.89

gemma-3-4b-it

FP16

32

84.2

73.4

11.88

qwen2.5-1.5b-instruct

INT4-MIXED

1024

84.5

11.2

11.83

sd-turbo

INT8-CW

1024

84.5

84.2

11.83

stable-diffusion-v1-5

FP16

1024

84.5

84.6

11.83

sd-turbo

INT8-CW

32

84.6

84

11.82

stable-diffusion-v2-1

INT8-CW

32

84.6

84.1

11.82

phi-4

INT4-MIXED

32

84.8

71.4

11.79

glm-edge-4b-chat

FP16

32

85

78.3

11.76

stable-diffusion-v1-5

FP16

32

85

84.3

11.76

stable-diffusion-v1-5

INT8-CW

1024

85.1

83

11.75

stable-diffusion-v2-1

INT8-CW

1024

85.2

84.1

11.74

stable-diffusion-v2-1

INT8-CW

32

85.3

84.5

11.72

stable-diffusion-v2-1

INT8-CW

1024

85.4

84.6

11.71

deepseek-r1-distill-qwen-14b

INT4-MIXED

32

85.5

72.4

11.70

sd-turbo

INT8-CW

1024

85.5

85.2

11.70

sd-turbo

INT8-CW

32

85.5

84.9

11.70

sd-turbo

FP16

32

86

85.7

11.63

sd-turbo

FP16

1024

86.2

89.7

11.60

glm-4-9b-chat-hf

INT8-CW

32

88

80.7

11.36

qwen3-reranker-0.6b

INT4-MIXED

836

88

-1

11.36

lcm-sdxl

FP16

32

88.7

88.3

11.27

lcm-sdxl

FP16

1024

88.8

88.4

11.26

qwen2.5-1.5b-instruct

FP4-NORMALIZED

1024

89

27.9

11.24

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

1024

89.1

13.1

11.22

deepseek-r1-distill-qwen-1.5b

FP4-NORMALIZED

1024

89.3

27.9

11.20

gemma-7b-it

INT8-CW

32

89.4

81

11.19

gemma-3-12b-it

INT4-MIXED

32

89.5

63.4

11.17

starcoder2-15b

INT4-MIXED

32

89.7

77.6

11.15

minicpm-1b-sft

INT4-MIXED

1014

89.8

11.3

11.14

gemma-3-12b-it

INT4-MIXED

32

91.6

66

10.92

qwen2.5-1.5b-instruct

FP16

1024

91.6

30.1

10.92

deepseek-r1-distill-qwen-1.5b

FP16

1024

91.9

30.1

10.88

qwen2.5-coder-1.5b-instruct

FP16

1024

92

30.1

10.87

nanollava

INT8-CW

1752

92.2

8.5

10.85

llama-2-13b-chat-hf

INT4-MIXED

32

92.8

76.4

10.78

gemma-2b-it

INT4-MIXED

1024

93.8

15.7

10.66

minicpm-1b-sft

FP16

1014

95

28.9

10.53

gemma-2-9b-it

INT8-CW

32

96.7

87.9

10.34

phi-4-reasoning

INT4-MIXED

32

97.6

74.3

10.25

nanollava

INT4-MIXED

1752

99.9

7.4

10.01

Topology

Precision

Input Size

1st token latency (ms)

2nd token latency (ms)

2nd token throughput (tokens/s)

distil-large-v2

INT4-MIXED

prompt0

186.5

6.4

156.25

distil-large-v2

INT8-CW

prompt0

174.8

6.4

156.25

distil-large-v2

INT8-CW

prompt1

218

6.4

156.25

distil-large-v2

INT4-MIXED

prompt1

226.6

6.5

153.85

whisper-large-v3-turbo

INT4-MIXED

prompt0

243.8

9.1

109.89

whisper-large-v3-turbo

INT4-MIXED

prompt1

311.4

9.2

108.70

distil-large-v2

FP16

prompt0

217.1

9.4

106.38

whisper-large-v3-turbo

INT8-CW

prompt1

310.4

9.4

106.38

distil-large-v2

FP16

prompt1

267.3

9.5

105.26

whisper-large-v3-turbo

INT8-CW

prompt0

226.8

9.8

102.04

minicpm4-0.5b

INT4-MIXED

32

36.1

10.5

95.24

minicpm4-0.5b

INT4-MIXED

32

32.6

10.7

93.46

minicpm4-0.5b

INT4-MIXED

1024

71.1

11.1

90.09

minicpm4-0.5b

INT4-MIXED

1024

74.5

11.3

88.50

minicpm4-0.5b

INT8-CW

32

33.2

11.5

86.96

minicpm4-0.5b

INT4-MIXED

1024

67.3

11.6

86.21

minicpm4-0.5b

INT4-MIXED

32

40

11.9

84.03

minicpm4-0.5b

INT8-CW

1024

74.7

12.2

81.97

qwen2.5-coder-0.5b-instruct

INT4-MIXED

32

45.1

12.6

79.37

qwen2.5-coder-0.5b-instruct

INT4-MIXED

1024

70.6

12.8

78.13

whisper-small

INT4-MIXED

prompt0

119.2

13.2

75.76

whisper-large-v3-turbo

FP16

prompt0

277.1

13.3

75.19

whisper-small

INT4-MIXED

prompt0

118.8

13.4

74.63

whisper-large-v3-turbo

FP16

prompt1

342.8

13.5

74.07

whisper-small

INT4-MIXED

prompt0

129.3

13.5

74.07

gemma-3-270m

INT4-MIXED

32

40.8

13.7

72.99

qwen2.5-coder-0.5b-instruct

INT4-MIXED

32

49.1

13.7

72.99

qwen2.5-coder-0.5b-instruct

INT4-MIXED

1024

71.8

13.8

72.46

whisper-small

INT4-MIXED

prompt1

186.5

13.8

72.46

whisper-small

INT4-MIXED

prompt1

178.3

14

71.43

gemma-3-270m

INT4-MIXED

1024

57.8

14.1

70.92

whisper-small

INT4-MIXED

prompt1

185.2

14.1

70.92

whisper-small

INT8-CW

prompt1

198.8

14.6

68.49

qwen2.5-coder-0.5b-instruct

INT4-MIXED

32

48.9

14.7

68.03

gemma-3-270m

FP16

1024

53.5

14.8

67.57

gemma-3-270m

INT8-CW

32

47.5

14.8

67.57

gemma-3-270m

FP16

32

39

14.9

67.11

whisper-small

INT8-CW

prompt0

127.1

14.9

67.11

qwen2.5-coder-0.5b-instruct

INT4-MIXED

1024

76.3

15.1

66.23

gemma-3-270m

INT8-CW

1024

59.1

15.2

65.79

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

1024

139.1

16.8

59.52

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

32

63.7

17.1

58.48

llama-3.2-1b-instruct

INT4-MIXED

32

41.8

17.2

58.14

whisper-small

FP16

prompt0

128.2

17.2

58.14

whisper-small

FP16

prompt1

174.1

17.2

58.14

llama-3.2-1b-instruct

INT4-MIXED

1024

145.4

18.1

55.25

qwen2.5-coder-0.5b-instruct

INT8-CW

32

64.7

18.1

55.25

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

32

68.9

18.3

54.64

qwen3.5-0.8b

INT4-MIXED

1024

295.4

18.3

54.64

qwen2.5-coder-0.5b-instruct

INT8-CW

1024

81.6

18.5

54.05

qwen3.5-0.8b

INT4-MIXED

85

153.4

18.5

54.05

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

1024

170.4

18.8

53.19

qwen3.5-0.8b

INT4-MIXED

1024

298.1

18.9

52.91

qwen3.5-0.8b

INT4-MIXED

85

166.2

19.3

51.81

qwen3.5-0.8b

INT4-MIXED

85

178.4

19.9

50.25

qwen3.5-0.8b

INT4-MIXED

1024

308.4

20

50.00

deepseek-r1-distill-qwen-1.5b

INT8-CW

32

46.8

20.1

49.75

qwen3.5-0.8b

INT8-CW

1024

320.6

20.6

48.54

qwen3.5-0.8b

INT8-CW

85

193.3

20.6

48.54

deepseek-r1-distill-qwen-1.5b

INT8-CW

1024

140.6

20.9

47.85

nanollava

INT4-MIXED

760

170

21

47.62

llama-3.2-1b-instruct

INT4-MIXED

32

38.8

21.2

47.17

nanollava

INT8-CW

760

169.2

21.2

47.17

llama-3.2-1b-instruct

INT4-MIXED

1024

120.7

21.5

46.51

llama-3.2-1b-instruct

INT8-CW

32

39.1

21.5

46.51

qwen2.5-coder-0.5b-instruct

FP16

32

38.9

21.7

46.08

qwen2.5-coder-1.5b-instruct

INT4-MIXED

32

78.7

22.4

44.64

gemma-3-1b-it

INT4-MIXED

1024

127.7

22.5

44.44

nanollava

INT8-CW

1752

281

22.5

44.44

qwen2.5-coder-1.5b-instruct

INT4-MIXED

1024

161.3

22.5

44.44

llama-3.2-1b-instruct

INT8-CW

1024

133

22.6

44.25

minicpm4-0.5b

FP16

32

50.8

22.9

43.67

nanollava

INT4-MIXED

1752

281.8

22.9

43.67

qwen2.5-coder-0.5b-instruct

FP16

1024

88.1

23.2

43.10

minicpm4-0.5b

FP16

1024

92.2

23.8

42.02

qwen2.5-coder-1.5b-instruct

INT4-MIXED

32

83.4

24.5

40.82

gemma-3-1b-it

INT4-MIXED

1024

142.3

24.6

40.65

glm-edge-1.5b-chat

INT4-MIXED

32

90.8

24.7

40.49

gemma-3-1b-it

INT4-MIXED

32

71.5

24.8

40.32

qwen2.5-1.5b-instruct

INT4-MIXED

32

73.5

24.9

40.16

gemma-3-1b-it

INT4-MIXED

1024

148.6

25

40.00

gemma-3-1b-it

INT4-MIXED

32

60.2

25.2

39.68

qwen2.5-coder-1.5b-instruct

INT4-MIXED

1024

199

25.3

39.53

qwen2.5-coder-1.5b-instruct

INT4-MIXED

32

69.4

25.3

39.53

gemma-3-1b-it

INT4-MIXED

32

81.4

25.4

39.37

qwen2.5-1.5b-instruct

INT4-MIXED

1024

195.5

25.5

39.22

qwen2.5-1.5b-instruct

INT4-MIXED

32

75

25.8

38.76

qwen2.5-1.5b-instruct

INT4-MIXED

1024

170

26.6

37.59

qwen2.5-coder-1.5b-instruct

INT4-MIXED

1024

166.6

26.9

37.17

glm-edge-1.5b-chat

INT4-MIXED

1024

268.1

27.1

36.90

qwen2.5-1.5b-instruct

INT8-CW

32

74.6

27.5

36.36

qwen2.5-coder-1.5b-instruct

INT8-CW

32

76.3

27.9

35.84

glm-edge-1.5b-chat

INT8-CW

32

94.3

28.6

34.97

qwen2.5-coder-1.5b-instruct

INT8-CW

1024

180.9

28.8

34.72

qwen2.5-1.5b-instruct

INT8-CW

1024

173.6

29.1

34.36

gemma-3-1b-it

INT8-CW

1024

152

29.5

33.90

glm-edge-1.5b-chat

INT8-CW

1024

244.1

30.1

33.22

gemma-3-1b-it

INT8-CW

32

93.1

30.2

33.11

nanollava

FP16

760

202.5

32.7

30.58

nanollava

FP16

1752

324

33.8

29.59

phi-2

INT4-MIXED

32

136.9

34.8

28.74

llama-3.2-3b-instruct

INT4-MIXED

32

96

35.6

28.09

qwen2.5-coder-3b-instruct

INT4-MIXED

32

98.3

35.7

28.01

stable-zephyr-3b-dpo

INT4-MIXED

32

126.5

36.5

27.40

phi-2

INT4-MIXED

1024

393.7

36.6

27.32

stablelm-3b-4e1t

INT4-MIXED

32

123.3

36.7

27.25

afm-4.5b

INT4-MIXED

32

85.2

36.9

27.10

llama-3.2-3b-instruct

INT4-MIXED

1024

326.2

37.2

26.88

qwen2.5-coder-3b-instruct

INT4-MIXED

1024

295.6

37.3

26.81

phi-2

INT4-MIXED

32

150.6

37.8

26.46

llama-3.2-3b-instruct

INT4-MIXED

32

96.4

37.9

26.39

qwen2.5-coder-3b-instruct

INT4-MIXED

32

123.5

37.9

26.39

qwen3.5-0.8b

FP16

85

230.3

38

26.32

afm-4.5b

INT4-MIXED

1024

460.5

38.3

26.11

stable-zephyr-3b-dpo

INT4-MIXED

1024

387.3

38.4

26.04

phi-3-mini-128k-instruct

INT4-MIXED

32

112.8

38.5

25.97

qwen3.5-0.8b

FP16

1024

397.7

38.6

25.91

phi-3-mini-4k-instruct

INT4-MIXED

32

106.8

38.7

25.84

phi-3.5-mini-instruct

INT4-MIXED

32

114.7

38.7

25.84

stablelm-3b-4e1t

INT4-MIXED

1024

384.3

38.8

25.77

qwen2.5-coder-3b-instruct

INT4-MIXED

1024

497.2

39.6

25.25

stable-zephyr-3b-dpo

INT4-MIXED

32

140

39.7

25.19

llama-3.2-3b-instruct

INT4-MIXED

1024

417.6

39.9

25.06

phi-2

INT4-MIXED

1024

486.7

40.3

24.81

chatglm3-6b

INT4-MIXED

32

73.5

40.6

24.63

phi-3-mini-128k-instruct

INT4-MIXED

32

112.9

41.6

24.04

phi-3-mini-128k-instruct

INT4-MIXED

1024

500.9

41.7

23.98

phi-3-mini-4k-instruct

INT4-MIXED

1024

505.8

41.8

23.92

deepseek-r1-distill-qwen-1.5b

FP16

32

66.6

41.9

23.87

phi-3.5-mini-instruct

INT4-MIXED

1024

511.7

41.9

23.87

stable-zephyr-3b-dpo

INT4-MIXED

1024

462.5

41.9

23.87

chatglm3-6b

INT4-MIXED

1024

484.7

42.1

23.75

gemma-3-1b-it

FP16

1024

178.3

42.4

23.58

deepseek-r1-distill-qwen-1.5b

FP16

1024

200.9

42.5

23.53

gemma-3-1b-it

FP16

32

88.6

42.5

23.53

chatglm3-6b

INT4-MIXED

32

68.5

43

23.26

phi-3-mini-128k-instruct

INT4-MIXED

1024

655

44.3

22.57

chatglm3-6b

INT4-MIXED

1024

678.8

44.5

22.47

llama-3.2-3b-instruct

INT4-MIXED

32

87

44.7

22.37

phi-2

INT8-CW

32

117.5

44.7

22.37

phi-3-mini-4k-instruct

INT4-MIXED

32

113.8

44.9

22.27

llama-3.2-1b-instruct

FP16

32

62.9

45.1

22.17

qwen3-4b

INT4-MIXED

32

130.7

45.1

22.17

stablelm-3b-4e1t

INT4-MIXED

32

132.6

45.1

22.17

phi-3.5-mini-instruct

INT4-MIXED

32

122.1

45.3

22.08

phi-4-mini-reasoning

INT4-MIXED

32

147.1

45.4

22.03

stablelm-3b-4e1t

INT8-CW

32

120.9

45.5

21.98

phi-4-mini-instruct

INT4-MIXED

32

139.3

45.7

21.88

stable-zephyr-3b-dpo

INT8-CW

32

130

45.8

21.83

llama-3.2-1b-instruct

FP16

1024

202.3

45.9

21.79

llama-3.2-3b-instruct

INT4-MIXED

1024

335.3

46.6

21.46

phi-4-mini-reasoning

INT4-MIXED

1024

430.5

47.2

21.19

falcon-7b-instruct

INT4-MIXED

32

87.8

47.8

20.92

phi-4-mini-instruct

INT4-MIXED

1024

423.4

47.8

20.92

biomistral-7b-slerp

INT4-MIXED

7

65.7

47.9

20.88

phi-2

INT8-CW

1024

410.1

48.1

20.79

phi-3-mini-4k-instruct

INT4-MIXED

1024

643.9

48.2

20.75

phi-4-mini-reasoning

INT4-MIXED

32

156.7

48.2

20.75

qwen3-4b

INT4-MIXED

1024

474.9

48.5

20.62

stablelm-3b-4e1t

INT4-MIXED

1024

414.6

48.5

20.62

internvl2-4b

INT4-MIXED

297

308.3

48.7

20.53

phi-3.5-mini-instruct

INT4-MIXED

1024

667.8

48.8

20.49

stable-zephyr-3b-dpo

INT8-CW

1024

413.9

48.8

20.49

stablelm-3b-4e1t

INT8-CW

1024

422.7

49

20.41

falcon-7b-instruct

INT4-MIXED

1024

596.1

49.4

20.24

phi-4-mini-reasoning

INT4-MIXED

1024

533.3

49.7

20.12

llama-3.2-3b-instruct

INT8-CW

32

85.7

50.5

19.80

phi-3.5-vision-instruct

INT4-MIXED

802

695.8

50.6

19.76

phi-3-mini-4k-instruct

INT4-MIXED

32

114.5

50.7

19.72

qwen2.5-coder-3b-instruct

INT8-CW

32

128.4

50.8

19.69

biomistral-7b-slerp

INT4-MIXED

7

74.8

51.1

19.57

phi-4-mini-instruct

INT4-MIXED

32

161.3

51.1

19.57

phi-3.5-mini-instruct

INT4-MIXED

32

110.4

51.3

19.49

afm-4.5b

INT8-CW

32

76.3

51.5

19.42

qwen2.5-coder-3b-instruct

INT8-CW

1024

412.6

52.1

19.19

llama-3.2-3b-instruct

INT8-CW

1024

355.7

52.3

19.12

deepseek-r1-distill-llama-8b

INT4-MIXED

32

84.5

52.5

19.05

internvl2-4b

INT4-MIXED

1027

814.3

52.7

18.98

phi-3.5-vision-instruct

INT4-MIXED

1032

863.8

52.7

18.98

qwen2.5-coder-3b-instruct

INT4-MIXED

32

68.3

52.8

18.94

afm-4.5b

INT8-CW

1024

344.2

53

18.87

deepseek-r1-distill-qwen-7b

INT4-MIXED

32

80.4

53.8

18.59

phi-4-mini-instruct

INT4-MIXED

1024

565.7

54

18.52

phi-4-mini-reasoning

INT4-MIXED

32

162.2

54.4

18.38

qwen2.5-coder-3b-instruct

INT4-MIXED

1024

315.5

54.4

18.38

phi-3.5-mini-instruct

INT4-MIXED

1024

508.3

54.5

18.35

phi-3-mini-4k-instruct

INT4-MIXED

1024

502.7

54.6

18.32

deepseek-r1-distill-llama-8b

INT4-MIXED

1024

580.6

54.7

18.28

phi-4-mini-instruct

INT4-MIXED

32

130.5

55

18.18

deepseek-r1-distill-qwen-7b

INT4-MIXED

1024

724.8

55.3

18.08

whisper-large-v3

INT4-MIXED

prompt1

550.9

55.3

18.08

whisper-large-v3

INT4-MIXED

prompt0

502.8

55.7

17.95

whisper-large-v3

INT8-CW

prompt1

526.1

55.7

17.95

gemma-4-e2b-it

INT4-MIXED

274

861.1

55.8

17.92

falcon-7b-instruct

INT4-MIXED

32

91.6

56

17.86

internvl2-4b

INT4-MIXED

297

275.5

56.2

17.79

qwen2.5-1.5b-instruct

FP16

32

89.8

56.9

17.57

qwen2.5-coder-1.5b-instruct

FP16

32

91.6

56.9

17.57

phi-4-mini-instruct

INT4-MIXED

1024

431.9

57.4

17.42

phi-4-mini-reasoning

INT4-MIXED

1024

442.9

57.4

17.42

falcon-7b-instruct

INT4-MIXED

1024

795.7

57.5

17.39

qwen2.5-coder-1.5b-instruct

FP16

1024

254.6

57.6

17.36

glm-edge-4b-chat

INT4-MIXED

32

206

57.7

17.33

qwen2.5-1.5b-instruct

FP16

1024

249.4

57.7

17.33

whisper-large-v3

INT8-CW

prompt0

473.8

57.7

17.33

phi-3-mini-4k-instruct

INT8-CW

32

105.6

57.9

17.27

phi-3-mini-128k-instruct

INT8-CW

32

102.7

58

17.24

phi-3.5-mini-instruct

INT8-CW

32

96.8

58.1

17.21

glm-edge-1.5b-chat

FP16

32

131.8

58.3

17.15

llama-2-7b-chat-hf

INT4-MIXED

32

118.9

58.3

17.15

glm-edge-1.5b-chat

FP16

1024

315.8

59.2

16.89

qwen3-4b

INT4-MIXED

32

124.1

59.4

16.84

glm-edge-4b-chat

INT4-MIXED

1024

701.1

59.7

16.75

deepseek-r1-distill-qwen-7b

INT4-MIXED

32

75.3

60

16.67

gemma-4-e2b-it

INT4-MIXED

274

875.5

60.1

16.64

internvl2-4b

INT4-MIXED

1027

713.8

61

16.39

mistral-7b-instruct-v0.3

INT4-MIXED

32

106.4

61.1

16.37

gemma-3-4b-it

INT4-MIXED

32

220.3

61.3

16.31

mistral-7b-instruct-v0.2

INT4-MIXED

32

108.5

61.3

16.31

deepseek-r1-distill-qwen-7b

INT4-MIXED

1024

537.7

61.4

16.29

phi-4-mini-instruct

INT8-CW

32

146.1

61.4

16.29

phi-4-mini-reasoning

INT8-CW

32

121.7

61.4

16.29

phi-3-mini-128k-instruct

INT8-CW

1024

541.5

61.5

16.26

phi-3.5-mini-instruct

INT8-CW

1024

543.9

61.7

16.21

phi-3-mini-4k-instruct

INT8-CW

1024

543

61.8

16.18

deepseek-r1-distill-llama-8b

INT4-MIXED

32

86.2

62.6

15.97

gemma-4-e2b-it

INT4-MIXED

1024

1052.7

62.7

15.95

gemma-3-4b-it

INT4-MIXED

32

213.2

62.9

15.90

internvl2-4b

INT8-CW

297

299.9

63.1

15.85

qwen3-4b

INT4-MIXED

1024

480.7

63.1

15.85

gemma-3-4b-it

INT4-MIXED

1024

513

63.3

15.80

llama-2-7b-chat-hf

INT4-MIXED

1024

715.1

63.4

15.77

mistral-7b-instruct-v0.2

INT4-MIXED

1024

733.2

63.8

15.67

phi-4-mini-instruct

INT8-CW

1024

464.7

63.8

15.67

phi-4-mini-reasoning

INT8-CW

1024

465.8

63.8

15.67

qwen3-vl-4b-instruct

INT4-MIXED

4907

7718.6

64

15.63

qwen3-4b

INT8-CW

32

115.3

64.1

15.60

mistral-7b-instruct-v0.3

INT4-MIXED

1024

731.2

64.2

15.58

qwen3-vl-4b-instruct

INT4-MIXED

4937

7877.4

64.2

15.58

qwen2.5-7b-instruct-1m

INT4-MIXED

32

93.3

64.3

15.55

qwen2.5-7b-instruct

INT4-MIXED

32

89.6

64.4

15.53

mistral-7b-instruct-v0.2

INT4-MIXED

32

130.3

64.5

15.50

mistral-7b-instruct-v0.3

INT4-MIXED

32

110.3

64.6

15.48

deepseek-r1-distill-llama-8b

INT4-MIXED

1024

600

64.7

15.46

whisper-large-v3

FP16

prompt0

510.4

64.8

15.43

mistral-7b-instruct-v0.1

INT4-MIXED

32

113.8

65

15.38

whisper-large-v3

FP16

prompt1

566.9

65

15.38

gemma-4-e2b-it

INT4-MIXED

1024

1082.6

65.1

15.36

gemma-4-e2b-it

INT8-CW

274

844.5

65.1

15.36

qwen2.5-7b-instruct

INT4-MIXED

1024

645.4

65.4

15.29

qwen2.5-7b-instruct-1m

INT4-MIXED

1024

654.6

65.4

15.29

deepseek-r1-distill-llama-8b

INT4-MIXED

32

99.8

65.5

15.27

gemma-3-4b-it

INT4-MIXED

1024

617.2

66.2

15.11

gpt-oss-20b

INT4-MIXED

32

250.6

66.2

15.11

llama-3.1-8b-instruct

INT4-MIXED

32

118.3

66.3

15.08

phi-3.5-vision-instruct

INT8-CW

802

602.3

66.3

15.08

llama-3-8b-instruct

INT4-MIXED

32

110.7

66.5

15.04

qwen3-4b

INT8-CW

1024

502.4

66.9

14.95

gpt-oss-20b

INT4-MIXED

1024

3204.9

67.2

14.88

mistral-7b-instruct-v0.1

INT4-MIXED

1025

1008.6

67.4

14.84

deepseek-r1-distill-llama-8b

INT4-MIXED

1024

759.5

67.6

14.79

mistral-7b-instruct-v0.2

INT4-MIXED

1024

999.4

67.6

14.79

phi-3.5-vision-instruct

INT8-CW

1032

775.8

67.6

14.79

qwen3-vl-4b-instruct

INT4-MIXED

4937

8299.5

67.6

14.79

internvl2-4b

INT8-CW

1027

732.4

67.7

14.77

minicpm-v-2_6

INT4-MIXED

228

726.5

67.7

14.77

mistral-7b-instruct-v0.3

INT4-MIXED

1024

1022.7

67.7

14.77

minicpm-o-2_6

INT4-MIXED

238

744

68

14.71

chatglm3-6b

INT8-CW

32

90.8

68.1

14.68

minicpm4-8b

INT4-MIXED

32

120.3

68.1

14.68

qwen2.5-7b-instruct

INT4-MIXED

32

119.1

68.1

14.68

qwen3-vl-4b-instruct

INT4-MIXED

4907

8220.5

68.1

14.68

qwen2-7b-instruct

INT4-MIXED

32

114.5

68.2

14.66

qwen2.5-7b-instruct-1m

INT4-MIXED

32

109.2

68.3

14.64

minicpm3-4b

INT4-MIXED

32

309.7

68.5

14.60

glm-edge-4b-chat

INT8-CW

32

192.4

68.9

14.51

phi-4-multimodal-instruct

INT4-MIXED

578

598.2

68.9

14.51

phi-4-multimodal-instruct

INT4-MIXED

786

685.6

68.9

14.51

llama-3-8b-instruct

INT4-MIXED

1024

714.2

69.4

14.41

qwen3-8b

INT4-MIXED

32

135.8

69.6

14.37

chatglm3-6b

INT8-CW

1024

481.5

69.7

14.35

llama-3.1-8b-instruct

INT4-MIXED

1024

729

69.7

14.35

qwen2.5-7b-instruct

INT4-MIXED

1024

897.8

69.8

14.33

phi-4-multimodal-instruct

INT4-MIXED

1362

1481.7

70.2

14.25

qwen2.5-7b-instruct-1m

INT4-MIXED

1024

894

70.2

14.25

minicpm4-8b

INT4-MIXED

1024

798.7

70.4

14.20

phi-4-multimodal-instruct

INT4-MIXED

1570

1638.5

70.4

14.20

qwen2-7b-instruct

INT4-MIXED

1024

889.5

70.4

14.20

llama-3-8b-instruct

INT4-MIXED

32

121.2

70.5

14.18

gemma-3-4b-it

INT4-MIXED

32

208.8

71

14.08

gemma-4-e2b-it

INT8-CW

1024

1063.7

71.7

13.95

glm-edge-4b-chat

INT8-CW

1024

613.1

71.9

13.91

minicpm-v-2_6

INT4-MIXED

228

811.4

72

13.89

minicpm-o-2_6

INT4-MIXED

238

806.5

72.2

13.85

qwen3-8b

INT4-MIXED

1024

775.9

72.3

13.83

minicpm4-8b

INT4-MIXED

32

136.6

72.6

13.77

qwen3-8b

INT4-MIXED

32

139.6

73

13.70

llama-3-8b-instruct

INT4-MIXED

1024

981.8

73.1

13.68

minicpm-v-4_5

INT4-MIXED

217

830.3

73.9

13.53

gemma-3-4b-it

INT8-CW

32

205.4

74.4

13.44

minicpm4-8b

INT4-MIXED

1024

1088.7

74.5

13.42

minicpm3-4b

INT4-MIXED

32

355.7

75.5

13.25

gemma-3-4b-it

INT4-MIXED

1024

523.7

76.5

13.07

minicpm3-4b

INT4-MIXED

1024

830.4

76.9

13.00

minicpm-v-4_5

INT4-MIXED

217

878.1

77.5

12.90

minicpm3-4b

INT4-MIXED

32

295

77.5

12.90

glm-4-9b-chat-hf

INT4-MIXED

32

130

77.6

12.89

qwen3.5-9b

INT4-MIXED

83

458.9

78.3

12.77

qwen3.5-9b

INT4-MIXED

1024

1267.7

78.8

12.69

qwen3-vl-4b-instruct

INT4-MIXED

4937

7956

78.9

12.67

gemma-3-4b-it

INT8-CW

1024

536.9

79

12.66

gemma-7b-it

INT4-MIXED

32

118

79

12.66

qwen3-vl-4b-instruct

INT4-MIXED

4907

7691.1

79.3

12.61

minicpm3-4b

INT8-CW

32

328.7

80.2

12.47

gpt-oss-20b

INT4-MIXED

32

289.1

80.3

12.45

biomistral-7b-slerp

INT8-CW

7

107.1

80.5

12.42

glm-4-9b-chat-hf

INT4-MIXED

1024

890.3

80.5

12.42

deepseek-r1-distill-qwen-7b

INT8-CW

32

103.2

81.2

12.32

llama-2-7b-chat-hf

INT4-MIXED

32

101.8

82.1

12.18

qwen3.5-9b

INT4-MIXED

83

497.7

82.1

12.18

glm-4-9b-chat-hf

INT4-MIXED

32

140.3

82.5

12.12

qwen3.5-9b

INT4-MIXED

1024

1532

82.6

12.11

phi-4-multimodal-instruct

INT8-CW

578

627.9

82.8

12.08

qwen2.5-vl-7b-instruct

INT4-MIXED

32

314.1

82.8

12.08

deepseek-r1-distill-qwen-7b

INT8-CW

1024

509.4

82.9

12.06

qwen3-vl-4b-instruct

INT8-CW

4907

7760.6

83.1

12.03

qwen3-vl-4b-instruct

INT8-CW

4937

7793.8

83.1

12.03

llama-3.1-8b-instruct

INT4-MIXED

32

129.7

83.2

12.02

gemma-7b-it

INT4-MIXED

1024

877

83.3

12.00

phi-4-multimodal-instruct

INT8-CW

786

739.4

83.4

11.99

qwen2-7b-instruct

INT4-MIXED

32

98.6

83.5

11.98

minicpm3-4b

INT4-MIXED

1024

950.8

83.6

11.96

qwen2.5-7b-instruct

INT4-MIXED

32

98.2

83.8

11.93

gemma-7b-it

INT4-MIXED

32

128.3

83.9

11.92

mistral-7b-instruct-v0.3

INT4-MIXED

32

104.4

83.9

11.92

mistral-7b-instruct-v0.2

INT4-MIXED

32

102

84

11.90

phi-4-multimodal-instruct

INT8-CW

1362

1565

84.1

11.89

phi-4-multimodal-instruct

INT8-CW

1570

1759.2

84.1

11.89

glm-4-9b-chat-hf

INT4-MIXED

1024

1176.7

84.2

11.88

qwen2.5-vl-7b-instruct

INT4-MIXED

1024

838.1

84.5

11.83

gpt-oss-20b

INT4-MIXED

1024

3427.6

85.2

11.74

qwen2.5-7b-instruct

INT4-MIXED

1024

656.2

85.3

11.72

minicpm3-4b

INT4-MIXED

1024

878

85.5

11.70

qwen2-7b-instruct

INT4-MIXED

1024

655.1

85.5

11.70

deepseek-r1-distill-llama-8b

INT8-CW

32

112.4

85.7

11.67

llama-3.1-8b-instruct

INT4-MIXED

1024

947.9

85.9

11.64

llava-next-video-7b-hf

INT4-MIXED

2945

4195.8

86.1

11.61

phi-4-multimodal-instruct

INT4-MIXED

578

750.6

86.2

11.60

minicpm-v-2_6

INT4-MIXED

228

705

86.4

11.57

mistral-7b-instruct-v0.3

INT4-MIXED

1024

700.9

86.4

11.57

qwen2.5-vl-7b-instruct

INT4-MIXED

32

360.6

86.5

11.56

fara-7b

INT4-MIXED

32

411.5

86.6

11.55

mistral-7b-instruct-v0.2

INT4-MIXED

1024

718

86.6

11.55

stable-diffusion-xl-1.0-inpainting-0.1

INT8-CW

32

86

86.8

11.52

llama-2-7b-chat-hf

INT4-MIXED

1024

717.3

86.9

11.51

phi-4-multimodal-instruct

INT4-MIXED

1362

1827.9

87

11.49

gemma-2-9b-it

INT4-MIXED

32

178.7

87.4

11.44

minicpm3-4b

INT8-CW

1024

895.7

87.4

11.44

deepseek-r1-distill-llama-8b

INT8-CW

1024

567.3

88

11.36

phi-4-multimodal-instruct

INT4-MIXED

1570

2034.6

88

11.36

phi-4-multimodal-instruct

INT4-MIXED

786

905.4

88.1

11.35

gemma-7b-it

INT4-MIXED

1024

1156.8

88.2

11.34

qwen2.5-vl-7b-instruct

INT4-MIXED

1024

1088.4

88.5

11.30

fara-7b

INT4-MIXED

1024

1293.2

88.9

11.25

llama-3-8b-instruct

INT4-MIXED

32

113

89.4

11.19

llama-3-8b-instruct

INT4-MIXED

32

109.8

89.6

11.16

llama-3.1-8b-instruct

INT4-MIXED

32

115.1

89.7

11.15

phi-2

FP16

32

135.2

90

11.11

stable-zephyr-3b-dpo

FP16

32

123.7

91.5

10.93

stablelm-3b-4e1t

FP16

32

147.6

92

10.87

llama-3-8b-instruct

INT4-MIXED

1024

725.3

92.1

10.86

llama-3-8b-instruct

INT4-MIXED

1024

723.5

92.2

10.85

llama-3.1-8b-instruct

INT4-MIXED

1024

714.3

92.2

10.85

gemma-2-9b-it

INT4-MIXED

1024

1006.2

92.4

10.82

qwen3-8b

INT4-MIXED

32

135.3

92.5

10.81

stable-diffusion-xl-1.0-inpainting-0.1

INT8-CW

32

92.5

92.7

10.79

minicpm4-8b

INT4-MIXED

32

116.4

93.3

10.72

phi-2

FP16

1024

616.7

94.5

10.58

minicpm4-8b

INT4-MIXED

1024

798.7

95.3

10.49

qwen3-8b

INT4-MIXED

1024

774.1

95.8

10.44

stable-zephyr-3b-dpo

FP16

1024

615.1

96.8

10.33

stablelm-3b-4e1t

FP16

1024

598.5

96.8

10.33

deepseek-r1-distill-qwen-14b

INT4-MIXED

32

146

97.3

10.28

ltx-video

INT8-CW

11

98.3

97.9

10.21

llama-2-7b-chat-hf

INT8-CW

32

139.2

98.4

10.16
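
In the rows above, the "2nd token per sec" value matches 1000 divided by the "2nd latency (ms)" value, rounded to two decimals (for example, 57.7 ms gives 17.33 and 98.4 ms gives 10.16). The sketch below shows that conversion. It is an illustration inferred from the listed numbers, not the benchmark tool's own code; the function name `tokens_per_second` is hypothetical.

```python
def tokens_per_second(latency_ms: float) -> float:
    """Convert a per-token latency in milliseconds to tokens per second.

    Assumption: throughput is the reciprocal of the 2nd-token latency,
    which is consistent with the rows listed on this page.
    """
    if latency_ms <= 0:
        # A non-positive latency cannot be converted to a throughput.
        raise ValueError("latency must be positive")
    return round(1000.0 / latency_ms, 2)


# (2nd latency in ms, listed 2nd token per sec) pairs copied from the tables.
rows = [(57.7, 17.33), (98.4, 10.16), (5.9, 169.49)]
for latency_ms, listed in rows:
    print(f"{latency_ms} ms -> {tokens_per_second(latency_ms)} tokens/s (listed: {listed})")
```

The same conversion can be used to sanity-check any row: if the computed value differs from the listed one, the cells of that row were probably misaligned during extraction.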

Topology

Precision

Input Size

1st latency (ms)

2nd latency (ms)

2nd token per sec

distil-large-v2

INT4-MIXED

prompt0

166.7

5.9

169.49

distil-large-v2

INT4-MIXED

prompt1

211.5

5.9

169.49

distil-large-v2

INT8-CW

prompt0

163.9

6

166.67

distil-large-v2

INT8-CW

prompt1

206.9

6

166.67

whisper-large-v3-turbo

INT8-CW

prompt1

225.5

6.3

158.73

whisper-large-v3-turbo

INT8-CW

prompt0

184.1

6.9

144.93

whisper-large-v3-turbo

INT4-MIXED

prompt0

183

7

142.86

whisper-large-v3-turbo

INT4-MIXED

prompt1

232.7

7

142.86

minicpm4-0.5b

INT4-MIXED

32

28.3

7.3

136.99

minicpm4-0.5b

INT4-MIXED

1024

44.1

7.4

135.14

minicpm4-0.5b

INT4-MIXED

32

28.4

7.4

135.14

tiny-llama-1.1b-chat

INT4-MIXED

32

24.9

7.4

135.14

minicpm4-0.5b

INT4-MIXED

1024

45.6

7.5

133.33

minicpm4-0.5b

INT8-CW

32

30.5

7.6

131.58

minicpm4-0.5b

INT4-MIXED

1024

53.1

7.7

129.87

minicpm4-0.5b

INT4-MIXED

32

29.9

7.7

129.87

qwen2.5-coder-0.5b-instruct

INT4-MIXED

32

28.4

7.7

129.87

gemma-3-270m

INT4-MIXED

1024

33.7

7.8

128.21

minicpm4-0.5b

INT8-CW

1024

52.4

7.8

128.21

qwen2.5-coder-0.5b-instruct

INT4-MIXED

1024

46.2

7.8

128.21

qwen2.5-coder-0.5b-instruct

INT4-MIXED

1024

55

8

125.00

qwen2.5-coder-0.5b-instruct

INT4-MIXED

32

29.9

8

125.00

tiny-llama-1.1b-chat

INT4-MIXED

32

26.4

8

125.00

tiny-llama-1.1b-chat

INT4-MIXED

1024

87.4

8

125.00

gemma-3-270m

INT4-MIXED

32

29.2

8.1

123.46

distil-large-v2

FP16

prompt0

164.7

8.2

121.95

distil-large-v2

FP16

prompt1

210

8.2

121.95

gemma-3-270m

FP16

32

25.4

8.2

121.95

gemma-3-270m

FP16

1024

32

8.3

120.48

qwen2.5-coder-0.5b-instruct

INT4-MIXED

32

28.7

8.5

117.65

qwen2.5-coder-0.5b-instruct

INT8-CW

1024

53.2

8.6

116.28

qwen2.5-coder-0.5b-instruct

INT4-MIXED

1024

45.7

8.7

114.94

tiny-llama-1.1b-chat

INT4-MIXED

1024

112.5

8.7

114.94

whisper-large-v3-turbo

FP16

prompt0

195.6

8.7

114.94

gemma-3-270m

INT8-CW

1024

34.6

8.8

113.64

qwen2.5-coder-0.5b-instruct

INT8-CW

32

30.8

8.8

113.64

whisper-large-v3-turbo

FP16

prompt1

251.5

8.9

112.36

gemma-3-270m

INT8-CW

32

29.1

9.2

108.70

llama-3.2-1b-instruct

INT4-MIXED

32

21.5

9.5

105.26

llama-3.2-1b-instruct

INT4-MIXED

32

22

9.6

104.17

llama-3.2-1b-instruct

INT4-MIXED

1024

84

10

100.00

llama-3.2-1b-instruct

INT4-MIXED

1024

98.2

10.2

98.04

whisper-small

INT4-MIXED

prompt0

94.8

10.3

97.09

whisper-small

INT4-MIXED

prompt1

137.9

10.5

95.24

minicpm4-0.5b

FP16

32

24.8

10.9

91.74

whisper-small

INT4-MIXED

prompt0

99.6

10.9

91.74

whisper-small

INT4-MIXED

prompt1

145.1

10.9

91.74

whisper-small

INT8-CW

prompt0

99.5

11

90.91

whisper-small

INT8-CW

prompt1

143.3

11

90.91

minicpm4-0.5b

FP16

1024

60.3

11.3

88.50

qwen2.5-1.5b-instruct

INT4-MIXED

32

34.1

11.6

86.21

qwen2.5-coder-1.5b-instruct

INT4-MIXED

32

32.8

11.6

86.21

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

32

33.6

11.8

84.75

qwen2.5-coder-1.5b-instruct

INT4-MIXED

32

33.7

11.8

84.75

qwen2.5-1.5b-instruct

INT4-MIXED

32

33.8

11.9

84.03

gemma-3-1b-it

INT4-MIXED

1024

85.5

12

83.33

gemma-3-1b-it

INT4-MIXED

1024

87.6

12

83.33

nanollava

INT4-MIXED

760

98.8

12.1

82.64

qwen2.5-coder-0.5b-instruct

FP16

32

27

12.1

82.64

qwen2.5-coder-1.5b-instruct

INT4-MIXED

32

36

12.1

82.64

tiny-llama-1.1b-chat

INT8-CW

32

27.6

12.1

82.64

whisper-small

FP16

prompt0

96.9

12.1

82.64

whisper-small

FP16

prompt1

139.5

12.1

82.64

nanollava

INT8-CW

760

94.7

12.2

81.97

nanollava

INT4-MIXED

1752

190.2

12.3

81.30

qwen2.5-1.5b-instruct

INT4-MIXED

1024

135.5

12.3

81.30

qwen2.5-coder-0.5b-instruct

FP16

1024

54

12.3

81.30

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

1024

137.1

12.4

80.65

qwen2.5-1.5b-instruct

INT4-MIXED

1024

135.8

12.4

80.65

qwen2.5-coder-1.5b-instruct

INT4-MIXED

1024

125.8

12.4

80.65

nanollava

INT8-CW

1752

184.1

12.5

80.00

qwen2.5-coder-1.5b-instruct

INT4-MIXED

1024

135.7

12.6

79.37

qwen2.5-1.5b-instruct

INT4-MIXED

32

38.1

12.7

78.74

tiny-llama-1.1b-chat

INT8-CW

1024

100.8

12.7

78.74

qwen2.5-coder-1.5b-instruct

INT4-MIXED

1024

143.5

12.8

78.13

gemma-3-1b-it

INT4-MIXED

1024

93.1

12.9

77.52

qwen2.5-1.5b-instruct

INT4-MIXED

32

36.6

12.9

77.52

nanollava

FP16

760

104.7

13

76.92

gemma-3-1b-it

INT4-MIXED

32

38.6

13.1

76.34

qwen2.5-1.5b-instruct

INT4-MIXED

1024

144

13.3

75.19

whisper-small

INT4-MIXED

prompt1

147.6

13.3

75.19

qwen2.5-1.5b-instruct

INT4-MIXED

1024

142.8

13.5

74.07

gemma-3-1b-it

INT4-MIXED

32

37

13.6

73.53

whisper-small

INT4-MIXED

prompt0

102.7

13.6

73.53

gemma-3-1b-it

INT4-MIXED

32

39.5

14.1

70.92

gemma-3-1b-it

INT8-CW

1024

99.1

14.3

69.93

nanollava

FP16

1752

193

14.3

69.93

llama-3.2-1b-instruct

INT8-CW

32

23.2

14.7

68.03

gemma-3-1b-it

INT8-CW

32

43.5

14.9

67.11

glm-edge-1.5b-chat

INT4-MIXED

1024

205

15

66.67

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

32

37.5

15.1

66.23

glm-edge-1.5b-chat

INT4-MIXED

32

56.1

15.1

66.23

llama-3.2-1b-instruct

INT8-CW

1024

101.5

15.5

64.52

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

1024

155.6

15.9

62.89

minicpm-1b-sft

INT4-MIXED

31

59.6

16.5

60.61

minicpm-1b-sft

INT4-MIXED

1014

137.9

16.8

59.52

minicpm-1b-sft

INT4-MIXED

31

60.3

17

58.82

minicpm-1b-sft

INT4-MIXED

1014

161.3

17.1

58.48

qwen3.5-0.8b

INT4-MIXED

85

147.3

17.3

57.80

qwen3.5-0.8b

INT4-MIXED

1024

281

17.5

57.14

gemma-2b-it

INT4-MIXED

32

26.9

17.8

56.18

deepseek-r1-distill-qwen-1.5b

INT8-CW

32

38.3

17.9

55.87

qwen3.5-0.8b

INT4-MIXED

1024

294.4

18

55.56

qwen2.5-1.5b-instruct

INT8-CW

32

38

18.1

55.25

minicpm-1b-sft

INT4-MIXED

1014

134

18.2

54.95

minicpm-1b-sft

INT4-MIXED

31

60.6

18.2

54.95

qwen2.5-1.5b-instruct

INT8-CW

32

38.3

18.2

54.95

qwen2.5-coder-1.5b-instruct

INT8-CW

32

37.9

18.2

54.95

qwen3.5-0.8b

INT4-MIXED

85

158

18.2

54.95

glm-edge-1.5b-chat

INT8-CW

32

56.3

18.5

54.05

qwen3.5-0.8b

INT8-CW

1024

296.7

18.5

54.05

gemma-2b-it

INT4-MIXED

1024

171

18.6

53.76

phi-2

INT4-MIXED

32

60.2

18.6

53.76

gemma-2b-it

INT4-MIXED

32

32.2

18.7

53.48

stable-zephyr-3b-dpo

INT4-MIXED

32

59.5

18.7

53.48

stablelm-3b-4e1t

INT4-MIXED

32

59.6

18.7

53.48

deepseek-r1-distill-qwen-1.5b

INT8-CW

1024

132.6

19

52.63

minicpm-1b-sft

INT8-CW

31

66.5

19.1

52.36

qwen2.5-1.5b-instruct

INT8-CW

1024

139.8

19.1

52.36

qwen2.5-1.5b-instruct

INT8-CW

1024

145.4

19.1

52.36

qwen2.5-coder-1.5b-instruct

INT8-CW

1024

141.9

19.1

52.36

qwen3.5-0.8b

INT4-MIXED

1024

292.5

19.1

52.36

qwen3.5-0.8b

INT8-CW

85

158.7

19.3

51.81

gemma-2b-it

INT4-MIXED

1024

203.6

19.4

51.55

glm-edge-1.5b-chat

INT8-CW

1024

211.3

19.7

50.76

gemma-2-2b

INT4-MIXED

33

39

20

50.00

phi-2

INT4-MIXED

32

63.3

20.2

49.50

minicpm-1b-sft

INT8-CW

1014

150.6

20.4

49.02

qwen3.5-0.8b

INT4-MIXED

85

150.1

20.5

48.78

gemma-2-2b

INT4-MIXED

33

44.8

20.9

47.85

phi-2

INT4-MIXED

1024

309.7

20.9

47.85

stable-zephyr-3b-dpo

INT4-MIXED

1024

303.8

21

47.62

stablelm-3b-4e1t

INT4-MIXED

1024

304.5

21

47.62

qwen2.5-coder-3b-instruct

INT4-MIXED

32

59.4

21.2

47.17

qwen2.5-coder-3b-instruct

INT4-MIXED

32

53

21.2

47.17

gemma-2-2b

INT4-MIXED

1025

209.2

21.3

46.95

qwen2.5-1.5b-instruct

INT4-MIXED

32

74.1

21.3

46.95

stable-zephyr-3b-dpo

INT4-MIXED

32

67.9

21.3

46.95

qwen2.5-1.5b-instruct

INT4-MIXED

1024

185.2

21.4

46.73

llama-3.2-3b-instruct

INT4-MIXED

32

43.9

21.7

46.08

qwen2.5-1.5b-instruct

INT4-MIXED

32

75.8

22.1

45.25

gemma-2-2b

INT4-MIXED

1025

227.1

22.2

45.05

qwen2.5-1.5b-instruct

INT4-MIXED

1024

193.9

22.2

45.05

phi-2

INT4-MIXED

1024

383.8

22.5

44.44

qwen2.5-coder-3b-instruct

INT4-MIXED

1024

234.8

22.5

44.44

llama-3.2-3b-instruct

INT4-MIXED

32

43.9

22.6

44.25

qwen2.5-coder-3b-instruct

INT4-MIXED

32

58.4

22.6

44.25

qwen2.5-coder-3b-instruct

INT4-MIXED

1024

240.1

22.6

44.25

stablelm-3b-4e1t

INT4-MIXED

32

65.9

22.8

43.86

llama-3.2-3b-instruct

INT4-MIXED

32

44.2

22.9

43.67

llama-3.2-3b-instruct

INT4-MIXED

1024

243.1

23.3

42.92

stable-zephyr-3b-dpo

INT4-MIXED

1024

336.7

23.5

42.55

phi-3-mini-4k-instruct

INT4-MIXED

32

49.5

23.7

42.19

gemma-3-1b-it

FP16

1024

108.2

23.8

42.02

phi-3-mini-128k-instruct

INT4-MIXED

32

49

23.8

42.02

phi-3.5-mini-instruct

INT4-MIXED

32

50

23.8

42.02

qwen3.5-0.8b

FP16

1024

294

23.8

42.02

qwen3.5-0.8b

FP16

85

122.2

23.8

42.02

qwen2.5-coder-3b-instruct

INT4-MIXED

1024

427.9

24

41.67

gemma-3-1b-it

FP16

32

40.5

24.1

41.49

llama-3.2-3b-instruct

INT4-MIXED

1024

254

24.3

41.15

llama-3.2-3b-instruct

INT4-MIXED

1024

277.5

24.6

40.65

phi-3-mini-4k-instruct

INT4-MIXED

32

49.7

24.7

40.49

phi-3.5-mini-instruct

INT4-MIXED

32

50.2

24.8

40.32

stablelm-3b-4e1t

INT4-MIXED

1024

332.3

25

40.00

qwen2.5-1.5b-instruct

INT8-CW

32

80.6

25.1

39.84

phi-3-mini-128k-instruct

INT4-MIXED

32

50.7

25.2

39.68

qwen2.5-1.5b-instruct

INT8-CW

1024

180.9

25.7

38.91

tiny-llama-1.1b-chat

FP16

32

30.3

26.2

38.17

phi-3-mini-4k-instruct

INT4-MIXED

1024

373.5

26.5

37.74

phi-3-mini-128k-instruct

INT4-MIXED

1024

375.3

26.6

37.59

phi-3.5-mini-instruct

INT4-MIXED

1024

372.5

26.7

37.45

phi-3-mini-4k-instruct

INT4-MIXED

32

50.1

26.8

37.31

qwen3-4b

INT4-MIXED

32

58.6

26.8

37.31

tiny-llama-1.1b-chat

FP16

1024

113.8

26.9

37.17

phi-3.5-mini-instruct

INT4-MIXED

32

53.5

27.1

36.90

internvl2-4b

INT4-MIXED

297

221.1

27.3

36.63

internvl2-4b

INT4-MIXED

297

215.9

27.5

36.36

phi-4-mini-reasoning

INT4-MIXED

32

63.7

27.5

36.36

phi-3.5-mini-instruct

INT4-MIXED

1024

391.6

27.6

36.23

qwen3-4b

INT4-MIXED

32

60

27.6

36.23

phi-4-mini-instruct

INT4-MIXED

32

64.5

27.7

36.10

phi-3-mini-4k-instruct

INT4-MIXED

1024

391.1

28

35.71

gemma-4-e2b-it

INT4-MIXED

274

426.9

28.1

35.59

afm-4.5b

INT4-MIXED

32

51.9

28.2

35.46

phi-3-mini-128k-instruct

INT4-MIXED

1024

488.9

28.2

35.46

gpt-oss-20b

INT4-MIXED

32

207.2

28.3

35.34

phi-4-mini-instruct

INT4-MIXED

32

64.5

28.4

35.21

gemma-2b-it

INT8-CW

32

36.2

28.5

35.09

phi-4-mini-reasoning

INT4-MIXED

32

65.6

28.5

35.09

phi-4-mini-reasoning

INT4-MIXED

32

66.4

28.7

34.84

gemma-3-4b-it

INT4-MIXED

32

82.9

28.9

34.60

gemma-4-e2b-it

INT4-MIXED

274

457.9

28.9

34.60

gemma-2b-it

INT8-CW

1024

211.4

29.4

34.01

phi-3.5-vision-instruct

INT4-MIXED

802

489

29.5

33.90

phi-4-mini-reasoning

INT4-MIXED

1024

375.3

29.5

33.90

qwen3-4b

INT4-MIXED

1024

347

29.5

33.90

afm-4.5b

INT4-MIXED

1024

391.1

29.7

33.67

phi-3-mini-4k-instruct

INT4-MIXED

1024

524.3

29.7

33.67

gemma-3-4b-it

INT4-MIXED

32

85.3

29.8

33.56

phi-3.5-mini-instruct

INT4-MIXED

1024

548.9

29.8

33.56

phi-4-mini-instruct

INT4-MIXED

1024

375.6

29.8

33.56

llama-3.2-1b-instruct

FP16

32

35

29.9

33.44

whisper-large-v3

INT4-MIXED

prompt0

285.2

29.9

33.44

gemma-3-4b-it

INT4-MIXED

32

86

30.1

33.22

qwen3-4b

INT4-MIXED

1024

384

30.1

33.22

internvl2-4b

INT4-MIXED

1027

588.9

30.2

33.11

glm-edge-4b-chat

INT4-MIXED

32

95

30.3

33.00

phi-4-mini-reasoning

INT4-MIXED

1024

387.9

30.3

33.00

phi-3.5-vision-instruct

INT4-MIXED

1032

607.6

30.4

32.89

phi-4-mini-instruct

INT4-MIXED

1024

391.1

30.4

32.89

gemma-3-4b-it

INT4-MIXED

1024

370.1

30.5

32.79

internvl2-4b

INT4-MIXED

1027

589.7

30.5

32.79

phi-4-mini-reasoning

INT4-MIXED

1024

407.8

30.5

32.79

llama-3.2-1b-instruct

FP16

1024

135.5

30.6

32.68

gpt-oss-20b

INT4-MIXED

1024

671.5

30.7

32.57

whisper-large-v3

INT4-MIXED

prompt1

333.6

30.7

32.57

gemma-2-2b

INT8-CW

33

47.4

30.8

32.47

phi-2

INT8-CW

32

66.9

31.5

31.75

whisper-large-v3

INT8-CW

prompt0

292.8

31.5

31.75

stable-zephyr-3b-dpo

INT8-CW

32

65.3

31.6

31.65

stablelm-3b-4e1t

INT8-CW

32

66.5

31.6

31.65

gemma-3-4b-it

INT4-MIXED

1024

404.9

31.8

31.45

whisper-large-v3

INT8-CW

prompt1

343.8

31.9

31.35

gemma-3-4b-it

INT4-MIXED

1024

406.7

32.1

31.15

glm-edge-4b-chat

INT4-MIXED

1024

571.8

32.4

30.86

minicpm-1b-sft

FP16

31

60.5

32.5

30.77

gemma-2-2b

INT8-CW

1025

240.2

32.6

30.67

chatglm3-6b

INT4-MIXED

32

45.1

33.1

30.21

gemma-4-e2b-it

INT4-MIXED

1024

601.1

33.4

29.94

deepseek-r1-distill-qwen-1.5b

FP16

32

40.1

33.5

29.85

qwen2.5-coder-1.5b-instruct

FP16

32

40.6

33.5

29.85

qwen2.5-1.5b-instruct

FP16

32

40.1

33.6

29.76

stablelm-3b-4e1t

INT8-CW

1024

340.3

33.8

29.59

gemma-4-e2b-it

INT4-MIXED

1024

582.4

33.9

29.50

qwen2.5-coder-3b-instruct

INT8-CW

32

58.2

33.9

29.50

minicpm-1b-sft

FP16

1014

184.3

34

29.41

stable-zephyr-3b-dpo

INT8-CW

1024

334.3

34

29.41

phi-2

INT8-CW

1024

343.4

34.2

29.24

qwen2.5-1.5b-instruct

FP16

1024

191.3

34.5

28.99

qwen2.5-coder-1.5b-instruct

FP16

1024

165.6

34.6

28.90

deepseek-r1-distill-qwen-1.5b

FP16

1024

167.1

34.8

28.74

gpt-oss-20b

INT4-MIXED

32

218.1

34.8

28.74

chatglm3-6b

INT4-MIXED

1024

470.8

35.2

28.41

qwen2.5-coder-3b-instruct

INT8-CW

1024

276.6

35.2

28.41

chatglm3-6b

INT4-MIXED

32

49.7

35.4

28.25

gemma-4-e2b-it

INT8-CW

274

461.1

35.6

28.09

phi-4-multimodal-instruct

INT4-MIXED

578

462.4

35.6

28.09

phi-4-multimodal-instruct

INT4-MIXED

786

542.7

35.6

28.09

whisper-large-v3

FP16

prompt0

276.7

35.8

27.93

whisper-large-v3

FP16

prompt1

315.9

36.2

27.62

gpt-oss-20b

INT4-MIXED

1024

701.4

36.3

27.55

phi-4-multimodal-instruct

INT4-MIXED

1362

1130.2

36.3

27.55

phi-4-multimodal-instruct

INT4-MIXED

1570

1249.6

36.3

27.55

llama-2-7b-chat-hf

INT4-MIXED

32

53.2

36.8

27.17

codellama-7b

INT4-MIXED

32

52.4

36.9

27.10

chatglm3-6b

INT4-MIXED

1024

558

37.2

26.88

llama-3.2-3b-instruct

INT8-CW

32

46.8

37.3

26.81

llama-3.2-3b-instruct

INT8-CW

1024

284.3

38.5

25.97

biomistral-7b-slerp

INT4-MIXED

7

45.4

38.7

25.84

mistral-7b-instruct-v0.2

INT4-MIXED

32

54

38.8

25.77

zephyr-7b-beta

INT4-MIXED

32

54.1

38.8

25.77

llama-2-7b-chat-hf

INT4-MIXED

32

55.6

38.9

25.71

neural-chat-7b-v3-3

INT4-MIXED

32

53.7

39.1

25.58

mistral-7b-instruct-v0.3

INT4-MIXED

32

54.2

39.2

25.51

glm-edge-1.5b-chat

FP16

32

52.9

39.3

25.45

codellama-7b

INT4-MIXED

32

55.9

39.4

25.38

qwen3_8b_eagle3

INT4-MIXED

32

65.7

39.7

25.19

falcon-7b-instruct

INT4-MIXED

32

56.1

40.2

24.88

glm-edge-1.5b-chat

FP16

1024

218.3

40.4

24.75

llama-2-7b-chat-hf

INT4-MIXED

1024

530.3

40.6

24.63

codellama-7b

INT4-MIXED

1024

523.7

40.7

24.57

minicpm3-4b

INT4-MIXED

32

178.7

40.7

24.57

phi-4-multimodal-instruct

INT4-MIXED

578

524.2

40.8

24.51

mistral-7b-instruct-v0.2

INT4-MIXED

32

59

40.9

24.45

mistral-7b-instruct-v0.3

INT4-MIXED

32

58.7

40.9

24.45

qwen3-vl-4b-instruct

INT4-MIXED

4937

6482

40.9

24.45

phi-4-multimodal-instruct

INT4-MIXED

786

645.3

41

24.39

qwen3-vl-4b-instruct

INT4-MIXED

4907

6208.2

41

24.39

stable-diffusion-xl-1.0-inpainting-0.1

INT8-CW

32

42.5

41

24.39

minicpm3-4b

INT4-MIXED

32

174.9

41.1

24.33

biomistral-7b-slerp

INT4-MIXED

7

46.8

41.3

24.21

mistral-7b-instruct-v0.2

INT4-MIXED

1024

545

41.3

24.21

neural-chat-7b-v3-3

INT4-MIXED

1024

540.6

41.3

24.21

zephyr-7b-beta

INT4-MIXED

1024

535.1

41.3

24.21

mistral-7b-instruct-v0.1

INT4-MIXED

32

59.2

41.4

24.15

mistral-7b-instruct-v0.2

INT4-MIXED

32

59

41.4

24.15

mistral-7b-instruct-v0.3

INT4-MIXED

32

58.1

41.4

24.15

mistral-7b-instruct-v0.3

INT4-MIXED

1024

540.4

41.5

24.10

neural-chat-7b-v3-3

INT4-MIXED

32

58.5

41.5

24.10

minicpm3-4b

INT4-MIXED

32

174.7

41.6

24.04

qwen3-vl-4b-instruct

INT4-MIXED

4907

6404.3

41.8

23.92

qwen3-vl-4b-instruct

INT4-MIXED

4907

6597.4

42

23.81

falcon-7b-instruct

INT4-MIXED

1024

570.2

42.1

23.75

qwen2.5-7b-instruct-1m

INT4-MIXED

32

69.4

42.1

23.75

qwen3-vl-4b-instruct

INT4-MIXED

4937

6814.4

42.3

23.64

qwen3-vl-4b-instruct

INT4-MIXED

4937

6576.2

42.3

23.64

phi-4-multimodal-instruct

INT4-MIXED

1570

1439.4

42.4

23.58

llama-2-7b-chat-hf

INT4-MIXED

1024

585.5

42.5

23.53

qwen2.5-7b-instruct

INT4-MIXED

32

56

42.5

23.53

qwen2.5-7b-instruct

INT4-MIXED

32

56.8

42.5

23.53

deepseek-r1-distill-llama-8b

INT4-MIXED

32

57.2

42.6

23.47

phi-4-multimodal-instruct

INT4-MIXED

1362

1299.1

42.6

23.47

llama-3-8b-instruct

INT4-MIXED

32

60.4

42.9

23.31

gemma-4-e2b-it

INT8-CW

1024

651.1

43

23.26

llama-3.1-8b-instruct

INT4-MIXED

32

58.3

43

23.26

codellama-7b

INT4-MIXED

1024

610.4

43.1

23.20

phi-3-mini-128k-instruct

INT8-CW

32

56.9

43.1

23.20

phi-3-mini-4k-instruct

INT8-CW

32

56.5

43.1

23.20

phi-3.5-mini-instruct

INT8-CW

32

56.8

43.1

23.20

qwen3_8b_eagle3

INT4-MIXED

1024

637.5

43.2

23.15

stable-diffusion-xl-1.0-inpainting-0.1

INT8-CW

32

44.4

43.2

23.15

minicpm-v-2_6

INT4-MIXED

228

544.7

43.3

23.09

mistral-7b-instruct-v0.2

INT4-MIXED

1024

576.6

43.3

23.09

minicpm3-4b

INT4-MIXED

1024

723.6

43.4

23.04

mistral-7b-instruct-v0.3

INT4-MIXED

1024

576.7

43.4

23.04

qwen2.5-7b-instruct-1m

INT4-MIXED

1024

489.2

43.4

23.04

qwen2.5-7b-instruct

INT4-MIXED

32

59.8

43.5

22.99

qwen2.5-7b-instruct

INT4-MIXED

32

60

43.5

22.99

deepseek-r1-distill-qwen-7b

INT4-MIXED

32

60.3

43.6

22.94

minicpm-o-2_6

INT4-MIXED

238

559.6

43.6

22.94

qwen2-7b-instruct

INT4-MIXED

32

59.5

43.6

22.94

qwen2.5-7b-instruct

INT4-MIXED

1024

486.4

43.6

22.94

qwen2.5-7b-instruct

INT4-MIXED

1024

489.8

43.7

22.88

qwen2.5-7b-instruct

INT4-MIXED

32

59.9

43.8

22.83

mistral-7b-instruct-v0.1

INT4-MIXED

1025

695.9

43.9

22.78

neural-chat-7b-v3-3

INT4-MIXED

1024

648

43.9

22.78

deepseek-r1-distill-qwen-7b

INT4-MIXED

32

60.4

44

22.73

minicpm3-4b

INT4-MIXED

1024

860.6

44

22.73

mistral-7b-instruct-v0.2

INT4-MIXED

1024

647.7

44

22.73

mistral-7b-instruct-v0.3

INT4-MIXED

1024

644.5

44

22.73

qwen2-7b-instruct

INT4-MIXED

32

60.2

44.1

22.68

qwen3-8b

INT4-MIXED

32

71

44.1

22.68

qwen2.5-7b-instruct

INT4-MIXED

32

60

44.2

22.62

qwen2.5-7b-instruct-1m

INT4-MIXED

32

59.7

44.3

22.57

bloomz-7b1

INT4-MIXED

32

60.8

44.4

22.52

minicpm3-4b

INT4-MIXED

1024

756.3

44.5

22.47

minicpm4-8b

INT4-MIXED

32

63.8

44.5

22.47

falcon-7b-instruct

INT4-MIXED

32

62.5

44.7

22.37

llama-3-8b-instruct

INT4-MIXED

32

62.6

44.7

22.37

minicpm-v-2_6

INT4-MIXED

228

554.2

44.7

22.37

phi-4-mini-reasoning

INT8-CW

32

68.5

44.7

22.37

llama-3.1-8b-instruct

INT4-MIXED

32

62.4

44.8

22.32

gemma-3-4b-it

INT8-CW

32

89.8

44.9

22.27

llama-3-8b-instruct

INT4-MIXED

32

62.9

45

22.22

phi-4-mini-instruct

INT8-CW

32

69.1

45

22.22

qwen2.5-7b-instruct

INT4-MIXED

1024

550.2

45

22.22

qwen2.5-7b-instruct

INT4-MIXED

1024

544.7

45

22.22

deepseek-r1-distill-llama-8b

INT4-MIXED

1024

547

45.1

22.17

deepseek-r1-distill-qwen-7b

INT4-MIXED

1024

527.9

45.1

22.17

minicpm-v-2_6

INT4-MIXED

228

579.9

45.1

22.17

qwen2-7b-instruct

INT4-MIXED

1024

522.2

45.1

22.17

minicpm-o-2_6

INT4-MIXED

238

570

45.3

22.08

deepseek-r1-distill-qwen-7b

INT4-MIXED

1024

583.6

45.4

22.03

internvl2-4b

INT8-CW

297

215.6

45.4

22.03

llama-3-8b-instruct

INT4-MIXED

32

62.6

45.4

22.03

qwen2.5-7b-instruct

INT4-MIXED

1024

580.4

45.4

22.03

qwen2-7b-instruct

INT4-MIXED

1024

587.9

45.5

21.98

deepseek-r1-distill-llama-8b

INT4-MIXED

32

61.3

45.6

21.93

llama-3.1-8b-instruct

INT4-MIXED

1024

544.8

45.8

21.83

phi-3-mini-128k-instruct

INT8-CW

1024

490.4

45.8

21.83

qwen2.5-7b-instruct-1m

INT4-MIXED

1024

587.3

45.8

21.83

phi-3-mini-4k-instruct

INT8-CW

1024

484.2

45.9

21.79

phi-3.5-mini-instruct

INT8-CW

1024

487.7

45.9

21.79

qwen2.5-7b-instruct

INT4-MIXED

1024

589

46

21.74

qwen3-4b

INT8-CW

32

64.1

46

21.74

minicpm-v-4_5

INT4-MIXED

217

593.1

46.1

21.69

phi-4-mini-instruct

INT8-CW

1024

420.8

46.4

21.55

phi-4-mini-reasoning

INT8-CW

1024

421

46.4

21.55

minicpm4-8b

INT4-MIXED

1024

577.6

46.5

21.51

qwen3-8b

INT4-MIXED

32

70.6

46.5

21.51

bloomz-7b1

INT4-MIXED

32

65.9

46.7

21.41

minicpm4-8b

INT4-MIXED

32

70.4

46.8

21.37

qwen3-8b

INT4-MIXED

1024

565.3

46.8

21.37

falcon-7b-instruct

INT4-MIXED

1024

666.3

46.9

21.32

qwen3-8b

INT4-MIXED

32

73.4

46.9

21.32

gemma-3-4b-it

INT8-CW

1024

418.2

47

21.28

llama-3-8b-instruct

INT4-MIXED

1024

578.7

47.2

21.19

llama-3-8b-instruct

INT4-MIXED

1024

582.2

47.3

21.14

llama-3.1-8b-instruct

INT4-MIXED

1024

577

47.3

21.14

phi-3.5-vision-instruct

INT8-CW

802

501.9

47.3

21.14

llama-3-8b-instruct

INT4-MIXED

1024

549.1

47.6

21.01

afm-4.5b

INT8-CW

32

60.1

47.7

20.96

bloomz-7b1

INT4-MIXED

1024

686.7

47.7

20.96

deepseek-r1-distill-llama-8b

INT4-MIXED

1024

568.9

48

20.83

llama-3-8b-instruct

INT4-MIXED

1024

648.1

48.2

20.75

minicpm4-8b

INT4-MIXED

32

69.2

48.3

20.70

internvl2-4b

INT8-CW

1027

606.8

48.4

20.66

afm-4.5b

INT8-CW

1024

385.7

48.5

20.62

minicpm-v-4_5

INT4-MIXED

217

614.8

48.5

20.62

qwen3-4b

INT8-CW

1024

407.4

48.6

20.58

phi-3.5-vision-instruct

INT8-CW

1032

633.2

48.7

20.53

minicpm4-8b

INT4-MIXED

1024

620.6

48.9

20.45

qwen3-8b

INT4-MIXED

1024

600.7

49.1

20.37

qwen3-8b

INT4-MIXED

1024

675.1

49.8

20.08

glm-4-9b-chat-hf

INT4-MIXED

32

67.1

49.9

20.04

zephyr-7b-beta

INT4-MIXED

32

64.6

49.9

20.04

qwen2.5-vl-7b-instruct

INT4-MIXED

32

150.7

50.1

19.96

bloomz-7b1

INT4-MIXED

1024

797

50.2

19.92

glm-edge-4b-chat

INT8-CW

32

96

50.3

19.88

minicpm4-8b

INT4-MIXED

1024

713.5

50.5

19.80

gemma-7b-it

INT4-MIXED

32

71.7

51

19.61

baichuan2-7b-chat

INT4-MIXED

32

66.5

51.1

19.57

qwen2.5-vl-7b-instruct

INT4-MIXED

1024

615.4

51.4

19.46

glm-4-9b-chat-hf

INT4-MIXED

32

73.4

51.9

19.27

glm-edge-4b-chat

INT8-CW

1024

586.2

52.4

19.08

zephyr-7b-beta

INT4-MIXED

1024

570.5

52.4

19.08

glm-4-9b-chat-hf

INT4-MIXED

1024

671.6

52.5

19.05

glm-4-9b-chat-hf

INT4-MIXED

32

73

53.1

18.83

fara-7b

INT4-MIXED

32

174.9

53.5

18.69

gemma-7b-it

INT4-MIXED

32

76.1

53.9

18.55

qwen2.5-vl-7b-instruct

INT4-MIXED

32

150.8

54

18.52

qwen2.5-vl-7b-instruct

INT4-MIXED

1024

610.3

54.1

18.48

phi-4-multimodal-instruct

INT8-CW

578

497.1

54.4

18.38

qwen2.5-vl-7b-instruct

INT4-MIXED

1024

674.8

54.6

18.32

qwen2.5-vl-7b-instruct

INT4-MIXED

32

153.9

54.8

18.25

phi-4-multimodal-instruct

INT8-CW

786

602.7

54.9

18.21

baichuan2-7b-chat

INT4-MIXED

1024

646.9

55

18.18

glm-4-9b-chat-hf

INT4-MIXED

1024

703.5

55

18.18

gemma-7b-it

INT4-MIXED

1024

645

55.1

18.15

qwen3.5-9b

INT4-MIXED

83

251.7

55.1

18.15

ltx-video

INT4-MIXED

11

55.7

55.2

18.12

llava-next-video-7b-hf

INT4-MIXED

2945

3184.7

56.1

17.83

minicpm3-4b

INT8-CW

32

200.7

56.1

17.83

fara-7b

INT4-MIXED

1024

839

56.2

17.79

glm-4-9b-chat-hf

INT4-MIXED

1024

785.2

56.2

17.79

phi-4-multimodal-instruct

INT8-CW

1362

1239.2

56.2

17.79

qwen3.5-9b

INT4-MIXED

1024

1000.9

56.3

17.76

deepseek-r1-distill-llama-8b

INT4-MIXED

32

73.4

56.4

17.73

phi-4-multimodal-instruct

INT8-CW

1570

1378

56.5

17.70

llama-3.1-8b-instruct

INT4-MIXED

32

73.2

56.6

17.67

phi-2

FP16

32

65.1

57.2

17.48

gemma-7b-it

INT4-MIXED

1024

753

57.8

17.30

ltx-video

INT8-CW

11

57.6

57.8

17.30

gemma-4-e2b-it

FP16

274

504.3

57.9

17.27

stable-zephyr-3b-dpo

FP16

32

65.1

58

17.24

gemma-2-9b-it

INT4-MIXED

32

80.4

58.2

17.18

deepseek-r1-distill-llama-8b

INT4-MIXED

1024

718.3

58.7

17.04

llama-3.1-8b-instruct

INT4-MIXED

1024

719.3

58.8

17.01

qwen3.5-9b

INT4-MIXED

83

259.5

58.8

17.01

gemma-2-2b

FP16

33

64

58.9

16.98

stablelm-3b-4e1t

FP16

32

65

58.9

16.98

qwen3.5-9b

INT4-MIXED

1024

1162.2

59

16.95

gemma-2-9b-it

INT4-MIXED

32

82.7

59.8

16.72

qwen3-vl-4b-instruct

INT8-CW

4907

6586.4

59.8

16.72

gemma-2-2b

FP16

1025

334.1

60.3

16.58

qwen3-vl-4b-instruct

INT8-CW

4937

6840.8

60.4

16.56

phi-2

FP16

1024

433.8

61.1

16.37

gemma-2-9b-it

INT4-MIXED

32

80.7

61.2

16.34

stable-zephyr-3b-dpo

FP16

1024

411.6

61.7

16.21

minicpm3-4b

INT8-CW

1024

910.9

61.8

16.18

stablelm-3b-4e1t

FP16

1024

411.9

61.8

16.18

gemma-2b-it

FP16

32

67.2

61.9

16.16

gemma-2-9b-it

INT4-MIXED

1024

724.6

62.2

16.08

gemma-2b-it

FP16

1024

256.5

62.5

16.00

gemma-2-9b-it

INT4-MIXED

1024

750.2

64.3

15.55

gemma-2-9b-it

INT4-MIXED

1024

814.2

65.5

15.27

gemma-4-e2b-it

FP16

1024

739

65.6

15.24

lcm-dreamshaper-v7

INT8-HYBRID

1024

70.5

66.2

15.11

lcm-dreamshaper-v7

INT8-HYBRID

32

69.7

66.6

15.02

chatglm3-6b

INT8-CW

32

80.5

66.9

14.95

llama-3.2-3b-instruct

FP16

32

72.5

67.6

14.79

qwen3_8b_eagle3

INT8-CW

1024

673.7

67.6

14.79

dolly-v2-12b

INT4-MIXED

32

102.1

68

14.71

chatglm3-6b

INT8-CW

1024

548.8

68.7

14.56

llama-3.2-3b-instruct

FP16

1024

403.5

69.4

14.41

llama-2-13b-chat-hf

INT4-MIXED

32

96.6

69.9

14.31

qwen2.5-coder-3b-instruct

FP16

32

78

72.8

13.74

dolly-v2-12b

INT4-MIXED

1024

1248.3

73

13.70

gemma-3-12b-it

INT4-MIXED

32

138.4

73

13.70

qwen3_8b_eagle3

INT8-CW

32

105.5

73.1

13.68

falcon-7b-instruct

INT8-CW

32

89.3

74.1

13.50

qwen2.5-coder-3b-instruct

FP16

1024

362.8

74.5

13.42

qwen2.5-7b-instruct

INT8-CW

32

94.4

74.7

13.39

qwen2.5-7b-instruct-1m

INT8-CW

32

94.4

74.7

13.39

qwen2.5-7b-instruct

INT8-CW

32

95.2

74.9

13.35

qwen2-7b-instruct

INT8-CW

32

94.8

75

13.33

llama-2-13b-chat-hf

INT4-MIXED

1024

1015.5

75.5

13.25

deepseek-r1-distill-qwen-7b

INT8-CW

32

94.9

75.8

13.19

minicpm-v-2_6

INT8-CW

228

551.9

75.9

13.18

falcon-7b-instruct

INT8-CW

1024

814

76

13.16

codellama-7b

INT8-CW

32

88.2

76.2

13.12

qwen2.5-7b-instruct

INT8-CW

1024

581.9

76.2

13.12

qwen2-7b-instruct

INT8-CW

1024

580.4

76.3

13.11

llama-2-7b-chat-hf

INT8-CW

32

87.9

76.4

13.09

qwen2.5-7b-instruct

INT8-CW

1024

581.1

76.4

13.09

gemma-3-12b-it

INT4-MIXED

32

145.6

76.7

13.04

phi-3.5-mini-instruct

FP16

32

84.5

76.7

13.04

phi-3-mini-128k-instruct

FP16

32

84.7

76.8

13.02

gemma-3-12b-it

INT4-MIXED

1024

1042.3

76.9

13.00

phi-3-mini-4k-instruct

FP16

32

84.8

76.9

13.00

deepseek-r1-distill-qwen-7b

INT8-CW

1024

585.3

77.1

12.97

qwen2.5-7b-instruct-1m

INT8-CW

1024

576.2

77.2

12.95

minicpm-o-2_6

INT8-CW

238

564.2

77.4

12.92

phi-4

INT4-MIXED

32

111.2

77.4

12.92

lcm-dreamshaper-v7

INT8-CW

1024

80.1

77.9

12.84

lcm-dreamshaper-v7

INT8-CW

32

79

78.1

12.80

phi-4-reasoning

INT4-MIXED

32

114.1

78.9

12.67

internvl2-4b

FP16

297

283.7

79.1

12.64

codellama-7b

INT8-CW

1024

580.4

79.6

12.56

qwen1.5-14b-chat

INT4-MIXED

32

114.9

79.6

12.56

llama-2-7b-chat-hf

INT8-CW

1024

593.3

79.7

12.55

lcm-dreamshaper-v7

INT8-CW

1024

80.4

79.8

12.53

lcm-dreamshaper-v7

INT8-CW

32

80.7

79.8

12.53

phi-4-mini-instruct

FP16

32

89.1

79.8

12.53

phi-4-mini-reasoning

FP16

32

89.6

79.9

12.52

gemma-3-12b-it

INT4-MIXED

1024

1162.9

80.8

12.38

phi-3.5-vision-instruct

FP16

802

678.1

80.8

12.38

phi-4-reasoning

INT4-MIXED

32

122.9

80.9

12.36

phi-3.5-mini-instruct

FP16

1024

575.1

81

12.35

mistral-7b-instruct-v0.1

INT8-CW

32

95.3

81.1

12.33

phi-3-mini-128k-instruct

FP16

1024

576.8

81.1

12.33

phi-4

INT4-MIXED

1024

1142.3

81.1

12.33

zephyr-7b-beta

INT8-CW

32

94.4

81.1

12.33

neural-chat-7b-v3-3

INT8-CW

32

94.2

81.2

12.32

mistral-7b-instruct-v0.3

INT8-CW

32

94.1

81.3

12.30

mistral-7b-instruct-v0.2

INT8-CW

32

93.6

81.4

12.29

phi-3-mini-4k-instruct

FP16

1024

572.6

81.4

12.29

internvl2-4b

FP16

1027

850

81.6

12.25

phi-4

INT4-MIXED

32

123.8

81.6

12.25

phi-3.5-vision-instruct

FP16

1032

866.4

81.8

12.22

biomistral-7b-slerp

INT8-CW

7

85.7

81.9

12.21

phi-4-mini-instruct

FP16

1024

552.3

82

12.20

phi-4-mini-reasoning

FP16

1024

557

82.2

12.17

phi-4-reasoning

INT4-MIXED

1024

1161.8

82.7

12.09

baichuan2-7b-chat

INT8-CW

32

93.4

82.8

12.08

gemma-3-4b-it

FP16

32

105.2

82.8

12.08

deepseek-r1-distill-qwen-14b

INT4-MIXED

32

113.2

83.1

12.03

mistral-7b-instruct-v0.3

INT8-CW

1024

609.4

83.4

11.99

fara-7b

INT8-CW

32

215.4

83.5

11.98

mistral-7b-instruct-v0.1

INT8-CW

1025

656.8

83.5

11.98

zephyr-7b-beta

INT8-CW

1024

605.3

83.5

11.98

qwen2.5-vl-7b-instruct

INT8-CW

32

191.2

83.7

11.95

mistral-7b-instruct-v0.2

INT8-CW

1024

606.4

83.8

11.93

neural-chat-7b-v3-3

INT8-CW

1024

603.8

83.8

11.93

bloomz-7b1

INT8-CW

32

96.2

84.3

11.86

gemma-3-4b-it

FP16

1024

550.8

84.3

11.86

fara-7b

INT8-CW

1024

835

84.9

11.78

llama-3.1-8b-instruct

INT8-CW

32

98.2

85

11.76

llama-3-8b-instruct

INT8-CW

32

98.5

85.2

11.74

phi-4

INT4-MIXED

1024

1326.6

85.3

11.72

qwen1.5-14b-chat

INT4-MIXED

1024

1166

85.4

11.71

baichuan2-7b-chat

INT8-CW

1024

645.6

85.5

11.70

phi-4-reasoning

INT4-MIXED

1024

1206.1

85.6

11.68

qwen3-4b

FP16

32

93.8

85.7

11.67

phi-4-reasoning

INT4-MIXED

32

123.5

86.2

11.60

deepseek-r1-distill-llama-8b

INT8-CW

32

98.5

86.3

11.59

qwen2.5-vl-7b-instruct

INT8-CW

1024

697.1

86.7

11.53

bloomz-7b1

INT8-CW

1024

798.3

86.9

11.51

deepseek-r1-distill-qwen-14b

INT4-MIXED

1024

1217.4

87.2

11.47

glm-edge-4b-chat

FP16

32

100.5

87.2

11.47

llama-3-8b-instruct

INT8-CW

1024

619.3

87.6

11.42

llama-3.1-8b-instruct

INT8-CW

1024

614.4

87.6

11.42

qwen3-4b

FP16

1024

553.4

88.5

11.30

deepseek-r1-distill-llama-8b

INT8-CW

1024

616.3

88.8

11.26

llava-v1.6-mistral-7b-hf

INT8-CW

2944

2916.3

88.8

11.26

starcoder

INT4-MIXED

32

140.3

89.2

11.21

glm-edge-4b-chat

FP16

1024

704.6

89.7

11.15

llama-2-13b-chat-hf

INT4-MIXED

32

120.4

89.7

11.15

phi-4-reasoning

INT4-MIXED

1024

1371.4

89.9

11.12

starcoder2-15b

INT4-MIXED

32

144.6

90.7

11.03

afm-4.5b

FP16

32

100.3

90.9

11.00

qwen3-8b

INT8-CW

32

101.4

91.3

10.95

afm-4.5b

FP16

1024

498

91.8

10.89

minicpm-v-4_5

INT8-CW

217

586.8

91.9

10.88

starcoder

INT4-MIXED

1024

1437.6

92.8

10.78

qwen3-8b

INT8-CW

1024

657.4

92.9

10.76

llava-next-video-7b-hf

INT8-CW

2945

3258.6

93.1

10.74

gemma-7b-it

INT8-CW

32

114.5

93.2

10.73

llama-2-13b-chat-hf

INT4-MIXED

1024

1204

93.9

10.65

minicpm4-8b

INT8-CW

32

106.7

94.6

10.57

starcoder2-15b

INT4-MIXED

1024

1455.4

94.6

10.57

phi-4-multimodal-instruct

FP16

578

629.7

94.7

10.56

phi-4-multimodal-instruct

FP16

786

767.6

95

10.53

minicpm3-4b

FP16

32

177.5

95.7

10.45

minicpm4-8b

INT8-CW

1024

716.2

96.5

10.36

glm-4-9b-chat-hf

INT8-CW

32

114.3

97.3

10.28

lcm-dreamshaper-v7

FP16

32

86.1

97.3

10.28

gemma-7b-it

INT8-CW

1024

757.2

97.7

10.24

lcm-dreamshaper-v7

FP16

1024

94.1

97.7

10.24

phi-4-multimodal-instruct

FP16

1570

1675

97.9

10.21

phi-4-multimodal-instruct

FP16

1362

1524.4

98.6

10.14

Topology

Precision

Input Size

1st latency (ms)

2nd latency (ms)

2nd token per sec

minicpm4-0.5b

INT4-MIXED

32

22

7.3

136.99

minicpm4-0.5b

INT4-MIXED

32

21.9

7.5

133.33

minicpm4-0.5b

INT4-MIXED

32

23.4

7.7

129.87

gemma-3-270m

INT4-MIXED

1024

74.2

8.2

121.95

minicpm4-0.5b

INT4-MIXED

1024

111.5

8.4

119.05

gemma-3-270m

INT4-MIXED

32

23.7

8.5

117.65

minicpm4-0.5b

INT4-MIXED

1024

116.9

8.6

116.28

minicpm4-0.5b

INT4-MIXED

1024

142.4

8.8

113.64

gemma-3-270m

INT8-CW

1024

71.1

9

111.11

qwen2.5-coder-0.5b-instruct

INT4-MIXED

32

22.9

9

111.11

qwen2.5-coder-0.5b-instruct

INT4-MIXED

32

24.1

9.1

109.89

gemma-3-270m

INT8-CW

32

23.5

9.2

108.70

qwen2.5-coder-0.5b-instruct

INT4-MIXED

32

24.1

9.3

107.53

qwen2.5-coder-0.5b-instruct

INT4-MIXED

1024

116.7

9.8

102.04

qwen2.5-coder-0.5b-instruct

INT4-MIXED

1024

118.7

10

100.00

qwen2.5-coder-0.5b-instruct

INT4-MIXED

1024

147.3

10.1

99.01

minicpm4-0.5b

INT8-CW

32

23.5

10.4

96.15

whisper-small

INT4-MIXED

prompt0

140.5

10.5

95.24

whisper-small

INT4-MIXED

prompt1

196.1

11

90.91

whisper-small

INT4-MIXED

prompt1

204

11.4

87.72

minicpm4-0.5b

INT8-CW

1024

129.2

11.5

86.96

whisper-small

INT4-MIXED

prompt0

143.7

11.5

86.96

whisper-small

INT4-MIXED

prompt0

157.4

11.6

86.21

whisper-small

INT8-CW

prompt1

209.3

11.6

86.21

whisper-small

INT4-MIXED

prompt1

223.1

11.7

85.47

whisper-small

INT8-CW

prompt0

170

11.8

84.75

qwen2.5-coder-0.5b-instruct

INT8-CW

32

24.9

12.1

82.64

gemma-3-270m

FP16

32

23.6

12.3

81.30

nanollava

INT4-MIXED

760

250.2

12.4

80.65

whisper-large-v3-turbo

INT8-CW

prompt1

484.9

12.4

80.65

whisper-large-v3-turbo

INT4-MIXED

prompt1

508.3

12.7

78.74

whisper-large-v3-turbo

INT8-CW

prompt0

421.2

12.7

78.74

whisper-large-v3-turbo

INT4-MIXED

prompt0

452.6

12.8

78.13

gemma-3-270m

FP16

1024

79.8

12.9

77.52

qwen2.5-coder-0.5b-instruct

INT8-CW

1024

135.3

12.9

77.52

tiny-llama-1.1b-chat

INT4-MIXED

32

30

13.2

75.76

nanollava

INT8-CW

760

239.5

13.7

72.99

qwen3.5-0.8b

INT4-MIXED

85

92.7

14.3

69.93

nanollava

INT4-MIXED

1752

485.2

14.4

69.44

qwen3.5-0.8b

INT4-MIXED

1024

380.5

14.4

69.44

qwen3.5-0.8b

INT4-MIXED

85

92.4

14.4

69.44

whisper-small

FP16

prompt0

143.4

14.6

68.49

whisper-small

FP16

prompt1

204.3

14.7

68.03

qwen3.5-0.8b

INT4-MIXED

85

97.4

14.8

67.57

qwen3.5-0.8b

INT4-MIXED

1024

384.7

14.8

67.57

qwen3.5-0.8b

INT4-MIXED

1024

434.7

15.1

66.23

tiny-llama-1.1b-chat

INT4-MIXED

1024

240.6

15.1

66.23

nanollava

INT8-CW

1752

458.6

15.8

63.29

tiny-llama-1.1b-chat

INT4-MIXED

32

34.3

16

62.50

llama-3.2-1b-instruct

INT4-MIXED

32

33.1

16.4

60.98

whisper-large-v3-turbo

FP16

prompt0

454.7

16.4

60.98

whisper-large-v3-turbo

FP16

prompt1

521.8

16.4

60.98

llama-3.2-1b-instruct

INT4-MIXED

32

33.8

16.6

60.24

llama-3.2-1b-instruct

INT4-MIXED

1024

226.8

17.7

56.50

tiny-llama-1.1b-chat

INT4-MIXED

1024

320.6

17.9

55.87

gemma-3-1b-it

INT4-MIXED

32

33.3

18

55.56

llama-3.2-1b-instruct

INT4-MIXED

1024

271

18

55.56

qwen3.5-0.8b

INT8-CW

85

98.6

18

55.56

qwen3.5-0.8b

INT8-CW

1024

398.2

18.6

53.76

gemma-3-1b-it

INT4-MIXED

1024

217.3

18.8

53.19

gemma-3-1b-it

INT4-MIXED

32

34.9

18.8

53.19

gemma-3-1b-it

INT4-MIXED

32

35

19

52.63

minicpm4-0.5b

FP16

32

23.5

19.4

51.55

gemma-3-1b-it

INT4-MIXED

1024

220.5

19.7

50.76

gemma-3-1b-it

INT4-MIXED

1024

253.8

19.9

50.25

minicpm-1b-sft

INT4-MIXED

31

57.6

20

50.00

glm-edge-1.5b-chat

INT4-MIXED

32

48.5

20.3

49.26

minicpm4-0.5b

FP16

1024

143.7

20.4

49.02

qwen2.5-coder-1.5b-instruct

INT4-MIXED

32

41.3

20.7

48.31

minicpm-1b-sft

INT4-MIXED

31

45.4

21

47.62

minicpm-1b-sft

INT4-MIXED

31

47.7

21.4

46.73

tiny-llama-1.1b-chat

INT8-CW

32

39.9

21.4

46.73

qwen2.5-coder-1.5b-instruct

INT4-MIXED

32

45.5

21.7

46.08

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

32

45.3

21.8

45.87

qwen2.5-1.5b-instruct

INT4-MIXED

32

44.7

21.8

45.87

nanollava

FP16

760

285.1

21.9

45.66

qwen2.5-coder-0.5b-instruct

FP16

32

27.5

22

45.45

qwen2.5-coder-1.5b-instruct

INT4-MIXED

32

45.4

22

45.45

glm-edge-1.5b-chat

INT4-MIXED

1024

559.9

22.1

45.25

qwen2.5-coder-1.5b-instruct

INT4-MIXED

1024

325.4

22.2

45.05

qwen2.5-coder-0.5b-instruct

FP16

1024

156.5

22.5

44.44

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

1024

357.9

23.2

43.10

minicpm-1b-sft

INT4-MIXED

1014

358.6

23.2

43.10

qwen2.5-1.5b-instruct

INT4-MIXED

1024

359.1

23.2

43.10

qwen2.5-coder-1.5b-instruct

INT4-MIXED

1024

360

23.2

43.10

qwen2.5-coder-1.5b-instruct

INT4-MIXED

1024

377.7

23.4

42.74

tiny-llama-1.1b-chat

INT8-CW

1024

272.3

23.4

42.74

qwen2.5-1.5b-instruct

INT4-MIXED

32

47.9

23.5

42.55

gemma-3-1b-it

INT8-CW

32

44.5

23.9

41.84

nanollava

FP16

1752

512.2

23.9

41.84

minicpm-1b-sft

INT4-MIXED

1014

369.9

24

41.67

minicpm-1b-sft

INT4-MIXED

1014

420.4

24.7

40.49

gemma-3-1b-it

INT8-CW

1024

258.7

24.8

40.32

qwen2.5-1.5b-instruct

INT4-MIXED

1024

381.7

24.8

40.32

llama-3.2-1b-instruct

INT8-CW

32

44.2

25.2

39.68

llama-3.2-1b-instruct

INT8-CW

1024

260.8

26.7

37.45

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

32

62.6

30.8

32.47

qwen3.5-0.8b

FP16

85

100.3

31.3

31.95

glm-edge-1.5b-chat

INT8-CW

32

62.8

31.7

31.55

minicpm-1b-sft

INT8-CW

31

59

31.8

31.45

qwen3.5-0.8b

FP16

1024

421.9

31.8

31.45

deepseek-r1-distill-qwen-1.5b

INT4-MIXED

1024

369.8

32.2

31.06

deepseek-r1-distill-qwen-1.5b

INT8-CW

32

62.2

33.1

30.21

qwen2.5-1.5b-instruct

INT8-CW

32

61.6

33.1

30.21

qwen2.5-coder-1.5b-instruct

INT8-CW

32

62.6

33.2

30.12

glm-edge-1.5b-chat

INT8-CW

1024

560.5

33.5

29.85

gemma-2b-it

INT4-MIXED

32

54.6

33.7

29.67

phi-2

INT4-MIXED

32

65

33.7

29.67

stable-zephyr-3b-dpo

INT4-MIXED

32

63.6

34

29.41

stablelm-3b-4e1t

INT4-MIXED

32

64.4

34

29.41

deepseek-r1-distill-qwen-1.5b

INT8-CW

1024

359.4

34.5

28.99

qwen2.5-1.5b-instruct

INT8-CW

1024

361.9

34.5

28.99

qwen2.5-coder-1.5b-instruct

INT8-CW

1024

360.5

34.5

28.99

minicpm-1b-sft

INT8-CW

1014

410.9

34.9

28.65

gemma-2b-it

INT4-MIXED

1024

400.7

35.3

28.33

gemma-2b-it

INT4-MIXED

32

66.3

35.4

28.25

gemma-2-2b

INT4-MIXED

33

57.3

37

27.03

gemma-2b-it

INT4-MIXED

1024

493.6

37

27.03

whisper-large-v3

INT4-MIXED

prompt0

579.9

37.4

26.74

stable-zephyr-3b-dpo

INT4-MIXED

32

78.8

38

26.32

whisper-large-v3

INT4-MIXED

prompt1

642.5

38.2

26.18

phi-2

INT4-MIXED

1024

770

38.4

26.04

stable-zephyr-3b-dpo

INT4-MIXED

1024

835.6

38.7

25.84

stablelm-3b-4e1t

INT4-MIXED

1024

837.3

38.8

25.77

gemma-2-2b

INT4-MIXED

33

58.4

39.3

25.45

llama-3.2-3b-instruct

INT4-MIXED

32

69.1

39.8

25.13

phi-2

INT4-MIXED

32

84.2

40

25.00

gemma-2-2b

INT4-MIXED

1025

539.7

40.1

24.94

qwen2.5-coder-3b-instruct

INT4-MIXED

32

71.5

40.2

24.88

tiny-llama-1.1b-chat

FP16

32

49.9

40.7

24.57

llama-3.2-3b-instruct

INT4-MIXED

32

82

40.9

24.45

qwen2.5-coder-3b-instruct

INT4-MIXED

32

81.6

40.9

24.45

llama-3.2-3b-instruct

INT4-MIXED

32

81.9

41.5

24.10

tiny-llama-1.1b-chat

FP16

1024

336.1

42.2

23.70

qwen2.5-coder-3b-instruct

INT4-MIXED

1024

536.7

42.3

23.64

gemma-2-2b

INT4-MIXED

1025

616.3

42.4

23.58

stable-zephyr-3b-dpo

INT4-MIXED

1024

940.7

42.7

23.42

llama-3.2-3b-instruct

INT4-MIXED

1024

589.4

42.8

23.36

stablelm-3b-4e1t

INT4-MIXED

32

88.9

42.9

23.31

qwen2.5-coder-3b-instruct

INT4-MIXED

1024

556

43.2

23.15

gemma-3-1b-it

FP16

32

52.9

43.3

23.09

whisper-large-v3

INT8-CW

prompt0

551.3

43.3

23.09

whisper-large-v3

INT8-CW

prompt1

608.2

43.8

22.83

gemma-3-1b-it

FP16

1024

307.9

43.9

22.78

llama-3.2-3b-instruct

INT4-MIXED

1024

622.1

43.9

22.78

phi-3-mini-4k-instruct

INT4-MIXED

32

80.2

44.2

22.62

phi-3-mini-128k-instruct

INT4-MIXED

32

81.5

44.3

22.57

phi-3.5-mini-instruct

INT4-MIXED

32

79.1

44.3

22.57

llama-3.2-3b-instruct

INT4-MIXED

1024

706.3

44.4

22.52

phi-2

INT4-MIXED

1024

1016.2

44.7

22.37

phi-3-mini-4k-instruct

INT4-MIXED

32

96.5

45.6

21.93

phi-3.5-mini-instruct

INT4-MIXED

32

96.8

45.7

21.88

phi-3-mini-128k-instruct

INT4-MIXED

32

101.7

46.3

21.60

gemma-4-e2b-it

INT4-MIXED

274

816.8

47

21.28

stablelm-3b-4e1t

INT4-MIXED

1024

952.3

47.6

21.01

qwen2.5-coder-3b-instruct

INT4-MIXED

32

95.5

48

20.83

llama-3.2-1b-instruct

FP16

32

57.7

48.4

20.66

internvl2-4b

INT4-MIXED

297

496.7

48.8

20.49

gemma-4-e2b-it

INT4-MIXED

274

870.6

48.9

20.45

internvl2-4b

INT4-MIXED

297

535.7

49.3

20.28

phi-4-mini-instruct

INT4-MIXED

32

85.9

49.6

20.16

phi-4-mini-reasoning

INT4-MIXED

32

88.8

49.7

20.12

llama-3.2-1b-instruct

FP16

1024

331.3

49.9

20.04

qwen3-4b

INT4-MIXED

32

88.3

50.1

19.96

qwen2.5-coder-3b-instruct

INT4-MIXED

1024

1029.7

50.3

19.88

gemma-2b-it

INT8-CW

32

88.5

50.5

19.80

phi-3-mini-4k-instruct

INT4-MIXED

1024

910.5

50.6

19.76

phi-3.5-mini-instruct

INT4-MIXED

1024

911.3

50.6

19.76

phi-3-mini-128k-instruct

INT4-MIXED

1024

911.1

50.7

19.72

phi-4-mini-instruct

INT4-MIXED

32

102

50.8

19.69

phi-4-mini-reasoning

INT4-MIXED

32

100.3

51

19.61

phi-4-mini-reasoning

INT4-MIXED

32

101.8

51.3

19.49

phi-3-mini-4k-instruct

INT4-MIXED

1024

948

51.9

19.27

qwen3-4b

INT4-MIXED

32

104.1

51.9

19.27

phi-3.5-mini-instruct

INT4-MIXED

1024

951.2

52.1

19.19

gemma-2b-it

INT8-CW

1024

483.4

52.2

19.16

gemma-3-4b-it

INT4-MIXED

32

96.1

52.3

19.12

glm-edge-4b-chat

INT4-MIXED

32

119.6

52.6

19.01

phi-3-mini-128k-instruct

INT4-MIXED

1024

1162.5

52.7

18.98

phi-4-mini-instruct

INT4-MIXED

1024

905.2

52.9

18.90

phi-3.5-vision-instruct

INT4-MIXED

802

1189

53

18.87

phi-4-mini-reasoning

INT4-MIXED

1024

905.2

53

18.87

gemma-3-4b-it

INT4-MIXED

32

105.3

53.8

18.59

phi-3-mini-4k-instruct

INT4-MIXED

32

115.2

53.9

18.55

afm-4.5b

INT4-MIXED

32

104.8

54.1

18.48

phi-3.5-mini-instruct

INT4-MIXED

32

117.6

54.1

18.48

gemma-3-4b-it

INT4-MIXED

32

106.2

54.2

18.45

phi-4-mini-instruct

INT4-MIXED

1024

938.4

54.2

18.45

phi-4-mini-reasoning

INT4-MIXED

1024

943.8

54.3

18.42

internvl2-4b

INT4-MIXED

1027

1355.1

54.6

18.32

phi-4-mini-reasoning

INT4-MIXED

1024

1019.6

54.7

18.28

internvl2-4b

INT4-MIXED

1027

1381.9

54.9

18.21

phi-3.5-vision-instruct

INT4-MIXED

1032

1423.9

54.9

18.21

qwen3-4b

INT4-MIXED

1024

822.2

55

18.18

phi-2

INT8-CW

32

105.5

55.7

17.95

gemma-3-4b-it

INT4-MIXED

1024

869.7

55.9

17.89

gemma-2-2b

INT8-CW

33

106

56

17.86

stable-zephyr-3b-dpo

INT8-CW

32

106

56.3

17.76

stablelm-3b-4e1t

INT8-CW

32

106.1

56.3

17.76

qwen3-4b

INT4-MIXED

1024

860.8

56.5

17.70

glm-edge-4b-chat

INT4-MIXED

1024

1419.2

56.7

17.64

minicpm-1b-sft

FP16

31

68

56.8

17.61

afm-4.5b

INT4-MIXED

1024

923.1

57.4

17.42

gemma-3-4b-it

INT4-MIXED

1024

892.1

57.5

17.39

gemma-3-4b-it

INT4-MIXED

1024

957.5

57.9

17.27

glm-edge-1.5b-chat

FP16

32

75.7

58.5

17.09

gemma-2-2b

INT8-CW

1025

680.4

59

16.95

phi-3-mini-4k-instruct

INT4-MIXED

1024

1284.3

60.2

16.61

minicpm-1b-sft

FP16

1014

494.4

60.3

16.58

minicpm3-4b

INT4-MIXED

32

121.2

60.3

16.58

phi-2

INT8-CW

1024

889

60.4

16.56

phi-3.5-mini-instruct

INT4-MIXED

1024

1397.6

60.4

16.56

stablelm-3b-4e1t

INT8-CW

1024

949.6

60.9

16.42

stable-zephyr-3b-dpo

INT8-CW

1024

961.6

61

16.39

glm-edge-1.5b-chat

FP16

1024

606.8

61.1

16.37

qwen2.5-coder-3b-instruct

INT8-CW

32

112

61.6

16.23

deepseek-r1-distill-qwen-1.5b

FP16

32

74.6

61.9

16.16

qwen2.5-1.5b-instruct

FP16

32

75.3

61.9

16.16

qwen2.5-coder-1.5b-instruct

FP16

32

74.9

62

16.13

minicpm3-4b

INT4-MIXED

32

131.1

62.4

16.03

minicpm3-4b

INT4-MIXED

32

134.2

63

15.87

qwen2.5-1.5b-instruct

FP16

1024

452.8

63.2

15.82

qwen2.5-coder-1.5b-instruct

FP16

1024

455.1

63.2

15.82

deepseek-r1-distill-qwen-1.5b

FP16

1024

455.1

63.3

15.80

phi-4-multimodal-instruct

INT4-MIXED

578

1062.4

63.6

15.72

qwen2.5-coder-3b-instruct

INT8-CW

1024

668.2

63.7

15.70

whisper-large-v3

FP16

prompt0

588.1

64.1

15.60

phi-4-multimodal-instruct

INT4-MIXED

786

1271.9

64.3

15.55

whisper-large-v3

FP16

prompt1

657.1

64.4

15.53

chatglm3-6b

INT4-MIXED

32

112.2

65.4

15.29

llama-3.2-3b-instruct

INT8-CW

32

114.2

65.7

15.22

phi-4-multimodal-instruct

INT4-MIXED

1362

2567.3

66.4

15.06

gemma-4-e2b-it

INT4-MIXED

1024

1346.4

66.5

15.04

phi-4-multimodal-instruct

INT4-MIXED

1570

2829.5

67

14.93

chatglm3-6b

INT4-MIXED

32

141.6

68.4

14.62

gemma-4-e2b-it

INT4-MIXED

1024

1410

68.6

14.58

gemma-4-e2b-it

INT8-CW

274

936.3

68.6

14.58

llama-3.2-3b-instruct

INT8-CW

1024

740.9

68.6

14.58

chatglm3-6b

INT4-MIXED

1024

1166.7

68.8

14.53

chatglm3-6b

INT4-MIXED

1024

1265.5

71.3

14.03

minicpm3-4b

INT4-MIXED

1024

2123

72.8

13.74

llama-2-7b-chat-hf

INT4-MIXED

32

128.7

72.9

13.72

codellama-7b

INT4-MIXED

32

124.7

73.2

13.66

minicpm3-4b

INT4-MIXED

1024

2173.9

74.9

13.35

qwen3-vl-4b-instruct

INT4-MIXED

4907

14047.4

75.4

13.26

qwen3-vl-4b-instruct

INT4-MIXED

4937

14752.4

75.4

13.26

biomistral-7b-slerp

INT4-MIXED

7

84.6

75.5

13.25

llama-2-7b-chat-hf

INT4-MIXED

32

157.6

75.5

13.25

minicpm3-4b

INT4-MIXED

1024

2302.7

75.5

13.25

qwen3_8b_eagle3

INT4-MIXED

32

192.1

75.9

13.18

falcon-7b-instruct

INT4-MIXED

32

134.9

76.1

13.14

phi-4-multimodal-instruct

INT4-MIXED

578

1310.8

76.1

13.14

stable-diffusion-xl-1.0-inpainting-0.1

INT8-CW

32

81.5

76.1

13.14

mistral-7b-instruct-v0.2

INT4-MIXED

32

135.2

76.2

13.12

neural-chat-7b-v3-3

INT4-MIXED

32

135.7

76.2

13.12

zephyr-7b-beta

INT4-MIXED

32

134

76.3

13.11

mistral-7b-instruct-v0.3

INT4-MIXED

32

137.2

76.4

13.09

codellama-7b

INT4-MIXED

32

158.8

76.5

13.07

phi-4-multimodal-instruct

INT4-MIXED

786

1558.9

76.8

13.02

phi-3-mini-128k-instruct

INT8-CW

32

142.2

77

12.99

qwen3-vl-4b-instruct

INT4-MIXED

4907

14219.3

77

12.99

phi-3-mini-4k-instruct

INT8-CW

32

144.4

77.1

12.97

phi-3.5-mini-instruct

INT8-CW

32

144.1

77.1

12.97

qwen3-vl-4b-instruct

INT4-MIXED

4937

14906.6

77.1

12.97

qwen3-vl-4b-instruct

INT4-MIXED

4937

15704.7

78

12.82

qwen3-vl-4b-instruct

INT4-MIXED

4907

14768.1

78.1

12.80

mistral-7b-instruct-v0.3

INT4-MIXED

32

169

78.7

12.71

phi-4-mini-instruct

INT8-CW

32

138.5

78.7

12.71

mistral-7b-instruct-v0.2

INT4-MIXED

32

168.9

79

12.66

phi-4-mini-reasoning

INT8-CW

32

139.1

79

12.66

neural-chat-7b-v3-3

INT4-MIXED

32

171.3

79.5

12.58

mistral-7b-instruct-v0.2

INT4-MIXED

32

173.6

79.6

12.56

mistral-7b-instruct-v0.3

INT4-MIXED

32

172.5

79.6

12.56

mistral-7b-instruct-v0.1

INT4-MIXED

32

173.5

79.7

12.55

llama-2-7b-chat-hf

INT4-MIXED

1024

1176.4

79.8

12.53

phi-4-multimodal-instruct

INT4-MIXED

1362

3183

79.9

12.52

codellama-7b

INT4-MIXED

1024

1180.7

80

12.50

falcon-7b-instruct

INT4-MIXED

1024

1270.5

80

12.50

gemma-3-4b-it

INT8-CW

32

152.9

80

12.50

qwen2.5-7b-instruct

INT4-MIXED

32

151.9

80.1

12.48

qwen2.5-7b-instruct-1m

INT4-MIXED

32

146.9

80.1

12.48

phi-4-multimodal-instruct

INT4-MIXED

1570

3482.3

80.2

12.47

mistral-7b-instruct-v0.2

INT4-MIXED

1024

1270.3

80.4

12.44

zephyr-7b-beta

INT4-MIXED

1024

1281.7

80.4

12.44

stable-diffusion-xl-1.0-inpainting-0.1

INT8-CW

32

83.7

80.5

12.42

mistral-7b-instruct-v0.3

INT4-MIXED

1024

1261.5

80.7

12.39

neural-chat-7b-v3-3

INT4-MIXED

1024

1267

80.7

12.39

qwen3-4b

INT8-CW

32

147.6

81.4

12.29

biomistral-7b-slerp

INT4-MIXED

7

91.1

81.7

12.24

minicpm-v-2_6

INT4-MIXED

228

1203.8

81.7

12.24

minicpm-o-2_6

INT4-MIXED

238

1197.9

81.9

12.21

internvl2-4b

INT8-CW

297

624

82

12.20

phi-4-mini-instruct

INT8-CW

1024

1055.2

82

12.20

llama-2-7b-chat-hf

INT4-MIXED

1024

1282.3

82.3

12.15

phi-4-mini-reasoning

INT8-CW

1024

1077

82.4

12.14

qwen2-7b-instruct

INT4-MIXED

32

170.7

82.6

12.11

qwen2.5-7b-instruct

INT4-MIXED

32

173.4

82.7

12.09

deepseek-r1-distill-qwen-7b

INT4-MIXED

32

172.1

82.8

12.08

mistral-7b-instruct-v0.3

INT4-MIXED

1024

1360.8

82.9

12.06

mistral-7b-instruct-v0.2

INT4-MIXED

1024

1358.3

83.1

12.03

qwen2.5-7b-instruct

INT4-MIXED

1024

1228.6

83.1

12.03

phi-3-mini-128k-instruct

INT8-CW

1024

1227.5

83.2

12.02

qwen2.5-7b-instruct-1m

INT4-MIXED

1024

1225.6

83.2

12.02

llama-3-8b-instruct

INT4-MIXED

32

143.7

83.3

12.00

llama-3.1-8b-instruct

INT4-MIXED

32

147.7

83.3

12.00

codellama-7b

INT4-MIXED

1024

1412.5

83.4

11.99

deepseek-r1-distill-qwen-7b

INT4-MIXED

32

174.2

83.4

11.99

phi-3-mini-4k-instruct

INT8-CW

1024

1227.1

83.4

11.99

phi-3.5-mini-instruct

INT8-CW

1024

1214.6

83.4

11.99

qwen2.5-7b-instruct-1m

INT4-MIXED

32

176.1

83.4

11.99

mistral-7b-instruct-v0.2

INT4-MIXED

1024

1546.4

83.6

11.96

phi-3.5-vision-instruct

INT8-CW

802

1269.2

83.7

11.95

deepseek-r1-distill-llama-8b

INT4-MIXED

32

143.7

83.8

11.93

gemma-3-4b-it

INT8-CW

1024

1020

83.8

11.93

qwen2-7b-instruct

INT4-MIXED

32

176

83.8

11.93

qwen2.5-7b-instruct

INT4-MIXED

32

176.4

83.8

11.93

neural-chat-7b-v3-3

INT4-MIXED

1024

1544.8

83.9

11.92

mistral-7b-instruct-v0.1

INT4-MIXED

1025

1667.5

84

11.90

mistral-7b-instruct-v0.3

INT4-MIXED

1024

1535.1

84

11.90

minicpm-v-2_6

INT4-MIXED

228

1231.1

84.3

11.86

qwen3_8b_eagle3

INT4-MIXED

1024

1632.2

84.7

11.81

phi-3.5-vision-instruct

INT8-CW

1032

1567.2

85.3

11.72

qwen2-7b-instruct

INT4-MIXED

1024

1303.6

85.5

11.70

deepseek-r1-distill-qwen-7b

INT4-MIXED

1024

1302.8

85.6

11.68

minicpm-v-2_6

INT4-MIXED

228

1232.2

85.6

11.68

minicpm-o-2_6

INT4-MIXED

238

1227.5

85.7

11.67

qwen2.5-7b-instruct

INT4-MIXED

1024

1294.4

85.7

11.67

deepseek-r1-distill-llama-8b

INT4-MIXED

32

175.4

85.9

11.64

llama-3-8b-instruct

INT4-MIXED

32

178.4

86

11.63

qwen3-8b

INT4-MIXED

32

146.8

86

11.63

llama-3-8b-instruct

INT4-MIXED

32

175.9

86.1

11.61

llama-3.1-8b-instruct

INT4-MIXED

32

174.7

86.1

11.61

qwen3-4b

INT8-CW

1024

997.8

86.1

11.61

minicpm4-8b

INT4-MIXED

32

157.6

86.3

11.59

deepseek-r1-distill-qwen-7b

INT4-MIXED

1024

1480.4

86.4

11.57

qwen2.5-7b-instruct-1m

INT4-MIXED

1024

1475.2

86.4

11.57

qwen2.5-vl-7b-instruct

INT4-MIXED

32

239.3

86.4

11.57

qwen2.5-7b-instruct

INT4-MIXED

1024

1482.6

86.5

11.56

qwen2-7b-instruct

INT4-MIXED

1024

1486.3

86.9

11.51

glm-edge-4b-chat

INT8-CW

32

166.8

87

11.49

llama-3-8b-instruct

INT4-MIXED

32

179

87

11.49

falcon-7b-instruct

INT4-MIXED

32

186.2

87.6

11.42

minicpm-v-4_5

INT4-MIXED

217

1233.2

87.6

11.42

llama-3.1-8b-instruct

INT4-MIXED

1024

1274.9

87.7

11.40

llama-3-8b-instruct

INT4-MIXED

1024

1279.6

87.8

11.39

bloomz-7b1

INT4-MIXED

32

140.6

87.9

11.38

deepseek-r1-distill-llama-8b

INT4-MIXED

1024

1305.5

87.9

11.38

qwen3-8b

INT4-MIXED

32

184.7

88.8

11.26

gemma-4-e2b-it

INT8-CW

1024

1452

89

11.24

minicpm4-8b

INT4-MIXED

32

190.2

89.1

11.22

qwen2.5-vl-7b-instruct

INT4-MIXED

1024

1400.2

89.3

11.20

afm-4.5b

INT8-CW

32

168.6

89.4

11.19

internvl2-4b

INT8-CW

1027

1554.7

89.4

11.19

qwen2.5-vl-7b-instruct

INT4-MIXED

32

257.1

89.4

11.19

fara-7b

INT4-MIXED

32

271.7

89.8

11.14

qwen2.5-vl-7b-instruct

INT4-MIXED

32

258.9

90

11.11

minicpm4-8b

INT4-MIXED

32

190

90.1

11.10

deepseek-r1-distill-llama-8b

INT4-MIXED

1024

1377

90.3

11.07

llama-3-8b-instruct

INT4-MIXED

1024

1370.4

90.3

11.07

llama-3.1-8b-instruct

INT4-MIXED

1024

1369.7

90.3

11.07

bloomz-7b1

INT4-MIXED

32

170.8

90.4

11.06

llama-3-8b-instruct

INT4-MIXED

1024

1481.2

90.4

11.06

minicpm4-8b

INT4-MIXED

1024

1345.2

90.4

11.06

zephyr-7b-beta

INT4-MIXED

32

181.7

90.6

11.04

glm-edge-4b-chat

INT8-CW

1024

1488.2

91.1

10.98

llama-3-8b-instruct

INT4-MIXED

1024

1562.5

91.3

10.95

minicpm-v-4_5

INT4-MIXED

217

1550.6

91.4

10.94

qwen3-8b

INT4-MIXED

1024

1310.6

91.4

10.94

falcon-7b-instruct

INT4-MIXED

1024

1604.6

91.6

10.92

qwen3-8b

INT4-MIXED

32

189.4

91.9

10.88

qwen2.5-vl-7b-instruct

INT4-MIXED

1024

1465.9

92.2

10.85

afm-4.5b

INT8-CW

1024

986.9

92.8

10.78

minicpm4-8b

INT4-MIXED

1024

1465.9

93

10.75

qwen2.5-vl-7b-instruct

INT4-MIXED

1024

1640

93.1

10.74

fara-7b

INT4-MIXED

1024

1974.8

93.4

10.71

qwen3-8b

INT4-MIXED

1024

1395.6

94

10.64

bloomz-7b1

INT4-MIXED

1024

1606.9

94.2

10.62

minicpm4-8b

INT4-MIXED

1024

1677.2

94.3

10.60

minicpm3-4b

INT8-CW

32

181

94.4

10.59

qwen3.5-9b

INT4-MIXED

83

334

94.8

10.55

zephyr-7b-beta

INT4-MIXED

1024

1418.2

94.9

10.54

phi-4-multimodal-instruct

INT8-CW

578

1249.2

96.4

10.37

qwen3.5-9b

INT4-MIXED

1024

2018

96.4

10.37

phi-4-multimodal-instruct

INT8-CW

786

1510.7

96.7

10.34

qwen3-8b

INT4-MIXED

1024

1622.1

96.8

10.33

bloomz-7b1

INT4-MIXED

1024

1927.3

96.9

10.32

glm-4-9b-chat-hf

INT4-MIXED

32

168.3

98.1

10.19

gemma-2b-it

FP16

32

115.3

98.9

10.11

baichuan2-7b-chat

INT4-MIXED

32

192.8

99.3

10.07

gemma-2b-it

FP16

1024

654.6

99.7

10.03

phi-4-multimodal-instruct

INT8-CW

1362

2977

100

10.00

All models listed here were tested with the following parameters:

  • Framework: PyTorch

  • Beam: 1

  • Batch size: 1
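
Judging from the rows above, the "2nd token per sec" column matches 1000 divided by the 2nd latency in milliseconds (for example, 49.9 ms corresponds to 20.04 tokens per second). The Python sketch below shows that conversion and a rough end-to-end estimate. The function names are illustrative and not part of OpenVINO. The estimate assumes that every token after the first takes the 2nd-token latency, which is a simplification rather than a documented property of these benchmarks.

```python
def tokens_per_second(second_token_latency_ms: float) -> float:
    """Convert a per-token latency in milliseconds to tokens per second."""
    if second_token_latency_ms <= 0:
        # Non-positive values cannot be converted to a throughput.
        raise ValueError("latency must be a positive number of milliseconds")
    return 1000.0 / second_token_latency_ms


def estimated_generation_ms(first_ms: float, second_ms: float, new_tokens: int) -> float:
    """Rough estimate: the first token costs first_ms, each later token second_ms."""
    if new_tokens < 1:
        raise ValueError("new_tokens must be at least 1")
    return first_ms + (new_tokens - 1) * second_ms


# Values copied from the zephyr-7b-beta, INT4-MIXED, input size 32 row.
first_latency_ms = 64.6
second_latency_ms = 49.9

print(round(tokens_per_second(second_latency_ms), 2))
print(round(estimated_generation_ms(first_latency_ms, second_latency_ms, 128), 1))
```

When comparing rows, note that the 1st latency grows with input size while the 2nd latency changes much less, so the throughput column mainly reflects steady-state generation speed.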