openvino_genai.SparseAttentionMode

class openvino_genai.SparseAttentionMode

Bases: pybind11_object

Represents the mode of sparse attention applied during generation.
param SparseAttentionMode.TRISHAPE:

Sparse attention is applied to the prefill stage only, retaining a configurable number of start and recent cache tokens. A configurable number of prefill tokens at the end of the prompt can have dense attention applied to them instead, to preserve generation accuracy.
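The retention rule described above can be pictured as a boolean attention mask. The following is a minimal pure-Python sketch, not the library's implementation: the function and its parameter names (`num_start`, `num_recent`, `num_last_dense`) are hypothetical, and it assumes that "recent" tokens are counted relative to each query position.

```python
def trishape_keep_mask(prompt_len, num_start, num_recent, num_last_dense):
    """Return mask[q][k], True where query q may attend to key k in prefill.

    Illustrative only: names and exact window semantics are assumptions.
    """
    mask = []
    for q in range(prompt_len):
        # The last `num_last_dense` prompt tokens use dense (full causal) attention.
        dense_query = q >= prompt_len - num_last_dense
        row = []
        for k in range(prompt_len):
            causal = k <= q                # never attend to future tokens
            in_start = k < num_start       # retained tokens at the prompt start
            in_recent = k > q - num_recent  # retained tokens just before the query
            row.append(causal and (dense_query or in_start or in_recent))
        mask.append(row)
    return mask


mask = trishape_keep_mask(prompt_len=8, num_start=2, num_recent=2, num_last_dense=1)
for row in mask:
    # "x" marks a retained query/key pair, "." marks a skipped one.
    print("".join("x" if keep else "." for keep in row))
```

Plotted this way, the retained region forms the start column, the diagonal band of recent tokens, and the dense rows at the bottom, which is where the mode's name comes from.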

param SparseAttentionMode.XATTENTION:

Following https://arxiv.org/pdf/2503.16428, introduces block sparsity into the prefill stage based on an importance score threshold. Computing the importance scores adds some overhead, but the time saved by skipping low-importance blocks is expected to outweigh it, reducing total inference time.
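To make the threshold idea concrete, here is a simplified sketch of threshold-based block selection for a single query block. It is an assumption-laden illustration rather than the library's or the paper's exact procedure: `select_blocks` is a hypothetical helper, the importance scores are taken as given non-negative numbers, and the rule shown is "keep the highest-scoring key blocks until their share of the total score reaches the threshold".

```python
def select_blocks(block_scores, threshold):
    """Return sorted indices of the key blocks to keep for one query block.

    Hypothetical helper: keeps the smallest prefix of blocks, ordered by
    descending score, whose normalized scores sum to at least `threshold`.
    """
    total = sum(block_scores)
    if total <= 0:
        # No usable signal: fall back to keeping every block (dense attention).
        return list(range(len(block_scores)))

    # Visit blocks from most to least important.
    order = sorted(range(len(block_scores)),
                   key=lambda i: block_scores[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        if mass >= threshold:
            break
        kept.append(i)
        mass += block_scores[i] / total
    return sorted(kept)


scores = [8, 1, 5, 1, 1]  # made-up importance scores for five key blocks
print(select_blocks(scores, threshold=0.8))  # blocks 0 and 2 carry 81.25% of the mass
print(select_blocks(scores, threshold=1.0))  # a threshold of 1.0 keeps every block
```

A lower threshold skips more blocks and saves more compute, at the cost of dropping more of the attention mass.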

Members:

TRISHAPE

XATTENTION

__init__(self: openvino_genai.py_openvino_genai.SparseAttentionMode, value: SupportsInt) -> None

Methods

__delattr__(name, /)

Implement delattr(self, name).

__dir__()

Default dir() implementation.

__eq__(self, other, /)

__format__(format_spec, /)

Default object formatter.

__ge__(value, /)

Return self>=value.

__getattribute__(name, /)

Return getattr(self, name).

__getstate__(self, /)

__gt__(value, /)

Return self>value.

__hash__(self, /)

__index__(self, /)

__init__(self, value)

__init_subclass__

This method is called when a class is subclassed.

__int__(self, /)

__le__(value, /)

Return self<=value.

__lt__(value, /)

Return self<value.

__ne__(self, other, /)

__new__(**kwargs)

__reduce__()

Helper for pickle.

__reduce_ex__(protocol, /)

Helper for pickle.

__repr__(self, /)

__setattr__(name, value, /)

Implement setattr(self, name, value).

__setstate__(self, state, /)

__sizeof__()

Size of object in memory, in bytes.

__str__(self, /)

__subclasshook__

Abstract classes can override this to customize issubclass().

_pybind11_conduit_v1_

Attributes

TRISHAPE

XATTENTION

__annotations__

__entries

__members__

name

value

TRISHAPE = <SparseAttentionMode.TRISHAPE: 0>
XATTENTION = <SparseAttentionMode.XATTENTION: 1>
__annotations__ = {}
__class__

alias of pybind11_type

__delattr__(name, /)

Implement delattr(self, name).

__dir__()

Default dir() implementation.

__eq__(self: object, other: object, /) -> bool
__format__(format_spec, /)

Default object formatter.

Return str(self) if format_spec is empty. Raise TypeError otherwise.

__ge__(value, /)

Return self>=value.

__getattribute__(name, /)

Return getattr(self, name).

__getstate__(self: object, /) -> int
__gt__(value, /)

Return self>value.

__hash__(self: object, /) -> int
__index__(self: openvino_genai.py_openvino_genai.SparseAttentionMode, /) -> int
__init__(self: openvino_genai.py_openvino_genai.SparseAttentionMode, value: SupportsInt) -> None
__init_subclass__()

This method is called when a class is subclassed.

The default implementation does nothing. It may be overridden to extend subclasses.

__int__(self: openvino_genai.py_openvino_genai.SparseAttentionMode, /) -> int
__le__(value, /)

Return self<=value.

__lt__(value, /)

Return self<value.

__members__ = {'TRISHAPE': <SparseAttentionMode.TRISHAPE: 0>, 'XATTENTION': <SparseAttentionMode.XATTENTION: 1>}
__ne__(self: object, other: object, /) -> bool
__new__(**kwargs)
__reduce__()

Helper for pickle.

__reduce_ex__(protocol, /)

Helper for pickle.

__repr__(self: object, /) -> str
__setattr__(name, value, /)

Implement setattr(self, name, value).

__setstate__(self: openvino_genai.py_openvino_genai.SparseAttentionMode, state: SupportsInt, /) -> None
__sizeof__()

Size of object in memory, in bytes.

__str__(self: object, /) -> str
__subclasshook__()

Abstract classes can override this to customize issubclass().

This is invoked early on by abc.ABCMeta.__subclasscheck__(). It should return True, False or NotImplemented. If it returns NotImplemented, the normal algorithm is used. Otherwise, it overrides the normal algorithm (and the outcome is cached).

_pybind11_conduit_v1_()
property name
property value
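The `name` and `value` properties, the integer constructor, and the `__members__` mapping are enough to convert between a mode, its integer value, and its textual name. The sketch below runs without the package installed by using a standard-library `IntEnum` as a stand-in with the same member names and values; this stand-in is an assumption for illustration, and unlike the real pybind11 enum it also compares equal to plain integers. The `parse_mode` helper is hypothetical. With the package installed, the stand-in class can be replaced by `from openvino_genai import SparseAttentionMode`.

```python
from enum import IntEnum


class SparseAttentionMode(IntEnum):
    """Stand-in mirroring the documented members; not the real binding."""
    TRISHAPE = 0
    XATTENTION = 1


def parse_mode(text):
    """Hypothetical helper: map a case-insensitive name to an enum member."""
    try:
        return SparseAttentionMode.__members__[text.strip().upper()]
    except KeyError:
        valid = ", ".join(SparseAttentionMode.__members__)
        raise ValueError(
            f"unknown sparse attention mode {text!r}; expected one of: {valid}"
        ) from None


mode = parse_mode("xattention")   # lookup by name through __members__
print(mode.name, mode.value)      # the name and value properties
print(int(mode))                  # integer conversion via __int__
print(SparseAttentionMode(0).name)  # construction from an integer value
```

Looking names up through `__members__` keeps the set of accepted strings in sync with the enum itself, so a mode added in a later release would be accepted without changing the helper.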