openvino_genai.SparseAttentionMode
- class openvino_genai.SparseAttentionMode
Bases: pybind11_object
- Represents the mode of sparse attention applied during generation.
- param SparseAttentionMode.TRISHAPE:
Sparse attention is applied to the prefill stage only, retaining a configurable number of start and recent cache tokens. A configurable number of prefill tokens at the end of the prompt can use dense attention instead, to preserve generation accuracy.
- param SparseAttentionMode.XATTENTION:
Following https://arxiv.org/pdf/2503.16428, introduces block sparsity into the prefill stage based on an importance-score threshold. Computing the importance scores adds overhead, but the savings from sparsity are expected to outweigh it and reduce total inference time.
Members:
TRISHAPE
XATTENTION
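As a rough illustration of how calling code might tell the two members apart, the sketch below branches on the mode. It falls back to a stand-in `enum.IntEnum` with the same member names and values when `openvino_genai` is not installed. `computes_importance_scores` is a hypothetical helper introduced here, not part of the library.

```python
try:
    from openvino_genai import SparseAttentionMode
except ImportError:
    # Stand-in for illustration only: mirrors the documented members and values.
    import enum

    class SparseAttentionMode(enum.IntEnum):
        TRISHAPE = 0
        XATTENTION = 1


def computes_importance_scores(mode):
    """Hypothetical helper: True for the mode that scores blocks during prefill."""
    return mode == SparseAttentionMode.XATTENTION


mode = SparseAttentionMode.TRISHAPE
print(mode.name, mode.value, computes_importance_scores(mode))
```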
- __init__(self: openvino_genai.py_openvino_genai.SparseAttentionMode, value: SupportsInt) -> None
Methods
__delattr__(name, /)            Implement delattr(self, name).
__dir__()                       Default dir() implementation.
__eq__(self, other, /)
__format__(format_spec, /)      Default object formatter.
__ge__(value, /)                Return self>=value.
__getattribute__(name, /)       Return getattr(self, name).
__getstate__(self, /)
__gt__(value, /)                Return self>value.
__hash__(self, /)
__index__(self, /)
__init__(self, value)
__init_subclass__()             This method is called when a class is subclassed.
__int__(self, /)
__le__(value, /)                Return self<=value.
__lt__(value, /)                Return self<value.
__ne__(self, other, /)
__new__(**kwargs)
__reduce__()                    Helper for pickle.
__reduce_ex__(protocol, /)      Helper for pickle.
__repr__(self, /)
__setattr__(name, value, /)     Implement setattr(self, name, value).
__setstate__(self, state, /)
__sizeof__()                    Size of object in memory, in bytes.
__str__(self, /)
__subclasshook__()              Abstract classes can override this to customize issubclass().
Attributes
__entries
- TRISHAPE = <SparseAttentionMode.TRISHAPE: 0>
- XATTENTION = <SparseAttentionMode.XATTENTION: 1>
- __annotations__ = {}
- __class__
alias of pybind11_type
- __delattr__(name, /)
Implement delattr(self, name).
- __dir__()
Default dir() implementation.
- __eq__(self: object, other: object, /) -> bool
- __format__(format_spec, /)
Default object formatter.
Return str(self) if format_spec is empty. Raise TypeError otherwise.
- __ge__(value, /)
Return self>=value.
- __getattribute__(name, /)
Return getattr(self, name).
- __getstate__(self: object, /) -> int
- __gt__(value, /)
Return self>value.
- __hash__(self: object, /) -> int
- __index__(self: openvino_genai.py_openvino_genai.SparseAttentionMode, /) -> int
- __init__(self: openvino_genai.py_openvino_genai.SparseAttentionMode, value: SupportsInt) -> None
- __init_subclass__()
This method is called when a class is subclassed.
The default implementation does nothing. It may be overridden to extend subclasses.
- __int__(self: openvino_genai.py_openvino_genai.SparseAttentionMode, /) -> int
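The integer-related entries above (`__init__` taking a `SupportsInt`, `__index__`, `__int__`) suggest a round trip between members and plain integers. A minimal sketch, assuming the documented values 0 and 1; the `labels` list is hypothetical, and the `enum.IntEnum` fallback is a stand-in used only when `openvino_genai` is not installed.

```python
try:
    from openvino_genai import SparseAttentionMode
except ImportError:
    # Stand-in for illustration only: mirrors the documented members and values.
    import enum

    class SparseAttentionMode(enum.IntEnum):
        TRISHAPE = 0
        XATTENTION = 1


mode = SparseAttentionMode(1)           # construct a member from its integer value
labels = ["tri-shape", "xattention"]    # hypothetical display labels, ordered by value

as_int = int(mode)                      # conversion through __int__
label = labels[mode]                    # __index__ lets a member index a sequence
print(as_int, label)
```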
- __le__(value, /)
Return self<=value.
- __lt__(value, /)
Return self<value.
- __members__ = {'TRISHAPE': <SparseAttentionMode.TRISHAPE: 0>, 'XATTENTION': <SparseAttentionMode.XATTENTION: 1>}
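Because `__members__` maps member names to members, it can back a name-based lookup, for example when the mode comes from a configuration string. The sketch below is one possible approach: `mode_from_name` is a hypothetical helper, and the `enum.IntEnum` fallback is a stand-in used only when `openvino_genai` is not installed.

```python
try:
    from openvino_genai import SparseAttentionMode
except ImportError:
    # Stand-in for illustration only: mirrors the documented members and values.
    import enum

    class SparseAttentionMode(enum.IntEnum):
        TRISHAPE = 0
        XATTENTION = 1


def mode_from_name(name):
    """Hypothetical helper: case-insensitive lookup through __members__."""
    members = SparseAttentionMode.__members__
    key = name.strip().upper()
    if key not in members:
        raise ValueError(
            f"unknown sparse attention mode {name!r}; expected one of {sorted(members)}"
        )
    return members[key]


print(mode_from_name("xattention").name)
```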
- __ne__(self: object, other: object, /) -> bool
- __new__(**kwargs)
- __reduce__()
Helper for pickle.
- __reduce_ex__(protocol, /)
Helper for pickle.
- __repr__(self: object, /) -> str
- __setattr__(name, value, /)
Implement setattr(self, name, value).
- __setstate__(self: openvino_genai.py_openvino_genai.SparseAttentionMode, state: SupportsInt, /) -> None
- __sizeof__()
Size of object in memory, in bytes.
- __str__(self: object, /) -> str
- __subclasshook__()
Abstract classes can override this to customize issubclass().
This is invoked early on by abc.ABCMeta.__subclasscheck__(). It should return True, False or NotImplemented. If it returns NotImplemented, the normal algorithm is used. Otherwise, it overrides the normal algorithm (and the outcome is cached).
- _pybind11_conduit_v1_()
- property name
- property value