openvino_genai.CacheEvictionConfig

class openvino_genai.CacheEvictionConfig

Bases: pybind11_object

Configuration struct for the cache eviction algorithm.

Parameters:
  • start_size (int) – Number of tokens at the beginning of the KV cache that should be retained in the KV cache for this sequence during generation. Must be non-zero and a multiple of the KV cache block size for this pipeline.

  • recent_size (int) – Number of tokens at the end of the KV cache that should be retained in the KV cache for this sequence during generation. Must be non-zero and a multiple of the KV cache block size for this pipeline.

  • max_cache_size (int) – Maximum number of tokens that should be kept in the KV cache. The evictable block area is located between the “start” and “recent” blocks, and its size is calculated as (max_cache_size - start_size - recent_size). Must be non-zero, larger than (start_size + recent_size), and a multiple of the KV cache block size for this pipeline. Note that since only completely filled blocks are evicted, the actual maximum per-sequence KV cache size in tokens may be up to (max_cache_size + SchedulerConfig.block_size - 1).

  • aggregation_mode (openvino_genai.AggregationMode) – The mode used to compute the importance of tokens for eviction.

  • apply_rotation (bool) – Whether to apply cache rotation (RoPE-based) after each eviction. Set this to False if your model uses a RoPE scheme different from the one used in the original Llama model and you experience accuracy issues with cache eviction enabled.

  • snapkv_window_size (int) – The size of the importance score aggregation window (in token positions from the end of the prompt) for computing initial importance scores at the beginning of the generation phase for purposes of eviction, following the SnapKV article approach (https://arxiv.org/abs/2404.14469).
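The size constraints above can be checked up front. The following is a minimal plain-Python sketch and not part of openvino_genai: `check_eviction_sizes` is a hypothetical helper, and the block size of 32 tokens is only an assumed example value (substitute the SchedulerConfig.block_size your pipeline actually uses). It returns the evictable area size and the worst-case per-sequence token count described for max_cache_size.

```python
# Hypothetical helper (not part of openvino_genai) that mirrors the documented
# constraints on start_size, recent_size and max_cache_size.
def check_eviction_sizes(start_size: int, recent_size: int,
                         max_cache_size: int, block_size: int) -> tuple[int, int]:
    """Return (evictable_size, worst_case_tokens) or raise ValueError."""
    for name, value in (("start_size", start_size),
                        ("recent_size", recent_size),
                        ("max_cache_size", max_cache_size)):
        if value <= 0:
            raise ValueError(f"{name} must be positive (non-zero), got {value}")
        if value % block_size != 0:
            raise ValueError(f"{name}={value} is not a multiple of block_size={block_size}")
    if max_cache_size <= start_size + recent_size:
        raise ValueError("max_cache_size must be larger than start_size + recent_size")
    # The evictable area sits between the "start" and "recent" blocks.
    evictable_size = max_cache_size - start_size - recent_size
    # Only completely filled blocks are evicted, so a partially filled block
    # can push the per-sequence total up to block_size - 1 tokens past the cap.
    worst_case_tokens = max_cache_size + block_size - 1
    return evictable_size, worst_case_tokens


# Example with an assumed block size of 32 tokens.
print(check_eviction_sizes(32, 128, 672, 32))  # (512, 703)
```

Passing max_cache_size=160 with the same start and recent sizes raises ValueError, because the evictable area would be empty.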

__init__(self: openvino_genai.py_openvino_genai.CacheEvictionConfig, start_size: SupportsInt, recent_size: SupportsInt, max_cache_size: SupportsInt, aggregation_mode: openvino_genai.py_openvino_genai.AggregationMode, apply_rotation: bool = False, snapkv_window_size: SupportsInt = 8, kvcrush_config: object = None) → None

__annotations__ = {}

__class__

alias of pybind11_type

__delattr__(name, /)

Implement delattr(self, name).

__dir__()

Default dir() implementation.

__eq__(value, /)

Return self==value.

__format__(format_spec, /)

Default object formatter.

Return str(self) if format_spec is empty. Raise TypeError otherwise.

__ge__(value, /)

Return self>=value.

__getattribute__(name, /)

Return getattr(self, name).

__getstate__()

Helper for pickle.

__gt__(value, /)

Return self>value.

__hash__()

Return hash(self).

__init_subclass__()

This method is called when a class is subclassed.

The default implementation does nothing. It may be overridden to extend subclasses.

__le__(value, /)

Return self<=value.

__lt__(value, /)

Return self<value.

__ne__(value, /)

Return self!=value.

__new__(**kwargs)

__reduce__()

Helper for pickle.

__reduce_ex__(protocol, /)

Helper for pickle.

__repr__()

Return repr(self).

__setattr__(name, value, /)

Implement setattr(self, name, value).

__sizeof__()

Size of object in memory, in bytes.

__str__()

Return str(self).

__subclasshook__()

Abstract classes can override this to customize issubclass().

This is invoked early on by abc.ABCMeta.__subclasscheck__(). It should return True, False or NotImplemented. If it returns NotImplemented, the normal algorithm is used. Otherwise, it overrides the normal algorithm (and the outcome is cached).

_pybind11_conduit_v1_()

property aggregation_mode

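The following sketch contrasts the two aggregation modes in plain Python. It assumes that AggregationMode.SUM accumulates each token's per-step importance scores and that AggregationMode.NORM_SUM additionally divides that sum by the number of generation steps the token has spent in the cache. The function name and input layout are hypothetical; the real scores are computed inside the pipeline from attention weights.

```python
def aggregate_scores(step_scores: list[list[float]], normalize: bool) -> list[float]:
    """Aggregate per-step importance scores for each cached token.

    step_scores[t][i] is the score of cached token i at generation step t;
    tokens that enter the cache later simply have shorter early rows.
    normalize=False models SUM; normalize=True models NORM_SUM under the
    assumption that the sum is divided by the token's lifetime in steps.
    """
    num_tokens = len(step_scores[-1])
    totals = [0.0] * num_tokens
    lifetimes = [0] * num_tokens
    for scores in step_scores:
        for i, score in enumerate(scores):
            totals[i] += score
            lifetimes[i] += 1
    if normalize:
        return [total / life for total, life in zip(totals, lifetimes)]
    return totals


# Three steps; token 2 enters the cache at the second step (made-up scores).
steps = [[0.5, 0.5], [0.25, 0.25, 0.5], [0.125, 0.375, 0.5]]
print(aggregate_scores(steps, normalize=False))  # [0.875, 1.125, 1.0]
print([round(x, 3) for x in aggregate_scores(steps, normalize=True)])  # [0.292, 0.375, 0.5]
```

With SUM, the older token 1 outranks the newer token 2 simply because it has been accumulating scores for longer; the normalized variant reverses that ordering.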
property apply_rotation

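With RoPE, each cached key carries a rotation that encodes its position. When blocks are evicted, the tokens after them move to lower positions, and re-rotating their keys by the position delta keeps the encoding consistent; this is the general idea behind RoPE-based cache rotation. The sketch below illustrates only that identity for a single two-dimensional pair with an assumed frequency. The names `rope_rotate` and `rerotate_key` are hypothetical, and this is not the OpenVINO GenAI implementation, which works on whole KV cache blocks.

```python
import math


def rope_rotate(pair: tuple[float, float], angle: float) -> tuple[float, float]:
    """Rotate a 2-D vector counter-clockwise by `angle` radians (one RoPE pair)."""
    x0, x1 = pair
    c, s = math.cos(angle), math.sin(angle)
    return (x0 * c - x1 * s, x0 * s + x1 * c)


def rerotate_key(rotated_key: tuple[float, float], old_pos: int, new_pos: int,
                 freq: float) -> tuple[float, float]:
    """Re-encode a key that was rotated for old_pos so that it encodes new_pos.

    Rotations compose additively, so rotating by (new_pos - old_pos) * freq
    is enough and the un-rotated key is not needed.
    """
    return rope_rotate(rotated_key, (new_pos - old_pos) * freq)


raw_key = (1.0, 0.5)
freq = 0.1  # assumed rotation frequency for this dimension pair
key_at_10 = rope_rotate(raw_key, 10 * freq)
# Suppose eviction removed 4 earlier tokens, so this token now sits at position 6.
shifted = rerotate_key(key_at_10, old_pos=10, new_pos=6, freq=freq)
expected = rope_rotate(raw_key, 6 * freq)
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(shifted, expected)))  # True
```

If a model applies positional rotations differently from this scheme, the delta rotation no longer matches, which is consistent with the advice to disable apply_rotation in that case.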
get_evictable_size(self: openvino_genai.py_openvino_genai.CacheEvictionConfig) → int

get_max_cache_size(self: openvino_genai.py_openvino_genai.CacheEvictionConfig) → int

get_recent_size(self: openvino_genai.py_openvino_genai.CacheEvictionConfig) → int

get_start_size(self: openvino_genai.py_openvino_genai.CacheEvictionConfig) → int

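The four getters report the configured region sizes in tokens. The sketch below uses a hypothetical stand-in class (`EvictionSizes`, not the real binding) that mirrors the getter names, plus a simplified helper, `blocks_over_budget`, that counts how many completely filled blocks in the middle region exceed the evictable budget. It assumes the start and recent regions map to whole blocks at the head and tail of the sequence, and it ignores which blocks get chosen, which in practice depends on the importance scores computed according to aggregation_mode.

```python
from dataclasses import dataclass


@dataclass
class EvictionSizes:
    """Hypothetical stand-in mirroring the CacheEvictionConfig getters (sizes in tokens)."""
    start_size: int
    recent_size: int
    max_cache_size: int

    def get_start_size(self) -> int:
        return self.start_size

    def get_recent_size(self) -> int:
        return self.recent_size

    def get_max_cache_size(self) -> int:
        return self.max_cache_size

    def get_evictable_size(self) -> int:
        # Size of the area between the "start" and "recent" blocks.
        return self.max_cache_size - self.start_size - self.recent_size


def blocks_over_budget(cfg: EvictionSizes, num_full_blocks: int, block_size: int) -> int:
    """Simplified count of full middle-region blocks beyond the evictable budget."""
    start_blocks = cfg.get_start_size() // block_size
    recent_blocks = cfg.get_recent_size() // block_size
    budget_blocks = cfg.get_evictable_size() // block_size
    middle_blocks = num_full_blocks - start_blocks - recent_blocks
    return max(0, middle_blocks - budget_blocks)


cfg = EvictionSizes(start_size=32, recent_size=128, max_cache_size=672)
print(cfg.get_evictable_size())                                     # 512
print(blocks_over_budget(cfg, num_full_blocks=25, block_size=32))  # 4
```

With an assumed 32-token block size, 25 full blocks leave 20 in the middle region against a budget of 16, so 4 blocks would be candidates for eviction in this simplified model.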
property kvcrush_config

property snapkv_window_size
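SnapKV scores prompt tokens by the attention they receive from an observation window at the end of the prompt. The following minimal sketch, using the hypothetical name `snapkv_initial_scores`, sums the attention weights from the last `window_size` query positions for a single attention head. It omits the pooling and multi-head aggregation described in the SnapKV paper, and it is not how OpenVINO GenAI computes the scores internally.

```python
def snapkv_initial_scores(attention: list[list[float]], window_size: int) -> list[float]:
    """Sum attention weights from the last `window_size` query rows per key.

    attention[q][k] is the softmaxed weight that prompt query position q
    assigns to prompt key position k (a single head, causal mask applied).
    """
    if window_size <= 0:
        # attention[-0:] would select every row, so reject this explicitly.
        raise ValueError("window_size must be positive")
    num_keys = len(attention[0])
    scores = [0.0] * num_keys
    for row in attention[-window_size:]:  # the observation window
        for k, weight in enumerate(row):
            scores[k] += weight
    return scores


# Toy 4-token prompt with a causal attention pattern (made-up weights).
attn = [
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.25, 0.25, 0.5, 0.0],
    [0.5, 0.125, 0.125, 0.25],
]
print(snapkv_initial_scores(attn, window_size=2))  # [0.75, 0.375, 0.625, 0.25]
```

With the default snapkv_window_size of 8 from the constructor signature, the last 8 prompt positions would form the window; in this sketch, prompts shorter than the window simply use all rows.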