openvino_genai.CacheEvictionConfig#

class openvino_genai.CacheEvictionConfig#

Bases: pybind11_object

Configuration struct for the cache eviction algorithm. :param start_size: Number of tokens in the beginning of KV cache that should be retained in the KV cache for this sequence during generation. Must be non-zero and a multiple of the KV cache block size for this pipeline. :type start_size: int

Parameters:
  • recent_size (int) – Number of tokens in the end of KV cache that should be retained in the KV cache for this sequence during generation. Must be non-zero and a multiple of the KV cache block size for this pipeline.

  • max_cache_size (int) – Maximum number of tokens that should be kept in the KV cache. The evictable block area will be located between the “start” and “recent” blocks and its size will be calculated as (max_cache_size - start_size - recent_size). Must be non-zero, larger than (start_size + recent_size), and a multiple of the KV cache block size for this pipeline. Note that since only the completely filled blocks are evicted, the actual maximum per-sequence KV cache size in tokens may be up to (max_cache_size + SchedulerConfig.block_size - 1).

  • aggregation_mode (openvino_genai.AggregationMode) – The mode used to compute the importance of tokens for eviction

__init__(self: openvino_genai.py_openvino_genai.CacheEvictionConfig, start_size: int, recent_size: int, max_cache_size: int, aggregation_mode: ov::genai::AggregationMode) None#

Methods

__delattr__(name, /)

Implement delattr(self, name).

__dir__()

Default dir() implementation.

__eq__(value, /)

Return self==value.

__format__(format_spec, /)

Default object formatter.

__ge__(value, /)

Return self>=value.

__getattribute__(name, /)

Return getattr(self, name).

__gt__(value, /)

Return self>value.

__hash__()

Return hash(self).

__init__(self, start_size, recent_size, ...)

__init_subclass__

This method is called when a class is subclassed.

__le__(value, /)

Return self<=value.

__lt__(value, /)

Return self<value.

__ne__(value, /)

Return self!=value.

__new__(**kwargs)

__reduce__()

Helper for pickle.

__reduce_ex__(protocol, /)

Helper for pickle.

__repr__()

Return repr(self).

__setattr__(name, value, /)

Implement setattr(self, name, value).

__sizeof__()

Size of object in memory, in bytes.

__str__()

Return str(self).

__subclasshook__

Abstract classes can override this to customize issubclass().

Attributes

aggregation_mode

__class__#

alias of pybind11_type

__delattr__(name, /)#

Implement delattr(self, name).

__dir__()#

Default dir() implementation.

__eq__(value, /)#

Return self==value.

__format__(format_spec, /)#

Default object formatter.

__ge__(value, /)#

Return self>=value.

__getattribute__(name, /)#

Return getattr(self, name).

__gt__(value, /)#

Return self>value.

__hash__()#

Return hash(self).

__init__(self: openvino_genai.py_openvino_genai.CacheEvictionConfig, start_size: int, recent_size: int, max_cache_size: int, aggregation_mode: ov::genai::AggregationMode) None#
__init_subclass__()#

This method is called when a class is subclassed.

The default implementation does nothing. It may be overridden to extend subclasses.

__le__(value, /)#

Return self<=value.

__lt__(value, /)#

Return self<value.

__ne__(value, /)#

Return self!=value.

__new__(**kwargs)#
__reduce__()#

Helper for pickle.

__reduce_ex__(protocol, /)#

Helper for pickle.

__repr__()#

Return repr(self).

__setattr__(name, value, /)#

Implement setattr(self, name, value).

__sizeof__()#

Size of object in memory, in bytes.

__str__()#

Return str(self).

__subclasshook__()#

Abstract classes can override this to customize issubclass().

This is invoked early on by abc.ABCMeta.__subclasscheck__(). It should return True, False or NotImplemented. If it returns NotImplemented, the normal algorithm is used. Otherwise, it overrides the normal algorithm (and the outcome is cached).

property aggregation_mode#