openvino_genai.SchedulerConfig#

class openvino_genai.SchedulerConfig#

Bases: pybind11_object

Scheduler configuration used to construct a ContinuousBatchingPipeline.

Parameters:

max_num_batched_tokens: maximum number of tokens in a batch. In contrast to max_batch_size, which limits how many independent sequences are combined, this limits the total number of tokens in the batch.

num_kv_blocks: total number of KV blocks available to the scheduler logic.

cache_size: total size of the KV cache in GB.

block_size: block size for the KV cache.

dynamic_split_fuse: whether to split prompt and generate into different scheduling phases.

vLLM-like settings:

max_num_seqs: maximum number of scheduled sequences (you can think of it as “max batch size”).

enable_prefix_caching: enables caching of KV blocks. When turned on, all previously calculated KV caches are kept in memory for future use. KV caches can be overwritten if the KV-cache limit is reached, but blocks are not released. This results in higher RAM usage; the maximum RAM usage is determined by the cache_size or num_kv_blocks parameters. When turned off, only the KV cache required for the batch calculation is kept in memory, and when a sequence has finished generation its cache is released.
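The difference between a token limit (max_num_batched_tokens) and a sequence limit (max_num_seqs) can be illustrated with a toy model. This is only a sketch and not the library's scheduler: the function names are hypothetical, admission is simple first-come-first-served, and it assumes that block_size is measured in tokens and that sequences do not share blocks.

```python
import math


def kv_blocks_needed(seq_lens, block_size):
    """KV blocks required if each sequence occupies whole blocks of its own."""
    return sum(math.ceil(n / block_size) for n in seq_lens)


def admit(seq_lens, max_num_batched_tokens, max_num_seqs):
    """Admit sequences in order until either limit would be exceeded."""
    admitted, tokens = [], 0
    for n in seq_lens:
        if len(admitted) == max_num_seqs or tokens + n > max_num_batched_tokens:
            break
        admitted.append(n)
        tokens += n
    return admitted


pending = [100, 300, 50, 200]  # hypothetical prompt lengths, in tokens

# The token limit is reached first: the fourth prompt would push the batch to 650 tokens.
batch = admit(pending, max_num_batched_tokens=512, max_num_seqs=8)
print(batch)                                   # [100, 300, 50]
print(kv_blocks_needed(batch, block_size=32))  # 4 + 10 + 2 = 16

# The sequence limit is reached first.
print(admit(pending, max_num_batched_tokens=512, max_num_seqs=2))  # [100, 300]
```

In this model a small max_num_seqs caps the batch even when the token budget has room, while a small max_num_batched_tokens caps it even when few sequences are scheduled.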

__init__(self: openvino_genai.py_openvino_genai.SchedulerConfig) -> None#
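Because the constructor takes no arguments, settings are applied by assigning attributes after construction. The following is a minimal sketch that uses only attribute names listed on this page. The values are arbitrary illustrations, and the SimpleNamespace fallback is a hypothetical stand-in (not part of the library) so that the snippet still runs where openvino_genai is not installed.

```python
from types import SimpleNamespace

try:
    # Real class, if the openvino_genai package is available.
    from openvino_genai import SchedulerConfig
except ImportError:
    # Hypothetical stand-in: it only mimics "construct empty, then assign attributes".
    SchedulerConfig = SimpleNamespace

config = SchedulerConfig()            # __init__ takes no arguments
config.cache_size = 2                 # total KV cache size, in GB
config.max_num_batched_tokens = 512   # cap on tokens per batch
config.max_num_seqs = 8               # cap on scheduled sequences
config.dynamic_split_fuse = True      # illustrative value
config.enable_prefix_caching = False  # release a sequence's cache when it finishes

print(config.cache_size, config.max_num_batched_tokens, config.max_num_seqs)
```

The configured object is then what gets supplied when constructing a ContinuousBatchingPipeline.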

Methods

__delattr__(name, /)

Implement delattr(self, name).

__dir__()

Default dir() implementation.

__eq__(value, /)

Return self==value.

__format__(format_spec, /)

Default object formatter.

__ge__(value, /)

Return self>=value.

__getattribute__(name, /)

Return getattr(self, name).

__gt__(value, /)

Return self>value.

__hash__()

Return hash(self).

__init__(self)

__init_subclass__

This method is called when a class is subclassed.

__le__(value, /)

Return self<=value.

__lt__(value, /)

Return self<value.

__ne__(value, /)

Return self!=value.

__new__(**kwargs)

__reduce__()

Helper for pickle.

__reduce_ex__(protocol, /)

Helper for pickle.

__repr__()

Return repr(self).

__setattr__(name, value, /)

Implement setattr(self, name, value).

__sizeof__()

Size of object in memory, in bytes.

__str__()

Return str(self).

__subclasshook__

Abstract classes can override this to customize issubclass().

Attributes

cache_eviction_config

cache_size

dynamic_split_fuse

enable_prefix_caching

max_num_batched_tokens

max_num_seqs

num_kv_blocks

use_cache_eviction

__class__#

alias of pybind11_type

__delattr__(name, /)#

Implement delattr(self, name).

__dir__()#

Default dir() implementation.

__eq__(value, /)#

Return self==value.

__format__(format_spec, /)#

Default object formatter.

__ge__(value, /)#

Return self>=value.

__getattribute__(name, /)#

Return getattr(self, name).

__gt__(value, /)#

Return self>value.

__hash__()#

Return hash(self).

__init__(self: openvino_genai.py_openvino_genai.SchedulerConfig) -> None#

__init_subclass__()#

This method is called when a class is subclassed.

The default implementation does nothing. It may be overridden to extend subclasses.

__le__(value, /)#

Return self<=value.

__lt__(value, /)#

Return self<value.

__ne__(value, /)#

Return self!=value.

__new__(**kwargs)#

__reduce__()#

Helper for pickle.

__reduce_ex__(protocol, /)#

Helper for pickle.

__repr__()#

Return repr(self).

__setattr__(name, value, /)#

Implement setattr(self, name, value).

__sizeof__()#

Size of object in memory, in bytes.

__str__()#

Return str(self).

__subclasshook__()#

Abstract classes can override this to customize issubclass().

This is invoked early on by abc.ABCMeta.__subclasscheck__(). It should return True, False or NotImplemented. If it returns NotImplemented, the normal algorithm is used. Otherwise, it overrides the normal algorithm (and the outcome is cached).

property cache_eviction_config#
property cache_size#
property dynamic_split_fuse#
property enable_prefix_caching#
property max_num_batched_tokens#
property max_num_seqs#
property num_kv_blocks#
property use_cache_eviction#