openvino_genai

The openvino_genai module namespace, which exposes pipelines and the configs used to create them.

Classes

Adapter

Immutable LoRA Adapter that carries the adaptation matrices and serves as unique adapter identifier.

AdapterConfig

Adapter config that defines a combination of LoRA adapters with blending parameters.

AggregationMode

Represents the mode of per-token score aggregation used when determining the least important tokens for eviction from the cache.

AutoencoderKL

AutoencoderKL class.

CLIPTextModel

CLIPTextModel class.

CLIPTextModelWithProjection

CLIPTextModelWithProjection class.

CacheEvictionConfig

Configuration struct for the cache eviction algorithm.

ContinuousBatchingPipeline

This class is used for generation with LLMs with continuous batching.

CppStdGenerator

This class wraps the std::mt19937 pseudo-random generator.

DecodedResults

Structure to store resulting batched text outputs and scores for each batch.

EncodedResults

Structure to store resulting batched tokens and scores for each batch sequence.

GenerationConfig

Structure to keep generation config parameters.

GenerationResult

GenerationResult stores resulting batched tokens and scores.

Generator

This class is used for storing a pseudo-random generator.

LLMPipeline

This class is used for generation with LLMs.

PerfMetrics

Holds performance metrics for each generate call.

RawPerfMetrics

Structure with raw performance metrics for each generation before any statistics are calculated.

Scheduler

Scheduler for image generation pipelines.

SchedulerConfig

Configuration used to construct a ContinuousBatchingPipeline.

StopCriteria

StopCriteria controls the stopping condition for grouped beam search.

StreamerBase

Base class for streamers.

Text2ImagePipeline

This class is used for generation with text-to-image models.

TokenizedInputs

Tokenizer

The openvino_genai.Tokenizer object is used to initialize the tokenizer when it is located in a different path than the main model.

UNet2DConditionModel

UNet2DConditionModel class.

VLMPipeline

This class is used for generation with VLMs.

WhisperGenerationConfig

param max_length: the maximum length the generated tokens can have. Corresponds to the length of the input prompt +

WhisperPipeline

Automatic speech recognition pipeline.

draft_model

This class is used to enable Speculative Decoding.
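
Example

As a rough illustration of how two of the classes listed above fit together, the sketch below builds an LLMPipeline and passes it a GenerationConfig. It assumes the openvino_genai package is installed and that a model has already been exported to OpenVINO format. The helper name generate_text and the models_path directory are hypothetical and are not part of the module.

```python
def generate_text(models_path, prompt, device="CPU", max_new_tokens=64):
    """Generate a completion for `prompt` with an LLM in OpenVINO format.

    `models_path` is a hypothetical directory that holds the converted
    model and tokenizer files; `generate_text` itself is an illustrative
    helper, not part of openvino_genai.
    """
    # Imported lazily so this helper can be defined even on machines
    # where the package is not installed.
    import openvino_genai

    # Build the pipeline for the chosen device (for example "CPU" or "GPU").
    pipe = openvino_genai.LLMPipeline(models_path, device)

    # GenerationConfig keeps the generation parameters; only the limit on
    # newly generated tokens is set here, everything else stays at defaults.
    config = openvino_genai.GenerationConfig()
    config.max_new_tokens = max_new_tokens

    # Return whatever the pipeline produces for this single prompt.
    return pipe.generate(prompt, config)
```

A call such as generate_text("./model_dir", "What is OpenVINO?") would load the model from that directory and run generation on the CPU; the directory name here is a placeholder.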