openvino_genai

OpenVINO GenAI module namespace, exposing pipelines and the configs used to create them.

Functions

draft_model(models_path[, device])

Accepts a models path and, optionally, the device on which inference will be performed.

get_version()

OpenVINO GenAI version
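As a minimal sketch of calling `get_version()`, the snippet below reports the installed version and falls back to `None` when the package cannot be imported. The helper name `genai_version` is hypothetical and exists only for this illustration.

```python
from typing import Optional


def genai_version() -> Optional[str]:
    """Return the OpenVINO GenAI version string, or None if the package is unavailable."""
    try:
        import openvino_genai
    except ImportError:
        # The package (or one of its dependencies) is not installed.
        return None
    # str() keeps the helper's return type stable regardless of what get_version() returns.
    return str(openvino_genai.get_version())


print(genai_version())
```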

Classes

Adapter

Immutable LoRA Adapter that carries the adaptation matrices and serves as unique adapter identifier.

AdapterConfig

Adapter config that defines a combination of LoRA adapters with blending parameters.

AggregationMode

Represents the mode of per-token score aggregation used when determining the least important tokens to evict from the cache.

AutoencoderKL

AutoencoderKL class.

CLIPTextModel

CLIPTextModel class.

CLIPTextModelWithProjection

CLIPTextModelWithProjection class.

CacheEvictionConfig

Configuration struct for the cache eviction algorithm.

ChunkStreamerBase

Base class for chunk streamers.

ContinuousBatchingPipeline

This class is used for generation with LLMs using continuous batching.

CppStdGenerator

This class wraps the std::mt19937 pseudo-random generator.

DecodedResults

Structure to store resulting batched text outputs and scores for each batch.

EncodedResults

Structure to store resulting batched tokens and scores for each batch sequence.

FluxTransformer2DModel

FluxTransformer2DModel class.

GenerationConfig

Structure to keep generation config parameters.

GenerationFinishReason

Enumeration (members are not listed in this summary).

GenerationResult

GenerationResult stores resulting batched tokens and scores.

GenerationStatus

Enumeration (members are not listed in this summary).

Generator

This class is used for storing a pseudo-random generator.

Image2ImagePipeline

This class is used for generation with image-to-image models.

ImageGenerationConfig

This class is used for storing the generation config for an image generation pipeline.

ImageGenerationPerfMetrics

Holds performance metrics for each generate call.

InpaintingPipeline

This class is used for generation with inpainting models.

KVCrushAnchorPointMode

Represents the anchor point types for KVCrush cache eviction

KVCrushConfig

Configuration for the KVCrush cache eviction algorithm.

LLMPipeline

This class is used for generation with LLMs
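A minimal sketch of how `LLMPipeline` is typically driven: construct it from a model directory and a device name, then call `generate` with a prompt. The wrapper name `generate_text`, the default device `"CPU"`, and the default `max_new_tokens` value are assumptions made for this illustration. The model directory is expected to contain a model already exported to OpenVINO format.

```python
def generate_text(models_path: str, prompt: str,
                  device: str = "CPU", max_new_tokens: int = 64) -> str:
    """Run a single prompt through an LLMPipeline and return the generated text."""
    # Imported lazily so that defining this helper does not require the package.
    import openvino_genai

    # models_path: directory holding the exported model and tokenizer files.
    pipe = openvino_genai.LLMPipeline(models_path, device)
    # str() normalizes the result in case generate() returns a results object.
    return str(pipe.generate(prompt, max_new_tokens=max_new_tokens))
```

A call would look like `generate_text("path/to/exported_model", "What is OpenVINO?")`, where the path is a placeholder for a local model directory.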

PerfMetrics

Holds performance metrics for each generate call.

RawImageGenerationPerfMetrics

Structure with raw performance metrics for each generation before any statistics are calculated.

RawPerfMetrics

Structure with raw performance metrics for each generation before any statistics are calculated.

SD3Transformer2DModel

SD3Transformer2DModel class.

Scheduler

Scheduler for image generation pipelines.

SchedulerConfig

Scheduler configuration used to construct a ContinuousBatchingPipeline.

SparseAttentionConfig

Configuration struct for the sparse attention functionality.

SparseAttentionMode

Represents the mode of sparse attention applied during generation.

SpeechGenerationConfig

Speech-generation specific parameters. minlenratio: minimum ratio of output length to input text length; prevents output that's too short.

SpeechGenerationPerfMetrics

Structure with raw performance metrics for each generation before any statistics are calculated.

StopCriteria

StopCriteria controls the stopping condition for grouped beam search.

StreamerBase

Base class for streamers.

StreamingStatus

Enumeration (members are not listed in this summary).

StructuralTagItem

Structure to keep generation config parameters for structural tags in structured output generation.

StructuralTagsConfig

Configures structured output generation by combining regular sampling with structural tags.

StructuredOutputConfig

Structure to keep generation config parameters for structured output generation.

T5EncoderModel

T5EncoderModel class.

Text2ImagePipeline

This class is used for generation with text-to-image models.

Text2SpeechDecodedResults

Structure that stores the result of the generate method, including a list of waveform tensors sampled at 16 kHz, along with performance metrics.

Text2SpeechPipeline

Text-to-speech pipeline

TextEmbeddingPipeline

Text embedding pipeline

TextRerankPipeline

Text rerank pipeline

TextStreamer

TextStreamer is used to decode tokens into text and call a user-defined callback function.
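The decode-then-callback pattern described above can be illustrated with a toy stand-in. This is not the `openvino_genai.TextStreamer` implementation or its API: the class `ToyTextStreamer`, its `write` method, the tiny vocabulary, and the convention that the callback returns `True` to request a stop are all assumptions made for this sketch.

```python
from typing import Callable, Dict, List


class ToyTextStreamer:
    """Illustrative stand-in: maps token ids to text and forwards the text to a callback."""

    def __init__(self, vocab: Dict[int, str], callback: Callable[[str], bool]) -> None:
        self.vocab = vocab          # hypothetical id -> text table in place of a real tokenizer
        self.callback = callback    # user-defined function receiving each decoded chunk

    def write(self, token_id: int) -> bool:
        """Decode one token and pass it on; returns True if the callback asks to stop."""
        text = self.vocab.get(token_id, "")
        return self.callback(text)


chunks: List[str] = []


def collect(text: str) -> bool:
    chunks.append(text)
    return False  # in this toy, False means "keep generating"


streamer = ToyTextStreamer({0: "Hello", 1: ",", 2: " world"}, collect)
for token_id in [0, 1, 2]:
    if streamer.write(token_id):
        break

print("".join(chunks))
```

Keeping the callback separate from decoding lets the same streamer feed a console, a UI widget, or a buffer without changes to the generation loop.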

TokenizedInputs

Tokenizer

This class is used to encode prompts and decode the resulting tokens.

TorchGenerator

This class provides an OpenVINO GenAI Generator wrapper for torch.Generator.

UNet2DConditionModel

UNet2DConditionModel class.

VLMPipeline

This class is used for generation with VLMs

WhisperGenerationConfig

Whisper-specific parameters. decoder_start_token_id: corresponds to the "<|startoftranscript|>" token.

WhisperPerfMetrics

Structure with raw performance metrics for each generation before any statistics are calculated.

WhisperPipeline

Automatic speech recognition pipeline

WhisperRawPerfMetrics

Structure with Whisper-specific raw performance metrics for each generation before any statistics are calculated.