openvino_genai.WhisperPipeline#

class openvino_genai.WhisperPipeline#

Bases: pybind11_object

Automatic speech recognition pipeline

__init__(self: openvino_genai.py_openvino_genai.WhisperPipeline, models_path: os.PathLike, device: str, **kwargs) → None#

WhisperPipeline class constructor. models_path (os.PathLike): Path to the model file. device (str): Device to run the model on (e.g., CPU, GPU).
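
A minimal construction sketch; the models_path value below is a placeholder, and "CPU" follows the device examples above:

    import openvino_genai

    # Placeholder path to an exported Whisper model; pick any device
    # supported by OpenVINO (e.g. "CPU", "GPU").
    pipe = openvino_genai.WhisperPipeline("whisper-base-ov", "CPU")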

Methods

__delattr__(name, /)

Implement delattr(self, name).

__dir__()

Default dir() implementation.

__eq__(value, /)

Return self==value.

__format__(format_spec, /)

Default object formatter.

__ge__(value, /)

Return self>=value.

__getattribute__(name, /)

Return getattr(self, name).

__gt__(value, /)

Return self>value.

__hash__()

Return hash(self).

__init__(self, models_path, device, **kwargs)

WhisperPipeline class constructor.

__init_subclass__

This method is called when a class is subclassed.

__le__(value, /)

Return self<=value.

__lt__(value, /)

Return self<value.

__ne__(value, /)

Return self!=value.

__new__(**kwargs)

__reduce__()

Helper for pickle.

__reduce_ex__(protocol, /)

Helper for pickle.

__repr__()

Return repr(self).

__setattr__(name, value, /)

Implement setattr(self, name, value).

__sizeof__()

Size of object in memory, in bytes.

__str__()

Return str(self).

__subclasshook__

Abstract classes can override this to customize issubclass().

generate(self, raw_speech_input[, ...])

High level generate that receives raw speech as a vector of floats and returns decoded output.

get_generation_config(self)

get_tokenizer(self)

set_generation_config(self, config)

__class__#

alias of pybind11_type

__delattr__(name, /)#

Implement delattr(self, name).

__dir__()#

Default dir() implementation.

__eq__(value, /)#

Return self==value.

__format__(format_spec, /)#

Default object formatter.

__ge__(value, /)#

Return self>=value.

__getattribute__(name, /)#

Return getattr(self, name).

__gt__(value, /)#

Return self>value.

__hash__()#

Return hash(self).

__init__(self: openvino_genai.py_openvino_genai.WhisperPipeline, models_path: os.PathLike, device: str, **kwargs) → None#

WhisperPipeline class constructor. models_path (os.PathLike): Path to the model file. device (str): Device to run the model on (e.g., CPU, GPU).

__init_subclass__()#

This method is called when a class is subclassed.

The default implementation does nothing. It may be overridden to extend subclasses.

__le__(value, /)#

Return self<=value.

__lt__(value, /)#

Return self<value.

__ne__(value, /)#

Return self!=value.

__new__(**kwargs)#

__reduce__()#

Helper for pickle.

__reduce_ex__(protocol, /)#

Helper for pickle.

__repr__()#

Return repr(self).

__setattr__(name, value, /)#

Implement setattr(self, name, value).

__sizeof__()#

Size of object in memory, in bytes.

__str__()#

Return str(self).

__subclasshook__()#

Abstract classes can override this to customize issubclass().

This is invoked early on by abc.ABCMeta.__subclasscheck__(). It should return True, False or NotImplemented. If it returns NotImplemented, the normal algorithm is used. Otherwise, it overrides the normal algorithm (and the outcome is cached).

generate(self: openvino_genai.py_openvino_genai.WhisperPipeline, raw_speech_input: list[float], generation_config: openvino_genai.py_openvino_genai.WhisperGenerationConfig | None = None, streamer: Callable[[str], bool] | openvino_genai.py_openvino_genai.StreamerBase | None = None, **kwargs) → openvino_genai.py_openvino_genai.DecodedResults#

High level generate that receives raw speech as a vector of floats and returns decoded output.

Parameters:
  • raw_speech_input (List[float]) – inputs in the form of list of floats. Required to be normalized to near [-1, 1] range and have 16k Hz sampling rate.

  • generation_config (WhisperGenerationConfig or a Dict) – generation configuration to use for this call

  • streamer (Callable[[str], bool] or ov.genai.StreamerBase) – streamer, either a lambda whose boolean return value flags whether generation should be stopped, or a StreamerBase instance. The streamer is supported for short-form audio (< 30 seconds) with return_timestamps=False only.

  • kwargs (Dict) – arbitrary keyword arguments with keys corresponding to WhisperGenerationConfig fields.

Returns:

generation results in decoded form

Return type:

DecodedResults
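
A usage sketch for generate, assuming pipe is the WhisperPipeline constructed above and read_wav is a hypothetical helper returning mono float samples normalized to roughly [-1, 1] at a 16 kHz sampling rate:

    # Hypothetical helper: returns list[float], ~[-1, 1], 16 kHz mono.
    raw_speech = read_wav("sample.wav")

    # Optional streamer: called with decoded text pieces; returning True
    # stops generation. Supported for short-form audio (< 30 seconds)
    # with return_timestamps=False only.
    def streamer(subword: str) -> bool:
        print(subword, end="", flush=True)
        return False

    # max_new_tokens is forwarded via kwargs to WhisperGenerationConfig.
    result = pipe.generate(raw_speech, max_new_tokens=100, streamer=streamer)
    print(result)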

WhisperGenerationConfig

Parameters:
  • max_length (int) – the maximum length the generated tokens can have. Corresponds to the length of the input prompt + max_new_tokens. Its effect is overridden by max_new_tokens, if also set.

  • max_new_tokens (int) – the maximum numbers of tokens to generate, excluding the number of tokens in the prompt. max_new_tokens has priority over max_length.

  • eos_token_id (int) – End of stream token id.

Whisper specific parameters:

Parameters:
  • decoder_start_token_id (int) – Corresponds to the "<|startoftranscript|>" token.

  • pad_token_id (int) – Padding token id.

  • translate_token_id (int) – Translate token id.

  • transcribe_token_id (int) – Transcribe token id.

  • no_timestamps_token_id (int) – No timestamps token id.

  • is_multilingual (bool)

  • begin_suppress_tokens (list[int]) – A list containing tokens that will be suppressed at the beginning of the sampling process.

  • suppress_tokens (list[int]) – A list containing the non-speech tokens that will be suppressed during generation.

  • language (Optional[str]) – Language token to use for generation in the form of <|en|>. You can find all the possible language tokens in the generation_config.json lang_to_id dictionary.

  • lang_to_id (Dict[str, int]) – Language token to token_id map. Initialized from the generation_config.json lang_to_id dictionary.

  • task (str) – Task to use for generation, either "translate" or "transcribe".

  • return_timestamps (bool) –

    If true, the pipeline will return timestamps along with the text for segments of words in the text (see the sketch after this list). For instance, if you get a WhisperDecodedResultChunk with

    start_ts = 0.5
    end_ts = 1.5
    text = " Hi there!"

    then it means the model predicts that the segment "Hi there!" was spoken after 0.5 and before 1.5 seconds. Note that a segment of text refers to a sequence of one or more words, rather than individual words.
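
A sketch of reading timestamps, assuming pipe and raw_speech from the sketches above; treating the returned object as carrying a chunks attribute of WhisperDecodedResultChunk entries is an assumption here:

    result = pipe.generate(raw_speech, return_timestamps=True)

    # Assumed attribute: each chunk exposes start_ts, end_ts and text,
    # as described for WhisperDecodedResultChunk above.
    for chunk in result.chunks:
        print(f"{chunk.start_ts:.2f}s - {chunk.end_ts:.2f}s:{chunk.text}")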

get_generation_config(self: openvino_genai.py_openvino_genai.WhisperPipeline) → openvino_genai.py_openvino_genai.WhisperGenerationConfig#

get_tokenizer(self: openvino_genai.py_openvino_genai.WhisperPipeline) → openvino_genai.py_openvino_genai.Tokenizer#

set_generation_config(self: openvino_genai.py_openvino_genai.WhisperPipeline, config: openvino_genai.py_openvino_genai.WhisperGenerationConfig) → None#
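
A sketch combining the accessors above, assuming pipe is an existing WhisperPipeline; the fields set below are taken from the WhisperGenerationConfig parameters listed under generate:

    config = pipe.get_generation_config()
    config.max_new_tokens = 100
    config.language = "<|en|>"        # language token, as documented above
    config.return_timestamps = True
    pipe.set_generation_config(config)

    tokenizer = pipe.get_tokenizer()  # openvino_genai Tokenizer instance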