openvino_genai.WhisperPipeline#

class openvino_genai.WhisperPipeline#

Bases: pybind11_object

Automatic speech recognition pipeline

__init__(self: openvino_genai.py_openvino_genai.WhisperPipeline, models_path: os.PathLike, device: str, **kwargs) → None#

WhisperPipeline class constructor. models_path (os.PathLike): Path to the directory containing the exported model files. device (str): Device to run the model on (e.g., CPU, GPU).

Methods

__delattr__(name, /)

Implement delattr(self, name).

__dir__()

Default dir() implementation.

__eq__(value, /)

Return self==value.

__format__(format_spec, /)

Default object formatter.

__ge__(value, /)

Return self>=value.

__getattribute__(name, /)

Return getattr(self, name).

__gt__(value, /)

Return self>value.

__hash__()

Return hash(self).

__init__(self, models_path, device, **kwargs)

WhisperPipeline class constructor.

__init_subclass__

This method is called when a class is subclassed.

__le__(value, /)

Return self<=value.

__lt__(value, /)

Return self<value.

__ne__(value, /)

Return self!=value.

__new__(**kwargs)

__reduce__()

Helper for pickle.

__reduce_ex__(protocol, /)

Helper for pickle.

__repr__()

Return repr(self).

__setattr__(name, value, /)

Implement setattr(self, name, value).

__sizeof__()

Size of object in memory, in bytes.

__str__()

Return str(self).

__subclasshook__

Abstract classes can override this to customize issubclass().

generate(self, raw_speech_input[, ...])

High level generate that receives raw speech as a vector of floats and returns decoded output.

get_generation_config(self)

get_tokenizer(self)

set_generation_config(self, config)

__class__#

alias of pybind11_type

__delattr__(name, /)#

Implement delattr(self, name).

__dir__()#

Default dir() implementation.

__eq__(value, /)#

Return self==value.

__format__(format_spec, /)#

Default object formatter.

__ge__(value, /)#

Return self>=value.

__getattribute__(name, /)#

Return getattr(self, name).

__gt__(value, /)#

Return self>value.

__hash__()#

Return hash(self).

__init__(self: openvino_genai.py_openvino_genai.WhisperPipeline, models_path: os.PathLike, device: str, **kwargs) → None#

WhisperPipeline class constructor. models_path (os.PathLike): Path to the directory containing the exported model files. device (str): Device to run the model on (e.g., CPU, GPU).
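
A minimal construction sketch (the model folder name below is an assumption; any directory with a Whisper model exported for OpenVINO, e.g. via optimum-cli export openvino, works):

  import openvino_genai

  # "whisper-base" is a hypothetical local folder holding the exported
  # OpenVINO Whisper model files.
  pipe = openvino_genai.WhisperPipeline("whisper-base", "CPU")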

__init_subclass__()#

This method is called when a class is subclassed.

The default implementation does nothing. It may be overridden to extend subclasses.

__le__(value, /)#

Return self<=value.

__lt__(value, /)#

Return self<value.

__ne__(value, /)#

Return self!=value.

__new__(**kwargs)#
__reduce__()#

Helper for pickle.

__reduce_ex__(protocol, /)#

Helper for pickle.

__repr__()#

Return repr(self).

__setattr__(name, value, /)#

Implement setattr(self, name, value).

__sizeof__()#

Size of object in memory, in bytes.

__str__()#

Return str(self).

__subclasshook__()#

Abstract classes can override this to customize issubclass().

This is invoked early on by abc.ABCMeta.__subclasscheck__(). It should return True, False or NotImplemented. If it returns NotImplemented, the normal algorithm is used. Otherwise, it overrides the normal algorithm (and the outcome is cached).

generate(self: openvino_genai.py_openvino_genai.WhisperPipeline, raw_speech_input: list[float], generation_config: openvino_genai.py_openvino_genai.WhisperGenerationConfig | None = None, streamer: Callable[[str], int | None] | openvino_genai.py_openvino_genai.StreamerBase | None = None, **kwargs) → openvino_genai.py_openvino_genai.WhisperDecodedResults#

High level generate that receives raw speech as a vector of floats and returns decoded output.

Parameters:
  • raw_speech_input (List[float]) – inputs in the form of a list of floats. Required to be normalized to near the [-1, 1] range and to have a 16 kHz sampling rate.

  • generation_config (WhisperGenerationConfig or a Dict) – configuration for the generation process.

  • streamer (Callable[[str], bool] or StreamerBase) – streamer that receives decoded text pieces; returning a True flag signals that generation should be stopped. The streamer is supported for short-form audio (< 30 seconds) with return_timestamps=False only.

  • kwargs (Dict) – arbitrary keyword arguments with keys corresponding to WhisperGenerationConfig fields.

Returns:

results in decoded form

Return type:

WhisperDecodedResults
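
A usage sketch for generate(). Loading audio is outside this API; the librosa call below is an assumed helper producing normalized 16 kHz float samples, and "sample.wav" is a hypothetical file:

  import librosa
  import openvino_genai

  pipe = openvino_genai.WhisperPipeline("whisper-base", "CPU")

  # librosa returns float32 samples in [-1, 1]; sr=16000 resamples to 16 kHz.
  raw_speech, _ = librosa.load("sample.wav", sr=16000)

  # Optional streamer: called with decoded text pieces; returning True
  # requests that generation stop (short-form audio, no timestamps only).
  def streamer(text: str) -> bool:
      print(text, end="", flush=True)
      return False

  result = pipe.generate(raw_speech.tolist(), streamer=streamer)
  print(result)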

WhisperGenerationConfig

Whisper specific parameters:

Parameters:
  • decoder_start_token_id (int) – Corresponds to the "<|startoftranscript|>" token.
  • pad_token_id (int) – Padding token id.

  • translate_token_id (int) – Translate token id.

  • transcribe_token_id (int) – Transcribe token id.

  • no_timestamps_token_id (int) – No timestamps token id.

  • prev_sot_token_id (int) – Corresponds to the "<|startofprev|>" token.

  • is_multilingual (bool) – Whether the model is multilingual.

  • begin_suppress_tokens (list[int]) – A list containing tokens that will be suppressed at the beginning of the sampling process.

  • suppress_tokens (list[int]) – A list containing the non-speech tokens that will be suppressed during generation.

  • language (Optional[str]) – Language token to use for generation in the form of <|en|>. You can find all the possible language tokens in the generation_config.json lang_to_id dictionary.

  • lang_to_id (Dict[str, int]) – Language token to token_id map. Initialized from the generation_config.json lang_to_id dictionary.

  • task (str) – Task to use for generation, either "translate" or "transcribe".

  • return_timestamps (bool) –

    If true, the pipeline will return timestamps along with the text for segments of words in the text. For instance, if you get a WhisperDecodedResultChunk with

      start_ts = 0.5
      end_ts = 1.5
      text = " Hi there!"

    then the model predicts that the segment "Hi there!" was spoken after 0.5 and before 1.5 seconds. Note that a segment of text refers to a sequence of one or more words, rather than individual words. A Python sketch of reading these chunks follows the hotwords example below.

  • initial_prompt (Optional[str]) – Initial prompt tokens passed as a previous transcription (after the <|startofprev|> token) to the first processing window. Can be used to steer the model to use particular spellings or styles.

Example:

  auto result = pipeline.generate(raw_speech); // He has gone and gone for good answered Paul Icrom who…
  auto result = pipeline.generate(raw_speech, ov::genai::initial_prompt("Polychrome")); // He has gone and gone for good answered Polychrome who…
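
Since arbitrary keyword arguments of generate() map onto WhisperGenerationConfig fields (see the kwargs parameter above), a Python counterpart of this example could look like the following sketch (raw_speech is assumed to hold normalized 16 kHz samples):

  # initial_prompt passed as a generate() kwarg, per the kwargs note above.
  result = pipe.generate(raw_speech, initial_prompt="Polychrome")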

Parameters:

hotwords (Optional[str]) – Hotwords tokens passed as a previous transcription (after the <|startofprev|> token) to all processing windows. Can be used to steer the model to use particular spellings or styles.

Example:

  auto result = pipeline.generate(raw_speech); // He has gone and gone for good answered Paul Icrom who…
  auto result = pipeline.generate(raw_speech, ov::genai::hotwords("Polychrome")); // He has gone and gone for good answered Polychrome who…
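
A minimal Python sketch of consuming the timestamped output described under return_timestamps; the chunks attribute name is an assumption based on the WhisperDecodedResultChunk fields quoted there:

  result = pipe.generate(raw_speech, return_timestamps=True)

  # Each chunk carries a start/end time in seconds plus the decoded text.
  for chunk in result.chunks:
      print(f"[{chunk.start_ts:.2f}s - {chunk.end_ts:.2f}s]{chunk.text}")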

Generic parameters:

max_length: the maximum length the generated tokens can have. Corresponds to the length of the input prompt + max_new_tokens. Its effect is overridden by max_new_tokens, if also set.
max_new_tokens: the maximum number of tokens to generate, excluding the number of tokens in the prompt. max_new_tokens has priority over max_length.
min_new_tokens: set 0 probability for eos_token_id for the first min_new_tokens generated tokens.
ignore_eos: if set to true, generation will not stop even if the <eos> token is met.
eos_token_id: token id of <eos> (end of sentence).
stop_strings: a set of strings that will cause the pipeline to stop generating further tokens.
include_stop_str_in_output: if set to true, the stop string that matched generation will be included in the generation output (default: false).
stop_token_ids: a set of tokens that will cause the pipeline to stop generating further tokens.
echo: if set to true, the model will echo the prompt in the output.
logprobs: number of top logprobs computed for each position; if set to 0, logprobs are not computed and the value 0.0 is returned. Currently only a single top logprob can be returned, so any logprobs > 1 is treated as logprobs == 1 (default: 0).
repetition_penalty: the parameter for repetition penalty. 1.0 means no penalty.
presence_penalty: reduces absolute log prob if the token was generated at least once.
frequency_penalty: reduces absolute log prob as many times as the token was generated.

Beam search specific parameters:

num_beams: number of beams for beam search. 1 disables beam search.
num_beam_groups: number of groups to divide num_beams into in order to ensure diversity among different groups of beams.
diversity_penalty: value subtracted from a beam's score if it generates the same token as any beam from another group at a particular time.
length_penalty: exponential penalty to the length that is used with beam-based generation. It is applied as an exponent to the sequence length, which in turn is used to divide the score of the sequence. Since the score is the log likelihood of the sequence (i.e. negative), length_penalty > 0.0 promotes longer sequences, while length_penalty < 0.0 encourages shorter sequences.
num_return_sequences: the number of sequences to return for grouped beam search decoding.
no_repeat_ngram_size: if set to int > 0, all ngrams of that size can only occur once.
stop_criteria: controls the stopping condition for grouped beam search. It accepts the following values: openvino_genai.StopCriteria.EARLY, where generation stops as soon as there are num_beams complete candidates; openvino_genai.StopCriteria.HEURISTIC, where generation stops when it is very unlikely to find better candidates; openvino_genai.StopCriteria.NEVER, where the beam search procedure only stops when there cannot be better candidates (canonical beam search algorithm).

Random sampling parameters:

temperature: the value used to modulate token probabilities for random sampling.
top_p: if set to float < 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation.
top_k: the number of highest probability vocabulary tokens to keep for top-k filtering.
do_sample: whether or not to use multinomial random sampling.
num_return_sequences: the number of sequences to generate from a single prompt.
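
Any of these fields can be supplied to generate() as keyword arguments (see the kwargs parameter above); a sketch:

  # Keyword arguments are mapped onto WhisperGenerationConfig fields.
  result = pipe.generate(
      raw_speech,
      max_new_tokens=128,
      language="<|en|>",
      task="transcribe",
  )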

get_generation_config(self: openvino_genai.py_openvino_genai.WhisperPipeline) → openvino_genai.py_openvino_genai.WhisperGenerationConfig#
get_tokenizer(self: openvino_genai.py_openvino_genai.WhisperPipeline) → openvino_genai.py_openvino_genai.Tokenizer#
set_generation_config(self: openvino_genai.py_openvino_genai.WhisperPipeline, config: openvino_genai.py_openvino_genai.WhisperGenerationConfig) → None#
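
A sketch of the get/set round-trip, useful when the same settings should apply to every subsequent generate() call:

  # Fetch the current config, adjust it, and set it back as the default.
  config = pipe.get_generation_config()
  config.max_new_tokens = 256
  config.return_timestamps = True
  pipe.set_generation_config(config)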