openvino_genai.Tokenizer#

class openvino_genai.Tokenizer#

Bases: pybind11_object

The class is used to encode prompts and decode resulting tokens

Chat tempalte is initialized from sources in the following order overriding the previos value: 1. chat_template entry from tokenizer_config.json 2. chat_template entry from processor_config.json 3. chat_template entry from chat_template.json 4. chat_tempalte entry from rt_info section of openvino.Model 5. If the tempalte is known to be not supported by GenAI, it’s

replaced with a simplified supported version.

  1. Patch chat_tempalte replacing not supported instructions with

    eqvivalents.

  2. If the template was not in the list of not supported GenAI

    templates from (5), it’s blindly replaced with simplified_chat_template entry from rt_info section of openvino.Model if the entry exists.

__init__(*args, **kwargs)#

Overloaded function.

  1. __init__(self: openvino_genai.py_openvino_genai.Tokenizer, tokenizer_path: os.PathLike, properties: dict[str, object] = {}, **kwargs) -> None

  2. __init__(self: openvino_genai.py_openvino_genai.Tokenizer, tokenizer_model: str, tokenizer_weights: openvino._pyopenvino.Tensor, detokenizer_model: str, detokenizer_weights: openvino._pyopenvino.Tensor, **kwargs) -> None

Methods

__delattr__(name, /)

Implement delattr(self, name).

__dir__()

Default dir() implementation.

__eq__(value, /)

Return self==value.

__format__(format_spec, /)

Default object formatter.

__ge__(value, /)

Return self>=value.

__getattribute__(name, /)

Return getattr(self, name).

__gt__(value, /)

Return self>value.

__hash__()

Return hash(self).

__init__(*args, **kwargs)

Overloaded function.

__init_subclass__

This method is called when a class is subclassed.

__le__(value, /)

Return self<=value.

__lt__(value, /)

Return self<value.

__ne__(value, /)

Return self!=value.

__new__(**kwargs)

__reduce__()

Helper for pickle.

__reduce_ex__(protocol, /)

Helper for pickle.

__repr__()

Return repr(self).

__setattr__(name, value, /)

Implement setattr(self, name, value).

__sizeof__()

Size of object in memory, in bytes.

__str__()

Return str(self).

__subclasshook__

Abstract classes can override this to customize issubclass().

apply_chat_template(self, history, ...[, ...])

Embeds input prompts with special tags for a chat scenario.

decode(*args, **kwargs)

Overloaded function.

encode(*args, **kwargs)

Overloaded function.

get_bos_token(self)

get_bos_token_id(self)

get_eos_token(self)

get_eos_token_id(self)

get_pad_token(self)

get_pad_token_id(self)

set_chat_template(self, chat_template)

Override a chat_template read from tokenizer_config.json.

Attributes

chat_template

__class__#

alias of pybind11_type

__delattr__(name, /)#

Implement delattr(self, name).

__dir__()#

Default dir() implementation.

__eq__(value, /)#

Return self==value.

__format__(format_spec, /)#

Default object formatter.

__ge__(value, /)#

Return self>=value.

__getattribute__(name, /)#

Return getattr(self, name).

__gt__(value, /)#

Return self>value.

__hash__()#

Return hash(self).

__init__(*args, **kwargs)#

Overloaded function.

  1. __init__(self: openvino_genai.py_openvino_genai.Tokenizer, tokenizer_path: os.PathLike, properties: dict[str, object] = {}, **kwargs) -> None

  2. __init__(self: openvino_genai.py_openvino_genai.Tokenizer, tokenizer_model: str, tokenizer_weights: openvino._pyopenvino.Tensor, detokenizer_model: str, detokenizer_weights: openvino._pyopenvino.Tensor, **kwargs) -> None

__init_subclass__()#

This method is called when a class is subclassed.

The default implementation does nothing. It may be overridden to extend subclasses.

__le__(value, /)#

Return self<=value.

__lt__(value, /)#

Return self<value.

__ne__(value, /)#

Return self!=value.

__new__(**kwargs)#
__reduce__()#

Helper for pickle.

__reduce_ex__(protocol, /)#

Helper for pickle.

__repr__()#

Return repr(self).

__setattr__(name, value, /)#

Implement setattr(self, name, value).

__sizeof__()#

Size of object in memory, in bytes.

__str__()#

Return str(self).

__subclasshook__()#

Abstract classes can override this to customize issubclass().

This is invoked early on by abc.ABCMeta.__subclasscheck__(). It should return True, False or NotImplemented. If it returns NotImplemented, the normal algorithm is used. Otherwise, it overrides the normal algorithm (and the outcome is cached).

apply_chat_template(self: openvino_genai.py_openvino_genai.Tokenizer, history: list[dict[str, str]], add_generation_prompt: bool, chat_template: str = '') str#

Embeds input prompts with special tags for a chat scenario.

property chat_template#
decode(*args, **kwargs)#

Overloaded function.

  1. decode(self: openvino_genai.py_openvino_genai.Tokenizer, tokens: list[int], skip_special_tokens: bool = True) -> str

Decode a sequence into a string prompt.

  1. decode(self: openvino_genai.py_openvino_genai.Tokenizer, tokens: openvino._pyopenvino.Tensor, skip_special_tokens: bool = True) -> list[str]

Decode tensor into a list of string prompts.

  1. decode(self: openvino_genai.py_openvino_genai.Tokenizer, tokens: list[list[int]], skip_special_tokens: bool = True) -> list[str]

Decode a batch of tokens into a list of string prompt.

encode(*args, **kwargs)#

Overloaded function.

  1. encode(self: openvino_genai.py_openvino_genai.Tokenizer, prompts: list[str], add_special_tokens: bool = True, pad_to_max_length: bool = False, max_length: Optional[int] = None) -> openvino_genai.py_openvino_genai.TokenizedInputs

Encodes a list of prompts into tokenized inputs.

  1. encode(self: openvino_genai.py_openvino_genai.Tokenizer, prompt: str, add_special_tokens: bool = True, pad_to_max_length: bool = False, max_length: Optional[int] = None) -> openvino_genai.py_openvino_genai.TokenizedInputs

Encodes a single prompt into tokenized input.

get_bos_token(self: openvino_genai.py_openvino_genai.Tokenizer) str#
get_bos_token_id(self: openvino_genai.py_openvino_genai.Tokenizer) int#
get_eos_token(self: openvino_genai.py_openvino_genai.Tokenizer) str#
get_eos_token_id(self: openvino_genai.py_openvino_genai.Tokenizer) int#
get_pad_token(self: openvino_genai.py_openvino_genai.Tokenizer) str#
get_pad_token_id(self: openvino_genai.py_openvino_genai.Tokenizer) int#
set_chat_template(self: openvino_genai.py_openvino_genai.Tokenizer, chat_template: str) None#

Override a chat_template read from tokenizer_config.json.