Obtaining a Stateful OpenVINO Model

If the original framework does not offer a dedicated API for working with states, the resulting OpenVINO IR model will not be stateful by default: it will contain neither a state nor the Assign and ReadValue operations. You can still make such a model stateful (see benefits) in one of three ways:

Optimum-Intel - an automated solution applicable to a selection of models (not covered by this article; for a usage guide, refer to the LLM Inference with Hugging Face and Optimum Intel article).
MakeStateful transformation - lets you choose which Parameter and Result pairs to replace.
LowLatency2 transformation - detects and replaces Parameter and Result pairs connected to the hidden and cell state inputs of LSTM/RNN/GRU operations or Loop/TensorIterator operations.

MakeStateful Transformation

The MakeStateful transformation changes the structure of the model by replacing user-defined pairs of Parameter and Result operations with ReadValue and Assign operations, respectively.

Only strict syntax is supported. As shown in the command-line example below, the transformation call must be enclosed in double quotes, "MakeStateful[…]", and each tensor name in single quotes without spaces, 'tensor_name_1'.

State naming rule: in most cases, the name of a state is a concatenation of the Parameter/Result tensor names. If there are no tensor names, friendly names are used.
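To make the effect of the replacement more concrete, here is a toy sketch in plain Python. It does not use the OpenVINO API, and the names `stateless_step` and `StatefulStep` are hypothetical, introduced only for this illustration. The stateless form takes the state as an extra input and returns it as an extra output (the Parameter/Result pair). The stateful form keeps the value in an internal buffer, with the read and write of `self._state` standing in for ReadValue and Assign.

```python
from typing import Tuple


def stateless_step(x: int, state_in: int) -> Tuple[int, int]:
    """Stateless form: the state enters as an extra input and leaves as an extra output."""
    state_out = 2 * state_in + x  # an arbitrary recurrence, chosen only for illustration
    return state_out, state_out   # (regular output, updated state)


class StatefulStep:
    """Stateful form: the extra input/output pair is replaced by an internal buffer."""

    def __init__(self, initial_state: int = 0) -> None:
        self._initial = initial_state
        self._state = initial_state

    def __call__(self, x: int) -> int:
        state_in = self._state                      # plays the role of ReadValue
        y, state_out = stateless_step(x, state_in)
        self._state = state_out                     # plays the role of Assign
        return y

    def reset(self) -> None:
        """Return the buffer to its initial value."""
        self._state = self._initial


inputs = [1, 2, 3]

# Stateless usage: the caller has to carry the state from call to call.
state = 0
stateless_outputs = []
for x in inputs:
    y, state = stateless_step(x, state)
    stateless_outputs.append(y)

# Stateful usage: the caller only passes the data input.
step = StatefulStep()
stateful_outputs = [step(x) for x in inputs]
```

Both variants produce the same outputs for the same input sequence. The difference is only in who owns the state between calls.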

Examples:

[Figure: detailed diagram of the MakeStateful transformation]

Python

Using tensor names

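    import openvino as ov
    # The import path below may differ between OpenVINO releases.
    from openvino.runtime.passes import Manager, MakeStateful

    core = ov.Core()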
    ov_model = core.read_model("path_to_the_model")
    tensor_names = {"tensor_name_1": "tensor_name_4",
                    "tensor_name_3": "tensor_name_6"}
    manager = Manager()
    manager.register_pass(MakeStateful(tensor_names))
    manager.run_passes(ov_model)

Using Parameter/Result operations

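    import openvino as ov
    # The import path below may differ between OpenVINO releases.
    from openvino.runtime.passes import Manager, MakeStateful

    core = ov.Core()
    ov_model = core.read_model("path_to_the_model")
    # Parameter_1, Result_1, Parameter_3, Result_3 are placeholders for
    # ops.parameter/ops.result nodes that already exist in ov_model
    # (for example, taken from ov_model.get_parameters() and ov_model.get_results()).
    pairs = [(Parameter_1, Result_1), (Parameter_3, Result_3)]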
    manager = Manager()
    manager.register_pass(MakeStateful(pairs))
    manager.run_passes(ov_model)

C++

Using tensor names

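    #include <openvino/openvino.hpp>
    #include <openvino/pass/manager.hpp>
    #include <openvino/pass/make_stateful.hpp>

    ov::Core core;
    auto ov_model = core.read_model("path_to_the_model");
    std::map<std::string, std::string> tensor_names = {{"tensor_name_1", "tensor_name_4"},
                                                       {"tensor_name_3", "tensor_name_6"}};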
    ov::pass::Manager manager;
    manager.register_pass<ov::pass::MakeStateful>(tensor_names);
    manager.run_passes(ov_model);

Using Parameter/Result operations

    ov::Core core;
    auto ov_model = core.read_model("path_to_the_model");
    // Parameter_1, Result_1, Parameter_3, Result_3 are shared_ptr<Parameter/Result> in the ov_model
    std::vector<std::pair<std::shared_ptr<ov::opset8::Parameter>, std::shared_ptr<ov::opset8::Result>>> pairs
            = {/* {Parameter_1, Result_1}, {Parameter_3, Result_3} */};
    ov::pass::Manager manager;
    manager.register_pass<ov::pass::MakeStateful>(pairs);
    manager.run_passes(ov_model);

command line

Using tensor names

--input_model <INPUT_MODEL> --transform "MakeStateful[param_res_names={'tensor_name_1':'tensor_name_4','tensor_name_3':'tensor_name_6'}]"
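Because the syntax is strict, it can help to generate the `--transform` value programmatically instead of typing it by hand. The helper below is a hypothetical convenience function, not part of OpenVINO. It builds the string in the form shown above: single-quoted tensor names, no spaces. It assumes the tensor names themselves contain no quote characters. The outer double quotes are shell quoting and are left to the caller.

```python
from typing import Dict


def make_stateful_transform_arg(param_res_names: Dict[str, str]) -> str:
    """Build the value for --transform from a {parameter_name: result_name} mapping.

    Hypothetical helper for illustration; it only formats a string.
    """
    for name in list(param_res_names) + list(param_res_names.values()):
        if "'" in name or '"' in name or " " in name:
            raise ValueError(f"unsupported character in tensor name: {name!r}")
    # Each pair is rendered as 'param':'result', and pairs are joined without spaces.
    pairs = ",".join(f"'{p}':'{r}'" for p, r in param_res_names.items())
    return f"MakeStateful[param_res_names={{{pairs}}}]"


transform_arg = make_stateful_transform_arg(
    {"tensor_name_1": "tensor_name_4", "tensor_name_3": "tensor_name_6"}
)
```

When passing the result on a command line, wrap it in double quotes so that the shell keeps the single quotes and braces intact.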

Stateful Model from Scratch

The main approach to obtaining stateful OpenVINO IR models is converting from other frameworks. Nonetheless, it is possible to create a model from scratch. Check how to do so in the Build OpenVINO Model section.

The following example shows how ov::SinkVector is used to create an ov::Model. In a model with states, the Assign nodes must be registered with the Model in addition to its inputs and outputs; otherwise, they may be removed during graph transformations. You can register them through the constructor, as in the example, or with the add_sinks(const SinkVector& sinks) method. After deleting an Assign node from the graph, you can also remove the corresponding sink from ov::Model with the remove_sink() method.

Python

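    import numpy as np
    import openvino as ov
    # The import paths and the opset version below are assumptions
    # and may differ between OpenVINO releases.
    import openvino.runtime.opset13 as ops
    from openvino.runtime.op.util import Variable, VariableInfo

    input = ops.parameter([1, 1], dtype=np.float32, name="data")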
    init_const = ops.constant([[0]], dtype=np.float32)

    # ReadValue/Assign operations typically appear in pairs in a model.
    # ReadValue reads data from an internal memory buffer; Assign writes data to that buffer.
    # Each pair needs its own Variable object,
    # which defines the name, shape, and type of the buffer.
    var_info = VariableInfo()
    var_info.data_shape = init_const.get_shape()
    var_info.data_type = init_const.get_element_type()
    var_info.variable_id = "variable0"
    variable = Variable(var_info)

    # Creating Model
    read = ops.read_value(init_const, variable)
    add = ops.add(input, read)
    assign = ops.assign(add, variable)
    result = ops.result(add)
    model = ov.Model(results=[result], sinks=[assign], parameters=[input], name="model")

C++

    // ...

    auto input = std::make_shared<ov::opset8::Parameter>(ov::element::f32, ov::Shape{1, 1});
    auto init_const = ov::opset8::Constant::create(ov::element::f32, ov::Shape{1, 1}, {0});

    // ReadValue/Assign operations typically appear in pairs in a model.
    // ReadValue reads data from an internal memory buffer; Assign writes data to that buffer.
    // Each pair needs its own Variable object,
    // which defines the name, shape, and type of the buffer.
    const std::string variable_name("variable0");
    ov::op::util::VariableInfo var_info = {init_const->get_shape(),
                                           init_const->get_element_type(),
                                           variable_name};
    auto variable = std::make_shared<ov::op::util::Variable>(var_info);

    // Creating ov::Model
    auto read = std::make_shared<ov::opset8::ReadValue>(init_const, variable);
    auto add = std::make_shared<ov::opset8::Add>(input, read);
    auto save = std::make_shared<ov::opset8::Assign>(add, variable);
    auto result = std::make_shared<ov::opset8::Result>(add);

    auto model = std::make_shared<ov::Model>(ov::ResultVector({result}),
                                             ov::SinkVector({save}),
                                             ov::ParameterVector({input}));
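The role of sinks can be illustrated with a toy graph evaluator in plain Python. This is only a sketch: it does not use OpenVINO or reflect how OpenVINO executes or transforms graphs, and `Node`, `run`, and `write_variable` are hypothetical names. The graph mirrors the example above (read, add, assign, result). The Assign node has no consumers, so nothing reaches it from the results. Unless it is tracked separately as a sink, it is skipped and the state never updates.

```python
class Node:
    """A toy graph node: a function applied to the values of its input nodes."""

    def __init__(self, name, fn, inputs=()):
        self.name = name
        self.fn = fn
        self.inputs = tuple(inputs)


def run(results, sinks, feeds):
    """Evaluate only the nodes reachable from `results` or `sinks`."""
    cache = dict(feeds)  # node name -> computed value

    def evaluate(node):
        if node.name not in cache:
            args = [evaluate(i) for i in node.inputs]
            cache[node.name] = node.fn(*args)
        return cache[node.name]

    outputs = [evaluate(r) for r in results]
    for sink in sinks:
        evaluate(sink)
    return outputs


variables = {"variable0": 0.0}  # stand-in for the internal memory buffer


def write_variable(value):
    variables["variable0"] = value
    return value


data = Node("data", None)  # its value is always supplied through `feeds`
read = Node("read", lambda: variables["variable0"])
add = Node("add", lambda a, b: a + b, [data, read])
assign = Node("assign", write_variable, [add])
result = Node("result", lambda v: v, [add])

# With the Assign node registered as a sink, the state accumulates across calls.
with_sink = [run([result], [assign], {"data": x})[0] for x in (1.0, 2.0, 3.0)]

# Without the sink, Assign is never reached and the state stays at its initial value.
variables["variable0"] = 0.0
without_sink = [run([result], [], {"data": x})[0] for x in (1.0, 2.0, 3.0)]
```

In this sketch, `with_sink` holds the running sums of the inputs, while `without_sink` simply echoes them because the buffer is never written.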

Note

ONNX and frameworks supported via ONNX format: LSTM, RNN, GRU original layers are converted to the GRU/RNN/LSTM Sequence operations. ONNX Loop layer is converted to the OpenVINO Loop operation.

TensorFlow: BlockLSTM is converted to a TensorIterator operation. The TensorIterator body contains LSTM Cell operation. Modifications such as Peepholes and InputForget are not supported. The While layer is converted to a TensorIterator. The TensorIterator body can contain any supported operations. However, dynamic cases where the count of iterations cannot be calculated during shape inference are not supported.

TensorFlow2: While layer is converted to a Loop operation. The Loop body can contain any supported operations.
