class ov::IAsyncInferRequest

Overview

Base class with a default implementation of an asynchronous, multi-staged inference request. To customize the pipeline stages, a derived class should change the content of the IAsyncInferRequest::m_pipeline member container, which consists of pairs of tasks and the executors that run them. The class is recommended for plugins as a base class for asynchronous inference request implementations. More…

#include <iasync_infer_request.hpp>

class IAsyncInferRequest: public ov::IInferRequest
{
public:
    // structs

    struct DisableCallbackGuard;

    // construction

    IAsyncInferRequest(
        const std::shared_ptr<IInferRequest>& request,
        const std::shared_ptr<ov::threading::ITaskExecutor>& task_executor,
        const std::shared_ptr<ov::threading::ITaskExecutor>& callback_executor
        );

    // methods

    virtual void start_async();
    virtual void wait();
    virtual bool wait_for(const std::chrono::milliseconds& timeout);
    virtual void cancel();
    virtual void set_callback(std::function<void(std::exception_ptr)> callback);
    virtual void infer();
    virtual std::vector<ov::ProfilingInfo> get_profiling_info() const;
    virtual ov::Tensor get_tensor(const ov::Output<const ov::Node>& port) const;

    virtual void set_tensor(
        const ov::Output<const ov::Node>& port,
        const ov::Tensor& tensor
        );

    virtual std::vector<ov::Tensor> get_tensors(const ov::Output<const ov::Node>& port) const;

    virtual void set_tensors(
        const ov::Output<const ov::Node>& port,
        const std::vector<ov::Tensor>& tensors
        );

    virtual std::vector<std::shared_ptr<ov::IVariableState>> query_state() const;
    virtual const std::shared_ptr<const ov::ICompiledModel>& get_compiled_model() const;
    virtual const std::vector<ov::Output<const ov::Node>>& get_inputs() const;
    virtual const std::vector<ov::Output<const ov::Node>>& get_outputs() const;
};

Inherited Members

public:
    // methods

    virtual void infer() = 0;
    virtual std::vector<ov::ProfilingInfo> get_profiling_info() const = 0;
    virtual ov::Tensor get_tensor(const ov::Output<const ov::Node>& port) const = 0;

    virtual void set_tensor(
        const ov::Output<const ov::Node>& port,
        const ov::Tensor& tensor
        ) = 0;

    virtual std::vector<ov::Tensor> get_tensors(const ov::Output<const ov::Node>& port) const = 0;

    virtual void set_tensors(
        const ov::Output<const ov::Node>& port,
        const std::vector<ov::Tensor>& tensors
        ) = 0;

    virtual std::vector<std::shared_ptr<ov::IVariableState>> query_state() const = 0;
    virtual const std::shared_ptr<const ov::ICompiledModel>& get_compiled_model() const = 0;
    virtual const std::vector<ov::Output<const ov::Node>>& get_inputs() const = 0;
    virtual const std::vector<ov::Output<const ov::Node>>& get_outputs() const = 0;

Detailed Documentation

Base class with a default implementation of an asynchronous, multi-staged inference request. To customize the pipeline stages, a derived class should change the content of the IAsyncInferRequest::m_pipeline member container, which consists of pairs of tasks and the executors that run them. The class is recommended for plugins as a base class for asynchronous inference request implementations.

To synchronize the derived context with the pipeline stages, the derived class should call the IAsyncInferRequest::stop_and_wait() function in its destructor.

Below is an example of an asynchronous inference request implementation for a hypothetical accelerator device. It uses five different executors to run the stages of the synchronous inference request.
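
The sketch below assumes a hypothetical AcceleratorSyncRequest class (an ov::IInferRequest implementation exposing one method per stage, definition omitted) and that each m_pipeline entry is an {executor, task} pair in which the executor runs the task:

#include <iasync_infer_request.hpp>

// Hypothetical synchronous request: an ov::IInferRequest implementation
// with one method per pipeline stage (definition omitted here).
class AcceleratorSyncRequest;

class AcceleratorAsyncInferRequest : public ov::IAsyncInferRequest {
public:
    AcceleratorAsyncInferRequest(const std::shared_ptr<AcceleratorSyncRequest>& request,
                                 const std::shared_ptr<ov::threading::ITaskExecutor>& preprocess_executor,
                                 const std::shared_ptr<ov::threading::ITaskExecutor>& upload_executor,
                                 const std::shared_ptr<ov::threading::ITaskExecutor>& run_executor,
                                 const std::shared_ptr<ov::threading::ITaskExecutor>& download_executor,
                                 const std::shared_ptr<ov::threading::ITaskExecutor>& postprocess_executor)
        : ov::IAsyncInferRequest(request, nullptr, nullptr),
          m_request(request) {
        // The five stages of the synchronous request are run by five different executors.
        m_pipeline = {
            {preprocess_executor,  [this] { m_request->preprocess(); }},
            {upload_executor,      [this] { m_request->upload(); }},
            {run_executor,         [this] { m_request->run(); }},
            {download_executor,    [this] { m_request->download(); }},
            {postprocess_executor, [this] { m_request->postprocess(); }},
        };
    }

    ~AcceleratorAsyncInferRequest() {
        // Synchronize the derived context with the pipeline stages (see above).
        stop_and_wait();
    }

private:
    std::shared_ptr<AcceleratorSyncRequest> m_request;
};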

Methods

virtual void start_async()

Starts inference of the specified input(s) in asynchronous mode.

The method returns immediately; inference also starts immediately.

virtual void wait()

Waits for the result to become available.
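
A minimal usage sketch, assuming request is a std::shared_ptr<ov::IAsyncInferRequest> whose input tensors are already set:

request->start_async();  // returns immediately; the pipeline stages run on their executors
request->wait();         // blocks until the result is available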

virtual bool wait_for(const std::chrono::milliseconds& timeout)

Waits for the result to become available. Blocks until specified timeout has elapsed or the result becomes available, whichever comes first.

Parameters:

timeout

Maximum duration in milliseconds to block for.

Returns:

True if the result is ready.
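
For example, a polling loop can interleave other work with inference (a sketch, assuming request has already been started with start_async()):

using namespace std::chrono_literals;

while (!request->wait_for(10ms)) {
    // The result is not ready yet; do other useful work here.
}
// wait_for() returned true: the result is now available.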

virtual void cancel()

Cancels the current inference request execution.

virtual void set_callback(std::function<void(std::exception_ptr)> callback)

Sets a callback function that will be called on success or failure of the asynchronous request.

Parameters:

callback

Function to be called on completion of the asynchronous request; the std::exception_ptr argument carries the failure, if any.
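
A minimal sketch of a completion callback, assuming a null std::exception_ptr signals success:

request->set_callback([](std::exception_ptr ex) {
    if (ex) {
        try {
            std::rethrow_exception(ex);  // inspect the failure
        } catch (const std::exception& e) {
            std::cerr << "Inference failed: " << e.what() << '\n';
        }
    } else {
        // Success: the results can now be read, e.g. via get_tensor().
    }
});
request->start_async();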

virtual void infer()

Infers the specified input(s) in synchronous mode.

Blocks all methods of InferRequest while the request is ongoing (running or waiting in a queue).

virtual std::vector<ov::ProfilingInfo> get_profiling_info() const

Queries performance measures per layer to identify the most time-consuming operation.

Not all plugins provide meaningful data.

Returns:

Vector of profiling information for operations in a model.
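
For example, the measurements can be dumped per operation (a sketch; ov::ProfilingInfo carries, among other fields, the node name and the real_time/cpu_time durations):

for (const ov::ProfilingInfo& info : request->get_profiling_info()) {
    // real_time is the wall-clock duration spent executing the operation.
    std::cout << info.node_name << ": " << info.real_time.count() << " us\n";
}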

virtual ov::Tensor get_tensor(const ov::Output<const ov::Node>& port) const

Gets an input/output tensor for inference.

If the tensor with the specified port is not found, an exception is thrown.

Parameters:

port

Port of the tensor to get.

Returns:

Tensor for the specified port.
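
For example, reading the first output after the request has finished (a sketch; the port is taken from get_outputs()):

const ov::Output<const ov::Node>& output_port = request->get_outputs().front();
ov::Tensor result = request->get_tensor(output_port);  // throws if no tensor is found for the port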

virtual void set_tensor(
    const ov::Output<const ov::Node>& port,
    const ov::Tensor& tensor
    )

Sets an input/output tensor to infer.

Parameters:

port

Port of the input or output tensor.

tensor

Reference to a tensor. The element_type and shape of a tensor must match the model’s input/output element_type and size.
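
A sketch of setting an input tensor whose element type and shape are taken from the port itself (assuming the input has a static shape):

const ov::Output<const ov::Node>& input_port = request->get_inputs().front();
// Allocate a tensor matching the model input's element type and shape.
ov::Tensor input(input_port.get_element_type(), input_port.get_shape());
// ... fill the tensor's data() buffer with the actual input ...
request->set_tensor(input_port, input);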

virtual std::vector<ov::Tensor> get_tensors(const ov::Output<const ov::Node>& port) const

Gets a batch of tensors for input data to infer by input port. The model input must have a batch dimension, and the number of tensors matches the batch size. The current version supports getting tensors for model inputs only. If the port is associated with an output (or any other non-input node), an exception is thrown.

Parameters:

port

Port of the input tensor.

Returns:

Vector of tensors.

virtual void set_tensors(
    const ov::Output<const ov::Node>& port,
    const std::vector<ov::Tensor>& tensors
    )

Sets a batch of tensors for input data to infer by input port. The model input must have a batch dimension, and the number of tensors must match the batch size. The current version supports setting tensors to model inputs only. If the port is associated with an output (or any other non-input node), an exception is thrown.

Parameters:

port

Port of the input tensor.

tensors

Input tensors for the batched infer request. The type of each tensor must match the model input element type and shape (except the batch dimension). The total size of the tensors must match the input size.
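
A sketch of a batched input, assuming the first model input has a static shape with batch size 4; each tensor carries one element of the batch dimension:

const ov::Output<const ov::Node>& input_port = request->get_inputs().front();
ov::Shape sample_shape = input_port.get_shape();
sample_shape[0] = 1;  // one sample per tensor; the number of tensors equals the batch size

std::vector<ov::Tensor> batch;
for (size_t i = 0; i < 4; ++i) {
    batch.emplace_back(input_port.get_element_type(), sample_shape);
    // ... fill batch.back() with the data for sample i ...
}
request->set_tensors(input_port, batch);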

virtual std::vector<std::shared_ptr<ov::IVariableState>> query_state() const

Gets the state control interface for the given infer request.

State control is essential for recurrent models.

Returns:

Vector of Variable State objects.
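
For example, resetting all variable states between independent sequences of a recurrent model (a sketch):

for (const auto& state : request->query_state()) {
    state->reset();  // return the state to its initial value
}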

virtual const std::shared_ptr<const ov::ICompiledModel>& get_compiled_model() const

Gets a pointer to the compiled model (usually, the synchronous request holds the compiled model).

Returns:

Pointer to the compiled model.

virtual const std::vector<ov::Output<const ov::Node>>& get_inputs() const

Gets the inputs for the infer request.

Returns:

Vector of input ports.

virtual const std::vector<ov::Output<const ov::Node>>& get_outputs() const

Gets the outputs for the infer request.

Returns:

Vector of output ports.
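
For example, the ports can be enumerated to inspect the request's interface (a sketch; get_any_name() assumes the ports have tensor names):

for (const ov::Output<const ov::Node>& port : request->get_inputs()) {
    std::cout << "input:  " << port.get_any_name() << '\n';
}
for (const ov::Output<const ov::Node>& port : request->get_outputs()) {
    std::cout << "output: " << port.get_any_name() << '\n';
}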