Remote Tensor API of NPU Plugin
The NPU plugin implementation of the ov::RemoteContext and ov::RemoteTensor interfaces assists NPU pipeline developers who need to share memory with existing native APIs (for example, OpenCL, Vulkan, DirectX 12). To share memory, export an NT handle on Windows or a DMA-BUF System Heap file descriptor on Linux, and pass that handle as the shared_buffer argument to the create_tensor(..., shared_buffer) function. This avoids memory copy overhead when plugging OpenVINO™ inference into an existing NPU pipeline.
The Remote Tensor API supports the following scenario:
The NPU plugin context and memory objects can be constructed from low-level device, display, or memory handles and used to create the OpenVINO™ ov::CompiledModel or ov::Tensor objects.
Class and function declarations for the API are defined in the following file: src/inference/include/openvino/runtime/intel_npu/level_zero/level_zero.hpp
The most common way to enable the interaction of your application with the Remote Tensor API is to use user-side utility classes and functions that consume or produce native handles directly.
Context Sharing Between Application and NPU Plugin
NPU plugin classes that implement the ov::RemoteContext interface are responsible for context sharing.
Obtaining a context object is the first step in sharing pipeline objects.
The context object of the NPU plugin directly wraps a Level Zero context, which sets the scope for sharing ov::RemoteTensor objects. The ov::RemoteContext object is retrieved from the NPU plugin.
Once you have obtained the context, you can use it to create ov::RemoteTensor objects.
Getting RemoteContext from the Plugin
To request the current default context of the plugin, use one of the following methods:
// Option 1: get the default context from ov::Core
auto npu_context = core.get_default_context("NPU").as<ov::intel_npu::level_zero::ZeroContext>();
// Extract the raw Level Zero context handle from the RemoteContext
void* context_handle = npu_context.get();
// Option 2: get the context from a compiled model
auto npu_context = compiled_model.get_context().as<ov::intel_npu::level_zero::ZeroContext>();
// Extract the raw Level Zero context handle from the RemoteContext
void* context_handle = npu_context.get();
Memory Sharing Between Application and NPU Plugin
The classes that implement the ov::RemoteTensor interface are wrappers for native API memory handles, which can be obtained from them at any time.
To create a shared tensor from a native memory handle, use the dedicated create_tensor, create_l0_host_tensor, or create_host_tensor methods of the ov::RemoteContext sub-classes.
ov::intel_npu::level_zero::ZeroContext has multiple overloaded methods that either wrap pre-allocated native handles in an ov::RemoteTensor object or request the plugin to allocate specific device memory.
For more details, see the code snippets below:
// Wrap an NT handle (Windows)
void* shared_buffer = nullptr; // create the NT handle
auto remote_tensor = npu_context.create_tensor(in_element_type, in_shape, shared_buffer);
// Wrap a DMA-BUF System Heap file descriptor (Linux)
int32_t fd_heap = 0; // create the DMA-BUF System Heap file descriptor
auto remote_tensor = npu_context.create_tensor(in_element_type, in_shape, fd_heap);
// Request the plugin to allocate Level Zero host memory for a remote tensor
auto remote_tensor = npu_context.create_l0_host_tensor(in_element_type, in_shape);
// Extract the raw Level Zero pointer from the remote tensor
void* level_zero_ptr = remote_tensor.get();
// Request the plugin to allocate a host tensor
auto tensor = npu_context.create_host_tensor(in_element_type, in_shape);
// Extract the raw Level Zero pointer from the host tensor
void* level_zero_ptr = tensor.data();
Limitations
The plugin does not allocate the NT handle or the DMA-BUF System Heap file descriptor. The application must allocate it manually.
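As an illustration of what manual allocation can look like on Linux, the following is a minimal sketch that requests a buffer from the kernel's DMA-BUF system heap through the standard /dev/dma_heap/system device and the DMA_HEAP_IOCTL_ALLOC ioctl. The helper names tensor_byte_size, allocate_dma_buf, and release_dma_buf are hypothetical and not part of OpenVINO. The sketch assumes a densely packed tensor and does not cover any size or alignment requirements the NPU driver may impose. The Windows NT handle path is not shown.

```cpp
#include <cstddef>
#include <vector>

// Use the DMA-BUF heap interface only where the kernel header is available.
#if defined(__linux__) && defined(__has_include)
#  if __has_include(<linux/dma-heap.h>)
#    include <fcntl.h>
#    include <linux/dma-heap.h>
#    include <sys/ioctl.h>
#    include <unistd.h>
#    define EXAMPLE_HAS_DMA_HEAP 1
#  endif
#endif

// Hypothetical helper: bytes needed for a densely packed tensor
// (product of all dimensions times the element size).
std::size_t tensor_byte_size(const std::vector<std::size_t>& shape,
                             std::size_t element_bytes) {
    std::size_t count = 1;
    for (std::size_t dim : shape) {
        count *= dim;
    }
    return count * element_bytes;
}

// Hypothetical helper: returns a DMA-BUF file descriptor backed by the
// system heap, or -1 if the heap is unavailable or the allocation fails.
int allocate_dma_buf(std::size_t size_bytes) {
#ifdef EXAMPLE_HAS_DMA_HEAP
    const int heap_fd = open("/dev/dma_heap/system", O_RDONLY | O_CLOEXEC);
    if (heap_fd < 0) {
        return -1;
    }
    dma_heap_allocation_data request{};
    request.len = size_bytes;
    request.fd_flags = O_RDWR | O_CLOEXEC;
    const int rc = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &request);
    close(heap_fd);  // the heap device is no longer needed after allocation
    return rc < 0 ? -1 : static_cast<int>(request.fd);
#else
    (void)size_bytes;
    return -1;  // no DMA-BUF heap support on this platform
#endif
}

// Hypothetical helper: closes the descriptor once the tensor is no longer used.
void release_dma_buf(int fd) {
#ifdef EXAMPLE_HAS_DMA_HEAP
    if (fd >= 0) {
        close(fd);
    }
#else
    (void)fd;
#endif
}
```

A descriptor obtained this way would play the role of fd_heap in the create_tensor(in_element_type, in_shape, fd_heap) call shown earlier. The application remains responsible for closing it after the remote tensor is released.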
Low-Level Methods for RemoteContext and RemoteTensor Creation
The high-level wrappers mentioned above bring a direct dependency on native APIs to your program.
If you want to avoid the dependency, you can still use the ov::Core::create_context(), ov::RemoteContext::create_tensor(), and ov::RemoteContext::get_params() methods directly.
At this level, native handles are reinterpreted as void pointers, and all arguments are passed in ov::AnyMap containers filled with std::string, ov::Any pairs.
Two types of map entries are possible: a descriptor and a container.
The descriptor sets the expected structure and possible parameter values of the map.
For possible low-level properties and their description, refer to the header file: remote_properties.hpp.
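To make the descriptor and container entries more concrete, here is a minimal sketch of the pattern using only the C++17 standard library. It is not the OpenVINO API: ParamMap is a stand-in for ov::AnyMap built on std::any, and the key names EXAMPLE_MEM_TYPE and EXAMPLE_MEM_HANDLE as well as the value SHARED_BUF are hypothetical placeholders. The real property names are the ones declared in remote_properties.hpp.

```cpp
#include <any>
#include <map>
#include <string>

// Stand-in for ov::AnyMap: string keys mapped to type-erased values.
using ParamMap = std::map<std::string, std::any>;

// Hypothetical key names, used for illustration only.
constexpr const char* kMemTypeKey = "EXAMPLE_MEM_TYPE";
constexpr const char* kMemHandleKey = "EXAMPLE_MEM_HANDLE";

// Builds a parameter map for a shared native buffer.
ParamMap make_shared_buffer_params(void* native_handle) {
    ParamMap params;
    // Descriptor entry: states which structure the rest of the map follows.
    params[kMemTypeKey] = std::string("SHARED_BUF");
    // Container entry: carries the native handle as a void pointer.
    params[kMemHandleKey] = native_handle;
    return params;
}

// Reads the handle back; returns nullptr if the descriptor or the
// container entry is missing or has an unexpected type or value.
void* extract_handle(const ParamMap& params) {
    const auto type_it = params.find(kMemTypeKey);
    const auto handle_it = params.find(kMemHandleKey);
    if (type_it == params.end() || handle_it == params.end()) {
        return nullptr;
    }
    const std::string* type = std::any_cast<std::string>(&type_it->second);
    if (type == nullptr || *type != "SHARED_BUF") {
        return nullptr;
    }
    void* const* handle = std::any_cast<void*>(&handle_it->second);
    return handle != nullptr ? *handle : nullptr;
}
```

The consumer checks the descriptor first and only then interprets the container entry, which is why a map with a missing or mismatched descriptor is rejected rather than read blindly.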