Remote Tensor API of NPU Plugin#
The NPU plugin supports memory sharing between OpenVINO and native APIs such as OpenCL, Vulkan, or DirectX 12.
It implements the ov::RemoteContext and ov::RemoteTensor interfaces, which provide mechanisms for efficient memory sharing.
On Windows, shared memory is represented by an NT handle; on Linux, by a DMA-BUF system heap file descriptor. You share this
memory by passing the handle as the shared_buffer argument to the create_tensor(..., shared_buffer, ...) function.
Alternatively, memory can be imported by mapping a file into memory or by using a CPU virtual address allocation. These methods
help avoid memory copy overhead when plugging OpenVINO inference into an existing NPU pipeline.
The Remote Tensor API supports the following scenario:
The NPU plugin context and memory objects can be constructed from low-level device, display, or memory handles and used to create the OpenVINO™
ov::CompiledModel or ov::Tensor objects.
Class and function declarations for the API are defined in the following file: src/inference/include/openvino/runtime/intel_npu/level_zero/level_zero.hpp
The most common way to enable the interaction of your application with the Remote Tensor API is to use user-side utility classes and functions that consume or produce native handles directly.
Context Sharing Between Application and NPU Plugin#
NPU plugin classes that implement the ov::RemoteContext interface are responsible for context sharing.
Obtaining a context object is the first step in sharing pipeline objects.
The context object of the NPU plugin directly wraps the Level Zero context, setting the scope for sharing
ov::RemoteTensor objects. The ov::RemoteContext object is retrieved from the NPU plugin.
Once you have obtained the context, you can use it to create the ov::RemoteTensor objects.
Getting RemoteContext from the Plugin#
To request the current default context of the plugin, use one of the following methods:
// Get the default RemoteContext from ov::Core
auto npu_context = core.get_default_context("NPU").as<ov::intel_npu::level_zero::ZeroContext>();
// Extract the raw Level Zero context handle from the RemoteContext
void* context_handle = npu_context.get();

// Get the RemoteContext from an existing ov::CompiledModel
auto npu_context = compiled_model.get_context().as<ov::intel_npu::level_zero::ZeroContext>();
// Extract the raw Level Zero context handle from the RemoteContext
void* context_handle = npu_context.get();
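If you need to call native Level Zero APIs on the shared context, the raw handle can be cast back to the Level Zero type. A minimal sketch, assuming the handle returned by get() is the underlying ze_context_handle_t and the application links against the Level Zero loader (ze_api.h); buffer_size is an illustrative placeholder:

#include <ze_api.h>

// Assumption: the void* returned by ZeroContext::get() is the underlying
// ze_context_handle_t
auto ze_ctx = static_cast<ze_context_handle_t>(context_handle);

// Example: allocate host-visible memory on the shared Level Zero context
ze_host_mem_alloc_desc_t host_desc = {ZE_STRUCTURE_TYPE_HOST_MEM_ALLOC_DESC, nullptr, 0};
void* host_ptr = nullptr;
ze_result_t result = zeMemAllocHost(ze_ctx, &host_desc, buffer_size, 4096, &host_ptr);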
Memory Sharing Between Application and NPU Plugin#
The classes that implement the ov::RemoteTensor interface are wrappers for native API
memory handles, which can be obtained from them at any time.
To create a shared tensor from a native memory handle or a file, use the dedicated create_tensor, create_l0_host_tensor, or create_host_tensor
methods of the ov::RemoteContext sub-classes.
ov::intel_npu::level_zero::ZeroContext provides multiple create_tensor overloads, which either wrap pre-allocated native handles in an
ov::RemoteTensor object or request the plugin to allocate specific device memory.
For more details, see the code snippets below:
// Create a remote tensor backed by a file mapped into memory
ov::intel_npu::FileDescriptor file_descriptor{"file_name"};
auto remote_tensor = npu_context.create_tensor(in_element_type, in_shape, file_descriptor);

// Wrap a pre-allocated CPU virtual address allocation (see the alignment
// and lifetime requirements in the Limitations section below)
void* standard_allocation = nullptr;
ov::intel_npu::MemType memory_type = ov::intel_npu::MemType::CPU_VA;
auto remote_tensor = npu_context.create_tensor(in_element_type, in_shape, standard_allocation, memory_type);

// Wrap a shared buffer identified by an NT handle (Windows)
void* shared_buffer = nullptr;
ov::intel_npu::MemType memory_type = ov::intel_npu::MemType::SHARED_BUF;
auto remote_tensor = npu_context.create_tensor(in_element_type, in_shape, shared_buffer, memory_type);

// Wrap a shared buffer identified by a DMA-BUF system heap file descriptor (Linux)
int32_t fd_heap = 0; // create the DMA-BUF system heap file descriptor
ov::intel_npu::MemType memory_type = ov::intel_npu::MemType::SHARED_BUF;
auto remote_tensor = npu_context.create_tensor(in_element_type, in_shape, fd_heap, memory_type);

// Ask the plugin to allocate Level Zero host memory wrapped in a remote tensor
auto remote_tensor = npu_context.create_l0_host_tensor(in_element_type, in_shape);
// Extract the raw Level Zero pointer from the remote tensor
void* level_zero_ptr = remote_tensor.get();

// Ask the plugin to allocate Level Zero host memory exposed as a regular ov::Tensor
auto tensor = npu_context.create_host_tensor(in_element_type, in_shape);
// Extract the raw Level Zero pointer from the tensor
void* level_zero_ptr = tensor.data();
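Once created, a remote tensor is used like any other ov::Tensor. A minimal sketch, assuming a compiled_model obtained for the "NPU" device and a remote_tensor created with one of the methods above:

// Bind the shared memory to the first model input and run inference
auto infer_request = compiled_model.create_infer_request();
infer_request.set_input_tensor(remote_tensor);
infer_request.infer();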
Limitations#
The NPU plugin does not support methods for direct allocation of native handles.
Warning
CPU Virtual Address Allocation Requirements
When using CPU virtual address allocations, you must comply with the following requirements to prevent memory corruption and crashes:
1. Memory Alignment (Mandatory): Both the allocation pointer and its size must be aligned to the standard page size (4 KB). Non-aligned allocations will be rejected.
2. Allocation Lifetime (Critical): The allocation must remain valid until ALL of the following have occurred:
All inference requests using this remote tensor have completed execution, AND
All inference requests using this remote tensor have been destroyed, AND
The remote tensor has been destroyed.
Failure to maintain the allocation for the entire lifecycle will result in undefined behavior and potential crashes.
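A minimal sketch of a compliant allocation, assuming a POSIX-like platform where std::aligned_alloc is available; in_shape_byte_size is a placeholder for the real tensor size:

#include <cstdlib>

constexpr size_t page_size = 4096;

// Round the requested size up to a multiple of the page size so that
// both the pointer and the size satisfy the 4 KB alignment requirement
size_t tensor_bytes = in_shape_byte_size; // placeholder
size_t aligned_size = (tensor_bytes + page_size - 1) & ~(page_size - 1);
void* standard_allocation = std::aligned_alloc(page_size, aligned_size);

auto remote_tensor = npu_context.create_tensor(in_element_type, in_shape,
                                               standard_allocation,
                                               ov::intel_npu::MemType::CPU_VA);

// ... run and destroy all inference requests that use remote_tensor,
// then destroy remote_tensor itself, and only then free the allocation:
std::free(standard_allocation);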
Low-Level Methods for RemoteContext and RemoteTensor Creation#
The high-level wrappers mentioned above bring a direct dependency on native APIs to your program.
If you want to avoid that dependency, you can still use the ov::Core::create_context(),
ov::RemoteContext::create_tensor(), and ov::RemoteContext::get_params() methods directly.
On this level, native handles are re-interpreted as void pointers and all arguments are passed
using ov::AnyMap containers filled with std::string, ov::Any pairs.
Two types of map entries are possible: a descriptor and a container.
The descriptor sets the expected structure and possible parameter values of the map.
For possible low-level properties and their description, refer to the header file: remote_properties.hpp.
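For illustration, a minimal sketch of the low-level path, assuming the property names declared in remote_properties.hpp (e.g. ov::intel_npu::mem_type and ov::intel_npu::mem_handle; verify them against your OpenVINO version) and a pre-allocated shared_buffer handle:

#include <openvino/runtime/core.hpp>
#include <openvino/runtime/intel_npu/remote_properties.hpp>

ov::Core core;

// Obtain the default context through the generic API
ov::RemoteContext context = core.get_default_context("NPU");

// Inspect the parameters describing the underlying Level Zero context
ov::AnyMap context_params = context.get_params();

// Wrap a pre-allocated shared buffer without the high-level wrappers;
// property names are taken from remote_properties.hpp (assumption)
ov::AnyMap tensor_params = {
    {ov::intel_npu::mem_type.name(), ov::intel_npu::MemType::SHARED_BUF},
    {ov::intel_npu::mem_handle.name(), shared_buffer}};
ov::RemoteTensor remote_tensor = context.create_tensor(in_element_type, in_shape, tensor_params);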