Remote Tensor API of NPU Plugin

The NPU plugin implementation of the ov::RemoteContext and ov::RemoteTensor interfaces helps NPU pipeline developers who need to share memory with existing native APIs (for example, OpenCL, Vulkan, or DirectX 12). The application exports the shared memory as an NT handle (Windows) or a DMA-BUF System Heap file descriptor (Linux) and passes it as the shared_buffer argument of the create_tensor(..., shared_buffer) function. This lets you avoid memory copy overhead when plugging OpenVINO™ inference into an existing NPU pipeline.

Scenario supported by the Remote Tensor API:

  • The NPU plugin context and memory objects can be constructed from low-level device, display, or memory handles and used to create the OpenVINO™ ov::CompiledModel or ov::Tensor objects.

Class and function declarations for the API are defined in the following file: src/inference/include/openvino/runtime/intel_npu/level_zero/level_zero.hpp

The most common way for your application to interact with the Remote Tensor API is through these user-side utility classes and functions, which consume or produce native handles directly.

Context Sharing Between Application and NPU Plugin

NPU plugin classes that implement the ov::RemoteContext interface are responsible for context sharing. Obtaining a context object is the first step in sharing pipeline objects. The NPU plugin context object directly wraps the Level Zero context and sets the scope within which ov::RemoteTensor objects are shared. The ov::RemoteContext object is retrieved from the NPU plugin.

Once you have obtained the context, you can use it to create the ov::RemoteTensor objects.

Getting RemoteContext from the Plugin

To request the current default context of the plugin, use one of the following methods:

        #include <openvino/openvino.hpp>
        #include <openvino/runtime/intel_npu/level_zero/level_zero.hpp>

        // Option 1: request the default NPU context from ov::Core
        ov::Core core;
        auto npu_context = core.get_default_context("NPU").as<ov::intel_npu::level_zero::ZeroContext>();
        // Extract the raw Level Zero context handle from the RemoteContext
        void* context_handle = npu_context.get();

        // Option 2: get the context from a model already compiled for NPU
        auto npu_context = compiled_model.get_context().as<ov::intel_npu::level_zero::ZeroContext>();
        void* context_handle = npu_context.get();

Memory Sharing Between Application and NPU Plugin

The classes that implement the ov::RemoteTensor interface are the wrappers for native API memory handles, which can be obtained from them at any time.

To create a shared tensor, use the dedicated create_tensor, create_l0_host_tensor, or create_host_tensor methods of the NPU context. ov::intel_npu::level_zero::ZeroContext provides several overloaded methods that either wrap a pre-allocated native handle in an ov::RemoteTensor object or request the plugin to allocate specific device memory. For more details, see the code snippets below:

        // Option 1 (Windows): wrap an existing NT handle
        void* shared_buffer = nullptr;  // placeholder: the NT handle created by your application
        auto remote_tensor = npu_context.create_tensor(in_element_type, in_shape, shared_buffer);

        // Option 2 (Linux): wrap an existing DMA-BUF System Heap file descriptor
        int32_t fd_heap = 0;  // placeholder: the DMA-BUF file descriptor created by your application
        auto remote_tensor = npu_context.create_tensor(in_element_type, in_shape, fd_heap);

        // Option 3: let the plugin allocate Level Zero host memory
        auto remote_tensor = npu_context.create_l0_host_tensor(in_element_type, in_shape);
        // Extract the raw Level Zero pointer from the remote tensor
        void* level_zero_ptr = remote_tensor.get();

        // Option 4: let the plugin allocate a host ov::Tensor
        auto tensor = npu_context.create_host_tensor(in_element_type, in_shape);
        // Access the memory pointer of the host tensor
        void* level_zero_ptr = tensor.data();

Limitations

  • The application must allocate the NT handle or the DMA-BUF System Heap file descriptor manually.
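On Linux, one way to do this manual allocation is the kernel's DMA-BUF heap interface. The sketch below assumes the system heap is exposed at /dev/dma_heap/system and that the toolchain ships <linux/dma-heap.h>; the helper names required_bytes and alloc_system_heap_dmabuf are hypothetical and not part of OpenVINO. It sizes the buffer as the product of the tensor dimensions times the element size (assuming a dense layout and a byte-sized element type) and returns -1 whenever the heap is unavailable.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Pull in the DMA-BUF heap uAPI only where it exists (assumption: Linux >= 5.6 headers).
#if defined(__linux__)
#  if __has_include(<linux/dma-heap.h>)
#    include <fcntl.h>
#    include <sys/ioctl.h>
#    include <unistd.h>
#    include <linux/dma-heap.h>
#    define HAVE_DMA_HEAP 1
#  endif
#endif
#ifndef HAVE_DMA_HEAP
#  define HAVE_DMA_HEAP 0
#endif

// Hypothetical helper: bytes needed for a dense tensor
// (product of the dimensions times the element size in bytes).
std::size_t required_bytes(const std::vector<std::size_t>& shape, std::size_t element_size) {
    std::size_t count = 1;
    for (std::size_t dim : shape) {
        count *= dim;
    }
    return count * element_size;
}

// Hypothetical helper: allocate `size` bytes from the Linux DMA-BUF system heap.
// Returns the new DMA-BUF file descriptor, or -1 on any failure.
int alloc_system_heap_dmabuf(std::size_t size) {
#if HAVE_DMA_HEAP
    if (size == 0) {
        return -1;  // a zero-sized allocation is rejected up front
    }
    int heap_fd = open("/dev/dma_heap/system", O_RDONLY | O_CLOEXEC);
    if (heap_fd < 0) {
        return -1;  // heap not present or no permission
    }
    dma_heap_allocation_data data{};
    data.len = size;
    data.fd_flags = O_RDWR | O_CLOEXEC;  // flags applied to the returned DMA-BUF fd
    data.heap_flags = 0;
    int rc = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &data);
    close(heap_fd);  // the DMA-BUF fd stays valid after the heap fd is closed
    return rc < 0 ? -1 : static_cast<int>(data.fd);
#else
    (void)size;
    return -1;  // DMA-BUF heaps are unavailable on this platform or toolchain
#endif
}
```

The returned descriptor is the kind of value the Linux create_tensor(in_element_type, in_shape, fd_heap) overload takes. This document does not state who owns the descriptor afterwards; a cautious assumption is that the application keeps it open while the tensor is in use and closes it with close() once the tensor is no longer needed.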

Low-Level Methods for RemoteContext and RemoteTensor Creation

The high-level wrappers described above bring a direct dependency on native APIs into your program. To avoid this dependency, you can still use the ov::Core::create_context(), ov::RemoteContext::create_tensor(), and ov::RemoteContext::get_params() methods directly. At this level, native handles are reinterpreted as void pointers, and all arguments are passed in ov::AnyMap containers filled with std::string and ov::Any pairs. Two types of map entries are possible: a descriptor and a container. The descriptor sets the expected structure and possible parameter values of the map.

For the available low-level properties and their descriptions, refer to the header file remote_properties.hpp.
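Reinterpreting a handle as a void pointer is plain pointer-sized integer casting. The sketch below shows the round trip for a Linux file descriptor through std::intptr_t. The function names are hypothetical, and whether the NPU plugin expects a file descriptor encoded exactly this way is an assumption to verify against level_zero.hpp and remote_properties.hpp.

```cpp
#include <cstdint>

// Hypothetical helper: store a file descriptor in a void* handle slot.
// Casting through std::intptr_t, which is wide enough to hold a pointer,
// avoids a direct int-to-pointer cast of mismatched width.
void* fd_to_handle(int fd) {
    return reinterpret_cast<void*>(static_cast<std::intptr_t>(fd));
}

// Hypothetical helper: recover the file descriptor from the void* handle.
int handle_to_fd(void* handle) {
    return static_cast<int>(reinterpret_cast<std::intptr_t>(handle));
}
```

A Windows NT handle needs no such conversion, because HANDLE is already defined as a void*.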

Additional Resources