Remote Tensor API of NPU Plugin#

The NPU plugin supports memory sharing between OpenVINO and native APIs such as OpenCL, Vulkan, or DirectX 12. It implements the ov::RemoteContext and ov::RemoteTensor interfaces, providing mechanisms for efficient memory sharing. On Windows, the plugin exports an NT handle; on Linux, it uses a DMA-BUF System Heap file descriptor. You can share this memory by passing the handle as the shared_buffer argument to the create_tensor(..., shared_buffer) function. Another option is to share memory by mapping a file into memory. These methods help avoid memory copy overhead when plugging OpenVINO inference into an existing NPU pipeline.

Scenario supported by the Remote Tensor API:

  • The NPU plugin context and memory objects can be constructed from low-level device, display, or memory handles and used to create the OpenVINO™ ov::CompiledModel or ov::Tensor objects.

Class and function declarations for the API are defined in the following file: src/inference/include/openvino/runtime/intel_npu/level_zero/level_zero.hpp

The most common way for your application to interact with the Remote Tensor API is through user-side utility classes and functions that consume or produce native handles directly.

Context Sharing Between Application and NPU Plugin#

NPU plugin classes that implement the ov::RemoteContext interface are responsible for context sharing. Obtaining a context object is the first step in sharing pipeline objects. The context object of the NPU plugin directly wraps the Level Zero context, setting a scope for sharing the ov::RemoteTensor objects. The ov::RemoteContext object is retrieved from the NPU plugin.

Once you have obtained the context, you can use it to create the ov::RemoteTensor objects.

Getting RemoteContext from the Plugin#

To request the current default context of the plugin, use one of the following methods:

        // Option 1: get the default context from ov::Core
        auto npu_context = core.get_default_context("NPU").as<ov::intel_npu::level_zero::ZeroContext>();
        // Extract raw Level Zero context handle from RemoteContext
        void* context_handle = npu_context.get();

        // Option 2: get the context from an existing ov::CompiledModel
        auto npu_context = compiled_model.get_context().as<ov::intel_npu::level_zero::ZeroContext>();
        // Extract raw Level Zero context handle from RemoteContext
        void* context_handle = npu_context.get();
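
The extracted handle is a raw Level Zero context, so it can also be passed to native Level Zero calls in an existing pipeline. The following is a minimal sketch, assuming the Level Zero headers (ze_api.h) are available; it only verifies that the wrapped context is valid:

        #include <ze_api.h>

        // Reinterpret the raw handle returned by ZeroContext::get() as a native
        // Level Zero context and use it with the Level Zero API, for example to
        // verify that the context is still in a valid state.
        ze_context_handle_t ze_ctx = static_cast<ze_context_handle_t>(context_handle);
        if (zeContextGetStatus(ze_ctx) != ZE_RESULT_SUCCESS) {
            // the context is no longer usable; handle the error
        }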

Memory Sharing Between Application and NPU Plugin#

The classes that implement the ov::RemoteTensor interface are wrappers around native API memory handles, which can be retrieved from them at any time.

To create a shared tensor from a native memory handle or a file, use the dedicated create_tensor, create_l0_host_tensor, or create_host_tensor methods of the ov::RemoteContext sub-classes. ov::intel_npu::level_zero::ZeroContext provides multiple overloads of these methods, which let you wrap a pre-allocated native handle with an ov::RemoteTensor object or request the plugin to allocate specific device memory. For more details, see the code snippets below:

        // Create a remote tensor from a file mapped into memory
        ov::intel_npu::FileDescriptor file_descriptor{"file_name"};
        auto remote_tensor = npu_context.create_tensor(in_element_type, in_shape, file_descriptor);

        // Wrap a pre-allocated memory handle passed as shared_buffer (e.g. an NT handle on Windows)
        void* shared_buffer = nullptr;  // obtain the handle from your native API
        auto remote_tensor = npu_context.create_tensor(in_element_type, in_shape, shared_buffer);

        // Wrap a DMA-BUF System Heap file descriptor (Linux)
        int32_t fd_heap = 0;  // create the DMA-BUF System Heap file descriptor
        auto remote_tensor = npu_context.create_tensor(in_element_type, in_shape, fd_heap);

        // Request the plugin to allocate Level Zero host memory
        auto remote_tensor = npu_context.create_l0_host_tensor(in_element_type, in_shape);
        // Extract raw Level Zero pointer from the remote tensor
        void* level_zero_ptr = remote_tensor.get();

        // Request the plugin to allocate a regular host tensor
        auto tensor = npu_context.create_host_tensor(in_element_type, in_shape);
        // Extract raw Level Zero pointer from the tensor
        void* level_zero_ptr = tensor.data();
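
Once created, a remote tensor can be bound to an inference request like any other ov::Tensor. The following is a minimal end-to-end sketch, assuming a single-input model stored at the hypothetical path model.xml:

        #include <openvino/openvino.hpp>
        #include <openvino/runtime/intel_npu/level_zero/level_zero.hpp>

        ov::Core core;
        auto model = core.read_model("model.xml");
        auto compiled_model = core.compile_model(model, "NPU");

        // Obtain the Level Zero context wrapper from the compiled model
        auto npu_context = compiled_model.get_context().as<ov::intel_npu::level_zero::ZeroContext>();

        // Let the plugin allocate Level Zero host memory matching the model input
        auto input = compiled_model.input();
        auto remote_tensor = npu_context.create_l0_host_tensor(input.get_element_type(), input.get_shape());

        // Bind the remote tensor to the request and run inference; the raw pointer
        // returned by remote_tensor.get() can be filled in place beforehand.
        auto infer_request = compiled_model.create_infer_request();
        infer_request.set_tensor(input, remote_tensor);
        infer_request.infer();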

Limitations#

  • The NT handle (Windows) or DMA-BUF System Heap file descriptor (Linux) must be allocated manually by the application; the plugin does not allocate it. One way to do this on Linux is shown in the sketch below.
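
As an illustration of the Linux side of this limitation, the sketch below allocates a buffer from the DMA-BUF System Heap through the kernel's dma-heap interface. It assumes the kernel exposes /dev/dma_heap/system and that linux/dma-heap.h is available; the helper name is hypothetical. The returned descriptor is what the fd_heap overload of create_tensor shown above expects.

        #include <cstdint>
        #include <cstring>
        #include <fcntl.h>
        #include <sys/ioctl.h>
        #include <unistd.h>
        #include <linux/dma-heap.h>

        // Hypothetical helper: allocate `size` bytes from the DMA-BUF System Heap
        // and return the resulting file descriptor, or -1 on failure.
        int32_t allocate_dma_buf_system_heap(uint64_t size) {
            int heap_fd = open("/dev/dma_heap/system", O_RDWR | O_CLOEXEC);
            if (heap_fd < 0)
                return -1;

            struct dma_heap_allocation_data alloc_data;
            std::memset(&alloc_data, 0, sizeof(alloc_data));
            alloc_data.len = size;                     // requested size in bytes
            alloc_data.fd_flags = O_RDWR | O_CLOEXEC;  // flags for the new DMA-BUF fd

            int ret = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc_data);
            close(heap_fd);
            return ret < 0 ? -1 : static_cast<int32_t>(alloc_data.fd);
        }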

Low-Level Methods for RemoteContext and RemoteTensor Creation#

The high-level wrappers mentioned above bring a direct dependency on native APIs to your program. If you want to avoid this dependency, you can still use the ov::Core::create_context(), ov::RemoteContext::create_tensor(), and ov::RemoteContext::get_params() methods directly. At this level, native handles are reinterpreted as void pointers and all arguments are passed using ov::AnyMap containers filled with std::string, ov::Any pairs. Two types of map entries are possible: a descriptor and a container. The descriptor sets the expected structure and possible parameter values of the map.

For possible low-level properties and their description, refer to the header file: remote_properties.hpp.
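
As a rough illustration of this low-level path, the sketch below inspects the context parameters and creates a remote tensor described entirely by an ov::AnyMap, reusing the in_element_type, in_shape, and shared_buffer variables from the snippets above. The include path and the ov::intel_npu::mem_type / ov::intel_npu::mem_handle property names are assumptions; verify them against remote_properties.hpp.

        #include <openvino/runtime/intel_npu/remote_properties.hpp>

        // Inspect the parameters the plugin context exposes
        ov::RemoteContext generic_context = compiled_model.get_context();
        ov::AnyMap context_params = generic_context.get_params();

        // Describe the memory in an ov::AnyMap instead of using the ZeroContext
        // wrapper (property names assumed; see remote_properties.hpp)
        ov::AnyMap tensor_params = {
            {ov::intel_npu::mem_type.name(), ov::intel_npu::MemType::SHARED_BUF},
            {ov::intel_npu::mem_handle.name(), shared_buffer}};
        auto remote_tensor = generic_context.create_tensor(in_element_type, in_shape, tensor_params);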

Additional Resources#