Model Server Features

Serving Pipelines of Models

Connect multiple models in a pipeline and reduce data transfer overhead with Directed Acyclic Graph (DAG) Scheduler. Implement model inference and data transformations using a custom node C/C++ dynamic library.

Learn more

Processing Raw Data

Send data in JPEG or PNG formats to reduce traffic and offload data pre-processing to the server.

Learn more

Model Versioning Policies

The model repository structure enables adding or deleting numerical version directories and the server will automatically adjust which models are served.

Control which model versions are served by setting the model version policy to serve all models, a specific model or set of models or just the latest version of the model (default setting).

Learn more

Model Reshaping

Change the batch size, shape and layout of the model at runtime to achieve high throughput and low latency.

Learn more

Modify Model Configuration at Runtime

OpenVINO Model Server regularly checks for changes to the configuration file and applies them during runtime. This means that you can change model configurations (for example, change the device where a model is served), add a new model or completely remove one that is no longer needed. These changes will be applied without any disruption to the service.

Learn more

Working with Stateful Models

Serve models that operate on sequences of data and maintain their state between inference requests.

Learn more

Metrics

Use the metrics endpoint compatible with the Prometheus to access performance and utilization statistics.

Learn more

Enabling Dynamic Inputs

Configure served models to accept data with variable batch sizes and input shapes.

Learn more

Model Server C API

Use in process inference via model server to leverage the model management and model pipelines functionality of OpenVINO Model Server within an application. This allows to reuse existing OVMS functionality to execute inference locally without network overhead.

Learn more

Advanced Features

Use CPU Extensions, model cache feature or a custom model loader.

Learn more