• OpenVINO Model Server provides a C++ implementation of gRPC and RESTful APIs compatible with TensorFlow Serving.

  • In the backend, OpenVINO Model Server uses OpenVINO Runtime libraries from the OpenVINO toolkit. This speeds up execution on the CPU and enables execution on AI accelerators such as Neural Compute Stick 2, iGPU (Integrated Graphics Processing Unit), and HDDL.

  • The gRPC code backbone that handles API requests is based on the TensorFlow Serving Core Framework, with a tuned implementation of request handling.

  • Services are implemented as a set of C++ classes that manage AI models in the Intermediate Representation format. OpenVINO Runtime executes the model’s operations.
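
As an illustration of the TensorFlow Serving compatibility described above, the following minimal sketch builds (but does not send) a REST predict request in the TensorFlow Serving style, where the request is a POST to `/v1/models/<name>:predict` with an `instances` list in the JSON body. The host, port `8000`, model name `resnet`, and input values are hypothetical placeholders, and the helper `build_tfs_predict_request` is not part of any OpenVINO package.

```python
import json
from urllib import request


def build_tfs_predict_request(host, port, model_name, instances):
    """Build a TensorFlow Serving style REST predict request without sending it.

    host, port, and model_name are placeholders for a real deployment.
    """
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    # Row format: each element of "instances" is one input example.
    body = json.dumps({"instances": instances}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values, for illustration only.
req = build_tfs_predict_request("localhost", 8000, "resnet", [[0.0, 1.0, 2.0]])
print(req.get_method(), req.full_url)
```

With a server actually listening at that address, the request could be sent with `urllib.request.urlopen(req)` and the JSON response decoded with `json.load`.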


Figure 1: Docker Container (VM or Bare Metal Host)

  • The models used by OpenVINO Model Server need to be stored locally or hosted remotely by object storage services. Storage compatible with Google Cloud Storage (GCS), Amazon S3, or Azure Blob Storage is supported. For more details, refer to Preparing the Model Repository.

  • OpenVINO Model Server is well suited to deployment in a Kubernetes environment. It can also be hosted on a bare metal server, on a virtual machine, or inside a Docker container.

  • The only two exposed network interfaces are the gRPC API:

    • TensorFlow Serving compatible API(./

    • KServe compatible API(./

    … and the RESTful API:

    • TensorFlow Serving compatible API(./

    • KServe compatible API(./

    These interfaces do not include authorization, authentication, or data encryption. There is, however, a documented method for adding an NGINX reverse proxy with mTLS traffic termination.
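
For the KServe compatible RESTful API listed above, a request might be shaped as in the sketch below. It follows the KServe v2 inference protocol, in which inference is a POST to `/v2/models/<name>/infer` (optionally with a `/versions/<version>` segment) and each input tensor carries `name`, `shape`, `datatype`, and `data` fields. The model name `resnet`, the input name `input_0`, and both helper functions are hypothetical and only illustrate the payload layout. Nothing is sent over the network.

```python
import json


def kserve_infer_path(model_name, version=None):
    """Return the KServe v2 inference path for a model (version is optional)."""
    if version is None:
        return f"/v2/models/{model_name}/infer"
    return f"/v2/models/{model_name}/versions/{version}/infer"


def build_kserve_infer_body(input_name, shape, datatype, data):
    """Return a KServe v2 inference request body with a single input tensor."""
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": shape,
                "datatype": datatype,
                "data": data,
            }
        ]
    }


# Hypothetical model and tensor names, for illustration only.
path = kserve_infer_path("resnet", version=1)
body = build_kserve_infer_body("input_0", [1, 3], "FP32", [0.0, 1.0, 2.0])
print(path)
print(json.dumps(body))
```

Because the interfaces themselves carry no authentication or encryption, a client in a secured deployment would address the reverse proxy rather than the model server directly.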