OpenAI API embeddings endpoint

API Reference

OpenVINO Model Server now includes an embeddings endpoint compatible with the OpenAI API. See the OpenAI API Reference for more information on the API. The endpoint is exposed at the following path:

http://server_name:port/v3/embeddings

Example request

curl http://localhost/v3/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gte-large",
    "input": ["This is a test"],
    "encoding_format": "float"
  }'

Example response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [
        -0.03440694510936737,
        -0.02553200162947178,
        -0.010130723007023335,
        -0.013917984440922737,
...
        0.02722850814461708,
        -0.017527244985103607,
        -0.0053995149210095406
      ],
      "index": 0
    }
  ]
}

Request

Generic

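Parameters supported by OpenVINO Model Server:

| Param           | Type                                           | Description                                                                                                  |
|-----------------|------------------------------------------------|--------------------------------------------------------------------------------------------------------------|
| model           | string (required)                              | Name of the model to use: the name assigned to the MediaPipe graph configured to run the desired embedding model. |
| input           | string or list of strings (required)           | Input text to embed, given as a single string or a list of strings.                                          |
| encoding_format | string: "float" or "base64" (default: "float") | Format in which the embeddings are returned.                                                                 |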

Unsupported parameters from the OpenAI service:

  • user

  • dimensions
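The `input` parameter accepts a list of strings, so several texts can be embedded in one request, and `encoding_format` selects between float arrays and base64 strings. The shell sketch below builds such a request body. `embeddings_request_body` is a hypothetical helper, not part of OpenVINO Model Server, and it assumes the input strings need no JSON escaping. The model name `gte-large` and the `localhost` URL are the placeholders from the example request above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a /v3/embeddings request body for one or more
# input strings. Assumes the strings need no JSON escaping (no double quotes,
# backslashes, or control characters).
embeddings_request_body() {
  local model="$1" format="$2"
  shift 2
  # Only "float" and "base64" are accepted values of encoding_format.
  case "$format" in
    float|base64) ;;
    *) echo "unsupported encoding_format: $format" >&2; return 1 ;;
  esac
  local inputs="" text
  for text in "$@"; do
    # Join the quoted strings with ", " to form the JSON list.
    inputs+="${inputs:+, }\"${text}\""
  done
  printf '{"model": "%s", "input": [%s], "encoding_format": "%s"}\n' \
    "$model" "$inputs" "$format"
}

# Example: embed two strings in one request and ask for base64 output.
# curl reads the request body from stdin with -d @-.
# embeddings_request_body gte-large base64 "first text" "second text" |
#   curl http://localhost/v3/embeddings \
#     -H "Content-Type: application/json" \
#     -d @-
```

The response then contains one `data` entry per input string.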

Response

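| Param          | Type                             | Description                                                                     |
|----------------|----------------------------------|---------------------------------------------------------------------------------|
| data           | array                            | List of embedding objects, one per input string.                                |
| data.embedding | array of floats or base64 string | Embedding vector for the corresponding input string.                            |
| data.index     | integer                          | Index of the embedding in the list, matching the position of its input string.  |
| model          | string                           | Name of the model used.                                                         |
| usage          | dictionary                       | Information about the tokens processed for the request.                         |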
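With `"encoding_format": "base64"`, each `data.embedding` is returned as a base64 string instead of a float array. The sketch below decodes such a string with standard tools. It assumes, as with the OpenAI service, that the string holds raw little-endian 32-bit floats; this section does not specify the byte layout, so verify it against your server. `decode_embedding` is a hypothetical helper, and because `od` reads floats in host byte order, the sketch also assumes a little-endian host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a base64 "embedding" string into one float per line.
# Assumption: the string holds raw little-endian float32 values (the layout the
# OpenAI service uses) and this host is little-endian, since od reads floats in
# host byte order.
decode_embedding() {
  printf '%s' "$1" | base64 -d | od -An -v -t f4 |
    # od prints several values per line; put each value on its own line.
    awk '{ for (i = 1; i <= NF; i++) print $i }'
}

# Example (requires jq, a separate tool): decode the first embedding of a
# saved response.
# decode_embedding "$(jq -r '.data[0].embedding' response.json)"
```

The number of decoded lines equals the embedding dimension, which makes it easy to compare against the float output for the same input.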

Error handling

The endpoint returns an error for an incorrect request under the following conditions:

  • Any field does not match the format defined in the request schema.

  • Any tokenized input text exceeds the model's maximum context length. Make sure input documents are chunked to fit the model.

  • The number of input documents exceeds the configured limit (500 by default).
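Because the number of input documents per request is capped, a large corpus has to be split across several requests. A minimal sketch, assuming one document per line on stdin and documents that need no JSON escaping. `make_embedding_batches` is a hypothetical helper, and the batch size passed to it must not exceed the server's configured limit (500 unless changed).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one document per line from stdin and print one
# /v3/embeddings request body per batch of at most max_batch documents
# (default 500). Assumes the documents need no JSON escaping (no double
# quotes, backslashes, or control characters).
make_embedding_batches() {
  local model="$1" max_batch="${2:-500}"
  local batch="" count=0 line
  # The "|| [ -n ... ]" part also handles a final line without a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    batch+="${batch:+, }\"${line}\""
    count=$((count + 1))
    if [ "$count" -eq "$max_batch" ]; then
      printf '{"model": "%s", "input": [%s]}\n' "$model" "$batch"
      batch=""
      count=0
    fi
  done
  # Flush the last, partially filled batch (if any).
  if [ "$count" -gt 0 ]; then
    printf '{"model": "%s", "input": [%s]}\n' "$model" "$batch"
  fi
}

# Example: send each batch as a separate request (placeholder URL and model).
# make_embedding_batches gte-large 500 < documents.txt |
#   while IFS= read -r body; do
#     curl -s http://localhost/v3/embeddings \
#       -H "Content-Type: application/json" -d "$body"
#   done
```

Splitting by document count does not address the context-length limit: each individual document still has to be chunked to fit the model.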

References

End-to-end demo with embeddings endpoint

Code snippets