OpenAI API embeddings endpoint

API Reference

OpenVINO Model Server now includes an embeddings endpoint compatible with the OpenAI API. See the OpenAI API Reference for more information on the API. The endpoint is exposed at the following path:

http://server_name:port/v3/embeddings

Example request

curl http://localhost/v3/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gte-large",
    "input": ["This is a test"],
    "encoding_format": "float"
  }'
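For Python clients, the same request can be sketched with only the standard library. This is a minimal sketch, not an official client: `build_embeddings_request` and `send` are hypothetical helper names, and the base URL `http://localhost:8000/v3` assumes the server listens on port 8000, so substitute your own `server_name:port`.

```python
import json
import urllib.request

# Assumption: the server listens on localhost:8000; replace with your server_name:port.
BASE_URL = "http://localhost:8000/v3"


def build_embeddings_request(model, texts, encoding_format="float", base_url=BASE_URL):
    """Build (but do not send) a POST request equivalent to the curl example."""
    payload = {"model": model, "input": texts, "encoding_format": encoding_format}
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30.0):
    """Send the request and return the decoded JSON response body as a dict."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against a running server, `send(build_embeddings_request("gte-large", ["This is a test"]))` should return a dictionary shaped like the example response. Building the request separately from sending it keeps the payload easy to inspect.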

Example response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [
        -0.03440694510936737,
        -0.02553200162947178,
        -0.010130723007023335,
        -0.013917984440922737,
...
        0.02722850814461708,
        -0.017527244985103607,
        -0.0053995149210095406
      ],
      "index": 0
    }
  ]
}
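Each entry in `data` carries an `index`, so a client can map embeddings back to the inputs that produced them without relying on array order. The sketch below does this for a float-encoded response; `embeddings_by_input` is a hypothetical helper name, not part of any OpenVINO or OpenAI library.

```python
def embeddings_by_input(inputs, response):
    """Pair each input string with its embedding, using data[i]["index"] for order."""
    data = response["data"]
    if len(data) != len(inputs):
        raise ValueError(f"expected {len(inputs)} embeddings, got {len(data)}")
    vectors = [None] * len(inputs)
    for item in data:
        # "index" is the position of the corresponding string in the request's "input".
        vectors[item["index"]] = item["embedding"]
    if any(v is None for v in vectors):
        raise ValueError("response does not cover every input index")
    return list(zip(inputs, vectors))
```

Applied to the example response above with `["This is a test"]` as the inputs, this returns a single pair holding the input string and its vector.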

Request

Generic

Param	OpenVINO Model Server	OpenAI /embeddings API	Type	Description
model	✅	✅	string (required)	Name of the model to use. This is the name assigned to the MediaPipe graph configured to compute embeddings with the desired embedding model.
input	✅	✅	string/list of strings (required)	Input text to embed, encoded as a string or a list of strings.
encoding_format	✅	✅	string: float or base64 (default: `float`)	The format in which to return the embeddings.

Unsupported parameters from the OpenAI API:

user
dimensions
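The parameter table and the list of unsupported parameters can be turned into a client-side check that catches mistakes before a request is sent. This is a sketch under assumptions: `validate_embeddings_request` is a hypothetical helper, and because this page does not say whether the server ignores or rejects `user` and `dimensions`, the check simply flags them.

```python
SUPPORTED_FORMATS = {"float", "base64"}
UNSUPPORTED_PARAMS = {"user", "dimensions"}  # listed as unsupported on this page


def validate_embeddings_request(body):
    """Return a list of problems found in a request body; an empty list means none."""
    problems = []
    model = body.get("model")
    if not isinstance(model, str) or not model:
        problems.append("'model' is required and must be a non-empty string")
    text = body.get("input")
    is_str_list = isinstance(text, list) and all(isinstance(t, str) for t in text)
    if not (isinstance(text, str) or is_str_list):
        problems.append("'input' is required and must be a string or a list of strings")
    fmt = body.get("encoding_format", "float")  # "float" is the documented default
    if fmt not in SUPPORTED_FORMATS:
        problems.append(f"'encoding_format' must be one of {sorted(SUPPORTED_FORMATS)}")
    for name in sorted(body.keys() & UNSUPPORTED_PARAMS):
        problems.append(f"'{name}' is not supported by OpenVINO Model Server")
    return problems
```

The request body from the curl example passes with no problems reported. Returning a list instead of raising on the first error lets a caller report every issue at once.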

Response

Param	OpenVINO Model Server	OpenAI /embeddings API	Type	Description
data	✅	✅	array	List of embedding objects, one for each input string.
data.embedding	✅	✅	array of float or base64 string	Embedding vector for the corresponding input string.
data.index	✅	✅	integer	Index of the embedding in the list of embeddings.
model	✅	✅	string	Model name.
usage	✅	✅	dictionary	Token usage information for the request.
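Because `data.embedding` is either an array of floats or a base64 string depending on `encoding_format`, a client may want one decoder for both. The sketch assumes the base64 payload holds little-endian float32 values, as in the OpenAI API; this page does not state the byte layout, so verify it against your server. `decode_embedding` is a hypothetical helper name.

```python
import base64
import struct


def decode_embedding(value):
    """Return an embedding as a list of floats for either encoding_format."""
    if isinstance(value, list):  # encoding_format="float"
        return [float(x) for x in value]
    # encoding_format="base64": assumed little-endian float32, as in the OpenAI API.
    raw = base64.b64decode(value)
    if len(raw) % 4:
        raise ValueError("base64 payload length is not a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))
```

Base64 encoding keeps the response smaller than a JSON array of decimal numbers, at the cost of this extra decoding step on the client.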

Error handling

The endpoint returns an error for an invalid request in the following cases:

Any field does not match the format defined in the schema.
Any tokenized input text exceeds the maximum context length of the model. Make sure input documents are chunked to fit the model.
The number of input documents exceeds the configured limit (500 by default).
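To stay under the limit on the number of input documents, a client can split a large list into several requests. This is a minimal sketch: `batched` is a hypothetical helper, and the default of 500 only mirrors the documented default, so pass the value configured on your server. It does not address the token-length limit, which requires the model's own tokenizer.

```python
from typing import Iterator, List

# Mirrors the documented default; use the value actually configured on your server.
DEFAULT_MAX_INPUTS = 500


def batched(texts: List[str], max_inputs: int = DEFAULT_MAX_INPUTS) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts` with at most `max_inputs` items each."""
    if max_inputs < 1:
        raise ValueError("max_inputs must be at least 1")
    for start in range(0, len(texts), max_inputs):
        yield texts[start:start + max_inputs]
```

Each batch is sent as its own request, so `data.index` in each response is relative to that batch. When reassembling results, add the batch's starting offset to each index.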

References

End-to-end demo with the embeddings endpoint

Code snippets