OpenAI API text to speech endpoints

API Reference

OpenVINO Model Server now includes the audio/speech endpoint, which follows the OpenAI API. It runs text-to-speech tasks with the OpenVINO GenAI pipeline. See the OpenAI API Create Speech Reference for more information on the API. The endpoint is exposed at the following path:

http://server_name:port/v3/audio/speech

The request body must be in JSON format, and the request must include the Content-Type: application/json header.

Example request

curl http://localhost:8000/v3/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft/speecht5_tts",
    "input": "The quick brown fox jumped over the lazy dog."
  }' \
  -o speech.wav
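The same request can also be issued from Python. The following is a minimal sketch that uses only the standard library; the helper names `build_speech_request` and `synthesize_to_file` are illustrative and not part of OpenVINO Model Server, and the host, port, and model name are the ones from the curl example.

```python
import json
import urllib.request


def build_speech_request(base_url: str, model: str, text: str) -> urllib.request.Request:
    """Build a POST request for the /v3/audio/speech endpoint (hypothetical helper)."""
    payload = {"model": model, "input": text}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v3/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(base_url: str, model: str, text: str, out_path: str) -> None:
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(base_url, model, text)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as audio_file:
        audio_file.write(audio)


# Example call, assuming a server is listening on localhost:8000:
# synthesize_to_file(
#     "http://localhost:8000",
#     "microsoft/speecht5_tts",
#     "The quick brown fox jumped over the lazy dog.",
#     "speech.wav",
# )
```

Building the body with `json.dumps` rather than by hand avoids malformed JSON such as a trailing comma after the last field.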

Example response

speech.wav - an audio file in WAV format.
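Because curl writes whatever the server returns to the output file, it can be worth checking that the saved bytes really are WAV data and not an error message. The sketch below only inspects the RIFF/WAVE container header, so it makes no assumption about the sample encoding; `looks_like_wav` and `make_silent_wav` are hypothetical helpers, and the latter exists only to produce test input.

```python
import io
import wave


def looks_like_wav(data: bytes) -> bool:
    """Return True if data starts with a RIFF header whose form type is WAVE."""
    return len(data) >= 12 and data[0:4] == b"RIFF" and data[8:12] == b"WAVE"


def make_silent_wav(num_frames: int = 160, sample_rate: int = 16000) -> bytes:
    """Create a tiny mono 16-bit PCM WAV in memory, used here only as sample input."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * num_frames)
    return buffer.getvalue()


# Checking a file saved by the curl example:
# with open("speech.wav", "rb") as audio_file:
#     print(looks_like_wav(audio_file.read()))
```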

Request

| Param | OpenVINO Model Server | OpenAI /audio/speech API | Type | Description |
|---|---|---|---|---|
| model | | | string (required) | Name of the model to use. This is the name assigned to a MediaPipe graph configured to schedule generation using the desired text-to-speech model. Note: This can also be omitted to fall back to URI-based routing. Read more on routing topic TODO |
| input | | | string (required) | The text to generate audio for. |
| voice | | | string (required) | The voice to use when generating the audio. |
| instructions | | | string | Control the voice of your generated audio with additional instructions. |
| response_format | | | string | The format of the output audio. |
| speed | | | number | The speed of the generated audio. |
| stream_format | | | string | The format to stream the audio in. |
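A client can check a payload against the parameter types listed above before sending it. The following sketch is not the server's validation logic; `validate_speech_payload` and `FIELD_TYPES` are hypothetical names, and the default set of required fields is an assumption that mirrors the example request, since which fields the server enforces is not stated here.

```python
from numbers import Real

# Field names and types as listed in the request parameters above.
FIELD_TYPES = {
    "model": str,
    "input": str,
    "voice": str,
    "instructions": str,
    "response_format": str,
    "speed": Real,
    "stream_format": str,
}


def validate_speech_payload(payload: dict, required=("model", "input")) -> list[str]:
    """Return a list of problems found in payload; an empty list means none were found.

    The default `required` tuple is an assumption, not documented server behavior.
    """
    problems = []
    for name in required:
        if name not in payload:
            problems.append(f"missing required field: {name}")
    for name, value in payload.items():
        expected = FIELD_TYPES.get(name)
        if expected is None:
            problems.append(f"unknown field: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(value, bool) or not isinstance(value, expected):
            kind = "number" if expected is Real else "string"
            problems.append(f"wrong type for {name}: expected {kind}")
    return problems
```

For example, `validate_speech_payload({"input": "hi", "speed": "fast"})` reports both the missing `model` field and the non-numeric `speed`.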

Error handling

The endpoint returns an error for an incorrect request in the following cases:

  • Incorrect format of any of the fields based on the schema
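On the client side, a rejected request surfaces as an HTTP error response. The sketch below shows one way to capture it with the Python standard library. The status code and the layout of the error body are not specified in this document, so the body is reported as plain text; `describe_http_error` and `fetch_speech` are hypothetical helper names.

```python
import urllib.error
import urllib.request


def describe_http_error(error: urllib.error.HTTPError) -> str:
    """Format an HTTP error as 'HTTP <status>: <body>' without assuming a body schema."""
    body = error.read().decode("utf-8", errors="replace")
    return f"HTTP {error.code}: {body}"


def fetch_speech(request: urllib.request.Request) -> bytes:
    """Send a prepared request and return the audio bytes, or raise with the server's message."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as error:
        raise RuntimeError(describe_http_error(error)) from error
```

Raising instead of writing the response to disk keeps an error message from being saved as if it were audio.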

References

End to end demo with speech generation endpoint

Code snippets

Speech Generation calculator configuration and limitations