OpenAI API speech to text endpoints

API Reference

OpenVINO Model Server now includes the audio/transcriptions and audio/translations endpoints, which follow the OpenAI API. They are used to run speech-to-text tasks with the OpenVINO GenAI pipeline. Please see the OpenAI API Transcription Reference and OpenAI API Translation Reference for more information on the API.

There are two endpoints exposed:

http://server_name:port/v3/audio/transcriptions
http://server_name:port/v3/audio/translations

Request body must be in multipart/form-data format.

curl -X POST http://localhost:8000/v3/audio/transcriptions \
  -F "model=OpenVINO/whisper-large-v3-fp16-ov" \
  -F "file=@speech_english.wav"

curl -X POST http://localhost:8000/v3/audio/translations \
  -F "model=OpenVINO/whisper-large-v3-fp16-ov" \
  -F "file=@speech_spanish.wav"

{"text":"..."}
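The same requests can be issued from Python. The following is a minimal sketch using only the standard library, not an official client: the helper names `encode_multipart` and `transcribe` are hypothetical, and the base URL, model name, and file name are simply taken from the curl examples above.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def encode_multipart(fields, file_field, filename, file_bytes, boundary=None):
    """Return (body, content_type) for a multipart/form-data request."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    # Plain text fields such as "model".
    for name, value in fields.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        parts.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    # The audio file part.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    # Closing boundary.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(audio_path, model,
               base_url="http://localhost:8000/v3",   # assumed local deployment
               endpoint="audio/transcriptions"):      # or "audio/translations"
    """POST an audio file and return the "text" field of the JSON response."""
    path = Path(audio_path)
    body, content_type = encode_multipart(
        {"model": model}, "file", path.name, path.read_bytes()
    )
    request = urllib.request.Request(
        f"{base_url}/{endpoint}",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]


# Example call (requires a running server and a local audio file):
# print(transcribe("speech_english.wav", "OpenVINO/whisper-large-v3-fp16-ov"))
```

Passing `endpoint="audio/translations"` targets the translation endpoint with the same request layout.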

Param	OpenVINO Model Server	OpenAI /audio/transcriptions API	Type	Description
model	✅	✅	string (required)	Name of the model to use. Note: This can also be omitted to fall back to URI based routing. Read more on routing topic TODO
file	⚠️	✅	file (required)	The audio file object to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm. (⚠️ Note: currently only mp3 and wav are supported.)
language	✅	✅	string	The language of the input audio in ISO-639-1 format. Providing the language for a multilingual model may improve accuracy and performance.
chunking_strategy	❌	✅	“auto” or object	Controls how the audio is cut into chunks.
include	❌	✅	array	Additional information to include in the transcription response.
known_speaker_names	❌	✅	array	List of speaker names corresponding to the audio samples
known_speaker_references	❌	✅	array	Optional list of audio samples with known speaker references matching known_speaker_names
prompt	❌	✅	string	An optional text to guide the model’s style or continue a previous audio segment.
response_format	❌	✅	string	The format of the output.
stream	❌	✅	boolean	Generate the response in streaming mode.
temperature	❌	✅	number	The sampling temperature, between 0 and 1.
timestamp_granularities	❌	✅	array	The timestamp granularities to populate for this transcription.
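Since only some of the parameters are supported, a client may want to check its inputs before sending a transcription request. The sketch below is one possible approach and not part of the server: `build_form_fields` is a hypothetical helper, the extension check is only a heuristic based on the file name, and the language check verifies the two-letter shape of an ISO-639-1 code rather than looking it up in the standard.

```python
from pathlib import Path

# Formats the table above lists as currently supported by the server.
SUPPORTED_EXTENSIONS = {".mp3", ".wav"}


def build_form_fields(audio_path, model=None, language=None):
    """Validate inputs and return the text form fields for a transcription request."""
    suffix = Path(audio_path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {suffix or '(none)'}")

    fields = {}
    # "model" may be omitted to fall back to URI based routing.
    if model is not None:
        fields["model"] = model
    if language is not None:
        # Shape check only: ISO-639-1 codes are two ASCII letters.
        if len(language) != 2 or not (language.isascii() and language.isalpha()):
            raise ValueError("language must be a two-letter ISO-639-1 code")
        fields["language"] = language.lower()
    return fields
```

The returned dictionary holds only the text fields; the audio file itself still has to be attached as the `file` part of the multipart body.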

Param	OpenVINO Model Server	OpenAI /audio/translations API	Type	Description
model	✅	✅	string (required)	Name of the model to use. Note: This can also be omitted to fall back to URI based routing. Read more on routing topic TODO
file	⚠️	✅	file (required)	The audio file object to translate, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm. (⚠️ Note: currently only mp3 and wav are supported.)
prompt	❌	✅	string	An optional text to guide the model’s style or continue a previous audio segment.
response_format	❌	✅	string	The format of the output.
temperature	❌	✅	number	The sampling temperature, between 0 and 1.

Param	OpenVINO Model Server	OpenAI /audio/transcriptions API	Type	Description
text	✅	✅	string	The transcribed text.
logprobs	❌	✅	array	The log probabilities of the tokens in the transcription.
usage	❌	✅	object	Token usage statistics for the request.

Param	OpenVINO Model Server	OpenAI /audio/translations API	Type	Description
text	✅	✅	string	The translated text.
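Because `text` is the only response field the server supports, reading a result comes down to decoding the JSON body and picking out that one key. A minimal sketch, with `extract_text` as a hypothetical helper name:

```python
import json


def extract_text(response_body):
    """Return the "text" field from a transcription or translation response body."""
    payload = json.loads(response_body)  # accepts str or bytes
    if not isinstance(payload, dict) or "text" not in payload:
        raise ValueError("response does not contain a 'text' field")
    return payload["text"]
```

The helper raises `ValueError` when the field is missing, so a caller can tell a successful result apart from any other JSON body.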

The endpoint can raise an error related to an incorrect request under the following conditions: