Live Speech Recognition Demo

This demo provides a graphical user interface for automatic speech recognition using a selected inference engine, the OpenVINO™ Feature Extraction Library, and the OpenVINO™ Decoder Library.

How It Works

The application can transcribe audio from a WAV file and/or an audio device. It can recognize two audio sources in parallel, for example the audio captured by your microphone and the audio played back by your PC (loopback). This enables use cases such as transcribing an audio conference or an online video stream. The user can also select the inference engine used for recognition, set the batch size, and control the volume.

The software stack used by the demo is as follows:

[Figure: software components used by the demo (sw_components.png)]

Running

The application main window looks like this (the numbers in parentheses in the instructions below refer to the annotations in this screenshot):

[Figure: annotated main window of the application (live_speech_recognition_demo_annotated.jpg)]

Transcribing speech from WAV file

Click the Select File button (9) and navigate to the audio file in the file selection dialog. Make sure the file is a WAV file with a 16 kHz sample rate, 16-bit samples, and a single (mono) channel.
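If you are unsure whether a file meets these requirements, you can check its header before loading it. The following is a minimal sketch using Python's standard `wave` module. `check_wav_format` is a hypothetical helper and is not part of the demo. Note that `wave.open` rejects files that are not uncompressed PCM WAV by raising `wave.Error`.

```python
import sys
import wave

# Format required by the demo, as stated above:
# 16 kHz sample rate, 16-bit samples (2 bytes each), 1 channel.
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2
EXPECTED_CHANNELS = 1


def check_wav_format(path):
    """Return a list of format problems in the WAV file at `path` (empty if OK).

    Hypothetical helper for illustration; it only inspects the WAV header.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {EXPECTED_RATE_HZ} Hz"
            )
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(
                f"sample width is {8 * wav.getsampwidth()} bits, expected 16 bits"
            )
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"file has {wav.getnchannels()} channels, expected 1")
    return problems


if __name__ == "__main__":
    # Check every file passed on the command line.
    for path in sys.argv[1:]:
        issues = check_wav_format(path)
        print(f"{path}: {'OK' if not issues else '; '.join(issues)}")
```

Run as a script with one or more WAV paths. It prints `OK` for each file that matches the expected format and lists the mismatches otherwise.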

Alternatively, you can use the audio file that is preselected when the application starts.

Click the Recognize button (10).

The transcription will appear in the Source 1 box.

Transcribing speech from audio/video playback (loopback)

Select the audio output device whose playback you want to capture (3).

Click the Recognize button (5) and start playing your video or other media.

The transcription will appear in the Source 1 box.

NOTE: On Linux, loopback capture may require manual configuration in PulseAudio Control or in a PulseAudio configuration file.

Transcribing speech captured with microphone

Select the microphone (6).

Click the Recognize button (8) and start speaking.

The transcription will appear in the Source 2 box.

Transcribing speech from audio output and microphone at the same time (audio conference)

Select the audio output device (3).

Select the microphone (6).

Click both Recognize buttons, (5) and (8), then start speaking.

Transcriptions will appear in both the Source 1 and Source 2 boxes.

NOTE: On Linux, loopback capture may require manual configuration in PulseAudio Control or in a PulseAudio configuration file.

Changing the speech recognition model

Select the desired configuration from the dropdown list (1).

To reset the application to the default configuration (the one it started with), click (2).

Controlling volume

The volume of each stream can be controlled with sliders (4) and (7). The current audio level of each stream is shown in the bar on the same row as its source selector.
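This README does not describe how the volume sliders and level bars are implemented. The sketch below shows one common approach for 16-bit PCM audio and is not necessarily how the demo does it. The volume is applied as a gain factor with clipping to the int16 range, and the level is computed as an RMS value in dBFS (decibels relative to full scale). All function names are hypothetical.

```python
import array
import math
import sys

INT16_MIN, INT16_MAX = -32768, 32767
FULL_SCALE = 32768.0  # magnitude of the most negative int16 sample


def samples_from_bytes(raw):
    """Decode little-endian 16-bit PCM bytes (as stored in WAV) into int16 samples."""
    samples = array.array("h")
    samples.frombytes(raw)  # array uses native byte order
    if sys.byteorder == "big":
        samples.byteswap()  # convert from little-endian on big-endian hosts
    return samples


def apply_volume(samples, gain):
    """Scale samples by `gain` (1.0 = unchanged), clipping to the int16 range."""
    out = array.array("h")
    for s in samples:
        v = round(s * gain)
        out.append(max(INT16_MIN, min(INT16_MAX, v)))
    return out


def rms_dbfs(samples):
    """Return the RMS level in dBFS (0 dBFS = full scale), or -inf for silence."""
    if len(samples) == 0:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 20 * math.log10(math.sqrt(mean_square) / FULL_SCALE)
```

For example, a chunk whose RMS amplitude is half of full scale reads about -6 dBFS. A level bar could map a range such as -60 to 0 dBFS onto its width, which is an arbitrary choice made for this illustration.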

Selecting inference engine

The inference engine and the batch size can be selected with (11) and (12).

Demo Output

The resulting transcription for each audio source is presented in the application in real time.