Face Detection And Classification Python Sample

This sample demonstrates how to construct and control a GStreamer pipeline from a Python application, and how to access metadata generated by inference elements and attached to the image buffer.

How It Works

The sample utilizes the GStreamer function gst_parse_launch to construct the pipeline from its string representation. A callback function is then set on the source pad of the gvawatermark element in the pipeline.
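
The snippet below is a minimal sketch of these two steps, not the full sample: it builds a simplified pipeline with Gst.parse_launch (the Python binding of gst_parse_launch) and attaches a buffer probe to the source pad of gvawatermark. The pipeline string, element chain, and model path are illustrative assumptions.

    import gi
    gi.require_version('Gst', '1.0')
    from gi.repository import Gst, GLib

    Gst.init(None)

    # Simplified pipeline string; the real sample chains several gvaclassify elements.
    pipeline_str = (
        "filesrc location=/path/to/input.mp4 ! decodebin ! videoconvert ! "
        "gvadetect model=face-detection-adas-0001.xml ! "
        "gvawatermark name=gvawatermark ! videoconvert ! autovideosink"
    )
    print(pipeline_str)
    pipeline = Gst.parse_launch(pipeline_str)

    def pad_probe_callback(pad, info):
        # Invoked for every buffer passing through the pad; metadata handling goes here.
        return Gst.PadProbeReturn.OK

    # Attach the callback to the source pad of the gvawatermark element.
    watermark_pad = pipeline.get_by_name("gvawatermark").get_static_pad("src")
    watermark_pad.add_probe(Gst.PadProbeType.BUFFER, pad_probe_callback)

    pipeline.set_state(Gst.State.PLAYING)
    GLib.MainLoop().run()  # run until interrupted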

The callback is invoked on every frame. It loops through the inference metadata attached to the frame, converts raw tensor data into text labels, and visualizes the labels around the detected objects.
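
Below is a hedged sketch of what such a callback can look like, assuming the gstgva Python bindings shipped with DL Streamer (VideoFrame, regions(), tensors()); the names and signatures reflect those bindings as commonly used, not code copied from the sample.

    import gi
    gi.require_version('Gst', '1.0')
    from gi.repository import Gst
    from gstgva import VideoFrame  # DL Streamer Python bindings

    def pad_probe_callback(pad, info):
        buffer = info.get_buffer()
        caps = pad.get_current_caps()
        frame = VideoFrame(buffer, caps=caps)
        for region in frame.regions():          # one region per detected face
            for tensor in region.tensors():     # one tensor per inference output blob
                layer_name = tensor.layer_name()
                data = tensor.data()            # raw output as a numpy array
                # ...convert data into a text label and draw it near the detected region...
        return Gst.PadProbeReturn.OK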

Note that this sample doesn't contain .json files with post-processing rules, as post-processing of the classification results is performed by the sample itself (inside the callback function), not by the gvaclassify element.
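
As an illustration of that in-callback post-processing, the sketch below converts raw classification tensors into text labels. The output layer names and label orderings are assumptions based on the Open Model Zoo descriptions of the models, not values taken from the sample code.

    import numpy as np

    def tensor_to_label(layer_name, data):
        # age-gender-recognition-retail-0013: 'age_conv3' holds age divided by 100
        if layer_name == "age_conv3":
            return "age: %d" % int(data.flatten()[0] * 100)
        # age-gender-recognition-retail-0013: 'prob' holds [female, male] probabilities
        if layer_name == "prob":
            probs = data.flatten()
            return "gender: " + ("M" if probs[1] > probs[0] else "F")
        # emotions-recognition-retail-0003: 'prob_emotion' holds five emotion scores
        if layer_name == "prob_emotion":
            emotions = ["neutral", "happy", "sad", "surprise", "anger"]
            return "emotion: " + emotions[int(np.argmax(data))]
        return None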

Models

By default, the sample uses the following pre-trained models from the OpenVINO™ Toolkit Open Model Zoo (a rough sketch of how they map onto pipeline elements follows the list):

  • face-detection-adas-0001: primary detection network for finding faces
  • age-gender-recognition-retail-0013: age and gender estimation on detected faces
  • emotions-recognition-retail-0003: emotion estimation on detected faces
  • facial-landmarks-35-adas-0002: facial landmark points generation
  • head-pose-estimation-adas-0001: head pose estimation
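
As a rough illustration (not the exact pipeline built by the script), the detection model typically runs in a gvadetect element and each of the other models in its own gvaclassify element operating on the detected face regions; the model file paths below are placeholders.

    # Hypothetical pipeline fragment chaining the models listed above.
    pipeline_fragment = " ! ".join([
        "gvadetect model=face-detection-adas-0001.xml",
        "gvaclassify model=age-gender-recognition-retail-0013.xml",
        "gvaclassify model=emotions-recognition-retail-0003.xml",
        "gvaclassify model=facial-landmarks-35-adas-0002.xml",
        "gvaclassify model=head-pose-estimation-adas-0001.xml",
    ])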

NOTE: Before running samples (including this one), run the script download_models.sh once (the script is located in the top-level samples folder) to download all models required for this and other samples.

Running

./draw_face_attributes.sh [INPUT_VIDEO]

If no input parameter is specified, the sample streams an example video from an HTTPS link (utilizing the urisourcebin element) and therefore requires an internet connection. The command-line parameter INPUT_VIDEO allows changing the input video and supports:

  • local video file
  • web camera device (e.g. /dev/video0)
  • RTSP camera (URL starting with rtsp://) or other streaming source (e.g. a URL starting with http://)

Sample Output

The sample:

  • prints the GStreamer pipeline string as passed to the function gst_parse_launch
  • starts the pipeline and visualizes the video with bounding boxes around detected faces, facial landmark points, head pose, and text with classification results (age/gender, emotion) for each detected face

See also