Face Detection And Classification Python Sample

This sample demonstrates how to construct and control a GStreamer pipeline from a Python application, and how to access the metadata that inference elements generate and attach to each image buffer.

How It Works

The sample uses the GStreamer function gst_parse_launch to construct the pipeline from its string representation. It then sets a callback function on the source pad of the gvawatermark element in the pipeline.
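As a rough illustration of string-based construction, the sketch below assembles a pipeline description of the kind gst_parse_launch accepts, in which elements are linked with "!". The helper name build_pipeline_description, the decodebin/videoconvert stages and the default autovideosink are assumptions for illustration, not the sample's actual pipeline string.

```python
from typing import List

# Hypothetical helper: assembles a gst_parse_launch-style description string.
# gvadetect, gvaclassify and gvawatermark come from this sample; the other
# elements, the property set and the sink choice are assumptions.


def build_pipeline_description(source: str, detect_model: str,
                               classify_models: List[str],
                               sink: str = "autovideosink") -> str:
    """Join element descriptions with ' ! ', the link syntax of gst_parse_launch."""
    elements = [source, "decodebin", "videoconvert",
                f"gvadetect model={detect_model}"]
    # One gvaclassify per classification model; each runs on the detected faces.
    elements += [f"gvaclassify model={m}" for m in classify_models]
    # Naming gvawatermark lets the application look it up to attach the callback.
    elements += ["gvawatermark name=gvawatermark", "videoconvert", sink]
    return " ! ".join(elements)


print(build_pipeline_description("filesrc location=input.mp4", "face.xml",
                                 ["age-gender.xml", "emotions.xml"]))
```

In a real application the resulting string would be passed to Gst.parse_launch (the PyGObject binding of gst_parse_launch), and the callback could be attached with pipeline.get_by_name("gvawatermark").get_static_pad("src").add_probe(Gst.PadProbeType.BUFFER, callback). Those calls need GStreamer and PyGObject installed, so they are left out of the runnable sketch.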

The callback is invoked on every frame. It loops through the inference metadata attached to the frame, converts raw tensor data into text labels, and draws each label next to its detected object.

Note that this sample doesn’t include .json files with post-processing rules, because post-processing of classification results is performed by the sample itself (inside the callback function), not by the gvaclassify element.
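The conversion from raw tensor values to a text label can be sketched as follows. The output layouts are assumptions based on the Open Model Zoo model descriptions (age as a fraction of 100, gender as [female, male] probabilities, five emotion probabilities in the order shown); check them against the model versions you actually use. The function names are hypothetical, and extracting the tensors from the frame metadata is not shown.

```python
from typing import Sequence

# Assumed output layouts (verify against the Open Model Zoo docs for your models):
#   age-gender-recognition-retail-0013: age / 100, gender probabilities [female, male]
#   emotions-recognition-retail-0003:   5 probabilities in the order below
EMOTIONS = ("neutral", "happy", "sad", "surprise", "anger")
GENDERS = ("female", "male")


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def face_attributes_label(age_raw: float, gender_probs: Sequence[float],
                          emotion_probs: Sequence[float]) -> str:
    """Turn raw tensor values for one face into a short text label."""
    age = int(round(age_raw * 100))  # undo the assumed /100 scaling of the age output
    gender = GENDERS[argmax(gender_probs)]
    emotion = EMOTIONS[argmax(emotion_probs)]
    return f"{gender} {age} {emotion}"


print(face_attributes_label(0.25, [0.1, 0.9], [0.05, 0.8, 0.05, 0.05, 0.05]))
```

Keeping this logic in Python rather than in a .json post-processing file is what lets the callback format the label however it likes.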


By default, the sample uses the following pre-trained models from the OpenVINO™ Toolkit Open Model Zoo:

  • face-detection-adas-0001 is the primary detection network for finding faces

  • age-gender-recognition-retail-0013 estimates the age and gender of detected faces

  • emotions-recognition-retail-0003 estimates the emotion of detected faces

  • facial-landmarks-35-adas-0002-0009 generates facial landmark points

  • head-pose-estimation-adas-0001 estimates head pose
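Each model above is consumed as an OpenVINO IR .xml file. The sketch below shows one way to resolve a model name to its file path; the directory layout (root/intel/name/precision/name.xml), the MODELS_PATH environment variable and the helper name model_xml_path are assumptions about where the download script places models, so adjust them to your installation.

```python
import os
from typing import Optional


def model_xml_path(name: str, models_root: Optional[str] = None,
                   precision: str = "FP32") -> str:
    """Build the IR .xml path for an Open Model Zoo model (assumed layout)."""
    # Fall back to the (assumed) MODELS_PATH environment variable.
    root = models_root if models_root is not None else os.environ.get("MODELS_PATH")
    if not root:
        raise ValueError("set MODELS_PATH or pass models_root")
    return os.path.join(root, "intel", name, precision, f"{name}.xml")


print(model_xml_path("face-detection-adas-0001", models_root="/opt/models"))
```

The detection model's path would go to gvadetect and the other models' paths to gvaclassify elements.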


Before running any sample (including this one), run the download_models.sh script (located in the top-level samples folder) once to download all models required for this and the other samples.


./draw_face_attributes.sh [INPUT_VIDEO]

If no input parameter is specified, the sample streams an example video from an HTTPS link by default (using the urisourcebin element), so it requires an internet connection. The command-line parameter INPUT_VIDEO changes the input video and supports:

  • local video file

  • web camera device (e.g. /dev/video0)

  • RTSP camera (URL starting with rtsp://) or other streaming source (e.g. URL starting with http://)
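The three kinds of input listed above map naturally onto different GStreamer source elements. The sketch below shows one plausible dispatch; the rules (a /dev/video prefix means a camera, "://" means a URI) and the helper name source_element are assumptions, not necessarily the script's exact logic. The element properties used (v4l2src device, urisourcebin uri, filesrc location) are standard GStreamer properties.

```python
def source_element(input_video: str) -> str:
    """Map an INPUT_VIDEO argument to a GStreamer source element description."""
    if input_video.startswith("/dev/video"):
        # Web camera: Video4Linux2 capture device
        return f"v4l2src device={input_video}"
    if "://" in input_video:
        # rtsp://, http://, https:// and other URIs are handled by urisourcebin
        return f"urisourcebin uri={input_video}"
    # Anything else is treated as a local file path
    return f"filesrc location={input_video}"


for arg in ("/dev/video0", "rtsp://camera/stream", "video.mp4"):
    print(source_element(arg))
```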

Sample Output

The sample:

  • prints the GStreamer pipeline string as passed to the function gst_parse_launch

  • starts the pipeline and visualizes the video with bounding boxes around detected faces, facial landmark points, head pose, and text with classification results (age/gender, emotion) for each detected face