smartlab-sequence-modelling-0001

Use Case and High-Level Description

This is an online action segmentation network for 16 classes, trained on an Intel dataset. It is an online version of MSTCN++. The difference is that online MSTCN++ accepts streaming video as input, while the original MSTCN++ assumes that the whole video is available.

For details of the original MSTCN++ model, see the paper.

Specification

Metric	Value
GOPs	0.048915
MParams	1.018179
Source framework	PyTorch*

Accuracy

Note: The accuracy report uses i3d-rgb as the feature extraction network. You can get this model from @ref omz_models_model_i3d_rgb_tf.

Inputs

The network takes two kinds of input: the feature vectors for each video frame, which should be the output of a feature extraction network such as i3d-rgb-tf or resnet-50-tf, and the feature outputs of the previous frame.

For an example of how i3d-rgb and smartlab-sequence-modelling-0001 are used together, see demos/smartlab_demo.

Input feature, name: input, shape: 1, 2048, 24, format: B, W, H, where:
- B - batch size
- W - feature map width
- H - feature map height
History feature 1, name: fhis_in_0, shape: 12, 64, 2048, format: C, H, W,
History feature 2, name: fhis_in_1, shape: 11, 64, 2048, format: C, H, W,
History feature 3, name: fhis_in_2, shape: 11, 64, 2048, format: C, H, W,
History feature 4, name: fhis_in_3, shape: 11, 64, 2048, format: C, H, W, where:
- C - the channel number of feature vector
- H - feature map height
- W - feature map width
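
The input layout listed above can be sketched as a dictionary of NumPy arrays keyed by input name. This is only an illustration: the names and shapes are copied from the list, while the `float32` data type, the zero-filled history tensors for the first call of a stream, and the helper name `make_initial_inputs` are assumptions, not something this page specifies.

```python
import numpy as np

# Names and shapes copied from the input description above.
INPUT_SHAPES = {
    "input": (1, 2048, 24),
    "fhis_in_0": (12, 64, 2048),
    "fhis_in_1": (11, 64, 2048),
    "fhis_in_2": (11, 64, 2048),
    "fhis_in_3": (11, 64, 2048),
}


def make_initial_inputs(features=None):
    """Build a feed dictionary for the first call of a stream (hypothetical helper).

    Assumption: tensors are float32 and history starts as all zeros.
    """
    feed = {
        name: np.zeros(shape, dtype=np.float32)
        for name, shape in INPUT_SHAPES.items()
    }
    if features is not None:
        features = np.asarray(features, dtype=np.float32)
        if features.shape != INPUT_SHAPES["input"]:
            raise ValueError(
                f"expected shape {INPUT_SHAPES['input']}, got {features.shape}"
            )
        feed["input"] = features
    return feed


feed = make_initial_inputs()
for name, tensor in feed.items():
    print(name, tensor.shape, tensor.dtype)
```

Checking the feature shape before inference makes a mismatch with the feature extraction network fail early, with a readable message.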

Outputs

The outputs also consist of two parts: the predictions and four feature outputs. The predictions are the action classification and prediction results. The four feature maps are the model layer features from past frames.

Prediction, name: output, shape: 4, 1, 64, 24, format: C, B, H, W, where:
- C - the channel number of feature vector
- B - batch size
- H - feature map height
- W - feature map width

After post-processing with the argmax() function, the prediction result can be used to decide the action type of the current frame.

History feature 1, name: fhis_out_0, shape: 12, 64, 2048, format: C, H, W,
History feature 2, name: fhis_out_1, shape: 11, 64, 2048, format: C, H, W,
History feature 3, name: fhis_out_2, shape: 11, 64, 2048, format: C, H, W,
History feature 4, name: fhis_out_3, shape: 11, 64, 2048, format: C, H, W, where:
- C - the channel number of feature vector
- H - feature map height
- W - feature map width
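
The way the history outputs feed the next call can be sketched as a small streaming loop. This is a minimal sketch under stated assumptions, not the demo's implementation: `fake_infer` is a hypothetical stand-in that only returns tensors of the documented shapes and should be replaced by a real inference call, and `step` and `decode` are illustrative names. The page does not say where the class scores sit inside `output`, so `decode` assumes the last entry of the first axis and the first 16 entries of the third axis; verify this against demos/smartlab_demo before relying on it.

```python
import numpy as np

NUM_CLASSES = 16  # number of action classes stated in the description
NUM_HISTORY = 4   # fhis_in_0..3 / fhis_out_0..3
HISTORY_SHAPES = [(12, 64, 2048), (11, 64, 2048), (11, 64, 2048), (11, 64, 2048)]


def fake_infer(feed):
    """Hypothetical stand-in for a real inference call.

    Returns random predictions and dummy history with the documented shapes.
    """
    rng = np.random.default_rng(0)
    out = {"output": rng.standard_normal((4, 1, 64, 24)).astype(np.float32)}
    for i in range(NUM_HISTORY):
        out[f"fhis_out_{i}"] = np.ones_like(feed[f"fhis_in_{i}"])
    return out


def step(infer, features, history):
    """Run one streaming step and return (prediction, updated history)."""
    feed = {"input": features}
    for i, tensor in enumerate(history):
        feed[f"fhis_in_{i}"] = tensor
    out = infer(feed)
    # fhis_out_i of this call becomes fhis_in_i of the next call.
    new_history = [out[f"fhis_out_{i}"] for i in range(NUM_HISTORY)]
    return out["output"], new_history


def decode(prediction):
    """Pick one class index per position on the last axis.

    Assumption: class scores are prediction[-1, 0, :NUM_CLASSES, :].
    """
    scores = prediction[-1, 0, :NUM_CLASSES, :]  # shape (16, 24)
    return np.argmax(scores, axis=0)             # shape (24,)


# Assumption: history is zero-filled before the first call.
history = [np.zeros(shape, dtype=np.float32) for shape in HISTORY_SHAPES]
for _ in range(2):
    features = np.zeros((1, 2048, 24), dtype=np.float32)  # placeholder features
    prediction, history = step(fake_infer, features, history)
    labels = decode(prediction)

print(labels.shape)
```

Keeping the history in a plain list that `step` returns makes the state hand-off explicit, so each call depends only on the current features and the previous call's outputs.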

Legal Information

[*] Other names and brands may be claimed as the property of others.
