smartlab-sequence-modelling-0002#
Use Case and High-Level Description#
This is an online action segmentation network for 13 classes, trained on an Intel dataset. It is an online version of MSTCN++. The difference is that online MSTCN++ accepts streaming video as input, while MSTCN++ assumes the whole video is available.
For details of the original MSTCN++ model, see the paper.
Specification#
Metric | Value |
---|---|
GOPs | 0.048915 |
MParams | 1.018179 |
Source framework | PyTorch* |
Accuracy#
Accuracy | Metric | noise/background | remove_support_sleeve | adjust_rider | adjust_nut | adjust_balancing | open_box | close_box | choose_weight | put_left | put_right | take_left | take_right | install support_sleeve | mean | mPR (P+R)/2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
frame-level | precision | 0.44 | 0.68 | 0.82 | 0.56 | 0.7 | 0.74 | 0.79 | 0.63 | 0.59 | 0.66 | 0.74 | 0.82 | 0.91 | 0.7 | 0.68 |
frame-level | recall | 0.63 | 0.94 | 0.88 | 0.07 | 0.64 | 0.91 | 0.62 | 0.54 | 0.61 | 0.65 | 0.67 | 0.51 | 0.95 | 0.66 | 0.68 |
Notice: In the accuracy report, the feature extraction network is mobilenet-v3 (smartlab-sequence-modelling-0001). You can get this model from omz_models_model_smartlab_sequence_modelling_0001.md. The training and test datasets are internal.
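As a quick consistency check on the accuracy table, the `mean` and `mPR` columns can be recomputed from the per-class values. This is a minimal sketch that assumes `mean` is the unweighted average over the 13 classes. Because the table values are already rounded to two decimals, the results only match after rounding.

```python
# Per-class frame-level scores copied from the accuracy table,
# in the same column order (noise/background ... install support_sleeve).
precision = [0.44, 0.68, 0.82, 0.56, 0.70, 0.74, 0.79,
             0.63, 0.59, 0.66, 0.74, 0.82, 0.91]
recall = [0.63, 0.94, 0.88, 0.07, 0.64, 0.91, 0.62,
          0.54, 0.61, 0.65, 0.67, 0.51, 0.95]

# Assumption: "mean" is the unweighted average over the 13 classes.
mean_precision = sum(precision) / len(precision)
mean_recall = sum(recall) / len(recall)

# mPR is defined in the table header as (P + R) / 2.
mpr = (mean_precision + mean_recall) / 2

print(f"mean precision: {mean_precision:.2f}")
print(f"mean recall:    {mean_recall:.2f}")
print(f"mPR:            {mpr:.2f}")
```

The rounded results agree with the table (0.70, 0.66 and 0.68). The low recall for `adjust_nut` (0.07) pulls the mean recall down noticeably.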
Inputs#
The inputs to the network are the feature vectors for each video frame and the feature outputs of the previous frame. The per-frame feature vectors should combine the two views (top view and side view) output by a feature extraction network, for example smartlab-sequence-modelling-0001.
You can check the usage of smartlab-sequence-modelling-0001 and smartlab-sequence-modelling-0002 in demos/smartlab_demo.
- Input feature, name: `input`, shape: `1, 1152, 24`, format: `B, W, H`, where:
  - `B` - batch size
  - `W` - feature map width
  - `H` - feature map height
- History feature 1, name: `fhis_in_0`, shape: `12, 64, 2048`, format: `C, H', W`
- History feature 2, name: `fhis_in_1`, shape: `11, 64, 2048`, format: `C, H', W`
- History feature 3, name: `fhis_in_2`, shape: `11, 64, 2048`, format: `C, H', W`
- History feature 4, name: `fhis_in_3`, shape: `11, 64, 2048`, format: `C, H', W`, where (for all four history features):
  - `C` - number of channels in the feature vector
  - `H'` - feature map height
  - `W` - feature map width
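The input names and shapes listed above can be collected into a small table in code, which makes it easy to validate tensors before inference. The sketch below is illustrative only: `make_initial_inputs` and `check_inputs` are hypothetical helper names, and zero-filling the history tensors for the first frame with `float32` data is an assumption, not something this page specifies.

```python
import numpy as np

# Input names and shapes as listed in this section.
INPUT_SHAPES = {
    "input": (1, 1152, 24),
    "fhis_in_0": (12, 64, 2048),
    "fhis_in_1": (11, 64, 2048),
    "fhis_in_2": (11, 64, 2048),
    "fhis_in_3": (11, 64, 2048),
}


def make_initial_inputs(dtype=np.float32):
    """Hypothetical helper: build tensors for the very first frame.

    Assumption: with no previous frame available, the history
    features start as zeros.
    """
    return {name: np.zeros(shape, dtype=dtype)
            for name, shape in INPUT_SHAPES.items()}


def check_inputs(inputs):
    """Raise ValueError if a tensor is missing or has the wrong shape."""
    for name, shape in INPUT_SHAPES.items():
        if name not in inputs:
            raise ValueError(f"missing input: {name}")
        if tuple(inputs[name].shape) != shape:
            raise ValueError(
                f"{name}: expected shape {shape}, got {tuple(inputs[name].shape)}"
            )


inputs = make_initial_inputs()
check_inputs(inputs)  # passes silently for correctly shaped tensors
```

Validating shapes up front gives a clearer error than a failure inside the inference runtime, which is useful because the first history tensor has 12 channels while the other three have 11.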
Outputs#
The outputs also consist of two parts: the predictions and four feature outputs. The predictions contain the action classification results. The four feature maps are the model's layer features from past frames.
- Prediction, name: `output`, shape: `4, 1, 56, 24`, format: `C, B, H, W`, where:
  - `C` - number of channels in the feature vector
  - `B` - batch size
  - `H` - feature map height
  - `W` - feature map width

  After post-processing with the argmax() function, the prediction can be used to determine the action type of the current frame.
- History feature 1, name: `fhis_out_0`, shape: `12, 64, 2048`, format: `C, H, W`
- History feature 2, name: `fhis_out_1`, shape: `11, 64, 2048`, format: `C, H, W`
- History feature 3, name: `fhis_out_2`, shape: `11, 64, 2048`, format: `C, H, W`
- History feature 4, name: `fhis_out_3`, shape: `11, 64, 2048`, format: `C, H, W`, where (for all four history features):
  - `C` - number of channels in the feature vector
  - `H` - feature map height
  - `W` - feature map width
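Because the history outputs have the same shapes as the history inputs, streaming inference can be sketched as a loop that feeds each `fhis_out_*` tensor back in as the matching `fhis_in_*` tensor on the next frame. The sketch below is not the demo's implementation. `fake_infer` is a hypothetical stand-in for a real inference call, and the indexing in `current_frame_action` rests on assumptions this page does not confirm: that the last index along `C` holds the final prediction, that the first 13 rows along `H` are the class scores, and that the last index along `W` is the current frame.

```python
import numpy as np

NUM_CLASSES = 13  # number of action classes stated in the description
HISTORY_SHAPES = [(12, 64, 2048), (11, 64, 2048),
                  (11, 64, 2048), (11, 64, 2048)]


def fake_infer(inputs):
    """Hypothetical stand-in for a real inference call.

    Returns tensors with the documented output names and shapes so the
    loop can run without a model; replace it with an actual runtime call.
    """
    rng = np.random.default_rng(0)
    outputs = {"output": rng.standard_normal((4, 1, 56, 24)).astype(np.float32)}
    for i in range(len(HISTORY_SHAPES)):
        # Pass the history through unchanged; a real model updates it.
        outputs[f"fhis_out_{i}"] = inputs[f"fhis_in_{i}"]
    return outputs


def current_frame_action(prediction):
    """Pick a class index for the current frame with argmax.

    ASSUMPTIONS (not confirmed by this page): the last C index is the
    final prediction, rows [0, NUM_CLASSES) along H are class scores,
    and the last W index corresponds to the current frame.
    """
    scores = prediction[-1, 0, :NUM_CLASSES, -1]
    return int(np.argmax(scores))


def run_stream(frame_features, infer=fake_infer):
    """Run frame by frame, carrying the history features forward."""
    # Assumption: history starts as zeros before the first frame.
    history = [np.zeros(s, dtype=np.float32) for s in HISTORY_SHAPES]
    actions = []
    for feature in frame_features:
        inputs = {"input": feature}
        inputs.update({f"fhis_in_{i}": h for i, h in enumerate(history)})
        outputs = infer(inputs)
        # Feed this frame's history outputs into the next frame.
        history = [outputs[f"fhis_out_{i}"] for i in range(len(HISTORY_SHAPES))]
        actions.append(current_frame_action(outputs["output"]))
    return actions


frames = [np.zeros((1, 1152, 24), dtype=np.float32) for _ in range(3)]
actions = run_stream(frames)
```

With a real model, `infer` would wrap the runtime's inference call and return a mapping from output names to arrays. The mapping from class index to action name is also not given here, so check it against demos/smartlab_demo before relying on it.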
Legal Information#
[*] Other names and brands may be claimed as the property of others.