An ICNet model for fast semantic segmentation, trained from scratch on the CamVid dataset using the TensorFlow* framework. The model has 60% sparsity (the ratio of zeros among all convolution kernel weights). For details about the original floating-point model, refer to the paper "ICNet for Real-Time Semantic Segmentation on High-Resolution Images".
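The sparsity figure is a single ratio: zero-valued weights divided by the total number of weights, pooled over every convolution kernel. The sketch below computes that ratio for kernels supplied as already-flattened lists of numbers; the function name `sparsity` and the flattened-list input are assumptions made for illustration, not part of any OpenVINO or TensorFlow API.

```python
from typing import Iterable, Sequence


def sparsity(kernels: Iterable[Sequence[float]]) -> float:
    """Fraction of zero-valued weights over all (flattened) convolution kernels.

    Counts are pooled across kernels, so larger kernels weigh more than small
    ones; this is not an average of per-kernel sparsities.
    """
    total = 0
    zeros = 0
    for kernel in kernels:
        total += len(kernel)
        zeros += sum(1 for weight in kernel if weight == 0)
    if total == 0:
        raise ValueError("no weights to measure")
    return zeros / total
```

For example, two kernels with 3 zeros out of 5 weights each give `sparsity([[0.0, 1.2, 0.0, 0.0, 3.4], [0.0, 0.5, 0.0, 0.7, 0.0]])`, which is 0.6, the same 60% level quoted for this model.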
The model input is a blob containing a single image of shape 1, 3, 720, 960 in BGR order. Pixel values are integers in the [0, 255] range.
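The network expects planar data (all B values, then all G values, then all R values), while many image libraries return interleaved rows of pixels, so the image has to be repacked before inference. The following is a minimal pure-Python sketch of that repacking. It assumes the image is already resized to 720 x 960 and given as nested lists of `[B, G, R]` integers (or `[R, G, B]` with `rgb_input=True`). `to_nchw_blob` and its parameters are hypothetical helpers, not part of the OpenVINO API; in practice an array library would do the same transpose in one call.

```python
from typing import List, Optional, Tuple

Blob = List[List[List[List[int]]]]  # indexed as blob[b][c][y][x]


def to_nchw_blob(image: List[List[List[int]]], rgb_input: bool = False,
                 expected_hw: Optional[Tuple[int, int]] = (720, 960)) -> Blob:
    """Pack an H x W x 3 image (rows of pixels) into a 1 x 3 x H x W BGR blob.

    Resizing is not done here: the image must already have the model's size.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    if expected_hw is not None and (height, width) != tuple(expected_hw):
        raise ValueError(f"expected H x W = {expected_hw}, got {(height, width)}")
    # One batch element, three channel planes, each an H x W grid of zeros.
    blob: Blob = [[[[0] * width for _ in range(height)] for _ in range(3)]]
    for y, row in enumerate(image):
        if len(row) != width:
            raise ValueError("all image rows must have the same width")
        for x, pixel in enumerate(row):
            if len(pixel) != 3:
                raise ValueError("each pixel must have exactly 3 channels")
            # Reverse RGB pixels so that plane 0 is B, plane 1 is G, plane 2 is R.
            bgr = pixel[::-1] if rgb_input else pixel
            for c, value in enumerate(bgr):
                if not (isinstance(value, int) and 0 <= value <= 255):
                    raise ValueError(f"pixel value {value!r} is not an integer in [0, 255]")
                blob[0][c][y][x] = value
    return blob
```

The 720 x 960 check is on by default to match the model input; pass `expected_hw=None` to try the function on a small test image.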
The output of icnet-camvid-ava-sparse-60-0001 is the predicted class index for each input pixel, i.e. which of the 12 classes of the CamVid dataset the pixel belongs to:
The quality metrics were calculated on the CamVid validation dataset. The unlabeled class was ignored during metric calculation.
TP - number of true positive pixels for a given class
FN - number of false negative pixels for a given class
FP - number of false positive pixels for a given class
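Per-class TP, FN and FP counts are the inputs of the common intersection-over-union (IoU) metric, IoU = TP / (TP + FN + FP), which is averaged over classes to give mean IoU. Assuming that is the metric reported for this model, the sketch below accumulates the counts over flattened prediction and ground-truth label lists (for example, every pixel of the validation set concatenated) and skips the ignored unlabeled class. The function `mean_iou`, its `ignore_index` parameter, and the choice to skip classes that appear in neither prediction nor ground truth are assumptions of this sketch.

```python
from typing import Optional, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int,
             ignore_index: Optional[int] = None) -> float:
    """Mean of per-class IoU = TP / (TP + FN + FP) over flattened label lists."""
    if len(pred) != len(target):
        raise ValueError("prediction and ground truth differ in size")
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # pixels labelled with the ignored class are not counted
        if p == t:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p where it is not present
            fn[t] += 1  # true class t was missed
    ious = []
    for c in range(num_classes):
        if c == ignore_index:
            continue
        denom = tp[c] + fn[c] + fp[c]
        if denom:  # skip classes absent from both prediction and ground truth
            ious.append(tp[c] / denom)
    return sum(ious) / len(ious) if ious else 0.0
```

If, hypothetically, the unlabeled class were index 11, the call would be `mean_iou(pred, gt, num_classes=12, ignore_index=11)`.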
Image, shape: 1, 3, 720, 960, format: B, C, H, W, where:
B - batch size
C - number of channels
H - image height
W - image width
Channel order is BGR.
Semantic segmentation class prediction map, shape: 1, 720, 960, format: B, H, W, where:
B - batch size
H - vertical coordinate (row) of the input pixel
W - horizontal coordinate (column) of the input pixel
Each element of the output holds the predicted class index of the corresponding input pixel.
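Because the output already holds one class index per pixel (there are no per-class scores left to take an argmax over), post-processing can read the map directly. The sketch below counts predicted pixels per class for the first image of a B x H x W map given as nested lists, iterating over rows (H) first and columns (W) second. The helper name and the nested-list input are assumptions; the `int()` conversion only guards against the map arriving as floating-point class indices.

```python
from typing import List, Sequence


def class_histogram(class_map: Sequence[Sequence[Sequence[float]]],
                    num_classes: int = 12) -> List[int]:
    """Count predicted pixels per class in the first image of a B x H x W map."""
    counts = [0] * num_classes
    for row in class_map[0]:          # iterate over H (rows, top to bottom)
        for value in row:             # iterate over W (columns, left to right)
            class_id = int(value)     # tolerate indices stored as floats, e.g. 3.0
            if class_id != value or not 0 <= class_id < num_classes:
                raise ValueError(f"unexpected class index {value!r}")
            counts[class_id] += 1
    return counts
```

With the same layout, the class of the pixel at row `y`, column `x` of the first image is `class_map[0][y][x]`.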
[*] Other names and brands may be claimed as the property of others.