NetVLAD is a CNN architecture for large-scale visual place recognition. It combines a VGG-16 base network with NetVLAD, a trainable generalized VLAD (Vector of Locally Aggregated Descriptors) layer. This place recognition model is pretrained on the Pittsburgh 250k dataset.
For details, see the repository and paper.
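
To make the idea of a trainable VLAD layer concrete, here is a minimal NumPy sketch of NetVLAD-style aggregation: each local descriptor is softly assigned to clusters, and the weighted residuals to the cluster centres are summed and normalized. The function name, argument names, and toy dimensions are illustrative assumptions; this is not the model's code or its trained weights, and the PCA step suggested by the output node name is omitted.

```python
import numpy as np


def netvlad_aggregate(features, centers, weights, biases):
    """Aggregate N local D-dim descriptors into one K*D vector.

    features: (N, D) local descriptors, e.g. a flattened conv feature map
    centers:  (K, D) cluster centres c_k
    weights:  (K, D) and biases: (K,) define the soft assignment
    """
    # Soft assignment a_k(x_i): softmax over clusters of w_k . x_i + b_k
    logits = features @ weights.T + biases              # (N, K)
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)

    # V[k] = sum_i a_k(x_i) * (x_i - c_k)
    residuals = features[:, None, :] - centers[None, :, :]   # (N, K, D)
    vlad = (assign[:, :, None] * residuals).sum(axis=0)      # (K, D)

    # Per-cluster (intra) normalization, then global L2 normalization
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12
    vlad = vlad.ravel()
    return vlad / (np.linalg.norm(vlad) + 1e-12)
```

In a trained network the centres, weights, and biases are learned end to end together with the base network; here they are plain arrays passed in by the caller.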
Metric | Value |
---|---|
Type | Place recognition |
GFLOPs | 36.6374 |
MParams | 149.0021 |
Source framework | TensorFlow* |
Accuracy metrics are obtained on Pitts30k, a smaller validation subset of the Pittsburgh 250k dataset containing 10k database images in each set (train/test/validation). Images were resized to the input size.
Metric | Value |
---|---|
localization_recall | 82.0321% |
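
As a rough illustration of how a localization recall can be computed from image embeddings, the sketch below retrieves the top-k database images for each query and counts a query as localized if any retrieved image lies within a distance threshold. The function name, the value of `k`, the 25 m threshold, and the 2-D position format are assumptions for illustration; the exact settings behind the value in the table are not stated here.

```python
import numpy as np


def recall_at_k(query_desc, db_desc, query_pos, db_pos, k=1, max_dist=25.0):
    """Fraction of queries with a top-k retrieved image within max_dist.

    query_desc: (Q, C) and db_desc: (M, C) L2-normalized embeddings
    query_pos:  (Q, 2) and db_pos:  (M, 2) positions in metres (assumed format)
    """
    # For L2-normalized vectors, ranking by dot product matches ranking
    # by Euclidean distance between descriptors.
    sims = query_desc @ db_desc.T                     # (Q, M)
    top_k = np.argsort(-sims, axis=1)[:, :k]          # (Q, k) database indices

    # Geographic distance from each query to each of its retrieved images
    dists = np.linalg.norm(db_pos[top_k] - query_pos[:, None, :], axis=2)
    return float((dists <= max_dist).any(axis=1).mean())
```

The same function works unchanged on 4096-dimensional embeddings; only the array shapes differ.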
Original model: image, name - Placeholder, shape - 1,200,300,3, format is B,H,W,C, where:
- B - batch size
- C - channel
- H - height
- W - width

Channel order is RGB.
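
A minimal sketch of preparing a batch in the B,H,W,C layout with RGB channel order. The helper name, the nearest-neighbour resize, and the `float32` dtype are assumptions; no value scaling or mean subtraction is applied because this section does not specify any.

```python
import numpy as np

INPUT_H, INPUT_W = 200, 300  # taken from the documented shape 1,200,300,3


def to_bhwc_rgb_batch(image_rgb):
    """Turn an H x W x 3 RGB image into a 1 x 200 x 300 x 3 batch."""
    h, w, c = image_rgb.shape
    if c != 3:
        raise ValueError("expected a 3-channel RGB image")

    # Nearest-neighbour resize using plain integer indexing
    rows = np.arange(INPUT_H) * h // INPUT_H
    cols = np.arange(INPUT_W) * w // INPUT_W
    resized = image_rgb[rows][:, cols]                # (200, 300, 3)

    # Add the batch dimension; float32 is an assumed input dtype
    return resized[np.newaxis].astype(np.float32)
```

A production pipeline would normally use an image library for resizing; plain indexing keeps the sketch dependency-free apart from NumPy.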
Converted model: image, name - Placeholder, shape - 1,3,200,300, format is B,C,H,W, where:
- B - batch size
- C - channel
- H - height
- W - width

Channel order is BGR.
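
The two input descriptions differ in both layout and channel order. The following sketch, with a hypothetical helper name, converts a B,H,W,C RGB batch into a B,C,H,W BGR batch using NumPy only.

```python
import numpy as np


def bhwc_rgb_to_bchw_bgr(batch):
    """Convert a (B, H, W, 3) RGB batch to a (B, 3, H, W) BGR batch."""
    if batch.ndim != 4 or batch.shape[-1] != 3:
        raise ValueError("expected a batch of shape (B, H, W, 3)")

    bgr = batch[..., ::-1]                 # reverse the channel axis: RGB -> BGR
    bchw = bgr.transpose(0, 3, 1, 2)       # move channels ahead of H and W
    return np.ascontiguousarray(bchw)      # make the result memory-contiguous
```

For a 1,200,300,3 input this yields an array of shape 1,3,200,300.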
Original model: floating-point embeddings, name - vgg16_netvlad_pca/l2_normalize_1, shape - 1,4096, output data format - B,C, where:
- B - batch size
- C - vector of 4096 floating-point values, local image descriptors

Converted model: floating-point embeddings, name - vgg16_netvlad_pca/l2_normalize_1, shape - 1,4096, output data format - B,C, where:
- B - batch size
- C - vector of 4096 floating-point values, local image descriptors

You can download models and, if necessary, convert them into Inference Engine format using the Model Downloader and other automation tools, as shown in the examples below.
An example of using the Model Downloader:
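
A minimal sketch of invoking the downloader from Python, assuming the `omz_downloader` command-line entry point is installed and on `PATH`. The model name `netvlad-tf` is an assumption, since this section does not state it, and the helper names are hypothetical.

```python
import subprocess

MODEL_NAME = "netvlad-tf"  # assumed model name; not stated in this section


def downloader_command(model_name=MODEL_NAME):
    """Build the argument list for the Model Downloader."""
    return ["omz_downloader", "--name", model_name]


def download(model_name=MODEL_NAME):
    """Run the downloader; raises CalledProcessError on a non-zero exit."""
    subprocess.run(downloader_command(model_name), check=True)
```

Calling `download()` needs network access and a working installation of the tool.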
An example of using the Model Converter:
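
A matching sketch for the conversion step, assuming the `omz_converter` command-line entry point is installed and that the model files have already been downloaded. As above, the model name `netvlad-tf` and the helper names are assumptions.

```python
import subprocess

MODEL_NAME = "netvlad-tf"  # assumed model name; not stated in this section


def converter_command(model_name=MODEL_NAME):
    """Build the argument list for the Model Converter."""
    return ["omz_converter", "--name", model_name]


def convert(model_name=MODEL_NAME):
    """Run the converter; raises CalledProcessError on a non-zero exit."""
    subprocess.run(converter_command(model_name), check=True)
```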
The original model is distributed under the MIT license.