Deep Learning accuracy validation framework#
The Accuracy Checker is an extensible, flexible, and configurable Deep Learning accuracy validation framework. The tool has a modular structure and lets you reproduce a validation pipeline and collect aggregated quality indicators for popular datasets, both for networks in source frameworks and in the OpenVINO™ supported formats.
Installation#
Prerequisites#
Install prerequisites first:
1. Python#
Accuracy Checker uses Python 3. Install it first:
sudo apt-get install python3 python3-dev python3-setuptools python3-pip
Python* setuptools and the Python* package manager (pip) install packages into the system directory by default. Installation of Accuracy Checker is tested only in a virtual environment.
Install the virtual environment:
python3 -m pip install virtualenv
python3 -m virtualenv -p `which python3` <directory_for_environment>
Activate the virtual environment:
source <directory_for_environment>/bin/activate
The virtual environment can be deactivated using the following command:
deactivate
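Since installation is tested only inside a virtual environment, it can help to confirm that the environment is really active before installing. The following is a minimal sketch using only the Python standard library; the helper name `in_virtual_environment` is hypothetical and not part of Accuracy Checker.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a venv or virtualenv."""
    # Older virtualenv releases set sys.real_prefix inside the environment.
    if hasattr(sys, "real_prefix"):
        return True
    # In a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("virtual environment active:", in_virtual_environment())
```

Run the snippet with the same python3 interpreter you plan to use for the installation.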
2. Frameworks#
The next step is installing backend frameworks for Accuracy Checker.
To evaluate some models, you need to install the required frameworks. Accuracy Checker supports the following frameworks:
You can use any one of them or several at a time. Accuracy Checker requires at least one framework to work correctly. You can postpone installing the other frameworks until you need them.
Install Accuracy Checker#
If all prerequisites are installed, you are ready to install Accuracy Checker:
python3 -m pip install .
Accuracy Checker is a modular tool and has some task-specific dependencies. All task-specific modules are listed in the requirements-extra.in file.
The standard installation procedure includes only the basic part. To obtain the extra modules, run the following command:
python3 -m pip install .[extra]
Installation Troubleshooting#
When a previous version of the tool is already installed in the environment, it can in some cases break the new installation. If you get a directory/file not found error, try manually removing the previous tool version from your environment, or install the tool using the following command in the Accuracy Checker directory instead of setup.py install:
python3 -m pip install --upgrade --force-reinstall .
If the accuracy_check command fails with the following error:
from .cv2 import *
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
try uninstalling the opencv-python package and installing the opencv-python-headless package instead. More details about the error and ways to fix it can be found here
Running the Tool inside IDE for Development Purposes#
Accuracy Checker has an entry point for running in the CLI. However, most popular code editors and integrated development environments (IDEs) expect a script as the starting point of an application.
Sometimes it can be useful to run the tool as a script for debugging or enabling new models.
To use Accuracy Checker inside an IDE, create a script in the accuracy_checker root directory, for example <open_model_zoo>/tools/accuracy_checker/main.py, with the following code:
from accuracy_checker.main import main

if __name__ == '__main__':
    main()
Now you can use this script to run Accuracy Checker in the IDE.
Usage#
You may test your installation and get familiar with Accuracy Checker by running a sample.
Each Open Model Zoo model can be evaluated using a configuration file. To learn more, refer to the How to use predefined configuration files guide.
Once you have installed Accuracy Checker, you can evaluate your configurations using:
accuracy_check -c path/to/configuration_file -m /path/to/models -s /path/to/source/data -a /path/to/annotation
Use -h, --help to get the full list of command-line options. Some arguments are described below:
-c, --config - path to the configuration file.
-m, --models - directory in which models and weights declared in the config file will be searched. You can also specify a space-separated list of directories if you want to run the same configuration several times with models located in different directories, or if you have a pipeline with several models.
-s, --source - directory in which input images will be searched.
-a, --annotations - directory in which annotation and meta files will be searched.
-d, --definitions - path to the global configuration file.
-e, --extensions - directory with InferenceEngine extensions.
-C, --converted_models - directory to store Model Optimizer converted models (used for the DLSDK launcher only).
-tf, --target_framework - framework for inference.
-td, --target_devices - devices for inference. You can specify several devices using a space as the delimiter.
--async_mode - runs the tool in async mode if the launcher supports it.
--num_requests - number of requests for async execution. Overrides the value provided in the config. Default is AUTO.
--model_attributes - directory with additional model attributes.
--subsample_size - dataset subsample size.
--shuffle - shuffles the annotation when creating a subset if the subsample_size argument is provided. Default is True.
--intermediate_metrics_results - enables printing of intermediate metrics results. Default is False.
--metrics_interval - number of iterations between printouts of updated metrics results if the --intermediate_metrics_results flag is enabled. Default is 1000.
--sub_evaluation - enables evaluation of a subset of the dataset with predefined subset_metrics. Default is False. See Sub-evaluation with subset metrics.
You can also replace some command-line arguments with environment variables for path prefixing. The supported variables are:
DEFINITIONS_FILE - equivalent of -d, --definitions.
DATA_DIR - equivalent of -s, --source.
MODELS_DIR - equivalent of -m, --models.
EXTENSIONS - equivalent of -e, --extensions.
ANNOTATIONS_DIR - equivalent of -a, --annotations.
MODEL_ATTRIBUTES_DIR - equivalent of --model_attributes.
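The correspondence between environment variables and command-line options can be written down as a small lookup table. The sketch below is illustrative only: `build_command` is a hypothetical helper that translates the variables into explicit options. Accuracy Checker reads these variables itself, so such a helper is not required to use them.

```python
import os
import shlex

# Mapping taken from the list above: environment variable -> long CLI option.
ENV_TO_OPTION = {
    "DEFINITIONS_FILE": "--definitions",
    "DATA_DIR": "--source",
    "MODELS_DIR": "--models",
    "EXTENSIONS": "--extensions",
    "ANNOTATIONS_DIR": "--annotations",
    "MODEL_ATTRIBUTES_DIR": "--model_attributes",
}


def build_command(config_path, env=None):
    """Hypothetical helper: build an accuracy_check argument list from env vars."""
    env = os.environ if env is None else env
    command = ["accuracy_check", "--config", config_path]
    for variable, option in ENV_TO_OPTION.items():
        value = env.get(variable)
        if value:  # skip variables that are unset or empty
            command += [option, value]
    return command


# Placeholder paths, matching the command shown earlier in this section.
example_env = {"DATA_DIR": "/path/to/source/data", "MODELS_DIR": "/path/to/models"}
example_command = build_command("path/to/configuration_file", env=example_env)
print(shlex.join(example_command))
```

The resulting list can be passed to `subprocess.run` once Accuracy Checker is installed and the paths point to real data.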
Configuration#
A config file declares the validation process. Every validated model must have its own entry in the models list, with a distinct name and the other properties described below.
There is also a definitions file, which declares global options shared across all models. The config file has priority over the definitions file.
Example:
models:
  - name: densenet-121-tf
    launchers:
      - framework: openvino
        adapter: classification
    datasets:
      - name: imagenet_1000_classes
        preprocessing:
          - type: resize
            size: 256
          - type: crop
            size: 224
        metrics:
          - name: accuracy@top1
            type: accuracy
            top_k: 1
            reference: 0.7446
          - name: accuracy@top5
            type: accuracy
            top_k: 5
            reference: 0.9213
Optionally, you can use a global configuration. This helps avoid duplication if you have several models that should be run on the same dataset.
An example of a global definitions file can be found at <omz_dir>/data/dataset_definitions.yml. Global definitions are merged with the evaluation config at runtime by dataset name.
Parameters of the global configuration can be overwritten by the local config. For example, if the definitions file specifies resize with destination size 224 and the local config uses resize with size 227, the value from the config, 227, is used as the resize parameter.
You can use the global_definitions field to specify the path to the global definitions directly in the model config, or pass it via command-line arguments (-d, --definitions).
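The override rule can be illustrated with plain dictionaries. This is a simplified sketch of the idea (match by dataset name, local values win), not the tool's actual merge logic, which may merge at a finer granularity. The function names and the data_source value are hypothetical.

```python
import copy


def merge_dataset(global_entry, local_entry):
    """Simplified illustration: start from the global entry, let local keys win."""
    merged = copy.deepcopy(global_entry)
    for key, value in local_entry.items():
        merged[key] = copy.deepcopy(value)
    return merged


def merge_by_name(definitions, local_datasets):
    """Match each local dataset with the global entry that has the same name."""
    global_by_name = {entry["name"]: entry for entry in definitions}
    return [
        merge_dataset(global_by_name.get(dataset["name"], {}), dataset)
        for dataset in local_datasets
    ]


# Global definitions: resize to 224 (data_source is a made-up placeholder).
definitions = [{
    "name": "imagenet_1000_classes",
    "data_source": "placeholder_images_dir",
    "preprocessing": [{"type": "resize", "size": 224}],
}]
# Local config: same dataset name, resize to 227.
local_datasets = [{
    "name": "imagenet_1000_classes",
    "preprocessing": [{"type": "resize", "size": 227}],
}]

merged = merge_by_name(definitions, local_datasets)
print(merged[0]["preprocessing"])
```

In this sketch the local resize size replaces the global one, while data_source is inherited from the global entry.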
Launchers#
A launcher is a description of how your model should be executed.
Each launcher configuration starts with setting the framework name.
Currently, caffe, dlsdk, mxnet, tf, tf2, tf_lite, opencv, onnx_runtime, pytorch, and paddlepaddle are supported.
Launcher descriptions can differ between frameworks.
Datasets#
A dataset entry describes the data on which the model should be evaluated, all required preprocessing and postprocessing/filtering steps, and the metrics that will be used for evaluation.
If your dataset is a well-known competition problem (COCO, Pascal VOC, and others) and/or can potentially be reused for other models, it is reasonable to declare it in a global configuration file (<omz_dir>/data/dataset_definitions.yml). This way, your local configuration file only needs to provide the name, and all required steps are picked from the global one. To pass the path to this global configuration, use the --definitions argument of the CLI.
If you want to evaluate models using prepared config files and well-known datasets, you need to organize the folders with validation datasets in a certain way. Find more detailed information about dataset preparation in the Dataset Preparation Guide.
Each dataset must have:
name - unique identifier of your model/topology.
data_source - path to the directory where input data is stored.
metrics - list of metrics that should be computed.
And optionally:
preprocessing - list of preprocessing steps applied to input data. If you want the calculated metrics to match the reported ones, you must reproduce the preprocessing from the canonical paper of your topology or ask the topology author about the required steps.
postprocessing - list of postprocessing steps.
reader - approach for data reading. The default reader is opencv_imread.
segmentation_masks_source - path to the directory where ground truth masks for the semantic segmentation task are stored.
The dataset entry must also contain data related to annotation. You can convert the annotation in-place using:
annotation_conversion - parameters for annotation conversion
or use an existing annotation file and dataset meta:
annotation - path to the annotation file. You must first convert the annotation to the representation of the dataset problem. You can choose one of the converters from annotation-converters if a converter for your dataset already exists, or write your own.
dataset_meta - path to the metadata file (generated by the converter). You can find more detailed information about annotation conversion in the Annotation Conversion Guide.
subset_metrics - list of dataset subsets with unique size and metrics, computed if the --sub_evaluation flag is enabled. If subsample_size is defined, only the subset with the matching subset_size is evaluated; otherwise the first subset is validated by default. See Sub-evaluation with subset metrics. Each subset entry contains:
  subset_size - size of the dataset subset to evaluate. Its value is compared with subsample_size to select the desired subset for evaluation.
  metrics - list of metrics specific to the defined subset size.
Example of dataset definition:
- name: dataset_name
  annotation: annotation.pickle
  data_source: images_folder
  preprocessing:
    - type: resize
      dst_width: 256
      dst_height: 256
    - type: normalization
      mean: imagenet
    - type: crop
      dst_width: 227
      dst_height: 227
  metrics:
    - type: accuracy
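The field requirements described in this section can be checked mechanically before a run. The sketch below is a hypothetical pre-flight check written for illustration; it is not part of Accuracy Checker. It assumes the dataset entry has already been combined with any global definitions, since a local entry may legitimately contain only a name.

```python
# Fields named in this section as mandatory for every dataset entry.
REQUIRED_FIELDS = ("name", "data_source", "metrics")
# At least one of these must describe where the annotation comes from.
ANNOTATION_FIELDS = ("annotation_conversion", "annotation")


def find_dataset_problems(dataset):
    """Hypothetical check: return a list of human-readable problems."""
    problems = [
        f"missing required field: {field}"
        for field in REQUIRED_FIELDS
        if field not in dataset
    ]
    if not any(field in dataset for field in ANNOTATION_FIELDS):
        problems.append("missing annotation data: set annotation_conversion or annotation")
    return problems


# The dataset definition from the example above, written as a Python dict.
example_dataset = {
    "name": "dataset_name",
    "annotation": "annotation.pickle",
    "data_source": "images_folder",
    "preprocessing": [
        {"type": "resize", "dst_width": 256, "dst_height": 256},
        {"type": "normalization", "mean": "imagenet"},
        {"type": "crop", "dst_width": 227, "dst_height": 227},
    ],
    "metrics": [{"type": "accuracy"}],
}

print(find_dataset_problems(example_dataset))
print(find_dataset_problems({"name": "only_a_name"}))
```

An empty list means the entry contains every field this section lists as required.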
Preprocessing, Metrics, Postprocessing#
Each preprocessing, metrics, and postprocessing entry must have a type field, along with other options specific to that type. If you do not provide any other option, it will be picked from the definitions file.
You can use the following instructions:
You may optionally provide a reference field for a metric if you want the calculated metric to be tested against a specific value (for example, the one reported in the canonical paper).
Some metrics support providing vector results. For example, mAP is able to return the average precision for each detection class. You can change the view mode for metric results using presenter (for example, print_vector, print_scalar).
Example:
metrics:
  - type: accuracy
    top_k: 5
    reference: 86.43
    threshold: 0.005
Sub-evaluation with subset metrics#
You may optionally enable the sub_evaluation flag to quickly get results for a subset of a big dataset.
The subset_metrics field needs to provide subsets with different subset_size and metrics.
If subset_metrics contains several entries, you may use the subsample_size value to select the desired subset_size; otherwise the first defined subset_size will be used.
Note: Enabling the sub_evaluation flag has no effect when the accuracy config has no subset_metrics defined.
Example:
metrics:
  - type: accuracy
    top_k: 5
    reference: 86.43
subset_metrics:
  - subset_size: "10%"
    metrics:
      - type: accuracy
        top_k: 5
        reference: 86.13
  - subset_size: "20%"
    metrics:
      - type: accuracy
        top_k: 5
        reference: 86.23
      - type: accuracy
        top_k: 1
        reference: 76.42
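The selection rule described in this section can be sketched in a few lines. The function `select_subset` is hypothetical and only mirrors the description above; it is not the tool's implementation. One point is an assumption: the text does not say what happens when subsample_size is given but matches no subset, so this sketch returns None in that case.

```python
def select_subset(subset_metrics, subsample_size=None):
    """Hypothetical sketch of choosing which subset entry to evaluate."""
    if not subset_metrics:
        # Without subset_metrics the sub_evaluation flag has no effect.
        return None
    if subsample_size is None:
        # No subsample_size given: the first defined subset is used.
        return subset_metrics[0]
    for subset in subset_metrics:
        if str(subset["subset_size"]) == str(subsample_size):
            return subset
    # Assumption: no matching subset_size means nothing is selected.
    return None


# The subset_metrics from the example above, written as Python data.
subset_metrics = [
    {"subset_size": "10%",
     "metrics": [{"type": "accuracy", "top_k": 5, "reference": 86.13}]},
    {"subset_size": "20%",
     "metrics": [{"type": "accuracy", "top_k": 5, "reference": 86.23},
                 {"type": "accuracy", "top_k": 1, "reference": 76.42}]},
]

print(select_subset(subset_metrics)["subset_size"])
print(select_subset(subset_metrics, "20%")["subset_size"])
```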
Testing New Models#
A typical workflow for testing a new model includes the following steps:
1. Convert the annotation of your dataset. Use one of the converters from annotation-converters, or write your own if there is no converter for your dataset. You can find detailed instructions on how to use converters in the Annotation Conversion Guide.
2. Choose one of the adapters or write your own. An adapter converts the raw output produced by the framework to a high-level, problem-specific representation (e.g. ClassificationPrediction, DetectionPrediction, etc.).
3. Reproduce the preprocessing, metrics, and postprocessing from the canonical paper.
4. Create an entry in the config file and execute.
Customizing Evaluation#
The standard Accuracy Checker validation pipeline is: Annotation Reading -> Data Reading -> Preprocessing -> Inference -> Postprocessing -> Metrics. In some cases this validation pipeline is unsuitable, for example, when you have a sequence of models. You can customize the validation pipeline using your own evaluator. Find more details about custom evaluations in the related section.
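To make the order of the standard stages concrete, here is a toy sketch that chains six plain Python callables in that order. Every name in it is hypothetical and the stages are stand-ins chosen for illustration; real evaluators in Accuracy Checker use their own classes and interfaces, which this sketch does not attempt to reproduce.

```python
def run_pipeline(identifiers, read_annotation, read_data, preprocess,
                 infer, postprocess, metric):
    """Toy illustration of the stage order, not the tool's evaluator API."""
    annotations = [read_annotation(i) for i in identifiers]   # Annotation Reading
    predictions = []
    for identifier in identifiers:
        data = read_data(identifier)                          # Data Reading
        data = preprocess(data)                               # Preprocessing
        predictions.append(infer(data))                       # Inference
    annotations, predictions = postprocess(annotations, predictions)  # Postprocessing
    return metric(annotations, predictions)                   # Metrics


def accuracy(annotations, predictions):
    """Fraction of predictions equal to their annotation."""
    matches = sum(a == p for a, p in zip(annotations, predictions))
    return matches / len(annotations)


# Stand-in stages: labels alternate 0/1, the "model" thresholds a scaled value.
result = run_pipeline(
    identifiers=[0, 1, 2, 3],
    read_annotation=lambda i: i % 2,
    read_data=lambda i: float(i),
    preprocess=lambda x: x / 10,
    infer=lambda x: int(x > 0.15),
    postprocess=lambda annotations, predictions: (annotations, predictions),
    metric=accuracy,
)
print(result)
```

A pipeline with a sequence of models does not fit this single-inference shape, which is the kind of case a custom evaluator is meant to cover.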