The Tree-Structured Parzen Estimator (TPE) algorithm optimizes quantization hyperparameters to find a quantization configuration that achieves an expected accuracy target while providing the best possible latency improvement. TPE is an iterative process that uses the history of evaluated hyperparameters to create a probabilistic model, which is used to suggest the next set of hyperparameters to evaluate.
Generally, the algorithm consists of the following steps:
1. Evaluate an initial set of randomly sampled hyperparameter configurations.
2. Split the evaluated trials into a "good" group and a "bad" group according to their loss values.
3. Model the density of hyperparameter values in each group with Parzen estimators.
4. Suggest the next configuration by maximizing the ratio of the "good" density to the "bad" density, which is equivalent to maximizing Expected Improvement.
5. Evaluate the suggested configuration, add it to the history, and repeat from step 2.
For more information about TPE see [1].
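To make the suggest-evaluate loop concrete, below is a toy one-dimensional sketch of it in Python. This is an illustration of the idea, not the actual implementation: all function names, the crude kernel-density estimate, and the toy objective are assumptions made for the example.

```python
import random

random.seed(0)  # for reproducibility of this toy example

def suggest(history, gamma=0.25, n_candidates=24):
    """Suggest the next hyperparameter value from the trial history.

    Past trials are split into a 'good' group (lowest losses) and a 'bad'
    group; candidates that are likely under the good density and unlikely
    under the bad one are preferred."""
    if len(history) < 10:                       # bootstrap with random search
        return random.uniform(0.0, 1.0)
    ordered = sorted(history, key=lambda t: t[1])        # sort by loss
    n_good = max(1, int(gamma * len(ordered)))
    good = [x for x, _ in ordered[:n_good]]
    bad = [x for x, _ in ordered[n_good:]]

    def density(x, group, bw=0.1):
        # Crude window-based density estimate (stand-in for a Parzen estimator)
        return sum(1.0 for g in group if abs(x - g) < bw) / (len(group) * 2 * bw) + 1e-12

    # Sample candidates around good trials; keep the one maximizing l(x)/g(x)
    candidates = [min(1.0, max(0.0, random.gauss(random.choice(good), 0.1)))
                  for _ in range(n_candidates)]
    return max(candidates, key=lambda x: density(x, good) / density(x, bad))

def optimize(evaluate, max_trials=200):
    """Run the suggest-evaluate loop and return the best (x, loss) pair."""
    history = []                                # (hyperparameter, loss) pairs
    for _ in range(max_trials):
        x = suggest(history)
        history.append((x, evaluate(x)))
    return min(history, key=lambda t: t[1])

# Toy objective with its minimum at x = 0.3
best_x, best_loss = optimize(lambda x: (x - 0.3) ** 2, max_trials=100)
```

In the real algorithm the search space is multi-dimensional and the loss combines accuracy and latency measurements, but the overall cycle of modeling the history and suggesting the next trial is the same.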
NOTE: TPE requires many iterations to converge to an optimal solution, and it is recommended to run it for at least 200 iterations. Because every iteration requires evaluation of a generated model, which means accuracy measurement on a dataset and latency measurement using a benchmark, this process may take from 24 hours up to a few days to complete, depending on the model. For this reason, even though TPE supports all OpenVINO™-supported models, it is continuously validated only on a subset of models:
- SSD MobileNet V1 COCO
- Mobilenet V2 1.0 224
- Faster R-CNN ResNet 50 COCO
- Faster R-CNN Inception V2 COCO
- YOLOv3 TF Full COCO
TPE parameters can be divided into two groups: mandatory and optional.
"max_trials"
- maximum number of trails"trials_load_method"
- specifies whether to start from scratch or reuse previous results. It should be used in following manner:"cold_start"
- start trials from beginning. Logs from previous execution are removed"warm_start"
- continue execution using logs from previous execution up to the limit set by "max_trials"
(may be larger than in previous execution). Quantization algorithms' parameters impacting parameters metadata creation are ignored, because search space is retrieved from logs. May be used either after "cold_start"
or "fine_tune"
. If no previous logs exist then it behaves like "cold_start"
"fine_tune"
- start new trials with new search space derived from best result achieved since last "cold_start"
. Quantization algorithms are responsible for modifying parameter metadata to accommodate parameters used to get best result (for more details about parameter metadata generation see quantization algorithms). If no previous logs exist then it behaves like "cold_start"
"eval"
- load best result to get model"accuracy_loss"
- maximum acceptable relative accuracy loss in percentage"latency_reduce"
- target latency improvement versus original model"accuracy_weight"
and "latency_weight"
- accuracy and latency weights used in loss function. These two parameters are intended to be set to 1.0, because accuracy and latency components in the loss function are designed to be balanced equally, so that the algorithm is able to achieve an expected accuracy target and provide best possible latency improvement. Changing "accuracy_weight"
, which is left open for experimentation, is discouraged, but it is recommended to change "latency_weight"
to 0 for configurations that do not change latency result, for example, when tuning parameters that only change numeric values of the parameters, such as quantization ranges, and do not change graph structure or data types"benchmark"
- latency measurement benchmark configuration. For details of configuration options see Benchmark C++ Tool"max_minutes"
- trials time limit. When it expires, the last trial is completed and the best result is returned"stop_on_target"
- flag to stop TPE trials when accuracy_loss and latency_reduce targets are reached. If false or not specified TPE will continue until max_trials or max_minutes is reached even if targets are reached earlier"eval_subset_size"
- subset of test data used to evaluate hyperparameters. The whole dataset is used if no parameter specified."metrics"
- an optional list of reference metrics values. If not specified, all metrics will be calculated from the original model. It consists of tuples with the following parameters:"name"
- name of the metric to optimize"baseline_value"
- baseline metric value of the original modelBelow is a fragment of the configuration file that shows overall structure of parameters for this algorithm.
The domain of the hyperparameter search space can be vast, and searching for the best result is a time-consuming process. The current implementation allows multiple machines to work together on the search. For more information, see the Multi-node description.
TPE does not model interactions between hyperparameters.
[1] J. S. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl, “Algorithms for Hyper-Parameter Optimization,” in Advances in Neural Information Processing Systems 24, J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2011, pp. 2546–2554.