If your question is not covered below, use the OpenVINO™ Community Forum page, where you can participate freely.
No, the POT is not available on any open-source platform. It is distributed only as a part of Intel® OpenVINO™.
In general, you should have a dataset. The dataset should be annotated if you want to validate the accuracy. If your dataset is not annotated, you can still quantize the model in the Simplified mode but you will not be able to measure the accuracy. See Post-Training Optimization Best Practices for more details. You can also use POT API to integrate the post-training quantization into the custom inference pipeline.
The POT accepts models in the OpenVINO™ Intermediate Representation (IR) format only. For that you need to convert your model to the IR format using Model Optimizer.
To create the Accuracy Checker configuration file, refer to Accuracy Checker documentation and try to find the configuration file for your model among the ones available in the Accuracy Checker examples. An alternative way is to quantize the model in the Simplified mode but you will not be able to measure the accuracy. See Post-Training Optimization Best Practices for more details. Also, you can use POT API to integrate the post-training quantization into your pipeline without the Accuracy Checker.
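As a rough illustration of the two options above, the following sketch builds the engine section of a POT configuration file either from an Accuracy Checker configuration file or for the Simplified mode. The helper name build_engine_section and the file paths are hypothetical. The "config", "type", and "data_source" keys are assumed to match the configuration layout of your POT version, so verify them against the POT documentation before use.

```python
import json


def build_engine_section(ac_config_path=None, data_source=None):
    """Return the "engine" part of a POT config (hypothetical helper)."""
    if ac_config_path is not None:
        # Accuracy Checker engine: annotated data, accuracy can be measured.
        return {"config": ac_config_path}
    if data_source is not None:
        # Simplified mode: unannotated data, accuracy cannot be measured.
        return {"type": "simplified", "data_source": data_source}
    raise ValueError("Provide an Accuracy Checker config or a data source")


if __name__ == "__main__":
    # Placeholder path; replace it with your own calibration data folder.
    engine = build_engine_section(data_source="./calibration_images")
    print(json.dumps({"engine": engine}, indent=4))
```

The resulting dictionary can be merged with the model and compression sections of your configuration file.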
The tradeoff is between the accuracy drop and performance. When a model is in low precision, it usually runs faster than the same model in full precision, but the accuracy might be worse. You can find some benchmarking results in INT8 vs FP32 Comparison on Select Networks and Platforms. The other benefit of having a model in low precision is its smaller size.
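To give a feeling for the size benefit, the following back-of-the-envelope sketch compares the storage needed for weights in FP32 (4 bytes per value) and INT8 (1 byte per value). The parameter count is a hypothetical example, and the estimate assumes that every weight is quantized and ignores quantization parameters and any layers kept in full precision, so real models typically shrink somewhat less.

```python
def weights_size_mb(num_params, bytes_per_param):
    """Approximate weight storage in megabytes (10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


NUM_PARAMS = 25 * 10**6  # hypothetical model with 25 million weights

fp32_mb = weights_size_mb(NUM_PARAMS, 4)  # FP32: 4 bytes per weight
int8_mb = weights_size_mb(NUM_PARAMS, 1)  # INT8: 1 byte per weight

print("FP32: {:.1f} MB, INT8: {:.1f} MB, ratio: {:.1f}x".format(
    fp32_mb, int8_mb, fp32_mb / int8_mb))
```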
First of all, you should validate the POT compression pipeline you are running, which can be done with the following steps:
... preset and stat_subset_size). Make sure you get the undesirable accuracy drop/performance gain in this case.
Finally, if you have done the steps above and the problem persists, you could try to compress your model using the Neural Network Compression Framework (NNCF). Note that NNCF usage requires you to have a PyTorch-based training pipeline of your model in order to perform compression-aware fine-tuning. See Low Precision Optimization Guide for more details.
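For the validation step that leaves only preset and stat_subset_size specified, a minimal compression section might look like the sketch below. The helper name is hypothetical, the values "performance" and 300 are only illustrative, and the "algorithms", "name", and "params" keys are assumed to match the configuration layout of your POT version.

```python
import json


def minimal_compression_section(algorithm="DefaultQuantization",
                                preset="performance",
                                stat_subset_size=300):
    """Build a compression section with a single algorithm (hypothetical helper).

    Only preset and stat_subset_size are set; every other parameter is
    left out so that the tool falls back to its defaults.
    """
    return {
        "algorithms": [
            {
                "name": algorithm,
                "params": {
                    "preset": preset,
                    "stat_subset_size": stat_subset_size,
                },
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps({"compression": minimal_compression_section()}, indent=4))
```

If the accuracy drop or performance problem still shows up with such a reduced configuration, the issue is unlikely to come from the extra parameters in your original config.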
These issues happen due to insufficient available amount of memory for statistics collection during the quantization process of a huge model or due to a very high resolution of input images in the quantization dataset. If you do not have a possibility to increase your RAM size, one of the following options can help:
- Set the eval_requests_number and stat_requests_number parameters to 1. In that case, the POT limits the number of infer requests to 1 and uses less memory. Note that this change might increase the time required for quantization.
- Set the use_fast_bias parameter to false. In that case, the POT switches from the FastBiasCorrection algorithm to the full BiasCorrection algorithm, which is usually more accurate and takes more time but requires less memory. See Post-Training Optimization Best Practices for more details.
It can happen due to the following reasons:
Quantization time depends on multiple factors such as the size of the model and the dataset. It also depends on the algorithm: the DefaultQuantization algorithm takes less time than the AccuracyAwareQuantization algorithm. The Tree-Structured Parzen Estimator (TPE) algorithm might take even more time. The following configuration parameters also impact the quantization time duration (see details in Post-Training Optimization Best Practices):
- use_fast_bias: when set to false, it increases the quantization time
- stat_subset_size: the higher the value of this parameter, the more time is required for the quantization
- tune_hyperparams: if set to true when the AccuracyAwareQuantization algorithm is used, it increases the quantization time
- stat_requests_number: the lower the number, the more time might be required for the quantization
- eval_requests_number: the lower the number, the more time might be required for the quantization
Note that higher values of stat_requests_number and eval_requests_number increase memory consumption by POT.
It happens when some needed library is not available in your environment. To avoid it, execute the following command:
where <INSTALL_DIR> is the directory where the OpenVINO™ toolkit is installed.
This error is reported when the Python version in your environment is older than 3.5. Upgrade your Python version. For more details about the prerequisites, refer to the Post-Training Optimization Tool page.
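If you are unsure which interpreter your environment uses, a quick check along the following lines can help. This is a minimal sketch that relies only on the standard library; the function name is hypothetical, and the 3.5 threshold is taken from the text above, so adjust it if your POT version states a different prerequisite.

```python
import sys

MIN_PYTHON = (3, 5)  # minimum version mentioned above


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = "{}.{}".format(sys.version_info[0], sys.version_info[1])
    if python_is_supported():
        print("Python " + current + " meets the minimum requirement")
    else:
        print("Python " + current + " is too old, upgrade to 3.5 or newer")
```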
It means that some required Python module is not installed in your environment. To install it, run pip install some_module_name.
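To find out which modules are missing before running the tool, you can probe them from Python. The sketch below uses only the standard library; the function name and the module list are hypothetical placeholders, so substitute the module names reported in your error message. It is meant for top-level module names only.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Placeholder list; replace it with the modules named in your error.
    required = ["json", "some_module_name"]
    for name in missing_modules(required):
        print("Missing module, try: pip install " + name)
```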