Model Optimization - NNCF
Model optimization means altering the model itself to improve its performance and reduce its size. It is an optional step, typically used only at the development stage, so that a pre-optimized model is used in the final AI application.
In OpenVINO, the default optimization tool is NNCF (Neural Network Compression Framework). It is a set of compression algorithms, organized as a Python package, that make your models smaller and faster. Note that NNCF is not part of the OpenVINO package, so it needs to be installed separately. It supports models in PyTorch, TensorFlow, ONNX, and OpenVINO IR formats, offering optimizations such as post-training quantization, quantization-aware training, and filter pruning.
A common approach is to perform post-training quantization first, as it is the easiest option. If the result proves unsatisfactory, quantization-aware training will give you higher accuracy with the same level of performance boost. For the most performant product, adding filter pruning will further streamline the model.
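For illustration, the post-training quantization flow comes down to a single nncf.quantize() call over a small calibration set. The sketch below is a minimal example, not a full recipe: the model path, input shape, and dummy calibration data are placeholders to replace with your own.

import numpy as np
import nncf
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")  # placeholder path to an OpenVINO IR model

# Placeholder calibration data; in practice, use a few hundred representative samples
data_source = [np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(300)]

def transform_fn(data_item):
    # Adapt one sample from the data source to the model's input format
    return data_item

calibration_dataset = nncf.Dataset(data_source, transform_fn)
quantized_model = nncf.quantize(model, calibration_dataset)
ov.save_model(quantized_model, "model_int8.xml")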
To learn about the full scope of the framework, its installation, and technical details, visit both the NNCF repository and NNCF API documentation.
NNCF can be installed with pip or conda:

pip install nncf

conda install -c conda-forge nncf
For more installation details, see the page on NNCF Installation. A full requirement listing is available in the NNCF GitHub Repository.
Note that to optimize a model, you will need to install that model's framework as well, in the same Python environment as NNCF. For a list of recommended framework versions, see the framework compatibility table.
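For example, to optimize a PyTorch model you would install both packages into one environment (torch is shown purely as an illustration; pick the version recommended by the compatibility table):

pip install nncf torch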
Note
Once optimized, models may be executed with the typical OpenVINO inference workflow; no additional changes to the inference code are required.
This is true for models optimized using NNCF, as well as those pre-optimized in their source frameworks, such as PyTorch, TensorFlow, and ONNX (in the Q/DQ, Quantize/DeQuantize, format). The latter may be easily converted to the OpenVINO Intermediate Representation (IR) format right away.
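As an illustration, a quantized IR file produced by NNCF is loaded and run exactly like the original model; only the file name changes (the path and input shape below are placeholders):

import numpy as np
import openvino as ov

core = ov.Core()
# The same call works for optimized and unoptimized models alike
compiled_model = core.compile_model("model_int8.xml", "CPU")

example_input = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder input
result = compiled_model(example_input)[0]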
Hugging Face Optimum Intel offers OpenVINO integration with Hugging Face models and pipelines. NNCF serves as the compression backend within Hugging Face Optimum Intel, integrating with the widely used transformers library to enhance model performance.
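A minimal sketch of that integration, assuming the optimum-intel package is installed and using a small public checkpoint purely as an example (load_in_8bit is the optimum-intel flag that triggers NNCF weight compression in recent releases):

from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "gpt2"  # example checkpoint; any supported Hugging Face model works
# export=True converts the checkpoint to OpenVINO IR on the fly;
# load_in_8bit=True compresses the weights through NNCF during export
model = OVModelForCausalLM.from_pretrained(model_id, export=True, load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Model optimization with NNCF", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))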