Neural Network Compression Framework

Neural Network Compression Framework (NNCF) is a set of advanced algorithms for optimizing deep neural networks (DNNs). It provides in-training optimization capabilities, which means that fine-tuning or even re-training of the original model is necessary, and supports several optimization algorithms:

Compression algorithm        | PyTorch   | TensorFlow 2.x
-----------------------------|-----------|---------------
8-bit quantization           | Supported | Supported
Filter pruning               | Supported | Supported
Sparsity                     | Supported | Supported
Mixed-precision quantization | Supported | Not supported
Binarization                 | Supported | Not supported

The model optimization workflow using NNCF:

[Figure: NNCF model optimization workflow]

The main NNCF characteristics:

  • Support for optimization of PyTorch and TensorFlow 2.x models.

  • Stacking of optimization methods, for example: 8-bit quantization + filter pruning.

  • Support for accuracy-aware model training pipelines via Adaptive Compression Level Training and Early Exit Training.

  • Automatic and configurable model graph transformation to obtain the compressed model (support for TensorFlow models is limited to those created with the Sequential or Keras Functional API).

  • GPU-accelerated layers for faster compressed model fine-tuning.

  • Distributed training support.

  • Configuration file examples for each supported compression algorithm.

  • Exporting PyTorch compressed models to ONNX checkpoints and TensorFlow compressed models to SavedModel or Frozen Graph format, ready to use with OpenVINO toolkit.

  • Open source, available on GitHub.

  • Git patches for prominent third-party repositories (huggingface-transformers) demonstrating the process of integrating NNCF into custom training pipelines.
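As an illustration of the configuration files and method stacking mentioned above, the following sketch writes a configuration that combines 8-bit quantization with filter pruning. The field names follow NNCF's JSON configuration schema, but the exact parameter values and the file name here are illustrative assumptions:

```python
import json

# Sketch of an NNCF configuration stacking 8-bit quantization with
# filter pruning; "input_info" and "compression" follow the NNCF
# JSON configuration schema (values here are illustrative).
nncf_config = {
    "input_info": {"sample_size": [1, 3, 224, 224]},  # NCHW input shape
    "compression": [
        {"algorithm": "quantization"},                 # 8-bit by default
        {"algorithm": "filter_pruning", "pruning_init": 0.1},
    ],
}

# NNCF reads such settings from a JSON file passed to the training script.
with open("quantization_pruning.json", "w") as f:
    json.dump(nncf_config, f, indent=4)
```

Listing several algorithms under "compression" is how stacked optimization pipelines are described in a single configuration file.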

Get started

Installation

NNCF packages are available for installation through the PyPI repository. To install the latest version via the pip package manager, run the following command:

pip install nncf

Usage examples

NNCF provides various examples and tutorials that demonstrate the usage of its optimization methods.