Compressing a Model to FP16
By default, when IR is saved during model conversion, all relevant floating-point
weights are compressed to the FP16 data type.
The result is a “compressed FP16 model”, which occupies about half of
the original space in the file system. The compression may introduce a minor drop in accuracy,
but it is negligible for most models.
If the accuracy drop is significant, you can disable compression explicitly.
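The size and accuracy trade-off described above can be illustrated without OpenVINO. The following is a minimal sketch that uses only the Python standard library; the sample weight values and the helper names (pack_weights, fp16_round_trip) are made up for illustration and are not part of any OpenVINO API.

```python
import struct

# Hypothetical sample weights, chosen only for illustration.
weights = [0.1, -0.25, 3.14159, 1e-3, 100.5]


def pack_weights(values, fmt_char):
    """Pack floats as little-endian FP32 ('f') or FP16 ('e') bytes."""
    return struct.pack(f"<{len(values)}{fmt_char}", *values)


def fp16_round_trip(values):
    """Store values as FP16 and read them back as Python floats."""
    packed = pack_weights(values, "e")
    return list(struct.unpack(f"<{len(values)}e", packed))


fp32_size = len(pack_weights(weights, "f"))  # 4 bytes per weight
fp16_size = len(pack_weights(weights, "e"))  # 2 bytes per weight

restored = fp16_round_trip(weights)
# Largest relative rounding error introduced by storing the weights as FP16.
max_rel_error = max(abs(a - b) / abs(a) for a, b in zip(weights, restored))

print(f"FP32 bytes: {fp32_size}, FP16 bytes: {fp16_size}")
print(f"max relative rounding error: {max_rel_error:.2e}")
```

The FP16 buffer is exactly half the size of the FP32 one, and each weight keeps roughly three significant decimal digits. Whether that rounding matters for a given model can only be judged by comparing the accuracy of the two versions on real data.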
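To disable compression, use the compress_to_fp16=False option.
In Python, where ov_model is the model object and OUTPUT_MODEL is the path of the IR .xml file to write:
from openvino.runtime import save_model
save_model(ov_model, OUTPUT_MODEL, compress_to_fp16=False)
On the command line:
mo --input_model INPUT_MODEL --compress_to_fp16=False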
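To see what compression saves for a particular model, one option is to save it both ways and compare the sizes on disk. The sketch below assumes that ov_model is an OpenVINO model object that has already been read or converted, and that the IR is written as an .xml file with a sibling .bin file. The helper names, the output directory and the file names are hypothetical.

```python
from pathlib import Path


def ir_size_bytes(xml_path):
    """Total on-disk size of an IR: the .xml file plus its sibling .bin file."""
    xml = Path(xml_path)
    return xml.stat().st_size + xml.with_suffix(".bin").stat().st_size


def save_both_variants(ov_model, out_dir):
    """Save ov_model with and without FP16 compression; return both sizes.

    ov_model is assumed to be an already loaded OpenVINO model object and
    out_dir a hypothetical output directory.
    """
    # Imported here so ir_size_bytes stays usable without OpenVINO installed.
    from openvino.runtime import save_model

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    fp16_xml = out / "model_fp16.xml"  # hypothetical file names
    fp32_xml = out / "model_fp32.xml"

    save_model(ov_model, str(fp16_xml), compress_to_fp16=True)
    save_model(ov_model, str(fp32_xml), compress_to_fp16=False)
    return ir_size_bytes(fp16_xml), ir_size_bytes(fp32_xml)
```

Calling save_both_variants(ov_model, "ir_out") returns the two sizes in bytes. The ratio is close to one half only when floating-point weights dominate the file.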
For details on how plugins handle compressed
FP16 models, see
Working with devices.
FP16 compression is sometimes used as the initial step for INT8 quantization.
Refer to the Post-training optimization guide for more
information about that.
Some large models (larger than a few GB), when compressed to
FP16, may consume an excessive amount of RAM during the loading
phase of inference. If that is the case for your model, try converting it without compression:
convert_model(INPUT_MODEL, compress_to_fp16=False) or