Compression is an important part of the Internet today because it
enables people to easily share high-quality photos, listen to audio
messages, stream their favorite shows, and much more. Even when using
today’s state-of-the-art techniques, enjoying these rich multimedia
experiences requires a high-speed Internet connection and plenty of
storage space. AI helps to overcome these limitations: “Imagine
listening to a friend’s audio message in an area with low connectivity
and not having it stall or glitch.”
This tutorial considers how to use OpenVINO and the EnCodec algorithm
for hyper compression of audio. EnCodec is a real-time, high-fidelity
audio codec that uses AI to compress audio files without losing quality.
It was introduced in the High Fidelity Neural Audio
Compression paper by Meta AI.
The researchers claimed they achieved an approximate 10x compression
rate without loss of quality and made it work for CD-quality audio. More
details about this approach can be found in the Meta AI
blog
and the original repo.
Codecs, which act as encoders
and decoders for streams of data, power most of the audio
compression people currently use online. Some examples of commonly used
codecs include MP3, Opus, and EVS. Classic codecs like these decompose
the signal into different frequency bands and encode each band as
efficiently as possible. Most classic codecs leverage knowledge of human
hearing (psychoacoustics) but rely on a finite set of handcrafted ways to
efficiently encode and decode the file. EnCodec, a neural network that
is trained end to end to reconstruct the input signal, was
introduced as an attempt to overcome this limitation. It consists of
three parts:
* The encoder, which takes the uncompressed data in and transforms
  it into a higher dimensional and lower frame rate representation.
* The quantizer, which compresses this representation to the target
  size. This compressed representation is what is stored on disk or
  sent through the network.
* The decoder, the final step, which turns the compressed signal
  back into a waveform that is as similar as possible to the original.
The key to lossy compression is to identify changes that will not
be perceivable by humans, as perfect reconstruction is impossible at
low bit rates.
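As an illustration, the encodec package exposes this three-stage
pipeline through the model’s encode and decode methods. The following
sketch runs them on one second of silent mono audio; the silent input is
a placeholder, not part of the original tutorial:

import torch
from encodec import EncodecModel

# instantiate the 24 kHz model and pick a target bandwidth
model = EncodecModel.encodec_model_24khz()
model.set_target_bandwidth(6.0)

# one second of silence: [batch, channels, samples]
wav = torch.zeros(1, 1, model.sample_rate)

with torch.no_grad():
    # encoder + quantizer: a list of (codes, scale) frames, one per chunk
    frames = model.encode(wav)
    # decoder: reconstruct a waveform from the discrete codes
    restored = model.decode(frames)

print(restored.shape)  # close to the original [1, 1, 24000]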
The authors provide two multi-bandwidth models:

* encodec_model_24khz - a causal model operating at 24 kHz on
  monophonic audio trained on a variety of audio data.
* encodec_model_48khz - a non-causal model operating at 48 kHz on
  stereophonic audio trained on music-only data.
In this tutorial, we will use encodec_model_24khz as an example, but
the same steps are also applicable to the encodec_model_48khz model.
To start working with this model, we need to instantiate the model
class using EncodecModel.encodec_model_24khz() and select the required
compression bandwidth among those available: 1.5, 3, 6, 12 or 24 kbps
for the 24 kHz model and 3, 6, 12 or 24 kbps for the 48 kHz model. We
will use 6 kbps bandwidth.
from encodec import compress, decompress
from encodec.utils import convert_audio, save_audio
from encodec.compress import MODELS
import torchaudio
import torch
import typing as tp

model_id = "encodec_24khz"

# Instantiate a pretrained EnCodec model
model = MODELS[model_id]()
# Select the target compression bandwidth in kbps
model.set_target_bandwidth(6.0)
To achieve the best result, the audio should have the number of channels
and sample rate expected by the model. If the audio does not fulfill
these requirements, it can be converted to the desired sample rate and
number of channels using the convert_audio function.
model_sr, model_channels = model.sample_rate, model.channels
print(f"Model expected sample rate {model_sr}")
print(f"Model expected audio format {'mono' if model_channels == 1 else 'stereo'}")
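The loading and conversion of the source audio is not shown above; a
minimal sketch of that step, assuming sample_file points to the
test_24k.wav example used later in the demo, might look like this:

from pathlib import Path

# assumption: the example file used by the Gradio demo below
sample_file = Path("test_24k.wav")
wav, sr = torchaudio.load(str(sample_file))
# resample and remix the waveform to what the model expects
wav = convert_audio(wav, sr, model_sr, model_channels)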
The audio waveform should be split into chunks, encoded by the encoder
model, and then compressed by the quantizer to reduce its memory
footprint. The result of compression is a binary file with the ecdc
extension, a special format for storing EnCodec compressed audio on disk.
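The compression call itself is not shown here; a sketch of it with the
compress helper imported earlier, writing the result to compressed.ecdc
(the file name the size comparison below expects), could be:

# compress returns the serialized ecdc bytes; use_lm=False skips the
# optional language-model entropy coder for faster compression
b = compress(model, wav, use_lm=False)
out_file = Path("compressed.ecdc")
out_file.write_bytes(b)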
import os

orig_file_stats = os.stat(sample_file)
compressed_file_stats = os.stat("compressed.ecdc")
print(f"size before compression in Bytes: {orig_file_stats.st_size}")
print(f"size after compression in Bytes: {compressed_file_stats.st_size}")
print(f"Compression file size ratio: {orig_file_stats.st_size / compressed_file_stats.st_size:.2f}")
After the compressed audio has been sent, it should be decompressed
on the recipient’s side. The decoder model is responsible
for restoring the compressed signal back into a waveform that is as
similar as possible to the original.
out, out_sr = decompress(out_file.read_bytes())
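To listen to the result or store it, the decompressed tensor can be
written back to a WAV file with the save_audio helper imported earlier;
the output file name here is an assumption:

# write the reconstructed waveform to disk at its native sample rate
save_audio(out, "decompressed.wav", out_sr)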
Convert model to OpenVINO Intermediate Representation format
For best results with OpenVINO, it is recommended to convert the model
to OpenVINO IR format. OpenVINO supports PyTorch models via conversion
to OpenVINO IR format. We need to provide an initialized model instance
and example inputs for shape inference. We will use the ov.convert_model
functionality to convert the PyTorch models. The ov.convert_model
Python function returns an OpenVINO model ready to be loaded on a device
and used for predictions. We can save it on disk for later use with
ov.save_model.
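A minimal sketch of this conversion for the encoder part is shown below;
the one-second example input and the encodec_encoder.xml file name are
illustrative choices, not fixed by the tutorial:

import openvino as ov

core = ov.Core()

# example input for shape inference: [batch, channels, samples]
example_input = torch.zeros(1, model_channels, model_sr)

# convert the PyTorch encoder module to an OpenVINO model
encoder_ov = ov.convert_model(model.encoder, example_input=example_input)

# save the IR for later reuse and compile it for inference on CPU
ov.save_model(encoder_ov, "encodec_encoder.xml")
compiled_encoder = core.compile_model(encoder_ov, "CPU")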
Finally, we can try the codec interactively with a simple Gradio demo
that compresses and then decompresses an uploaded audio file:

import gradio as gr
from typing import Tuple
import numpy as np


def preprocess(input, sample_rate, model_sr, model_channels):
    input = torch.tensor(input, dtype=torch.float32)
    input = input / 2**15  # rescale from int16 range to [-1, 1]
    input = input.unsqueeze(0)
    input = convert_audio(input, sample_rate, model_sr, model_channels)
    return input


def postprocess(output):
    output = output.squeeze()
    output = output * 2**15  # rescale from [-1, 1] back to int16 range
    output = output.numpy(force=True)
    output = output.astype(np.int16)
    return output


def _compress(input: Tuple[int, np.ndarray]):
    sample_rate, waveform = input
    waveform = preprocess(waveform, sample_rate, model_sr, model_channels)
    b = compress(model, waveform, use_lm=False)
    out, out_sr = decompress(b)
    out = postprocess(out)
    return out_sr, out


demo = gr.Interface(_compress, "audio", "audio", examples=["test_24k.wav"])

try:
    demo.launch(debug=False)
except Exception:
    demo.launch(share=True, debug=False)

# if you are launching remotely, specify server_name and server_port
# demo.launch(server_name='your server name', server_port='server port in int')
# Read more in the docs: https://gradio.app/docs/