Sound Generation with Stable Audio Open and OpenVINO™#

This Jupyter notebook can be launched after a local installation only.

Github

Stable Audio Open is an open-source model optimized for generating short audio samples, sound effects, and production elements using text prompts. The model was trained on data from Freesound and the Free Music Archive, respecting creator rights.

stable-audio

stable-audio#

Key Takeaways:#

  • Stable Audio Open is an open source text-to-audio model for generating up to 47 seconds of samples and sound effects.

  • Users can create drum beats, instrument riffs, ambient sounds, foley and production elements.

  • The model enables audio variations and style transfer of audio samples.

This model is made to be used with the stable-audio-tools library for inference.

Table of contents:

Installation Instructions#

This is a self-contained example that relies solely on its own code.

We recommend running the notebook in a virtual environment. You only need a Jupyter server to start. For details, please refer to Installation Guide.

Prerequisites#

>Note: using python3.8 can take a long time to resolve dependency conflicts.

%pip install -q "torch>=2.2" --extra-index-url https://download.pytorch.org/whl/cpu
%pip install -q  "stable-audio-tools" "nncf>=2.12.0" --extra-index-url https://download.pytorch.org/whl/cpu
%pip install -q  "openvino>=2024.3.0"

Load the original model and inference#

Note: run model with notebook, you will need to accept license agreement. You must be a registered user in Hugging Face Hub. Please visit HuggingFace model card, carefully read terms of usage and click accept button. You will need to use an access token for the code below to run. For more information on access tokens, refer to this section of the documentation. You can login on Hugging Face Hub in notebook environment, using following code:

# uncomment these lines to login to huggingfacehub to get access to pretrained model
# from huggingface_hub import notebook_login, whoami

# try:
#     whoami()
#     print('Authorization token already provided')
# except OSError:
#     notebook_login()
import torch
import torchaudio
from einops import rearrange
from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond


# Download model
model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
/home/ea/work/py3.11/lib/python3.11/site-packages/x_transformers/x_transformers.py:435: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
  @autocast(enabled = False)
/home/ea/work/py3.11/lib/python3.11/site-packages/x_transformers/x_transformers.py:461: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
  @autocast(enabled = False)
/home/ea/work/py3.11/lib/python3.11/site-packages/stable_audio_tools/models/transformer.py:126: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
  @autocast(enabled = False)
/home/ea/work/py3.11/lib/python3.11/site-packages/stable_audio_tools/models/transformer.py:151: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
  @autocast(enabled = False)
No module named 'flash_attn'
flash_attn not installed, disabling Flash Attention
/home/ea/work/py3.11/lib/python3.11/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:436: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
  @autocast(enabled = False)
/home/ea/work/py3.11/lib/python3.11/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:619: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
  @autocast(enabled = False)
/home/ea/work/py3.11/lib/python3.11/site-packages/torch/nn/utils/weight_norm.py:134: FutureWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  WeightNorm.apply(module, name, dim)
sample_rate = model_config["sample_rate"]

model = model.to("cpu")
total_seconds = 20

# Set up text and timing conditioning
conditioning = [{"prompt": "128 BPM tech house drum loop", "seconds_start": 0, "seconds_total": total_seconds}]

# Generate stereo audio
output = generate_diffusion_cond(
    model,
    steps=100,
    seed=42,
    cfg_scale=7,
    conditioning=conditioning,
    sample_size=sample_rate * total_seconds,
    sigma_min=0.3,
    sigma_max=500,
    sampler_type="dpmpp-3m-sde",
    device="cpu",
)

# Rearrange audio batch to a single sequence
output = rearrange(output, "b d n -> d (b n)")

# Peak normalize, clip, convert to int16, and save to file
output = output.to(torch.float32).div(torch.max(torch.abs(output))).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
torchaudio.save("output.wav", output, sample_rate)
42
/home/ea/work/py3.11/lib/python3.11/site-packages/stable_audio_tools/models/conditioners.py:314: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
  with torch.cuda.amp.autocast(dtype=torch.float16) and torch.set_grad_enabled(self.enable_grad):
/home/ea/work/py3.11/lib/python3.11/site-packages/torch/amp/autocast_mode.py:265: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
  warnings.warn(
/home/ea/work/py3.11/lib/python3.11/site-packages/stable_audio_tools/inference/sampling.py:177: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
  with torch.cuda.amp.autocast():
0%|          | 0/100 [00:00<?, ?it/s]
/home/ea/work/py3.11/lib/python3.11/site-packages/torchsde/_brownian/brownian_interval.py:608: UserWarning: Should have tb<=t1 but got tb=500.00006103515625 and t1=500.000061.
  warnings.warn(f"Should have {tb_name}<=t1 but got {tb_name}={tb} and t1={self._end}.")
/home/ea/work/py3.11/lib/python3.11/site-packages/torchsde/_brownian/brownian_interval.py:599: UserWarning: Should have ta>=t0 but got ta=0.29999998211860657 and t0=0.3.
  warnings.warn(f"Should have ta>=t0 but got ta={ta} and t0={self._start}.")
from IPython.display import Audio

Audio("output.wav")