Lightweight image generation with aMUSEd and OpenVINO#
This Jupyter notebook can be launched on-line, opening an interactive environment in a browser window. You can also make a local installation; see the Installation Instructions below.
aMUSEd is a lightweight text-to-image model based on the MUSE architecture. aMUSEd is particularly useful in applications that require a lightweight and fast model, such as generating many images quickly at once.
aMUSEd is a VQ-VAE token-based transformer that can generate an image in fewer forward passes than many diffusion models. In contrast with MUSE, it uses the smaller text encoder CLIP-L/14 instead of T5-XXL. Due to its small parameter count and few-forward-pass generation process, aMUSEd can generate many images quickly. This benefit is seen particularly at larger batch sizes.
Installation Instructions#
This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start. For details, please refer to Installation Guide.
Prerequisites#
%pip install -q transformers "diffusers>=0.25.0" "openvino>=2023.2.0" "accelerate>=0.20.3" "gradio>=4.19" "torch>=2.1" "pillow" "torchmetrics" "torch-fidelity" --extra-index-url https://download.pytorch.org/whl/cpu
%pip install -q "nncf>=2.9.0" datasets
Note: you may need to restart the kernel to use updated packages.
# Fetch the notebook utils script from the openvino_notebooks repo
import requests
r = requests.get(
url="https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/notebook_utils.py",
)
open("notebook_utils.py", "w").write(r.text)
Load and run the original pipeline#
import torch
from diffusers import AmusedPipeline
pipe = AmusedPipeline.from_pretrained(
"amused/amused-256",
)
prompt = "kind smiling ghost"
image = pipe(prompt, generator=torch.Generator("cpu").manual_seed(8)).images[0]
image.save("text2image_256.png")
Loading pipeline components...: 0%| | 0/5 [00:00<?, ?it/s]
0%| | 0/12 [00:00<?, ?it/s]
image
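As mentioned in the introduction, aMUSEd is well suited to generating many images at once. The snippet below is a small illustrative sketch (not part of the original notebook) that generates several images in a single call; it assumes the standard diffusers num_images_per_prompt argument of AmusedPipeline.
# Illustrative sketch: batched generation with the original pipeline.
# `num_images_per_prompt` is the standard diffusers argument for requesting
# several images per prompt in one call (assumption: supported by AmusedPipeline).
images = pipe(
    "kind smiling ghost",
    num_images_per_prompt=4,
    generator=torch.Generator("cpu").manual_seed(8),
).images
for i, img in enumerate(images):
    img.save(f"text2image_256_batch_{i}.png")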
Convert the model to OpenVINO IR#
aMUSEd consists of three separately trained components: a pre-trained CLIP-L/14 text encoder, a VQ-GAN, and a U-ViT.
During inference, the U-ViT is conditioned on the text encoder’s hidden states and iteratively predicts values for all masked tokens. The cosine masking schedule determines a percentage of the most confident token predictions to be fixed after every iteration. After 12 iterations, all tokens have been predicted and are decoded by the VQ-GAN into image pixels.
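To make the schedule concrete, here is a small illustrative sketch (not the diffusers scheduler implementation): with a cosine masking schedule, the fraction of the 16x16 = 256 latent tokens that remains masked after step t of T is roughly cos(pi/2 * t / T), so most tokens are committed in the final iterations.
import math

# Illustrative sketch of a cosine masking schedule (assumption: the exact
# formula used by the scheduler may differ in details).
total_tokens = 16 * 16  # latent token grid of the 256x256 model
num_steps = 12
for t in range(1, num_steps + 1):
    still_masked = int(math.cos(math.pi / 2 * t / num_steps) * total_tokens)
    print(f"step {t:2d}: {total_tokens - still_masked:3d} tokens fixed, {still_masked:3d} still masked")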
Define paths for converted models:
from pathlib import Path
TRANSFORMER_OV_PATH = Path("models/transformer_ir.xml")
TEXT_ENCODER_OV_PATH = Path("models/text_encoder_ir.xml")
VQVAE_OV_PATH = Path("models/vqvae_ir.xml")
Define the conversion function for PyTorch modules. We use the ov.convert_model function to obtain an OpenVINO Intermediate Representation object and the ov.save_model function to save it as an XML file.
import torch
import openvino as ov
def convert(model: torch.nn.Module, xml_path: str, example_input):
xml_path = Path(xml_path)
if not xml_path.exists():
xml_path.parent.mkdir(parents=True, exist_ok=True)
with torch.no_grad():
converted_model = ov.convert_model(model, example_input=example_input)
ov.save_model(converted_model, xml_path, compress_to_fp16=False)
# cleanup memory
torch._C._jit_clear_class_registry()
torch.jit._recursive.concrete_type_store = torch.jit._recursive.ConcreteTypeStore()
torch.jit._state._clear_class_state()
Convert the Text Encoder#
class TextEncoderWrapper(torch.nn.Module):
def __init__(self, text_encoder):
super().__init__()
self.text_encoder = text_encoder
def forward(self, input_ids=None, return_dict=None, output_hidden_states=None):
outputs = self.text_encoder(
input_ids=input_ids,
return_dict=return_dict,
output_hidden_states=output_hidden_states,
)
return outputs.text_embeds, outputs.last_hidden_state, outputs.hidden_states
input_ids = pipe.tokenizer(
prompt,
return_tensors="pt",
padding="max_length",
truncation=True,
max_length=pipe.tokenizer.model_max_length,
)
input_example = {
"input_ids": input_ids.input_ids,
"return_dict": torch.tensor(True),
"output_hidden_states": torch.tensor(True),
}
convert(TextEncoderWrapper(pipe.text_encoder), TEXT_ENCODER_OV_PATH, input_example)
Convert the U-ViT transformer#
class TransformerWrapper(torch.nn.Module):
def __init__(self, transformer):
super().__init__()
self.transformer = transformer
def forward(
self,
latents=None,
micro_conds=None,
pooled_text_emb=None,
encoder_hidden_states=None,
):
return self.transformer(
latents,
micro_conds=micro_conds,
pooled_text_emb=pooled_text_emb,
encoder_hidden_states=encoder_hidden_states,
)
shape = (1, 16, 16)
latents = torch.full(shape, pipe.scheduler.config.mask_token_id, dtype=torch.long)
latents = torch.cat([latents] * 2)
example_input = {
"latents": latents,
"micro_conds": torch.rand([2, 5], dtype=torch.float32),
"pooled_text_emb": torch.rand([2, 768], dtype=torch.float32),
"encoder_hidden_states": torch.rand([2, 77, 768], dtype=torch.float32),
}
pipe.transformer.eval()
w_transformer = TransformerWrapper(pipe.transformer)
convert(w_transformer, TRANSFORMER_OV_PATH, example_input)
Convert VQ-GAN decoder (VQVAE)#
The get_latents function is needed to return real latents for the conversion. Due to the VQ-VAE implementation, an autogenerated tensor of the required shape is not suitable. This function repeats part of AmusedPipeline.
def get_latents():
shape = (1, 16, 16)
latents = torch.full(shape, pipe.scheduler.config.mask_token_id, dtype=torch.long)
model_input = torch.cat([latents] * 2)
model_output = pipe.transformer(
model_input,
micro_conds=torch.rand([2, 5], dtype=torch.float32),
pooled_text_emb=torch.rand([2, 768], dtype=torch.float32),
encoder_hidden_states=torch.rand([2, 77, 768], dtype=torch.float32),
)
guidance_scale = 10.0
uncond_logits, cond_logits = model_output.chunk(2)
model_output = uncond_logits + guidance_scale * (cond_logits - uncond_logits)
latents = pipe.scheduler.step(
model_output=model_output,
timestep=torch.tensor(0),
sample=latents,
).prev_sample
return latents
class VQVAEWrapper(torch.nn.Module):
def __init__(self, vqvae):
super().__init__()
self.vqvae = vqvae
def forward(self, latents=None, force_not_quantize=True, shape=None):
outputs = self.vqvae.decode(
latents,
force_not_quantize=force_not_quantize,
shape=shape.tolist(),
)
return outputs
latents = get_latents()
example_vqvae_input = {
"latents": latents,
"force_not_quantize": torch.tensor(True),
"shape": torch.tensor((1, 16, 16, 64)),
}
convert(VQVAEWrapper(pipe.vqvae), VQVAE_OV_PATH, example_vqvae_input)
Compile models and prepare the pipeline#
Select the device from the dropdown list for running inference using OpenVINO.
from notebook_utils import device_widget
device = device_widget()
device
Dropdown(description='Device:', index=1, options=('CPU', 'AUTO'), value='AUTO')
core = ov.Core()
ov_text_encoder = core.compile_model(TEXT_ENCODER_OV_PATH, device.value)
ov_transformer = core.compile_model(TRANSFORMER_OV_PATH, device.value)
ov_vqvae = core.compile_model(VQVAE_OV_PATH, device.value)
Let’s create callable wrapper classes for the compiled models to allow interaction with the original AmusedPipeline class. Note that all of the wrapper classes return torch.Tensors instead of np.arrays.
from collections import namedtuple
class ConvTextEncoderWrapper(torch.nn.Module):
def __init__(self, text_encoder, config):
super().__init__()
self.config = config
self.text_encoder = text_encoder
def forward(self, input_ids=None, return_dict=None, output_hidden_states=None):
inputs = {
"input_ids": input_ids,
"return_dict": return_dict,
"output_hidden_states": output_hidden_states,
}
outs = self.text_encoder(inputs)
outputs = namedtuple("CLIPTextModelOutput", ("text_embeds", "last_hidden_state", "hidden_states"))
text_embeds = torch.from_numpy(outs[0])
last_hidden_state = torch.from_numpy(outs[1])
hidden_states = list(torch.from_numpy(out) for out in outs.values())[2:]
return outputs(text_embeds, last_hidden_state, hidden_states)
class ConvTransformerWrapper(torch.nn.Module):
def __init__(self, transformer, config):
super().__init__()
self.config = config
self.transformer = transformer
def forward(self, latents=None, micro_conds=None, pooled_text_emb=None, encoder_hidden_states=None, **kwargs):
outputs = self.transformer(
{
"latents": latents,
"micro_conds": micro_conds,
"pooled_text_emb": pooled_text_emb,
"encoder_hidden_states": encoder_hidden_states,
},
share_inputs=False,
)
return torch.from_numpy(outputs[0])
class ConvVQVAEWrapper(torch.nn.Module):
def __init__(self, vqvae, dtype, config):
super().__init__()
self.vqvae = vqvae
self.dtype = dtype
self.config = config
def decode(self, latents=None, force_not_quantize=True, shape=None):
inputs = {
"latents": latents,
"force_not_quantize": force_not_quantize,
"shape": torch.tensor(shape),
}
outs = self.vqvae(inputs)
outs = namedtuple("VQVAE", "sample")(torch.from_numpy(outs[0]))
return outs
And insert the wrapper instances into the pipeline:
prompt = "kind smiling ghost"
transformer = pipe.transformer
vqvae = pipe.vqvae
text_encoder = pipe.text_encoder
pipe.__dict__["_internal_dict"]["_execution_device"] = pipe._execution_device # this is to avoid some problem that can occur in the pipeline
pipe.register_modules(
text_encoder=ConvTextEncoderWrapper(ov_text_encoder, text_encoder.config),
transformer=ConvTransformerWrapper(ov_transformer, transformer.config),
vqvae=ConvVQVAEWrapper(ov_vqvae, vqvae.dtype, vqvae.config),
)
image = pipe(prompt, generator=torch.Generator("cpu").manual_seed(8)).images[0]
image.save("text2image_256.png")
0%| | 0/12 [00:00<?, ?it/s]
image
Quantization#
NNCF enables post-training quantization by adding quantization layers into the model graph and then using a subset of the training dataset to initialize the parameters of these additional quantization layers. Quantized operations are executed in INT8 instead of FP32/FP16, making model inference faster.
According to the aMUSEd pipeline structure, the transformer model takes up a significant portion of the overall pipeline execution time. Now we will show you how to optimize the transformer part using NNCF to reduce computation cost and speed up the pipeline. Quantizing the rest of the pipeline does not significantly improve inference performance but can lead to a substantial degradation of generation quality.
We also estimate the quality of generations produced by the optimized pipeline with the Inception Score, which is often used to measure the quality of text-to-image generation systems.
The steps are the following (a condensed sketch is shown after the list):
1. Create a calibration dataset for quantization.
2. Run nncf.quantize() on the model.
3. Save the quantized model using the openvino.save_model() function.
4. Compare inference time and Inception Score for the original and quantized pipelines.
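For orientation, the cell below is a condensed, illustrative sketch of steps 1-3; the full configuration actually used in this notebook (ignored scope, SmoothQuant and range-estimator parameters) appears in the Run model quantization section. It assumes calibration_data is a list of transformer input dictionaries collected by running the pipeline, and reuses the model paths defined in this notebook.
# Condensed sketch of the quantization workflow (illustrative only).
# TRANSFORMER_OV_PATH is defined above; QUANTIZED_TRANSFORMER_OV_PATH is defined
# in the next cell; `calibration_data` is assumed to be already collected.
import nncf
import openvino as ov

core = ov.Core()
quantized_model = nncf.quantize(
    core.read_model(TRANSFORMER_OV_PATH),    # original FP32 IR
    nncf.Dataset(calibration_data),          # calibration samples
    model_type=nncf.ModelType.TRANSFORMER,   # transformer-friendly quantization defaults
)
ov.save_model(quantized_model, QUANTIZED_TRANSFORMER_OV_PATH)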
Please select below whether you would like to run quantization to improve model inference speed.
NOTE: Quantization is a time- and memory-consuming operation. Running the quantization code below may take some time.
from notebook_utils import quantization_widget
QUANTIZED_TRANSFORMER_OV_PATH = Path(str(TRANSFORMER_OV_PATH).replace(".xml", "_quantized.xml"))
skip_for_device = "GPU" in device.value
to_quantize = quantization_widget(not skip_for_device)
to_quantize
Checkbox(value=True, description='Quantization')
r = requests.get(
url="https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/skip_kernel_extension.py",
)
open("skip_kernel_extension.py", "w").write(r.text)
%load_ext skip_kernel_extension
Prepare calibration dataset#
We use a portion of the conceptual_captions dataset from Hugging Face as calibration data. To collect intermediate model inputs for calibration, we customize CompiledModel.
%%skip not $to_quantize.value
import datasets
from tqdm.auto import tqdm
from typing import Any, Dict, List
import pickle
import numpy as np
def disable_progress_bar(pipeline, disable=True):
if not hasattr(pipeline, "_progress_bar_config"):
pipeline._progress_bar_config = {'disable': disable}
else:
pipeline._progress_bar_config['disable'] = disable
class CompiledModelDecorator(ov.CompiledModel):
def __init__(self, compiled_model: ov.CompiledModel, data_cache: List[Any] = None, keep_prob: float = 0.5):
super().__init__(compiled_model)
self.data_cache = data_cache if data_cache is not None else []
self.keep_prob = keep_prob
def __call__(self, *args, **kwargs):
if np.random.rand() <= self.keep_prob:
self.data_cache.append(*args)
return super().__call__(*args, **kwargs)
def collect_calibration_data(ov_transformer_model, calibration_dataset_size: int) -> List[Dict]:
calibration_dataset_filepath = Path(f"calibration_data/{calibration_dataset_size}.pkl")
if not calibration_dataset_filepath.exists():
calibration_data = []
pipe.transformer.transformer = CompiledModelDecorator(ov_transformer_model, calibration_data, keep_prob=1.0)
disable_progress_bar(pipe)
dataset = datasets.load_dataset("google-research-datasets/conceptual_captions", split="train", trust_remote_code=True).shuffle(seed=42)
# Run inference for data collection
pbar = tqdm(total=calibration_dataset_size)
for batch in dataset:
prompt = batch["caption"]
if len(prompt) > pipe.tokenizer.model_max_length:
continue
pipe(prompt, generator=torch.Generator('cpu').manual_seed(0))
pbar.update(len(calibration_data) - pbar.n)
if pbar.n >= calibration_dataset_size:
break
pipe.transformer.transformer = ov_transformer_model
disable_progress_bar(pipe, disable=False)
calibration_dataset_filepath.parent.mkdir(exist_ok=True, parents=True)
with open(calibration_dataset_filepath, 'wb') as f:
pickle.dump(calibration_data, f)
with open(calibration_dataset_filepath, 'rb') as f:
calibration_data = pickle.load(f)
return calibration_data
Run model quantization#
Run calibration data collection and quantize the vision transformer model.
%%skip not $to_quantize.value
from nncf.quantization.advanced_parameters import AdvancedSmoothQuantParameters
from nncf.quantization.range_estimator import RangeEstimatorParameters, StatisticsCollectorParameters, StatisticsType, \
AggregatorType
import nncf
CALIBRATION_DATASET_SIZE = 12 * 25
if not QUANTIZED_TRANSFORMER_OV_PATH.exists():
calibration_data = collect_calibration_data(ov_transformer, CALIBRATION_DATASET_SIZE)
quantized_model = nncf.quantize(
core.read_model(TRANSFORMER_OV_PATH),
nncf.Dataset(calibration_data),
model_type=nncf.ModelType.TRANSFORMER,
subset_size=len(calibration_data),
# We ignore convolutions to improve quality of generations without significant drop in inference speed
ignored_scope=nncf.IgnoredScope(types=["Convolution"]),
# Value of 0.85 was obtained using grid search based on Inception Score computed below
advanced_parameters=nncf.AdvancedQuantizationParameters(
smooth_quant_alphas=AdvancedSmoothQuantParameters(matmul=0.85),
# During activation statistics collection we ignore 1% of outliers which improves quantization quality
activations_range_estimator_params=RangeEstimatorParameters(
min=StatisticsCollectorParameters(statistics_type=StatisticsType.MIN,
aggregator_type=AggregatorType.MEAN_NO_OUTLIERS,
quantile_outlier_prob=0.01),
max=StatisticsCollectorParameters(statistics_type=StatisticsType.MAX,
aggregator_type=AggregatorType.MEAN_NO_OUTLIERS,
quantile_outlier_prob=0.01)
)
)
)
ov.save_model(quantized_model, QUANTIZED_TRANSFORMER_OV_PATH)
INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, openvino
0%| | 0/300 [00:00<?, ?it/s]
INFO:nncf:3 ignored nodes were found by types in the NNCFGraph
INFO:nncf:Not adding activation input quantizer for operation: 53 __module.transformer.embed.conv/aten::_convolution/Convolution
INFO:nncf:Not adding activation input quantizer for operation: 1986 __module.transformer.mlm_layer.conv1/aten::_convolution/Convolution
INFO:nncf:Not adding activation input quantizer for operation: 2927 __module.transformer.mlm_layer.conv2/aten::_convolution/Convolution
Demo generation with quantized pipeline#
%%skip not $to_quantize.value
original_ov_transformer_model = pipe.transformer.transformer
pipe.transformer.transformer = core.compile_model(QUANTIZED_TRANSFORMER_OV_PATH, device.value)
image = pipe(prompt, generator=torch.Generator('cpu').manual_seed(8)).images[0]
image.save('text2image_256_quantized.png')
pipe.transformer.transformer = original_ov_transformer_model
display(image)
0%| | 0/12 [00:00<?, ?it/s]
Compute Inception Scores and inference time#
Below we compute the Inception Score of the original and quantized pipelines on a small subset of images. Images are generated from prompts of the conceptual_captions validation set. We also measure the time it took to generate the images for comparison.
Please note that the validation dataset size is small and serves only as a rough estimate of generation quality.
%%skip not $to_quantize.value
from torchmetrics.image.inception import InceptionScore
from torchvision import transforms as transforms
from itertools import islice
import time
VALIDATION_DATASET_SIZE = 100
def compute_inception_score(ov_transformer_model_path, validation_set_size, batch_size=100):
original_ov_transformer_model = pipe.transformer.transformer
pipe.transformer.transformer = core.compile_model(ov_transformer_model_path, device.value)
disable_progress_bar(pipe)
dataset = datasets.load_dataset("google-research-datasets/conceptual_captions", "unlabeled", split="validation", trust_remote_code=True).shuffle(seed=42)
dataset = islice(dataset, validation_set_size)
inception_score = InceptionScore(normalize=True, splits=1)
images = []
infer_times = []
for batch in tqdm(dataset, total=validation_set_size, desc="Computing Inception Score"):
prompt = batch["caption"]
if len(prompt) > pipe.tokenizer.model_max_length:
continue
start_time = time.perf_counter()
image = pipe(prompt, generator=torch.Generator('cpu').manual_seed(0)).images[0]
infer_times.append(time.perf_counter() - start_time)
image = transforms.ToTensor()(image)
images.append(image)
mean_perf_time = sum(infer_times) / len(infer_times)
while len(images) > 0:
images_batch = torch.stack(images[-batch_size:])
images = images[:-batch_size]
inception_score.update(images_batch)
kl_mean, kl_std = inception_score.compute()
pipe.transformer.transformer = original_ov_transformer_model
disable_progress_bar(pipe, disable=False)
return kl_mean, mean_perf_time
original_inception_score, original_time = compute_inception_score(TRANSFORMER_OV_PATH, VALIDATION_DATASET_SIZE)
print(f"Original pipeline Inception Score: {original_inception_score}")
quantized_inception_score, quantized_time = compute_inception_score(QUANTIZED_TRANSFORMER_OV_PATH, VALIDATION_DATASET_SIZE)
print(f"Quantized pipeline Inception Score: {quantized_inception_score}")
print(f"Quantization speed-up: {original_time / quantized_time:.2f}x")
Computing Inception Score: 0%| | 0/100 [00:00<?, ?it/s]
Original pipeline Inception Score: 11.875359535217285
Computing Inception Score: 0%| | 0/100 [00:00<?, ?it/s]
Quantized pipeline Inception Score: 11.0730562210083
Quantization speed-up: 2.08x
Interactive inference#
Below you can select which pipeline to run: original or quantized.
import ipywidgets as widgets
quantized_model_present = QUANTIZED_TRANSFORMER_OV_PATH.exists()
use_quantized_model = widgets.Checkbox(
value=True if quantized_model_present else False,
description="Use quantized pipeline",
disabled=not quantized_model_present,
)
use_quantized_model
Checkbox(value=True, description='Use quantized pipeline')
from pathlib import Path
if not Path("gradio_helper.py").exists():
r = requests.get(
url="https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/notebooks/amused-lightweight-text-to-image/gradio_helper.py"
)
open("gradio_helper.py", "w").write(r.text)
from gradio_helper import make_demo
pipe.transformer.transformer = core.compile_model(
QUANTIZED_TRANSFORMER_OV_PATH if use_quantized_model.value else TRANSFORMER_OV_PATH,
device.value,
)
demo = make_demo(pipe)
try:
demo.queue().launch(debug=False)
except Exception:
demo.queue().launch(debug=False, share=True)
# if you are launching remotely, specify server_name and server_port
# demo.launch(server_name='your server name', server_port='server port in int')
# Read more in the docs: https://gradio.app/docs/
Running on local URL: http://127.0.0.1:7860
To create a public link, set share=True in launch().