Latent Consistency Model using Optimum-Intel OpenVINO#
This Jupyter notebook can be launched after a local installation only.
This notebook shows how to run a Latent Consistency Model (LCM). It sets up a standard Hugging Face diffusers pipeline and an Optimum Intel pipeline optimized for Intel hardware, including CPU and GPU. Running inference on both CPU and GPU makes it easy to compare performance and the time required to generate an image for a given prompt. The notebook can also be used on other Intel hardware with minimal or no modifications.
Optimum Intel is an interface from Hugging Face that connects the diffusers and transformers libraries with the tools Intel provides to accelerate pipelines on Intel hardware. It also supports quantization of models hosted on Hugging Face. In this notebook, OpenVINO is the backend that Optimum Intel uses to accelerate inference.
For more details, please refer to the Optimum Intel repository: huggingface/optimum-intel.
LCMs are the next generation of generative models after Latent Diffusion Models (LDMs). They were proposed to overcome the slow iterative sampling process of LDMs, enabling fast inference in very few steps (from 2 to 4) on any pre-trained LDM (e.g. Stable Diffusion). To read more about LCM, please refer to https://latent-consistency-models.github.io/
Installation Instructions#
This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start. For details, please refer to Installation Guide.
Prerequisites#
Install required packages
%pip install -q "openvino>=2023.3.0"
%pip install -q "onnx>=1.11.0"
%pip install -q "optimum-intel[diffusers]@git+https://github.com/huggingface/optimum-intel.git" "ipywidgets" "torch>=2.1" "transformers>=4.33.0" --extra-index-url https://download.pytorch.org/whl/cpu
Note: you may need to restart the kernel to use updated packages.
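Before continuing, it can be useful to confirm which versions of the packages listed above were actually installed. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and it only reports versions rather than enforcing the minimums.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Package names taken from the pip commands above.
for name in ("openvino", "onnx", "optimum-intel", "torch", "transformers"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If a package shows as not installed right after the pip commands, restarting the kernel as the note above suggests is the first thing to try.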
import warnings
warnings.filterwarnings("ignore")
Showing Information About Available Devices#
The available_devices property shows the available devices in your system. The "FULL_DEVICE_NAME" option passed to core.get_property() shows the name of the device. Check the ID of the discrete GPU: if you have both an integrated GPU (iGPU) and a discrete GPU (dGPU), the iGPU is listed as device_name="GPU.0" and the dGPU as device_name="GPU.1". If you have only an iGPU or only a dGPU, it is assigned to "GPU".
Note: For more details about GPU with OpenVINO, visit this link. If you run into any issues on Ubuntu 20.04 or Windows 11, read this blog.
import openvino as ov
core = ov.Core()
devices = core.available_devices
for device in devices:
    device_name = core.get_property(device, "FULL_DEVICE_NAME")
    print(f"{device}: {device_name}")
CPU: Intel(R) Core(TM) Ultra 7 155H
GNA.GNA_SW: GNA_SW
GNA.GNA_HW: GNA_HW
GPU: Intel(R) Arc(TM) Graphics (iGPU)
NPU: Intel(R) AI Boost
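The naming rule described above (a single GPU appears as "GPU", while an iGPU and dGPU pair appears as "GPU.0" and "GPU.1") can be turned into a small selection helper. The sketch below is an illustration only: `pick_device` and `gpu_index` are hypothetical functions, and the code assumes the highest-numbered GPU entry is the discrete one, as the text above describes. It works on a plain list of names, such as the value of `core.available_devices`.

```python
def gpu_index(name):
    """Return the numeric suffix of a GPU device name ("GPU" counts as 0)."""
    _, _, suffix = name.partition(".")
    return int(suffix) if suffix.isdigit() else 0


def pick_device(available, prefer_discrete=True):
    """Pick a GPU from a list of device names, falling back to "CPU"."""
    gpus = [d for d in available if d == "GPU" or d.startswith("GPU.")]
    if not gpus:
        return "CPU"
    gpus.sort(key=gpu_index)
    # Assumption: the highest index is the dGPU, the lowest is the iGPU.
    return gpus[-1] if prefer_discrete else gpus[0]


print(pick_device(["CPU", "GPU.0", "GPU.1"]))  # GPU.1
print(pick_device(["CPU", "GNA.GNA_SW", "GNA.GNA_HW", "GPU", "NPU"]))  # GPU
```

On the machine whose output is listed above, only "GPU" is present, so the helper would return "GPU".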
Using full precision model in CPU with LatentConsistencyModelPipeline#
The standard pipeline for the Latent Consistency Model (LCM) from the Diffusers library is used here. For more information, please refer to https://huggingface.co/docs/diffusers/en/api/pipelines/latent_consistency_models
from diffusers import LatentConsistencyModelPipeline
import gc
pipeline = LatentConsistencyModelPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7")
prompt = "A cute squirrel in the forest, portrait, 8k"
image = pipeline(prompt=prompt, num_inference_steps=4, guidance_scale=8.0, height=512, width=512).images[0]
image.save("image_standard_pipeline.png")
image
del pipeline
gc.collect();
345
Select inference device for text-to-image generation#
import ipywidgets as widgets
core = ov.Core()
device = widgets.Dropdown(
    options=core.available_devices + ["AUTO"],
    value="CPU",
    description="Device:",
    disabled=False,
)
device
Running inference using Optimum Intel OVLatentConsistencyModelPipeline#
This section accelerates LCM inference using Optimum Intel with the OpenVINO backend. For more information, please refer to https://huggingface.co/docs/optimum/intel/inference#latent-consistency-models. The pretrained model used in this notebook is available on Hugging Face in FP32 precision, so if CPU is selected as the device, inference runs in full precision. On GPU, accelerated inference is supported for the FP16 data type, and FP32 precision may produce a high memory footprint and latency. Therefore, the default precision for GPU in OpenVINO is FP16. The OpenVINO GPU plugin converts FP32 to FP16 on the fly, so there is no need to do it manually.
from optimum.intel.openvino import OVLatentConsistencyModelPipeline
from pathlib import Path
if not Path("./openvino_ir").exists():
    # First run: export the model to OpenVINO IR and cache it on disk.
    ov_pipeline = OVLatentConsistencyModelPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", height=512, width=512, export=True, compile=False)
    ov_pipeline.save_pretrained("./openvino_ir")
else:
    # Later runs: load the cached IR without exporting again.
    ov_pipeline = OVLatentConsistencyModelPipeline.from_pretrained("./openvino_ir", export=False, compile=False)
ov_pipeline.reshape(batch_size=1, height=512, width=512, num_images_per_prompt=1)
ov_pipeline.to(device.value)
ov_pipeline.compile()
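Because the GPU plugin defaults to FP16, a run that needs FP32 on GPU has to request it explicitly. The sketch below builds an OpenVINO configuration dictionary for that purpose. `INFERENCE_PRECISION_HINT` is an OpenVINO property; the helper `precision_config` is hypothetical, and passing its result as `ov_config=` when loading the pipeline is an assumption about the installed Optimum Intel version, so check its documentation before relying on it.

```python
def precision_config(device, force_fp32=False):
    """Build an OpenVINO config dict; an empty dict means 'use the plugin default'."""
    if force_fp32 and device.upper().startswith("GPU"):
        # Ask the GPU plugin to keep FP32 instead of converting to FP16.
        return {"INFERENCE_PRECISION_HINT": "f32"}
    # CPU already runs this FP32 model in full precision, so nothing to override.
    return {}


print(precision_config("GPU", force_fp32=True))
print(precision_config("CPU", force_fp32=True))

# Assumed usage (not run here):
# OVLatentConsistencyModelPipeline.from_pretrained(
#     "./openvino_ir", ov_config=precision_config(device.value, force_fp32=True)
# )
```

As the text above notes, FP32 on GPU may increase memory use and latency, so this override is mainly useful for checking whether FP16 affects image quality.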
prompt = "A cute squirrel in the forest, portrait, 8k"
image_ov = ov_pipeline(prompt=prompt, num_inference_steps=4, guidance_scale=8.0, height=512, width=512).images[0]
image_ov.save("image_opt.png")
image_ov
del ov_pipeline
gc.collect();
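To put numbers on the CPU versus GPU comparison mentioned in the introduction, each pipeline call can be wrapped in a timer. The following is a minimal sketch using only the standard library; `time_call` is a hypothetical helper, and the `fake_pipeline` stand-in exists only so the example runs without loading a model.

```python
import time


def time_call(fn, *args, repeats=3, **kwargs):
    """Call fn several times; return (last result, best wall-clock seconds)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return result, best


def fake_pipeline(prompt, num_inference_steps=4):
    """Stand-in for a real pipeline call; sleeps briefly per step."""
    time.sleep(0.01 * num_inference_steps)
    return f"image for: {prompt}"


result, seconds = time_call(fake_pipeline, "A cute squirrel", num_inference_steps=4)
print(f"best of 3 runs: {seconds:.3f} s")
```

In the notebook, `fake_pipeline` would be replaced by `pipeline` or `ov_pipeline`, called before the corresponding `del` statement. Taking the best of several runs is a design choice here, since the first call may include one-time warm-up cost.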