Multi LoRA Image Generation#
This Jupyter notebook can only be launched after a local installation.
LoRA, or Low-Rank Adaptation, is a popular and lightweight training technique used for fine-tuning Large Language and Stable Diffusion Models without needing full model training. Full fine-tuning of larger models (consisting of billions of parameters) is inherently expensive and time-consuming. LoRA works by adding a smaller number of new weights to the model for training, rather than retraining the entire parameter space of the model. This makes training with LoRA much faster, memory-efficient, and produces smaller model weights (a few hundred MBs), which are easier to store and share.
At its core, LoRA leverages the concept of low-rank matrix factorization. Instead of updating all the parameters in a neural network, LoRA decomposes the parameter space into two low-rank matrices. This decomposition allows the model to capture essential information with fewer parameters, significantly reducing the amount of data and computation required for fine-tuning.
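As a rough illustration of this idea (not part of the notebook's pipeline), the sketch below compares the size of a full weight update with its low-rank decomposition; the layer dimensions and rank are arbitrary example values.
import torch

d, k, r = 4096, 4096, 8  # example layer dimensions and LoRA rank (illustrative values)

# full fine-tuning would update the entire d x k weight matrix
full_update_params = d * k

# LoRA instead learns two low-rank factors B (d x r) and A (r x k),
# so the effective weight becomes W + B @ A
B = torch.zeros(d, r)  # initialized to zero so training starts from the base model behavior
A = torch.randn(r, k)
lora_params = B.numel() + A.numel()

print(f"full update: {full_update_params:,} parameters")  # 16,777,216
print(f"LoRA update: {lora_params:,} parameters")  # 65,536, ~0.4% of the full update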
By incorporating LoRA into Stable Diffusion models, we can enhance their ability to understand complex relationships and patterns in data. This approach opens up numerous possibilities:

* Art and Design: Artists can fine-tune models to generate images that align with their unique styles, creating personalized artwork effortlessly.
* Content Creation: Businesses can customize image generation models to produce branded visuals, enhancing marketing and media production.
* Entertainment: Game developers and filmmakers can use fine-tuned models to create realistic and imaginative worlds, streamlining the creative process.
In this tutorial, we explore how to use LoRA adapters with the OpenVINO GenAI API.
Installation Instructions#
This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start. For details, please refer to Installation Guide.
import platform
%pip install -q --extra-index-url https://download.pytorch.org/whl/cpu torch torchvision transformers accelerate "diffusers>0.25.0" pillow "gradio>=4.19" "peft>=0.7.0"
%pip install -q "git+https://github.com/huggingface/optimum-intel.git"
%pip install -q -U "openvino>=2024.5.0" "openvino-tokenizers>=2024.5.0" "openvino-genai>=2024.5.0"
if platform.system() == "Darwin":
%pip install -q "numpy<2.0.0"
import requests
from pathlib import Path
notebook_utils_path = Path("notebook_utils.py")
lora_config_path = Path("lora_config.py")
if not notebook_utils_path.exists():
    r = requests.get(
        url="https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/notebook_utils.py",
    )
    notebook_utils_path.open("w").write(r.text)

if not lora_config_path.exists():
    r = requests.get(
        url="https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/notebooks/multilora-image-generation/lora_config.py",
    )
    lora_config_path.open("w").write(r.text)
Convert Diffusion Model using Optimum Intel#
Optimum Intel is the interface between the Transformers and Diffusers libraries and OpenVINO, accelerating end-to-end pipelines on Intel architectures. It provides an easy-to-use interface for exporting models to the OpenVINO Intermediate Representation (IR) format.
Applying LoRA to Original Diffusers pipeline before conversion#
LoRA can easily be added to a Diffusers pipeline before export. At the export stage, the LoRA weights are fused into the original model weights, so the converted model preserves the behavior provided by LoRA. This approach is suitable when you need a model with the adapter's capabilities enabled by default, without requiring configuration at inference time (e.g., changing the weight coefficient for the adapter). For example, we can use this method to speed up the generation process by integrating LCM LoRA. Previously, we already considered this approach in this tutorial.
Using optimum-cli for exporting models requires providing a model id on the HuggingFace Hub or a local directory with the saved model. If the model is stored in multiple separate repositories or directories (e.g. you want to replace the VAE component or add LoRA), it should be merged and saved on disk before export. To avoid this, we will use the export_from_model function, which accepts an initialized model. Additionally, to use the model with OpenVINO GenAI, we need to export the tokenizers to OpenVINO format using the OpenVINO Tokenizers library.
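For reference, when a model lives in a single repository and no adapter fusion is needed, the export could be run directly from a notebook cell. The command below is a sketch of that path (the output directory name sdxl-ov is an arbitrary placeholder); it does not perform the LoRA fusion used in this notebook:
!optimum-cli export openvino --model stabilityai/stable-diffusion-xl-base-1.0 --task text-to-image --weight-format fp16 sdxl-ov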
In this tutorial we will use the Stable Diffusion XL model, but the same steps are also applicable to other models of the Stable Diffusion family.
from pathlib import Path
from diffusers import DiffusionPipeline, AutoencoderKL, LCMScheduler
from optimum.exporters.openvino import export_from_model
from optimum.intel.openvino import OVConfig
from optimum.exporters.openvino.convert import export_tokenizer
import gc
model_dir = Path("sdxl-lcm")
if not model_dir.exists():
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
adapter_id = "latent-consistency/lcm-lora-sdxl"
vae_id = "madebyollin/sdxl-vae-fp16-fix"
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix")
pipe = DiffusionPipeline.from_pretrained(model_id, vae=vae, variant="fp16", use_safetensors=True)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights(adapter_id)
pipe.fuse_lora()
export_from_model(pipe, model_dir, task="text-to-image", stateful=False, ov_config=OVConfig(dtype="fp16"))
for tokenizer in ["tokenizer", "tokenizer_2"]:
tokenizer_model = getattr(pipe, tokenizer, None)
if tokenizer_model is not None:
export_tokenizer(tokenizer_model, model_dir / tokenizer, task="text-to-image")
del vae
del pipe
gc.collect()
Image Generation using OpenVINO GenAI#
OpenVINO™ GenAI is a library of the most popular generative AI model pipelines, optimized execution methods, and samples that run on top of the highly performant OpenVINO Runtime.
The library is friendly to PC and laptop execution and optimized for resource consumption. It requires no external dependencies to run generative models, as it already includes all the core functionality.
The openvino_genai.Text2ImagePipeline class supports inference of Diffusers models. For pipeline initialization, we should provide the directory with the pipeline converted by Optimum Intel and specify the inference device. Optionally, we can provide a configuration for LoRA adapters using adapter_config. To start the generation process, the generate method should be used. At a minimum, it requires an input text prompt for image generation. You can provide additional arguments such as a negative prompt, number of steps, guidance scale, and image width and height to control the generation process.
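As a minimal sketch of this API (assuming the sdxl-lcm directory exported above; the prompt and argument values are illustrative), pipeline creation and a generate call look like this:
import openvino_genai as ov_genai
from PIL import Image

pipe = ov_genai.Text2ImagePipeline("sdxl-lcm", "CPU")  # converted pipeline directory, inference device
image_tensor = pipe.generate(
    "a watercolor mountain landscape",  # illustrative prompt
    num_inference_steps=4,  # LCM LoRA fused at export enables few-step generation
    guidance_scale=0,  # classifier-free guidance is not used with LCM
    width=512,
    height=512,
)
image = Image.fromarray(image_tensor.data[0])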
Integrating LoRA into the pipeline#
Similarly to the Diffusers pipeline, you can store LoRA adapters separately and load them into the base pipeline before inference using OpenVINO GenAI. openvino_genai.AdapterConfig serves for adapter management in openvino_genai.Text2ImagePipeline. It can be used for adding and removing adapters or changing their weight coefficients for blending into the pipeline. You can add one or multiple adapters to the config and also specify alpha blending coefficients for their addition. OpenVINO GenAI supports LoRA adapters saved in Safetensors format. You can use one of the publicly available pretrained adapters from CivitAI or the HuggingFace Hub, or train your own.

> Important Note: Before loading pretrained adapters, please make sure that they are compatible with your base model architecture. E.g., if you use an SDXL model, you need to provide adapters trained for this model type; loading an adapter trained, for example, for FLUX is not allowed.
Generally, the process of adapter configuration consists of two steps:

1. Register adapters in the pipeline constructor. It is recommended to provide all adapters that you plan to use at this stage.
2. Choose which adapter (or combination of adapters) to apply in each generate call. You are not obligated to use all of the adapters provided in the constructor simultaneously; you can select one or a combination of several of them for each generation cycle. Both steps are condensed in the sketch below.
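A condensed, hedged sketch of both steps (the adapter file names here are placeholders; the notebook builds its real configuration in the following cells):
import openvino_genai as ov_genai

# step 1: register every adapter you may want to use when constructing the pipeline
style_a = ov_genai.Adapter("style_a.safetensors")  # placeholder adapter files
style_b = ov_genai.Adapter("style_b.safetensors")
config = ov_genai.AdapterConfig()
config.add(style_a, 0.0)
config.add(style_b, 0.0)
pipe = ov_genai.Text2ImagePipeline("sdxl-lcm", "CPU", adapters=config)

# step 2: select a subset of the registered adapters (with new alpha values) per generate call
only_a = ov_genai.AdapterConfig()
only_a.add(style_a, 0.8)
image_tensor = pipe.generate("a cute cat", adapters=only_a)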
Prepare LoRA Adapters#
from lora_config import LORA
# uncomment this line to see predefined LoRA adapters configuration used in this notebook
# LORA
from huggingface_hub import hf_hub_download
lora_dir = Path("lora")
adapter_paths = []
for lora in LORA:
    lora_model_dir = lora_dir / lora["name"].lower().replace(" ", "_")
    file_name = lora["file_name"]
    if not (lora_model_dir / file_name).exists():
        hf_hub_download(repo_id=lora["model_id"], filename=file_name, local_dir=lora_model_dir)
    adapter_paths.append(lora_model_dir / file_name)
import openvino_genai as ov_genai
def prepare_adapter_config(scales=None):
    # by default, distribute the blending weight equally across all adapters
    if scales is None:
        scales = [1 / len(adapter_paths)] * len(adapter_paths)
    # a single float is broadcast to every adapter
    if isinstance(scales, float):
        scales = [scales] * len(adapter_paths)
    adapter_config = ov_genai.AdapterConfig()
    for adapter, scale in zip(adapter_paths, scales):
        adapter_config.add(ov_genai.Adapter(adapter), scale)
    return adapter_config


# register all adapters with zero weight; a specific subset is enabled per generate call
adapters_config = prepare_adapter_config(0.0)
adapters = adapters_config.get_adapters()
Create Inference Pipeline#
The diffusion process involves randomness when preparing the initial state for denoising. For reproducibility of generation results, we will use the Generator class.
from notebook_utils import device_widget
device = device_widget(default="CPU", exclude=["NPU"])
device
Dropdown(description='Device:', options=('CPU', 'AUTO'), value='CPU')
import openvino as ov
import torch
class Generator(ov_genai.Generator):
    def __init__(self, seed):
        ov_genai.Generator.__init__(self)
        self.generator = torch.Generator(device="cpu").manual_seed(seed)

    def next(self):
        # draw a single random value from the seeded torch generator
        return torch.randn(1, generator=self.generator, dtype=torch.float32).item()

    def randn_tensor(self, shape: ov.Shape):
        # draw a random tensor of the requested shape and wrap it as an OpenVINO tensor
        torch_tensor = torch.randn(list(shape), generator=self.generator, dtype=torch.float32)
        return ov.Tensor(torch_tensor.numpy())
pipe = ov_genai.Text2ImagePipeline(model_dir, device.value, adapters=adapters_config)
Select a specific adapter during generation#
As mentioned before, it is not necessary to use all adapters specified at the initialization stage for generation at the same time. Providing the adapters argument with an openvino_genai.AdapterConfig to generate allows you to select one or several of them. For example, let's select the LoRA for generating images in X-Ray style.
subject = "a cute cat in sunglasses"
prompt_template = LORA[0].get("prompt", "<subject>")
adapter_weight = LORA[0].get("weight", 1.0)
prompt = prompt_template.replace("<subject>", subject)
adapter_config = ov_genai.AdapterConfig()
adapter_config.add(adapters[0], adapter_weight)
image_tensor = pipe.generate(prompt, num_inference_steps=4, guidance_scale=0, adapters=adapter_config, generator=Generator(421235))
from PIL import Image
image = Image.fromarray(image_tensor.data[0])
image
Use multiple adapters simultaneously#
You can also use a combination of adapters that will be applied at the same time. Let's see what happens when traditional Japanese art meets a modern pointillistic illustration style.
prompt_template1 = LORA[1].get("prompt", "<subject>")
prompt_template2 = LORA[2].get("prompt", "<subject>")
adapter1_weight = LORA[1].get("weight", 1.0)
adapter2_weight = LORA[2].get("weight", 1.0)
prompt = prompt_template2.replace("<subject>", prompt_template1.replace("<subject>", subject))
adapter_config = ov_genai.AdapterConfig()
adapter_config.add(adapters[1], adapter1_weight)
adapter_config.add(adapters[2], adapter2_weight)
image_tensor = pipe.generate(prompt, num_inference_steps=4, guidance_scale=0, adapters=adapter_config, generator=Generator(421235))
image = Image.fromarray(image_tensor.data[0])
image
Disable adapters#
You can disable adapters by providing an empty AdapterConfig to generate.
image_tensor = pipe.generate(subject, num_inference_steps=4, guidance_scale=0, adapters=ov_genai.AdapterConfig(), generator=Generator(421235))
image = Image.fromarray(image_tensor.data[0])
image
Interactive demo#
gradio_helper_path = Path("gradio_helper.py")
if not gradio_helper_path.exists():
    r = requests.get(
        url="https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/notebooks/multilora-image-generation/gradio_helper.py",
    )
    gradio_helper_path.open("w").write(r.text)
from gradio_helper import make_demo
demo = make_demo(pipe, Generator, adapters, LORA)
try:
    demo.launch(debug=False)
except Exception:
    demo.launch(share=True, debug=False)
# if you are launching remotely, specify server_name and server_port
# demo.launch(server_name='your server name', server_port='server port in int')
# Read more in the docs: https://gradio.app/docs/