Text to Image pipeline and OpenVINO with Generate API#
This Jupyter notebook can be launched after a local installation only.
OpenVINO™ GenAI is a library of the most popular Generative AI model pipelines, optimized execution methods, and samples that run on top of highly performant OpenVINO Runtime.
This library is friendly to PC and laptop execution, and optimized for resource consumption. It requires no external dependencies to run generative models as it already includes all the core functionality (e.g. tokenization via openvino-tokenizers).
In this notebook we will demonstrate how to use text-to-image models such as Stable Diffusion 1.5, Stable Diffusion 2.1, and LCM, using Dreamlike Anime 1.0 as an example. All it takes is two steps:
1. Export the model to OpenVINO IR format using the Hugging Face Optimum library accelerated by OpenVINO integration. The Hugging Face Optimum Intel API is a high-level API that enables us to convert and quantize models from the Hugging Face Transformers library to the OpenVINO™ IR format. For more details, refer to the Hugging Face Optimum Intel documentation.
2. Run inference using the Text to Image pipeline from OpenVINO GenAI.
Installation Instructions#
This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start. For details, please refer to Installation Guide.
Prerequisites#
import platform
import requests
%pip install -q "git+https://github.com/huggingface/optimum-intel.git"
%pip install -q -U "openvino>=2024.5" "openvino-tokenizers>=2024.5" "openvino-genai>=2024.5"
%pip install -q Pillow "diffusers>=0.30.3" "gradio>=4.19" "typing_extensions>=4.9"
if platform.system() == "Darwin":
    %pip install -q "numpy<2.0.0"
r = requests.get(
    url="https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/notebook_utils.py",
)
open("notebook_utils.py", "w").write(r.text)

r = requests.get(
    url="https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/cmd_helper.py",
)
open("cmd_helper.py", "w").write(r.text)
Convert model using Optimum-CLI tool#
Optimum Intel is the interface between the Transformers and Diffusers libraries and OpenVINO to accelerate end-to-end pipelines on Intel architectures. It provides an easy-to-use CLI interface for exporting models to the OpenVINO Intermediate Representation (IR) format.
The command below demonstrates basic model export with optimum-cli:
optimum-cli export openvino --model <model_id_or_path> --task <task> <out_dir>
where the --model argument is a model id from the HuggingFace Hub or a local directory with the model (saved using the .save_pretrained method), and --task is one of the supported tasks that the exported model should solve. For image generation models, text-to-image should be used. If model initialization requires remote code, the --trust-remote-code flag should additionally be passed. You can also apply fp16, 8-bit or 4-bit weight compression on the Linear, Convolutional and Embedding layers when exporting your model with the CLI by setting --weight-format to fp16, int8 or int4, respectively. This type of optimization reduces the memory footprint and inference latency.
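For illustration, a full export command for the model used in this notebook with fp16 weight compression might look like the following (the output directory name is arbitrary):
optimum-cli export openvino --model dreamlike-art/dreamlike-anime-1.0 --task text-to-image --weight-format fp16 dreamlike_anime_1_0_ov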
We will use the optimum_cli function from our helper cmd_helper.py, which is a wrapper over this CLI command.
from pathlib import Path
from cmd_helper import optimum_cli
model_dir = Path("dreamlike_anime_1_0_ov")
if not model_dir.exists():
    optimum_cli("dreamlike-art/dreamlike-anime-1.0", model_dir)
Run inference on OpenVINO model with Text2ImagePipeline#
Select the device from the dropdown list for running inference using OpenVINO.
from notebook_utils import device_widget
device = device_widget("CPU", exclude=["NPU"])
device
Dropdown(description='Device:', options=('CPU', 'AUTO'), value='CPU')
Now just provide model_dir and the chosen inference device to openvino_genai.Text2ImagePipeline and call the generate method for inference. The openvino_genai.Generator class wraps the std::mt19937 pseudo-random generator and can be used for reproducibility of results. That's it :)
import openvino_genai as ov_genai
import openvino as ov
from PIL import Image
import torch
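# Custom random generator: subclasses openvino_genai.Generator and delegates to a seeded torch.Generator for reproducible results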
class Generator(ov_genai.Generator):
    def __init__(self, seed):
        ov_genai.Generator.__init__(self)
        self.generator = torch.Generator(device="cpu").manual_seed(seed)

    def next(self):
        return torch.randn(1, generator=self.generator, dtype=torch.float32).item()

    def randn_tensor(self, shape: ov.Shape):
        torch_tensor = torch.randn(list(shape), generator=self.generator, dtype=torch.float32)
        return ov.Tensor(torch_tensor.numpy())
random_generator = Generator(42) # openvino_genai.CppStdGenerator can be used to have same images as C++ sample
pipe = ov_genai.Text2ImagePipeline(model_dir, device.value)
prompt = "anime, masterpiece, high quality, a green snowman with a happy smiling face in the snows"
image_tensor = pipe.generate(prompt, width=512, height=512, num_inference_steps=20, num_images_per_prompt=1, generator=random_generator)
image = Image.fromarray(image_tensor.data[0])
image
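If you want to keep the generated result, the PIL image can also be saved to disk, for example (the file name below is arbitrary):
image.save("generated_snowman.png")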
Run inference on OpenVINO model with Text2ImagePipeline and optional LoRA adapters#
LoRA adapters can be connected to the pipeline to modify generated images toward a certain style, level of detail, or quality. Adapters are supported in Safetensors format and can be downloaded from public sources like Civitai or HuggingFace, or trained by the user. Only adapters compatible with the base model should be used. A weighted blend of multiple adapters can be applied by specifying several adapter files with corresponding alpha parameters on the command line. Check the lora.cpp source code to learn how to enable adapters and specify them in each generate call. A minimal sketch of blending two adapters in Python is shown after the single-adapter example below.
Here is an example of how to run the sample with a single adapter. First, download the adapter file from the https://civitai.com/models/67927/soulcard page manually and save it as soulcard.safetensors, or download it with the code below:
r = requests.get(
url="https://civitai.com/api/download/models/72591",
)
with open("soulcard.safetensors", "wb") as file:
file.write(r.content)
def prepare_adapter_config(adapters):
    adapter_config = ov_genai.AdapterConfig()
    # Multiple LoRA adapters applied simultaneously are supported, parse them all and corresponding alphas from cmd parameters:
    for i in range(int(len(adapters) / 2)):
        adapter = ov_genai.Adapter(adapters[2 * i])
        alpha = float(adapters[2 * i + 1])
        adapter_config.add(adapter, alpha)
    return adapter_config
adapter_config = prepare_adapter_config(["soulcard.safetensors", 0.5])
pipe = ov_genai.Text2ImagePipeline(model_dir, device.value, adapters=adapter_config)
image_tensor = pipe.generate(prompt, generator=Generator(42), width=512, height=512, num_inference_steps=20)
image = Image.fromarray(image_tensor.data[0])
image
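As mentioned above, several adapters can be blended with individual alpha weights. The snippet below is only a sketch using the same AdapterConfig API as above; the two adapter file names are hypothetical placeholders for adapters you have downloaded and that are compatible with the base model.
adapter_config = ov_genai.AdapterConfig()
adapter_config.add(ov_genai.Adapter("style_adapter.safetensors"), 0.6)  # hypothetical adapter file
adapter_config.add(ov_genai.Adapter("detail_adapter.safetensors"), 0.4)  # hypothetical adapter file
pipe = ov_genai.Text2ImagePipeline(model_dir, device.value, adapters=adapter_config)
image_tensor = pipe.generate(prompt, generator=Generator(42), width=512, height=512, num_inference_steps=20)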
You can find a more detailed tutorial on running inference with multiple LoRA adapters in this notebook.
Interactive demo#
from gradio_helper import make_demo
demo = make_demo(pipe, Generator, adapter_config)
try:
    demo.launch(debug=True)
except Exception:
    demo.launch(share=True, debug=True)
# if you are launching remotely, specify server_name and server_port
# demo.launch(server_name='your server name', server_port='server port in int')
# Read more in the docs: https://gradio.app/docs/