Style Transfer with OpenVINO™¶
This Jupyter notebook can be launched on-line, opening an interactive environment in a browser window. You can also make a local installation. Choose one of the following options:
This notebook demonstrates style transfer with OpenVINO, using the Style Transfer Models from ONNX Model Repository. Specifically, Fast Neural Style Transfer model, which is designed to mix the content of an image with the style of another image.
This notebook uses five pre-trained models, for the following styles: Mosaic, Rain Princess, Candy, Udnie and Pointilism. The models are from ONNX Model Repository and are based on the research paper Perceptual Losses for Real-Time Style Transfer and Super-Resolution along with Instance Normalization. Final part of this notebook shows live inference results from a webcam. Additionally, you can also upload a video file.
NOTE: If you have a webcam on your computer, you can see live results streaming in the notebook. If you run the notebook on a server, the webcam will not work but you can run inference, using a video file.
Table of contents:¶
Preparation¶
Install requirements¶
%pip install -q "openvino>=2023.1.0"
%pip install -q opencv-python requests tqdm
# Fetch `notebook_utils` module
import urllib.request
urllib.request.urlretrieve(
url='https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/main/notebooks/utils/notebook_utils.py',
filename='notebook_utils.py'
)
Imports¶
import collections
import time
import cv2
import numpy as np
from pathlib import Path
from IPython import display
from ipywidgets import interactive, ToggleButtons
import openvino as ov
import notebook_utils as utils
Select one of the styles below: Mosaic, Rain Princess, Candy, Udnie, and Pointilism to do the style transfer.
# Option to select different styles
styleButtons = ToggleButtons(
options=['MOSAIC', 'RAIN-PRINCESS', 'CANDY', 'UDNIE', 'POINTILISM'],
description="Click one of the styles you want to use for the style transfer",
disabled=False,
style={'description_width': '300px'})
interactive(lambda option: print(option), option=styleButtons)
The Model¶
Download the Model¶
The style transfer model, selected in the previous step, will be
downloaded to model_path
if you have not already downloaded it. The
models are provided by the ONNX Model Zoo in .onnx
format, which
means it could be used with OpenVINO directly. However, this notebook
will also show how you can use the Conversion API to convert ONNX to
OpenVINO Intermediate Representation (IR) with FP16
precision.
# Directory to download the model from ONNX model zoo
base_model_dir = "model"
base_url = "https://github.com/onnx/models/raw/69d69010b7ed6ba9438c392943d2715026792d40/archive/vision/style_transfer/fast_neural_style/model"
# Selected ONNX model will be downloaded in the path
model_path = Path(f"{styleButtons.value.lower()}-9.onnx")
style_url = f"{base_url}/{model_path}"
utils.download_file(style_url, directory=base_model_dir)
Convert ONNX Model to OpenVINO IR Format¶
In the next step, you will convert the ONNX model to OpenVINO IR format
with FP16
precision. While ONNX models are directly supported by
OpenVINO runtime, it can be useful to convert them to IR format to take
advantage of OpenVINO optimization tools and features. The
ov.convert_model
Python function of model conversion API can be
used. The converted model is saved to the model directory. The function
returns instance of OpenVINO Model class, which is ready to use in
Python interface but can also be serialized to OpenVINO IR format for
future execution. If the model has been already converted, you can skip
this step.
# Construct the command for model conversion API.
ov_model = ov.convert_model(f"model/{styleButtons.value.lower()}-9.onnx")
ov.save_model(ov_model, f"model/{styleButtons.value.lower()}-9.xml")
# Converted IR model path
ir_path = Path(f"model/{styleButtons.value.lower()}-9.xml")
onnx_path = Path(f"model/{model_path}")
Load the Model¶
Both the ONNX model(s) and converted IR model(s) are stored in the
model
directory.
Only a few lines of code are required to run the model. First,
initialize OpenVINO Runtime. Then, read the network architecture and
model weights from the .bin
and .xml
files to compile for the
desired device. If you select GPU
you may need to wait briefly for
it to load, as the startup time is somewhat longer than CPU
.
To let OpenVINO automatically select the best device for inference just
use AUTO
. In most cases, the best device to use is GPU
(better
performance, but slightly longer startup time). You can select one from
available devices using dropdown list below.
OpenVINO Runtime can load ONNX models from ONNX Model Repository directly. In such cases, use ONNX path instead of IR model to load the model. It is recommended to load the OpenVINO Intermediate Representation (IR) model for the best results.
# Initialize OpenVINO Runtime.
core = ov.Core()
# Read the network and corresponding weights from ONNX Model.
# model = ie_core.read_model(model=onnx_path)
# Read the network and corresponding weights from IR Model.
model = core.read_model(model=ir_path)
import ipywidgets as widgets
device = widgets.Dropdown(
options=core.available_devices + ["AUTO"],
value='AUTO',
description='Device:',
disabled=False,
)
# Compile the model for CPU (or change to GPU, etc. for other devices)
# or let OpenVINO select the best available device with AUTO.
device
compiled_model = core.compile_model(model=model, device_name=device.value)
# Get the input and output nodes.
input_layer = compiled_model.input(0)
output_layer = compiled_model.output(0)
Input and output layers have the names of the input node and output node
respectively. For fast-neural-style-mosaic-onnx, there is 1 input and
1 output with the (1, 3, 224, 224)
shape.
print(input_layer.any_name, output_layer.any_name)
print(input_layer.shape)
print(output_layer.shape)
# Get the input size.
N, C, H, W = list(input_layer.shape)
Preprocess the image¶
Preprocess the input image before running the model. Prepare the dimensions and channel order for the image to match the original image with the input tensor
Preprocess a frame to convert from
unit8
tofloat32
.Transpose the array to match with the network input size
# Preprocess the input image.
def preprocess_images(frame, H, W):
"""
Preprocess input image to align with network size
Parameters:
:param frame: input frame
:param H: height of the frame to style transfer model
:param W: width of the frame to style transfer model
:returns: resized and transposed frame
"""
image = np.array(frame).astype('float32')
image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
image = cv2.resize(src=image, dsize=(H, W), interpolation=cv2.INTER_AREA)
image = np.transpose(image, [2, 0, 1])
image = np.expand_dims(image, axis=0)
return image
Helper function to postprocess the stylized image¶
The converted IR model outputs a NumPy float32
array of the (1, 3,
224,
224)
shape .
# Postprocess the result
def convert_result_to_image(frame, stylized_image) -> np.ndarray:
"""
Postprocess stylized image for visualization
Parameters:
:param frame: input frame
:param stylized_image: stylized image with specific style applied
:returns: resized stylized image for visualization
"""
h, w = frame.shape[:2]
stylized_image = stylized_image.squeeze().transpose(1, 2, 0)
stylized_image = cv2.resize(src=stylized_image, dsize=(w, h), interpolation=cv2.INTER_CUBIC)
stylized_image = np.clip(stylized_image, 0, 255).astype(np.uint8)
stylized_image = cv2.cvtColor(stylized_image, cv2.COLOR_BGR2RGB)
return stylized_image
Main Processing Function¶
The style transfer function can be run in different operating modes, either using a webcam or a video file.
def run_style_transfer(source=0, flip=False, use_popup=False, skip_first_frames=0):
"""
Main function to run the style inference:
1. Create a video player to play with target fps (utils.VideoPlayer).
2. Prepare a set of frames for style transfer.
3. Run AI inference for style transfer.
4. Visualize the results.
Parameters:
source: The webcam number to feed the video stream with primary webcam set to "0", or the video path.
flip: To be used by VideoPlayer function for flipping capture image.
use_popup: False for showing encoded frames over this notebook, True for creating a popup window.
skip_first_frames: Number of frames to skip at the beginning of the video.
"""
# Create a video player to play with target fps.
player = None
try:
player = utils.VideoPlayer(source=source, flip=flip, fps=30, skip_first_frames=skip_first_frames)
# Start video capturing.
player.start()
if use_popup:
title = "Press ESC to Exit"
cv2.namedWindow(winname=title, flags=cv2.WINDOW_GUI_NORMAL | cv2.WINDOW_AUTOSIZE)
processing_times = collections.deque()
while True:
# Grab the frame.
frame = player.next()
if frame is None:
print("Source ended")
break
# If the frame is larger than full HD, reduce size to improve the performance.
scale = 720 / max(frame.shape)
if scale < 1:
frame = cv2.resize(src=frame, dsize=None, fx=scale, fy=scale,
interpolation=cv2.INTER_AREA)
# Preprocess the input image.
image = preprocess_images(frame, H, W)
# Measure processing time for the input image.
start_time = time.time()
# Perform the inference step.
stylized_image = compiled_model([image])[output_layer]
stop_time = time.time()
# Postprocessing for stylized image.
result_image = convert_result_to_image(frame, stylized_image)
processing_times.append(stop_time - start_time)
# Use processing times from last 200 frames.
if len(processing_times) > 200:
processing_times.popleft()
processing_time_det = np.mean(processing_times) * 1000
# Visualize the results.
f_height, f_width = frame.shape[:2]
fps = 1000 / processing_time_det
cv2.putText(result_image, text=f"Inference time: {processing_time_det:.1f}ms ({fps:.1f} FPS)",
org=(20, 40),fontFace=cv2.FONT_HERSHEY_COMPLEX, fontScale=f_width / 1000,
color=(0, 0, 255), thickness=1, lineType=cv2.LINE_AA)
# Use this workaround if there is flickering.
if use_popup:
cv2.imshow(title, result_image)
key = cv2.waitKey(1)
# escape = 27
if key == 27:
break
else:
# Encode numpy array to jpg.
_, encoded_img = cv2.imencode(".jpg", result_image, params=[cv2.IMWRITE_JPEG_QUALITY, 90])
# Create an IPython image.
i = display.Image(data=encoded_img)
# Display the image in this notebook.
display.clear_output(wait=True)
display.display(i)
# ctrl-c
except KeyboardInterrupt:
print("Interrupted")
# any different error
except RuntimeError as e:
print(e)
finally:
if player is not None:
# Stop capturing.
player.stop()
if use_popup:
cv2.destroyAllWindows()
Run Style Transfer¶
Now, try to apply the style transfer model using video from your webcam
or video file. By default, the primary webcam is set with source=0
.
If you have multiple webcams, each one will be assigned a consecutive
number starting at 0. Set flip=True
when using a front-facing
camera. Some web browsers, especially Mozilla Firefox, may cause
flickering. If you experience flickering, set use_popup=True
.
NOTE: To use a webcam, you must run this Jupyter notebook on a computer with a webcam. If you run it on a server, you will not be able to access the webcam. However, you can still perform inference on a video file in the final step.
If you do not have a webcam, you can still run this demo with a video file. Any format supported by OpenCV
USE_WEBCAM = False
cam_id = 0
video_file = "https://storage.openvinotoolkit.org/repositories/openvino_notebooks/data/data/video/Coco%20Walking%20in%20Berkeley.mp4"
source = cam_id if USE_WEBCAM else video_file
run_style_transfer(source=source, flip=isinstance(source, int), use_popup=False)