Style Transfer with OpenVINO™

This tutorial is also available as a Jupyter notebook that can be cloned directly from GitHub. See the installation guide for instructions to run this tutorial locally on Windows, Linux or macOS.

Github

This notebook demonstrates style transfer with OpenVINO, using the Style Transfer Models from ONNX Model Repository. Specifically, Fast Neural Style Transfer model, which is designed to mix the content of an image with the style of another image.

style transfer

style transfer

This notebook uses five pre-trained models, for the following styles: Mosaic, Rain Princess, Candy, Udnie and Pointilism. The models are from ONNX Model Repository and are based on the research paper Perceptual Losses for Real-Time Style Transfer and Super-Resolution along with Instance Normalization. Final part of this notebook shows live inference results from a webcam. Additionally, you can also upload a video file.

NOTE: If you have a webcam on your computer, you can see live results streaming in the notebook. If you run the notebook on a server, the webcam will not work but you can run inference, using a video file.

Imports

import collections
import sys
import time

import cv2
import numpy as np
from pathlib import Path
from IPython import display
from ipywidgets import interactive, ToggleButtons
from openvino.runtime import Core

sys.path.append("../utils")
import notebook_utils as utils

Select one of the styles below: Mosaic, Rain Princess, Candy, Udnie, and Pointilism to do the style transfer.

# Option to select different styles
styleButtons = ToggleButtons(
    options=['MOSAIC', 'RAIN-PRINCESS', 'CANDY', 'UDNIE', 'POINTILISM'],
    description="Click one of the styles you want to use for the style transfer",
    disabled=False,
    style={'description_width': '300px'})

interactive(lambda option: print(option), option=styleButtons)
interactive(children=(ToggleButtons(description='Click one of the styles you want to use for the style transfe…

The Model

Download the Model

The style transfer model, selected in the previous step, will be downloaded to model_path if you have not already downloaded it. The models are provided by the ONNX Model Zoo in .onnx format, which means it could be used with OpenVINO directly. However, this notebook will also show how you can use the Model Optimizer to convert ONNX to OpenVINO Intermediate Representation (IR) with FP16 precision.

# Directory to download the model from ONNX model zoo
base_model_dir = "model"
base_url = "https://github.com/onnx/models/raw/main/vision/style_transfer/fast_neural_style/model"

# Selected ONNX model will be downloaded in the path
model_path = Path(f"{styleButtons.value.lower()}-9.onnx")

style_url = f"{base_url}/{model_path}"
utils.download_file(style_url, directory=base_model_dir)
model/mosaic-9.onnx:   0%|          | 0.00/6.42M [00:00<?, ?B/s]
PosixPath('/opt/home/k8sworker/cibuilds/ov-notebook/OVNotebookOps-408/.workspace/scm/ov-notebook/notebooks/404-style-transfer-webcam/model/mosaic-9.onnx')

Convert ONNX Model to OpenVINO IR Format

In the next step, you will convert the ONNX model to OpenVINO IR format with FP16 precision. While ONNX models are directly supported by OpenVINO runtime, it can be useful to convert them to IR format to take advantage of OpenVINO optimization tools and features. The mo.convert_model python function can be used for converting model, using OpenVINO Model Optimizer. The converted model is saved to the model directory. The function returns instance of OpenVINO Model class, which is ready to use in Python interface but can also be serialized to OpenVINO IR format for future execution. If the model has been already converted, you can skip this step.

# Construct the command for Model Optimizer.
from openvino.runtime import serialize
from openvino.tools import mo

ov_model = mo.convert_model(f"model/{styleButtons.value.lower()}-9.onnx", compress_to_fp16=True)
serialize(ov_model, f"model/{styleButtons.value.lower()}-9.xml")
# Converted IR model path
ir_path = Path(f"model/{styleButtons.value.lower()}-9.xml")
onnx_path = Path(f"model/{model_path}")

Load the Model

Both the ONNX model(s) and converted IR model(s) are stored in the model directory.

Only a few lines of code are required to run the model. First, initialize OpenVINO Runtime. Then, read the network architecture and model weights from the .bin and .xml files to compile for the desired device. If you select GPU you may need to wait briefly for it to load, as the startup time is somewhat longer than CPU.

To let OpenVINO automatically select the best device for inference just use AUTO. In most cases, the best device to use is GPU (better performance, but slightly longer startup time).

OpenVINO Runtime can load ONNX models from ONNX Model Repository directly. In such cases, use ONNX path instead of IR model to load the model. It is recommended to load the OpenVINO Intermediate Representation (IR) model for the best results.

# Initialize OpenVINO Runtime.
ie_core = Core()

# Read the network and corresponding weights from ONNX Model.
# model = ie_core.read_model(model=onnx_path)

# Read the network and corresponding weights from IR Model.
model = ie_core.read_model(model=ir_path)

# Compile the model for CPU (or change to GPU, MYRIAD etc. for other devices)
# or let OpenVINO select the best available device with AUTO.
compiled_model = ie_core.compile_model(model=model, device_name="AUTO")

# Get the input and output nodes.
input_layer = compiled_model.input(0)
output_layer = compiled_model.output(0)

Input and output layers have the names of the input node and output node respectively. For fast-neural-style-mosaic-onnx, there is 1 input and 1 output with the (1, 3, 224, 224) shape.

print(input_layer.any_name, output_layer.any_name)
print(input_layer.shape)
print(output_layer.shape)

# Get the input size.
N, C, H, W = list(input_layer.shape)
input1 output1
[1,3,224,224]
[1,3,224,224]

Preprocess the image

Preprocess the input image before running the model. Prepare the dimensions and channel order for the image to match the original image with the input tensor

  1. Preprocess a frame to convert from unit8 to float32.

  2. Transpose the array to match with the network input size

# Preprocess the input image.
def preprocess_images(frame, H, W):
    """
    Preprocess input image to align with network size

    Parameters:
        :param frame:  input frame
        :param H:  height of the frame to style transfer model
        :param W:  width of the frame to style transfer model
        :returns: resized and transposed frame
    """
    image = np.array(frame).astype('float32')
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
    image = cv2.resize(src=image, dsize=(H, W), interpolation=cv2.INTER_AREA)
    image = np.transpose(image, [2, 0, 1])
    image = np.expand_dims(image, axis=0)
    return image

Helper function to postprocess the stylized image

The converted IR model outputs a NumPy float32 array of the (1, 3, 224, 224) shape .

# Postprocess the result
def convert_result_to_image(frame, stylized_image) -> np.ndarray:
    """
    Postprocess stylized image for visualization

    Parameters:
        :param frame:  input frame
        :param stylized_image:  stylized image with specific style applied
        :returns: resized stylized image for visualization
    """
    h, w = frame.shape[:2]
    stylized_image = stylized_image.squeeze().transpose(1, 2, 0)
    stylized_image = cv2.resize(src=stylized_image, dsize=(w, h), interpolation=cv2.INTER_CUBIC)
    stylized_image = np.clip(stylized_image, 0, 255).astype(np.uint8)
    stylized_image = cv2.cvtColor(stylized_image, cv2.COLOR_BGR2RGB)
    return stylized_image

Main Processing Function

The style transfer function can be run in different operating modes, either using a webcam or a video file.

def run_style_transfer(source=0, flip=False, use_popup=False, skip_first_frames=0):
    """
    Main function to run the style inference:
    1. Create a video player to play with target fps (utils.VideoPlayer).
    2. Prepare a set of frames for style transfer.
    3. Run AI inference for style transfer.
    4. Visualize the results.
    Parameters:
        source: The webcam number to feed the video stream with primary webcam set to "0", or the video path.
        flip: To be used by VideoPlayer function for flipping capture image.
        use_popup: False for showing encoded frames over this notebook, True for creating a popup window.
        skip_first_frames: Number of frames to skip at the beginning of the video.
    """
    # Create a video player to play with target fps.
    player = None
    try:
        player = utils.VideoPlayer(source=source, flip=flip, fps=30, skip_first_frames=skip_first_frames)
        # Start video capturing.
        player.start()
        if use_popup:
            title = "Press ESC to Exit"
            cv2.namedWindow(winname=title, flags=cv2.WINDOW_GUI_NORMAL | cv2.WINDOW_AUTOSIZE)

        processing_times = collections.deque()
        while True:
            # Grab the frame.
            frame = player.next()
            if frame is None:
                print("Source ended")
                break
            # If the frame is larger than full HD, reduce size to improve the performance.
            scale = 720 / max(frame.shape)
            if scale < 1:
                frame = cv2.resize(src=frame, dsize=None, fx=scale, fy=scale,
                                   interpolation=cv2.INTER_AREA)
            # Preprocess the input image.

            image = preprocess_images(frame, H, W)

            # Measure processing time for the input image.
            start_time = time.time()
            # Perform the inference step.
            stylized_image = compiled_model([image])[output_layer]
            stop_time = time.time()

            # Postprocessing for stylized image.
            result_image = convert_result_to_image(frame, stylized_image)

            processing_times.append(stop_time - start_time)
            # Use processing times from last 200 frames.
            if len(processing_times) > 200:
                processing_times.popleft()
            processing_time_det = np.mean(processing_times) * 1000

            # Visualize the results.
            f_height, f_width = frame.shape[:2]
            fps = 1000 / processing_time_det
            cv2.putText(result_image, text=f"Inference time: {processing_time_det:.1f}ms ({fps:.1f} FPS)",
                        org=(20, 40),fontFace=cv2.FONT_HERSHEY_COMPLEX, fontScale=f_width / 1000,
                        color=(0, 0, 255), thickness=1, lineType=cv2.LINE_AA)

            # Use this workaround if there is flickering.
            if use_popup:
                cv2.imshow(title, result_image)
                key = cv2.waitKey(1)
                # escape = 27
                if key == 27:
                    break
            else:
                # Encode numpy array to jpg.
                _, encoded_img = cv2.imencode(".jpg", result_image, params=[cv2.IMWRITE_JPEG_QUALITY, 90])
                # Create an IPython image.
                i = display.Image(data=encoded_img)
                # Display the image in this notebook.
                display.clear_output(wait=True)
                display.display(i)
    # ctrl-c
    except KeyboardInterrupt:
        print("Interrupted")
    # any different error
    except RuntimeError as e:
        print(e)
    finally:
        if player is not None:
            # Stop capturing.
            player.stop()
        if use_popup:
            cv2.destroyAllWindows()

Run Style Transfer Using a Webcam

Now, try to apply the style transfer model using video from your webcam. By default, the primary webcam is set with source=0. If you have multiple webcams, each one will be assigned a consecutive number starting at 0. Set flip=True when using a front-facing camera. Some web browsers, especially Mozilla Firefox, may cause flickering. If you experience flickering, set use_popup=True.

NOTE: To use a webcam, you must run this Jupyter notebook on a computer with a webcam. If you run it on a server, you will not be able to access the webcam. However, you can still perform inference on a video file in the final step.

run_style_transfer(source=0, flip=True, use_popup=False)
Cannot open camera 0
[ WARN:0@5.213] global cap_v4l.cpp:982 open VIDEOIO(V4L2:/dev/video0): can't open camera by index
[ERROR:0@5.214] global obsensor_uvc_stream_channel.cpp:156 getStreamChannelGroup Camera index out of range

Run Style Transfer on a Video File

You can find out how the model works with a video file. For that, use any formats supported by OpenCV. You can press the stop button to terminate anytime while the video file is running.

NOTE: Sometimes, the video will be cut off when frames are corrupted. If this happens, or you experience any other problems with your video, use the HandBrake encoder tool to create a video file in MPEG format.

video_file = "../data/video/Coco Walking in Berkeley.mp4"
run_style_transfer(source=video_file, flip=False, use_popup=False)
../_images/404-style-transfer-with-output_23_0.png
Source ended