Frame interpolation is the process of synthesizing in-between images from a given set of images. The technique is often used for temporal up-sampling to increase the refresh rate of videos or to create slow-motion effects.
Nowadays, with digital cameras and smartphones, we often take several
photos within a few seconds to capture the best picture. Interpolating
between these “near-duplicate” photos can lead to engaging videos that
reveal scene motion, often delivering an even more pleasing sense of the
moment than the original photos.
In “FILM: Frame Interpolation for Large Motion”, published at ECCV 2022, the authors present a method to create high-quality slow-motion videos from near-duplicate photos. FILM is a new neural network architecture that achieves state-of-the-art results for large motion while also handling smaller motions well.
The FILM model takes two images as input and outputs a middle image. At inference time, the model is invoked recursively to output in-between images. FILM has three components:

1. Feature extractor that summarizes each input image with deep multi-scale (pyramid) features;
2. Bi-directional motion estimator that computes pixel-wise motion (i.e., flows) at each pyramid level;
3. Fusion module that outputs the final interpolated image.
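As a quick illustration of the model's interface, a single invocation takes the two frames plus a time value and returns a dictionary holding the interpolated image (a minimal sketch; film_model, frame1, and frame2 stand in for the loaded model and preprocessed frames created later in this tutorial):

import numpy as np

# One FILM inference step: synthesize the frame halfway between x0 and x1.
# `film_model`, `frame1`, and `frame2` are placeholders defined further below;
# the frames are float32 arrays normalized to [0, 1] with a batch dimension.
time = np.array([[0.5]], dtype=np.float32)  # 0.5 = temporal midpoint
result = film_model({"x0": frame1, "x1": frame2, "time": time})
mid_frame = result["image"]  # the synthesized in-between frame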
FILM is trained on regular video frame triplets, with the middle frame serving as the ground truth for supervision.
In this tutorial, we will use TensorFlow Hub as
a model source.
NOTE: To run this tutorial, your system must have a VP9 video encoder. Ubuntu ships with one preinstalled, but on Windows you should install it manually.
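If you are not sure whether an encoder is present, a quick check (a minimal sketch, assuming OpenCV is installed and used as the video-writing backend) is to try opening a VP9 VideoWriter:

import cv2

# Try to open a VP9-encoded WebM writer; isOpened() returns False if no encoder is found.
fourcc = cv2.VideoWriter_fourcc(*"VP90")
writer = cv2.VideoWriter("vp9_check.webm", fourcc, 30, (64, 64))
print("VP9 encoder available:", writer.isOpened())
writer.release()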
from pathlib import Path

import PIL.ImageFile

MODEL_PATH = Path("models/model.xml")
DATA_PATH = Path("data")
IMAGES = {
    "https://raw.githubusercontent.com/google-research/frame-interpolation/main/photos/one.png": Path("data/one.png"),
    "https://raw.githubusercontent.com/google-research/frame-interpolation/main/photos/two.png": Path("data/two.png"),
}
OUTPUT_VIDEO_PATH = DATA_PATH / "output.webm"
OV_OUTPUT_VIDEO_PATH = DATA_PATH / "ov_output.webm"
TIMES_TO_INTERPOLATE = 5

DATA_PATH.mkdir(parents=True, exist_ok=True)
PIL.ImageFile.LOAD_TRUNCATED_IMAGES = True  # allows Gradio to read PNG images with large metadata
The model is loaded using the tensorflow_hub.KerasLayer function. Then, we specify the shapes of the input tensors to cast the loaded object to the tf.keras.Model class.
The input tensors are:

- time - a value between 0 and 1 that says where the generated image should be; 0.5 is midway between the input images.
- x0 - the initial frame.
- x1 - the final frame.
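Putting this together, the loading step might look like the following sketch (it assumes the FILM release on TensorFlow Hub at https://tfhub.dev/google/film/1; the exact handle is an assumption and may differ in your environment):

import tensorflow as tf
import tensorflow_hub as hub

# Wrap the Hub model in a Keras model with dynamic spatial dimensions,
# so frames of arbitrary size can be interpolated.
inputs = {
    "x0": tf.keras.layers.Input(shape=(None, None, 3)),
    "x1": tf.keras.layers.Input(shape=(None, None, 3)),
    "time": tf.keras.layers.Input(shape=(1,)),
}
film_layer = hub.KerasLayer("https://tfhub.dev/google/film/1")(inputs)
film_model = tf.keras.Model(inputs=inputs, outputs=film_layer)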
The process takes the 2 original frames (first and last) as input and generates a midpoint frame. It then repeats itself for the pairs “first - midpoint” and “midpoint - last” to provide their midpoints, and so on. The recursion is executed times_to_interpolate times, generating 2^times_to_interpolate - 1 in-between images.
from typing import Generator, Optional

import numpy as np
from tqdm import tqdm


class Interpolator:
    def __init__(self, model):
        self._model = model

    def _recursive_generator(
        self,
        frame1: np.ndarray,
        frame2: np.ndarray,
        num_recursions: int,
        bar: Optional[tqdm] = None,
    ) -> Generator[np.ndarray, None, None]:
        """Splits halfway to repeatedly generate more frames.

        Args:
            frame1: Input image 1.
            frame2: Input image 2.
            num_recursions: How many times to interpolate the consecutive image pairs.
            bar: Optional progress bar, updated once per generated frame.

        Yields:
            The interpolated frames, including the first frame (frame1),
            but excluding the final frame2.
        """
        if num_recursions == 0:
            yield frame1
        else:
            time = np.array([[0.5]], dtype=np.float32)
            mid_frame = self._model({"x0": frame1, "x1": frame2, "time": time})["image"]
            if bar is not None:
                bar.update(1)
            yield from self._recursive_generator(frame1, mid_frame, num_recursions - 1, bar)
            yield from self._recursive_generator(mid_frame, frame2, num_recursions - 1, bar)

    def interpolate_recursively(
        self, frame1: np.ndarray, frame2: np.ndarray, times_to_interpolate: int
    ) -> Generator[np.ndarray, None, None]:
        """Generates interpolated frames by repeatedly interpolating the midpoint.

        Args:
            frame1: Input image 1.
            frame2: Input image 2.
            times_to_interpolate: Number of times to do recursive midpoint interpolation.

        Yields:
            The interpolated frames (including the inputs).
        """
        num_frames = 2 ** times_to_interpolate - 1
        bar = tqdm(total=num_frames)
        yield from self._recursive_generator(frame1, frame2, times_to_interpolate, bar)
        # Separately yield the final frame.
        yield frame2
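As a quick sanity check, the wrapper can be run directly on the two downloaded photos (a minimal sketch; load_image is a hypothetical helper, not defined in this tutorial, that reads a PNG into a float32 array in [0, 1] with a leading batch dimension):

# `load_image` is a hypothetical preprocessing helper.
interpolator = Interpolator(film_model)
frame1 = load_image(DATA_PATH / "one.png")
frame2 = load_image(DATA_PATH / "two.png")

frames = list(interpolator.interpolate_recursively(frame1, frame2, TIMES_TO_INTERPOLATE))
# Expect 2**TIMES_TO_INTERPOLATE + 1 frames, including both input frames.
print(f"Generated {len(frames)} frames")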
To convert a TensorFlow Keras Model to OpenVINO Intermediate
Representation (IR), call the openvino.convert_model() function and
pass the model as the only argument. You can then serialize the model
object to disk using the openvino.save_model() function.
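A minimal sketch of this step, assuming the film_model object from the loading step and the OpenVINO 2023.x Python API:

import openvino as ov

# Convert the Keras model to OpenVINO IR and serialize it to disk.
MODEL_PATH.parent.mkdir(parents=True, exist_ok=True)
ov_model = ov.convert_model(film_model)
ov.save_model(ov_model, MODEL_PATH)

# Compile the IR for inference on an automatically selected device.
core = ov.Core()
compiled_model = core.compile_model(MODEL_PATH, "AUTO")

The compiled model can then be wrapped in an Interpolator-style object (referred to as ov_interpolator below) and exposed through a small Gradio demo for interactive inference: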
from datetime import datetime

import gradio as gr

# `preprocess_np_frame`, `save_as_video`, and `ov_interpolator` are defined
# elsewhere in this tutorial.
def generate(frame1, frame2, times_to_interpolate, _=gr.Progress(track_tqdm=True)):
    x0, x1 = [preprocess_np_frame(frame) for frame in [frame1, frame2]]
    frames = ov_interpolator.interpolate_recursively(x0, x1, times_to_interpolate)
    height, width = frame1.shape[:2]
    filename = DATA_PATH / f"output_{datetime.now().isoformat()}.webm"
    save_as_video(frames, width, height, filename)
    return filename


demo = gr.Interface(
    generate,
    [
        gr.Image(label="First image"),
        gr.Image(label="Last image"),
        gr.Slider(
            1,
            8,
            step=1,
            label="Times to interpolate",
            info="""Controls the number of times the frame interpolator is invoked.
            The output will be the interpolation video with (2^value + 1) frames, fps of 30.""",
        ),
    ],
    gr.Video(),
    examples=[[*IMAGES.values(), 5]],
    allow_flagging="never",
)

try:
    demo.queue().launch(debug=False)
except Exception:
    demo.queue().launch(share=True, debug=False)
# If you are launching remotely, specify server_name and server_port
# demo.launch(server_name='your server name', server_port='server port in int')
# Read more in the docs: https://gradio.app/docs/