YOLOv8 Oriented Bounding Boxes Object Detection with OpenVINO™#
Oriented object detection goes a step further than standard object detection by
introducing an extra angle to locate objects more accurately in an image.
The output of an oriented object detector is a set of rotated bounding
boxes that exactly enclose the objects in the image, along with class
labels and confidence scores for each box. Object detection is a good
choice when you need to identify objects of interest in a scene, but
don’t need to know exactly where the object is or its exact shape.
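To make the rotated-box representation concrete, here is a small illustrative sketch (not part of the original notebook) that converts a box given as (center, size, angle) into its four corner points:
import numpy as np

# Illustrative sketch: a rotated box is described by (x_center, y_center,
# width, height, angle); its corners are the axis-aligned corners rotated
# by the angle around the box center.
def obb_corners(cx, cy, w, h, angle_rad):
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rotation = np.array([[c, -s], [s, c]])
    half_extents = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) / 2.0
    return half_extents @ rotation.T + np.array([cx, cy])

print(obb_corners(100.0, 50.0, 40.0, 20.0, np.deg2rad(30)))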
Generally, PyTorch models represent an instance of the
torch.nn.Module
class, initialized by a state dictionary with model weights. We will use
the YOLOv8 OBB large model (also known as yolov8l-obb)
pre-trained on the DOTAv1 dataset, which is available in this
repo. Similar steps are
also applicable to other YOLOv8 models.
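As a rough sketch of this step (the weights file name yolov8l-obb.pt and the attributes below follow the Ultralytics API and are assumptions for illustration, not code from the notebook):
import torch
from ultralytics import YOLO

# Sketch: load the DOTAv1-pretrained OBB model; the underlying network is a
# regular torch.nn.Module whose weights come from the downloaded state dict.
model = YOLO("yolov8l-obb.pt")                   # downloads the weights on first use
print(isinstance(model.model, torch.nn.Module))  # True
print(model.task)                                # "obb"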
YOLOv8-obb is pre-trained on the DOTA dataset. Ultralytics also
provides the DOTA8 dataset: a small but versatile oriented object
detection dataset composed of the first 8 images of the DOTAv1 split,
4 for training and 4 for validation. This dataset is
ideal for testing and debugging object detection models, or for
experimenting with new detection approaches. With 8 images, it is small
enough to be easily manageable, yet diverse enough to test training
pipelines for errors and act as a sanity check before training on larger
datasets.
The original model repository uses a Validator wrapper, which represents
the accuracy validation pipeline. It creates a dataloader and evaluation
metrics, and updates the metrics on each data batch produced by the
dataloader. Besides that, it is responsible for data preprocessing and
results postprocessing. A configuration should be provided for class
initialization. We will use the default setup, but it can be adjusted by
overriding some parameters to test on custom data. The model exposes a
task_map, which allows getting a validator class instance.
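A rough sketch of how the validator can be obtained through task_map is shown below (the helper imports and the dota8.yaml name follow Ultralytics conventions and are assumptions here, not code from the notebook):
from ultralytics.cfg import get_cfg
from ultralytics.data.utils import check_det_dataset
from ultralytics.utils import DEFAULT_CFG

# Sketch (assumes the Ultralytics validator API): build a validator from the
# default configuration and point it at the small DOTA8 dataset.
args = get_cfg(cfg=DEFAULT_CFG)
args.data = "dota8.yaml"

validator = model.task_map[model.task]["validator"](args=args)
validator.data = check_det_dataset(args.data)   # downloads DOTA8 if it is missing
data_loader = validator.get_dataloader(validator.data.get("path"), 1)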
Dataset 'datasets/dota8.yaml' images not found ⚠️, missing path '/home/ea/work/openvino_notebooks/notebooks/fast-segment-anything/datasets/dota8/images/val'
Downloading ultralytics/yolov5 to '/home/ea/work/openvino_notebooks/notebooks/fast-segment-anything/datasets/dota8.zip'...
YOLOv8 provides an API for convenient model exporting to different formats,
including OpenVINO IR. model.export is responsible for model
conversion. We need to specify the format; additionally, we can
preserve dynamic shapes in the model.
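A minimal sketch of the export call is shown below (the dynamic and half arguments and the resulting IR path follow the usual Ultralytics behavior and are assumptions for illustration):
from pathlib import Path
from ultralytics import YOLO

# Sketch: convert the pretrained OBB model to OpenVINO IR, keeping dynamic input shapes.
model = YOLO("yolov8l-obb.pt")
model.export(format="openvino", dynamic=True, half=True)

OV_MODEL_PATH = Path("yolov8l-obb_openvino_model/yolov8l-obb.xml")
print(OV_MODEL_PATH.exists())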
NNCF enables
post-training quantization by adding quantization layers into the model
graph and then using a subset of the training dataset to initialize the
parameters of these additional quantization layers. Quantized operations
are executed in INT8 instead of FP32/FP16, making model
inference faster.
The optimization process contains the following steps:
Create a calibration dataset for quantization (see the sketch after this list).
Run nncf.quantize() to obtain the quantized model.
Save the INT8 model using the openvino.save_model() function.
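A minimal sketch of the calibration-dataset step, assuming the validator and data_loader objects created earlier:
import nncf

# Sketch: wrap the validation dataloader into an nncf.Dataset. transform_fn
# extracts the preprocessed image tensor that the model expects as input.
def transform_fn(data_item):
    input_tensor = validator.preprocess(data_item)["img"].numpy()
    return input_tensor

quantization_dataset = nncf.Dataset(data_loader, transform_fn)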
Please select below whether you would like to run quantization to
improve model inference speed.
Create a quantized model from the pre-trained converted OpenVINO model.
NOTE: Quantization is a time- and memory-consuming operation.
Running the quantization code below may take some time.
NOTE: We use the tiny DOTA8 dataset as a calibration dataset. It
gives a good enough result for tutorial purposes. For better results,
use a bigger dataset; usually 300 examples are enough.
%%skip not $to_quantize.value

if INT8_OV_PATH.exists():
    print("Loading quantized model")
    quantized_model = core.read_model(INT8_OV_PATH)
else:
    ov_model.reshape({0: [1, 3, -1, -1]})
    quantized_model = nncf.quantize(
        ov_model,
        quantization_dataset,
        preset=nncf.QuantizationPreset.MIXED,
    )
    ov.save_model(quantized_model, INT8_OV_PATH)
ov_config = {}
if device.value != "CPU":
    quantized_model.reshape({0: [1, 3, 1024, 1024]})
if "GPU" in device.value or ("AUTO" in device.value and "GPU" in core.available_devices):
    ov_config = {"GPU_DISABLE_WINOGRAD_CONVOLUTION": "YES"}
model_optimized = core.compile_model(quantized_model, device.value, ov_config)
We can reuse the base model pipeline in the same way as for the IR model.
%%skip not $to_quantize.value

def infer(*args):
    result = model_optimized(args)[0]
    return torch.from_numpy(result)

model.predictor.inference = infer
Run inference
%%skip not $to_quantize.value
res = model(example_image_path, device='cpu')
Image.fromarray(res[0].plot()[:, :, ::-1])
You can see that the result is almost the same, with only a small
difference: one small vehicle was recognized as two vehicles, but one
large car was also identified, unlike with the original model.
# Inference FP32 model (OpenVINO IR)
!benchmark_app -m $OV_MODEL_PATH -d $device.value -api async -shape "[1,3,640,640]"
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ WARNING ] Default duration 120 seconds is used for unknown device AUTO
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2024.0.0-14509-34caeefd078-releases/2024/0
[ INFO ]
[ INFO ] Device info:
[ INFO ] AUTO
[ INFO ] Build ................................. 2024.0.0-14509-34caeefd078-releases/2024/0
[ INFO ]
[ INFO ]
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(AUTO) performance hint will be set to PerformanceMode.THROUGHPUT.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 25.07 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ] x (node: x) : f32 / [...] / [?,3,?,?]
[ INFO ] Model outputs:
[ INFO ] *NO_NAME* (node: __module.model.22/aten::cat/Concat_9) : f32 / [...] / [?,20,16..]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 1
[ INFO ] Reshaping model: 'x': [1,3,640,640]
[ INFO ] Reshape model took 10.42 ms
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ] x (node: x) : u8 / [N,C,H,W] / [1,3,640,640]
[ INFO ] Model outputs:
[ INFO ] *NO_NAME* (node: __module.model.22/aten::cat/Concat_9) : f32 / [...] / [1,20,8400]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 645.51 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ] NETWORK_NAME: Model0
[ INFO ] EXECUTION_DEVICES: ['CPU']
[ INFO ] PERFORMANCE_HINT: PerformanceMode.THROUGHPUT
[ INFO ] OPTIMAL_NUMBER_OF_INFER_REQUESTS: 12
[ INFO ] MULTI_DEVICE_PRIORITIES: CPU
[ INFO ] CPU:
[ INFO ] AFFINITY: Affinity.CORE
[ INFO ] CPU_DENORMALS_OPTIMIZATION: False
[ INFO ] CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE: 1.0
[ INFO ] DYNAMIC_QUANTIZATION_GROUP_SIZE: 0
[ INFO ] ENABLE_CPU_PINNING: True
[ INFO ] ENABLE_HYPER_THREADING: True
[ INFO ] EXECUTION_DEVICES: ['CPU']
[ INFO ] EXECUTION_MODE_HINT: ExecutionMode.PERFORMANCE
[ INFO ] INFERENCE_NUM_THREADS: 36
[ INFO ] INFERENCE_PRECISION_HINT: <Type: 'float32'>
[ INFO ] KV_CACHE_PRECISION: <Type: 'float16'>
[ INFO ] LOG_LEVEL: Level.NO
[ INFO ] NETWORK_NAME: Model0
[ INFO ] NUM_STREAMS: 12
[ INFO ] OPTIMAL_NUMBER_OF_INFER_REQUESTS: 12
[ INFO ] PERFORMANCE_HINT: THROUGHPUT
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS: 0
[ INFO ] PERF_COUNT: NO
[ INFO ] SCHEDULING_CORE_TYPE: SchedulingCoreType.ANY_CORE
[ INFO ] MODEL_PRIORITY: Priority.MEDIUM
[ INFO ] LOADED_FROM_CACHE: False
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given for input 'x'!. This input will be filled with random values!
[ INFO ] Fill input 'x' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 12 inference requests, limits: 120000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 362.70 ms
[Step 11/11] Dumping statistics report
[ INFO ] Execution Devices:['CPU']
[ INFO ] Count: 1620 iterations
[ INFO ] Duration: 121527.01 ms
[ INFO ] Latency:
[ INFO ] Median: 884.92 ms
[ INFO ] Average: 897.13 ms
[ INFO ] Min: 599.38 ms
[ INFO ] Max: 1131.46 ms
[ INFO ] Throughput: 13.33 FPS
if INT8_OV_PATH.exists():
    # Inference INT8 model (Quantized model)
    !benchmark_app -m $INT8_OV_PATH -d $device.value -api async -shape "[1,3,640,640]" -t 15
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2024.0.0-14509-34caeefd078-releases/2024/0
[ INFO ]
[ INFO ] Device info:
[ INFO ] AUTO
[ INFO ] Build ................................. 2024.0.0-14509-34caeefd078-releases/2024/0
[ INFO ]
[ INFO ]
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(AUTO) performance hint will be set to PerformanceMode.THROUGHPUT.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 46.47 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ] x (node: x) : f32 / [...] / [?,3,?,?]
[ INFO ] Model outputs:
[ INFO ] *NO_NAME* (node: __module.model.22/aten::cat/Concat_9) : f32 / [...] / [?,20,16..]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 1
[ INFO ] Reshaping model: 'x': [1,3,640,640]
[ INFO ] Reshape model took 20.10 ms
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ] x (node: x) : u8 / [N,C,H,W] / [1,3,640,640]
[ INFO ] Model outputs:
[ INFO ] *NO_NAME* (node: __module.model.22/aten::cat/Concat_9) : f32 / [...] / [1,20,8400]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 1201.42 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ] NETWORK_NAME: Model0
[ INFO ] EXECUTION_DEVICES: ['CPU']
[ INFO ] PERFORMANCE_HINT: PerformanceMode.THROUGHPUT
[ INFO ] OPTIMAL_NUMBER_OF_INFER_REQUESTS: 12
[ INFO ] MULTI_DEVICE_PRIORITIES: CPU
[ INFO ] CPU:
[ INFO ] AFFINITY: Affinity.CORE
[ INFO ] CPU_DENORMALS_OPTIMIZATION: False
[ INFO ] CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE: 1.0
[ INFO ] DYNAMIC_QUANTIZATION_GROUP_SIZE: 0
[ INFO ] ENABLE_CPU_PINNING: True
[ INFO ] ENABLE_HYPER_THREADING: True
[ INFO ] EXECUTION_DEVICES: ['CPU']
[ INFO ] EXECUTION_MODE_HINT: ExecutionMode.PERFORMANCE
[ INFO ] INFERENCE_NUM_THREADS: 36
[ INFO ] INFERENCE_PRECISION_HINT: <Type: 'float32'>
[ INFO ] KV_CACHE_PRECISION: <Type: 'float16'>
[ INFO ] LOG_LEVEL: Level.NO
[ INFO ] NETWORK_NAME: Model0
[ INFO ] NUM_STREAMS: 12
[ INFO ] OPTIMAL_NUMBER_OF_INFER_REQUESTS: 12
[ INFO ] PERFORMANCE_HINT: THROUGHPUT
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS: 0
[ INFO ] PERF_COUNT: NO
[ INFO ] SCHEDULING_CORE_TYPE: SchedulingCoreType.ANY_CORE
[ INFO ] MODEL_PRIORITY: Priority.MEDIUM
[ INFO ] LOADED_FROM_CACHE: False
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given for input 'x'!. This input will be filled with random values!
[ INFO ] Fill input 'x' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 12 inference requests, limits: 15000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 124.20 ms
[Step 11/11] Dumping statistics report
[ INFO ] Execution Devices:['CPU']
[ INFO ] Count: 708 iterations
[ INFO ] Duration: 15216.46 ms
[ INFO ] Latency:
[ INFO ] Median: 252.23 ms
[ INFO ] Average: 255.76 ms
[ INFO ] Min: 176.97 ms
[ INFO ] Max: 344.41 ms
[ INFO ] Throughput: 46.53 FPS