To download the original ImageNet, Pascal Visual Object Classes (VOC), and Common Objects in Context (COCO) datasets, follow the instructions for each dataset type below. These datasets are large. To save time when loading them into the DL Workbench, cut them into smaller subsets as described in the following sections.
To learn more about dataset types supported by the DL Workbench and their structure, refer to Dataset Types.
To download images from ImageNet, you need to have an account and agree to the Terms of Access. Follow the steps below:
NOTE: Replace <user> with your username.
imagenet.zip and caffe_ilsvrc12.tar.gz. Place the val.txt file from caffe_ilsvrc12 inside the imagenet folder. The final imagenet.zip archive must follow the structure below:
Save the script to cut datasets to the following directory:
NOTE: Replace <user> with your username.
python C:\Work\cut_dataset.py `
--source_archive_dir=C:\Work\imagenet.zip `
--output_size=20 `
--output_archive_dir=C:\Work\subsets `
--dataset_type=imagenet `
--first_image=10
This command runs the script with the following arguments:
Parameter | Explanation |
---|---|
--source_archive_dir | Full path to a downloaded archive |
--output_size | Number of images to be left in the smaller dataset |
--output_archive_dir | Full path to the directory for the smaller dataset, excluding the archive name |
--dataset_type | Type of the source dataset |
--first_image | Optional. The index of the image to start cutting from. Specify if you want to split your dataset into training and validation subsets. The default value is 0. |
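One way to read --output_size and --first_image together is as a contiguous slice of the image list. The sketch below illustrates that reading only. The function name cut and the file names are hypothetical, and the script's actual selection order is an assumption, not something confirmed by this page.

```python
def cut(items, output_size, first_image=0):
    """Return output_size items starting at index first_image.

    Assumption: the subset is a contiguous run of images. This mirrors
    the parameter descriptions above, not the script's real code.
    """
    if output_size < 0 or first_image < 0:
        raise ValueError("output_size and first_image must be non-negative")
    return items[first_image:first_image + output_size]


# Hypothetical file names standing in for the images of a dataset.
images = [f"img_{i:03d}.jpg" for i in range(100)]

# Same values as in the command above: 20 images starting at index 10.
subset = cut(images, output_size=20, first_image=10)
```

Under this reading, the command above would keep the images at indices 10 through 29.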
To download test data from Pascal VOC, you need to have an account. Follow the steps below:
NOTE: Due to structure inconsistency observed within Pascal VOC test datasets, optimization and accuracy measurement are not available for them. Use validation datasets instead. The instructions below demonstrate how to download them.
NOTE: Replace <user> with your username.
Save the script to cut datasets to the following directory:
NOTE: Replace <user> with your username.
Follow the instructions for your operating system.
python C:\Work\cut_dataset.py `
--source_archive_dir=C:\Work\voc.tar.gz `
--output_size=20 `
--output_archive_dir=C:\Work\subsets `
--dataset_type=voc `
--first_image=10
This command runs the script with the following arguments:
Parameter | Explanation |
---|---|
--source_archive_dir | Full path to a downloaded archive |
--output_size | Number of images to be left in the smaller dataset |
--output_archive_dir | Full path to the directory for the smaller dataset, excluding the archive name |
--dataset_type | Type of the source dataset |
--first_image | Optional. The index of the image to start cutting from. Specify if you want to split your dataset into training and validation subsets. The default value is 0. |
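If you run the script repeatedly, it can help to assemble the argument list in code instead of retyping it. The following is a minimal sketch that uses only the flags listed in the table above. The helper names build_cut_command and run_cut are hypothetical, and the paths are the Windows examples from this section.

```python
import subprocess


def build_cut_command(script, source_archive, output_dir, dataset_type,
                      output_size, first_image=0):
    """Build the argument list for one run of the cutting script."""
    return [
        "python", str(script),
        f"--source_archive_dir={source_archive}",
        f"--output_size={output_size}",
        f"--output_archive_dir={output_dir}",
        f"--dataset_type={dataset_type}",
        f"--first_image={first_image}",
    ]


def run_cut(command):
    """Run the command and raise CalledProcessError if the script fails."""
    return subprocess.run(command, check=True)


# Same values as in the Pascal VOC command above.
command = build_cut_command(
    script=r"C:\Work\cut_dataset.py",
    source_archive=r"C:\Work\voc.tar.gz",
    output_dir=r"C:\Work\subsets",
    dataset_type="voc",
    output_size=20,
    first_image=10,
)
# run_cut(command)  # uncomment once the script and archive are in place
```

Passing the arguments as a list avoids shell-specific line continuation characters and quoting.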
To use a dataset from the COCO website, download the annotation and image archives separately. Choose one of the options:
NOTE: Download the archives to the following directories and save them under the following names:
- Linux*, macOS* (replace <user> with your username):
  - /home/<user>/Work/coco_images.zip
  - /home/<user>/Work/coco_annotations_.zip
- Windows*:
  - C:\Work\coco_images.zip
  - C:\Work\coco_annotations_.zip
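The expected locations above can be derived in code before running the cutting script. This is a small sketch under the assumption that the paths are exactly those listed in the note. The function name expected_coco_archives and the username alice are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath


def expected_coco_archives(system, user=None):
    """Return (images_archive, annotations_archive) paths for an OS.

    system: "Windows" for the C:\\Work layout; any other value uses the
    /home/<user>/Work layout given above for Linux and macOS.
    """
    if system == "Windows":
        work = PureWindowsPath("C:/Work")
    else:
        if not user:
            raise ValueError("a username is required for Linux and macOS")
        work = PurePosixPath("/home") / user / "Work"
    return work / "coco_images.zip", work / "coco_annotations_.zip"


# Hypothetical username, used for illustration only.
images_archive, annotations_archive = expected_coco_archives("Linux", "alice")
```

On the target machine, wrapping each result in pathlib.Path and calling is_file() shows whether the download landed where the script expects it.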
Save the script to cut datasets to the following directory:
NOTE: Replace <user> with your username.
python C:\Work\cut_dataset.py `
--source_images_archive_dir=C:\Work\coco_images.zip `
--source_annotations_archive_dir=C:\Work\coco_annotations_.zip `
--output_size=20 `
--output_archive_dir=C:\Work\subsets `
--dataset_type=coco `
--first_image=10
This command runs the script with the following arguments:
Parameter | Explanation |
---|---|
--source_images_archive_dir | Full path to the downloaded archive with images, including the name |
--source_annotations_archive_dir | Full path to the downloaded archive with annotations, including the name |
--output_size | Number of images to be left in a smaller dataset |
--output_archive_dir | Full path to the directory for the smaller dataset, excluding the archive name |
--dataset_type | Type of the source dataset |
--first_image | Optional. The index of the image to start cutting from. Specify if you want to split your dataset into training and validation subsets. The default value is 0. |
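To split a dataset into training and validation subsets, you can run the script twice with ranges that do not overlap. The sketch below only computes the --first_image and --output_size values for the two runs. The function name plan_split is hypothetical, and it assumes that each run keeps a contiguous range of images starting at --first_image.

```python
def plan_split(validation_size, training_size, start=0):
    """Return (first_image, output_size) pairs for two back-to-back subsets."""
    if min(validation_size, training_size, start) < 0:
        raise ValueError("sizes and start must be non-negative")
    return {
        "validation": (start, validation_size),
        "training": (start + validation_size, training_size),
    }


# Example: 10 validation images followed by 20 training images.
plan = plan_split(validation_size=10, training_size=20)

for name, (first_image, output_size) in plan.items():
    # These two flags vary between runs; the archive flags stay the same.
    print(name, f"--first_image={first_image}", f"--output_size={output_size}")
```

Use a different --output_archive_dir for each run if you want to keep the two subsets apart.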