Download and Cut Datasets

To download the original ImageNet, Pascal Visual Object Classes (VOC), and Common Objects in Context (COCO) datasets, follow the instructions for each dataset type below. These datasets are large. To save time when loading them into the DL Workbench, cut them down to smaller subsets as described in the following sections.

To learn more about dataset types supported by the DL Workbench and their structure, refer to Dataset Types.

ImageNet Dataset

Download ImageNet Dataset

To download images from ImageNet, you need to have an account and agree to the Terms of Access. Follow the steps below:

  1. Go to the ImageNet homepage:
    imagenet_register_01-b.png
  2. If you have an account, click Login. Otherwise, click Signup in the upper-right corner, provide your data, and wait for a confirmation email:
    imagenet_register_01-m-b.png
  3. Once you receive the confirmation email and log in, go to the Download page:
    imagenet_download_00-m-b.png
  4. Select Download Original Images:
    imagenet_download_01-m-b.png
  5. This redirects you to the Terms of Access page. If you agree to the Terms, continue by clicking Agree and Sign:
    imagenet_terms_of_access_02-m-b.png
  6. Click one of the links in the Download as one tar file section to select it:
    imagenet_download_02-b.png
  7. Save it to the directory with the name provided below:

  1. Download the archive with annotations.
  2. Extract both imagenet.zip and caffe_ilsvrc12.tar.gz. Place the val.txt file from caffe_ilsvrc12 inside the imagenet folder.
  3. Zip the contents of the imagenet folder. The final imagenet.zip archive must follow the structure below:
    |-- imagenet.zip
        |-- val.txt
        |-- 0001.jpg
        |-- 0002.jpg
        |...
        |-- n.jpg
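
The zipping step above can be sketched in Python. This is a minimal illustration, not part of the DL Workbench tooling: the function name `pack_imagenet` is hypothetical, and it assumes the imagenet folder already holds val.txt next to the images. The point it shows is that every file is stored at the root of the archive, without a parent folder.

```python
import zipfile
from pathlib import Path


def pack_imagenet(folder: Path, archive: Path) -> list[str]:
    """Zip the files in `folder` so that they sit at the root of `archive`.

    Hypothetical helper for illustration only.
    """
    if not (folder / "val.txt").is_file():
        raise FileNotFoundError("val.txt must be inside the imagenet folder")

    # Collect the files before creating the archive, so the archive itself
    # is never picked up if it is written into the same folder.
    files = sorted(p for p in folder.iterdir() if p.is_file())

    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            # arcname drops the parent folder, so entries land at the root.
            zf.write(path, arcname=path.name)
    return [p.name for p in files]
```

Listing the archive afterwards with `zipfile.ZipFile(archive).namelist()` is a quick way to confirm that val.txt and the images appear without a leading folder name.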

Cut ImageNet Dataset

Save the script to cut datasets to the following directory:

This command runs the script with the following arguments:

Parameter               Explanation
--source_archive_dir    Full path to a downloaded archive
--output_size=20        Number of images to keep in the smaller dataset
--output_archive_dir    Full path to the directory for the smaller dataset, excluding the archive name
--dataset_type          Type of the source dataset
--first_image           Optional. The index of the image to start cutting from. Specify if you want to split your dataset into training and validation subsets. The default value is 0.
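
To make the effect of these parameters concrete, here is a rough sketch of what cutting an ImageNet archive could look like. It is not the actual cutting script: the function `cut_imagenet` is hypothetical, and it assumes that val.txt holds one `<image name> <label>` pair per line, that images sit at the archive root, and that `first_image` is a zero-based index into val.txt.

```python
import zipfile
from pathlib import Path


def cut_imagenet(source_zip: Path, output_zip: Path,
                 output_size: int, first_image: int = 0) -> list[str]:
    """Copy `output_size` images, starting at `first_image`, into a new archive.

    Hypothetical illustration; the real script may behave differently.
    """
    if output_size <= 0 or first_image < 0:
        raise ValueError("output_size must be positive and first_image non-negative")

    with zipfile.ZipFile(source_zip) as src:
        lines = [ln for ln in src.read("val.txt").decode().splitlines() if ln.strip()]
        kept = lines[first_image:first_image + output_size]

        with zipfile.ZipFile(output_zip, "w", zipfile.ZIP_DEFLATED) as dst:
            # Keep only the annotation lines for the selected images.
            dst.writestr("val.txt", "\n".join(kept) + "\n")
            for line in kept:
                image_name = line.split()[0]
                dst.writestr(image_name, src.read(image_name))

    return [line.split()[0] for line in kept]
```

Under these assumptions, the smaller archive keeps the same flat layout as the original, so it can be imported the same way.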

Pascal Visual Object Classes (VOC) Dataset

Download Pascal VOC Dataset

To download test data from Pascal VOC, you need to have an account. Follow the steps below:

NOTE: Due to structure inconsistency observed within Pascal VOC test datasets, optimization and accuracy measurement are not available for them. Use validation datasets instead. The instructions below demonstrate how to download them.

  1. Go to the PASCAL Visual Object Classes Homepage:
    voc_homepage-b.png
  2. Click PASCAL VOC Evaluation Server under the Pascal VOC data sets heading:
    voc_evaluation_server_01-m-b.png
  3. If you have an account, click Login in the upper-left corner. Otherwise, click Registration, provide your data, and wait for a confirmation email:
    voc_login_register-m-b.png
  4. Once you receive the confirmation email and log in, go to the Pascal VOC Challenges 2005-2012:
    voc_download_01-b.png
  5. Select a challenge, for example, The VOC2008 Challenge. On the challenge page, go to the Development Kit section:
    voc_download_02-b.png
  6. Save the training/validation_data file to the directory and under the name provided below:

Cut Pascal VOC Dataset

Save the script to cut datasets to the following directory:

Follow the instructions for your operating system.

This command runs the script with the following arguments:

Parameter               Explanation
--source_archive_dir    Full path to a downloaded archive
--output_size=20        Number of images to keep in the smaller dataset
--output_archive_dir    Full path to the directory for the smaller dataset, excluding the archive name
--dataset_type          Type of the source dataset
--first_image           Optional. The index of the image to start cutting from. Specify if you want to split your dataset into training and validation subsets. The default value is 0.
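
The --first_image parameter is what makes a training/validation split possible: two runs with non-overlapping ranges produce two disjoint subsets. The sketch below only computes the argument values for the two runs. The helper `split_arguments` is hypothetical, and it assumes that `first_image` is a zero-based index and that each run takes a contiguous block of `output_size` images.

```python
def split_arguments(train_size: int, validation_size: int) -> dict:
    """Return --first_image / --output_size values for two disjoint subsets.

    Hypothetical helper; it does not run the cutting script itself.
    """
    if train_size <= 0 or validation_size <= 0:
        raise ValueError("subset sizes must be positive")
    return {
        # The training subset starts at the default index 0.
        "training": {"first_image": 0, "output_size": train_size},
        # The validation subset starts right after the training subset ends.
        "validation": {"first_image": train_size, "output_size": validation_size},
    }
```

For example, `split_arguments(20, 5)` suggests cutting the training subset with a first image of 0 and an output size of 20, and the validation subset with a first image of 20 and an output size of 5.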

Common Objects in Context (COCO) Dataset

Download COCO Dataset

To use a dataset from the COCO website, download annotations and images archives separately. Choose one of the options:

NOTE: Save the archives to the following directories under the following names:

  • Linux*, macOS*: (Replace <user> with your username)
    /home/<user>/Work/coco_images.zip
    /home/<user>/Work/coco_annotations_.zip
  • Windows*:
    C:\Work\coco_images.zip
    C:\Work\coco_annotations_.zip

Cut COCO Dataset

Save the script to cut datasets to the following directory:

This command runs the script with the following arguments:

Parameter                           Explanation
--source_images_archive_dir         Full path to the downloaded archive with images, including the name
--source_annotations_archive_dir    Full path to the downloaded archive with annotations, including the name
--output_size                       Number of images to keep in the smaller dataset
--output_archive_dir                Full path to the directory for the smaller dataset, excluding the archive name
--dataset_type                      Type of the source dataset
--first_image                       Optional. The index of the image to start cutting from. Specify if you want to split your dataset into training and validation subsets. The default value is 0.
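
Because COCO keeps images and annotations in separate archives, cutting the dataset means keeping both in sync. The following sketch shows the annotation side only and is not the actual cutting script. The function `cut_coco_annotations` is hypothetical; it assumes the standard COCO JSON layout with top-level `images` and `annotations` lists linked through `image_id`, and a zero-based `first_image`.

```python
def cut_coco_annotations(coco: dict, output_size: int, first_image: int = 0) -> dict:
    """Keep `output_size` images and only the annotations that refer to them.

    Hypothetical illustration; the real script may select images differently.
    """
    if output_size <= 0 or first_image < 0:
        raise ValueError("output_size must be positive and first_image non-negative")

    images = coco["images"][first_image:first_image + output_size]
    kept_ids = {image["id"] for image in images}

    # Copy all other top-level keys (for example, categories) unchanged.
    result = {k: v for k, v in coco.items() if k not in ("images", "annotations")}
    result["images"] = images
    result["annotations"] = [
        ann for ann in coco.get("annotations", []) if ann["image_id"] in kept_ids
    ]
    return result
```

The `file_name` values of the kept images can then be used to decide which files to copy from the images archive.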