Import, Generate and Select Datasets

The second stage of creating a configuration is adding a sample dataset. On the Select Dataset page in the configuration wizard, you can import a dataset, automatically generate a test dataset consisting of Gaussian distributed noise, or select a previously uploaded dataset. Sample datasets must consist of a small sample of images and use either the ImageNet or the Pascal VOC format.

About ImageNet

ImageNet is a well-known dataset used to train classification models. It consists of over 1.2 million images in a 150 GB archive. Your dataset does not need to contain actual ImageNet images, but it does need to adhere to this format when you are working with classification models. To prepare an ImageNet formatted dataset, create an archive consisting of an annotation file and images in the root directory.
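
As a rough illustration of the layout described above, the following Python sketch packs a set of images and one annotation file into the root of a .zip archive. The annotation file name (val.txt) and its line format (image file name followed by a class index) are assumptions made for this example and are not confirmed by this page, so check the expected annotation format before relying on it. The function name build_imagenet_archive is hypothetical.

```python
import zipfile
from pathlib import Path


def build_imagenet_archive(image_dir, labels, archive_path, annotation_name="val.txt"):
    """Pack images and one annotation file into the root of a .zip archive.

    labels maps an image file name to an integer class index.
    ASSUMPTION: the annotation name and the "<file> <class>" line format
    are illustrative only.
    """
    image_dir = Path(image_dir)
    lines = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(labels):
            image_path = image_dir / name
            if not image_path.is_file():
                raise FileNotFoundError(f"Missing image: {image_path}")
            # arcname=name keeps the image in the archive root, without subfolders.
            archive.write(image_path, arcname=name)
            lines.append(f"{name} {labels[name]}")
        # The annotation file also goes into the archive root.
        archive.writestr(annotation_name, "\n".join(lines) + "\n")
    return archive_path
```

For example, build_imagenet_archive("images", {"cat.jpg": 0, "dog.jpg": 1}, "sample.zip") would produce an archive whose root holds the two images and the annotation file.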

About Pascal VOC

Pascal VOC is a well-known dataset used to train object-detection models. These datasets consist of several folders containing annotation files and image indices. Archives in the Pascal VOC format must be organized as follows:

Imagesets
    Segmentation
        train.txt
        trainval.txt
JPEGImages
    1.jpg
    2.jpg
SegmentationClass
    1.png
    2.png
SegmentationObject
    1.png
    2.png
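
To catch layout mistakes before uploading, one option is to check a .zip archive against the folder names listed above. The sketch below is only an illustration: the helper name missing_voc_folders is hypothetical, and it assumes that the folders sit at the top level of the archive.

```python
import zipfile

# Folder names taken from the layout listed above.
REQUIRED_FOLDERS = (
    "Imagesets/Segmentation",
    "JPEGImages",
    "SegmentationClass",
    "SegmentationObject",
)


def missing_voc_folders(archive_path):
    """Return the required folders that contain no files in the archive."""
    with zipfile.ZipFile(archive_path) as archive:
        names = archive.namelist()
    missing = []
    for folder in REQUIRED_FOLDERS:
        prefix = folder + "/"
        # A folder counts as present if at least one file lives under it.
        has_file = any(n.startswith(prefix) and not n.endswith("/") for n in names)
        if not has_file:
            missing.append(folder)
    return missing
```

An empty list means every expected folder holds at least one file. The check does not validate the contents of the annotation or image files.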

Select Imported Dataset

If the dataset you want to use is in the list of imported datasets, select it and click the Use Selected Dataset button. You are then directed to the third step, Run Baseline Inference.

New Datasets

If you want to use a new dataset, the tool provides two options: import a new dataset or generate a new one automatically.

Option 1: Import a New Dataset

To import a new dataset, click the Import Dataset button on the Select Dataset page and fill out the Import Dataset form:

Dataset File: A .zip or .tar.gz archive that contains the dataset.
Local Path: A path to a directory on your disk that contains the dataset. Not needed in this step.
Dataset Name: Can be different from the archive name.
Dataset Format: Can be ImageNet or VOC Object Detection.
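
Before filling out the form, it can help to confirm that the file really is one of the accepted archive types. The following sketch uses only the Python standard library. The function name detect_archive_type and the returned labels are hypothetical and are not part of the tool.

```python
import tarfile
import zipfile


def detect_archive_type(path):
    """Return "zip", "tar.gz", or None, based on the file name and its contents."""
    lower = str(path).lower()
    # Check both the extension and that the file can be read as that archive type.
    if lower.endswith(".zip") and zipfile.is_zipfile(path):
        return "zip"
    if lower.endswith(".tar.gz") and tarfile.is_tarfile(path):
        return "tar.gz"
    return None
```

A return value of None suggests that the file is either misnamed or not a readable archive.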

After you have entered all of the required data, click the Import Dataset button to start the import. The Status column in the Imported Datasets table shows a progress bar and the status of the import.

Once the import completes, select the dataset in the list and click the Use Selected Dataset button. You can then proceed to Run Baseline Inference.

Option 2: Autogenerate a New Dataset

NOTE: Generated datasets are available only in the ImageNet format and are suitable only for generic or classification models.

To generate a new dataset, use the Autogenerate feature. Click the Autogenerate button and specify the dataset parameters in the Generate Dataset form that appears:

1. The number of .npy images to generate for the dataset, in the range 1 to 2000.
2. The image height and width in pixels.
3. The distribution law. Currently, only the Gaussian distribution is supported.
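
For intuition about what such a dataset contains, the sketch below writes Gaussian noise images as .npy files with NumPy. It is not the tool's implementation: the function name, the three-channel shape, the mean and standard deviation, and the file naming are all assumptions chosen for illustration. Only the 1 to 2000 limit, the height and width parameters, and the Gaussian distribution come from the form described above.

```python
from pathlib import Path

import numpy as np


def generate_gaussian_dataset(out_dir, count, height, width, seed=0):
    """Write `count` Gaussian-noise images of shape (height, width, 3) as .npy files."""
    if not 1 <= count <= 2000:
        raise ValueError("count must be in the range 1 to 2000")
    if height < 1 or width < 1:
        raise ValueError("height and width must be positive")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    rng = np.random.default_rng(seed)
    paths = []
    for index in range(count):
        # ASSUMPTION: mean 127.5 and standard deviation 50, clipped to 0..255.
        noise = rng.normal(loc=127.5, scale=50.0, size=(height, width, 3))
        image = np.clip(noise, 0, 255).astype(np.uint8)
        path = out_dir / f"{index}.npy"
        np.save(path, image)
        paths.append(path)
    return paths
```

Each saved file can be read back with np.load to inspect its shape and value range.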

After you have entered all of the required parameters, click the Generate button to generate the dataset. The Status column in the Imported Datasets table shows a progress bar and the status of the generation. To cancel the generation process, click the Cancel icon next to the dataset name.

Once the generation completes, the dataset status changes to Ready. You can then select the dataset and use it for project configuration.

To remove an imported dataset from the list, click the Remove icon in the Action column.

Add Local Dataset

Importing large datasets or models can take a long time. To speed up the import of models and datasets that already exist on your host, you can mount the directory that contains the data to the Docker* container with Workbench. Refer to Mount Folder to Docker Container for more details.

To mount a dataset, follow the steps provided in the Add Local Model section in Select Models.