Run Range of Inferences

DL Workbench provides a graphical interface to find the optimal configuration of Batch/Parallel requests on a certain machine. To learn more about optimal configurations on specific hardware, refer to Deploy and Integrate Performance Criteria into Application.

Select a model and a dataset and click Run Inference. The Project page appears.

run_single_inference_01-b.png

To run a range of inference streams, place check marks in the boxes under the Use Ranges section. Specify the minimum and maximum numbers of parallel requests and batch sizes, as well as the step by which to increment each of them. Then click Execute:

range_of_inferences-b.png

A step is the increment applied to the parallel inference streams or batch sizes used for testing. For example, if the stream range is set to 1-5 with a step of 2, inference runs with 1, 3, and 5 parallel streams. DL Workbench executes every combination of batch/parallel-request values from minimum to maximum with the specified steps.
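The set of configurations DL Workbench tests can be sketched as the Cartesian product of the two ranges. This is a minimal illustration, not DL Workbench code; the function name and parameters are hypothetical:

```python
def range_combinations(batch_min, batch_max, batch_step,
                       stream_min, stream_max, stream_step):
    """Enumerate every batch/parallel-request pair tested in a range run.

    Each range goes from its minimum to its maximum (inclusive),
    advancing by its step -- mirroring the Use Ranges behavior
    described above. Names are illustrative, not DL Workbench API.
    """
    batches = range(batch_min, batch_max + 1, batch_step)
    streams = range(stream_min, stream_max + 1, stream_step)
    return [(b, s) for b in batches for s in streams]

# Streams 1-5 with step 2 and batches 1-2 with step 1 yield
# six configurations: (1,1), (1,3), (1,5), (2,1), (2,3), (2,5).
combos = range_combinations(1, 2, 1, 1, 5, 2)
```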

The graph in the Inference Results section shows a point for each inference run with a particular batch/parallel-request configuration.

inference_results_01-b.png

Directly under the graph, you can specify a maximum acceptable latency to find the configuration with the best throughput within that limit. The point representing this configuration turns pink.

inference_results_02-b.png
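Conceptually, the selection keeps only the runs whose latency is within the limit and then picks the one with the highest throughput. A minimal sketch of that logic, with hypothetical field names (`latency_ms`, `throughput_fps`) standing in for the measured values:

```python
def best_under_latency(results, max_latency_ms):
    """Pick the highest-throughput run whose latency meets the limit.

    `results` is a list of dicts, one per batch/parallel-request run,
    e.g. {"batch": 2, "streams": 3, "latency_ms": 12.0,
    "throughput_fps": 480.0}. Returns None if no run qualifies.
    Field names are illustrative, not DL Workbench API.
    """
    eligible = [r for r in results if r["latency_ms"] <= max_latency_ms]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r["throughput_fps"])
```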

To view information about latency, throughput, batch, and parallel requests of a specific job, hover your cursor over the corresponding point on the graph.

inference_results_03-b.png