Run Range of Inferences

DL Workbench provides a graphical interface for finding the optimal combination of batch size and parallel requests on a given machine. To learn more about optimal configurations on specific hardware, refer to Deploy and Integrate Performance Criteria into Application.

Select a model and a dataset, then click Run Inference. The Project page appears.

[Image: run_single_inference_01-b.png]

To run a range of inference streams, select the check boxes under the Use Ranges section. Specify the minimum and maximum numbers of parallel requests and batch sizes, as well as the steps by which to increment each. Click Execute:

[Image: range_of_inferences-b.png]

A step is the increment between consecutive values of parallel inference streams or batch sizes used for testing. For example, if the stream range is set to 1-5 with a step of 2, inferences run with 1, 3, and 5 parallel streams. DL Workbench executes every combination of batch and parallel request values from the minimum to the maximum with the specified step.
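The set of executed combinations can be pictured as the Cartesian product of the two ranges. Below is a minimal Python sketch of that enumeration; the range values are illustrative, not taken from the tool:

```python
from itertools import product

def value_range(minimum: int, maximum: int, step: int) -> list[int]:
    """All values from minimum to maximum, inclusive, with the given step."""
    return list(range(minimum, maximum + 1, step))

# Illustrative ranges: parallel streams 1-5 with step 2, batch 1-4 with step 1.
streams = value_range(1, 5, 2)   # [1, 3, 5]
batches = value_range(1, 4, 1)   # [1, 2, 3, 4]

# One inference runs per (batch, parallel streams) combination: 4 x 3 = 12 runs.
for batch, n_streams in product(batches, streams):
    print(f"batch={batch}, parallel streams={n_streams}")
```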

The graph in the Inference Results section shows a point for each inference run with a particular batch and parallel request configuration.

[Image: inference_results_01-b.png]
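If you want to reproduce a similar chart from exported measurements, a scatter plot of latency against throughput is one way to do it. This is a sketch with hypothetical sample points and labels; it assumes the chart plots latency on the x-axis and throughput on the y-axis:

```python
import matplotlib.pyplot as plt

# Hypothetical points: one per batch/parallel request combination.
latency_ms = [12.0, 25.0, 40.0, 60.0]
throughput = [80.0, 160.0, 195.0, 210.0]
labels = ["b1/s1", "b2/s3", "b4/s3", "b4/s5"]  # batch/streams labels

plt.scatter(latency_ms, throughput)
for x, y, label in zip(latency_ms, throughput, labels):
    plt.annotate(label, (x, y))  # tag each point with its configuration
plt.xlabel("Latency, ms")
plt.ylabel("Throughput, FPS")
plt.show()
```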

Right under the graph, you can specify a maximum latency to find the configuration with the best throughput within that limit. The point corresponding to this configuration turns pink.

[Image: inference_results_02-b.png]
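Conceptually, this control filters out every point whose latency exceeds the cap and picks the highest-throughput point among the rest. A minimal Python sketch, using hypothetical measurements rather than real DL Workbench output:

```python
# Hypothetical results: (batch, parallel streams, latency in ms, throughput in FPS).
results = [
    (1, 1, 12.0, 80.0),
    (2, 3, 25.0, 160.0),
    (4, 5, 60.0, 210.0),
]

def best_under_latency(results, max_latency_ms):
    """Highest-throughput configuration whose latency stays within the cap."""
    eligible = [r for r in results if r[2] <= max_latency_ms]
    return max(eligible, key=lambda r: r[3], default=None)

print(best_under_latency(results, max_latency_ms=30.0))
# -> (2, 3, 25.0, 160.0): best throughput among points within 30 ms
```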

To view information about latency, throughput, batch, and parallel requests of a specific job, hover your cursor over the corresponding point on the graph.

[Image: inference_results_03-b.png]