DL Workbench provides a graphical interface to find the optimal configuration of Batch/Parallel requests on a certain machine. To learn more about optimal configurations on specific hardware, refer to Deploy and Integrate Performance Criteria into Application.
Select a model and a dataset, then click Run Inference. The Project page appears.
To run inference over a range of configurations, place check marks in the boxes under the Use Ranges section. Specify the minimum and maximum number of parallel requests and the minimum and maximum batch size, as well as the step by which each value is incremented. Click Execute:
A step is the increment of parallel inference streams used for testing. For example, if the stream range is set to 1-5 with a step of 2, inference runs with 1, 3, and 5 parallel streams. DL Workbench executes every combination of Batch/Inference values from minimum to maximum with the specified step.
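The sweep described above can be sketched in a few lines of plain Python. This is an illustrative model of how the configuration grid is enumerated, not DL Workbench internals; the function names `sweep` and `configurations` are assumptions:

```python
# Illustrative sketch of how a Batch/Parallel-request sweep enumerates
# configurations. Names are hypothetical, not taken from DL Workbench.

def sweep(min_val, max_val, step):
    """Yield values from min_val to max_val inclusive, advancing by step."""
    value = min_val
    while value <= max_val:
        yield value
        value += step

def configurations(stream_range, batch_range):
    """Cross every parallel-stream count with every batch size."""
    return [(s, b) for s in sweep(*stream_range) for b in sweep(*batch_range)]

# Streams 1-5 with step 2, batch sizes 1-4 with step 1:
grid = configurations((1, 5, 2), (1, 4, 1))
# Streams take the values 1, 3, 5, so the grid holds 3 * 4 = 12 combinations.
```

Each tuple in `grid` corresponds to one inference run, and thus to one point on the resulting graph.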
The graph in the Inference Results section shows a point for each inference run with a certain batch/parallel request configuration.
Right under the graph, you can specify a maximum latency to find the optimal configuration with the best throughput within that constraint. The point corresponding to this configuration turns pink.
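Conceptually, the optimal point is the feasible point with the highest throughput. A minimal sketch of that selection, assuming hypothetical point records with `latency_ms` and `throughput_fps` fields (the field names and sample numbers are illustrative, not DL Workbench data):

```python
# Illustrative sketch: picking the best-throughput point under a latency cap.
# The record layout and sample values are assumptions for demonstration only.

def optimal_point(points, max_latency_ms):
    """Return the point with the highest throughput among those whose
    latency does not exceed the cap, or None if no point qualifies."""
    feasible = [p for p in points if p["latency_ms"] <= max_latency_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda p: p["throughput_fps"])

points = [
    {"streams": 1, "batch": 1, "latency_ms": 8.0, "throughput_fps": 120.0},
    {"streams": 3, "batch": 2, "latency_ms": 20.0, "throughput_fps": 290.0},
    {"streams": 5, "batch": 4, "latency_ms": 55.0, "throughput_fps": 410.0},
]
best = optimal_point(points, max_latency_ms=25.0)
# The 55 ms point is excluded by the cap, so streams=3, batch=2 wins.
```

Raising the latency cap admits larger batch/stream counts, which typically trades higher latency for higher throughput.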
To view information about latency, throughput, batch, and parallel requests of a specific job, hover your cursor over the corresponding point on the graph.