Run Range of Inferences

DL Workbench provides a graphical interface for finding the optimal combination of batch size and parallel requests on a given machine. To learn more about optimal configurations for specific hardware, refer to Deploy and Integrate Performance Criteria into Application.
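A configuration found with DL Workbench can later be reproduced in your own application. Below is a minimal sketch, assuming the pre-2022 OpenVINO Inference Engine Python API (IECore, the CPU_THROUGHPUT_STREAMS config key) and placeholder model paths and values; see the reference above for the approach recommended for your OpenVINO version.

```python
# Sketch: apply a batch/streams configuration found in DL Workbench.
# Model paths and the chosen values (batch 4, 2 streams) are placeholders.
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")
net.batch_size = 4  # batch size found optimal in DL Workbench

exec_net = ie.load_network(
    network=net,
    device_name="CPU",
    config={"CPU_THROUGHPUT_STREAMS": "2"},  # parallel streams
    num_requests=2,  # one infer request per stream
)
```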

Select a model and a dataset, then click Run Inference. The Project page appears.

run_single_inference_01.png

To run a range of inference streams, select the check boxes under the Use Ranges section. Specify the minimum and maximum numbers of parallel requests and batch sizes, as well as the step by which to increment the number of parallel requests or the batch size. Then click Execute.

range_of_inferences.png

A step is the increment applied to the number of parallel inference streams (or the batch size) between test runs. For example, if the stream range is set to 1-5 with a step of 2, inference runs with 1, 3, and 5 parallel streams. DL Workbench executes every combination of batch and parallel request values, from the minimum to the maximum, with the specified step, as sketched below.
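The sweep can be pictured as a simple grid walk over the two ranges. A minimal Python sketch (the range and step values are placeholders matching the example above):

```python
# Enumerate every batch/stream combination the way the range sweep does:
# each axis goes from its minimum to its maximum with a fixed step.
def sweep(batch_min, batch_max, batch_step,
          stream_min, stream_max, stream_step):
    for batch in range(batch_min, batch_max + 1, batch_step):
        for streams in range(stream_min, stream_max + 1, stream_step):
            yield batch, streams

# Streams 1-5 with step 2 (as in the example), single batch size of 1:
print(list(sweep(1, 1, 1, 1, 5, 2)))  # [(1, 1), (1, 3), (1, 5)]
```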

A graph in the Inference Results section shows a point for each inference run with a particular batch/parallel request configuration.

inference_results_01.png

Right under the graph, you can specify the maximum latency to find the configuration with the best throughput within that limit. The point corresponding to this configuration turns pink.

inference_results_02.png
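Conceptually, this is a filter-and-maximize over the measured points: discard points whose latency exceeds the cap, then take the one with the highest throughput. A minimal sketch with hypothetical measurements:

```python
# Each point is (batch, streams, latency_ms, throughput_fps);
# the numbers below are made up for illustration.
points = [
    (1, 1, 12.0, 83.0),
    (2, 2, 21.0, 190.0),
    (4, 2, 38.0, 210.0),
    (4, 4, 55.0, 230.0),
]

max_latency_ms = 40.0  # the latency cap entered under the graph
feasible = [p for p in points if p[2] <= max_latency_ms]
best = max(feasible, key=lambda p: p[3])
print(best)  # (4, 2, 38.0, 210.0): best throughput within the latency cap
```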

To view information about latency, throughput, batch, and parallel requests of a specific job, hover your mouse over the corresponding point on the graph.

inference_results_03.png