Benchmark API Reference
InferenceLatencyBenchmark
InferenceLatencyBenchmark(
    max_iters: int | None = 1000,
    warmup_iters: int = 1,
    max_duration: int | None = 60000,
)
Measures per-chunk latency of an InferenceModel. The measured loop stops at whichever bound is reached first: max_iters, max_duration (milliseconds), or input exhaustion. Pass None to disable a bound.
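The stopping rule can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: `measured_loop`, `step`, and `max_duration_ms` are hypothetical names chosen for illustration, and the exact point at which the real benchmark checks each bound is an assumption.

```python
import time
from typing import Any, Callable, Iterable, Optional


def measured_loop(
    step: Callable[[Any], Any],            # hypothetical stand-in for one model call
    inputs: Iterable[Any],
    max_iters: Optional[int] = 1000,
    max_duration_ms: Optional[int] = 60000,
) -> list[float]:
    """Time step(x) per input until the first bound is reached."""
    times: list[float] = []
    start = time.perf_counter()
    for x in inputs:
        # Iteration bound; None disables it.
        if max_iters is not None and len(times) >= max_iters:
            break
        # Wall-clock bound in milliseconds; None disables it.
        if max_duration_ms is not None:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            if elapsed_ms >= max_duration_ms:
                break
        t0 = time.perf_counter()
        step(x)
        times.append(time.perf_counter() - t0)  # seconds
    # If neither break fired, the loop ended because the inputs ran out.
    return times


# Five inputs are available, but the iteration bound stops the loop first.
times = measured_loop(lambda x: x * 2, range(5), max_iters=3, max_duration_ms=None)
print(len(times))  # 3
```

With both bounds set to None, this sketch runs until the inputs are exhausted, so an unbounded input iterable would never terminate.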
run
metrics = benchmark.run(model, inputs=None)
inputs is an iterable of observation dicts compatible with model. When inputs is None, random inputs are generated from the model.input_features specifications; this requires the exported package to declare input features.
Runs warmup_iters warmup iterations followed by the measured loop, then returns a dict of summary statistics. All times are per-iteration values in seconds; the iteration count is a plain number:
| Key | Meaning |
|---|---|
| | Mean per-iteration time during warmup. |
| | Number of measured iterations. |
| | Fastest measured iteration. |
| | Slowest measured iteration. |
| | Mean measured iteration. |
| | Median measured iteration. |
| | Population standard deviation (0.0 if n == 1). |
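As a rough sketch of how the measured-loop statistics relate to the raw timings, the following computes them with Python's standard `statistics` module. The function name `summarize` and the dict keys are placeholders invented for this example; they are not the keys returned by run, and the warmup mean is omitted.

```python
import statistics


def summarize(times: list[float]) -> dict[str, float]:
    """Summarize measured per-iteration times, given in seconds."""
    if not times:
        raise ValueError("need at least one measured iteration")
    return {
        # Placeholder key names, not the library's actual keys.
        "n": len(times),
        "min": min(times),                      # fastest iteration
        "max": max(times),                      # slowest iteration
        "mean": statistics.fmean(times),
        "median": statistics.median(times),
        # Population standard deviation (divides by n, not n - 1),
        # which is 0.0 for a single sample.
        "std": statistics.pstdev(times),
    }


stats = summarize([1.0, 2.0, 3.0, 4.0])
print(stats["mean"], stats["median"])  # 2.5 2.5
```

The population form is what makes the single-sample case well defined: the sample standard deviation (`statistics.stdev`) raises an error for one data point, whereas `statistics.pstdev` returns 0.0.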