runParallelAsync method
Runs multiple inference operations in parallel using separate isolates. Each inference runs in its own isolate, allowing truly parallel execution (provided the native ONNX Runtime session supports concurrent calls; see the critical factor below). All isolates are automatically cleaned up after completion. Returns a list of results in the same order as the input list.
Performance Characteristics vs runAsync():
Option 1: Multiple calls to runAsync() (single persistent isolate)

```dart
for (final input in inputs) {
  await session.runAsync(runOptions, input); // Runs sequentially
}
```
- ✅ Pros: No isolate creation overhead, memory efficient
- ❌ Cons: Inferences run SEQUENTIALLY (not parallel!)
- Speed: Slower for batch processing (no parallelism)
- Use when: Processing a stream of requests over time
Option 2: Multiple runOnceAsync() calls, or a single runParallelAsync() call (one isolate per inference)

```dart
await session.runParallelAsync(inputs, runOptions); // Parallel
```
- ✅ Pros: TRUE PARALLEL execution (if ONNX session supports it)
- ❌ Cons: Isolate creation overhead (~1-2ms per isolate)
- Speed: Faster for batch processing (parallel execution)
- Use when: Processing multiple inputs at once
Critical Factor: Does ONNX Runtime support concurrent calls?
- If the native ONNX session is thread-safe: Multiple isolates = parallel execution
- If not thread-safe: Multiple isolates will serialize at the native level anyway
- Most ONNX Runtime sessions ARE thread-safe by default
Optimal Thread/Isolate Configuration:
- CPU Cores = 8, Batch Size = 4:
- Option A: 4 isolates × 2 intra-op threads each = utilize all 8 cores
- Option B: 1 isolate × 8 intra-op threads = utilize all 8 cores for each inference
- Option A is better for batch, Option B is better for single large inference
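Option A above is configured when the session is created, by capping the intra-op thread count per session. A minimal sketch, assuming the Dart binding exposes `OrtSessionOptions` with a `setIntraOpNumThreads` method (names may differ in your version of the binding):

```dart
// Sketch only: class and method names are assumptions about the binding's API.
final sessionOptions = OrtSessionOptions()
  ..setIntraOpNumThreads(2); // Option A: 2 intra-op threads per isolate
final session = OrtSession.fromFile(modelFile, sessionOptions);
```

With 4 isolates each holding 2 intra-op threads, the batch saturates all 8 cores without oversubscribing them.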
Recommendation:
- Single inference: Use runAsync() with high intra-op threads
- Batch inference: Use runParallelAsync() with lower threads per isolate
- Stream of requests: Use persistent runAsync() to avoid isolate overhead
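For the batch case, runParallelAsync takes one input map per inference. A usage sketch (tensor construction is assumed and depends on your binding):

```dart
final runOptions = OrtRunOptions();
final inputsList = [
  {'input': tensorA}, // one Map<String, OrtValue> per inference
  {'input': tensorB},
  {'input': tensorC},
  {'input': tensorD},
];
// All four inferences run concurrently, each in its own isolate.
final results = await session.runParallelAsync(inputsList, runOptions);
// results[i] holds the outputs for inputsList[i], preserving input order.
```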
Implementation
```dart
Future<List<List<OrtValue?>>> runParallelAsync(
    List<Map<String, OrtValue>> inputsList,
    OrtRunOptions runOptions,
    [List<String>? outputNames,
    Duration timeout = const Duration(seconds: 5)]) async {
  // Collect futures first so every inference starts before any is awaited.
  final futures = <Future<List<OrtValue?>>>[];
  for (final inputs in inputsList) {
    // Each inference gets its own isolate for true parallelism.
    futures.add(runOnceAsyncWithTimeout(
      runOptions,
      inputs,
      timeout,
      outputNames,
    ));
  }
  // Wait for all inferences to complete; results preserve input order.
  return await Future.wait(futures);
}
```
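The helper `runOnceAsyncWithTimeout` is not shown above. One plausible shape, assuming a `runOnceAsync` that spawns a fresh isolate per call, is a sketch like this (not the actual implementation):

```dart
// Sketch only: assumes runOnceAsync(runOptions, inputs, outputNames) exists
// and spawns one isolate per call.
Future<List<OrtValue?>> runOnceAsyncWithTimeout(
    OrtRunOptions runOptions,
    Map<String, OrtValue> inputs,
    Duration timeout,
    [List<String>? outputNames]) {
  // Delegate to the single-shot isolate runner and enforce the deadline;
  // Future.timeout completes with a TimeoutException if the inference overruns.
  return runOnceAsync(runOptions, inputs, outputNames).timeout(timeout);
}
```

Note that `Future.timeout` does not kill the underlying isolate; if the caller must reclaim resources on timeout, the isolate would need to be terminated explicitly.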