runParallelAsync method

Future<List<List<OrtValue?>>> runParallelAsync(
  List<Map<String, OrtValue>> inputsList,
  OrtRunOptions runOptions, [
  List<String>? outputNames,
  Duration timeout = const Duration(seconds: 5),
])

Runs multiple inference operations in parallel, one isolate per inference, allowing true parallel execution. All isolates are automatically cleaned up after completion. Returns a list of results in the same order as the input list.
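
For example (a minimal sketch; `session` is an existing session object, and `inputA`/`inputB` are hypothetical OrtValue tensors prepared by the caller):

final inputsList = [
  {'input': inputA}, // map of input tensor name -> OrtValue
  {'input': inputB},
];
final runOptions = OrtRunOptions();

// One isolate per entry; results come back in input order.
final results = await session.runParallelAsync(inputsList, runOptions);
final firstOutputs = results[0]; // outputs for inputsList[0]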

Performance Characteristics vs runAsync():

Option 1: Multiple calls to runAsync() (single persistent isolate)

for (var input in inputs) {
  await session.runAsync(runOptions, input); // Sequential
}
  • Pros: No isolate creation overhead, memory efficient
  • Cons: Inferences run SEQUENTIALLY (not parallel!)
  • Speed: Slower for batch processing (no parallelism)
  • Use when: Processing a stream of requests over time
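
A sketch of that pattern (assuming `session` and `runOptions` already exist, and `requestStream` is a hypothetical Stream<Map<String, OrtValue>> of incoming requests):

// One persistent isolate handles requests as they arrive; no
// per-request isolate creation, but no parallelism either.
await for (final input in requestStream) {
  final outputs = await session.runAsync(runOptions, input);
  // ...consume outputs...
}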

Option 2: Multiple runOnceAsync() calls, or runParallelAsync() (multiple isolates)

await session.runParallelAsync(inputs, runOptions); // Parallel
  • Pros: TRUE PARALLEL execution (if the ONNX session supports concurrent calls)
  • Cons: Isolate creation overhead (~1-2ms per isolate)
  • Speed: Faster for batch processing (parallel execution)
  • Use when: Processing multiple inputs at once

Critical Factor: Does ONNX Runtime support concurrent calls?

  • If the native ONNX session is thread-safe: Multiple isolates = parallel execution
  • If not thread-safe: Multiple isolates will serialize at the native level anyway
  • Most ONNX Runtime sessions ARE thread-safe by default

Optimal Thread/Isolate Configuration:

  • CPU Cores = 8, Batch Size = 4:
    • Option A: 4 isolates × 2 intra-op threads each = utilize all 8 cores
    • Option B: 1 isolate × 8 intra-op threads = utilize all 8 cores for each inference
    • Option A is better for batch, Option B is better for single large inference
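
A sketch of Option A (assuming this package's session options expose an intra-op thread setting such as setIntraOpNumThreads, mirroring native ONNX Runtime, and that sessions are built via OrtSession.fromFile; `modelFile` and `batchOfFour` are hypothetical):

// Option A: 4 isolates × 2 intra-op threads each = 8 cores busy.
final sessionOptions = OrtSessionOptions()
  ..setIntraOpNumThreads(2); // 2 threads per inference
final session = OrtSession.fromFile(modelFile, sessionOptions);

// runParallelAsync spawns one isolate per input (4 here), and each
// native inference uses its 2 configured intra-op threads.
final results = await session.runParallelAsync(batchOfFour, runOptions);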

Recommendation:

  • Single inference: Use runAsync() with a high intra-op thread count
  • Batch inference: Use runParallelAsync() with fewer intra-op threads per isolate
  • Stream of requests: Use persistent runAsync() to avoid isolate overhead
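
To decide empirically on a given device, time both paths (a minimal sketch using dart:core's Stopwatch; `inputs` and `runOptions` are assumed to exist):

// Sequential baseline: one persistent isolate, one inference at a time.
final sw = Stopwatch()..start();
for (final input in inputs) {
  await session.runAsync(runOptions, input);
}
print('sequential: ${sw.elapsedMilliseconds} ms');

// Parallel path: one isolate per inference.
sw..reset()..start();
await session.runParallelAsync(inputs, runOptions);
print('parallel: ${sw.elapsedMilliseconds} ms');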

Implementation

Future<List<List<OrtValue?>>> runParallelAsync(
    List<Map<String, OrtValue>> inputsList,
    OrtRunOptions runOptions,
    [List<String>? outputNames,
    Duration timeout = const Duration(seconds: 5)]) async {
  // Create a list of futures for parallel execution
  final futures = <Future<List<OrtValue?>>>[];

  for (final inputs in inputsList) {
    // Each inference gets its own isolate for true parallelism
    futures.add(runOnceAsyncWithTimeout(
      runOptions,
      inputs,
      timeout,
      outputNames,
    ));
  }

  // Wait for all inferences to complete in parallel
  return await Future.wait(futures);
}
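
Note that Future.wait rethrows if any inference fails, discarding the successful results. Callers who prefer partial results can wrap each future before waiting (a sketch, not part of this method):

// Map each inference to success-or-null so one failure does not
// discard the rest of the batch.
final settled = await Future.wait(futures.map(
  (f) => f.then<List<OrtValue?>?>((v) => v, onError: (Object _) => null),
));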