runParallelAsync method
Runs multiple inference operations in parallel using separate isolates. Each inference runs in its own isolate, allowing truly parallel execution (provided the native ONNX Runtime session supports concurrent calls; see the critical factor below). All isolates are automatically cleaned up after completion. Returns a list of results in the same order as the input list.
Performance Characteristics vs runAsync():
Option 1: Multiple calls to runAsync() (single persistent isolate)

```dart
for (final input in inputs) {
  await session.runAsync(runOptions, input); // Runs sequentially
}
```
- ✅ Pros: No isolate creation overhead, memory efficient
- ❌ Cons: Inferences run SEQUENTIALLY (not parallel!)
- Speed: Slower for batch processing (no parallelism)
- Use when: Processing a stream of requests over time
Option 2: Multiple runOnceAsync() calls, or a single runParallelAsync() call (one isolate per inference)

```dart
await session.runParallelAsync(inputs, runOptions); // Parallel
```
- ✅ Pros: TRUE PARALLEL execution (if ONNX session supports it)
- ❌ Cons: Isolate creation overhead (~1-2ms per isolate)
- Speed: Faster for batch processing (parallel execution)
- Use when: Processing multiple inputs at once
Critical Factor: Does ONNX Runtime support concurrent calls?
- If the native ONNX session is thread-safe: Multiple isolates = parallel execution
- If not thread-safe: Multiple isolates will serialize at the native level anyway
- Most ONNX Runtime sessions ARE thread-safe by default
Optimal Thread/Isolate Configuration:
- CPU Cores = 8, Batch Size = 4:
- Option A: 4 isolates × 2 intra-op threads each = utilize all 8 cores
- Option B: 1 isolate × 8 intra-op threads = utilize all 8 cores for each inference
- Option A is better for batch, Option B is better for single large inference
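Option A above is configured when the session is created, by capping the intra-op thread count per session. A minimal sketch, assuming the Dart binding exposes `OrtSessionOptions` with a `setIntraOpNumThreads` method (names may differ in your version of the binding):

```dart
// Sketch only: class and method names are assumptions about the binding's API.
final sessionOptions = OrtSessionOptions()
  ..setIntraOpNumThreads(2); // Option A: 2 intra-op threads per isolate
final session = OrtSession.fromFile(modelFile, sessionOptions);
```

With 4 isolates each holding 2 intra-op threads, the batch saturates all 8 cores without oversubscribing them.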
Recommendation:
- Single inference: Use runAsync() with high intra-op threads
- Batch inference: Use runParallelAsync() with lower threads per isolate
- Stream of requests: Use persistent runAsync() to avoid isolate overhead
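For the batch case, runParallelAsync takes one input map per inference. A usage sketch (tensor construction is assumed and depends on your binding):

```dart
final runOptions = OrtRunOptions();
final inputsList = [
  {'input': tensorA}, // one Map<String, OrtValue> per inference
  {'input': tensorB},
  {'input': tensorC},
  {'input': tensorD},
];
// All four inferences run concurrently, each in its own isolate.
final results = await session.runParallelAsync(inputsList, runOptions);
// results[i] holds the outputs for inputsList[i], preserving input order.
```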
Implementation
```dart
Future<List<List<OrtValue?>>> runParallelAsync(
    List<Map<String, OrtValue>> inputsList,
    OrtRunOptions runOptions,
    [List<String>? outputNames,
    Duration timeout = const Duration(seconds: 5)]) async {
  // Collect futures first so every inference starts before any is awaited.
  final futures = <Future<List<OrtValue?>>>[];
  for (final inputs in inputsList) {
    // Each inference gets its own isolate for true parallelism.
    futures.add(runOnceAsyncWithTimeout(
      runOptions,
      inputs,
      timeout,
      outputNames,
    ));
  }
  // Wait for all inferences to complete; results preserve input order.
  return await Future.wait(futures);
}
```
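The helper `runOnceAsyncWithTimeout` is not shown above. One plausible shape, assuming a `runOnceAsync` that spawns a fresh isolate per call, is a sketch like this (not the actual implementation):

```dart
// Sketch only: assumes runOnceAsync(runOptions, inputs, outputNames) exists
// and spawns one isolate per call.
Future<List<OrtValue?>> runOnceAsyncWithTimeout(
    OrtRunOptions runOptions,
    Map<String, OrtValue> inputs,
    Duration timeout,
    [List<String>? outputNames]) {
  // Delegate to the single-shot isolate runner and enforce the deadline;
  // Future.timeout completes with a TimeoutException if the inference overruns.
  return runOnceAsync(runOptions, inputs, outputNames).timeout(timeout);
}
```

Note that `Future.timeout` does not kill the underlying isolate; if the caller must reclaim resources on timeout, the isolate would need to be terminated explicitly.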