runAsync method
Performs inference asynchronously in a persistent isolate that stays alive for reuse across multiple calls. This is efficient for repeated inference because it avoids isolate-creation overhead. To shut the isolate down, call killIsolate() or release(). The default timeout is 5 seconds; use runAsyncWithTimeout() for a custom timeout.
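A minimal usage sketch, assuming an open session and a prepared input tensor; the variable names (`session`, `inputTensor`) and the input key `'input'` are placeholders that must match your own model, and the `OrtRunOptions()` constructor and `release()` call are assumed from the package's general API shape:

```dart
// Sketch only: `session` is an existing OrtSession and `inputTensor` a
// prepared OrtValue; 'input' must match your model's actual input name.
final runOptions = OrtRunOptions();
final outputs = await session.runAsync(
  runOptions,
  {'input': inputTensor},
);
// ... consume `outputs` ...
runOptions.release();

// When no further inference is needed, free the persistent isolate:
session.killIsolate(); // or session.release()
```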
Architecture Note:
CRITICAL UNDERSTANDING: ONNX Runtime computation happens in NATIVE C++ memory, OUTSIDE of Dart's isolate memory space!
When you call inference:
- Dart isolate sends data pointers to native ONNX Runtime via FFI
- ONNX Runtime (C++ library) performs the actual computation in native memory
- Results are returned back to the Dart isolate via pointers
THE KEY INSIGHT:
- runAsync(): 1 isolate → 1 active native call at a time (sequential)
- runOnceAsync(): N isolates → N active native calls (parallel!)
Why? Because each isolate can make ONE synchronous FFI call at a time. Multiple isolates = Multiple simultaneous FFI calls = Parallel execution!
The native ONNX Runtime CAN handle multiple concurrent calls (it's thread-safe), but a single Dart isolate can only make one blocking FFI call at a time.
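This constraint is easy to reproduce in plain Dart, with no ONNX involved: a synchronous CPU-bound function stands in for a blocking FFI call. Run it twice from one isolate and the calls serialize; run it from two isolates and they overlap. A self-contained sketch using only `dart:isolate`:

```dart
import 'dart:isolate';

// Stand-in for a blocking native FFI call: synchronous, CPU-bound work
// that occupies its isolate completely until it returns.
int blockingWork(int n) {
  var acc = 0;
  for (var i = 0; i < n; i++) {
    acc = (acc + i) % 1000003;
  }
  return acc;
}

Future<void> main() async {
  const n = 50000000;
  final sw = Stopwatch()..start();

  // Sequential: each await blocks until the worker isolate returns.
  await Isolate.run(() => blockingWork(n));
  await Isolate.run(() => blockingWork(n));
  print('sequential: ${sw.elapsedMilliseconds} ms');

  sw.reset();
  // Parallel: two isolates make two simultaneous blocking calls.
  await Future.wait([
    Isolate.run(() => blockingWork(n)),
    Isolate.run(() => blockingWork(n)),
  ]);
  print('parallel:   ${sw.elapsedMilliseconds} ms'); // roughly half, on 2+ cores
}
```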
Threading Layers:
- Dart Isolates: Provide concurrency for Dart code (message passing, orchestration)
- runAsync() uses 1 persistent isolate
- runOnceAsync() creates new isolates for parallel orchestration
- ONNX Native Threads: Actual computation parallelism in C++ (shared memory)
- setInterOpNumThreads() - parallel operator execution
- setIntraOpNumThreads() - parallelism within operators
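As a sketch of the second layer, assuming the thread-count setters above live on a session-options object configured before the session is created (the cascade style here is illustrative, not the package's exact builder API):

```dart
// Sketch: size the native ONNX thread pools before creating the session.
final sessionOptions = OrtSessionOptions()
  ..setIntraOpNumThreads(4)  // parallelism inside a single operator
  ..setInterOpNumThreads(2); // independent operators run concurrently
```

A common starting point is setting intra-op threads near the physical core count for one large model, and lowering it when several sessions share the CPU.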
Why use isolates then?
- To avoid blocking the main UI thread while waiting for native computation
- To orchestrate multiple concurrent inference requests
- To handle pre/post-processing in parallel
- NOT for the actual neural network math (that's native C++)
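The pre/post-processing point above can be sketched with the standard `Isolate.run` helper; `decodeAndNormalize` is a placeholder for your own CPU-heavy preparation step, not part of the package API:

```dart
import 'dart:isolate';

// Placeholder pre-processing: scale raw 0-255 bytes into 0.0-1.0 floats.
// Real pipelines would also decode, resize, and normalize per channel.
List<double> decodeAndNormalize(List<int> rawBytes) =>
    rawBytes.map((b) => b / 255.0).toList();

Future<void> main() async {
  final rawBytes = List<int>.generate(224 * 224 * 3, (i) => i % 256);
  // Runs on a short-lived worker isolate; the main (UI) isolate stays free.
  final pixels = await Isolate.run(() => decodeAndNormalize(rawBytes));
  print(pixels.length); // 150528
}
```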
Performance Example (8-core CPU, 10 images to process):
// SLOW: Sequential with runAsync() - ~1000ms total
for (var image in images) {
  await session.runAsync(runOptions, image); // Each waits for the previous
}

// CRASH WARNING: This will throw an error!
final futures = images.map((img) => session.runAsync(runOptions, img));
await Future.wait(futures); // ERROR: Concurrent calls to the same isolate!

// FAST: Parallel with runParallelAsync() - ~200ms total
await session.runParallelAsync(images, runOptions); // All run together

// FASTEST for a single large model: configure threads optimally
options.setIntraOpNumThreads(8); // Use all cores for one inference
await session.runAsync(runOptions, largeInput); // Single call, but fast
Visual: Why runOnceAsync() is parallel but runAsync() is not:

runAsync() - Single Isolate:

┌─────────────┐
│  Isolate 1  │
│   Call 1    │ → ONNX (blocks)
│   Call 2    │ ← (waiting...)
│   Call 3    │ ← (waiting...)
└─────────────┘
Result: Sequential execution

runOnceAsync() - Multiple Isolates:

┌─────────────┐   ┌─────────────┐
│  Isolate 1  │   │  Isolate 2  │
│   Call 1    │   │   Call 2    │
└──────┬──────┘   └──────┬──────┘
       ↓                 ↓
┌─────────────────────────────┐
│     ONNX Runtime (C++)      │
│    Processing BOTH calls    │
│        in parallel!         │
└─────────────────────────────┘
Result: Parallel execution
Implementation
Future<List<OrtValue?>>? runAsync(
    OrtRunOptions runOptions, Map<String, OrtValue> inputs,
    [List<String>? outputNames]) {
  // Create the persistent isolate session if it doesn't exist.
  // This isolate is reused across all runAsync calls for efficiency.
  _persistentIsolateSession ??= OrtIsolateSession(this);
  return _persistentIsolateSession?.run(runOptions, inputs, outputNames);
}