runAsync method
Performs inference asynchronously in a persistent isolate that stays alive for reuse across multiple calls. This is efficient for repeated inference because it avoids isolate-creation overhead. To shut the isolate down, call killIsolate() or release(). The default timeout is 5 seconds; use runAsyncWithTimeout() for a custom timeout.
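A minimal usage sketch, assuming an open session and a prepared input tensor; the variable names (`session`, `inputTensor`) and the input key `'input'` are placeholders that must match your own model, and the `OrtRunOptions()` constructor and `release()` call are assumed from the package's general API shape:

```dart
// Sketch only: `session` is an existing OrtSession and `inputTensor` a
// prepared OrtValue; 'input' must match your model's actual input name.
final runOptions = OrtRunOptions();
final outputs = await session.runAsync(
  runOptions,
  {'input': inputTensor},
);
// ... consume `outputs` ...
runOptions.release();

// When no further inference is needed, free the persistent isolate:
session.killIsolate(); // or session.release()
```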
Architecture Note:
CRITICAL UNDERSTANDING: ONNX Runtime computation happens in NATIVE C++ memory, OUTSIDE of Dart's isolate memory space!
When you call inference:
- Dart isolate sends data pointers to native ONNX Runtime via FFI
- ONNX Runtime (C++ library) performs the actual computation in native memory
- Results are returned back to the Dart isolate via pointers
THE KEY INSIGHT:
- runAsync(): 1 isolate → 1 active native call at a time (sequential)
- runOnceAsync(): N isolates → N active native calls (parallel!)
Why? Because each isolate can make ONE synchronous FFI call at a time. Multiple isolates = Multiple simultaneous FFI calls = Parallel execution!
The native ONNX Runtime CAN handle multiple concurrent calls (it's thread-safe), but a single Dart isolate can only make one blocking FFI call at a time.
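This constraint is easy to reproduce in plain Dart, with no ONNX involved: a synchronous CPU-bound function stands in for a blocking FFI call. Run it twice from one isolate and the calls serialize; run it from two isolates and they overlap. A self-contained sketch using only `dart:isolate`:

```dart
import 'dart:isolate';

// Stand-in for a blocking native FFI call: synchronous, CPU-bound work
// that occupies its isolate completely until it returns.
int blockingWork(int n) {
  var acc = 0;
  for (var i = 0; i < n; i++) {
    acc = (acc + i) % 1000003;
  }
  return acc;
}

Future<void> main() async {
  const n = 50000000;
  final sw = Stopwatch()..start();

  // Sequential: each await blocks until the worker isolate returns.
  await Isolate.run(() => blockingWork(n));
  await Isolate.run(() => blockingWork(n));
  print('sequential: ${sw.elapsedMilliseconds} ms');

  sw.reset();
  // Parallel: two isolates make two simultaneous blocking calls.
  await Future.wait([
    Isolate.run(() => blockingWork(n)),
    Isolate.run(() => blockingWork(n)),
  ]);
  print('parallel:   ${sw.elapsedMilliseconds} ms'); // roughly half, on 2+ cores
}
```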
Threading Layers:
- Dart Isolates: Provide concurrency for Dart code (message passing, orchestration)
- runAsync() uses 1 persistent isolate
- runOnceAsync() creates new isolates for parallel orchestration
- ONNX Native Threads: Actual computation parallelism in C++ (shared memory)
- setInterOpNumThreads() - parallel operator execution
- setIntraOpNumThreads() - parallelism within operators
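As a sketch of the second layer, assuming the thread-count setters above live on a session-options object configured before the session is created (the cascade style here is illustrative, not the package's exact builder API):

```dart
// Sketch: size the native ONNX thread pools before creating the session.
final sessionOptions = OrtSessionOptions()
  ..setIntraOpNumThreads(4)  // parallelism inside a single operator
  ..setInterOpNumThreads(2); // independent operators run concurrently
```

A common starting point is setting intra-op threads near the physical core count for one large model, and lowering it when several sessions share the CPU.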
Why use isolates then?
- To avoid blocking the main UI thread while waiting for native computation
- To orchestrate multiple concurrent inference requests
- To handle pre/post-processing in parallel
- NOT for the actual neural network math (that's native C++)
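The pre/post-processing point above can be sketched with the standard `Isolate.run` helper; `decodeAndNormalize` is a placeholder for your own CPU-heavy preparation step, not part of the package API:

```dart
import 'dart:isolate';

// Placeholder pre-processing: scale raw 0-255 bytes into 0.0-1.0 floats.
// Real pipelines would also decode, resize, and normalize per channel.
List<double> decodeAndNormalize(List<int> rawBytes) =>
    rawBytes.map((b) => b / 255.0).toList();

Future<void> main() async {
  final rawBytes = List<int>.generate(224 * 224 * 3, (i) => i % 256);
  // Runs on a short-lived worker isolate; the main (UI) isolate stays free.
  final pixels = await Isolate.run(() => decodeAndNormalize(rawBytes));
  print(pixels.length); // 150528
}
```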
Performance Example (8-core CPU, 10 images to process):
// SLOW: Sequential with runAsync() - ~1000ms total
for (var image in images) {
  await session.runAsync(runOptions, image); // Each waits for the previous
}

// CRASH WARNING: This will throw an error!
final futures = images.map((img) => session.runAsync(runOptions, img));
await Future.wait(futures); // ERROR: Concurrent calls to the same isolate!

// FAST: Parallel with runParallelAsync() - ~200ms total
await session.runParallelAsync(images, runOptions); // All run together

// FASTEST for a single large model: configure threads optimally
options.setIntraOpNumThreads(8); // Use all cores for one inference
await session.runAsync(runOptions, largeInput); // Single call, but fast
Visual: Why runOnceAsync() is parallel but runAsync() is not:

runAsync() - Single Isolate:

┌─────────────┐
│  Isolate 1  │
│   Call 1    │ → ONNX (blocks)
│   Call 2    │ ← (waiting...)
│   Call 3    │ ← (waiting...)
└─────────────┘
Result: Sequential execution

runOnceAsync() - Multiple Isolates:

┌─────────────┐   ┌─────────────┐
│  Isolate 1  │   │  Isolate 2  │
│   Call 1    │   │   Call 2    │
└──────┬──────┘   └──────┬──────┘
       ↓                 ↓
┌─────────────────────────────┐
│     ONNX Runtime (C++)      │
│    Processing BOTH calls    │
│        in parallel!         │
└─────────────────────────────┘
Result: Parallel execution
Implementation
Future<List<OrtValue?>>? runAsync(
    OrtRunOptions runOptions, Map<String, OrtValue> inputs,
    [List<String>? outputNames]) {
  // Create the persistent isolate session if it doesn't exist.
  // This isolate is reused across all runAsync calls for efficiency.
  _persistentIsolateSession ??= OrtIsolateSession(this);
  return _persistentIsolateSession?.run(runOptions, inputs, outputNames);
}