onnxruntime_v2 1.23.2+1
onnxruntime_v2: ^1.23.2+1
Flutter plugin for OnnxRuntime that provides an easy, flexible, and fast Dart API to integrate ONNX models in Flutter apps across mobile and desktop platforms.

This is a fork of the original onnxruntime Flutter plugin, which appears to be no longer maintained. The fork adds support for the 16 KB memory page size and for full GPU and hardware acceleration.
OnnxRuntime Plugin #
Overview #
A Flutter plugin for OnnxRuntime, built on dart:ffi, that provides an easy, flexible, and fast Dart API to integrate ONNX models in Flutter apps across mobile and desktop platforms.
| Platform | Android | iOS | Linux | macOS | Windows |
|---|---|---|---|---|---|
| Compatibility | API level 21+ | * | * | * | * |
| Architecture | arm32/arm64 | * | * | * | * |
Key Features #
- Multi-platform support for Android, iOS, Linux, macOS, Windows, and Web (coming soon).
- Flexibility to use any ONNX model.
- Acceleration using multi-threading.
- Structure similar to the OnnxRuntime Java and C# APIs.
- Inference is no slower than native Android/iOS apps built with the Java/Objective-C APIs.
- Runs inference in separate isolates to prevent jank on the UI thread.
Getting Started #
In your Flutter project, add the dependency:
dependencies:
  ...
  onnxruntime_v2: x.y.z
Usage example #
Import #
import 'package:onnxruntime_v2/onnxruntime_v2.dart';
Initializing environment #
OrtEnv.instance.init();
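To see the call in context, here is a minimal sketch of initializing the environment once at app startup, before any session is created. MyApp is a placeholder for your root widget, not part of this package.
import 'package:flutter/material.dart';
import 'package:onnxruntime_v2/onnxruntime_v2.dart';

void main() {
  // Ensure bindings exist before touching plugins, then set up the ORT environment once.
  WidgetsFlutterBinding.ensureInitialized();
  OrtEnv.instance.init();
  runApp(const MyApp()); // MyApp is a placeholder for your root widget
}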
Creating the Session #
final sessionOptions = OrtSessionOptions();
// NEW: Automatically use GPU acceleration if available!
// This will try GPU providers first, then fall back to CPU
sessionOptions.appendDefaultProviders();
const assetFileName = 'assets/models/test.onnx';
final rawAssetFile = await rootBundle.load(assetFileName);
final bytes = rawAssetFile.buffer.asUint8List();
final session = OrtSession.fromBuffer(bytes, sessionOptions);
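The same steps can be wrapped in a small helper. This is a sketch using only the calls shown above; the function name loadSessionFromAsset is illustrative, and rootBundle comes from package:flutter/services.dart.
import 'package:flutter/services.dart' show rootBundle;
import 'package:onnxruntime_v2/onnxruntime_v2.dart';

// Illustrative helper: load an ONNX model from the asset bundle and create a
// session that prefers GPU providers and falls back to CPU.
Future<OrtSession> loadSessionFromAsset(String assetFileName) async {
  final sessionOptions = OrtSessionOptions()..appendDefaultProviders();
  final rawAssetFile = await rootBundle.load(assetFileName);
  final bytes = rawAssetFile.buffer.asUint8List();
  return OrtSession.fromBuffer(bytes, sessionOptions);
}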
Performing inference #
// Example input: a Float32List (from dart:typed_data) with 1 * 2 * 3 = 6 values.
final data = Float32List.fromList([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]);
final shape = [1, 2, 3];
final inputOrt = OrtValueTensor.createTensorWithDataList(data, shape);
final inputs = {'input': inputOrt};
final runOptions = OrtRunOptions();
final outputs = await session.runAsync(runOptions, inputs);
inputOrt.release();
runOptions.release();
outputs?.forEach((element) {
  element?.release();
});
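If you need the prediction itself, read the tensor data before releasing the outputs (rather than releasing them immediately as above). A minimal sketch, assuming OrtValueTensor exposes the value getter from the upstream onnxruntime plugin; the concrete Dart type depends on your model's output.
final outputTensor = outputs?.first as OrtValueTensor?;
// Assumption: `value` returns the tensor data as a (possibly nested) Dart list.
final result = outputTensor?.value;
print('Model output: $result');
outputTensor?.release();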
Releasing environment #
OrtEnv.instance.release();
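In a typical Flutter app, the session and environment are released together when the owning State is disposed. A sketch, assuming _session is a field holding the session created earlier and that OrtSession exposes a release() method like the other ORT objects above.
@override
void dispose() {
  _session?.release();       // assumed field holding the OrtSession
  OrtEnv.instance.release(); // tear down the ORT environment last
  super.dispose();
}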
GPU Acceleration #
This fork includes full support for GPU and hardware acceleration across multiple platforms!
Supported Execution Providers #
| Provider | Platform | Hardware | Speedup |
|---|---|---|---|
| CUDA | Windows/Linux | NVIDIA GPU | 5-10x |
| TensorRT | Windows/Linux | NVIDIA GPU | 10-20x |
| DirectML | Windows | AMD/Intel/NVIDIA GPU | 3-8x |
| ROCm | Linux | AMD GPU | 5-10x |
| CoreML | iOS/macOS | Apple Neural Engine | 5-15x |
| NNAPI | Android | NPU/GPU/DSP | 3-7x |
| OpenVINO | Windows/Linux | Intel GPU/VPU | 3-6x |
| DNNL | All | Intel CPU | 2-4x |
| XNNPACK | All | CPU optimizations | 1.5-3x |
Quick Start: Automatic GPU Selection #
The easiest way to enable GPU acceleration:
final sessionOptions = OrtSessionOptions();
sessionOptions.appendDefaultProviders(); // That's it!
This automatically selects the best available provider in this order:
- GPU: CUDA → DirectML → ROCm
- NPU: CoreML → NNAPI → QNN
- Optimized CPU: DNNL → XNNPACK
- Fallback: Standard CPU
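To see what the automatic selection has to choose from on a given device or build, you can log the providers the runtime reports as available (the same availableProviders() call used in Troubleshooting below):
final sessionOptions = OrtSessionOptions()..appendDefaultProviders();
// Log what the runtime can actually offer on this device/build.
for (final provider in OrtEnv.instance.availableProviders()) {
  print('Available execution provider: $provider');
}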
Manual Provider Selection #
For fine-grained control:
// NVIDIA GPU (Windows/Linux)
sessionOptions.appendCudaProvider(CUDAFlags.useArena);
// NVIDIA with TensorRT optimizations + FP16
sessionOptions.appendTensorRTProvider({'trt_fp16_enable': '1'});
// DirectML for Windows (any GPU)
sessionOptions.appendDirectMLProvider();
// Apple Neural Engine (iOS/macOS)
sessionOptions.appendCoreMLProvider(CoreMLFlags.useNone);
// Android acceleration
sessionOptions.appendNnapiProvider(NnapiFlags.useNone);
// AMD GPU on Linux
sessionOptions.appendRocmProvider(ROCmFlags.useArena);
// Intel optimization
sessionOptions.appendDNNLProvider(DNNLFlags.useArena);
// Always add CPU as fallback
sessionOptions.appendCPUProvider(CPUFlags.useArena);
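For example, these calls can be combined with platform checks and a try/catch so each OS gets its preferred accelerator while CPU remains the fallback. A sketch using only the providers listed above; the helper name buildSessionOptions is illustrative.
import 'dart:io' show Platform;

OrtSessionOptions buildSessionOptions() {
  final options = OrtSessionOptions();
  try {
    if (Platform.isWindows || Platform.isLinux) {
      options.appendCudaProvider(CUDAFlags.useArena);    // NVIDIA GPU
    } else if (Platform.isIOS || Platform.isMacOS) {
      options.appendCoreMLProvider(CoreMLFlags.useNone); // Apple Neural Engine
    } else if (Platform.isAndroid) {
      options.appendNnapiProvider(NnapiFlags.useNone);   // Android NNAPI
    }
  } catch (e) {
    // Provider not available on this machine; the CPU fallback below still applies.
    print('Hardware provider unavailable: $e');
  }
  options.appendCPUProvider(CPUFlags.useArena); // always keep CPU as fallback
  return options;
}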
Performance Tips #
- Use appendDefaultProviders() first - it handles everything automatically
- CUDA vs TensorRT: TensorRT is faster but takes longer to initialize
- DirectML: Great for cross-vendor support on Windows
- Mobile: CoreML (iOS) and NNAPI (Android) provide massive speedups
- Thread count: Set setIntraOpNumThreads() to your CPU core count for CPU inference (see the sketch below)
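For the thread-count tip, a minimal sketch using Platform.numberOfProcessors from dart:io:
import 'dart:io' show Platform;

final sessionOptions = OrtSessionOptions()
  ..setIntraOpNumThreads(Platform.numberOfProcessors) // match the CPU core count
  ..appendCPUProvider(CPUFlags.useArena);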
GPU Setup Requirements #
Windows (NVIDIA):
- Install CUDA Toolkit
- Optional: TensorRT for extra speed
Linux (NVIDIA):
- Install the CUDA runtime: apt install nvidia-cuda-toolkit
- Optional: TensorRT
Linux (AMD):
- Install ROCm
Windows (Any GPU):
- DirectML works out-of-the-box on Windows 10+
iOS/macOS:
- CoreML works automatically (no setup needed)
Android:
- NNAPI works automatically on Android 8.1+ (no setup needed)
Troubleshooting #
If GPU acceleration isn't working:
- Check available providers:
OrtEnv.instance.availableProviders().forEach((provider) {
  print('Available: $provider');
});
- Catch provider errors gracefully:
try {
  sessionOptions.appendCudaProvider(CUDAFlags.useArena);
} catch (e) {
  print('CUDA not available, falling back to CPU');
  sessionOptions.appendCPUProvider(CPUFlags.useArena);
}
- Verify the GPU runtime is installed (CUDA, DirectML, etc.)
- Check that you're using the GPU-enabled ONNX Runtime library