onnxruntime_v2 1.23.2+1

A Flutter plugin for OnnxRuntime that provides an easy, flexible, and fast Dart API to integrate ONNX models in Flutter apps across mobile and desktop platforms.

This is a fork of the original onnxruntime Flutter plugin, which appears to be no longer maintained. The fork adds support for 16 KB memory page sizes and full GPU and hardware acceleration.

OnnxRuntime Plugin #


Overview #

A Flutter plugin for OnnxRuntime, via dart:ffi, that provides an easy, flexible, and fast Dart API to integrate ONNX models in Flutter apps across mobile and desktop platforms.

Platform        Android          iOS    Linux    macOS    Windows
Compatibility   API level 21+    *      *        *        *
Architecture    arm32/arm64      *      *        *        *

*: Consistent with Flutter

Key Features #

  • Multi-platform support for Android, iOS, Linux, macOS, Windows, and Web (coming soon).
  • Flexibility to use any ONNX model.
  • Acceleration using multi-threading.
  • Similar structure to the OnnxRuntime Java and C# APIs.
  • Inference speed comparable to native Android/iOS apps built with the Java/Objective-C API.
  • Run inference in separate isolates to avoid jank on the UI thread.

Getting Started #

In your Flutter project, add the dependency:

dependencies:
  ...
  onnxruntime_v2: x.y.z

Usage example #

Import #

import 'package:onnxruntime_v2/onnxruntime_v2.dart';

Initializing environment #

OrtEnv.instance.init();

Creating the Session #

final sessionOptions = OrtSessionOptions();

// 🚀 NEW: Automatically use GPU acceleration if available!
// This will try GPU providers first, then fall back to CPU
sessionOptions.appendDefaultProviders();

const assetFileName = 'assets/models/test.onnx';
final rawAssetFile = await rootBundle.load(assetFileName);
final bytes = rawAssetFile.buffer.asUint8List();
final session = OrtSession.fromBuffer(bytes, sessionOptions);

Performing inference #

// `data` is the flattened input, e.g. a Float32List whose length matches the
// shape below (1 * 2 * 3 = 6 elements).
final shape = [1, 2, 3];
final inputOrt = OrtValueTensor.createTensorWithDataList(data, shape);
final inputs = {'input': inputOrt};
final runOptions = OrtRunOptions();
final outputs = await session.runAsync(runOptions, inputs);
inputOrt.release();
runOptions.release();
outputs?.forEach((element) {
  element?.release();
});
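
The output tensors can be read before they are released. A minimal sketch, assuming OrtValueTensor exposes its data through a value getter as in the upstream onnxruntime plugin:

final first = outputs?.first;
if (first is OrtValueTensor) {
  // For a [1, 2, 3] output this is a nested List mirroring the tensor's shape.
  print(first.value);
}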

Releasing environment #

OrtEnv.instance.release();

🚀 GPU Acceleration #

This fork includes full support for GPU and hardware acceleration across multiple platforms!

Supported Execution Providers #

Provider    Platform         Hardware                Speedup
CUDA        Windows/Linux    NVIDIA GPU              5-10x
TensorRT    Windows/Linux    NVIDIA GPU              10-20x
DirectML    Windows          AMD/Intel/NVIDIA GPU    3-8x
ROCm        Linux            AMD GPU                 5-10x
CoreML      iOS/macOS        Apple Neural Engine     5-15x
NNAPI       Android          NPU/GPU                 3-7x
OpenVINO    Windows/Linux    Intel GPU/VPU           3-6x
DNNL        All              Intel CPU               2-4x
XNNPACK     All              CPU optimizations       1.5-3x

Quick Start: Automatic GPU Selection #

The easiest way to enable GPU acceleration:

final sessionOptions = OrtSessionOptions();
sessionOptions.appendDefaultProviders(); // 🎯 That's it!

This automatically selects the best available provider in this order:

  1. GPU: CUDA → DirectML → ROCm
  2. NPU: CoreML → NNAPI → QNN
  3. Optimized CPU: DNNL → XNNPACK
  4. Fallback: Standard CPU
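
If a provider is unavailable on the current device, appendDefaultProviders() simply moves on to the next one. To reproduce that fallback by hand, here is one possible sketch using only the append methods shown in this README (each call may throw if the provider is not compiled into the bundled runtime):

final sessionOptions = OrtSessionOptions();
try {
  // Prefer an NVIDIA GPU when the CUDA provider is available.
  sessionOptions.appendCudaProvider(CUDAFlags.useArena);
} catch (_) {
  try {
    // Otherwise try DirectML (Windows, any GPU vendor).
    sessionOptions.appendDirectMLProvider();
  } catch (_) {
    // No GPU provider available; fall back to the plain CPU provider.
    sessionOptions.appendCPUProvider(CPUFlags.useArena);
  }
}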

Manual Provider Selection #

For fine-grained control:

// NVIDIA GPU (Windows/Linux)
sessionOptions.appendCudaProvider(CUDAFlags.useArena);

// NVIDIA with TensorRT optimizations + FP16
sessionOptions.appendTensorRTProvider({'trt_fp16_enable': '1'});

// DirectML for Windows (any GPU)
sessionOptions.appendDirectMLProvider();

// Apple Neural Engine (iOS/macOS)
sessionOptions.appendCoreMLProvider(CoreMLFlags.useNone);

// Android acceleration
sessionOptions.appendNnapiProvider(NnapiFlags.useNone);

// AMD GPU on Linux
sessionOptions.appendRocmProvider(ROCmFlags.useArena);

// Intel optimization
sessionOptions.appendDNNLProvider(DNNLFlags.useArena);

// Always add CPU as fallback
sessionOptions.appendCPUProvider(CPUFlags.useArena);
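
Since most of these providers are platform specific, a common pattern is to branch on the current platform before appending. A sketch using dart:io's Platform checks together with the append methods above:

import 'dart:io' show Platform;

void configureProviders(OrtSessionOptions options) {
  if (Platform.isIOS || Platform.isMacOS) {
    options.appendCoreMLProvider(CoreMLFlags.useNone);
  } else if (Platform.isAndroid) {
    options.appendNnapiProvider(NnapiFlags.useNone);
  } else if (Platform.isWindows) {
    options.appendDirectMLProvider();
  }
  // Keep CPU as the final fallback on every platform.
  options.appendCPUProvider(CPUFlags.useArena);
}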

Performance Tips #

  1. Use appendDefaultProviders() first - it handles everything automatically
  2. CUDA vs TensorRT: TensorRT is faster but takes longer to initialize
  3. DirectML: Great for cross-vendor support on Windows
  4. Mobile: CoreML (iOS) and NNAPI (Android) provide massive speedups
  5. Thread count: Set setIntraOpNumThreads() to your CPU core count for CPU inference (see the sketch below)
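
For CPU-only inference, the thread count can be matched to the machine's core count. A minimal sketch, assuming setIntraOpNumThreads() follows the upstream onnxruntime plugin API:

import 'dart:io' show Platform;

final sessionOptions = OrtSessionOptions()
  ..setIntraOpNumThreads(Platform.numberOfProcessors);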

GPU Setup Requirements #

Windows (NVIDIA):

  • Install the CUDA toolkit and cuDNN
  • Optional: TensorRT

Linux (NVIDIA):

  • Install CUDA runtime: apt install nvidia-cuda-toolkit
  • Optional: TensorRT

Linux (AMD):

  • Install the ROCm runtime

Windows (Any GPU):

  • DirectML works out-of-the-box on Windows 10+

iOS/macOS:

  • CoreML works automatically (no setup needed)

Android:

  • NNAPI works automatically on Android 8.1+ (no setup needed)

Troubleshooting #

If GPU acceleration isn't working:

  1. Check available providers:
OrtEnv.instance.availableProviders().forEach((provider) {
  print('Available: $provider');
});
  2. Catch provider errors gracefully:
try {
  sessionOptions.appendCudaProvider(CUDAFlags.useArena);
} catch (e) {
  print('CUDA not available, falling back to CPU');
  sessionOptions.appendCPUProvider(CPUFlags.useArena);
}
  3. Verify GPU runtime is installed (CUDA, DirectML, etc.)

  4. Check that you're using the GPU-enabled ONNX Runtime library
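
A quick way to confirm the last two points at runtime is to inspect the providers reported by the native library. A sketch; the provider names such as 'CUDAExecutionProvider' follow ONNX Runtime's standard naming:

final providers = OrtEnv.instance.availableProviders();
final hasGpu = providers.any((p) =>
    p == 'CUDAExecutionProvider' ||
    p == 'DmlExecutionProvider' ||
    p == 'ROCMExecutionProvider');
if (!hasGpu) {
  print('Only these providers are available: $providers - '
      'install the GPU runtime or switch to a GPU-enabled build.');
}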
