
flutter_ort_genai #

ONNX Runtime GenAI for Flutter - High-performance LLM inference with streaming token generation.


Features #

  • πŸš€ High-performance LLM inference using ONNX Runtime GenAI
  • πŸ“‘ Streaming token generation with cancellation support
  • 🎯 Clean API with Generation object pattern
  • πŸ”§ Dynamic C API resolution (no ORT duplication)
  • πŸ“± Cross-platform: Android, iOS, macOS
  • πŸ›‘οΈ Production-ready with thread safety and proper error handling

Installation #

Add to your pubspec.yaml:

dependencies:
  flutter_onnxruntime: ^x.y.z  # Required peer dependency; use the latest release
  flutter_ort_genai: ^0.1.0

Usage #

Basic Example #

import 'package:flutter_ort_genai/flutter_ort_genai.dart';

// Load a GenAI model
final model = await OrtGenAIModel.load(
  'path/to/genai/model',
  options: GenAIOptions(deviceType: 'cpu'),
);

// Start generation
final generation = model.start(
  'What is Flutter?',
  temperature: 0.7,
  maxTokens: 256,
);

// Stream tokens
await for (final token in generation.stream) {
  print(token);
}

// Or cancel generation early instead of consuming the full stream
await generation.cancel();

// Dispose when done
await model.dispose();

Advanced Usage #

// With all parameters
final generation = model.start(
  prompt,
  temperature: 0.8,
  maxTokens: 512,
  topP: 0.9,
  topK: 40,
  repetitionPenalty: 1.1,
  stopSequences: ['</end>', '\n\n'],
);

// Collect all tokens
final response = await generation.collectAll();

// With timeout
final timedResponse = await generation.collectWithTimeout(
  Duration(seconds: 30),
);

// Get model metadata (if available)
final metadata = await model.getMetadata();
print('Model: ${metadata?.modelName}');
print('Vocab size: ${metadata?.vocabSize}');

Model Requirements #

GenAI models require these files in the model directory:

  • genai_config.json - GenAI configuration
  • tokenizer.json or tokenizer.model - Tokenizer data
  • model.onnx - Model weights (or quantized variants)
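
For reference, an abbreviated genai_config.json looks roughly like the sketch below. This file is normally generated by the onnxruntime-genai model builder rather than written by hand; the values shown are illustrative (loosely modeled on a Phi-3-class model) and the exact fields vary by model:

```json
{
  "model": {
    "type": "phi3",
    "vocab_size": 32064,
    "context_length": 4096,
    "decoder": {
      "filename": "model.onnx",
      "num_attention_heads": 32,
      "num_hidden_layers": 32
    }
  },
  "search": {
    "max_length": 2048,
    "do_sample": false,
    "temperature": 1.0,
    "top_k": 50,
    "top_p": 1.0
  }
}
```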

See BUILD_INSTRUCTIONS.md for model conversion details.
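
As a quick sanity check, the file list above can be verified from the shell before bundling a model with the app. This is a minimal sketch; `MODEL_DIR` is a placeholder path, not a path the plugin defines:

```shell
# Sanity-check a GenAI model directory before bundling it with the app.
# MODEL_DIR is a placeholder; point it at your converted model.
MODEL_DIR="./model"
missing=0
for f in genai_config.json model.onnx; do
  [ -f "$MODEL_DIR/$f" ] || { echo "missing: $f"; missing=1; }
done
# Either tokenizer.json or tokenizer.model must be present.
if [ ! -f "$MODEL_DIR/tokenizer.json" ] && [ ! -f "$MODEL_DIR/tokenizer.model" ]; then
  echo "missing: tokenizer.json or tokenizer.model"
  missing=1
fi
if [ "$missing" -eq 0 ]; then echo "model directory looks complete"; fi
```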

Building from Source #

This plugin requires building ONNX Runtime GenAI from source. See BUILD_INSTRUCTIONS.md for detailed steps.

Quick Start #

  1. Clone and build onnxruntime-genai:

git clone https://github.com/microsoft/onnxruntime-genai.git
cd onnxruntime-genai
python build.py --android --android_abi arm64-v8a

  2. Copy the built libraries into the plugin (see BUILD_INSTRUCTIONS.md).

  3. Build and run:

flutter run

Platform Support #

| Platform | Minimum Version | Architectures     |
|----------|-----------------|-------------------|
| Android  | API 24 (7.0)    | arm64-v8a, x86_64 |
| iOS      | 11.0            | arm64             |
| macOS    | 10.14           | arm64, x86_64     |

Architecture #

This plugin is a companion to flutter_onnxruntime:

  • flutter_onnxruntime: Generic ONNX inference (Whisper, TTS, etc.)
  • flutter_ort_genai: LLM-specific with streaming generation

Both plugins link against the same ONNX Runtime 1.22.0 native library, avoiding binary duplication.

Contributing #

Contributions are welcome! Please read our Contributing Guide first.

License #

MIT License - see LICENSE file.
