flutter_ort_genai 0.1.0
flutter_ort_genai #
ONNX Runtime GenAI for Flutter - High-performance LLM inference with streaming token generation.
Features #
- High-performance LLM inference using ONNX Runtime GenAI
- Streaming token generation with cancellation support
- Clean API with a Generation object pattern
- Dynamic C API resolution (no ORT duplication)
- Cross-platform: Android, iOS, macOS
- Production-ready with thread safety and proper error handling
Installation #
Add to your `pubspec.yaml`:

```yaml
dependencies:
  flutter_onnxruntime: ^latest # Required peer dependency
  flutter_ort_genai: ^0.1.0
```
Usage #
Basic Example #
```dart
import 'package:flutter_ort_genai/flutter_ort_genai.dart';

// Load a GenAI model
final model = await OrtGenAIModel.load(
  'path/to/genai/model',
  options: GenAIOptions(deviceType: 'cpu'),
);

// Start generation
final generation = model.start(
  'What is Flutter?',
  temperature: 0.7,
  maxTokens: 256,
);

// Stream tokens
await for (final token in generation.stream) {
  print(token);
}

// Or cancel generation
await generation.cancel();

// Dispose when done
await model.dispose();
```
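When the stream is consumed inside an `await for` loop, there is no handle left to cancel from elsewhere. A common alternative is a `StreamSubscription`, so that, say, a "Stop" button can end generation mid-stream. This is a sketch using only the `start`/`stream`/`cancel` API shown above; the `onStopPressed` hook and prompt are illustrative:

```dart
import 'dart:async';
import 'package:flutter_ort_genai/flutter_ort_genai.dart';

Future<void> generateWithStopButton(OrtGenAIModel model) async {
  final generation = model.start('Explain Dart isolates.');
  final buffer = StringBuffer();

  // Listen instead of `await for`, so the subscription (and the
  // generation itself) can be cancelled from outside the loop.
  final subscription = generation.stream.listen(
    buffer.write,                // append each token as it arrives
    onDone: () => print(buffer), // full response once finished
  );

  // Hypothetical UI hook: wire this to a "Stop" button.
  Future<void> onStopPressed() async {
    await generation.cancel();    // stop token generation
    await subscription.cancel();  // stop listening
  }
}
```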
Advanced Usage #
```dart
// With all parameters
final generation = model.start(
  prompt,
  temperature: 0.8,
  maxTokens: 512,
  topP: 0.9,
  topK: 40,
  repetitionPenalty: 1.1,
  stopSequences: ['</end>', '\n\n'],
);

// Collect all tokens at once
final response = await generation.collectAll();

// Or collect with a timeout
final timedResponse = await generation.collectWithTimeout(
  Duration(seconds: 30),
);

// Get model metadata (if available)
final metadata = await model.getMetadata();
print('Model: ${metadata?.modelName}');
print('Vocab size: ${metadata?.vocabSize}');
```
Model Requirements #
GenAI models require these files in the model directory:
- `genai_config.json` - GenAI configuration
- `tokenizer.json` or `tokenizer.model` - Tokenizer data
- `model.onnx` - Model weights (or quantized variants)
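A quick preflight check can surface a missing file before `OrtGenAIModel.load` fails at runtime. This is a minimal sketch using only `dart:io`; the helper name is ours, and it checks for the exact filenames listed above (a quantized variant with a different name would need an extra check):

```dart
import 'dart:io';

/// Returns the required GenAI model files missing from [modelDir].
/// Only one of the two tokenizer files is needed.
List<String> missingModelFiles(String modelDir) {
  bool exists(String name) => File('$modelDir/$name').existsSync();

  final missing = <String>[];
  if (!exists('genai_config.json')) missing.add('genai_config.json');
  if (!exists('tokenizer.json') && !exists('tokenizer.model')) {
    missing.add('tokenizer.json or tokenizer.model');
  }
  if (!exists('model.onnx')) missing.add('model.onnx');
  return missing; // empty list means the directory looks complete
}
```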
See BUILD_INSTRUCTIONS.md for model conversion details.
Building from Source #
This plugin requires building ONNX Runtime GenAI from source. See BUILD_INSTRUCTIONS.md for detailed steps.
Quick Start #
1. Clone and build onnxruntime-genai:

   ```sh
   git clone https://github.com/microsoft/onnxruntime-genai.git
   cd onnxruntime-genai
   python build.py --android --android_abi arm64-v8a
   ```

2. Copy the libraries to the plugin (see BUILD_INSTRUCTIONS.md).

3. Build and run:

   ```sh
   flutter run
   ```
Platform Support #
| Platform | Minimum Version | Architectures |
|---|---|---|
| Android | API 24 (7.0) | arm64-v8a, x86_64 |
| iOS | 11.0 | arm64 |
| macOS | 10.14 | arm64, x86_64 |
Architecture #
This plugin is a companion to flutter_onnxruntime:

- `flutter_onnxruntime`: Generic ONNX inference (Whisper, TTS, etc.)
- `flutter_ort_genai`: LLM-specific, with streaming generation

Both share the same ONNX Runtime 1.22.0, avoiding duplication.
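As a rough sketch of how that division of labor might look in one app, the non-LLM side is left as comments here because `flutter_onnxruntime`'s exact API is not shown in this README; only the `flutter_ort_genai` calls below come from the examples above:

```dart
import 'package:flutter_ort_genai/flutter_ort_genai.dart';

Future<void> transcribeAndSummarize() async {
  // 1. Generic ONNX inference via flutter_onnxruntime, e.g. running a
  //    Whisper model to get a transcript (see that package's docs):
  // final transcript = ...;
  const transcript = '<transcript placeholder>';

  // 2. LLM-specific streaming generation via flutter_ort_genai:
  final model = await OrtGenAIModel.load('path/to/genai/model');
  final generation = model.start('Summarize: $transcript');
  final summary = await generation.collectAll();
  print(summary);
  await model.dispose();
}
```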
Contributing #
Contributions are welcome! Please read our Contributing Guide first.
License #
MIT License - see LICENSE file.