Mobile RAG Engine #
Production-ready, fully local RAG (Retrieval-Augmented Generation) engine for Flutter.
Powered by a Rust core, it delivers lightning-fast vector search and embedding generation directly on the device. No servers, no API costs, no network latency.
Why this package? #
No Rust Installation Required #
You do NOT need to install Rust, Cargo, or Android NDK.
This package includes pre-compiled binaries for iOS, Android, and macOS. Just pub add and run.
Performance #
| Feature | Pure Dart | Mobile RAG Engine (Rust) |
|---|---|---|
| Tokenization | Slow | 10x Faster (HuggingFace tokenizers) |
| Vector Search | O(n) | O(log n) (HNSW Index) |
| Memory Usage | High | Optimized (Zero-copy FFI) |
100% Offline & Private #
Data never leaves the user's device. Perfect for privacy-focused apps (journals, secure chats, enterprise tools).
Features #
- Cross-Platform: Works seamlessly on iOS, Android, and macOS
- HNSW Vector Index: Fast approximate nearest-neighbor search, proven at scales of 10k+ documents
- Hybrid Search Ready: Supports semantic search combined with exact matching
- Auto-Chunking: Intelligent text splitting strategies included (Unicode-based semantic chunking; see the sketch after this list)
- Model Flexibility: Use standard ONNX models (e.g., bge-m3, all-MiniLM-L6-v2)
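To illustrate the idea behind Unicode-based semantic chunking, here is a minimal plain-Dart sketch. It is not the package's built-in Rust chunker; the function name and the maxChars limit are made up for illustration.

// Illustration only: split text into sentences (including CJK punctuation),
// then pack sentences into chunks of at most maxChars characters.
List<String> chunkBySentence(String text, {int maxChars = 512}) {
  final sentences = RegExp(r'[^.!?。！？]+[.!?。！？]*\s*')
      .allMatches(text)
      .map((m) => m.group(0)!.trim())
      .where((s) => s.isNotEmpty);
  final chunks = <String>[];
  final buffer = StringBuffer();
  for (final sentence in sentences) {
    if (buffer.isNotEmpty && buffer.length + sentence.length > maxChars) {
      chunks.add(buffer.toString().trim());
      buffer.clear();
    }
    buffer.write('$sentence ');
  }
  if (buffer.isNotEmpty) chunks.add(buffer.toString().trim());
  return chunks;
}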
Installation #
1. Add the dependency #
dependencies:
  mobile_rag_engine: ^0.4.0
2. Download Model Files #
# Create assets folder
mkdir -p assets && cd assets
# Download BGE-m3 model (INT8 quantized, multilingual)
curl -L -o model.onnx "https://huggingface.co/Teradata/bge-m3/resolve/main/onnx/model_int8.onnx"
curl -L -o tokenizer.json "https://huggingface.co/BAAI/bge-m3/resolve/main/tokenizer.json"
See Model Setup Guide for alternative models and production deployment strategies.
Quick Start #
Initialize the engine and start searching in just a few lines of code (the model, tokenizer, and database paths below are examples; adjust them for your app):

import 'dart:io';

import 'package:mobile_rag_engine/mobile_rag_engine.dart';

void main() async {
  // 1. Initialize the Rust library & services
  await RustLib.init(externalLibrary: ExternalLibrary.process(iKnowHowToUseIt: true));
  await initTokenizer(tokenizerPath: 'assets/tokenizer.json');

  // Load the ONNX model downloaded during installation
  final modelBytes = await File('assets/model.onnx').readAsBytes();
  await EmbeddingService.init(modelBytes);

  // Path to the engine's SQLite database (use a writable location on device)
  const dbPath = 'rag.db';

  // 2. Add a document (embedded & indexed)
  final embedding = await EmbeddingService.embed('Flutter is a UI toolkit.');
  await addDocument(
    dbPath: dbPath,
    content: 'Flutter is a UI toolkit.',
    embedding: embedding,
  );
  await rebuildHnswIndex(dbPath: dbPath);

  // 3. Search
  final queryEmbedding = await EmbeddingService.embed('What is Flutter?');
  final results = await searchSimilar(
    dbPath: dbPath,
    queryEmbedding: queryEmbedding,
    topK: 5,
  );
  print(results.first); // "Flutter is a UI toolkit."
}
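If you ingest many documents at once, you can add them all first and rebuild the HNSW index a single time at the end. Here is a sketch that reuses only the calls shown above; texts is a placeholder for your own data.

// Batch ingestion sketch (uses the same imports as the Quick Start).
Future<void> ingestAll(String dbPath, List<String> texts) async {
  for (final text in texts) {
    final embedding = await EmbeddingService.embed(text);
    await addDocument(dbPath: dbPath, content: text, embedding: embedding);
  }
  // One rebuild after all inserts, instead of one per document.
  await rebuildHnswIndex(dbPath: dbPath);
}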
Benchmarks #
Rust-powered components (M3 Pro macOS):
| Operation | Time | Notes |
|---|---|---|
| Tokenization (234 chars) | 0.04ms | HuggingFace tokenizers crate |
| HNSW Search (100 docs) | 0.3ms | instant-distance (O(log n)) |
These are the components where Rust provides a 10-100x speedup over pure Dart implementations.
Embedding generation uses ONNX Runtime (platform-dependent, typically 25-100ms per text).
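To get rough numbers on your own hardware, you can time the same calls with Dart's Stopwatch. This assumes the engine is already initialized as in the Quick Start; it is a quick sanity check, not a rigorous benchmark.

// Timing sketch (same imports and setup as the Quick Start).
Future<void> quickBenchmark(String dbPath) async {
  final sw = Stopwatch()..start();
  final queryEmbedding = await EmbeddingService.embed('What is Flutter?');
  print('Embedding: ${sw.elapsedMilliseconds} ms');

  sw
    ..reset()
    ..start();
  final results = await searchSimilar(
    dbPath: dbPath,
    queryEmbedding: queryEmbedding,
    topK: 5,
  );
  print('HNSW search: ${sw.elapsedMicroseconds / 1000} ms (${results.length} hits)');
}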
Architecture #
This package bridges the best of two worlds: Flutter for UI and Rust for heavy lifting.
| Component | Technology |
|---|---|
| Embedding | ONNX Runtime with quantized models (INT8) |
| Storage | SQLite for metadata + memory-mapped vector index |
| Search | instant-distance (HNSW) for low-latency retrieval |
| Tokenization | HuggingFace tokenizers crate |
Model Options #
| Model | Size | Best For |
|---|---|---|
| Teradata/bge-m3 (INT8) | ~200MB | Multilingual (Korean, English, etc.) |
| all-MiniLM-L6-v2 | ~25MB | English only, faster |
Custom Models: Export any Sentence Transformer to ONNX:
pip install optimum[exporters]
optimum-cli export onnx --model sentence-transformers/YOUR_MODEL ./output
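After exporting, initialize the engine with the new files using the same calls as the Quick Start. The output/ paths below simply mirror where the optimum-cli command above writes its files; adjust them to wherever you ship the model.

// Initialization with a custom exported model (uses dart:io and the
// mobile_rag_engine import from the Quick Start).
Future<void> initCustomModel() async {
  await initTokenizer(tokenizerPath: 'output/tokenizer.json');
  final modelBytes = await File('output/model.onnx').readAsBytes();
  await EmbeddingService.init(modelBytes);
}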
Releases #
- v0.3.0 - Rust Semantic Chunking - Unicode-based semantic chunking
- v0.2.0 - LLM-Optimized Chunking - Chunking and context assembly
Roadmap #
- ✅ INT8 quantization support
- ✅ Chunking strategies for long documents
- ❌ Korean-specific models (KoSimCSE, KR-SBERT)
- ❌ Hybrid search (keyword + semantic)
- ❌ iOS/Android On-Demand Resources
Contributing #
Bug reports, feature requests, and PRs are all welcome!
License #
This project is licensed under the MIT License.