Mobile RAG Engine
Production-ready, fully local RAG (Retrieval-Augmented Generation) engine for Flutter.
Powered by a Rust core, it delivers lightning-fast vector search and embedding generation directly on the device. No servers, no API costs, no network latency.
Why this package?
No Rust Installation Required
You do NOT need to install Rust, Cargo, or the Android NDK.
This package ships pre-compiled binaries for iOS, Android, and macOS. Just `pub add` and run.
Performance
| Feature | Pure Dart | Mobile RAG Engine (Rust) |
|---|---|---|
| Tokenization | Slow | 10x Faster (HuggingFace tokenizers) |
| Vector Search | O(n) | O(log n) (HNSW Index) |
| Memory Usage | High | Optimized (Zero-copy FFI) |
100% Offline & Private
Data never leaves the user's device. Perfect for privacy-focused apps (journals, secure chats, enterprise tools).
Features
- Cross-Platform: Works seamlessly on iOS, Android, and macOS
- HNSW Vector Index: Fast approximate nearest-neighbor search (proven at scales of 10k+ documents)
- Hybrid Search Ready: Supports semantic search combined with exact matching
- Auto-Chunking: Intelligent text-splitting strategies included (Unicode-based semantic chunking); see the ingestion sketch after this list
- Model Flexibility: Use standard ONNX models (e.g., `bge-m3`, `all-MiniLM-L6-v2`)
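For longer texts the usual flow is chunk, embed, then index. Here is a minimal sketch of that loop using the calls shown in Quick Start below; `semanticChunk` is an assumed placeholder name, not the package's verified chunking API:

// Ingest a long document: split into chunks, embed each chunk,
// then rebuild the index once for the whole batch.
// NOTE: `semanticChunk` is a hypothetical name standing in for the
// package's chunking API; addDocument, rebuildHnswIndex, and
// EmbeddingService are the documented calls from Quick Start.
Future<void> ingestDocument(String dbPath, String longText) async {
  final chunks = semanticChunk(longText); // hypothetical chunking call
  for (final chunk in chunks) {
    final embedding = await EmbeddingService.embed(chunk);
    await addDocument(dbPath: dbPath, content: chunk, embedding: embedding);
  }
  // One index rebuild per batch is cheaper than one per document.
  await rebuildHnswIndex(dbPath: dbPath);
}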
Installation
1. Add the dependency
dependencies:
mobile_rag_engine: ^0.3.9
2. Download Model Files
# Create assets folder
mkdir -p assets && cd assets
# Download BGE-m3 model (INT8 quantized, multilingual)
curl -L -o model.onnx "https://huggingface.co/Teradata/bge-m3/resolve/main/onnx/model_int8.onnx"
curl -L -o tokenizer.json "https://huggingface.co/BAAI/bge-m3/resolve/main/tokenizer.json"
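If you ship these files as Flutter assets, also declare them in pubspec.yaml (standard Flutter asset setup; the paths below assume the folder created above):

flutter:
  assets:
    - assets/model.onnx
    - assets/tokenizer.json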
See Model Setup Guide for alternative models and production deployment strategies.
Quick Start
Initialize the engine and start searching in just a few lines of code:
import 'package:flutter/services.dart' show rootBundle;
import 'package:flutter/widgets.dart' show WidgetsFlutterBinding;
import 'package:mobile_rag_engine/mobile_rag_engine.dart';

void main() async {
  WidgetsFlutterBinding.ensureInitialized(); // required before loading assets

  // 1. Initialize Rust library & services
  await RustLib.init(externalLibrary: ExternalLibrary.process(iKnowHowToUseIt: true));
  await initTokenizer(tokenizerPath: 'assets/tokenizer.json');
  final modelBytes =
      (await rootBundle.load('assets/model.onnx')).buffer.asUint8List();
  await EmbeddingService.init(modelBytes);

  const dbPath = 'rag.db'; // use a writable path (e.g. via path_provider) in a real app

  // 2. Add documents (embedded & indexed)
  final embedding = await EmbeddingService.embed('Flutter is a UI toolkit.');
  await addDocument(
    dbPath: dbPath,
    content: 'Flutter is a UI toolkit.',
    embedding: embedding,
  );
  await rebuildHnswIndex(dbPath: dbPath);

  // 3. Search
  final queryEmbedding = await EmbeddingService.embed('What is Flutter?');
  final results = await searchSimilar(
    dbPath: dbPath,
    queryEmbedding: queryEmbedding,
    topK: 5,
  );
  print(results.first); // "Flutter is a UI toolkit."
}
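Building on the Quick Start, here is a hedged sketch of the hybrid-search pattern from the feature list: over-fetch semantic matches, then keep only those that also contain an exact keyword. The `content` field on each result is an assumption; adapt it to the actual type returned by searchSimilar.

// Hybrid search sketch: semantic retrieval + exact keyword filter.
// ASSUMPTION: each result exposes its original text as `content`;
// rename this to match the real result type.
Future<List<String>> hybridSearch(String dbPath, String query) async {
  final queryEmbedding = await EmbeddingService.embed(query);
  final candidates = await searchSimilar(
    dbPath: dbPath,
    queryEmbedding: queryEmbedding,
    topK: 20, // over-fetch, then narrow down with the keyword filter
  );
  final terms = query.toLowerCase().split(' ');
  return candidates
      .map((c) => c.content as String) // assumed field
      .where((text) => terms.any((t) => text.toLowerCase().contains(t)))
      .toList();
}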
Benchmarks
Rust-powered components (macOS, Apple M3 Pro):
| Operation | Time | Notes |
|---|---|---|
| Tokenization (234 chars) | 0.04ms | HuggingFace tokenizers crate |
| HNSW Search (100 docs) | 0.3ms | instant-distance (O(log n)) |
These are the components where Rust provides a 10-100x speedup over pure Dart implementations.
Embedding generation uses ONNX Runtime (platform-dependent, typically 25-100ms per text).
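To check these numbers on your own hardware, here is a minimal Stopwatch sketch (the package also ships a benchmark_service library; this is just a hand-rolled measurement, assuming EmbeddingService is initialized as in Quick Start):

// Rough on-device embedding latency measurement.
Future<void> measureEmbedLatency() async {
  const text = 'Flutter is a UI toolkit for building multi-platform apps.';
  await EmbeddingService.embed(text); // warm-up run

  const runs = 20;
  final sw = Stopwatch()..start();
  for (var i = 0; i < runs; i++) {
    await EmbeddingService.embed(text);
  }
  sw.stop();
  final avgMs = sw.elapsedMicroseconds / runs / 1000;
  print('avg embedding latency: ${avgMs.toStringAsFixed(1)} ms');
}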
Architecture
This package bridges the best of two worlds: Flutter for UI and Rust for heavy lifting.
| Component | Technology |
|---|---|
| Embedding | ONNX Runtime with quantized models (INT8) |
| Storage | SQLite for metadata + memory-mapped vector index |
| Search | instant-distance (HNSW) for low-latency retrieval |
| Tokenization | HuggingFace tokenizers crate |
Model Options
| Model | Size | Best For |
|---|---|---|
| Teradata/bge-m3 (INT8) | ~200MB | Multilingual (Korean, English, etc.) |
| all-MiniLM-L6-v2 | ~25MB | English only, faster |
Custom Models: Export any Sentence Transformer to ONNX:
pip install optimum[exporters]
optimum-cli export onnx --model sentence-transformers/YOUR_MODEL ./output
Releases
- v0.3.0 - Rust Semantic Chunking - Unicode-based semantic chunking
- v0.2.0 - LLM-Optimized Chunking - Chunking and context assembly
Roadmap
- [x] INT8 quantization support
- [x] Chunking strategies for long documents
- [ ] Korean-specific models (KoSimCSE, KR-SBERT)
- [ ] Hybrid search (keyword + semantic)
- [ ] iOS/Android On-Demand Resources
Contributing
Bug reports, feature requests, and PRs are all welcome!
License
This project is licensed under the MIT License.
Libraries
- mobile_rag_engine: Mobile RAG Engine
- services/benchmark_service
- services/context_builder: Context assembly for LLM prompts.
- services/embedding_service
- services/prompt_compressor: REFRAG-style prompt compression service.
- services/quality_test_service
- services/source_rag_service: High-level RAG service for managing sources and chunks.