Mobile RAG Engine


Production-ready, fully local RAG (Retrieval-Augmented Generation) engine for Flutter.

Powered by a Rust core, it delivers lightning-fast vector search and embedding generation directly on the device. No servers, no API costs, no network latency.


Why this package?

No Rust Installation Required

You do NOT need to install Rust, Cargo, or the Android NDK.

This package includes pre-compiled binaries for iOS, Android, and macOS. Just pub add and run.

Performance

| Feature | Pure Dart | Mobile RAG Engine (Rust) |
|---------|-----------|--------------------------|
| Tokenization | Slow | 10x faster (HuggingFace tokenizers) |
| Vector search | O(n) | O(log n) (HNSW index) |
| Memory usage | High | Optimized (zero-copy FFI) |

100% Offline & Private

Data never leaves the user's device. Perfect for privacy-focused apps (journals, secure chats, enterprise tools).


Features

  • Cross-Platform: Works seamlessly on iOS, Android, and macOS
  • HNSW Vector Index: Fast approximate nearest-neighbor search, proven at scale with 10k+ documents
  • Hybrid Search Ready: Supports semantic search combined with exact matching
  • Auto-Chunking: Intelligent text-splitting strategies included (Unicode-based semantic chunking; see the sketch after this list)
  • Model Flexibility: Use standard ONNX models (e.g., bge-m3, all-MiniLM-L6-v2)
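
For illustration, here is a minimal Unicode-aware, sentence-based chunker in plain Dart. This is a sketch of the chunking idea only, not the package's built-in API; the regex and the 500-character budget are arbitrary choices:

// Illustrative sketch, not the package's chunking API.
// Splits on sentence-final punctuation (including CJK marks), then packs
// sentences into chunks no longer than `maxChars`.
List<String> chunkBySentence(String text, {int maxChars = 500}) {
  final sentences = text.split(RegExp(r'(?<=[.!?。！？])\s+'));
  final chunks = <String>[];
  final buffer = StringBuffer();
  for (final sentence in sentences) {
    if (buffer.isNotEmpty && buffer.length + sentence.length > maxChars) {
      chunks.add(buffer.toString().trim());
      buffer.clear();
    }
    buffer.write('$sentence ');
  }
  if (buffer.isNotEmpty) chunks.add(buffer.toString().trim());
  return chunks;
}

Keeping chunks under a size budget ensures each one fits comfortably within the embedding model's context window.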

Benchmark Results

[iOS and Android benchmark screenshots]


Installation

1. Add the dependency

dependencies:
  mobile_rag_engine: ^0.3.9

2. Download Model Files

# Create assets folder
mkdir -p assets && cd assets

# Download BGE-m3 model (INT8 quantized, multilingual)
curl -L -o model.onnx "https://huggingface.co/Teradata/bge-m3/resolve/main/onnx/model_int8.onnx"
curl -L -o tokenizer.json "https://huggingface.co/BAAI/bge-m3/resolve/main/tokenizer.json"
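
If you bundle the model and tokenizer as Flutter assets (as the Quick Start below does), remember to declare them in your pubspec.yaml:

flutter:
  assets:
    - assets/model.onnx
    - assets/tokenizer.json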

See the Model Setup Guide for alternative models and production deployment strategies.


Quick Start

Initialize the engine and start searching in just a few lines of code (the asset loading and database path below are illustrative):

import 'package:flutter/widgets.dart';
import 'package:flutter/services.dart' show rootBundle;
import 'package:mobile_rag_engine/mobile_rag_engine.dart';

void main() async {
  WidgetsFlutterBinding.ensureInitialized(); // required before loading assets

  // 1. Initialize Rust library & services
  await RustLib.init(externalLibrary: ExternalLibrary.process(iKnowHowToUseIt: true));
  await initTokenizer(tokenizerPath: 'assets/tokenizer.json');

  // Load the bundled ONNX model (example: shipped as a Flutter asset)
  final modelData = await rootBundle.load('assets/model.onnx');
  final modelBytes =
      modelData.buffer.asUint8List(modelData.offsetInBytes, modelData.lengthInBytes);
  await EmbeddingService.init(modelBytes);

  const dbPath = 'rag.db'; // example path; use a writable location in production

  // 2. Embed and index a document
  final embedding = await EmbeddingService.embed('Flutter is a UI toolkit.');
  await addDocument(
    dbPath: dbPath,
    content: 'Flutter is a UI toolkit.',
    embedding: embedding,
  );
  // Rebuild the HNSW index so newly added documents become searchable.
  await rebuildHnswIndex(dbPath: dbPath);

  // 3. Search
  final queryEmbedding = await EmbeddingService.embed('What is Flutter?');
  final results = await searchSimilar(
    dbPath: dbPath,
    queryEmbedding: queryEmbedding,
    topK: 5,
  );

  print(results.first); // "Flutter is a UI toolkit."
}
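
When indexing many documents, rebuilding the index once per batch is presumably much cheaper than rebuilding after every insert. A minimal helper using only the calls shown above (dbPath as in Quick Start):

// Embed and store a batch of documents, then rebuild the HNSW index once.
Future<void> addAllDocuments(String dbPath, List<String> docs) async {
  for (final doc in docs) {
    final embedding = await EmbeddingService.embed(doc);
    await addDocument(dbPath: dbPath, content: doc, embedding: embedding);
  }
  // A single rebuild after the batch amortizes the indexing cost.
  await rebuildHnswIndex(dbPath: dbPath);
}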

Benchmarks

Rust-powered components, measured on an Apple M3 Pro (macOS):

| Operation | Time | Notes |
|-----------|------|-------|
| Tokenization (234 chars) | 0.04 ms | HuggingFace tokenizers crate |
| HNSW search (100 docs) | 0.3 ms | instant-distance, O(log n) |

These are the components where Rust provides 10-100x speedup over pure Dart implementations.

Embedding generation uses ONNX Runtime (platform-dependent, typically 25-100ms per text).
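
Numbers vary by device, so it is worth measuring on your own hardware. A quick latency check with Dart's Stopwatch, assuming the engine is already initialized as in Quick Start:

// Rough on-device timing for a single embedding call.
Future<void> timeEmbedding() async {
  final sw = Stopwatch()..start();
  await EmbeddingService.embed('A short benchmark sentence.');
  sw.stop();
  print('Embedding latency: ${sw.elapsedMilliseconds} ms');
}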


Architecture

This package bridges the best of two worlds: Flutter for UI and Rust for heavy lifting.


| Component | Technology |
|-----------|------------|
| Embedding | ONNX Runtime with INT8-quantized models |
| Storage | SQLite for metadata + memory-mapped vector index |
| Search | instant-distance (HNSW) for low-latency retrieval |
| Tokenization | HuggingFace tokenizers crate |

Model Options

| Model | Size | Best For |
|-------|------|----------|
| Teradata/bge-m3 (INT8) | ~200 MB | Multilingual (Korean, English, etc.) |
| all-MiniLM-L6-v2 | ~25 MB | English only, faster |

Custom Models: Export any Sentence Transformers model to ONNX, then copy the resulting model.onnx and tokenizer.json into your assets folder:

pip install optimum[exporters]
optimum-cli export onnx --model sentence-transformers/YOUR_MODEL ./output


Roadmap

  • [x] INT8 quantization support
  • [x] Chunking strategies for long documents
  • [ ] Korean-specific models (KoSimCSE, KR-SBERT)
  • [ ] Hybrid search (keyword + semantic)
  • [ ] iOS/Android On-Demand Resources

Contributing

Bug reports, feature requests, and PRs are all welcome!

License

This project is licensed under the MIT License.
