Mobile RAG Engine
Production-ready, fully local RAG (Retrieval-Augmented Generation) engine for Flutter.
Powered by a Rust core, it delivers lightning-fast vector search and embedding generation directly on the device. No servers, no API costs, no network latency.
Why this package?
No Rust Installation Required
You do NOT need to install Rust, Cargo, or the Android NDK.
This package ships pre-compiled binaries for iOS, Android, and macOS. Just `pub add` and run.
Performance
| Feature | Pure Dart | Mobile RAG Engine (Rust) |
|---|---|---|
| Tokenization | Slow | 10x Faster (HuggingFace tokenizers) |
| Vector Search | O(n) | O(log n) (HNSW Index) |
| Memory Usage | High | Optimized (Zero-copy FFI) |
100% Offline & Private
Data never leaves the user's device. Perfect for privacy-focused apps (journals, secure chats, enterprise tools).
Features
- Cross-Platform: Works seamlessly on iOS, Android, and macOS
- HNSW Vector Index: Fast approximate nearest-neighbor search (proven at scales of 10k+ documents)
- Hybrid Search Ready: Supports semantic search combined with exact matching
- Auto-Chunking: Intelligent text-splitting strategies included (Unicode-based semantic chunking); see the ingestion sketch after this list
- Model Flexibility: Use standard ONNX models (e.g., `bge-m3`, `all-MiniLM-L6-v2`)
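For longer texts the usual flow is chunk, embed, then index. Here is a minimal sketch of that loop using the calls shown in Quick Start below; `semanticChunk` is an assumed placeholder name, not the package's verified chunking API:

// Ingest a long document: split into chunks, embed each chunk,
// then rebuild the index once for the whole batch.
// NOTE: `semanticChunk` is a hypothetical name standing in for the
// package's chunking API; addDocument, rebuildHnswIndex, and
// EmbeddingService are the documented calls from Quick Start.
Future<void> ingestDocument(String dbPath, String longText) async {
  final chunks = semanticChunk(longText); // hypothetical chunking call
  for (final chunk in chunks) {
    final embedding = await EmbeddingService.embed(chunk);
    await addDocument(dbPath: dbPath, content: chunk, embedding: embedding);
  }
  // One index rebuild per batch is cheaper than one per document.
  await rebuildHnswIndex(dbPath: dbPath);
}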
Installation
1. Add the dependency
dependencies:
mobile_rag_engine: ^0.3.9
2. Download Model Files
# Create assets folder
mkdir -p assets && cd assets
# Download BGE-m3 model (INT8 quantized, multilingual)
curl -L -o model.onnx "https://huggingface.co/Teradata/bge-m3/resolve/main/onnx/model_int8.onnx"
curl -L -o tokenizer.json "https://huggingface.co/BAAI/bge-m3/resolve/main/tokenizer.json"
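If you ship these files as Flutter assets, also declare them in pubspec.yaml (standard Flutter asset setup; the paths below assume the folder created above):

flutter:
  assets:
    - assets/model.onnx
    - assets/tokenizer.json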
See Model Setup Guide for alternative models and production deployment strategies.
Quick Start
Initialize the engine and start searching in just a few lines of code:
import 'package:flutter/services.dart' show rootBundle;
import 'package:flutter/widgets.dart' show WidgetsFlutterBinding;
import 'package:mobile_rag_engine/mobile_rag_engine.dart';

void main() async {
  WidgetsFlutterBinding.ensureInitialized(); // required before loading assets

  // 1. Initialize Rust library & services
  await RustLib.init(externalLibrary: ExternalLibrary.process(iKnowHowToUseIt: true));
  await initTokenizer(tokenizerPath: 'assets/tokenizer.json');
  final modelBytes =
      (await rootBundle.load('assets/model.onnx')).buffer.asUint8List();
  await EmbeddingService.init(modelBytes);

  const dbPath = 'rag.db'; // use a writable path (e.g. via path_provider) in a real app

  // 2. Add documents (embedded & indexed)
  final embedding = await EmbeddingService.embed('Flutter is a UI toolkit.');
  await addDocument(
    dbPath: dbPath,
    content: 'Flutter is a UI toolkit.',
    embedding: embedding,
  );
  await rebuildHnswIndex(dbPath: dbPath);

  // 3. Search
  final queryEmbedding = await EmbeddingService.embed('What is Flutter?');
  final results = await searchSimilar(
    dbPath: dbPath,
    queryEmbedding: queryEmbedding,
    topK: 5,
  );
  print(results.first); // "Flutter is a UI toolkit."
}
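Building on the Quick Start, here is a hedged sketch of the hybrid-search pattern from the feature list: over-fetch semantic matches, then keep only those that also contain an exact keyword. The `content` field on each result is an assumption; adapt it to the actual type returned by searchSimilar.

// Hybrid search sketch: semantic retrieval + exact keyword filter.
// ASSUMPTION: each result exposes its original text as `content`;
// rename this to match the real result type.
Future<List<String>> hybridSearch(String dbPath, String query) async {
  final queryEmbedding = await EmbeddingService.embed(query);
  final candidates = await searchSimilar(
    dbPath: dbPath,
    queryEmbedding: queryEmbedding,
    topK: 20, // over-fetch, then narrow down with the keyword filter
  );
  final terms = query.toLowerCase().split(' ');
  return candidates
      .map((c) => c.content as String) // assumed field
      .where((text) => terms.any((t) => text.toLowerCase().contains(t)))
      .toList();
}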
Benchmarks
Rust-powered components (macOS, Apple M3 Pro):
| Operation | Time | Notes |
|---|---|---|
| Tokenization (234 chars) | 0.04ms | HuggingFace tokenizers crate |
| HNSW Search (100 docs) | 0.3ms | instant-distance (O(log n)) |
These are the components where Rust provides a 10-100x speedup over pure Dart implementations.
Embedding generation uses ONNX Runtime (platform-dependent, typically 25-100ms per text).
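To check these numbers on your own hardware, here is a minimal Stopwatch sketch (the package also ships a benchmark_service library; this is just a hand-rolled measurement, assuming EmbeddingService is initialized as in Quick Start):

// Rough on-device embedding latency measurement.
Future<void> measureEmbedLatency() async {
  const text = 'Flutter is a UI toolkit for building multi-platform apps.';
  await EmbeddingService.embed(text); // warm-up run

  const runs = 20;
  final sw = Stopwatch()..start();
  for (var i = 0; i < runs; i++) {
    await EmbeddingService.embed(text);
  }
  sw.stop();
  final avgMs = sw.elapsedMicroseconds / runs / 1000;
  print('avg embedding latency: ${avgMs.toStringAsFixed(1)} ms');
}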
Architecture
This package bridges the best of two worlds: Flutter for UI and Rust for heavy lifting.
| Component | Technology |
|---|---|
| Embedding | ONNX Runtime with quantized models (INT8) |
| Storage | SQLite for metadata + memory-mapped vector index |
| Search | instant-distance (HNSW) for low-latency retrieval |
| Tokenization | HuggingFace tokenizers crate |
Model Options
| Model | Size | Best For |
|---|---|---|
| Teradata/bge-m3 (INT8) | ~200MB | Multilingual (Korean, English, etc.) |
| all-MiniLM-L6-v2 | ~25MB | English only, faster |
Custom Models: Export any Sentence Transformer to ONNX:
pip install optimum[exporters]
optimum-cli export onnx --model sentence-transformers/YOUR_MODEL ./output
Releases
- v0.3.0 - Rust Semantic Chunking - Unicode-based semantic chunking
- v0.2.0 - LLM-Optimized Chunking - Chunking and context assembly
Roadmap
- [x] INT8 quantization support
- [x] Chunking strategies for long documents
- [ ] Korean-specific models (KoSimCSE, KR-SBERT)
- [ ] Hybrid search (keyword + semantic)
- [ ] iOS/Android On-Demand Resources
Contributing
Bug reports, feature requests, and PRs are all welcome!
License
This project is licensed under the MIT License.
Libraries
- mobile_rag_engine: Mobile RAG Engine
- services/benchmark_service
- services/context_builder: Context assembly for LLM prompts.
- services/embedding_service
- services/prompt_compressor: REFRAG-style prompt compression service.
- services/quality_test_service
- services/source_rag_service: High-level RAG service for managing sources and chunks.