flutter_gemma_embedder

A Flutter plugin for running EmbeddingGemma models locally on mobile devices. Generate high-quality text embeddings for semantic search, retrieval, and similarity tasks using Google's EmbeddingGemma 300M model.


Features

  • 🧠 On-Device AI: Run EmbeddingGemma 300M locally without internet
  • 🔍 Semantic Search: Generate embeddings for text similarity and search
  • 📱 Cross-Platform: Android, iOS, and Web support
  • ⚡ Multiple Variants: Support for different sequence lengths (256, 512, 1024, 2048 tokens)
  • 🎯 Task-Specific: Optimized for retrieval tasks with proper prompting
  • 🚀 GPU/CPU Backend: Choose optimal backend for your device
  • 🔒 Privacy-First: All processing happens locally on device

Supported Models

| Model | Sequence Length | Model Size | Use Case |
|-------|-----------------|------------|----------|
| EmbeddingGemma 300M (256) | 256 tokens | 179 MB | Mobile & Real-time (~200 words) |
| EmbeddingGemma 300M (512) | 512 tokens | 187 MB | General Purpose (~400 words) |
| EmbeddingGemma 300M (1024) | 1024 tokens | 191 MB | Content Analysis (~800 words) |
| EmbeddingGemma 300M (2048) | 2048 tokens | 196 MB | Research & Documents (~1600 words) |

All models generate 768-dimensional embeddings optimized for retrieval tasks.

Platform Support

| Platform | Status | Notes |
|----------|--------|-------|
| Android | ✅ Full | GPU and CPU backends |
| iOS | ✅ Full | GPU and CPU backends |
| Web | 🔶 Partial | CPU backend only |

Installation

Add this to your pubspec.yaml:

dependencies:
  flutter_gemma_embedder: ^0.10.4

Quick Start

1. Initialize the Plugin

import 'package:flutter_gemma_embedder/flutter_gemma_embedder.dart';

final embedder = FlutterGemmaEmbedder.instance;

2. Create and Load a Model

// Create model instance
final model = await embedder.createModel(
  modelPath: '/path/to/model.tflite',
  modelType: EmbeddingModelType.embeddingGemma300M,
  dimensions: 768,
  taskType: EmbeddingTaskType.retrieval,
  backend: PreferredBackend.gpu,
);

// Initialize the model
await model.initialize();

3. Generate Embeddings

// Generate embedding for a single text
final embedding = await model.encode('Your text here');
print('Embedding dimensions: ${embedding.length}');

// Generate embeddings for multiple texts
final embeddings = await model.batchEncode([
  'First document',
  'Second document',
  'Third document',
]);

4. Calculate Similarity

final text1 = 'Flutter is a UI toolkit';
final text2 = 'Flutter helps build mobile apps';

final embedding1 = await model.encode(text1);
final embedding2 = await model.encode(text2);

final similarity = model.cosineSimilarity(embedding1, embedding2);
print('Similarity: ${similarity.toStringAsFixed(4)}'); // 0.8234
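The plugin's cosineSimilarity implements the standard formula, dot(a, b) / (‖a‖ · ‖b‖). For reference, a minimal Dart sketch of the same math (you don't need this in practice):

```dart
import 'dart:math';

/// Cosine similarity: dot(a, b) divided by the product of the vector norms.
/// Returns a value in [-1, 1]; 1 means the vectors point the same way.
double cosineSimilarity(List<double> a, List<double> b) {
  assert(a.length == b.length, 'Vectors must have the same dimensionality');
  var dot = 0.0, normA = 0.0, normB = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (sqrt(normA) * sqrt(normB));
}
```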

Advanced Usage

Task-Specific Prompting

The plugin automatically applies task-specific prompts for optimal embeddings:

// For retrieval tasks (default)
final model = await embedder.createModel(
  // ... other params
  taskType: EmbeddingTaskType.retrieval,
);

// The plugin will automatically format your text with:
// "Represent this sentence for searching relevant passages: YOUR_TEXT"
final embedding = await model.encode('machine learning algorithms');

Matryoshka Embeddings

EmbeddingGemma is trained with Matryoshka Representation Learning, so the leading dimensions carry most of the information. Truncating the output reduces storage and speeds up similarity search with only a small quality loss:

// Use only first 512 dimensions instead of full 768
final embedding = await model.encode(
  'Your text here',
  outputDimensionality: 512,
);

Batch Processing

Process multiple texts efficiently:

final documents = [
  'Document 1 content',
  'Document 2 content', 
  'Document 3 content',
];

final embeddings = await model.batchEncode(documents);

// Find most similar document to query
final query = 'search query';
final queryEmbedding = await model.encode(query);

double bestSimilarity = -1;
int bestIndex = -1;

for (int i = 0; i < embeddings.length; i++) {
  final similarity = model.cosineSimilarity(queryEmbedding, embeddings[i]);
  if (similarity > bestSimilarity) {
    bestSimilarity = similarity;
    bestIndex = i;
  }
}

print('Most similar document: ${documents[bestIndex]}');
print('Similarity score: ${bestSimilarity.toStringAsFixed(4)}');

Model Management

Download Models

Models need to be downloaded and stored locally. The plugin provides utilities for model management:

import 'package:flutter_gemma_embedder_example/models/embedding_model_config.dart';

// Choose a model configuration
final config = EmbeddingModelConfig.embeddingGemma300M_seq512;

// Download and store the model
// (Implementation depends on your download strategy)
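As one possible strategy (a sketch, not part of the plugin), the `http` and `path_provider` packages can fetch the .tflite file once and cache it in the documents directory; the URL and file name below are placeholders:

```dart
import 'dart:io';

import 'package:http/http.dart' as http;
import 'package:path_provider/path_provider.dart';

/// Downloads the model file on first run and returns its local path.
/// Subsequent calls reuse the cached copy.
Future<String> ensureModelDownloaded(String url, String fileName) async {
  final dir = await getApplicationDocumentsDirectory();
  final file = File('${dir.path}/$fileName');
  if (!await file.exists()) {
    final response = await http.get(Uri.parse(url));
    if (response.statusCode != 200) {
      throw Exception('Model download failed: HTTP ${response.statusCode}');
    }
    await file.writeAsBytes(response.bodyBytes);
  }
  return file.path;
}
```

The returned path can be passed directly as `modelPath` when creating the model.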

Model File Structure

Store your .tflite model files in the app's documents directory:

Documents/
└── embeddinggemma-300M_seq512_mixed-precision.tflite

Performance Tips

Backend Selection

  • GPU Backend: Faster inference, higher memory usage
  • CPU Backend: Lower memory usage, slower inference

// For better performance on newer devices
final model = await embedder.createModel(
  // ... other params
  backend: PreferredBackend.gpu,
);

// For memory-constrained devices
final model = await embedder.createModel(
  // ... other params  
  backend: PreferredBackend.cpu,
);

Model Selection

Choose the right model variant for your use case:

  • 256 tokens: Fast inference for short texts (tweets, titles)
  • 512 tokens: Balanced performance for medium texts (paragraphs)
  • 1024 tokens: High capacity for long texts (articles)
  • 2048 tokens: Maximum capacity for very long texts (documents)
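As a rough heuristic (the thresholds and file names below are illustrative, derived from the table above, not part of the plugin API), you can pick a variant from an approximate word count:

```dart
/// Illustrative helper: maps an approximate input length in words
/// to a model variant name. Thresholds mirror the model table above.
String variantForWordCount(int words) {
  if (words <= 200) return 'embeddinggemma-300M_seq256';
  if (words <= 400) return 'embeddinggemma-300M_seq512';
  if (words <= 800) return 'embeddinggemma-300M_seq1024';
  return 'embeddinggemma-300M_seq2048';
}
```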

Example App

The plugin includes a complete example app demonstrating:

  • Model selection with filtering and sorting
  • Model download with progress tracking
  • Text embedding generation
  • Similarity comparison
  • Real-time inference

Run the example:

cd example
flutter run

API Reference

FlutterGemmaEmbedder

Main plugin singleton for creating embedding models.

class FlutterGemmaEmbedder {
  static FlutterGemmaEmbedder get instance;
  
  Future<EmbeddingModel> createModel({
    required String modelPath,
    required EmbeddingModelType modelType,
    required int dimensions,
    required EmbeddingTaskType taskType,
    required PreferredBackend backend,
  });
}

EmbeddingModel

Core class for text embedding operations.

class EmbeddingModel {
  Future<void> initialize();
  
  Future<List<double>> encode(String text, {int? outputDimensionality});
  
  Future<List<List<double>>> batchEncode(
    List<String> texts, 
    {int? outputDimensionality}
  );
  
  double cosineSimilarity(List<double> a, List<double> b);
  
  void dispose();
}
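A typical lifecycle ties the pieces above together, releasing native resources with dispose when the model is no longer needed (a usage sketch mirroring the API above):

```dart
import 'package:flutter_gemma_embedder/flutter_gemma_embedder.dart';

Future<void> runOnce(String modelPath) async {
  final model = await FlutterGemmaEmbedder.instance.createModel(
    modelPath: modelPath,
    modelType: EmbeddingModelType.embeddingGemma300M,
    dimensions: 768,
    taskType: EmbeddingTaskType.retrieval,
    backend: PreferredBackend.cpu,
  );
  try {
    await model.initialize();
    final embedding = await model.encode('hello world');
    print(embedding.length); // 768 for the full-dimensional model
  } finally {
    model.dispose(); // always release native resources
  }
}
```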

Enums

enum EmbeddingModelType {
  embeddingGemma300M,
}

enum EmbeddingTaskType {
  retrieval,
  // More task types may be added in future versions
}

enum PreferredBackend {
  cpu,
  gpu,
}

Requirements

Android

  • Minimum SDK: 21 (Android 5.0)
  • Target SDK: 34 (Android 14)
  • NDK support for TensorFlow Lite

iOS

  • iOS 12.0+
  • TensorFlow Lite Swift framework

Web

  • Modern browsers with WebAssembly support
  • CPU backend only

Troubleshooting

Common Issues

Model loading fails

  • Ensure the model file exists at the specified path
  • Check file permissions
  • Verify model file is not corrupted

Out of memory errors

  • Use CPU backend instead of GPU
  • Choose smaller model variant (256 or 512 tokens)
  • Process texts in smaller batches
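One simple way to cap memory is to split the inputs into fixed-size chunks before calling batchEncode; a sketch (the chunk size of 16 is arbitrary):

```dart
/// Encodes [texts] in chunks of [chunkSize] to bound peak memory usage.
Future<List<List<double>>> encodeInChunks(
  EmbeddingModel model,
  List<String> texts, {
  int chunkSize = 16,
}) async {
  final results = <List<double>>[];
  for (var i = 0; i < texts.length; i += chunkSize) {
    final end =
        (i + chunkSize < texts.length) ? i + chunkSize : texts.length;
    results.addAll(await model.batchEncode(texts.sublist(i, end)));
  }
  return results;
}
```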

Slow inference

  • Use GPU backend on supported devices
  • Choose appropriate model size for your use case
  • Enable device GPU acceleration

Web platform limitations

  • Only CPU backend is supported
  • Large models may cause memory issues
  • Consider using smaller model variants

Getting Help

License

This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.

Acknowledgments


Built with ❤️ by the Flutter community