flutter_gemma_embedder

A Flutter plugin for running EmbeddingGemma models locally on mobile devices. Generate high-quality text embeddings for semantic search, retrieval, and similarity tasks using Google's EmbeddingGemma 300M model.


Features

  • 🧠 On-Device AI: Run EmbeddingGemma 300M locally without internet
  • 🔍 Semantic Search: Generate embeddings for text similarity and search
  • 📱 Cross-Platform: Android, iOS, and Web support
  • ⚡ Multiple Variants: Support for different sequence lengths (256, 512, 1024, 2048 tokens)
  • 🎯 Task-Specific: Optimized for retrieval tasks with proper prompting
  • 🚀 GPU/CPU Backend: Choose optimal backend for your device
  • 🔒 Privacy-First: All processing happens locally on device

Supported Models

| Model | Sequence Length | Model Size | Use Case |
|-------|-----------------|------------|----------|
| EmbeddingGemma 300M (256) | 256 tokens | 179 MB | Mobile & Real-time (~200 words) |
| EmbeddingGemma 300M (512) | 512 tokens | 187 MB | General Purpose (~400 words) |
| EmbeddingGemma 300M (1024) | 1024 tokens | 191 MB | Content Analysis (~800 words) |
| EmbeddingGemma 300M (2048) | 2048 tokens | 196 MB | Research & Documents (~1600 words) |

All models generate 768-dimensional embeddings optimized for retrieval tasks.

Platform Support

| Platform | Status | Notes |
|----------|--------|-------|
| Android | ✅ Full | GPU and CPU backends |
| iOS | ✅ Full | GPU and CPU backends |
| Web | 🔶 Partial | CPU backend only |

Installation

Add this to your pubspec.yaml:

dependencies:
  flutter_gemma_embedder: ^0.10.4

Quick Start

1. Initialize the Plugin

import 'package:flutter_gemma_embedder/flutter_gemma_embedder.dart';

final embedder = FlutterGemmaEmbedder.instance;

2. Create and Load a Model

// Create model instance
final model = await embedder.createModel(
  modelPath: '/path/to/model.tflite',
  modelType: EmbeddingModelType.embeddingGemma300M,
  dimensions: 768,
  taskType: EmbeddingTaskType.retrieval,
  backend: PreferredBackend.gpu,
);

// Initialize the model
await model.initialize();

3. Generate Embeddings

// Generate embedding for a single text
final embedding = await model.encode('Your text here');
print('Embedding dimensions: ${embedding.length}');

// Generate embeddings for multiple texts
final embeddings = await model.batchEncode([
  'First document',
  'Second document',
  'Third document',
]);

4. Calculate Similarity

final text1 = 'Flutter is a UI toolkit';
final text2 = 'Flutter helps build mobile apps';

final embedding1 = await model.encode(text1);
final embedding2 = await model.encode(text2);

final similarity = model.cosineSimilarity(embedding1, embedding2);
print('Similarity: ${similarity.toStringAsFixed(4)}'); // 0.8234
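The plugin's cosineSimilarity implements the standard formula, dot(a, b) / (‖a‖ · ‖b‖). For reference, a minimal Dart sketch of the same math (you don't need this in practice):

```dart
import 'dart:math';

/// Cosine similarity: dot(a, b) divided by the product of the vector norms.
/// Returns a value in [-1, 1]; 1 means the vectors point the same way.
double cosineSimilarity(List<double> a, List<double> b) {
  assert(a.length == b.length, 'Vectors must have the same dimensionality');
  var dot = 0.0, normA = 0.0, normB = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (sqrt(normA) * sqrt(normB));
}
```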

Advanced Usage

Task-Specific Prompting

The plugin automatically applies task-specific prompts for optimal embeddings:

// For retrieval tasks (default)
final model = await embedder.createModel(
  // ... other params
  taskType: EmbeddingTaskType.retrieval,
);

// The plugin will automatically format your text with:
// "Represent this sentence for searching relevant passages: YOUR_TEXT"
final embedding = await model.encode('machine learning algorithms');

Matryoshka Embeddings

EmbeddingGemma is trained with Matryoshka Representation Learning, so the leading dimensions carry most of the information. Truncating the output reduces storage and speeds up similarity search with only a small quality loss:

// Use only first 512 dimensions instead of full 768
final embedding = await model.encode(
  'Your text here',
  outputDimensionality: 512,
);

Batch Processing

Process multiple texts efficiently:

final documents = [
  'Document 1 content',
  'Document 2 content', 
  'Document 3 content',
];

final embeddings = await model.batchEncode(documents);

// Find most similar document to query
final query = 'search query';
final queryEmbedding = await model.encode(query);

double bestSimilarity = -1;
int bestIndex = -1;

for (int i = 0; i < embeddings.length; i++) {
  final similarity = model.cosineSimilarity(queryEmbedding, embeddings[i]);
  if (similarity > bestSimilarity) {
    bestSimilarity = similarity;
    bestIndex = i;
  }
}

print('Most similar document: ${documents[bestIndex]}');
print('Similarity score: ${bestSimilarity.toStringAsFixed(4)}');

Model Management

Download Models

Models need to be downloaded and stored locally. The plugin provides utilities for model management:

import 'package:flutter_gemma_embedder_example/models/embedding_model_config.dart';

// Choose a model configuration
final config = EmbeddingModelConfig.embeddingGemma300M_seq512;

// Download and store the model
// (Implementation depends on your download strategy)
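As one possible strategy (a sketch, not part of the plugin), the `http` and `path_provider` packages can fetch the .tflite file once and cache it in the documents directory; the URL and file name below are placeholders:

```dart
import 'dart:io';

import 'package:http/http.dart' as http;
import 'package:path_provider/path_provider.dart';

/// Downloads the model file on first run and returns its local path.
/// Subsequent calls reuse the cached copy.
Future<String> ensureModelDownloaded(String url, String fileName) async {
  final dir = await getApplicationDocumentsDirectory();
  final file = File('${dir.path}/$fileName');
  if (!await file.exists()) {
    final response = await http.get(Uri.parse(url));
    if (response.statusCode != 200) {
      throw Exception('Model download failed: HTTP ${response.statusCode}');
    }
    await file.writeAsBytes(response.bodyBytes);
  }
  return file.path;
}
```

The returned path can be passed directly as `modelPath` when creating the model.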

Model File Structure

Store your .tflite model files in the app's documents directory:

Documents/
└── embeddinggemma-300M_seq512_mixed-precision.tflite

Performance Tips

Backend Selection

  • GPU Backend: Faster inference, higher memory usage
  • CPU Backend: Lower memory usage, slower inference

// For better performance on newer devices
final model = await embedder.createModel(
  // ... other params
  backend: PreferredBackend.gpu,
);

// For memory-constrained devices
final model = await embedder.createModel(
  // ... other params  
  backend: PreferredBackend.cpu,
);

Model Selection

Choose the right model variant for your use case:

  • 256 tokens: Fast inference for short texts (tweets, titles)
  • 512 tokens: Balanced performance for medium texts (paragraphs)
  • 1024 tokens: High capacity for long texts (articles)
  • 2048 tokens: Maximum capacity for very long texts (documents)
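As a rough heuristic (the thresholds and file names below are illustrative, derived from the table above, not part of the plugin API), you can pick a variant from an approximate word count:

```dart
/// Illustrative helper: maps an approximate input length in words
/// to a model variant name. Thresholds mirror the model table above.
String variantForWordCount(int words) {
  if (words <= 200) return 'embeddinggemma-300M_seq256';
  if (words <= 400) return 'embeddinggemma-300M_seq512';
  if (words <= 800) return 'embeddinggemma-300M_seq1024';
  return 'embeddinggemma-300M_seq2048';
}
```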

Example App

The plugin includes a complete example app demonstrating:

  • Model selection with filtering and sorting
  • Model download with progress tracking
  • Text embedding generation
  • Similarity comparison
  • Real-time inference

Run the example:

cd example
flutter run

API Reference

FlutterGemmaEmbedder

Main plugin singleton for creating embedding models.

class FlutterGemmaEmbedder {
  static FlutterGemmaEmbedder get instance;
  
  Future<EmbeddingModel> createModel({
    required String modelPath,
    required EmbeddingModelType modelType,
    required int dimensions,
    required EmbeddingTaskType taskType,
    required PreferredBackend backend,
  });
}

EmbeddingModel

Core class for text embedding operations.

class EmbeddingModel {
  Future<void> initialize();
  
  Future<List<double>> encode(String text, {int? outputDimensionality});
  
  Future<List<List<double>>> batchEncode(
    List<String> texts, 
    {int? outputDimensionality}
  );
  
  double cosineSimilarity(List<double> a, List<double> b);
  
  void dispose();
}
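A typical lifecycle ties the pieces above together, releasing native resources with dispose when the model is no longer needed (a usage sketch mirroring the API above):

```dart
import 'package:flutter_gemma_embedder/flutter_gemma_embedder.dart';

Future<void> runOnce(String modelPath) async {
  final model = await FlutterGemmaEmbedder.instance.createModel(
    modelPath: modelPath,
    modelType: EmbeddingModelType.embeddingGemma300M,
    dimensions: 768,
    taskType: EmbeddingTaskType.retrieval,
    backend: PreferredBackend.cpu,
  );
  try {
    await model.initialize();
    final embedding = await model.encode('hello world');
    print(embedding.length); // 768 for the full-dimensional model
  } finally {
    model.dispose(); // always release native resources
  }
}
```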

Enums

enum EmbeddingModelType {
  embeddingGemma300M,
}

enum EmbeddingTaskType {
  retrieval,
  // More task types may be added in future versions
}

enum PreferredBackend {
  cpu,
  gpu,
}

Requirements

Android

  • Minimum SDK: 21 (Android 5.0)
  • Target SDK: 34 (Android 14)
  • NDK support for TensorFlow Lite

iOS

  • iOS 12.0+
  • TensorFlow Lite Swift framework

Web

  • Modern browsers with WebAssembly support
  • CPU backend only

Troubleshooting

Common Issues

Model loading fails

  • Ensure the model file exists at the specified path
  • Check file permissions
  • Verify model file is not corrupted

Out of memory errors

  • Use CPU backend instead of GPU
  • Choose smaller model variant (256 or 512 tokens)
  • Process texts in smaller batches
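One simple way to cap memory is to split the inputs into fixed-size chunks before calling batchEncode; a sketch (the chunk size of 16 is arbitrary):

```dart
/// Encodes [texts] in chunks of [chunkSize] to bound peak memory usage.
Future<List<List<double>>> encodeInChunks(
  EmbeddingModel model,
  List<String> texts, {
  int chunkSize = 16,
}) async {
  final results = <List<double>>[];
  for (var i = 0; i < texts.length; i += chunkSize) {
    final end =
        (i + chunkSize < texts.length) ? i + chunkSize : texts.length;
    results.addAll(await model.batchEncode(texts.sublist(i, end)));
  }
  return results;
}
```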

Slow inference

  • Use GPU backend on supported devices
  • Choose appropriate model size for your use case
  • Enable device GPU acceleration

Web platform limitations

  • Only CPU backend is supported
  • Large models may cause memory issues
  • Consider using smaller model variants

Getting Help

License

This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.

Acknowledgments


Built with ❤️ by the Flutter community