
πŸš€ LLM Toolkit for Flutter #

A comprehensive Flutter SDK for running Large Language Models (LLMs) locally on mobile and desktop devices. Supports multiple inference engines including Gemma (TFLite) and Llama (GGUF) with integrated model discovery, download, and chat capabilities.

✨ Features #

🎯 Multi-Engine Support #

  • Gemma Engine: TFLite models with GPU acceleration
  • Llama Engine: GGUF models with CPU/GPU hybrid processing
  • Auto-Detection: Automatic engine selection based on model format

πŸ” Model Discovery & Management #

  • HuggingFace Integration: Search and download models directly
  • Format Support: GGUF, TFLite, GGML formats
  • Smart Filtering: Filter by size, compatibility, and popularity
  • Progress Tracking: Real-time download progress with resumption

πŸ’¬ Chat & Inference #

  • Streaming Generation: Real-time token streaming
  • Multimodal Support: Text + image input (Gemma models)
  • Configurable Parameters: Temperature, top-K, context size
  • Memory Management: Optimized for mobile devices

πŸ› οΈ Developer Tools #

  • Debug Console: Real-time logging and diagnostics
  • Performance Monitoring: Memory usage and generation metrics
  • Error Handling: Comprehensive exception handling
  • Native Library Checks: Automatic compatibility validation

πŸ“± Screenshots #

[Model Browser] [Chat Interface] [Debug Console]

πŸš€ Quick Start #

1. Add Dependency #

dependencies:
  llm_toolkit:
    git:
      url: https://github.com/DevMaan707/llm_toolkit.git
      ref: main
  flutter_gemma: ^0.2.4
  llama_cpp_dart: ^0.1.5

2. Initialize SDK #

import 'package:flutter/material.dart';
import 'package:llm_toolkit/llm_toolkit.dart';

void main() {
  WidgetsFlutterBinding.ensureInitialized();

  // Initialize LLM Toolkit once, before the app starts.
  LLMToolkit.instance.initialize(
    huggingFaceApiKey: 'your_hf_token', // Optional
    defaultConfig: InferenceConfig.mobile(),
  );

  runApp(MyApp());
}

class MyApp extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      home: YourHomeScreen(),
    );
  }
}

3. Search & Download Models #

// Search for models
final models = await LLMToolkit.instance.searchModels(
  'gemma 2b',
  limit: 10,
  onlyCompatible: true,
);

// Download a model
final modelPath = await LLMToolkit.instance.downloadModel(
  models.first,
  'model.tflite',
  onProgress: (progress) {
    print('Download: ${(progress * 100).toInt()}%');
  },
);

4. Load & Generate #

// Load model
await LLMToolkit.instance.loadModel(
  modelPath,
  config: InferenceConfig.mobile(),
);

// Generate text
LLMToolkit.instance.generateText(
  'Tell me about Flutter development',
  params: GenerationParams.creative(),
).listen((token) {
  print(token); // Stream of generated tokens
});

πŸ—οΈ Architecture #

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   LLM Toolkit    β”‚    β”‚  Model Providers   β”‚    β”‚  Inference Mgr   β”‚
β”‚   (Main SDK)     β”œβ”€β”€β”€β–Ίβ”‚  - HuggingFace     β”œβ”€β”€β”€β–Ίβ”‚  - Gemma Engine  β”‚
β”‚                  β”‚    β”‚  - Local Files     β”‚    β”‚  - Llama Engine  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚                        β”‚                        β”‚
         β–Ό                        β–Ό                        β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   UI Widgets     β”‚    β”‚  Model Detector    β”‚    β”‚  Config Manager  β”‚
β”‚ - Model Browser  β”‚    β”‚ - Format Detection β”‚    β”‚ - Engine Config  β”‚
β”‚ - Chat Interface β”‚    β”‚ - Compatibility    β”‚    β”‚ - Parameters     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
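
The Auto-Detection step in the diagram amounts to choosing an engine from the model file's format. A minimal sketch of the idea (the EngineType enum and detectEngine helper below are illustrative, not the toolkit's public API):

// Illustrative format-based engine selection (not part of llm_toolkit's API).
enum EngineType { gemma, llama, unsupported }

EngineType detectEngine(String modelPath) {
  final lower = modelPath.toLowerCase();
  if (lower.endsWith('.tflite')) {
    return EngineType.gemma; // TFLite models run on the Gemma engine
  }
  if (lower.endsWith('.gguf') || lower.endsWith('.ggml')) {
    return EngineType.llama; // GGUF/GGML models run on the Llama engine
  }
  return EngineType.unsupported;
}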

πŸ”§ Configuration #

Inference Configurations #

// Mobile optimized
final mobileConfig = InferenceConfig.mobile();

// Desktop optimized
final desktopConfig = InferenceConfig.desktop();

// Multimodal (image + text)
final multimodalConfig = InferenceConfig.multimodal(
  maxTokens: 4096,
  maxNumImages: 1,
);

// Custom configuration
final customConfig = InferenceConfig(
  promptFormat: 'chatml',
  maxTokens: 2048,
  nCtx: 4096,
  preferredBackend: PreferredBackend.gpu,
);

Generation Parameters #

// Creative generation
final creativeParams = GenerationParams.creative();

// Precise generation
final preciseParams = GenerationParams.precise();

// Custom parameters
final customParams = GenerationParams(
  temperature: 0.8,
  topK: 40,
  maxTokens: 512,
  stopSequences: ['</s>', '\n\n'],
);

πŸ“š Examples #

Complete Chat Implementation #

import 'package:flutter/material.dart';
import 'package:llm_toolkit/llm_toolkit.dart';

class ChatScreen extends StatefulWidget {
  @override
  _ChatScreenState createState() => _ChatScreenState();
}

class _ChatScreenState extends State<ChatScreen> {
  final TextEditingController _controller = TextEditingController();
  final List<ChatMessage> _messages = [];
  bool _isGenerating = false;

  void _sendMessage() async {
    if (_controller.text.trim().isEmpty) return;

    final userMessage = ChatMessage(
      text: _controller.text,
      isUser: true,
    );

    setState(() {
      _messages.add(userMessage);
      _isGenerating = true;
    });

    final prompt = _controller.text;
    _controller.clear();

    final aiMessage = ChatMessage(text: '', isUser: false);
    setState(() => _messages.add(aiMessage));

    // Stream generation
    LLMToolkit.instance.generateText(
      prompt,
      params: GenerationParams.creative(),
    ).listen(
      (token) {
        setState(() => aiMessage.text += token);
      },
      onDone: () => setState(() => _isGenerating = false),
      onError: (error) {
        setState(() {
          aiMessage.text = 'Error: $error';
          _isGenerating = false;
        });
      },
    );
  }

  Widget _buildInputArea() {
    return Padding(
      padding: const EdgeInsets.all(8.0),
      child: Row(
        children: [
          Expanded(
            child: TextField(
              controller: _controller,
              decoration: const InputDecoration(hintText: 'Type a message...'),
            ),
          ),
          IconButton(
            icon: const Icon(Icons.send),
            onPressed: _isGenerating ? null : _sendMessage,
          ),
        ],
      ),
    );
  }

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(title: Text('Chat')),
      body: Column(
        children: [
          Expanded(
            child: ListView.builder(
              itemCount: _messages.length,
              itemBuilder: (context, index) =>
                ChatBubble(message: _messages[index]),
            ),
          ),
          _buildInputArea(),
        ],
      ),
    );
  }
}
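
The example above references ChatMessage and ChatBubble; these are app-side helpers, not part of llm_toolkit. A minimal sketch:

// Minimal app-side helpers assumed by the chat example.
class ChatMessage {
  ChatMessage({required this.text, required this.isUser});
  String text; // Mutable so streamed tokens can be appended.
  final bool isUser;
}

class ChatBubble extends StatelessWidget {
  const ChatBubble({super.key, required this.message});
  final ChatMessage message;

  @override
  Widget build(BuildContext context) {
    return Align(
      alignment: message.isUser ? Alignment.centerRight : Alignment.centerLeft,
      child: Container(
        margin: const EdgeInsets.all(8),
        padding: const EdgeInsets.all(12),
        decoration: BoxDecoration(
          color: message.isUser ? Colors.blue[100] : Colors.grey[200],
          borderRadius: BorderRadius.circular(12),
        ),
        child: Text(message.text),
      ),
    );
  }
}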

Multimodal Generation #

// Generate response with image
Stream<String> generateWithImage(String prompt, String imagePath) {
  return LLMToolkit.instance.generateMultimodalResponse(
    prompt,
    [imagePath],
    params: GenerationParams(temperature: 0.7),
  );
}

// Usage
generateWithImage(
  'What do you see in this image?',
  '/path/to/image.jpg',
).listen((token) {
  print(token);
});

Model Management #

import 'package:device_info_plus/device_info_plus.dart';
import 'package:llm_toolkit/llm_toolkit.dart';

class ModelManager {
  // Search with filters
  static Future<List<ModelInfo>> searchSmallModels() {
    return LLMToolkit.instance.searchModels(
      'gemma 2b',
      format: ModelFormat.tflite,
      limit: 5,
      onlyCompatible: true,
    );
  }

  // Download with progress
  static Future<String> downloadWithProgress(
    ModelInfo model,
    String filename,
  ) async {
    return LLMToolkit.instance.downloadModel(
      model,
      filename,
      onProgress: (progress) {
        print('Progress: ${(progress * 100).toInt()}%');
      },
    );
  }

  // Load optimal model
  static Future<void> loadOptimalModel(String modelPath) async {
    final config = await _getOptimalConfig();
    await LLMToolkit.instance.loadModel(modelPath, config: config);
  }

  static Future<InferenceConfig> _getOptimalConfig() async {
    // Auto-detect a reasonable configuration for the device.
    // NOTE: device_info_plus does not report total RAM directly, so plug in
    // your own memory estimate here; the check below is only a placeholder
    // heuristic.
    final deviceInfo = await DeviceInfoPlugin().androidInfo;
    final isLowEndDevice = deviceInfo.version.sdkInt < 31; // Placeholder heuristic

    if (isLowEndDevice) {
      return InferenceConfig.mobile();
    } else {
      return InferenceConfig.desktop();
    }
  }
}

πŸ” Debugging & Diagnostics #

Debug Console #

// Enable debug mode
LlamaInferenceEngine.setDebugMode(true);

// Get debug status
final status = llamaEngine.getDebugStatus();
print('Model loaded: ${status['isModelLoaded']}');

// Print debug info
llamaEngine.printDebugInfo();

// Check native libraries
final available = await LlamaInferenceEngine.checkNativeLibrariesAvailable();
print('Native libs available: $available');

Performance Monitoring #

// Memory recommendations
final recommendations = await LlamaInferenceEngine.getModelRecommendations();
print('Recommended quantization: ${recommendations['recommendedQuantization']}');
print('Recommended context size: ${recommendations['recommendedNCtx']}');

🎯 Supported Models #

Gemma Models (TFLite) #

  • βœ… Gemma 2B/7B IT (Instruction Tuned)
  • βœ… Gemma 2 variants
  • βœ… Gemma Nano (multimodal)
  • βœ… DeepSeek models
  • βœ… Phi-3 models

Llama Models (GGUF) #

  • βœ… Llama 2/3 (all sizes)
  • βœ… Code Llama
  • βœ… Mistral models
  • βœ… Qwen models
  • βœ… Any GGUF compatible model

Quantization Support #

  • GGUF: Q4_0, Q4_K_M, Q5_K_M, Q6_K, Q8_0
  • TFLite: Native TensorFlow Lite quantization
  • Recommended: Q4_K_M for best quality/size ratio

⚑ Performance Tips #

Memory Optimization #

// Use smaller context for mobile
final mobileConfig = InferenceConfig(
  nCtx: 1024,        // Smaller context
  maxTokens: 512,    // Limit output
  verbose: false,    // Reduce logging
);

// Monitor memory usage
final memInfo = await LlamaInferenceEngine.getMemoryInfo();
print('Available: ${memInfo['availableMB']}MB');

Model Selection #

  • Mobile: Use Q4_0 or Q4_K_M quantization
  • Desktop: Use Q5_K_M or Q6_K for better quality
  • RAM < 4GB: Stick to 2B/3B parameter models
  • RAM > 6GB: 7B parameter models work well (see the helper sketch below)
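
These rules can be wrapped in a small helper; a minimal sketch (the ModelProfile class, pickModelProfile function, and RAM thresholds are illustrative, and how you obtain the device's RAM is up to your app):

// Illustrative helper mapping available RAM to a quantization and model size.
class ModelProfile {
  const ModelProfile(this.quantization, this.modelSize);
  final String quantization; // e.g. 'Q4_K_M'
  final String modelSize;    // e.g. '7B'
}

ModelProfile pickModelProfile({required double totalRamGB, required bool isDesktop}) {
  // Desktop can afford higher-quality quantization; mobile stays with Q4_K_M.
  final quant = isDesktop ? 'Q5_K_M' : 'Q4_K_M';
  // Under ~4GB of RAM stick to 2B/3B models; 6GB and above handles 7B well.
  final size = totalRamGB < 4
      ? '2B-3B'
      : (totalRamGB >= 6 ? '7B' : '3B-4B');
  return ModelProfile(quant, size);
}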

Generation Optimization #

// Faster generation
final fastParams = GenerationParams(
  temperature: 0.1,  // More deterministic
  topK: 1,          // Greedy sampling
  maxTokens: 256,   // Shorter responses
);

// Balanced generation
final balancedParams = GenerationParams(
  temperature: 0.7,
  topK: 40,
  maxTokens: 512,
);

πŸ› οΈ Troubleshooting #

Common Issues #

Model not loading:

// Check model file integrity
final isValid = await LlamaInferenceEngine.validateGGUFFile(modelPath);
if (!isValid) {
  print('Model file is corrupted, re-download required');
}

Out of memory errors:

// Use smaller models or reduce context
final safeConfig = InferenceConfig(
  nCtx: 512,         // Reduce context
  maxTokens: 256,    // Limit output
);

Native library issues:

// Check native library availability
final available = await LlamaInferenceEngine.checkNativeLibrariesAvailable();
if (!available) {
  print('Native libraries not found. Check app bundle.');
}

Error Codes #

Error                     Description               Solution
InferenceException        Model loading failed      Check model format and memory
ModelProviderException    Download/search failed    Check network and API keys
DownloadException         File download failed      Check storage space and network
VectorStorageException    RAG operations failed     Check database permissions
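
Each of these exception types can be caught around the call that raises it. A minimal sketch, assuming the exception classes listed above are exported by llm_toolkit:

// Catch the toolkit's exceptions around download and load.
Future<void> downloadAndLoad(ModelInfo model) async {
  try {
    final path = await LLMToolkit.instance.downloadModel(model, 'model.gguf');
    await LLMToolkit.instance.loadModel(path, config: InferenceConfig.mobile());
  } on DownloadException catch (e) {
    // File download failed: check storage space and the network connection.
    print('Download error: $e');
  } on InferenceException catch (e) {
    // Model loading failed: check the model format and available memory.
    print('Inference error: $e');
  } catch (e) {
    print('Unexpected error: $e');
  }
}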

πŸ“¦ Dependencies #

Core Dependencies #

dependencies:
  flutter_gemma: ^0.2.4      # Gemma inference engine
  llama_cpp_dart: ^0.1.5     # Llama inference engine
  dio: ^5.3.2                # HTTP client
  path_provider: ^2.1.1      # File system access

Optional Dependencies #

dependencies:
  device_info_plus: ^9.1.0   # Device information
  permission_handler: ^11.0.1 # Storage permissions
  shared_preferences: ^2.2.2  # Settings storage

🀝 Contributing #

We welcome contributions! Please see our Contributing Guide for details.

Development Setup #

# Clone repository
git clone https://github.com/DevMaan707/llm_toolkit.git

# Get dependencies
flutter pub get

# Run example app
cd example
flutter run

Testing #

# Run tests
flutter test

# Run integration tests
flutter test integration_test/

πŸ“„ License #

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments #

πŸ“ž Support #


Made with ❀️ for the Flutter community

⭐ Star us on GitHub β€’ 🐦 Follow on Twitter β€’ πŸ“¦ Pub.dev Package
