# LLM Toolkit for Flutter
A comprehensive Flutter SDK for running Large Language Models (LLMs) locally on mobile and desktop devices. Supports multiple inference engines including Gemma (TFLite) and Llama (GGUF) with integrated model discovery, download, and chat capabilities.
## Features

### Multi-Engine Support
- Gemma Engine: TFLite models with GPU acceleration
- Llama Engine: GGUF models with CPU/GPU hybrid processing
- Auto-Detection: Automatic engine selection based on model format
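
Auto-detection boils down to mapping the model file's format to an engine before loading. The sketch below illustrates the idea only; the enum and function names are ours, not the package's public API, and `loadModel` handles this internally.

```dart
// Illustrative only: llm_toolkit performs this mapping internally on loadModel.
enum InferenceEngineType { gemma, llama, unknown }

InferenceEngineType detectEngine(String modelPath) {
  final lower = modelPath.toLowerCase();
  if (lower.endsWith('.tflite')) return InferenceEngineType.gemma; // TFLite -> Gemma engine
  if (lower.endsWith('.gguf') || lower.endsWith('.ggml')) {
    return InferenceEngineType.llama; // GGUF/GGML -> Llama engine
  }
  return InferenceEngineType.unknown;
}
```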
### Model Discovery & Management
- HuggingFace Integration: Search and download models directly
- Format Support: GGUF, TFLite, GGML formats
- Smart Filtering: Filter by size, compatibility, and popularity
- Progress Tracking: Real-time download progress with resumption
### Chat & Inference
- Streaming Generation: Real-time token streaming
- Multimodal Support: Text + image input (Gemma models)
- Configurable Parameters: Temperature, top-K, context size
- Memory Management: Optimized for mobile devices
### Developer Tools
- Debug Console: Real-time logging and diagnostics
- Performance Monitoring: Memory usage and generation metrics
- Error Handling: Comprehensive exception handling
- Native Library Checks: Automatic compatibility validation
## Quick Start

### 1. Add Dependency

```yaml
dependencies:
  llm_toolkit:
    git:
      url: https://github.com/DevMaan707/llm_toolkit.git
      ref: main
  flutter_gemma: ^0.2.4
  llama_cpp_dart: ^0.1.5
```
### 2. Initialize SDK

```dart
import 'package:flutter/material.dart';
import 'package:llm_toolkit/llm_toolkit.dart';

void main() {
  runApp(MyApp());
}

class MyApp extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    // Initialize LLM Toolkit
    LLMToolkit.instance.initialize(
      huggingFaceApiKey: 'your_hf_token', // Optional
      defaultConfig: InferenceConfig.mobile(),
    );

    return MaterialApp(
      home: YourHomeScreen(),
    );
  }
}
```
### 3. Search & Download Models

```dart
// Search for models
final models = await LLMToolkit.instance.searchModels(
  'gemma 2b',
  limit: 10,
  onlyCompatible: true,
);

// Download a model
final modelPath = await LLMToolkit.instance.downloadModel(
  models.first,
  'model.tflite',
  onProgress: (progress) {
    print('Download: ${(progress * 100).toInt()}%');
  },
);
```
### 4. Load & Generate

```dart
// Load model
await LLMToolkit.instance.loadModel(
  modelPath,
  config: InferenceConfig.mobile(),
);

// Generate text
LLMToolkit.instance.generateText(
  'Tell me about Flutter development',
  params: GenerationParams.creative(),
).listen((token) {
  print(token); // Stream of generated tokens
});
```
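
If you prefer a single string rather than a token stream, you can buffer the stream yourself. A small sketch built only on the `generateText` stream shown above (the helper name is ours):

```dart
// Collect the streamed tokens into one response string.
Future<String> generateFullResponse(String prompt) async {
  final buffer = StringBuffer();
  await for (final token in LLMToolkit.instance.generateText(
    prompt,
    params: GenerationParams.precise(),
  )) {
    buffer.write(token);
  }
  return buffer.toString();
}
```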
## Architecture

```
┌─────────────────┐     ┌────────────────────┐     ┌─────────────────┐
│   LLM Toolkit   │     │  Model Providers   │     │  Inference Mgr  │
│   (Main SDK)    ├─────┤  - HuggingFace     ├─────┤ - Gemma Engine  │
│                 │     │  - Local Files     │     │ - Llama Engine  │
└─────────────────┘     └────────────────────┘     └─────────────────┘
         │                        │                         │
         ▼                        ▼                         ▼
┌─────────────────┐     ┌────────────────────┐     ┌─────────────────┐
│   UI Widgets    │     │   Model Detector   │     │  Config Manager │
│ - Model Browser │     │ - Format Detection │     │ - Engine Config │
│ - Chat Interface│     │ - Compatibility    │     │ - Parameters    │
└─────────────────┘     └────────────────────┘     └─────────────────┘
```
## Configuration

### Inference Configurations

```dart
// Mobile optimized
final mobileConfig = InferenceConfig.mobile();

// Desktop optimized
final desktopConfig = InferenceConfig.desktop();

// Multimodal (image + text)
final multimodalConfig = InferenceConfig.multimodal(
  maxTokens: 4096,
  maxNumImages: 1,
);

// Custom configuration
final customConfig = InferenceConfig(
  promptFormat: 'chatml',
  maxTokens: 2048,
  nCtx: 4096,
  preferredBackend: PreferredBackend.gpu,
);
```
### Generation Parameters

```dart
// Creative generation
final creativeParams = GenerationParams.creative();

// Precise generation
final preciseParams = GenerationParams.precise();

// Custom parameters
final customParams = GenerationParams(
  temperature: 0.8,
  topK: 40,
  maxTokens: 512,
  stopSequences: ['</s>', '\n\n'],
);
```
## Examples

### Complete Chat Implementation

```dart
class ChatScreen extends StatefulWidget {
  @override
  _ChatScreenState createState() => _ChatScreenState();
}

class _ChatScreenState extends State<ChatScreen> {
  final TextEditingController _controller = TextEditingController();
  final List<ChatMessage> _messages = [];
  bool _isGenerating = false;

  void _sendMessage() async {
    if (_controller.text.trim().isEmpty) return;

    final userMessage = ChatMessage(
      text: _controller.text,
      isUser: true,
    );

    setState(() {
      _messages.add(userMessage);
      _isGenerating = true;
    });

    final prompt = _controller.text;
    _controller.clear();

    final aiMessage = ChatMessage(text: '', isUser: false);
    setState(() => _messages.add(aiMessage));

    // Stream generation
    LLMToolkit.instance.generateText(
      prompt,
      params: GenerationParams.creative(),
    ).listen(
      (token) {
        setState(() => aiMessage.text += token);
      },
      onDone: () => setState(() => _isGenerating = false),
      onError: (error) {
        setState(() {
          aiMessage.text = 'Error: $error';
          _isGenerating = false;
        });
      },
    );
  }

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(title: Text('Chat')),
      body: Column(
        children: [
          Expanded(
            child: ListView.builder(
              itemCount: _messages.length,
              itemBuilder: (context, index) =>
                  ChatBubble(message: _messages[index]),
            ),
          ),
          _buildInputArea(),
        ],
      ),
    );
  }
}
```
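
The chat example above refers to a `ChatMessage` model and a `ChatBubble` widget that are app code, not part of the toolkit. A minimal sketch of both, assuming `text` must be mutable so streamed tokens can be appended:

```dart
import 'package:flutter/material.dart';

class ChatMessage {
  ChatMessage({required this.text, required this.isUser});
  String text; // Mutable so streamed tokens can be appended.
  final bool isUser;
}

class ChatBubble extends StatelessWidget {
  const ChatBubble({super.key, required this.message});
  final ChatMessage message;

  @override
  Widget build(BuildContext context) {
    return Align(
      alignment: message.isUser ? Alignment.centerRight : Alignment.centerLeft,
      child: Container(
        margin: const EdgeInsets.all(8),
        padding: const EdgeInsets.all(12),
        decoration: BoxDecoration(
          color: message.isUser ? Colors.blue.shade100 : Colors.grey.shade200,
          borderRadius: BorderRadius.circular(12),
        ),
        child: Text(message.text),
      ),
    );
  }
}
```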
### Multimodal Generation

```dart
// Generate response with image
Stream<String> generateWithImage(String prompt, String imagePath) {
  return LLMToolkit.instance.generateMultimodalResponse(
    prompt,
    [imagePath],
    params: GenerationParams(temperature: 0.7),
  );
}

// Usage
generateWithImage(
  'What do you see in this image?',
  '/path/to/image.jpg',
).listen((token) {
  print(token);
});
```
### Model Management

```dart
import 'package:device_info_plus/device_info_plus.dart';

class ModelManager {
  // Search with filters
  static Future<List<ModelInfo>> searchSmallModels() {
    return LLMToolkit.instance.searchModels(
      'gemma 2b',
      format: ModelFormat.tflite,
      limit: 5,
      onlyCompatible: true,
    );
  }

  // Download with progress
  static Future<String> downloadWithProgress(
    ModelInfo model,
    String filename,
  ) async {
    return LLMToolkit.instance.downloadModel(
      model,
      filename,
      onProgress: (progress) {
        print('Progress: ${(progress * 100).toInt()}%');
      },
    );
  }

  // Load optimal model
  static Future<void> loadOptimalModel(String modelPath) async {
    final config = await _getOptimalConfig();
    await LLMToolkit.instance.loadModel(modelPath, config: config);
  }

  static Future<InferenceConfig> _getOptimalConfig() async {
    // Auto-detect optimal configuration based on device (Android-only here).
    final deviceInfo = await DeviceInfoPlugin().androidInfo;
    final totalMemoryMB = deviceInfo.systemFeatures.length * 512; // Rough estimate
    if (totalMemoryMB < 3000) {
      return InferenceConfig.mobile();
    } else {
      return InferenceConfig.desktop();
    }
  }
}
```
## Debugging & Diagnostics

### Debug Console

```dart
// Enable debug mode
LlamaInferenceEngine.setDebugMode(true);

// Get debug status
final status = llamaEngine.getDebugStatus();
print('Model loaded: ${status['isModelLoaded']}');

// Print debug info
llamaEngine.printDebugInfo();

// Check native libraries
final available = await LlamaInferenceEngine.checkNativeLibrariesAvailable();
print('Native libs available: $available');
```
### Performance Monitoring

```dart
// Memory recommendations
final recommendations = await LlamaInferenceEngine.getModelRecommendations();
print('Recommended quantization: ${recommendations['recommendedQuantization']}');
print('Recommended context size: ${recommendations['recommendedNCtx']}');
```
## Supported Models

### Gemma Models (TFLite)

- ✅ Gemma 2B/7B IT (Instruction Tuned)
- ✅ Gemma 2 variants
- ✅ Gemma Nano (multimodal)
- ✅ DeepSeek models
- ✅ Phi-3 models
### Llama Models (GGUF)

- ✅ Llama 2/3 (all sizes)
- ✅ Code Llama
- ✅ Mistral models
- ✅ Qwen models
- ✅ Any GGUF-compatible model

### Quantization Support
- GGUF: Q4_0, Q4_K_M, Q5_K_M, Q6_K, Q8_0
- TFLite: Native TensorFlow Lite quantization
- Recommended: Q4_K_M for best quality/size ratio
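
Choosing a quantization in practice usually means picking the right file out of a model repository. A small sketch in plain Dart (no llm_toolkit API assumed) that prefers Q4_K_M and falls back through the other supported quantizations:

```dart
/// Picks a GGUF filename by preferred quantization, falling back in order.
/// Pure string matching; adapt it to however you list a model's files.
String? pickQuantizedFile(
  List<String> filenames, {
  List<String> preference = const ['Q4_K_M', 'Q5_K_M', 'Q4_0', 'Q8_0'],
}) {
  for (final quant in preference) {
    for (final name in filenames) {
      if (name.toUpperCase().contains(quant) &&
          name.toLowerCase().endsWith('.gguf')) {
        return name;
      }
    }
  }
  return null; // No matching quantization found.
}
```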
## Performance Tips

### Memory Optimization

```dart
// Use smaller context for mobile
final mobileConfig = InferenceConfig(
  nCtx: 1024,      // Smaller context
  maxTokens: 512,  // Limit output
  verbose: false,  // Reduce logging
);

// Monitor memory usage
final memInfo = await LlamaInferenceEngine.getMemoryInfo();
print('Available: ${memInfo['availableMB']}MB');
```
### Model Selection
- Mobile: Use Q4_0 or Q4_K_M quantization
- Desktop: Use Q5_K_M or Q6_K for better quality
- RAM < 4GB: Stick to 2B/3B parameter models
- RAM > 6GB: 7B parameter models work well
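
These rules of thumb can be encoded in a tiny helper. A sketch that assumes you already know the device's total RAM in MB (for example via device_info_plus or a platform channel); the returned strings are only suggestions:

```dart
/// Suggests a model size and quantization from total device RAM (MB),
/// following the rule-of-thumb thresholds above.
String suggestModel(int totalRamMB) {
  if (totalRamMB < 4096) {
    return '2B-3B parameters, Q4_0 or Q4_K_M quantization';
  } else if (totalRamMB <= 6144) {
    return 'up to 7B parameters, Q4_K_M quantization';
  } else {
    return '7B parameters, Q5_K_M or Q6_K quantization';
  }
}
```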
### Generation Optimization

```dart
// Faster generation
final fastParams = GenerationParams(
  temperature: 0.1, // More deterministic
  topK: 1,          // Greedy sampling
  maxTokens: 256,   // Shorter responses
);

// Balanced generation
final balancedParams = GenerationParams(
  temperature: 0.7,
  topK: 40,
  maxTokens: 512,
);
```
## Troubleshooting

### Common Issues

**Model not loading:**

```dart
// Check model file integrity
final isValid = await LlamaInferenceEngine.validateGGUFFile(modelPath);
if (!isValid) {
  print('Model file is corrupted, re-download required');
}
```

**Out of memory errors:**

```dart
// Use smaller models or reduce context
final safeConfig = InferenceConfig(
  nCtx: 512,       // Reduce context
  maxTokens: 256,  // Limit output
);
```

**Native library issues:**

```dart
// Check native library availability
final available = await LlamaInferenceEngine.checkNativeLibrariesAvailable();
if (!available) {
  print('Native libraries not found. Check app bundle.');
}
```
### Error Codes

| Error | Description | Solution |
|---|---|---|
| `InferenceException` | Model loading failed | Check model format and memory |
| `ModelProviderException` | Download/search failed | Check network and API keys |
| `DownloadException` | File download failed | Check storage space and network |
| `VectorStorageException` | RAG operations failed | Check database permissions |
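
A typical pattern is to catch these around the download and load calls and surface a readable message. A sketch assuming the exception types in the table above are exported by the package; the filename is a placeholder:

```dart
Future<void> fetchAndLoad(ModelInfo model) async {
  try {
    final path = await LLMToolkit.instance.downloadModel(model, 'model.gguf');
    await LLMToolkit.instance.loadModel(path, config: InferenceConfig.mobile());
  } on DownloadException catch (e) {
    print('Download failed (check storage space and network): $e');
  } on ModelProviderException catch (e) {
    print('Provider error (check network and API keys): $e');
  } on InferenceException catch (e) {
    print('Model could not be loaded (check format and memory): $e');
  }
}
```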
## Dependencies

### Core Dependencies

```yaml
dependencies:
  flutter_gemma: ^0.2.4    # Gemma inference engine
  llama_cpp_dart: ^0.1.5   # Llama inference engine
  dio: ^5.3.2              # HTTP client
  path_provider: ^2.1.1    # File system access
```
### Optional Dependencies

```yaml
dependencies:
  device_info_plus: ^9.1.0     # Device information
  permission_handler: ^11.0.1  # Storage permissions
  shared_preferences: ^2.2.2   # Settings storage
```
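
If model files end up outside app-private storage on Android, you may need to ask for storage permission before downloading. A sketch using permission_handler from the list above; whether it is required depends on where `downloadModel` writes, which is not assumed here:

```dart
import 'package:permission_handler/permission_handler.dart';

/// Requests storage permission and reports whether it was granted.
Future<bool> ensureStoragePermission() async {
  final status = await Permission.storage.request();
  return status.isGranted;
}
```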
## Contributing
We welcome contributions! Please see our Contributing Guide for details.
### Development Setup

```bash
# Clone repository
git clone https://github.com/DevMaan707/llm_toolkit.git

# Get dependencies
flutter pub get

# Run example app
cd example
flutter run
```
### Testing

```bash
# Run tests
flutter test

# Run integration tests
flutter test integration_test/
```
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
- Flutter Gemma - Gemma inference engine
- Llama.cpp Dart - Llama inference engine
- HuggingFace - Model repository and API
- Google - Gemma model family
## Support

- Email: support@llm-toolkit.dev
- Discord: Join our community
- Issues: GitHub Issues
- Docs: Full Documentation

Made with ❤️ for the Flutter community

⭐ Star us on GitHub • Follow on Twitter • Pub.dev Package