# LLM Toolkit SDK Changelog

## 0.0.4 - July 4, 2025

### 🎤 New Features - Speech Recognition (ASR)
- Complete ASR Integration - Full TensorFlow Lite ASR engine with Whisper model support
- Real-time Speech Recognition - Live streaming transcription with continuous audio processing
- Multi-format Audio Support - WAV, PCM16, PCM32, Float32 audio format compatibility
- Advanced Audio Preprocessing - Whisper-compatible mel-spectrogram generation with FFT processing
- Voice Activity Detection (VAD) - Smart silence detection and automatic recording termination
- Chunked Audio Processing - Efficient handling of long audio files with overlap management
- ASR Service Layer - High-level service wrapper for easy speech recognition integration
- Microphone Integration - Native microphone access with permission handling
### 🔊 ASR Technical Features
- Enhanced Audio Pipeline - Optimized mel-filter bank computation with isolate processing
- Adaptive Processing - Dynamic audio length optimization based on content characteristics
- Silence Detection - Advanced RMS-based silence detection with configurable thresholds (see the sketch after this list)
- Audio Resampling - High-quality interpolation-based audio resampling for format compatibility
- Model Type Detection - Automatic ASR model type detection (Whisper, Wav2Vec2, DeepSpeech)
- Streaming Optimization - Low-latency streaming with configurable chunk sizes and overlap
- Memory Efficient - Optimized for mobile devices with minimal memory footprint
- Error Recovery - Robust error handling with graceful fallbacks for audio processing
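For intuition, the RMS gate at the heart of the silence detector is tiny. A minimal sketch; the function names and the default threshold are illustrative, not the toolkit's actual API:

```dart
import 'dart:math' as math;

/// Root-mean-square energy of a PCM frame (samples normalized to [-1, 1]).
double rms(List<double> frame) {
  if (frame.isEmpty) return 0.0;
  var sumSquares = 0.0;
  for (final sample in frame) {
    sumSquares += sample * sample;
  }
  return math.sqrt(sumSquares / frame.length);
}

/// A frame counts as silence when its energy stays below the threshold.
bool isSilent(List<double> frame, {double threshold = 0.01}) =>
    rms(frame) < threshold;
```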
### 🎯 ASR Configuration Options
- ASRConfig Class - Comprehensive configuration for sample rates, bit rates, and streaming parameters
- Quality Presets - Pre-built configurations for high-quality, low-latency, mobile, and streaming scenarios
- VAD Settings - Configurable voice activity detection with silence timeouts and thresholds
- Performance Tuning - Adjustable confidence thresholds and token limits for optimal performance
- Streaming Parameters - Customizable streaming intervals and chunk processing settings
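Putting the options above together, a configuration might look roughly like this. `ASRConfig` is the SDK class named above, but the field names and values below are assumptions for illustration:

```dart
// Field names are illustrative assumptions; consult the ASRConfig API
// docs for the exact parameters.
import 'package:llm_toolkit/llm_toolkit.dart'; // assumed default import

final config = ASRConfig(
  sampleRate: 16000,                    // Whisper expects 16 kHz mono input
  silenceTimeout: Duration(seconds: 2), // VAD: stop after sustained silence
  silenceThreshold: 0.01,               // RMS level treated as silence
  confidenceThreshold: 0.5,             // discard low-confidence tokens
  maxTokens: 224,                       // cap on decoded output length
  streamingInterval: Duration(milliseconds: 500), // partial-result cadence
);
```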
### 🔧 ASR Service Methods
- File Transcription - `transcribeFile()` for processing audio files with enhanced chunking
- Byte Transcription - `transcribeBytes()` for direct audio data processing
- Live Recording - `startRecording()` / `stopRecording()` for real-time audio capture
- Streaming Recognition - `startStreamingTranscription()` for continuous live transcription
- VAD Recording - `recordWithVAD()` for intelligent voice-activated recording
- Timed Recording - `recordAndTranscribe()` for fixed-duration audio capture
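Taken together, the service surface can be used along these lines. The service class name (`AsrService` here) and the exact signatures are assumptions based on the method list above:

```dart
// Hypothetical usage; `AsrService` and the signatures are illustrative.
Future<void> demo(AsrService asr) async {
  // One-shot transcription of an audio file on disk.
  final text = await asr.transcribeFile('/path/to/audio.wav');
  print('Transcript: $text');

  // Continuous live transcription: partial results arrive on a stream.
  final sub = asr.startStreamingTranscription().listen(print);
  await Future<void>.delayed(const Duration(seconds: 10));
  await sub.cancel();

  // Voice-activated capture: recording ends automatically on silence.
  print('VAD transcript: ${await asr.recordWithVAD()}');
}
```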
### 🎤 ASR Audio Processing
- WAV Header Parsing - Complete WAV file format validation and metadata extraction (parser sketch after this list)
- Multi-channel Support - Automatic stereo-to-mono conversion for ASR compatibility
- Audio Validation - Comprehensive audio file validation with detailed error reporting
- Format Detection - Automatic audio format detection and appropriate decoder selection
- Quality Metrics - Audio quality assessment and processing recommendations
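As a reference point for the header parsing above, here is a minimal canonical-WAV parser in Dart. The toolkit's validator is more thorough; in particular, real files may carry extra chunks before `fmt `, which a full parser handles by walking the chunk list:

```dart
import 'dart:typed_data';

class WavInfo {
  final int format, channels, sampleRate, bitsPerSample;
  WavInfo(this.format, this.channels, this.sampleRate, this.bitsPerSample);
}

/// Parses a canonical 44-byte WAV header (RIFF header, then 'fmt ' first).
WavInfo parseWavHeader(Uint8List bytes) {
  if (bytes.length < 44 ||
      String.fromCharCodes(bytes.sublist(0, 4)) != 'RIFF' ||
      String.fromCharCodes(bytes.sublist(8, 12)) != 'WAVE') {
    throw const FormatException('Not a RIFF/WAVE file');
  }
  final data = ByteData.sublistView(bytes);
  return WavInfo(
    data.getUint16(20, Endian.little), // 1 = PCM, 3 = IEEE float
    data.getUint16(22, Endian.little), // channel count
    data.getUint32(24, Endian.little), // sample rate in Hz
    data.getUint16(34, Endian.little), // bits per sample
  );
}
```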
### ⚠️ Current Limitations
- ASR Status: ASR functionality is implemented but currently experiencing stability issues and may not work reliably in all scenarios
- Multimodal Status: Multimodal image+text generation is implemented but not thoroughly tested - use with caution
- Platform Compatibility: ASR features may have varying performance across different Android devices
- Model Compatibility: Some ASR models may require specific preprocessing that hasn't been fully optimized
### 🔧 Improvements from Previous Version
- Enhanced Multimodal Support - Improved Gemma multimodal integration with proper image handling
- Better Engine Detection - More accurate ASR model type detection and configuration
- Optimized Memory Usage - Reduced memory footprint for ASR processing on mobile devices
- Improved Error Messages - More descriptive error messages for ASR-related issues
- Debug Enhancements - Extended debug console with ASR-specific logging and metrics
### 🐛 Bug Fixes
- Fixed audio preprocessing pipeline for Whisper model compatibility
- Resolved memory leaks in continuous ASR processing
- Corrected mel-spectrogram generation for various audio durations
- Fixed streaming audio buffer management and overlap handling
- Resolved TensorFlow Lite model loading issues for ASR engines
- Fixed microphone permission handling across different Android versions
### 📚 Documentation Updates
- ASR Integration Guide - Complete guide for implementing speech recognition features
- Audio Processing Tutorial - Technical documentation for audio preprocessing and format handling
- Streaming Best Practices - Guidelines for optimal real-time speech recognition performance
- Troubleshooting Guide - Common ASR issues and their solutions
- Model Compatibility Matrix - Supported ASR models and their specific requirements
### 🔮 Upcoming in Next Release
- ASR Stability Improvements - Focus on resolving current ASR reliability issues
- Multimodal Testing - Comprehensive testing and validation of image+text generation
- Additional ASR Models - Support for more ASR model architectures and formats
- Performance Optimizations - Further memory and speed optimizations for mobile deployment
## 0.0.3 - July 1, 2025

### 🚀 New Features
- RAG (Retrieval-Augmented Generation) Architecture - Complete modular RAG system with engine-agnostic design
- Multi-Engine RAG Support - Extensible architecture supporting GGUF, GGML, TFLite, and future inference engines
- Document Management System - Add, remove, and manage documents with automatic chunking and embedding generation
- Intelligent Document Chunking - Multiple chunking strategies (sentence, paragraph, fixed-size) with configurable overlap
- Semantic Search - Cosine similarity-based document retrieval with confidence scoring
- Context-Aware Generation - Generate answers using relevant document chunks as context
- Llama RAG Engine - First implementation using llama_cpp_dart for both embeddings and text generation
- Flexible Configuration - Customizable similarity thresholds, chunk sizes, and generation parameters
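A sketch of the intended end-to-end flow, using the `LlamaRagEngine` named above; the constructor arguments and method names here are illustrative assumptions, not the exact published API:

```dart
// Hypothetical flow; names beyond `LlamaRagEngine` are illustrative.
Future<void> ragDemo() async {
  final engine = LlamaRagEngine(modelPath: '/models/llama.gguf');

  // Ingestion: the document is chunked and embedded automatically.
  await engine.addDocument(
    id: 'device-manual',
    text: 'The device supports GGUF and TFLite model formats. ...',
  );

  // Query: the most relevant chunks are retrieved and used as context.
  final answer = await engine.query('Which model formats are supported?');
  print(answer);
}
```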
### 🔧 RAG System Components
- RagEngine Interface - Abstract base class for implementing different RAG backends
- DocumentChunk Model - Structured document representation with metadata and relevance scoring
- TextChunker Utility - Advanced text splitting with sentence-aware chunking and overlap management
- SimilarityCalculator - Multiple similarity metrics (cosine, euclidean) for document matching (cosine sketch after this list)
- RagService Integration - Seamless integration with existing LLM service architecture
- Batch Processing - Efficient batch embedding generation for large document collections
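For reference, the cosine metric used for document matching reduces to a normalized dot product. A self-contained sketch; the `SimilarityCalculator` API itself may differ:

```dart
import 'dart:math' as math;

/// Cosine similarity between two embedding vectors:
/// dot(a, b) / (|a| * |b|), in [-1, 1], where 1 means identical direction.
double cosineSimilarity(List<double> a, List<double> b) {
  assert(a.length == b.length, 'Embeddings must have equal dimensions');
  var dot = 0.0, normA = 0.0, normB = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA == 0 || normB == 0) return 0.0;
  return dot / (math.sqrt(normA) * math.sqrt(normB));
}
```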
### 🧠 RAG Features
- Document Metadata Support - Rich metadata storage and filtering capabilities
- Multi-Document Support - Handle multiple documents with unique identification
- Relevance Scoring - Confidence-based ranking of retrieved document chunks
- Dynamic Context Building - Intelligent context assembly from multiple relevant chunks (assembly sketch after this list)
- Memory-Efficient Processing - Optimized for mobile devices with limited memory
- Streaming Integration - Compatible with existing streaming text generation
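The context-building step amounts to ranking chunks by relevance, keeping the top k, and splicing them into the prompt. A simplified sketch with hypothetical types (the real `DocumentChunk` model carries metadata as well):

```dart
// Simplified stand-in for the scored chunks produced by retrieval.
class ScoredChunk {
  final String text;
  final double score;
  ScoredChunk(this.text, this.score);
}

String buildPrompt(String question, List<ScoredChunk> chunks, {int topK = 3}) {
  // Highest-scoring chunks first, then keep only the top k.
  final ranked = [...chunks]..sort((a, b) => b.score.compareTo(a.score));
  final context = ranked.take(topK).map((c) => c.text).join('\n---\n');
  return 'Answer using only the context below.\n\n'
      'Context:\n$context\n\nQuestion: $question';
}
```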
### 🔧 Technical Improvements
- Modular Architecture - Clean separation between RAG core, engines, and utilities
- Engine Abstraction - Easy addition of new inference engines (TFLite, ONNX, etc.)
- Error Handling - Comprehensive error management for document processing and retrieval
- Performance Optimization - Efficient embedding caching and similarity computation
- Memory Management - Smart memory usage for large document collections
- Async Operations - Non-blocking document processing and query handling
### 📚 RAG Configuration Options
- Chunking Configuration - Customizable chunk size, overlap, and splitting strategies
- Retrieval Parameters - Adjustable similarity thresholds and maximum relevant chunks
- Generation Settings - Temperature, max tokens, and system prompt customization
- Engine-Specific Settings - Tailored configurations for different inference engines
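Combining the knobs above, a configuration could look like this. Both the `RagConfig` name and its fields are hypothetical illustrations of the listed options, not the published API:

```dart
// Hypothetical class and field names, assembled from the options above.
final ragConfig = RagConfig(
  chunkSize: 512,           // characters per chunk
  chunkOverlap: 64,         // shared tail between consecutive chunks
  similarityThreshold: 0.7, // drop chunks scoring below this
  maxRelevantChunks: 3,     // cap on chunks used as context
  temperature: 0.7,
  maxTokens: 256,
  systemPrompt: 'Answer strictly from the provided context.',
);
```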
### 🛠️ Implementation Details
- LlamaRagEngine - Complete implementation using llama_cpp_dart for embeddings and generation
- Document Processing Pipeline - Automated chunking, embedding, and indexing workflow (chunking sketch after this list)
- Query Processing - Efficient query embedding and similarity search
- Context Assembly - Smart context building with relevance-based chunk selection
- Response Generation - Context-aware answer generation with source attribution
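The fixed-size strategy in the pipeline is the easiest to picture; a minimal sketch (the real `TextChunker` layers sentence awareness on top of this idea):

```dart
import 'dart:math' as math;

/// Fixed-size chunking with overlap, the simplest of the strategies above.
List<String> chunkText(String text, {int chunkSize = 512, int overlap = 64}) {
  assert(overlap < chunkSize, 'overlap must be smaller than chunkSize');
  final chunks = <String>[];
  var start = 0;
  while (start < text.length) {
    final end = math.min(start + chunkSize, text.length);
    chunks.add(text.substring(start, end));
    if (end == text.length) break;
    start = end - overlap; // step back so context spans chunk boundaries
  }
  return chunks;
}
```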
### 📚 Documentation Updates
- RAG Architecture Guide - Comprehensive documentation for implementing RAG systems
- Engine Integration Tutorial - Step-by-step guide for adding new inference engines
- Document Processing Best Practices - Guidelines for optimal chunking and embedding strategies
- Performance Optimization - RAG-specific performance tuning recommendations
- API Reference - Complete API documentation for RAG components and utilities
### 🎯 Future RAG Roadmap
- Vector Database Integration - Support for external vector databases (Pinecone, Weaviate)
- Advanced Retrieval Strategies - Hybrid search, re-ranking, and query expansion
- Multi-Modal RAG - Support for images, PDFs, and other document formats
- Distributed RAG - Cloud-based document processing and retrieval
- RAG Analytics - Performance metrics and retrieval quality analysis
## 0.0.2 - June 29, 2025

### 🚀 New Features
- Enhanced Model Browser with tabbed interface (Search, Recommended, Downloaded)
- File Browser Integration - Load models directly from device storage
- Device-Specific Recommendations - Memory-aware model suggestions based on device capabilities
- Advanced Debug Console - Real-time logging with color-coded categories and filtering
- Improved Model Cards - Better file organization, download status, and compatibility indicators
- Native Library Validation - Automatic detection of llama.cpp compatibility and health checks
- Progressive Model Loading - Fallback configurations for memory-constrained devices
- Enhanced Error Handling - Detailed error messages with troubleshooting tips and recovery suggestions
### 🔧 Improvements
- Quantization Safety Checks - Automatic detection and warnings for unstable quantizations (Q2_K, IQ1_S, IQ1_M); see the sketch after this list
- Memory Management - Dynamic context size adjustment based on available device memory
- Download Progress - Real-time progress indicators with speed metrics and ETA
- Model Detection - Improved engine detection for GGUF and TFLite files with better accuracy
- UI/UX Enhancements - Material Design 3 with gradient themes, better accessibility, and responsive design
- Performance Monitoring - Token generation speed tracking, memory usage reporting, and performance analytics
- Search Optimization - Better model discovery with multiple search strategies and relevance ranking
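A filename-based version of the safety check is easy to sketch; the toolkit's real detection may inspect GGUF metadata rather than file names:

```dart
/// Quantization variants flagged as unstable on some Android ARM64 builds.
const unstableQuants = {'Q2_K', 'IQ1_S', 'IQ1_M'};

/// Returns a warning for risky quantizations found in a GGUF filename,
/// e.g. 'llama-3-8b.Q2_K.gguf'. Substring matching is illustrative only.
String? quantizationWarning(String fileName) {
  final upper = fileName.toUpperCase();
  for (final quant in unstableQuants) {
    if (upper.contains(quant)) {
      return '$quant quantization may crash on this device; '
          'prefer Q4_K_M or Q5_K_M.';
    }
  }
  return null;
}
```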
### 🐛 Bug Fixes
- Fixed SIGSEGV crashes with problematic quantizations on Android ARM64 devices
- Resolved memory leaks during model loading/unloading cycles
- Fixed download interruption handling and resume functionality
- Corrected model file validation for corrupted GGUF files
- Fixed context size calculation for low-memory devices
- Resolved engine detection issues for certain model formats
- Fixed UI state management during concurrent operations
### 📚 Documentation
- Added comprehensive model compatibility guide with device-specific recommendations
- Enhanced troubleshooting documentation for Android devices with common solutions
- Updated API documentation with new configuration options and examples
- Added performance optimization guidelines for mobile deployment
### 🛠️ Technical Improvements
- Isolate-based Model Loading - Better crash protection and memory isolation (see the sketch after this list)
- Enhanced Logging System - Categorized debug logs with filtering and export capabilities
- Improved Error Recovery - Graceful fallback mechanisms for failed operations
- Better Resource Management - Optimized memory usage and cleanup procedures
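A minimal sketch of the isolate pattern, using a GGUF magic-bytes check as a stand-in for the toolkit's fuller validation. Note that isolates contain Dart-side failures and keep heavy work off the UI isolate; a native SIGSEGV still takes down the whole process:

```dart
import 'dart:io';
import 'dart:isolate';

/// Runs file validation in a short-lived isolate so the UI never blocks.
/// Checking the leading ASCII 'GGUF' magic is a minimal health check.
Future<bool> validateGgufOffMain(String path) {
  return Isolate.run(() {
    final raf = File(path).openSync();
    try {
      final magic = raf.readSync(4); // GGUF files begin with 'GGUF'
      return String.fromCharCodes(magic) == 'GGUF';
    } finally {
      raf.closeSync();
    }
  });
}
```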
## 0.0.1 - June 27, 2025

### 🎉 Initial Release
- Dual Engine Support - Llama (GGUF) and Gemma (TFLite) inference engines
- HuggingFace Integration - Direct model search and download from HuggingFace Hub
- Model Management - Automatic model detection and configuration
- Chat Interface - Real-time streaming text generation with conversation history
- Cross-Platform - Support for Android and iOS devices
### 🔧 Core Features
- Model Search - Search and filter models by format, size, and compatibility
- Automatic Downloads - Background model downloading with progress tracking
- Engine Detection - Automatic selection of appropriate inference engine based on model format (selection sketch after this list)
- Memory Optimization - Dynamic configuration based on device capabilities
- Error Handling - Comprehensive exception handling and user feedback
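Format-based engine selection can be pictured as a simple dispatch on the file extension; a sketch (the toolkit's detection may also inspect file contents):

```dart
enum InferenceEngine { llama, gemma }

/// Maps a model file to an engine, mirroring the automatic detection
/// described above. Extension matching is a simplification.
InferenceEngine detectEngine(String modelPath) {
  final lower = modelPath.toLowerCase();
  if (lower.endsWith('.gguf')) return InferenceEngine.llama;   // llama_cpp_dart
  if (lower.endsWith('.tflite')) return InferenceEngine.gemma; // flutter_gemma
  throw UnsupportedError('Unrecognized model format: $modelPath');
}
```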
### 📱 Supported Formats
- GGUF - Quantized models for efficient inference (Q4_K_M, Q4_0, Q5_K_M, Q8_0)
- TFLite - TensorFlow Lite models optimized for mobile devices
- Model Types - Support for Gemma, Llama, Phi, DeepSeek, and custom models
### 🛠️ Technical Stack
- Flutter Framework - Cross-platform mobile development
- llama_cpp_dart - Native GGUF model inference with ARM64 optimization
- flutter_gemma - TFLite model inference with GPU acceleration
- Dio HTTP Client - Efficient model downloading and API communication
- Path Provider - Cross-platform file system access and management
### 🎯 Initial Capabilities
- Text Generation - Streaming text generation with customizable parameters
- Model Discovery - Browse and search thousands of compatible models
- Local Storage - Efficient model caching and management
- Debug Tools - Basic logging and error reporting
- Configuration Management - Flexible inference configuration options