Llamafu #
A Flutter package for running language models on device with support for completion, instruct mode, tool calling, streaming, constrained generation, LoRA, and multi-modal inputs (images, audio).
Features #
- Run language models directly on device (Android and iOS)
- Support for text completion
- Instruct mode for chat-like interactions
- Tool calling capabilities
- Streaming output (see the sketch after this list)
- Constrained generation (GBNF grammars)
- LoRA adapter support
- Multi-modal support (images, audio)
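Streaming output is listed above but is not demonstrated in the usage examples below. The following is only a rough sketch of what token-by-token consumption could look like in Dart: completeStream is a hypothetical method name used purely for illustration, so check the package's API reference for the real streaming entry point.
import 'package:llamafu/llamafu.dart';
// NOTE: `completeStream` is hypothetical; the actual llamafu streaming API
// may differ. This only illustrates consuming a token stream in Dart.
Future<void> streamExample(Llamafu llamafu) async {
  final buffer = StringBuffer();
  await for (final token in llamafu.completeStream(
    prompt: 'The quick brown fox',
    maxTokens: 128,
    temperature: 0.8,
  )) {
    buffer.write(token); // Append each token as it arrives
  }
  print(buffer.toString()); // Full generated text
}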
Prerequisites #
- Flutter 3.0 or higher
- Android SDK/NDK for Android development
- Xcode for iOS development
- Pre-built llama.cpp libraries
Installation #
Add llamafu as a dependency in your pubspec.yaml file:
dependencies:
  llamafu: ^0.0.1
Then run:
flutter pub get
Usage #
Text Completion #
import 'package:llamafu/llamafu.dart';
// Initialize the model
final llamafu = await Llamafu.init(
  modelPath: '/path/to/your/model.gguf',
  threads: 4,
  contextSize: 512,
);
// Generate text
final result = await llamafu.complete(
  prompt: 'The quick brown fox',
  maxTokens: 128,
  temperature: 0.8,
);
print(result);
// Clean up resources
llamafu.close();
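If the app should keep running even when generation fails, it helps to guarantee that the native resources are released. A minimal sketch using only the calls shown above, wrapped in try/finally (the helper name completeOnce is just for illustration):
import 'package:llamafu/llamafu.dart';
Future<void> completeOnce(String modelPath, String prompt) async {
  // Load the model; this allocates native llama.cpp resources.
  final llamafu = await Llamafu.init(
    modelPath: modelPath,
    threads: 4,
    contextSize: 512,
  );
  try {
    // Run a single completion.
    final result = await llamafu.complete(
      prompt: prompt,
      maxTokens: 128,
      temperature: 0.8,
    );
    print(result);
  } finally {
    // Always release the native context, even if complete() throws.
    llamafu.close();
  }
}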
Multi-modal Inference #
import 'package:llamafu/llamafu.dart';
// Initialize the model with multi-modal projector
final llamafu = await Llamafu.init(
  modelPath: '/path/to/your/model.gguf',
  mmprojPath: '/path/to/your/mmproj.gguf', // Multi-modal projector file
  threads: 4,
  contextSize: 512,
  useGpu: false, // Set to true to use GPU for multi-modal processing
);
// Generate text with image input
final mediaInputs = [
  MediaInput(
    type: MediaType.image,
    data: '/path/to/your/image.jpg', // Path to image file
  ),
];
final result = await llamafu.multimodalComplete(
  prompt: 'Describe this image: <image>',
  mediaInputs: mediaInputs,
  maxTokens: 128,
  temperature: 0.8,
);
print(result);
// Clean up resources
llamafu.close();
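The feature list also mentions audio input. Assuming MediaType has an audio variant and that the prompt marker for audio parallels the <image> marker above (neither is shown in this README, so treat both as assumptions), the call mirrors the image example and would run before close():
// Assumption: MediaType.audio exists, the loaded mmproj supports audio, and
// the '<audio>' marker is assumed by analogy with '<image>'.
final audioInputs = [
  MediaInput(
    type: MediaType.audio,
    data: '/path/to/your/audio.wav', // Path to an audio file
  ),
];
final audioResult = await llamafu.multimodalComplete(
  prompt: 'Describe this audio clip: <audio>',
  mediaInputs: audioInputs,
  maxTokens: 128,
  temperature: 0.8,
);
print(audioResult);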
LoRA Adapter Support #
import 'package:llamafu/llamafu.dart';
// Initialize the model
final llamafu = await Llamafu.init(
  modelPath: '/path/to/your/model.gguf',
  threads: 4,
  contextSize: 512,
);
// Load a LoRA adapter
final loraAdapter = await llamafu.loadLoraAdapter('/path/to/your/lora.gguf');
// Apply the LoRA adapter with a scale factor
await llamafu.applyLoraAdapter(loraAdapter, scale: 0.5);
// Generate text with the LoRA adapter applied
final result = await llamafu.complete(
  prompt: 'Write a story about space exploration',
  maxTokens: 128,
  temperature: 0.8,
);
print(result);
// Remove the LoRA adapter
await llamafu.removeLoraAdapter(loraAdapter);
// Or clear all LoRA adapters
await llamafu.clearAllLoraAdapters();
// Clean up resources
llamafu.close();
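Because adapters are loaded and applied as separate steps, more than one adapter can be active at once, each with its own scale, assuming the package exposes llama.cpp's support for multiple simultaneous adapters unchanged. A sketch using only the calls shown above (the adapter paths are placeholders), run before close():
// Load two adapters and blend them with different strengths.
final styleAdapter = await llamafu.loadLoraAdapter('/path/to/style_lora.gguf');
final domainAdapter = await llamafu.loadLoraAdapter('/path/to/domain_lora.gguf');
await llamafu.applyLoraAdapter(styleAdapter, scale: 0.3);
await llamafu.applyLoraAdapter(domainAdapter, scale: 0.7);
final blended = await llamafu.complete(
  prompt: 'Write a story about space exploration',
  maxTokens: 128,
  temperature: 0.8,
);
print(blended);
// Detach both adapters when done.
await llamafu.clearAllLoraAdapters();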
Constrained Generation #
import 'package:llamafu/llamafu.dart';
// Initialize the model
final llamafu = await Llamafu.init(
  modelPath: '/path/to/your/model.gguf',
  threads: 4,
  contextSize: 512,
);
// Define a JSON grammar
final jsonGrammar = '''
root ::= object
value ::= object | array | string | number | ("true" | "false" | "null") ws
object ::=
"{" ws (
string ":" ws value
("," ws string ":" ws value)*
)? "}" ws
array ::=
"[" ws (
value
("," ws value)*
)? "]" ws
string ::=
"\"" (
[^\"\\\\\x7F\x00-\x1F] |
"\\\\" (["\\\\bfnrt] | "u" [0-9a-fA-F]{4}) # escapes
)* "\"" ws
number ::= ("-"? ([0-9] | [1-9] [0-9]{0,15})) ("." [0-9]+)? ([eE] [-+]? [0-9] [1-9]{0,15})? ws
# Optional space: by convention, applied in this grammar after literal chars when allowed
ws ::= | " " | "\n" [ \t]{0,20}
''';
// Generate text constrained to JSON format
final result = await llamafu.completeWithGrammar(
  prompt: 'Generate a JSON object describing a person:',
  grammarStr: jsonGrammar,
  grammarRoot: 'root',
  maxTokens: 256,
  temperature: 0.8,
);
print(result);
// Clean up resources
llamafu.close();
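Because the grammar constrains the output to valid JSON, the result can be fed straight into dart:convert. A small sketch, assuming completeWithGrammar returns the generated text as a String:
import 'dart:convert';
// Assumption: `result` is the raw generated text (a String).
final person = jsonDecode(result.trim()) as Map<String, dynamic>;
print(person.keys); // Fields the model chose to include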
Supported Multi-modal Models #
Llamafu supports various multi-modal models through the llama.cpp MTMD library:
Vision Models #
- Gemma 3
- SmolVLM
- Pixtral 12B
- Qwen 2 VL
- Qwen 2.5 VL
- Mistral Small 3.1 24B
- InternVL 2.5 and 3
- Llama 4 Scout
- Moondream2
Audio Models #
- Ultravox 0.5
- Qwen2-Audio
- SeaLLM-Audio
- Voxtral Mini
Mixed Modalities #
- Qwen2.5 Omni (audio + vision)
Building #
Android #
- Ensure you have the Android NDK installed
- Build the native libraries:
cd android/src/main/cpp
mkdir build
cd build
cmake .. -DLLAMA_CPP_DIR=/path/to/llama.cpp
make
iOS #
- Ensure you have Xcode installed
- Build the native libraries using Xcode or CMake
API Reference #
For detailed API documentation, refer to the generated Dart API reference for this package.
License #
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments #
- This project uses the excellent llama.cpp library for running language models.
- Multi-modal support is provided by the MTMD library in llama.cpp.
- LoRA support is provided by the native LoRA adapter functionality in llama.cpp.
- Constrained generation support is provided by the grammar sampler functionality in llama.cpp.