Dart HTTP client bindings for calling the llama.cpp RPC server.

Usage

import 'package:llamacpp_rpc_client/llamacpp_rpc_client.dart';

void main() async {
  final client = LlamacppRpcClient('http://localhost:8080');

  // Text completion
  final completion = await client.completion(
    'The capital of France is',
    options: CompletionOptions(
      maxTokens: 50,
      temperature: 0.7,
    ),
  );
  print(completion.content);

  // Streaming completion
  await for (final chunk in client.streamCompletion('Tell me a story')) {
    print(chunk.content);
  }

  // Text embedding
  final embedding = await client.embedding('Hello world');
  print(embedding.embedding.length);

  client.close();
}
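Embeddings returned by `client.embedding` can be compared with standard vector math. The sketch below shows a plain cosine-similarity helper; `cosineSimilarity` is a hypothetical local function written for illustration, not part of this package's API, and the commented-out lines assume the `client` and `embedding` calls from the example above.

```dart
import 'dart:math';

/// Cosine similarity between two equal-length embedding vectors.
/// In practice `a` and `b` would come from `client.embedding(...)`.
double cosineSimilarity(List<double> a, List<double> b) {
  var dot = 0.0, normA = 0.0, normB = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (sqrt(normA) * sqrt(normB));
}

void main() {
  // With a live server you would fetch both vectors first:
  //   final e1 = await client.embedding('cat');
  //   final e2 = await client.embedding('kitten');
  //   print(cosineSimilarity(e1.embedding, e2.embedding));
  print(cosineSimilarity([1.0, 0.0], [1.0, 0.0])); // identical vectors print 1.0
}
```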

CLI Usage

The package includes a command-line interface for interacting with a llama.cpp server:

Completion Command

Generate text completions:

dart run bin/llamacpp_rpc_client.dart completion \
  --url http://localhost:8080 \
  --prompt "The capital of France is" \
  --temperature 0.7 \
  --max-tokens 50

# Stream completion in real-time
dart run bin/llamacpp_rpc_client.dart completion \
  --url http://localhost:8080 \
  --prompt "Tell me a story" \
  --stream

# Deterministic generation with seed
dart run bin/llamacpp_rpc_client.dart completion \
  --url http://localhost:8080 \
  --prompt "Hello world" \
  --seed 42

Options:

  • --url, -u: Base URL of the llama.cpp RPC server (required)
  • --prompt, -p: Input prompt for completion (required)
  • --temperature, -t: Sampling temperature; higher values increase randomness (0.0-2.0)
  • --max-tokens, -m: Maximum tokens to generate
  • --top-p: Nucleus sampling parameter (0.0-1.0)
  • --top-k: Top-k sampling parameter
  • --seed: Random seed for deterministic generation
  • --stream, -s: Stream completion in real-time

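The sampling flags above can be combined in a single invocation. A sketch (the server URL and prompt are placeholders, as in the earlier examples):

```shell
# Combine nucleus and top-k sampling with a token limit
dart run bin/llamacpp_rpc_client.dart completion \
  --url http://localhost:8080 \
  --prompt "Write a haiku about the sea" \
  --temperature 0.8 \
  --top-p 0.9 \
  --top-k 40 \
  --max-tokens 60
```

This requires a llama.cpp server listening at the given URL; without one the command will fail to connect.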
Embedding Command

Generate text embeddings:

dart run bin/llamacpp_rpc_client.dart embedding \
  --url http://localhost:8080 \
  --input "machine learning"

# Output raw embedding values
dart run bin/llamacpp_rpc_client.dart embedding \
  --url http://localhost:8080 \
  --input "artificial intelligence" \
  --raw

Options:

  • --url, -u: Base URL of the llama.cpp RPC server (required)
  • --input, -i: Input text for embedding generation (required)
  • --raw, -r: Output raw embedding vector values