llamacpp_rpc_client 0.2.0

HTTP client bindings to call the llama.cpp RPC server.

Usage

import 'package:llamacpp_rpc_client/llamacpp_rpc_client.dart';

void main() async {
  final client = LlamacppRpcClient('http://localhost:8080');

  // Text completion
  final completion = await client.completion(
    'The capital of France is',
    options: CompletionOptions(
      maxTokens: 50,
      temperature: 0.7,
    ),
  );
  print(completion.content);

  // Streaming completion
  await for (final chunk in client.streamCompletion('Tell me a story')) {
    print(chunk.content);
  }

  // Text embedding
  final embedding = await client.embedding('Hello world');
  print(embedding.embedding.length);

  client.close();
}
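
Since the client wraps an HTTP connection, it is good practice to release it even when a request throws. A minimal sketch using only the calls shown above:

import 'package:llamacpp_rpc_client/llamacpp_rpc_client.dart';

void main() async {
  final client = LlamacppRpcClient('http://localhost:8080');
  try {
    final completion = await client.completion(
      'The capital of France is',
      options: CompletionOptions(maxTokens: 50, temperature: 0.7),
    );
    print(completion.content);
  } finally {
    // Release the underlying HTTP resources even if the request failed.
    client.close();
  }
}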

CLI Usage

The package includes a command-line interface for easy interaction with llama.cpp servers:

Completion Command

Generate text completions:

dart run bin/llamacpp_rpc_client.dart completion \
  --url http://localhost:8080 \
  --prompt "The capital of France is" \
  --temperature 0.7 \
  --max-tokens 50

# Stream completion in real-time
dart run bin/llamacpp_rpc_client.dart completion \
  --url http://localhost:8080 \
  --prompt "Tell me a story" \
  --stream

# Deterministic generation with seed
dart run bin/llamacpp_rpc_client.dart completion \
  --url http://localhost:8080 \
  --prompt "Hello world" \
  --seed 42

Options:

  • --url, -u: Base URL of the llama.cpp RPC server (required)
  • --prompt, -p: Input prompt for completion (required)
  • --temperature, -t: Sampling temperature; higher values produce more random output (0.0-2.0)
  • --max-tokens, -m: Maximum tokens to generate
  • --top-p: Nucleus sampling parameter (0.0-1.0)
  • --top-k: Top-k sampling parameter
  • --seed: Random seed for deterministic generation
  • --stream, -s: Stream completion in real-time
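
These flags presumably map to fields on CompletionOptions. Only maxTokens and temperature appear in the usage example above, so the remaining field names in this sketch are assumptions that mirror the CLI flags:

final client = LlamacppRpcClient('http://localhost:8080');
final completion = await client.completion(
  'Hello world',
  options: CompletionOptions(
    maxTokens: 50,    // --max-tokens
    temperature: 0.7, // --temperature
    topP: 0.9,        // --top-p (assumed field name)
    topK: 40,         // --top-k (assumed field name)
    seed: 42,         // --seed (assumed field name)
  ),
);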

Embedding Command

Generate text embeddings:

dart run bin/llamacpp_rpc_client.dart embedding \
  --url http://localhost:8080 \
  --input "machine learning"

# Output raw embedding values
dart run bin/llamacpp_rpc_client.dart embedding \
  --url http://localhost:8080 \
  --input "artificial intelligence" \
  --raw

Options:

  • --url, -u: Base URL of the llama.cpp RPC server (required)
  • --input, -i: Input text for embedding generation (required)
  • --raw, -r: Output raw embedding vector values
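
Embedding vectors are typically compared with cosine similarity. The sketch below does this with two embedding calls; the cosine helper is illustrative rather than part of the package, and it assumes the result's embedding field is a List<double>:

import 'dart:math' as math;

import 'package:llamacpp_rpc_client/llamacpp_rpc_client.dart';

// Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|).
double cosine(List<double> a, List<double> b) {
  var dot = 0.0, na = 0.0, nb = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (math.sqrt(na) * math.sqrt(nb));
}

void main() async {
  final client = LlamacppRpcClient('http://localhost:8080');
  try {
    final a = await client.embedding('machine learning');
    final b = await client.embedding('artificial intelligence');
    print(cosine(a.embedding, b.embedding));
  } finally {
    client.close();
  }
}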

Publisher

agilord.com (verified publisher)

Repository (GitHub)
View/report issues

Topics

#llamacpp #llama-cpp #rpc #llm #client

Documentation

API reference

License

BSD-3-Clause

Dependencies

args, executor, http, json_annotation
