llamacpp_rpc_client 0.2.0
HTTP client bindings to call the llama.cpp RPC server.
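To use the client, add the package to your pubspec.yaml:

dependencies:
  llamacpp_rpc_client: ^0.2.0

Then run dart pub get to fetch it.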
Usage
import 'package:llamacpp_rpc_client/llamacpp_rpc_client.dart';

void main() async {
  final client = LlamacppRpcClient('http://localhost:8080');

  // Text completion
  final completion = await client.completion(
    'The capital of France is',
    options: CompletionOptions(
      maxTokens: 50,
      temperature: 0.7,
    ),
  );
  print(completion.content);

  // Streaming completion
  await for (final chunk in client.streamCompletion('Tell me a story')) {
    print(chunk.content);
  }

  // Text embedding
  final embedding = await client.embedding('Hello world');
  print(embedding.embedding.length);

  client.close();
}
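The completion call can take further sampling options. The sketch below is illustrative: it assumes CompletionOptions also exposes topP, topK, and seed fields mirroring the CLI flags documented below; check the API reference if the names differ.

import 'package:llamacpp_rpc_client/llamacpp_rpc_client.dart';

void main() async {
  final client = LlamacppRpcClient('http://localhost:8080');

  // Assumed fields: topP, topK, and seed are taken to mirror the
  // --top-p, --top-k, and --seed CLI flags; verify against the API docs.
  final completion = await client.completion(
    'Hello world',
    options: CompletionOptions(
      maxTokens: 50,
      temperature: 0.7,
      topP: 0.9,
      topK: 40,
      seed: 42, // fixed seed for reproducible output
    ),
  );
  print(completion.content);

  client.close();
}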
CLI Usage
The package includes a command-line interface for interacting with llama.cpp servers:
Completion Command
Generate text completions:
dart run bin/llamacpp_rpc_client.dart completion \
  --url http://localhost:8080 \
  --prompt "The capital of France is" \
  --temperature 0.7 \
  --max-tokens 50

# Stream completion in real-time
dart run bin/llamacpp_rpc_client.dart completion \
  --url http://localhost:8080 \
  --prompt "Tell me a story" \
  --stream

# Deterministic generation with seed
dart run bin/llamacpp_rpc_client.dart completion \
  --url http://localhost:8080 \
  --prompt "Hello world" \
  --seed 42
Options:
--url, -u: Base URL of the llama.cpp RPC server (required)
--prompt, -p: Input prompt for completion (required)
--temperature, -t: Temperature for randomness (0.0-2.0)
--max-tokens, -m: Maximum tokens to generate
--top-p: Nucleus sampling parameter (0.0-1.0)
--top-k: Top-k sampling parameter
--seed: Random seed for deterministic generation
--stream, -s: Stream completion in real-time
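The short aliases listed above work the same as the long flags, so an equivalent request can be written as:

# Equivalent request using the short flags
dart run bin/llamacpp_rpc_client.dart completion \
  -u http://localhost:8080 \
  -p "The capital of France is" \
  -t 0.7 \
  -m 50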
Embedding Command
Generate text embeddings:
dart run bin/llamacpp_rpc_client.dart embedding \
  --url http://localhost:8080 \
  --input "machine learning"

# Output raw embedding values
dart run bin/llamacpp_rpc_client.dart embedding \
  --url http://localhost:8080 \
  --input "artificial intelligence" \
  --raw
Options:
--url, -u: Base URL of the llama.cpp RPC server (required)
--input, -i: Input text for embedding generation (required)
--raw, -r: Output raw embedding vector values
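Embeddings returned by the client are plain vectors, so they can be compared directly in Dart. The sketch below scores the similarity of two inputs with cosine similarity; it assumes the embedding field is a List<double> of raw values (as the length check in the Usage example suggests), and the cosineSimilarity helper is illustrative, not part of the package.

import 'dart:math' as math;

import 'package:llamacpp_rpc_client/llamacpp_rpc_client.dart';

// Illustrative helper (not part of the package): cosine similarity
// of two equal-length vectors.
double cosineSimilarity(List<double> a, List<double> b) {
  var dot = 0.0, normA = 0.0, normB = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (math.sqrt(normA) * math.sqrt(normB));
}

void main() async {
  final client = LlamacppRpcClient('http://localhost:8080');

  // Assumes embedding.embedding is a List<double>.
  final a = await client.embedding('machine learning');
  final b = await client.embedding('artificial intelligence');
  print(cosineSimilarity(a.embedding, b.embedding));

  client.close();
}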