forMaxPerformance method
Configure for maximum performance (GPU-optimized)
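Tunes settings for inference speed. It offloads model layers to the GPU, uses a prompt-processing batch size of 512, keeps the model loaded for one hour between requests, and enables NUMA optimizations where available. Requires enough GPU memory to hold the offloaded layers.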
Implementation
```dart
OllamaBuilder forMaxPerformance() {
  return numGpu(-1) // Offload all model layers to the GPU
      .numBatch(512) // Prompt-processing batch size
      .keepAlive("1h") // Keep the model loaded for 1 hour
      .numa(true); // Enable NUMA optimizations if available
}
```
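The sketch below shows what this preset might send to the server. It uses a hypothetical, stripped-down `SketchOllamaBuilder`, which is not this package's actual class. It assumes the settings map onto the body of Ollama's REST `/api/generate` request as described in the Ollama API docs: `num_gpu`, `num_batch`, and `numa` go under `options`, and `keep_alive` is a top-level field. The model name `llama3` is a placeholder, and newer Ollama releases may ignore some runner options.

```dart
import 'dart:convert';

/// Hypothetical builder (NOT the package's real OllamaBuilder) used only
/// to illustrate how fluent setters could map onto an Ollama request body.
class SketchOllamaBuilder {
  int? _numGpu;
  int? _numBatch;
  String? _keepAlive;
  bool? _numa;

  SketchOllamaBuilder numGpu(int value) {
    _numGpu = value;
    return this; // returning `this` enables method chaining
  }

  SketchOllamaBuilder numBatch(int value) {
    _numBatch = value;
    return this;
  }

  SketchOllamaBuilder keepAlive(String value) {
    _keepAlive = value;
    return this;
  }

  SketchOllamaBuilder numa(bool value) {
    _numa = value;
    return this;
  }

  /// Same preset values as the documented forMaxPerformance().
  SketchOllamaBuilder forMaxPerformance() {
    return numGpu(-1).numBatch(512).keepAlive('1h').numa(true);
  }

  /// Assumed mapping: runner settings under "options",
  /// keep_alive as a top-level request field. Unset values are omitted.
  Map<String, Object?> toRequestBody(String model, String prompt) {
    final options = <String, Object?>{
      if (_numGpu != null) 'num_gpu': _numGpu,
      if (_numBatch != null) 'num_batch': _numBatch,
      if (_numa != null) 'numa': _numa,
    };
    return {
      'model': model,
      'prompt': prompt,
      if (_keepAlive != null) 'keep_alive': _keepAlive,
      if (options.isNotEmpty) 'options': options,
    };
  }
}

void main() {
  final body = SketchOllamaBuilder()
      .forMaxPerformance()
      .toRequestBody('llama3', 'Why is the sky blue?');
  print(jsonEncode(body));
}
```

Running `main` prints `{"model":"llama3","prompt":"Why is the sky blue?","keep_alive":"1h","options":{"num_gpu":-1,"num_batch":512,"numa":true}}`. Each setter returns the builder itself, which is what lets `forMaxPerformance` chain the calls in a single expression.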