forMaxPerformance method

OllamaBuilder forMaxPerformance()

Configures the builder for maximum performance (GPU-optimized).

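Optimizes settings for maximum inference speed using GPU acceleration: it requests all GPU layers (numGpu(-1)), sets a batch size of 512, keeps the model loaded for one hour, and enables NUMA if available. Requires sufficient GPU memory. Returns the builder, so further calls can be chained.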

Implementation

OllamaBuilder forMaxPerformance() {
  return numGpu(-1) // Use all GPU layers
      .numBatch(512) // Large batch size
      .keepAlive("1h") // Keep loaded for 1 hour
      .numa(true); // Enable NUMA if available
}
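
Usage sketch

The following is a minimal, self-contained sketch of how the preset might be used in a call chain. The OllamaBuilder class here is a hypothetical stand-in reduced to the four setters that forMaxPerformance() calls. The options map and its key names are assumptions made for illustration and are not taken from the real class. The sketch also assumes that each setter overwrites any earlier value, so a call placed after the preset overrides it.

```dart
// Hypothetical stand-in for the real OllamaBuilder. Only the four setters
// used by forMaxPerformance() are modeled; the option keys are assumed.
class OllamaBuilder {
  final Map<String, Object> options = {};

  OllamaBuilder numGpu(int layers) {
    options['num_gpu'] = layers;
    return this;
  }

  OllamaBuilder numBatch(int size) {
    options['num_batch'] = size;
    return this;
  }

  OllamaBuilder keepAlive(String duration) {
    options['keep_alive'] = duration;
    return this;
  }

  OllamaBuilder numa(bool enabled) {
    options['numa'] = enabled;
    return this;
  }

  // Same chain as the documented implementation.
  OllamaBuilder forMaxPerformance() {
    return numGpu(-1)
        .numBatch(512)
        .keepAlive("1h")
        .numa(true);
  }
}

void main() {
  // Apply the preset, then lower the batch size for a GPU with less memory.
  // In this stand-in the later call replaces the preset's value.
  final builder = OllamaBuilder().forMaxPerformance().numBatch(256);
  print(builder.options);
}
```

Because the preset is an ordinary chain of setter calls, applying it first and adjusting individual values afterwards keeps the remaining preset values in place, under the overwrite assumption stated above.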