numGpu method
Sets the number of GPU layers to use.
Controls how many layers of the model are loaded onto the GPU. Loading more layers onto the GPU speeds up inference but increases GPU memory usage.
- 0: CPU only (slowest, lowest GPU memory usage)
- -1: Load all layers onto the GPU (fastest, highest GPU memory usage)
- Positive number: Load the specified number of layers onto the GPU
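For illustration, a minimal usage sketch; the default OllamaBuilder() constructor and the build() call are assumptions made for this example and are not defined in this section:

// Hypothetical usage; the constructor and build() call are assumed.
final fullOffload = OllamaBuilder()
    .numGpu(-1) // all layers on the GPU
    .build();

final partialOffload = OllamaBuilder()
    .numGpu(20) // 20 layers on the GPU, the rest on the CPU
    .build();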
Implementation
OllamaBuilder numGpu(int gpuLayers) {
  // Record the layer count under the 'numGpu' extension key on the
  // underlying base builder, then return this builder so calls can be chained.
  _baseBuilder.extension('numGpu', gpuLayers);
  return this;
}
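As a rough sketch of where this value might end up: Ollama itself exposes a num_gpu request option for GPU layer offloading, so a downstream step presumably translates the stored extension into that option. The extensions map accessor and helper below are assumptions for illustration, not part of this API.

// Hypothetical mapping step; the `extensions` map and this helper are
// assumptions, not part of the documented builder.
Map<String, dynamic> buildOllamaOptions(Map<String, dynamic> extensions) {
  final gpuLayers = extensions['numGpu'] as int?;
  return <String, dynamic>{
    // `num_gpu` is Ollama's native option name for GPU layer offloading.
    if (gpuLayers != null) 'num_gpu': gpuLayers,
  };
}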