numBatch method
Sets the batch size for processing.
Controls the number of tokens processed in parallel during inference. Larger batch sizes can improve throughput, but at the cost of higher memory usage.
- Default: 512
- Range: 1-2048 (depends on available memory)
- Higher values: Better throughput, more memory usage
- Lower values: Less memory usage, potentially lower throughput
Implementation
OllamaBuilder numBatch(int batchSize) {
  // Forward the value to the underlying base builder under the
  // 'numBatch' extension key, then return this builder for chaining.
  _baseBuilder.extension('numBatch', batchSize);
  return this;
}
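To illustrate how this method fits the fluent builder pattern, here is a minimal, self-contained sketch. The `BaseBuilder` class and its `extensions` map are hypothetical stand-ins for the library's real internals, which are not shown in this document:

```dart
// Hypothetical stand-in for the library's base builder: it simply
// collects extension key/value pairs in a map.
class BaseBuilder {
  final Map<String, Object> extensions = {};

  void extension(String key, Object value) {
    extensions[key] = value;
  }
}

class OllamaBuilder {
  final BaseBuilder _baseBuilder = BaseBuilder();

  // Mirrors the method documented above.
  OllamaBuilder numBatch(int batchSize) {
    _baseBuilder.extension('numBatch', batchSize);
    return this; // returning this enables fluent chaining
  }
}

void main() {
  // 256 is an arbitrary example value within the documented
  // 1-2048 range.
  final builder = OllamaBuilder().numBatch(256);
  print(builder._baseBuilder.extensions['numBatch']); // 256
}
```

Because `numBatch` returns the builder itself, it can be chained with other configuration calls before the final build step.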