Mirror of https://github.com/ollama/ollama.git
Synced 2026-05-13 06:21:28 +00:00
Register sequences with Add/Remove. Each Sample call takes any subset of the registered slots, samples one token per row, and appends each token to its slot's ring-buffer history. When all slots share the same Options and their penalty rings are full, a single fused transform pass runs over the whole batch using a persistent pooled history tensor; otherwise, the call falls back to per-slot serial processing indexed against the same pool. Performance is unchanged for a single sequence, which is the only case exposed for now.

| Directory |
|---|
| agent |
| cmd |
| create |
| imagegen |
| mlxrunner |
| models |
| safetensors |
| server |
| tokenizer |
| tools |
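
As a rough illustration of the dispatch that the commit message describes, here is a minimal Go sketch. Everything in it is hypothetical rather than ollama's actual API: the `Sampler`, `NewSampler`, `slot` and `ring` types, the `RepeatPenalty`/`RepeatLastN` fields of `Options`, integer slot IDs, greedy `argmax` standing in for real sampling, and plain slices standing in for the pooled history tensor. Because there is no tensor backend here, both branches compute the same tokens; the sketch only shows when the fused path is eligible (shared Options, full rings) and how each slot's ring buffer is updated.

```go
package main

import "fmt"

// Options is a hypothetical subset of per-slot sampling settings.
type Options struct {
	RepeatPenalty float32 // values above 1 discourage recently seen tokens
	RepeatLastN   int     // capacity of the penalty ring buffer
}

// ring is a fixed-capacity token history that overwrites its oldest entry.
type ring struct {
	buf     []int
	next, n int // next write index, number of valid entries
}

func (r *ring) push(tok int) {
	r.buf[r.next] = tok
	r.next = (r.next + 1) % len(r.buf)
	if r.n < len(r.buf) {
		r.n++
	}
}

type slot struct {
	opts Options
	hist ring
}

// Sampler is a hypothetical registry of sequences keyed by slot ID.
type Sampler struct{ slots map[int]*slot }

func NewSampler() *Sampler { return &Sampler{slots: map[int]*slot{}} }

func (s *Sampler) Remove(id int) { delete(s.slots, id) }

func (s *Sampler) Add(id int, opts Options) error {
	if _, dup := s.slots[id]; dup || opts.RepeatLastN < 1 {
		return fmt.Errorf("cannot add slot %d", id)
	}
	s.slots[id] = &slot{opts: opts, hist: ring{buf: make([]int, opts.RepeatLastN)}}
	return nil
}

// penalize applies the repetition penalty once per distinct token in hist.
func penalize(row []float32, hist []int, p float32) {
	seen := make(map[int]bool, len(hist))
	for _, t := range hist {
		if !seen[t] {
			seen[t] = true
			if row[t] > 0 {
				row[t] /= p
			} else {
				row[t] *= p
			}
		}
	}
}

// argmax stands in for real sampling; ties go to the lowest index.
func argmax(row []float32) int {
	best := 0
	for i, v := range row {
		if v > row[best] {
			best = i
		}
	}
	return best
}

// Sample picks one token per row (mutating logits) and reports if the fused path ran.
func (s *Sampler) Sample(ids []int, logits [][]float32) ([]int, bool, error) {
	if len(ids) != len(logits) {
		return nil, false, fmt.Errorf("%d ids but %d logit rows", len(ids), len(logits))
	}
	slots := make([]*slot, len(ids))
	fused := len(ids) > 0
	for i, id := range ids {
		sl, ok := s.slots[id]
		if !ok {
			return nil, false, fmt.Errorf("slot %d not registered", id)
		}
		slots[i] = sl
		// Fused only if every slot shares Options and has a full ring.
		if sl.opts != slots[0].opts || sl.hist.n < len(sl.hist.buf) {
			fused = false
		}
	}
	out := make([]int, len(ids))
	for i, row := range logits {
		opts := slots[i].opts // serial fallback: each slot's own Options
		if fused {
			opts = slots[0].opts // fused: one batch-wide Options value
		}
		penalize(row, slots[i].hist.buf[:slots[i].hist.n], opts.RepeatPenalty)
		out[i] = argmax(row)
		slots[i].hist.push(out[i])
	}
	return out, fused, nil
}
```

The eligibility check runs once per call, so a batch containing even one slot whose Options differ or whose ring is still filling takes the serial branch for every row, matching the fallback the commit message describes.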