Mirror of https://github.com/ollama/ollama.git, synced 2026-05-13 14:27:00 +00:00.
This change adds support for MTP (multi-token prediction) speculative decoding for the gemma4 model family. It includes:

* support for importing safetensors-based gemma4 draft models with `ollama create`
* a new `DRAFT` command in the Modelfile for specifying draft models
* a `--quantize-draft` flag for the `ollama create` command to quantize the draft model
* cache support for speculation
* changes to the rotating cache so that it handles MTP correctly
* sampling support for draft model token prediction

Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
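In speculative decoding, a cheap draft model proposes several tokens and the full (target) model checks them all in one forward pass. The Go sketch below shows the simplest, greedy form of that check. It is a hypothetical illustration, not Ollama's implementation. The function name `acceptDraft`, the use of plain integer token IDs, and the assumption that the target pass yields its greedy choice at every draft position plus one extra position are ours. When tokens are sampled rather than chosen greedily, a probabilistic acceptance rule is generally needed instead, and this sketch does not cover that case.

```go
package main

import "fmt"

// acceptDraft is a hypothetical greedy verifier for speculative decoding.
// draft holds the k tokens proposed by the draft model.
// target holds k+1 tokens: target[i] is the target model's greedy choice
// at position i, assumed to come from a single forward pass over the
// context plus the draft tokens.
// It returns the tokens to commit this step and how many draft tokens
// were accepted.
func acceptDraft(draft, target []int) (committed []int, accepted int) {
	if len(target) != len(draft)+1 {
		panic("target must have exactly one more token than draft")
	}
	// Accept the longest prefix on which draft and target agree.
	for accepted < len(draft) && draft[accepted] == target[accepted] {
		accepted++
	}
	committed = make([]int, 0, accepted+1)
	committed = append(committed, draft[:accepted]...)
	// The target's own token at the first mismatch (or the extra "bonus"
	// token when every draft token matched) is always committed, so each
	// step makes at least one token of progress.
	committed = append(committed, target[accepted])
	return committed, accepted
}

func main() {
	draft := []int{11, 42, 7, 9}
	target := []int{11, 42, 8, 3, 5}
	committed, accepted := acceptDraft(draft, target)
	fmt.Println(committed, accepted)
}
```

Running it prints `[11 42 8] 2`: the first two draft tokens match, the third does not, so the step commits the two accepted tokens plus the target's own token `8`. In this sketch, any cache entries written for the two rejected draft positions would have to be discarded before the next step, which is one reason speculation generally needs explicit cache support, including in a rotating (sliding-window) cache.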

| Name |
|---|
| client |
| create.go |
| create_test.go |
| dtype.go |
| gemma4.go |
| gemma4_test.go |
| imagegen.go |
| laguna.go |
| laguna_test.go |
| qwen35.go |