ollama

mirror of https://github.com/ollama/ollama.git synced 2026-05-13 14:27:00 +00:00

Patrick Devine	15e6076d79	2026-05-05 08:55:04 -07:00

mlx: Gemma4 MTP speculative decoding (#15980)

This change adds support for MTP (multi-token prediction) speculative decoding for the gemma4 model family. It includes:

* support for importing safetensors-based gemma4 draft models with `ollama create`
* a new DRAFT command in the Modelfile for specifying draft models
* a --quantize-draft flag for the ollama create command to quantize the draft model
* cache support for speculation
* changes to the rotating cache to be able to handle MTP correctly
* sampling support for draft model token prediction

---------

Co-authored-by: Daniel Hiltgen <daniel@ollama.com>

client	mlx: Gemma4 MTP speculative decoding (#15980)	2026-05-05 08:55:04 -07:00
create.go	mlx: Gemma4 MTP speculative decoding (#15980)	2026-05-05 08:55:04 -07:00
create_test.go	mlx: Gemma4 MTP speculative decoding (#15980)	2026-05-05 08:55:04 -07:00
dtype.go	mlx: Support NVIDIA TensorRT Model Optimizer import (#15566)	2026-04-27 18:28:10 -07:00
gemma4.go	Keep Gemma4 router projection in source precision (#15613)	2026-04-15 15:04:23 -07:00
gemma4_test.go	Keep Gemma4 router projection in source precision (#15613)	2026-04-15 15:04:23 -07:00
imagegen.go	create: Clean up experimental paths, fix create from existing safetensor model (#14679)	2026-04-07 08:12:57 -07:00
laguna.go	New models (#15861)	2026-04-28 11:50:12 -07:00
laguna_test.go	New models (#15861)	2026-04-28 11:50:12 -07:00
qwen35.go	create: Clean up experimental paths, fix create from existing safetensor model (#14679)	2026-04-07 08:12:57 -07:00