ollama

mirror of https://github.com/ollama/ollama.git synced 2026-05-13 14:27:00 +00:00

Patrick Devine 15e6076d79 mlx: Gemma4 MTP speculative decoding (#15980)

This change adds support for MTP (multi-token prediction) speculative decoding for the gemma4 model family. It includes:

* support for importing safetensors-based gemma4 draft models with `ollama create`
* a new DRAFT command in the Modelfile for specifying draft models
* a --quantize-draft flag for the ollama create command to quantize the draft model
* cache support for speculation
* changes to the rotating cache to be able to handle MTP correctly
* sampling support for draft model token prediction

Co-authored-by: Daniel Hiltgen <daniel@ollama.com>

2026-05-05 08:55:04 -07:00
support	app: add code for macOS and Windows apps under 'app' (#12933)	2025-11-04 11:40:17 -08:00
.this-is-the-create-dmg-repo	app: add code for macOS and Windows apps under 'app' (#12933)	2025-11-04 11:40:17 -08:00
build_darwin.sh	Update MLX and MLX-C with threading fixes (#15845)	2026-05-03 10:03:14 -07:00
build_docker.sh	Update ROCm (6.3 linux, 6.2 windows) and CUDA v12.8 (#9304)	2025-02-25 13:47:36 -08:00
build_linux.sh	mlx: Gemma4 MTP speculative decoding (#15980)	2026-05-05 08:55:04 -07:00
build_windows.ps1	ci: fix missing windows zip file (#14807)	2026-03-12 16:14:00 -07:00
create-dmg.sh	app: add code for macOS and Windows apps under 'app' (#12933)	2025-11-04 11:40:17 -08:00
deduplicate_cuda_libs.sh	CI: dedup cuda libraries to reduce payload size (#13704)	2026-01-13 11:25:31 -08:00
env.sh	MLX: add header vendoring and remove go build tag (#14642)	2026-03-09 17:24:45 -07:00
install.ps1	win: progress reporting on install download (#14219)	2026-02-12 12:06:56 -08:00
install.sh	install: prevent partial download script execution (#14311)	2026-02-18 18:32:45 -08:00
push_docker.sh
tag_latest.sh