ollama/server
Jesse Gross bbbad97686 sched: Model eviction for MLX
MLX runners (image generation and LLM) previously bypassed the
scheduler's standard load path via a separate loadMLX method. This meant
they skipped VRAM fitting checks and couldn't participate in model
eviction.

Now all model types flow through the same load function. Eviction for
MLX models is based on weight size only, because the KV cache and
compute graph are sized dynamically. As a result, eviction does not
account for worst-case memory use and models can still compete for
memory, but this is a significant improvement over the previous
behavior.
2026-03-16 17:40:29 -07:00
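The weight-based eviction described in the commit message can be illustrated with a minimal sketch. This is not the code in sched.go: the type `loadedModel`, the function `planEviction`, the logical `lastUsed` clock, and the least-recently-used ordering are all assumptions made for illustration. The only idea taken from the commit message is that the fit check counts weights and ignores the dynamically sized KV cache and compute graph.

```go
package main

import (
	"fmt"
	"sort"
)

// loadedModel is a hypothetical record of a model resident in memory.
type loadedModel struct {
	name        string
	weightBytes uint64 // weights only; KV cache and compute graph are not counted
	lastUsed    int64  // hypothetical logical clock; larger means more recent
}

// planEviction returns the names of models to unload, least recently used
// first, so that `need` bytes of weights fit into memory. `free` is the
// memory currently available. ok is false when unloading every model would
// still not free enough memory.
func planEviction(loaded []loadedModel, free, need uint64) (victims []string, ok bool) {
	if need <= free {
		return nil, true // already fits, nothing to evict
	}

	// Work on a copy so the caller's slice order is left untouched.
	byAge := append([]loadedModel(nil), loaded...)
	sort.Slice(byAge, func(i, j int) bool { return byAge[i].lastUsed < byAge[j].lastUsed })

	for _, m := range byAge {
		victims = append(victims, m.name)
		free += m.weightBytes
		if need <= free {
			return victims, true
		}
	}
	return nil, false
}

func main() {
	loaded := []loadedModel{
		{name: "llm-a", weightBytes: 8 << 30, lastUsed: 1},
		{name: "image-b", weightBytes: 12 << 30, lastUsed: 2},
	}
	// 4 GiB free, 10 GiB of weights requested.
	victims, ok := planEviction(loaded, 4<<30, 10<<30)
	fmt.Println(victims, ok)
}
```

Because only weights are counted, a plan that "fits" here can still run short once KV cache and compute graph allocations grow, which is the limitation the commit message acknowledges.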
internal docs: fix typos in repository documentation (#10683) 2025-11-15 20:22:29 -08:00
auth.go server: reject unexpected auth hosts (#13738) 2026-01-16 14:10:36 -05:00
auth_test.go server: reject unexpected auth hosts (#13738) 2026-01-16 14:10:36 -05:00
cloud_proxy.go server: decompress zstd request bodies in cloud passthrough middleware (#14827) 2026-03-13 15:06:47 -07:00
cloud_proxy_test.go server: decompress zstd request bodies in cloud passthrough middleware (#14827) 2026-03-13 15:06:47 -07:00
create.go Reapply "don't require pulling stubs for cloud models" again (#14608) 2026-03-06 14:27:47 -08:00
create_test.go Clean up the manifest and modelpath (#13807) 2026-01-21 11:46:17 -08:00
download.go Clean up the manifest and modelpath (#13807) 2026-01-21 11:46:17 -08:00
fixblobs.go
fixblobs_test.go
images.go mlxrunner: Enforce model context limit 2026-02-27 17:29:47 -08:00
images_test.go x/imagegen: add image edit capabilities (#13846) 2026-01-22 20:35:08 -08:00
logprob.go logprob: add bytes to logprobs (#13068) 2025-11-13 13:49:25 -08:00
model.go Clean up the manifest and modelpath (#13807) 2026-01-21 11:46:17 -08:00
model_resolver.go Reapply "don't require pulling stubs for cloud models" again (#14608) 2026-03-06 14:27:47 -08:00
model_resolver_test.go Reapply "don't require pulling stubs for cloud models" again (#14608) 2026-03-06 14:27:47 -08:00
prompt.go mlxrunner: Enforce model context limit 2026-02-27 17:29:47 -08:00
prompt_test.go model/renderers: fix glm-ocr image tags in renderer prompts (#14584) 2026-03-03 12:51:34 -08:00
quantization.go model: support for qwen3.5 architecture (#14378) 2026-02-24 20:08:05 -08:00
quantization_test.go model: support for qwen3.5 architecture (#14378) 2026-02-24 20:08:05 -08:00
routes.go server: remove experimental aliases support (#14810) 2026-03-12 20:27:24 -07:00
routes_cloud_test.go cloud_proxy: send ollama client version (#14769) 2026-03-10 15:53:25 -07:00
routes_create_test.go Reapply "don't require pulling stubs for cloud models" again (#14608) 2026-03-06 14:27:47 -08:00
routes_debug_test.go sched: Model eviction for MLX 2026-03-16 17:40:29 -07:00
routes_delete_test.go Reapply "don't require pulling stubs for cloud models" again (#14608) 2026-03-06 14:27:47 -08:00
routes_generate_renderer_test.go sched: Model eviction for MLX 2026-03-16 17:40:29 -07:00
routes_generate_test.go sched: Model eviction for MLX 2026-03-16 17:40:29 -07:00
routes_harmony_streaming_test.go sched: Model eviction for MLX 2026-03-16 17:40:29 -07:00
routes_list_test.go Update the /api/create endpoint to use JSON (#7935) 2024-12-31 18:02:30 -08:00
routes_options_test.go server: use tiered VRAM-based default context length 2026-02-02 10:47:09 -08:00
routes_test.go server: return error when embedding contains NaN or Inf values (#13599) 2026-01-03 02:20:12 -05:00
routes_web_experimental_test.go cloud_proxy: send ollama client version (#14769) 2026-03-10 15:53:25 -07:00
sched.go sched: Model eviction for MLX 2026-03-16 17:40:29 -07:00
sched_test.go sched: Model eviction for MLX 2026-03-16 17:40:29 -07:00
sparse_common.go Don't hard fail on sparse setup error 2024-08-09 12:16:19 -07:00
sparse_windows.go Don't hard fail on sparse setup error 2024-08-09 12:16:19 -07:00
test_home_test.go add ability to disable cloud (#14221) 2026-02-12 15:47:00 -08:00
upload.go Clean up the manifest and modelpath (#13807) 2026-01-21 11:46:17 -08:00