Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models. https://ollama.com
Daniel Hiltgen 206b049508
mlx: avoid status timeout during inference (#16086)
The MLX runner now routes model work through a locked worker thread. The status endpoint also used that worker, solely to sample memory, so a scheduler health ping could sit behind a long prefill or generation until its 10-second context expired, causing /v1/status to return 500 and the server to treat the runner as unhealthy.

While Metal's VRAM reporting doesn't change over time, CUDA's does. The fix caches the last memory sample and has status perform only a short best-effort refresh. If the worker is busy, status returns the cached value while a single background refresh waits for the worker and updates the cache once it becomes available. An in-flight guard and a lifecycle context keep this from spawning unbounded refreshes while preserving live VRAM reporting for CUDA.

Fixes #16081
2026-05-11 16:03:38 -07:00
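The cached-sample pattern described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the actual code from #16086; the `memCache` type, the `Status` method, and the `trySample`/`refresh` callbacks are all hypothetical names standing in for the runner's real worker plumbing.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// memCache sketches the pattern: status serves the last known memory
// sample when the worker is busy, with at most one refresh in flight.
type memCache struct {
	mu         sync.Mutex
	last       uint64      // most recent memory sample
	refreshing atomic.Bool // in-flight guard for the background refresh
}

// Status attempts a short, best-effort sample via trySample. If the
// worker is busy (trySample reports false), it starts at most one
// background refresh and immediately returns the cached value, so a
// health ping never blocks behind long prefill or generation.
func (c *memCache) Status(trySample func() (uint64, bool), refresh func() uint64) uint64 {
	if v, ok := trySample(); ok {
		c.mu.Lock()
		c.last = v
		c.mu.Unlock()
		return v
	}
	if c.refreshing.CompareAndSwap(false, true) {
		go func() {
			defer c.refreshing.Store(false)
			v := refresh() // blocks until the worker is free
			c.mu.Lock()
			c.last = v
			c.mu.Unlock()
		}()
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.last
}

func main() {
	c := &memCache{}
	// Worker free: the live sample is returned and cached.
	c.Status(func() (uint64, bool) { return 1024, true }, nil)
	// Worker busy: the cached value comes back immediately while the
	// (here never-completing) refresh runs in the background.
	blocked := func() uint64 { select {} }
	fmt.Println(c.Status(func() (uint64, bool) { return 0, false }, blocked))
}
```

The `atomic.Bool` guard is what keeps repeated status calls during a long generation from piling up refresh goroutines: only the first busy call wins the compare-and-swap, and later calls simply read the cache.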
.github app: harden update flows (#16100) 2026-05-11 12:24:01 -07:00
anthropic anthropic: fix empty inputs in content blocks (#15105) 2026-03-27 15:41:27 -07:00
api launch: add plan-aware model gating (#16027) 2026-05-06 14:34:26 -07:00
app app: harden update flows (#16100) 2026-05-11 12:24:01 -07:00
auth auth: fix problems with the ollama keypairs (#12373) 2025-09-22 23:20:20 -07:00
cmd mlx: refined model push behavior (#15431) 2026-05-08 14:25:30 -07:00
convert New models (#15861) 2026-04-28 11:50:12 -07:00
discover metal: harden for ggml initialization failures (#15755) 2026-04-30 16:28:03 -07:00
docs launch: disable Claude Desktop launch (#16028) 2026-05-07 10:46:18 -07:00
envconfig app: harden update flows (#16100) 2026-05-11 12:24:01 -07:00
format chore(all): replace instances of interface with any (#10067) 2025-04-02 09:44:27 -07:00
fs New models (#15861) 2026-04-28 11:50:12 -07:00
harmony Parser for Cogito v2 (#13145) 2025-11-19 17:21:07 -08:00
integration test: integration test hardening (#13532) 2026-05-08 15:54:17 -07:00
internal Reapply "don't require pulling stubs for cloud models" again (#14608) 2026-03-06 14:27:47 -08:00
kvcache model: support for qwen3.5 architecture (#14378) 2026-02-24 20:08:05 -08:00
llama cgo: suppress deprecated warning to quiet down go build (#15438) 2026-04-13 13:04:11 -07:00
llm Update MLX and MLX-C with threading fixes (#15845) 2026-05-03 10:03:14 -07:00
logutil logutil: fix source field (#12279) 2025-09-16 16:18:07 -07:00
manifest create: avoid gc race with create (#15628) 2026-04-16 13:29:16 -07:00
middleware Add support for gemma4 (#15214) 2026-04-02 11:33:33 -07:00
ml metal: harden for ggml initialization failures (#15755) 2026-04-30 16:28:03 -07:00
model renderers: update gemma4 renderer (#15886) 2026-04-29 18:40:23 -07:00
openai openai: map responses reasoning effort to think (#15789) 2026-04-24 02:49:36 -07:00
parser mlx: Gemma4 MTP speculative decoding (#15980) 2026-05-05 08:55:04 -07:00
progress Add z-image image generation prototype (#13659) 2026-01-09 21:09:46 -08:00
readline Add support for gemma4 (#15214) 2026-04-02 11:33:33 -07:00
runner metal: harden for ggml initialization failures (#15755) 2026-04-30 16:28:03 -07:00
sample Revert "runner: add token history sampling parameters to ollama runner (#14537)" (#14776) 2026-03-10 21:07:52 -07:00
scripts mlx: Gemma4 MTP speculative decoding (#15980) 2026-05-05 08:55:04 -07:00
server mlx: update the imagegen runner for mlx thread affinity (#16096) 2026-05-11 13:05:06 -07:00
template template: fix args-as-json rendering (#13636) 2026-01-06 18:33:57 -08:00
thinking thinking: fix double emit when no opening tag 2025-08-21 21:03:12 -07:00
tokenizer tokenizer: fix multi-regex BPE offset handling (#15844) 2026-04-27 14:14:27 -07:00
tools preserve tool definition and call JSON ordering (#13525) 2026-01-05 18:03:36 -08:00
types mlx: Gemma4 MTP speculative decoding (#15980) 2026-05-05 08:55:04 -07:00
version
x mlx: avoid status timeout during inference (#16086) 2026-05-11 16:03:38 -07:00
.dockerignore next build (#8539) 2025-01-29 15:03:38 -08:00
.gitattributes .gitattributes: add app/webview to linguist-vendored (#13274) 2025-11-29 23:46:10 -05:00
.gitignore create: Clean up experimental paths, fix create from existing safetensor model (#14679) 2026-04-07 08:12:57 -07:00
.golangci.yaml ci: restore previous linter rules (#13322) 2025-12-03 18:55:02 -08:00
CMakeLists.txt Update MLX and MLX-C with threading fixes (#15845) 2026-05-03 10:03:14 -07:00
CMakePresets.json mlx: Gemma4 MTP speculative decoding (#15980) 2026-05-05 08:55:04 -07:00
CONTRIBUTING.md docs: fix typos in repository documentation (#10683) 2025-11-15 20:22:29 -08:00
Dockerfile mlx: Gemma4 MTP speculative decoding (#15980) 2026-05-05 08:55:04 -07:00
go.mod go: bump to 1.26 (#15904) 2026-05-03 23:24:35 -07:00
go.sum cmd: set codex env vars on launch and handle zstd request bodies (#14122) 2026-02-18 17:19:36 -08:00
LICENSE
main.go lint 2024-08-01 17:06:06 -07:00
Makefile.sync Revert "Update vendored llama.cpp to b7847" (#14061) 2026-02-03 18:39:36 -08:00
MLX_C_VERSION Update MLX and MLX-C with threading fixes (#15845) 2026-05-03 10:03:14 -07:00
MLX_VERSION Update MLX and MLX-C with threading fixes (#15845) 2026-05-03 10:03:14 -07:00
README.md cmd/launch: add Copilot CLI integration (#15583) 2026-04-15 17:22:53 -07:00
SECURITY.md docs: fix typos in repository documentation (#10683) 2025-11-15 20:22:29 -08:00

Ollama

Start building with open models.

Download

macOS

curl -fsSL https://ollama.com/install.sh | sh

or download manually

Windows

irm https://ollama.com/install.ps1 | iex

or download manually

Linux

curl -fsSL https://ollama.com/install.sh | sh

Manual install instructions

Docker

The official Ollama Docker image ollama/ollama is available on Docker Hub.

Libraries

Community

Get started

ollama

You'll be prompted to run a model or connect Ollama to your existing agents or applications such as Claude Code, OpenClaw, OpenCode, Codex, Copilot, and more.

Coding

To launch a specific integration:

ollama launch claude

Supported integrations include Claude Code, Codex, Copilot CLI, Droid, and OpenCode.

AI assistant

Use OpenClaw to turn Ollama into a personal AI assistant across WhatsApp, Telegram, Slack, Discord, and more:

ollama launch openclaw

Chat with a model

Run and chat with Gemma 3:

ollama run gemma3

See ollama.com/library for the full list.

See the quickstart guide for more details.

REST API

Ollama has a REST API for running and managing models.

curl http://localhost:11434/api/chat -d '{
  "model": "gemma3",
  "messages": [{
    "role": "user",
    "content": "Why is the sky blue?"
  }],
  "stream": false
}'

See the API documentation for all endpoints.

Python

pip install ollama
from ollama import chat

response = chat(model='gemma3', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response.message.content)

JavaScript

npm i ollama
import ollama from "ollama";

const response = await ollama.chat({
  model: "gemma3",
  messages: [{ role: "user", content: "Why is the sky blue?" }],
});
console.log(response.message.content);

Supported backends

  • llama.cpp project founded by Georgi Gerganov.

Documentation

Community Integrations

Want to add your project? Open a pull request.

Chat Interfaces

Web

Desktop

  • Dify.AI - LLM app development platform
  • AnythingLLM - All-in-one AI app for Mac, Windows, and Linux
  • Maid - Cross-platform mobile and desktop client
  • Witsy - AI desktop app for Mac, Windows, and Linux
  • Cherry Studio - Multi-provider desktop client
  • Ollama App - Multi-platform client for desktop and mobile
  • PyGPT - AI desktop assistant for Linux, Windows, and Mac
  • Alpaca - GTK4 client for Linux and macOS
  • SwiftChat - Cross-platform including iOS, Android, and Apple Vision Pro
  • Enchanted - Native macOS and iOS client
  • RWKV-Runner - Multi-model desktop runner
  • Ollama Grid Search - Evaluate and compare models
  • macai - macOS client for Ollama and ChatGPT
  • AI Studio - Multi-provider desktop IDE
  • Reins - Parameter tuning and reasoning model support
  • ConfiChat - Privacy-focused with optional encryption
  • LLocal.in - Electron desktop client
  • MindMac - AI chat client for Mac
  • Msty - Multi-model desktop client
  • BoltAI for Mac - AI chat client for Mac
  • IntelliBar - AI-powered assistant for macOS
  • Kerlig AI - AI writing assistant for macOS
  • Hillnote - Markdown-first AI workspace
  • Perfect Memory AI - Productivity AI personalized by screen and meeting history

Mobile

SwiftChat, Enchanted, Maid, Ollama App, Reins, and ConfiChat listed above also support mobile platforms.

Code Editors & Development

Libraries & SDKs

Frameworks & Agents

RAG & Knowledge Bases

  • RAGFlow - RAG engine based on deep document understanding
  • R2R - Open-source RAG engine
  • MaxKB - Ready-to-use RAG chatbot
  • Minima - On-premises or fully local RAG
  • Chipper - AI interface with Haystack RAG
  • ARGO - RAG and deep research on Mac/Windows/Linux
  • Archyve - RAG-enabling document library
  • Casibase - AI knowledge base with RAG and SSO
  • BrainSoup - Native client with RAG and multi-agent automation

Bots & Messaging

Terminal & CLI

Productivity & Apps

Observability & Monitoring

  • Opik - Debug, evaluate, and monitor LLM applications
  • OpenLIT - OpenTelemetry-native monitoring for Ollama and GPUs
  • Lunary - LLM observability with analytics and PII masking
  • Langfuse - Open source LLM observability
  • HoneyHive - AI observability and evaluation for agents
  • MLflow Tracing - Open source LLM observability

Database & Embeddings

Infrastructure & Deployment

Cloud

Package Managers