GitHub Copilot CLI is GitHub’s AI coding agent for the terminal — it reads your codebase, edits files, runs commands, and generally behaves like a junior pair-programmer that never gets tired. Out of the box it ships wired up to GitHub’s hosted models, but what most developers haven’t realised yet is that you can point it at any open model via Ollama — local or cloud.
That means you can run Copilot CLI against qwen3.5 on your laptop, or against cloud-hosted open weights like kimi-k2.5:cloud and glm-5.1:cloud, without sending a single token to a proprietary endpoint.
Let’s walk through it.
Why this matters
Three reasons this combination is worth your attention:
- Model choice. Copilot CLI is a solid agent harness. Ollama is a solid model runtime. Decoupling the two means you pick the best model for the job instead of being locked to one vendor’s defaults.
- Local-first when you need it. If you’re working on a proprietary codebase, running qwen3.5 locally through Ollama keeps the whole loop (prompts, code, edits) on your machine.
- Cloud-scale when you need that instead. For tasks that need a bigger brain, Ollama’s cloud models (kimi-k2.5:cloud, glm-5:cloud, minimax-m2.7:cloud, qwen3.5:cloud) give you frontier-class open weights over the same interface.
Step 1: Install Copilot CLI
Pick whichever matches your setup:
# macOS / Linux (Homebrew)
brew install copilot-cli
npm, a shell installer script, and WinGet for Windows are also supported — check the Copilot CLI page for the exact commands.
Step 2: The one-command quick setup
If you already have Ollama installed, this is the whole setup:
ollama launch copilot
That’s it. Ollama wires Copilot CLI up to a local model and drops you into the agent. To pick the model directly:
ollama launch copilot --model kimi-k2.5:cloud
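Before running the launch command in a fresh environment, it can help to confirm that both binaries are on your PATH. The following is a minimal sketch, not part of either tool: missing_tools is a hypothetical helper name, and it assumes the executables are called ollama and copilot, as in the commands in this post.

```shell
# Hypothetical helper: print the name of each given command that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    # command -v exits non-zero when the command cannot be found.
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example usage (commented out so the snippet has no side effects):
# missing="$(missing_tools ollama copilot)"
# [ -z "$missing" ] || echo "Install these first: $missing"
```

If the helper prints nothing, both tools were found and ollama launch copilot should be able to start.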
Recommended models
A few that are worth trying first:
- kimi-k2.5:cloud (strong all-rounder for agentic coding)
- glm-5:cloud
- minimax-m2.7:cloud
- qwen3.5:cloud
- glm-4.7-flash (faster, smaller, good for tight loops)
- qwen3.5 (fully local)
The full catalogue of cloud-hosted open models lives at ollama.com/search?c=cloud.
Headless mode for CI and containers
This is the part that gets interesting if you’re building automation. Copilot CLI can run non-interactively, which means you can drop it straight into a Docker container, a GitHub Action, or a shell script:
ollama launch copilot --model kimi-k2.5:cloud --yes -- -p "how does this repository work?"
A couple of things worth knowing:
- --yes auto-pulls the model, skips the interactive selectors, and requires you to specify --model explicitly (no guessing).
- Anything after -- is passed straight through to Copilot CLI, so you have the full copilot flag surface available.
This is the shape you want for “agent-in-a-pipeline” workflows — repo analysis, PR review bots, scheduled maintenance agents.
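For a pipeline, one way to package the headless command is a small wrapper function. This is a sketch under assumptions: run_agent, MODEL, and DRY_RUN are hypothetical names chosen for this example, and the only flags used are the ones shown in the command above (--model, --yes, and -p after the -- separator).

```shell
# Hypothetical wrapper around the headless invocation shown above.
# MODEL picks the model (defaults to kimi-k2.5:cloud).
# DRY_RUN=1 prints the command instead of running it.
run_agent() {
  prompt="$1"
  model="${MODEL:-kimi-k2.5:cloud}"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Show what would run, which is useful when debugging a CI job.
    printf 'ollama launch copilot --model %s --yes -- -p "%s"\n' "$model" "$prompt"
  else
    # --yes requires an explicit --model; everything after -- goes to Copilot CLI.
    ollama launch copilot --model "$model" --yes -- -p "$prompt"
  fi
}

# Example usage (commented out so the snippet has no side effects):
# MODEL=qwen3.5 run_agent "how does this repository work?"
```

Keeping the model in a variable lets the same job definition switch between a local and a cloud model without touching the script.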
Manual setup (for the rest of us)
If you’d rather wire things up yourself — for example because you want to drop this into a specific shell profile or container image — Copilot CLI talks to Ollama over the OpenAI-compatible API, configured through environment variables:
export COPILOT_PROVIDER_BASE_URL=http://localhost:11434/v1
export COPILOT_PROVIDER_API_KEY=
export COPILOT_PROVIDER_WIRE_API=responses
export COPILOT_MODEL=qwen3.5
Then just:
copilot
Or do it all in a single command:
COPILOT_PROVIDER_BASE_URL=http://localhost:11434/v1 \
COPILOT_PROVIDER_API_KEY= \
COPILOT_PROVIDER_WIRE_API=responses \
COPILOT_MODEL=glm-5:cloud \
copilot
Note that COPILOT_PROVIDER_API_KEY is intentionally empty: local Ollama doesn’t need a key, but the variable still has to be set for Copilot CLI to skip its own auth flow.
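If you switch models often, the four variables can be wrapped in a function in your shell profile. The sketch below is one possible shape, not an official helper: use_ollama_model and OLLAMA_BASE_URL are hypothetical names introduced for this example, and the values are the ones from the manual setup above.

```shell
# Hypothetical profile helper: export the manual-setup variables for one model.
# OLLAMA_BASE_URL is an assumed override for the default local address.
use_ollama_model() {
  export COPILOT_PROVIDER_BASE_URL="${OLLAMA_BASE_URL:-http://localhost:11434}/v1"
  # Set but empty, so Copilot CLI skips its own auth flow.
  export COPILOT_PROVIDER_API_KEY=""
  export COPILOT_PROVIDER_WIRE_API="responses"
  # First argument is the model name; falls back to the local qwen3.5.
  export COPILOT_MODEL="${1:-qwen3.5}"
}

# Example usage (commented out so the snippet has no side effects):
# use_ollama_model glm-5:cloud
# copilot
```

Because the variables are exported into the current shell, any copilot session started afterwards in that shell picks them up.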
One important gotcha: context length
Copilot CLI is a real coding agent, which means it wants to stuff a lot into the context window — file contents, tool outputs, its own reasoning traces. Plan for at least 64k tokens of context.
If you run a model with a small context window on default settings, you’ll see the agent start losing track of the task halfway through. Increase the context length in Ollama before you blame the model. The Ollama context length docs cover exactly how to set this per-model.
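As a rough illustration for a local model, one approach is to derive a variant with a larger context window through a Modelfile. This is a sketch based on my understanding of Ollama's num_ctx parameter and OLLAMA_CONTEXT_LENGTH variable, so check the context length docs before relying on it. write_ctx_modelfile and the qwen3.5-64k name are hypothetical.

```shell
# Hypothetical helper: write a Modelfile that sets the context length for a base model.
write_ctx_modelfile() {
  base="$1"             # base model, e.g. qwen3.5
  ctx="${2:-65536}"     # context length in tokens (64k by default)
  out="${3:-Modelfile}" # where to write the Modelfile
  printf 'FROM %s\nPARAMETER num_ctx %s\n' "$base" "$ctx" > "$out"
}

# Example usage (commented out so the snippet has no side effects):
# write_ctx_modelfile qwen3.5 65536 Modelfile
# ollama create qwen3.5-64k -f Modelfile
# export COPILOT_MODEL=qwen3.5-64k
#
# Alternative (assumed): set a server-wide default when starting Ollama.
# OLLAMA_CONTEXT_LENGTH=65536 ollama serve
```

A larger context window uses more memory, so the value that works in practice depends on your hardware.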
Wrapping up
The trend here is worth naming: agent harnesses and model runtimes are decoupling. Claude Code, Codex, OpenCode, Goose, and now Copilot CLI are all converging on the same pattern — a standard OpenAI-compatible endpoint and a handful of environment variables.
The practical upshot is that “which coding agent do I use?” and “which model does it run on?” are becoming two separate questions. That’s a good thing. Pick the harness that fits your workflow, pick the model that fits the task, and let the runtime figure out the rest.
For the Copilot CLI + Ollama combination specifically — if you haven’t tried qwen3.5 or kimi-k2.5:cloud as your daily driver yet, this is a low-effort way in.