Ollama, a powerful framework for running and managing large language models (LLMs) locally, is now available as a native Windows application. This means you no longer need to rely on Windows Subsystem for Linux (WSL) to run Ollama. Whether you’re a software developer, AI engineer, or DevOps professional, this guide will walk you through setting up Ollama on Windows, optimizing it for your workflow, and leveraging its capabilities for AI-driven applications.
Why Ollama on Windows?
Ollama simplifies the process of running LLMs locally, making it an excellent choice for developers and engineers who need to work with AI models without relying on cloud-based solutions. With native Windows support, Ollama now offers:
- Native Performance: No more WSL overhead—Ollama runs directly on Windows.
- GPU Support: Full support for NVIDIA and AMD Radeon GPUs, enabling faster model inference.
- Ease of Use: The ollama command-line interface (CLI) is available in cmd, powershell, or any terminal application.
- Local API: The Ollama API is served on http://localhost:11434, making it easy to integrate into your applications.
System Requirements
Before installing Ollama, ensure your system meets the following requirements:
- Operating System: Windows 10 22H2 or newer (Home or Pro).
- GPU Drivers:
  - NVIDIA: 452.39 or newer.
  - AMD Radeon: Latest drivers from AMD’s support page.
- Storage: At least 4GB of space for the binary install, plus additional space for model files (which can range from tens to hundreds of GB).
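If you have an NVIDIA GPU, a quick way to confirm your driver meets this requirement is the nvidia-smi tool that ships with the driver. Run it from any terminal:
nvidia-smi
The driver version is printed in the header of the output; it should be 452.39 or newer for Ollama to use GPU acceleration.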
Installing Ollama on Windows
Ollama is designed to be easy to install and use. Follow these steps to get started:
Step 1: Download and Install Ollama

Download the Ollama installer from the official website and run it. The installer does not require administrator privileges and will install Ollama in your home directory by default.
OllamaSetup.exe
If you want to install Ollama in a custom directory, use the following command:
OllamaSetup.exe /DIR="d:\some\location"
Step 2: Set Up Model Storage
By default, Ollama stores models in your home directory. If you want to change this location, set the OLLAMA_MODELS environment variable:
- Open Settings (Windows 11) or Control Panel (Windows 10) and search for environment variables.
- Click on Edit environment variables for your account.
- Create or edit the OLLAMA_MODELS variable and set it to your desired directory.
- Restart Ollama or open a new terminal for the changes to take effect.
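If you prefer the command line, the same user-level environment variable can be set from PowerShell. This is a minimal sketch; D:\ollama\models is just an example path, so substitute your own directory:
# Set OLLAMA_MODELS for the current user (example path).
[Environment]::SetEnvironmentVariable("OLLAMA_MODELS", "D:\ollama\models", "User")
As with the Settings approach, restart Ollama or open a new terminal so the new value takes effect.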
Step 3: Verify Installation
After installation, open a terminal and run the following command to verify that Ollama is installed correctly:
ollama --version
You should see the installed version of Ollama displayed.
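Because the installer also starts Ollama’s background server, you can additionally confirm the API endpoint is reachable. A minimal check from PowerShell, assuming the default port 11434:
# Should return the text "Ollama is running".
(Invoke-WebRequest -Uri http://localhost:11434).Content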
Running Ollama on Windows
Once installed, Ollama runs in the background, and you can interact with it using the CLI or API. Here’s how to get started:
Step 1: Pull a Model
To use Ollama, you first need to download a model. For example, to pull the llama3.2 model, run:
ollama pull llama3.2
This will download the model and make it available for use.
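You can list the models you have downloaded at any time with ollama list, which is a convenient way to confirm the pull succeeded and to see how much disk space each model uses:
ollama list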
Step 2: Run the Model
You can now run the model and interact with it via the CLI or API. For example, to generate text using the llama3.2 model, run:
ollama run llama3.2
This will start an interactive session where you can input prompts and receive responses from the model.
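For scripting or quick one-off questions, ollama run also accepts a prompt directly on the command line and prints the response without opening an interactive session. For example:
ollama run llama3.2 "Why is the sky blue?"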
Step 3: Use the Ollama API
Ollama provides a REST API that you can use to integrate it into your applications. The API is available at http://localhost:11434. Here’s an example of using the API in PowerShell:
(Invoke-WebRequest -Method POST -Body '{"model":"llama3.2", "prompt":"Why is the sky blue?", "stream": false}' -Uri http://localhost:11434/api/generate).Content | ConvertFrom-Json
This will send a prompt to the model and return the generated response.
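For multi-turn conversations, the API also exposes a chat endpoint at /api/chat that takes a list of messages instead of a single prompt. The sketch below uses Invoke-RestMethod, which parses the JSON response for you; the message content here is purely illustrative:
# Build a chat request with a single user message.
$body = @{
    model    = "llama3.2"
    messages = @(@{ role = "user"; content = "Why is the sky blue?" })
    stream   = $false
} | ConvertTo-Json -Depth 5
# The assistant's reply is returned in the "message" field of the response.
(Invoke-RestMethod -Method POST -Uri http://localhost:11434/api/chat -Body $body -ContentType "application/json").message.content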
Optimizing Ollama for Windows
To get the most out of Ollama on Windows, consider the following optimizations:
1. GPU Acceleration
Ollama supports both NVIDIA and AMD GPUs. Ensure your GPU drivers are up to date to take full advantage of hardware acceleration. This will significantly improve the performance of model inference.
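Once a model is loaded, you can check whether it is actually running on the GPU with the ollama ps command, which lists loaded models along with the processor (GPU or CPU) each one is using:
ollama ps
If a model unexpectedly shows as running on the CPU, check your driver version and the logs described in the Troubleshooting section below.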
2. Model Quantization
Quantization reduces the size of a model, trading a small amount of accuracy for lower memory use and faster inference. Models in the Ollama library are typically published at several quantization levels, such as q4_K_M and q5_K_M, and you can pull a specific quantized variant by including it in the model tag (check the model’s page in the library for the exact tags available):
ollama pull llama3.2:q4_K_M
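You can inspect which quantization a downloaded model actually uses with ollama show, which prints the model’s metadata, including its quantization level:
ollama show llama3.2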
3. Custom Model Storage
If your home directory is on a small drive, consider changing the OLLAMA_MODELS environment variable to point to a directory on a larger drive. This will prevent storage issues when working with large models.
Troubleshooting
If you encounter issues, Ollama stores logs and other files in the following locations:
- Logs: %LOCALAPPDATA%\Ollama
- Binaries: %LOCALAPPDATA%\Programs\Ollama
- Models: %HOMEPATH%\.ollama
- Temporary Files: %TEMP%
Check the logs for detailed error messages if something goes wrong.
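From PowerShell, a quick way to review recent server activity is to tail the log file. This is a sketch assuming the default log location and the server.log file name used by recent releases:
# Show the last 50 lines of the Ollama server log.
Get-Content "$env:LOCALAPPDATA\Ollama\server.log" -Tail 50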
Uninstalling Ollama
To uninstall Ollama, go to Add or remove programs in Windows Settings and select Ollama. Note that if you changed the OLLAMA_MODELS location, the installer will not remove your downloaded models.
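If you want to reclaim that disk space after uninstalling, you can delete the model directory yourself. This is a hedged sketch assuming models were left in the default location; double-check the path before running it, since the deletion is not reversible:
# Remove leftover models and configuration from the default location.
Remove-Item -Recurse -Force "$env:USERPROFILE\.ollama"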
Conclusion
Ollama on Windows provides a seamless experience for running and managing large language models locally. With native support for NVIDIA and AMD GPUs, easy installation, and a powerful API, Ollama is an excellent tool for developers, AI engineers, and DevOps professionals. Whether you’re building AI applications or experimenting with LLMs, Ollama makes it easy to get started.
For more information, check out the Ollama documentation and start exploring the possibilities today!