Using Ollama With Roo Code
Roo Code supports running models locally using Ollama. This provides privacy, offline access, and potentially lower costs, but requires more setup and a powerful computer.
Website: https://ollama.com/
Setting up Ollama
- Download and Install Ollama: Download the Ollama installer for your operating system from the Ollama website and follow the installation instructions. Make sure Ollama is running:
ollama serve
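If you want to confirm the server is actually reachable before continuing, you can probe it from another terminal. This is just a quick sanity check and assumes the default address and port (localhost:11434):

```bash
# The root endpoint returns a short status message when the server is up
curl http://localhost:11434/
# Expected response: "Ollama is running"
```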
- Download a Model: Ollama supports many different models. You can find a list of available models on the Ollama website. Some recommended models for coding tasks include:
- codellama:7b-code (good starting point, smaller)
- codellama:13b-code (better quality, larger)
- codellama:34b-code (even better quality, very large)
- qwen2.5-coder:32b
- mistralai/Mistral-7B-Instruct-v0.1 (good general-purpose model)
- deepseek-coder:6.7b-base (good for coding tasks)
- llama3:8b-instruct-q5_1 (good for general tasks)
To download a model, open your terminal and run:
ollama pull <model_name>
For example:
ollama pull qwen2.5-coder:32b
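Once the pull completes, it can be useful to confirm the model is available locally and to look at its reported defaults. The commands below are standard Ollama CLI; exact output fields vary between Ollama versions:

```bash
# List downloaded models and their sizes
ollama list

# Inspect a model's details (architecture, parameter count and, in newer
# releases, its default context length)
ollama show qwen2.5-coder:32b
```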
- Configure the Model: Configure your model's context window in Ollama and save a copy.
Default Context Behavior
Roo Code defers to the Modelfile's num_ctx setting by default. When you use a model with Ollama, Roo Code reads the model's configured context window and uses it automatically. You don't need to configure context size in Roo Code settings - it respects what's defined in your Ollama model.

Option A: Interactive Configuration
Load the model (we will use qwen2.5-coder:32b as an example):
ollama run qwen2.5-coder:32b
Change the context size parameter:
/set parameter num_ctx 32768
Save the model with a new name:
/save your_model_name
Option B: Using a Modelfile (Recommended)
Create a Modelfile with your desired configuration:
# Example Modelfile for reduced context
FROM qwen2.5-coder:32b
# Set context window to 32K tokens (reduced from default)
PARAMETER num_ctx 32768
# Optional: Adjust temperature for more consistent output
PARAMETER temperature 0.7
# Optional: Set repeat penalty
PARAMETER repeat_penalty 1.1

Then create your custom model:
ollama create qwen-32k -f Modelfile
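Whichever option you used, it is worth checking that the saved model actually carries the new num_ctx before pointing Roo Code at it. A minimal check, using the qwen-32k name from the example above:

```bash
# Dump the Modelfile baked into the custom model; it should include
#   PARAMETER num_ctx 32768
ollama show --modelfile qwen-32k
```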
Override Context Window
If you need to override the model's default context window:
- Permanently: Save a new model version with your desired num_ctx using either method above
- Roo Code behavior: Roo automatically uses whatever num_ctx is configured in your Ollama model
- Memory considerations: Reducing num_ctx helps prevent out-of-memory errors on limited hardware
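To get a feel for what a given num_ctx costs in practice, you can load the model and look at Ollama's process list; the reported size generally grows with the configured context window. The output shown in the comments is illustrative only, not a real measurement:

```bash
# Load the model (exit the prompt with /bye), then in another terminal:
ollama ps
# NAME        ID        SIZE      PROCESSOR    UNTIL
# qwen-32k    ...       21 GB     100% GPU     4 minutes from now
```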
- Configure Roo Code:
- Open the Roo Code sidebar (Roo Code icon).
- Click the settings gear icon.
- Select "ollama" as the API Provider.
- Enter the model tag or saved name from the previous step (e.g., your_model_name).
- (Optional) Configure the base URL if you're running Ollama on a different machine. The default is http://localhost:11434 (see the remote-machine example after this list).
- (Optional) Enter an API key if your Ollama server requires authentication.
- (Advanced) Roo uses Ollama's native API by default for the "ollama" provider. An OpenAI-compatible /v1 handler also exists but isn't required for typical setups (see the example below).
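If Ollama runs on a different machine, the server has to listen on an address Roo Code can reach before the base URL setting will work. A minimal sketch, assuming you control that machine and its firewall (the IP address shown is only an example):

```bash
# On the machine running Ollama: listen on all interfaces instead of localhost only
OLLAMA_HOST=0.0.0.0 ollama serve

# In Roo Code, set the base URL to that machine's address, for example:
#   http://192.168.1.50:11434
```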
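To see the difference between the native API and the OpenAI-compatible handler mentioned in the last step, you can hit both from the command line. This is only a sketch against a default local install, with abbreviated payloads and the example qwen-32k model name:

```bash
# Native Ollama chat endpoint (what Roo uses by default for the "ollama" provider)
curl http://localhost:11434/api/chat -d '{
  "model": "qwen-32k",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}'

# OpenAI-compatible endpoint exposed by the same server
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen-32k", "messages": [{"role": "user", "content": "Hello"}]}'
```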
Tips and Notes
- Resource Requirements: Running large language models locally can be resource-intensive. Make sure your computer meets the minimum requirements for the model you choose.
- Model Selection: Experiment with different models to find the one that best suits your needs.
- Offline Use: Once you've downloaded a model, you can use Roo Code offline with that model.
- Token Tracking: Roo Code tracks token usage for models run via Ollama, helping you monitor consumption.
- Ollama Documentation: Refer to the Ollama documentation for more information on installing, configuring, and using Ollama.
Troubleshooting
Out of Memory (OOM) on First Request
Symptoms
- First request from Roo fails with an out-of-memory error
- GPU/CPU memory usage spikes when the model first loads
- Works after you manually start the model in Ollama
Cause
If no model instance is running, Ollama spins one up on demand. During that cold start it may allocate a larger context window than expected. The larger context window increases memory usage and can exceed available VRAM or RAM. This is an Ollama startup behavior, not a Roo Code bug.
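If you want to confirm what happened during a cold start, the server log usually records the context size and memory allocations when a model loads. Where the log lives depends on how Ollama was installed; the paths below assume a standard Linux systemd install and the default macOS app location:

```bash
# Linux (systemd install): follow the server log while reproducing the failure
journalctl -u ollama -f

# macOS app: the server log is written under ~/.ollama/logs/
tail -f ~/.ollama/logs/server.log
```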
Fixes
- Preload the model (see the keep-alive sketch after this list):
ollama run <model-name>
Keep it running, then issue the request from Roo.
- Pin the context window (num_ctx)
  - Option A — interactive session, then save:
    # inside `ollama run <base-model>`
    /set parameter num_ctx 32768
    /save <your_model_name>
  - Option B — Modelfile (recommended for reproducibility):
    FROM <base-model>
    PARAMETER num_ctx 32768
    # Adjust based on your available memory:
    # 16384 for ~8GB VRAM
    # 32768 for ~16GB VRAM
    # 65536 for ~24GB+ VRAM
    Then create the model:
    ollama create <your_model_name> -f Modelfile
- Ensure the model's context window is pinned: Save your Ollama model with an appropriate num_ctx (via /set + /save, or preferably a Modelfile). Roo Code automatically detects and uses the model's configured num_ctx - there is no manual context size setting in Roo Code for the Ollama provider.
- Use smaller variants: If GPU memory is limited, use a smaller quant (e.g., q4 instead of q5) or a smaller parameter size (e.g., 7B/13B instead of 32B).
- Restart after an OOM:
ollama ps
ollama stop <model-name>
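For the preload fix above, you can also keep the model resident without leaving an interactive ollama run session open. A sketch using the API's keep_alive field and the server-wide OLLAMA_KEEP_ALIVE variable, both assumed to be available in your Ollama version (qwen-32k is just the example name from earlier):

```bash
# Load the model with no prompt and ask the server to keep it in memory
# (keep_alive accepts durations like "30m", or -1 to keep it loaded until restart)
curl http://localhost:11434/api/generate -d '{
  "model": "qwen-32k",
  "keep_alive": -1
}'

# Or set a server-wide default before starting Ollama
export OLLAMA_KEEP_ALIVE=30m
ollama serve
```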
Quick checklist
- Model is running before Roo request
- num_ctx pinned (Modelfile or /set + /save)
- Model saved with appropriate num_ctx (Roo uses this automatically)
- Model fits available VRAM/RAM
- No leftover Ollama processes