Why This Matters
Claude Code is an impressive agentic coding tool that can read, modify, and execute code in your working directory. But there’s one catch: it usually requires paying for Claude API calls. What if I told you that you can run the exact same tool with completely free, local models?
That’s exactly what I’ve been experimenting with lately, and I wanted to share how you can do it too.
What you get with this setup:
- Cost: Zero API fees. Run as many coding sessions as you want.
- Privacy: Your code never leaves your machine. No cloud API calls.
- Offline: Work without internet once models are downloaded.
- Learning: Compare open-source coding models with Claude’s performance.
What Is Claude Code?
Claude Code is Anthropic’s official CLI tool that brings AI-powered coding assistance to your terminal. It can understand your codebase, make edits across multiple files, run commands, and help debug issues - all through a conversational interface.
Normally, it connects to Anthropic’s API and uses Claude models (Sonnet, Opus, Haiku). But thanks to an Anthropic-compatible API bridge, you can point it at Ollama instead.
Prerequisites
Before we start, you’ll need:
- Ollama installed - If you don't have it yet, check out my earlier post on running LLMs locally.
- Claude Code CLI - Install from npm:
npm install -g @anthropic-ai/claude-code
- Enough RAM - Claude Code needs models with large context windows (minimum 64k tokens). Plan for at least 8GB of free RAM.
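Before going further, it's worth confirming everything is actually on your PATH. A minimal sketch (the `check_tool` helper is my own illustration, not part of Claude Code or Ollama):

```shell
# check_tool: small helper that reports whether a command is on PATH
# (the helper name is illustrative, not part of any tool here)
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: installed"
  else
    echo "$1: MISSING"
  fi
}

check_tool node    # needed for the npm install
check_tool ollama
check_tool claude
```

If anything prints MISSING, install it before moving on.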
Step 1: Choose Your Model
Not all models work well with Claude Code. You need coding-specific models with large context windows. Here are the recommended ones:
Best options:
- qwen2.5-coder:7b - Strong coding performance, good balance
- qwen2.5-coder:14b - Better quality if you have the RAM
- deepseek-coder-v2:16b - Excellent at following instructions
- codellama:13b-instruct - Solid baseline for coding tasks
Pull your chosen model:
ollama pull qwen2.5-coder:7b
Tip: The first pull downloads several GBs and can take a few minutes depending on your connection.
Step 2: Quick Setup (Recommended)
The easiest way to get started is using Ollama’s automatic configuration:
ollama launch claude
This command will:
- Start the Ollama server if it’s not running
- Configure the necessary environment variables
- Launch Claude Code pointed at your local Ollama instance
You’ll be prompted to select which model to use from your installed models.
Step 3: Manual Setup (Alternative)
If you prefer more control, you can configure it manually:
Set these environment variables:
export ANTHROPIC_API_KEY="ollama"
export ANTHROPIC_BASE_URL="http://localhost:11434/v1"
Then launch Claude Code with your chosen model:
claude --model qwen2.5-coder:7b
Note: The API key can be any string when using Ollama locally - it just needs to be set.
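If you don't want to export those variables in every terminal session, one option is a small wrapper function. The function name and the default model below are my own choices, not part of either tool:

```shell
# claude_local: hypothetical wrapper that launches Claude Code against a
# local Ollama server. Pass a model name, or fall back to a default.
claude_local() {
  ANTHROPIC_API_KEY="ollama" \
  ANTHROPIC_BASE_URL="http://localhost:11434/v1" \
  claude --model "${1:-qwen2.5-coder:7b}"
}
```

Drop it into your ~/.bashrc or ~/.zshrc, then run claude_local (or claude_local deepseek-coder-v2:16b to pick a different model).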
Using Claude Code with Local Models
Once launched, you can use Claude Code exactly as you would with Claude models:
Ask it to create a new feature - for example: "Add a --verbose flag to the CLI"
Debug an issue: "The login test fails intermittently - find out why"
Refactor code: "Extract the duplicated validation logic into a shared helper"
The model will:
- Read relevant files in your project
- Understand your codebase structure
- Make necessary edits
- Run commands to test changes
- Explain what it did
Performance: Local vs Cloud
Here’s what I’ve noticed after using both:
What works well locally:
- Simple bug fixes and code generation
- Following clear, specific instructions
- Working within a focused codebase area
- Standard coding patterns and frameworks
Where Claude models still lead:
- Complex multi-file refactoring
- Nuanced architecture decisions
- Understanding implicit requirements
- Edge case handling
For most day-to-day coding tasks, local models hold up better than you might expect.
Troubleshooting
Out of memory errors?
- Switch to a smaller model (7b instead of 14b)
- Close other applications
- Check ollama ps to see memory usage
Model responses are slow?
- Local inference is slower than API calls - that’s expected
- GPU acceleration helps significantly (Ollama auto-detects)
- Smaller models respond faster
Context window errors?
- Some models have smaller context than advertised
- Try reducing the number of files you’re working with
- Use more focused prompts
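If a model's default context window is too small, Ollama lets you build a variant with a larger one via a Modelfile. The num_ctx value and the new tag name below are just examples - size them to your available RAM:

```shell
# Create a 64k-context variant of a model (the -64k tag name is arbitrary)
cat > Modelfile <<'EOF'
FROM qwen2.5-coder:7b
PARAMETER num_ctx 65536
EOF

# Build the variant (guarded so the snippet is safe to paste anywhere)
if command -v ollama >/dev/null 2>&1; then
  ollama create qwen2.5-coder:7b-64k -f Modelfile
fi
```

Then launch with claude --model qwen2.5-coder:7b-64k.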
Connection refused?
- Make sure the Ollama server is running: ollama serve
- Check the base URL is correct: http://localhost:11434/v1
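A quick way to check connectivity is to hit Ollama's /api/tags endpoint, which lists installed models. A small sketch (the ollama_up function name is mine; the default URL assumes the standard local port):

```shell
# ollama_up: succeeds if an Ollama server answers at the given base URL
ollama_up() {
  curl -sf "${1:-http://localhost:11434}/api/tags" >/dev/null 2>&1
}

if ollama_up; then
  echo "Ollama is running"
else
  echo "Ollama not reachable; start it with: ollama serve"
fi
```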
Cost Comparison
Let’s do some rough math:
Claude API pricing (Sonnet 4.5):
- Input: $3 per million tokens
- Output: $15 per million tokens
- A typical coding session might use 50k-200k tokens
- Cost: roughly $0.50-$3 per session
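The math above is easy to reproduce for your own usage. A quick sketch using the Sonnet rates from this section (the token counts are illustrative, not measurements):

```shell
# session_cost: rough Claude API cost in dollars, given input and output
# token counts, at $3/M input and $15/M output (Sonnet pricing above)
session_cost() {
  awk -v in_tok="$1" -v out_tok="$2" \
    'BEGIN { printf "%.2f\n", in_tok / 1e6 * 3 + out_tok / 1e6 * 15 }'
}

session_cost 150000 30000   # a mid-sized session → 0.90
```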
Local Ollama setup:
- Electricity: negligible on CPU, roughly $0.10/hour under GPU load
- API fees: $0
If you’re doing regular coding sessions, the local setup pays for itself quickly.
When to Use Each
Use local Ollama when:
- Learning and experimenting
- Working on personal projects
- You want complete privacy
- Budget is a concern
- You’re offline
Use Claude API when:
- Working on complex production systems
- You need the absolute best performance
- Speed matters more than cost
- You don’t want to manage local infrastructure
Personally, I’ve been using local models for about 80% of my coding tasks and switching to the Claude API only for the really tricky stuff. That hybrid approach gives me the best of both worlds.
Advanced: Switching Between Models
You can easily switch between different local models to compare them:
# Try a smaller, faster model
claude --model qwen2.5-coder:7b
# Or a larger, more capable one
claude --model deepseek-coder-v2:16b
This is great for finding the right balance between speed and quality for your specific use case.
Integration with Existing Workflow
Claude Code works in any directory, so you can:
cd ~/projects/my-app
claude --model qwen2.5-coder:7b
Then start asking it to help with your project. It will read your code, understand the structure, and make intelligent suggestions.
Works great with:
- Git repositories
- npm/yarn/pnpm projects
- Python virtual environments
- Any codebase, really
Wrapping Up
Running Claude Code with Ollama is one of those setups that sounds too good to be true but actually works. You get a powerful agentic coding assistant running completely locally and free.
Is it as good as Claude Opus or Sonnet 4.5? Not quite. But for most everyday coding tasks, it’s more than capable. And the privacy, cost savings, and learning value make it absolutely worth trying.
If you’re already using Ollama for other AI tasks (which I covered in my Ollama guide), adding Claude Code is just one more command away.
Give it a try and let me know what you think. Happy coding!
Further Reading
- Official Ollama Claude Code docs: https://docs.ollama.com/integrations/claude-code
- Claude Code repository: https://github.com/anthropics/claude-code
- Ollama model library: https://ollama.com/library
- My post on MCP integration for even more AI tool capabilities