Why This Matters
Claude Code is an impressive agentic coding tool that can read, modify, and execute code in your working directory. But there’s one catch: it usually requires paying for Claude API calls. What if I told you that you can run the exact same tool with completely free, local models?
That’s exactly what I’ve been experimenting with lately, and I wanted to share how you can do it too.
What you get with this setup:
- Cost: Zero API fees. Run as many coding sessions as you want.
- Privacy: Your code never leaves your machine. No cloud API calls.
- Offline: Work without internet once models are downloaded.
- Learning: Compare open-source coding models with Claude’s performance.
What Is Claude Code?
Claude Code is Anthropic’s official CLI tool that brings AI-powered coding assistance to your terminal. It can understand your codebase, make edits across multiple files, run commands, and help debug issues - all through a conversational interface.
Normally, it connects to Anthropic’s API and uses Claude models (Sonnet, Opus, Haiku). But thanks to an Anthropic-compatible API bridge, you can point it at Ollama instead.
Prerequisites
Before we start, you’ll need:
- Ollama installed - If you don't have it yet, check out my earlier post on running LLMs locally.
- Claude Code CLI - Install from npm:
npm install -g @anthropic-ai/claude-code
- Enough RAM - Claude Code needs models with large context windows (minimum 64k tokens). Plan for at least 8GB of free RAM.
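Before going further, it's worth confirming everything is actually on your PATH. A minimal sketch (the `check_tool` helper is my own illustration, not part of Claude Code or Ollama):

```shell
# check_tool: small helper that reports whether a command is on PATH
# (the helper name is illustrative, not part of any tool here)
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: installed"
  else
    echo "$1: MISSING"
  fi
}

check_tool node    # needed for the npm install
check_tool ollama
check_tool claude
```

If anything prints MISSING, install it before moving on.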
Step 1: Choose Your Model
Not all models work well with Claude Code. You need coding-specific models with large context windows. Here are the recommended ones:
Best options:
- qwen2.5-coder:7b - Strong coding performance, good balance
- qwen2.5-coder:14b - Better quality if you have the RAM
- deepseek-coder-v2:16b - Excellent at following instructions
- codellama:13b-instruct - Solid baseline for coding tasks
Pull your chosen model:
ollama pull qwen2.5-coder:7b
Tip: The first pull downloads several GBs and can take a few minutes depending on your connection.
Step 2: Quick Setup (Recommended)
The easiest way to get started is using Ollama’s automatic configuration:
ollama launch claude
This command will:
- Start the Ollama server if it’s not running
- Configure the necessary environment variables
- Launch Claude Code pointed at your local Ollama instance
You’ll be prompted to select which model to use from your installed models.
Step 3: Manual Setup (Alternative)
If you prefer more control, you can configure it manually:
Set these environment variables:
export ANTHROPIC_API_KEY="ollama"
export ANTHROPIC_BASE_URL="http://localhost:11434/v1"
Then launch Claude Code with your chosen model:
claude --model qwen2.5-coder:7b
Note: The API key can be any string when using Ollama locally - it just needs to be set.
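If you don't want to export those variables in every terminal session, one option is a small wrapper function. The function name and the default model below are my own choices, not part of either tool:

```shell
# claude_local: hypothetical wrapper that launches Claude Code against a
# local Ollama server. Pass a model name, or fall back to a default.
claude_local() {
  ANTHROPIC_API_KEY="ollama" \
  ANTHROPIC_BASE_URL="http://localhost:11434/v1" \
  claude --model "${1:-qwen2.5-coder:7b}"
}
```

Drop it into your ~/.bashrc or ~/.zshrc, then run claude_local (or claude_local deepseek-coder-v2:16b to pick a different model).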
Using Claude Code with Local Models
Once launched, you can use Claude Code exactly as you would with Claude models:
Ask it to create a new feature - for example: "Add a --verbose flag to the CLI"
Debug an issue: "The login test fails intermittently - find out why"
Refactor code: "Extract the duplicated validation logic into a shared helper"
The model will:
- Read relevant files in your project
- Understand your codebase structure
- Make necessary edits
- Run commands to test changes
- Explain what it did
Performance: Local vs Cloud
Here’s what I’ve noticed after using both:
What works well locally:
- Simple bug fixes and code generation
- Following clear, specific instructions
- Working within a focused codebase area
- Standard coding patterns and frameworks
Where Claude models still lead:
- Complex multi-file refactoring
- Nuanced architecture decisions
- Understanding implicit requirements
- Edge case handling
For most day-to-day coding tasks, local models hold up better than you might expect.
Troubleshooting
Out of memory errors?
- Switch to a smaller model (7b instead of 14b)
- Close other applications
- Check ollama ps to see memory usage
Model responses are slow?
- Local inference is slower than API calls - that’s expected
- GPU acceleration helps significantly (Ollama auto-detects)
- Smaller models respond faster
Context window errors?
- Some models have smaller context than advertised
- Try reducing the number of files you’re working with
- Use more focused prompts
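If a model's default context window is too small, Ollama lets you build a variant with a larger one via a Modelfile. The num_ctx value and the new tag name below are just examples - size them to your available RAM:

```shell
# Create a 64k-context variant of a model (the -64k tag name is arbitrary)
cat > Modelfile <<'EOF'
FROM qwen2.5-coder:7b
PARAMETER num_ctx 65536
EOF

# Build the variant (guarded so the snippet is safe to paste anywhere)
if command -v ollama >/dev/null 2>&1; then
  ollama create qwen2.5-coder:7b-64k -f Modelfile
fi
```

Then launch with claude --model qwen2.5-coder:7b-64k.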
Connection refused?
- Make sure the Ollama server is running: ollama serve
- Check the base URL is correct: http://localhost:11434/v1
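A quick way to check connectivity is to hit Ollama's /api/tags endpoint, which lists installed models. A small sketch (the ollama_up function name is mine; the default URL assumes the standard local port):

```shell
# ollama_up: succeeds if an Ollama server answers at the given base URL
ollama_up() {
  curl -sf "${1:-http://localhost:11434}/api/tags" >/dev/null 2>&1
}

if ollama_up; then
  echo "Ollama is running"
else
  echo "Ollama not reachable; start it with: ollama serve"
fi
```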
Cost Comparison
Let’s do some rough math:
Claude API pricing (Sonnet 4.5):
- Input: $3 per million tokens
- Output: $15 per million tokens
- A typical coding session might use 50k-200k tokens
- Cost: roughly $0.50-$3 per session
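The math above is easy to reproduce for your own usage. A quick sketch using the Sonnet rates from this section (the token counts are illustrative, not measurements):

```shell
# session_cost: rough Claude API cost in dollars, given input and output
# token counts, at $3/M input and $15/M output (Sonnet pricing above)
session_cost() {
  awk -v in_tok="$1" -v out_tok="$2" \
    'BEGIN { printf "%.2f\n", in_tok / 1e6 * 3 + out_tok / 1e6 * 15 }'
}

session_cost 150000 30000   # a mid-sized session → 0.90
```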
Local Ollama setup:
- Electricity: negligible on CPU, roughly $0.10/hour under GPU load
- API fees: $0
If you’re doing regular coding sessions, the local setup pays for itself quickly.
When to Use Each
Use local Ollama when:
- Learning and experimenting
- Working on personal projects
- You want complete privacy
- Budget is a concern
- You’re offline
Use Claude API when:
- Working on complex production systems
- You need the absolute best performance
- Speed matters more than cost
- You don’t want to manage local infrastructure
Personally, I’ve been using local models for about 80% of my coding tasks and switching to the Claude API only for the really tricky stuff. That hybrid approach gives me the best of both worlds.
Advanced: Switching Between Models
You can easily switch between different local models to compare them:
# Try a smaller, faster model
claude --model qwen2.5-coder:7b
# Or a larger, more capable one
claude --model deepseek-coder-v2:16b
This is great for finding the right balance between speed and quality for your specific use case.
Integration with Existing Workflow
Claude Code works in any directory, so you can:
cd ~/projects/my-app
claude --model qwen2.5-coder:7b
Then start asking it to help with your project. It will read your code, understand the structure, and make intelligent suggestions.
Works great with:
- Git repositories
- npm/yarn/pnpm projects
- Python virtual environments
- Any codebase, really
Wrapping Up
Running Claude Code with Ollama is one of those setups that sounds too good to be true but actually works. You get a powerful agentic coding assistant running completely locally and free.
Is it as good as Claude Opus or Sonnet 4.5? Not quite. But for most everyday coding tasks, it’s more than capable. And the privacy, cost savings, and learning value make it absolutely worth trying.
If you’re already using Ollama for other AI tasks (which I covered in my Ollama guide), adding Claude Code is just one more command away.
Give it a try and let me know what you think. Happy coding!
Further Reading
- Official Ollama Claude Code docs: https://docs.ollama.com/integrations/claude-code
- Claude Code repository: https://github.com/anthropics/claude-code
- Ollama model library: https://ollama.com/library
- My post on MCP integration for even more AI tool capabilities