How to Connect Ollama to Claude Desktop Using MCP


I get asked the same question almost every week. People want to know how to connect Ollama to Claude Desktop using MCP so they can run a local model and still benefit from the polished tool calling experience that Anthropic has built into the desktop app. The honest answer is that it works, but not in the way most people initially assume. Claude Desktop is hardcoded to talk to Anthropic’s API. It is not a generic chat shell that points at any model you want. The trick is using MCP as the connective tissue and slotting Ollama in either through a bridge or by using Claude Desktop as the orchestrator while a separate local chat UI handles the Ollama side.

In my recent walkthrough I demonstrated this whole pattern with my Obsidian vault. The MCP server reads my notes, finds connections between concepts, and writes a new file with linked references. Everything runs on my machine. No tokens leave the laptop. If you want to see the full demo with the configuration on screen, the video is linked at the bottom of this post.

What does it actually mean to connect Ollama to Claude Desktop using MCP?

There is a small but important distinction here. Claude Desktop natively supports MCP servers. You configure them in a JSON file called claude_desktop_config.json, which lives under ~/Library/Application Support/Claude on macOS or %APPDATA%\Claude on Windows, and Claude Desktop launches each server as a subprocess on startup. The model that decides when to call those tools, however, is whichever Claude model you have selected in the app. So when people say they want to connect Ollama to Claude Desktop using MCP, what they usually want is one of two things.

The first interpretation is making a local Ollama model the brain that calls MCP tools. The second is letting Claude Desktop call into Ollama as if Ollama were itself a tool. Both are valid. Both require slightly different plumbing. I find that being clear about which one you want saves hours of debugging later. If you are still ramping up on the local model side, my Ollama local development guide covers the basics of getting models running and exposing them on a port your other tools can hit.

Why does the MCP server config matter so much?

The Claude Desktop config file is deceptively simple. You declare an mcpServers object with a name for each server, a command to launch it, optional arguments, and an environment block for secrets like API keys. When Claude Desktop boots, it spawns each command, talks to it over stdio, and registers all the tools the server exposes. If the JSON has a typo, the server silently fails to load and you get no obvious error. If the command path is wrong, same thing. If the environment variable is missing, the server starts but every call fails.

In practice I describe the config to people like this. You have a top-level mcpServers key. Inside it you have one entry per server, keyed by whatever name you want to see in the UI. Each entry has a command, which is usually uvx for Python servers or npx for Node servers. Then you have an args array with the package name and any flags. Then you have an env object where you put things like the Obsidian API key or the Ollama base URL. That is the entire shape. It is not complicated, but every detail has to be exactly right.
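To make that concrete, a minimal claude_desktop_config.json following that shape might look like the sketch below. The package names, vault path, and key name are illustrative; check the README of whichever servers you actually install for the exact values.

```json
{
  "mcpServers": {
    "obsidian": {
      "command": "uvx",
      "args": ["mcp-obsidian"],
      "env": {
        "OBSIDIAN_API_KEY": "your-local-rest-api-key"
      }
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/you/vault"]
    }
  }
}
```

If a server does not show up after a restart, validate the JSON first. A stray trailing comma is the most common reason a server silently fails to load.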

Where does Ollama fit into the picture?

Here is where the honest answer matters. Claude Desktop will not, on its own, route requests through Ollama. It calls Anthropic. So if you want Ollama to be the model doing the reasoning, you need a different setup. In my video I used LM Studio rather than Claude Desktop because LM Studio exposes an OpenAI-compatible chat completions endpoint with tool support, and you can wire your local chat UI to that endpoint while pointing the same UI at MCP servers. Ollama can play the same role. You run Ollama on its default port (11434), point your chat application at that endpoint, and have the chat application launch MCP servers in the background.
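Here is a minimal sketch of that wiring from the chat application's side, assuming Ollama is running locally and a tool-capable model such as qwen2.5:14b has already been pulled. The read_note tool is a hypothetical stand-in for whatever your chat UI maps to an MCP server.

```python
# Minimal sketch: point an OpenAI-compatible client at a local Ollama instance
# and hand it one tool definition. Assumes `pip install openai` and that a
# tool-capable model has already been pulled with Ollama.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # any non-empty string works locally
)

tools = [{
    "type": "function",
    "function": {
        "name": "read_note",  # hypothetical tool your chat UI maps to an MCP server
        "description": "Read a markdown note from the vault by relative path",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

response = client.chat.completions.create(
    model="qwen2.5:14b",
    messages=[{"role": "user", "content": "Summarize my note on MCP."}],
    tools=tools,
)

# A tool-trained model answers with structured tool calls instead of prose.
print(response.choices[0].message.tool_calls)
```

The useful part is that Ollama speaks the same OpenAI-compatible dialect LM Studio does, so the MCP-side plumbing in your chat UI does not have to change.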

The crucial requirement is tool support. Not every local model knows how to emit the structured tool call syntax that MCP servers expect. In my demo I had to use a 14 billion parameter Qwen model because the smaller 7 and 8 billion parameter options either lacked tool training or produced inconsistent calls. With Ollama you want models tagged for tool use. Llama 3.1 instruct variants, Qwen 2.5 instruct, and Mistral instruct are usually safe bets. If a model has not been trained for tool use, it will hallucinate function calls or just describe what it would do in prose, which is useless when you need actual API execution.

If you are evaluating which local stack to commit to, I wrote a reality check on local AI coding that goes into which model sizes and architectures actually deliver on the promise versus which ones are just demo theater.

What about ollama-mcp bridge tools?

There is a growing ecosystem of bridge projects that try to make Ollama feel like a first-class MCP citizen inside Claude Desktop or Claude Code. The pattern is usually the same. The bridge presents itself to Claude Desktop as an MCP server. Internally, it forwards prompts to your local Ollama instance, parses the response, and translates anything that looks like a tool call back into MCP responses. Some of these bridges work well for simple flows. Most of them fall down on multi-turn tool use where the model needs to receive a tool result and then decide what to do next.
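For reference, the simplest possible version of that bridge is just an MCP server with one tool that forwards a prompt to Ollama and returns the text. The rough sketch below uses the official MCP Python SDK and assumes Ollama is on its default port; note that it deliberately skips the hard part, which is translating multi-turn tool calls back and forth.

```python
# Rough sketch of the bridge pattern: an MCP server that Claude Desktop can
# launch, with a single tool that forwards a prompt to a local Ollama instance
# and returns the raw text. Assumes `pip install mcp requests`.
import requests
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("ollama-bridge")  # the name Claude Desktop will show for this server

@mcp.tool()
def ask_local_model(prompt: str, model: str = "qwen2.5:14b") -> str:
    """Send a prompt to the local Ollama instance and return its reply."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    mcp.run()  # speaks MCP over stdio, which is what Claude Desktop expects
```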

My recommendation is to start without a bridge. Get a chat UI working with Ollama directly, get one MCP server connected, and watch the prompt traffic. Once you understand what the tool array looks like in the request payload and what the tool call response looks like coming back, you will be in a much better position to evaluate whether a bridge is solving your problem or just adding a layer of abstraction you do not need.
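As a reference point, the assistant message that comes back over an OpenAI-compatible endpoint when the model decides to call a tool looks roughly like this; the tool name and arguments carry over from the earlier hypothetical read_note example.

```json
{
  "role": "assistant",
  "content": null,
  "tool_calls": [
    {
      "id": "call_0",
      "type": "function",
      "function": {
        "name": "read_note",
        "arguments": "{\"path\": \"MCP.md\"}"
      }
    }
  ]
}
```

If the model is tool-trained and healthy, you see this structure. If you only ever see prose, that is your first clue the model or quantization is the problem, not the MCP wiring.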

If you want the exact starter projects I use for this kind of local stack experiment, including the chat UI scaffolding and the MCP wiring, grab them from the open source projects page. I keep them updated as the protocol evolves.

Which tool calls actually work with a local model?

The pleasant surprise from my experiment is that file system style tools work very well. List files. Read a file by path. Append content to a file. Write a new file. These are bread-and-butter operations and any tool-trained 7 billion parameter model can handle them reliably. The Obsidian MCP server I used exposes exactly these primitives, plus patching, and the Qwen model called them correctly across a multi-turn conversation that included reading three files and writing a fourth.
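The multi-turn part is where most custom wiring goes wrong, so here is a sketch of the loop those file tools rely on, reusing the OpenAI-compatible client from earlier. run_mcp_tool is a placeholder for however your chat UI actually dispatches a call to the connected MCP server.

```python
# Sketch of the multi-turn tool loop: the model asks for a tool, your code runs
# it, the result goes back in as a "tool" message, and the model continues.
import json

def run_mcp_tool(name: str, arguments: dict) -> str:
    """Placeholder: forward the call to the connected MCP server and return text."""
    raise NotImplementedError

def chat_with_tools(client, model, messages, tools, max_rounds=5):
    for _ in range(max_rounds):
        reply = client.chat.completions.create(model=model, messages=messages, tools=tools)
        msg = reply.choices[0].message
        if not msg.tool_calls:
            return msg.content  # model is done, return the final prose answer
        messages.append(msg)  # keep the assistant's tool request in the history
        for call in msg.tool_calls:
            result = run_mcp_tool(call.function.name, json.loads(call.function.arguments))
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": result,
            })
    return "Stopped after max_rounds tool calls"
```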

What does not work as reliably is anything that requires the model to plan a long sequence of calls and reason about partial results. State-of-the-art cloud models will happily orchestrate ten tool calls in a row, hold the intermediate results in mind, and synthesize a final answer. Local 7 to 14 billion parameter models start to drift around call number four or five. They forget which files they have already read. They call the same tool twice with the same arguments. They sometimes invent file paths that do not exist.

The mitigation is to keep your prompts tight and to break complex tasks into smaller chats. I cover this pattern more thoroughly in my piece on sub-agent strategies for local AI coding, which applies equally well to MCP-driven workflows. Treating each MCP-heavy interaction as a focused sub-task instead of one giant orchestration session is the single biggest reliability improvement you can make.

What are the most common pitfalls when troubleshooting?

The number one pitfall is forgetting that the MCP server has to actually be running before the chat starts. Claude Desktop launches its servers automatically. If you are using a custom chat UI with Ollama, you need to make sure your launcher is starting them too. The second pitfall is API keys in the wrong place. Most MCP servers expect their secrets in environment variables, not on the command line. The third is context length. The Obsidian MCP server can dump a lot of markdown into the conversation, and if your local model is configured with a 4096 token context, you will overflow it the moment you read more than two notes. Bump the context length when you load the model.
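If you are calling Ollama's native chat API, the context length can be raised per request through the num_ctx option, as in this sketch; the exact number depends on the model and the memory you have available.

```python
# Sketch: raise the context window for a single native Ollama chat request.
# num_ctx controls how many tokens the model can hold; bump it well past the
# 4096 mentioned above before letting MCP tools dump notes into the conversation.
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen2.5:14b",
        "messages": [{"role": "user", "content": "Summarize my note on MCP."}],
        "options": {"num_ctx": 16384},
        "stream": False,
    },
    timeout=300,
)
print(resp.json()["message"]["content"])
```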

The fourth pitfall is silent tool support failures. A model card might claim tool use, but the specific quantization you downloaded might have lost that capability. If your model is responding in prose to prompts that should trigger a tool call, swap to a different quantization or a different model entirely before you blame your config. For developers also experimenting with agentic coding workflows, my Apple Xcode agentic coding MCP guide walks through similar troubleshooting patterns in the IDE context.
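A quick way to catch that silent failure before you waste an evening is to probe the model with one trivial tool and check whether it emits a structured call at all. A minimal sketch, reusing the client from earlier; get_time is a hypothetical no-op tool that exists only for the probe.

```python
# Smoke test for silent tool support failures: if the model never emits
# tool_calls even when explicitly told to use the tool, swap models or
# quantizations before touching the MCP config.
def supports_tool_calls(client, model: str) -> bool:
    probe_tools = [{
        "type": "function",
        "function": {
            "name": "get_time",  # hypothetical no-op tool just for the probe
            "description": "Get the current time",
            "parameters": {"type": "object", "properties": {}},
        },
    }]
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "What time is it? Use the tool."}],
        tools=probe_tools,
    )
    return bool(reply.choices[0].message.tool_calls)
```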

Is it worth it compared to just using cloud Claude?

I am going to be honest. If raw capability is your only metric, cloud Claude wins. The frontier models are smarter, faster on long chains, and more reliable with tools. What you get from running Ollama with MCP is privacy, zero per-token cost, and the ability to point your AI at sensitive data without sending it across the internet. For my personal knowledge management use case, where the model is reading my private notes and writing back into my vault, that tradeoff is worth it every day of the week. For a production customer-facing system, the calculus is different.

The point of learning this setup is not to replace cloud models. It is to give yourself an option. When you understand the wiring, you can pick the right tool for each job instead of being locked into one provider.

If you want to see the full configuration walkthrough with the Obsidian vault demo, watch the video here: https://www.youtube.com/watch?v=dBSYt-vuEmA

And if you are building local AI systems and want to compare notes with other engineers doing the same, join the AI Engineering community at https://aiengineer.community/join. We share configs, debug each other’s setups, and generally help each other ship.

Zen van Riel

Senior AI Engineer | Ex-Microsoft, Ex-GitHub

I went from a $500/month internship to Senior AI Engineer. Now I teach 30,000+ engineers on YouTube and coach engineers toward six-figure AI careers in the AI Engineering community.
