What I Learned Running Local AI as My Daily Driver for a Year

A year ago I made a decision that most engineers would call either brave or stupid. I unplugged from cloud AI as much as possible and tried to run local models as my primary daily driver. No ChatGPT tab open in the background. No reflexive reach for Claude when something got hard. Just my RTX 5090, a stack of open weights models, and the quiet hum of a tower that doubles as a space heater in winter.

I want to give you the unvarnished retrospective. Not the marketing version where everything works and local AI is the future. Not the cynical version where I declare the whole experiment a waste. The truthful version, with the wins, the regrets, the dollars spent, the dollars saved, and the moments where I quietly opened a cloud API tab because the work needed to ship.

If you have been thinking about going local, or you are evaluating whether the skill is worth your time, this is the report I wish someone had handed me twelve months ago.

Why did I switch to local AI in the first place?

I did not switch because local was better. I switched because I wanted to know what I did not know. Two years ago I was using ChatGPT and GitHub Copilot like everyone else and I had no idea how any of it worked under the hood. That bothered me. I am the kind of engineer who wants to be able to explain the stack I am standing on, and renting tokens from someone else’s GPU was starting to feel like driving a car you cannot open the hood of.

The second reason was career oriented. I had been watching the edge AI numbers quietly grow into something serious. Twenty five billion this year, projected to hit one hundred and forty three billion by 2034 at a twenty one percent growth rate. Multiple research firms landed on the same trajectory independently. Meanwhile, eighty four percent of developers use AI tools, but only eighteen percent are actually building AI integrations, and three quarters of them said they have no plans to touch deployment and monitoring. That is a screaming asymmetry. If you want to read the longer version of that argument, I wrote it up in how local AI is shaping software engineering careers.

So I bought the 5090, set up LM Studio, started downloading every open weights model I could fit into VRAM, and committed to running my actual workflow on local infrastructure for as long as I could stand it.

What changed in my daily workflow?

The biggest shift was psychological, not technical. When you are paying per token, you ration. You write the prompt carefully, you batch your questions, you close the tab when you are done. When the model is on your own machine, that scarcity disappears. I started asking models things I would never have paid to ask. Half-formed questions. Rubber duck monologues. Long rambling thinking-out-loud sessions where I just wanted something to react.

That changed how I think. I have written before about unlimited AI coding sessions on local models and the post-scarcity feeling is real. The downside is that you can fall into a trap of asking the model everything instead of thinking. I had to teach myself to use abundance with discipline.

The second shift was that my transcription pipeline went fully local and never came back. Every video on my YouTube channel now runs through Faster Whisper with Large V3 Turbo. The raw transcript comes out, and then a local LLM cleans up filler words and pulls out key insights so I can build the next video faster. That two stage pipeline runs entirely on my hardware and produces results that match any cloud service I have tested. It is the single highest value local workflow I have built and it would have cost me real money to run in the cloud at my volume.

Image generation followed the same path. Once you have decent local image models humming, you stop reaching for paid APIs for thumbnails, blog hero images, and quick visual mockups. The quality is not always frontier, but it is plenty for most production work, and the iteration speed is faster because you are not waiting on a queue.

Where did local crush cloud?

The pattern that emerged is almost embarrassingly clear. Local crushes cloud on the boring, well defined, repetitive workloads. Speech to text. Document processing. Image generation. Code autocomplete on small to medium projects. Classification. Embedding generation. Anything where you call the model thousands of times a day on similar shaped inputs.

Three categories where I never went back to cloud:

Transcription. Whisper local just works, and at my volume it would have cost hundreds of dollars per month on a hosted service. On my own hardware it costs the electricity to run the GPU, which I will detail below.

Code autocomplete in my editor. I run a Qwen model through Continue Dev wired into LM Studio. It is not as smart as the frontier models, but autocomplete is a latency game more than an intelligence game. The local model responds before I finish thinking, and the suggestions are good enough often enough that I stopped paying for hosted autocomplete.

Privacy sensitive drafts. Anything I would feel weird putting into someone else’s API, financial notes, half-formed business ideas, drafts about clients, all of that runs on my machine. The peace of mind alone is worth the setup cost.

This pattern lines up with what I see in the enterprise market. Almost half of all enterprises already use a hybrid cloud edge architecture. The boring well defined use cases, that is exactly what hospitals, banks, and defense contractors need to keep on private infrastructure. If you want a deeper look at when each side wins, I broke it down in the local versus cloud LLM decision guide.

Where did I crawl back to the cloud API?

I want to be honest about this part because the local AI community has a tendency to oversell. There are jobs where I tried to stay local, gritted my teeth, lost an afternoon, and quietly opened a cloud tab.

Long context coding on real projects was the biggest one. I built a full stack app with Claude Code pointed at local models through LM Studio. Local models work for this, until they do not. The context window fills up, inference gets slow, the model starts making mistakes, and suddenly I am spending more time debugging the model’s output than building the app. On a serious codebase with real complexity, frontier cloud models still pull ahead by a margin that matters. I covered this in more depth in the local AI coding reality check, but the short version is that for full-blown agentic coding on production systems, I am still on cloud.

Multi tool agents are the second place I tap out. Local models get confused the moment you give them more than a couple of tools. The instruction following just is not there yet at sizes that fit on a consumer GPU. If I am building an agent that needs to coordinate five or six tools and reason about state across them, I use a frontier model.

Frontier reasoning tasks are the third. When I need the absolute best at a hard problem, gnarly architecture decisions, tricky math, novel debugging, the gap between local and frontier is still real. I do not pretend otherwise.

So my year ended in a hybrid setup. The boring volume work runs local. The frontier intelligence work runs on cloud APIs. That is the same hybrid pattern enterprises are converging on, and I think it is the honest answer for most engineers.

What were my hardware regrets?

The 5090 was the right call. The regret is that I did not buy more VRAM sooner. I spent the first few months trying to make smaller models work on a smaller card before I upgraded, and almost every limitation I hit was VRAM, not compute. If you are buying hardware for local AI today, optimize for VRAM first, raw speed second.

The second regret is cooling. I underestimated how aggressively a GPU running models all day would heat my office. I had to add a second case fan, then a small standalone fan, and eventually move my desk to give the tower more breathing room. Plan your physical setup before you plan your software setup.

The third regret is that I bought too much storage too late. Open weights models are big and you will accumulate them faster than you expect. I burned a weekend shuffling files across drives because I had not planned for the model graveyard.

What were my model regrets?

I spent too long chasing the largest model I could fit. The intuition that bigger is better breaks down faster than you would think. A well chosen smaller model running fast at full precision often beats a big model running slow at aggressive quantization. I learned to test models on my actual workloads instead of trusting benchmarks, and I downsized more often than I upsized.

I also spent too long sticking with a single model for everything. The right answer was building a small roster. A fast model for autocomplete, a stronger model for chat, a specialized model for transcription cleanup, a vision model for image work. Routing the right task to the right model gave me more leverage than chasing one model that does it all.

If you want a head start on what is worth running locally, I have over fifteen local AI projects you can get for free over on the open source page. It is the fastest way to see what works without doing a full year of trial and error like I did.

What did it actually cost me?

Let me put real numbers on this.

Hardware. The 5090 build came in around three thousand five hundred dollars all in, including the case, power supply, additional cooling, and storage upgrades. Spread over a year that is roughly two hundred and ninety dollars per month, though I expect to keep this rig for at least three years, which brings the effective cost down to around one hundred dollars per month.

Electricity. The GPU under sustained load draws meaningful power. I estimate the local AI work added around twenty to thirty dollars per month to my electric bill. Not free, but trivially small compared to the hardware amortization.

Cloud API spend. This is where it gets interesting. The year before, my combined cloud AI spend was running around three hundred to four hundred dollars per month between coding tools, transcription services, image generation credits, and miscellaneous API calls. After going local, my cloud spend dropped to around eighty dollars per month, mostly for the frontier coding work I refused to give up.

Net math. I save somewhere between two hundred and three hundred dollars per month on cloud spend. Hardware amortizes at one hundred per month. Electricity adds twenty five. Net savings of roughly seventy five to one hundred and seventy five dollars per month, which means the rig pays for itself somewhere in year two.

The real return is not the dollars though. It is the skill. Running local AI for a year taught me how models actually behave, where they break, how to deploy them, how to tune them for specific hardware, and how to debug inference problems that no cloud user ever has to think about. That skill set is rare, the market is expanding fast, and it shows up in compensation. I covered the broader picture in the AI engineer salary complete guide, but the short version is that engineers who can deploy on private infrastructure earn meaningfully more than engineers who only call APIs.

Would I do it again?

Yes, with the caveats I just gave you. I would buy more VRAM sooner. I would build a model roster instead of chasing one giant model. I would accept the hybrid reality from day one instead of pretending I could be one hundred percent local. And I would treat the year as a deliberate investment in a skill that almost nobody else is building, because that turned out to be the actual prize.

If you take one thing from this retrospective, take this. Local AI does not need to beat cloud at everything. It needs to beat cloud at the boring volume work that enterprises cannot send off premises, and it needs to teach you how the stack actually works. Both of those are achievable on consumer hardware today, and both of those compound into a career advantage that is still wildly underpriced in the labor market.

If you want to see the deeper version of this story on video, the original walkthrough lives over on my YouTube channel. And if you want to actually start building local AI projects with people who are doing the same thing, come join us inside the AI Engineer Community at aiengineer.community/join. The next year of your career might look very different on the other side of that decision.

Zen van Riel

Senior AI Engineer | Ex-Microsoft, Ex-GitHub

I went from a $500/month internship to Senior AI Engineer. Now I teach 30,000+ engineers on YouTube and coach engineers toward six-figure AI careers in the AI Engineering community.

Blog last updated Jul 7, 2026