Get Smart With Stupid Models

The AI companies want you to believe you need to pay up to keep up. Don't buy it. Here's how you can get great output out of "stupid" models.


Hey friends 👋,

Anthropic changed the terms on Claude Code Max last Friday. Third-party coding tools – OpenCode, Cline, OpenClaw – are no longer covered by your subscription. Want to keep using Opus? Use their proprietary harness or pay per token.

This is the AI landscape right now. The model you're using can be rug-pulled at any time. There's always a newer, bigger, more expensive one waiting to replace it. And everyone is telling you that if you don't pay up and keep up, you'll be left behind.

I don't buy it. Literally.

For the past 6 months, almost all of my personal coding has been done with small, distilled, open-weight models. By every metric that's supposed to matter, these models are "stupid."

But I've used them to tackle ambitious projects, pick up new languages, and become a better engineer all by focusing less on how well the model thinks and more on how I can use AI to sharpen my own thinking.

Here's how.

Plan Mode ≠ Planning

By now, we all know about "plan mode." You describe what you want, the model "thinks" through the problem, and produces a series of steps it's going to take to build a solution. You review it, make adjustments, then let it execute.

This is not planning. This is the illusion of planning. And the bigger the model, the more convincing that illusion gets. Opus can produce a plan that reads like it was written by a senior engineer. It's coherent, it's detailed, it has a test plan and addresses edge cases.

The smaller models can't really do that. Their plans are obviously shallow – they miss things, hand-wave, and the moment you push back they hit you with a "You're absolutely right. I made a mistake."

Kimi K, you're killing meeeh.

But this limitation is a feature, not a bug.

Because planning isn't an output. It's a process. The whole point is to surface the things nobody's thought through yet – your constraints, your assumptions, the load-bearing decisions you haven't identified.

A big model can guess confidently enough that you don't feel the need to do that work yourself. A small model can't, which forces you to think.

That's why I write plans manually first. I open a text editor and explain to myself how a system should work and why. Once I have a loose understanding, I bring in an agent.

I have a persona called Architect whose only job is to read my docs and push back – find ambiguities, untested assumptions, aspects that aren't clear. A small, fast model is great for this.

When I was building the storage layer for ebb, I wrote an architecture proposal under the assumption that SQLite could handle my write throughput targets. Architect highlighted this assumption and pushed me to validate it.

I vibe coded a quick benchmark test and found out that SQLite wouldn't cut it. It cost me $0.14 in tokens and saved me weeks of going down the wrong path.
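For flavor, here's a rough sketch of that kind of throughput check in Elixir (not the actual benchmark – `insert_fun` is a hypothetical stand-in for whatever write path you're measuring, e.g. an INSERT through your SQLite driver):

```elixir
defmodule WriteBench do
  # Times `n` sequential writes and reports writes/sec.
  def run(insert_fun, n \\ 10_000) do
    {micros, :ok} =
      :timer.tc(fn ->
        Enum.each(1..n, insert_fun)
      end)

    writes_per_sec = n * 1_000_000 / micros
    IO.puts("#{n} writes in #{div(micros, 1000)} ms ≈ #{round(writes_per_sec)} writes/sec")
    writes_per_sec
  end
end

# Usage (hypothetical store module):
# WriteBench.run(fn i -> MyStore.insert_event!(%{id: i}) end)
```

Ten lines of throwaway code, and you know whether your storage assumption survives contact with reality.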

Big models make it easy to skip the thinking. Small models don't let you. And thinking is the point of planning.

Pair Programming, Not Vibe Coding

So if I've done the hard thinking, I've tested my assumptions, and I have a solid plan, why not then just dispatch the execution to a large model like Opus? Go full gas town with an agentic loop that reads the plan, builds, tests, reviews the code, all while I sleep.

Because that assumes the thinking is done and what's left is mechanical. But implementation surfaces its own layer of decisions – should this process restart on failure, or should the supervisor handle it? Does this function need a synchronous call, or can it be a fire-and-forget cast? What happens if this message arrives before that one?
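The call-versus-cast question, for instance, is concrete. Here's a generic GenServer sketch (an illustration, not code from ebb) showing the trade-off:

```elixir
defmodule Counter do
  use GenServer

  def init(n), do: {:ok, n}

  # Synchronous call: the caller blocks until it gets a reply.
  def handle_call(:value, _from, n), do: {:reply, n, n}

  # Asynchronous cast: fire-and-forget, the caller moves on immediately
  # and never learns whether the message was even handled.
  def handle_cast(:increment, n), do: {:noreply, n + 1}
end

{:ok, pid} = GenServer.start_link(Counter, 0)
GenServer.cast(pid, :increment)
GenServer.call(pid, :value)
#=> 1
```

Whether you can afford the fire-and-forget semantics depends on what the caller needs to know – exactly the kind of judgment call a spec rarely captures.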

The answers to these emergent questions aren't in the spec. And answering them well is the difference between a system that works and one that works until it doesn't.

If you hand all of that to a model, you don't just risk bugs. You risk losing understanding. And that compounds – the more you automate, the less you grasp, the more you need to automate because you can no longer make those calls yourself.

That's not leverage. That's dependency.

A small model never lets you get there. It needs you in the loop – which means you stay close to the code and keep understanding your system as it grows.

So I pair program with the model as the "driver". The model handles the mechanical work – getting the syntax right, wiring up modules, writing the boilerplate. The stuff that's slow to type but not hard to think about. I handle the thinking. I catch when it reaches for the wrong pattern or misunderstands the data flow. I adjust, redirect, sometimes scrap a file and re-prompt with more context.

Now, I'm not a purist about this. For proofs of concept, experiments, anything where the goal is to learn fast – I let the agent rip. The marketing site for ebb and the first prototype were completely vibe coded.

But when the prototype worked well enough that I decided to build it for real – a real-time sync engine where I care about write throughput, conflict resolution correctness, and not losing people's data – I want to stay in the driver's seat and truly own every decision.

I'm also not perfect – I'm writing Elixir for the first time on this project. And sometimes the model produces code that I genuinely can't judge as right or wrong.

That's when I stop generating and start learning.

A Tutor That Never Tires

More often than I'd like to admit, the model generates code that does what I've asked – the tests pass, the behavior seems correct – but I don't fully understand what's been written.

The easy thing to do in this situation is just to ask, "Explain this code." The model generates a walkthrough, you read it, nod, move on. But reading an explanation isn't the same as understanding something. Understanding is proven when you can explain it yourself.

That's why I have another persona called Aristotle that acts as my Socratic tutor. It breaks concepts down, grounds explanations in my actual code, and checks that things are landing by asking me to reason through problems instead of just nod along.

This came up constantly while building ebb's storage layer. The model would generate code that worked, but I'd notice inconsistencies I couldn't explain. In one module it would use :atoms as keys, in another it would use "strings". Both versions passed the tests. I had no idea why it was choosing one over the other.

So I stopped and pulled up Aristotle. Instead of just explaining the difference, it asked me what I thought was happening – why I thought the distinction might matter. Then it walked me through the runtime angle: atoms in Elixir aren't garbage collected. They live in a global table for the lifetime of the VM. If you're creating them dynamically from external data, you can eventually crash the system. Strings don't have that problem.

But it didn't stop there. Aristotle pushed me to think about when atoms are the right choice – how pattern matching on atoms is faster and more expressive, and how ebb's internal data model had places where a fixed set of known keys made atoms the better fit. It asked me to identify which parts of the codebase should use which, and why, based on where the data was coming from.
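The gist of that lesson fits in a few lines of Elixir (a generic illustration, not ebb's code):

```elixir
# Atoms live in a global table that is never garbage collected.
# Minting atoms from external input can exhaust that table and
# crash the VM; strings are plain binaries and are GC'd normally.
_known = :status

# Dangerous with untrusted data – creates a new atom per unseen key:
# String.to_atom(external_key)

# Safe – only succeeds for atoms that already exist in the table:
String.to_existing_atom("status")
#=> :status

# Where atoms shine: a fixed set of internal keys, pattern matched
# quickly and expressively.
handle = fn
  {:ok, value} -> value
  {:error, reason} -> raise reason
end

handle.({:ok, 42})
#=> 42
```

So: strings for keys that come from the outside world, atoms for the fixed vocabulary of your own system.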

That fifteen-minute conversation made me a better reviewer of every piece of Elixir the model generated after that. Because here's the thing – the model kept mixing up atoms and strings in later sessions. It didn't learn. But I had.

A small model can do this just as well as a big one. Aristotle doesn't need deep reasoning. It just needs to push me to reason deeply.

This rhythm of generate, review, stop, work through it, go back is much slower than just accepting everything and moving on. But every time I stop and work through a concept, I understand the implementation better. And that understanding compounds – each thing I learn makes me a sharper reviewer for the next thing the model produces.

You could argue this won't matter for long – that models will get good enough that you won't need to catch their mistakes. Maybe. But that assumes building software is just about producing code. I don't think it is.

Building software is not a solvable problem. It's a way of solving problems. And we still need to understand the solutions we come up with.

The model is just a mirror. If your thinking is sharp, the output is sharp. If it's not, no parameter count in the world will save you.

I'm not anti-big-model. I use Opus at work every day. When you bring these same practices to a frontier model, the output is better. But a small model forces the discipline on you. A big one won't. You have to bring it yourself.

Everything's changing. Nothing's changed.

Every week there's a new model. A new agentic framework. A new CEO going on a podcast to declare that software engineering as we know it is dead – that within 6 months from one year ago, AI will write all the code and developers will be destitute.

But the Anthropic plan you signed up for last month already has different terms. And the tool you learned last quarter is already legacy.

It's a lot of noise. And if you take it all at face value, the rational move is to chase the frontier – always use the biggest model, always adopt the newest tool, always stay on the bleeding edge because the ground is shifting under your feet.

But here's what I keep coming back to: I get great output out of a mediocre model because nothing I've described in this post is new.

Writing things down before you build them. Testing your assumptions. Understanding the code. Learning the tools and concepts you're working with instead of abstracting them away. These aren't "AI workflows". They're just engineering. They're the fundamentals that good developers have always relied on.

AI didn't change what it means to be a good engineer. It just made being a bad engineer more accessible. It raised the importance of understanding what you're building. If you do, a cheap model can get you remarkably far. If you don't, the most expensive model in the world will just help you build the wrong thing faster.

The incentive is to think less. The opportunity is to think more.

I'm not sharing my workflow because I think everyone should copy it. I'm sharing it because there's a narrative out there telling developers that technical understanding and skill doesn't matter anymore – that the model will handle it. And I don't think that's true.

The fundamentals remain fundamental.

I'd love to hear how you work with AI – whether you're chasing the frontier, deliberately staying small, or somewhere in between. Leave a comment or hit reply to this email. Let's compare notes.

Until next time,
Drew