How I Actually Use AI Coding Tools Day to Day

A concrete account of what Cursor does well, where I override it, and how the workflow has evolved as the tooling has changed

The AI coding tool discourse tends toward two poles: breathless hype about replacing developers, and dismissive skepticism from people who tried autocomplete once and found it wanting. Neither reflects how these tools actually work in a real development practice.

Here's what I actually do.

The Baseline: Cursor with Auto Mode

I work primarily in Cursor. Most of the time I leave it in Auto mode, which draws from a different usage pool than manually selected models. This matters because the economics are different: Auto is more conservative and tends toward cost-efficient routing, which means I can run more agent interactions before hitting limits.

For routine work — refactoring a component, adding a route handler, updating types after a schema change — Auto mode is fine. The output quality is sufficient and the cost is low. I'm not reaching for Opus 4.7 to write a migration script.

Where I override it: anything that involves meaningful architectural decisions, anything that will be difficult to undo, and anything client-facing that Keith will see and react to. For that work I set the model manually. Opus 4.7 Extra High for the planning and review passes, where reasoning quality actually matters. The execution can ride on a cheaper model.

The Anthropic Cap Problem

I hit Cursor's monthly Anthropic usage cap in the middle of active Composer Catalog development. The cap is a hard block — not a degraded experience, not a billing prompt, just a wall. All access to Opus 4.7 stops until the cycle resets.

This forced a decision I probably should have made earlier. I set up an OpenRouter account, wired in DeepSeek V4-Pro, and configured Cursor's base URL override to route through OpenRouter for non-Anthropic models. The routing is scoped to the OpenAI-compatible request path — native Anthropic and Google models continue using their own pipeline, unaffected.

The economics are striking. A typical agent interaction costs around $0.017 via DeepSeek V4-Pro versus roughly 16x that for equivalent Opus usage. For routine work — the kind where you're iterating on a feature, fixing bugs, making the thing work — the quality difference is not meaningful.
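To make the gap concrete, here is the arithmetic as a small sketch. The per-interaction cost and the 16x multiplier are the figures above; the workload numbers (50 interactions a day, 20 working days) are assumptions picked purely for illustration, not measurements.

```python
# Figures from the text: about $0.017 per interaction on DeepSeek V4-Pro,
# and roughly 16x that for equivalent Opus usage.
DEEPSEEK_COST_PER_INTERACTION = 0.017
OPUS_MULTIPLIER = 16

# Hypothetical workload, chosen only for illustration.
INTERACTIONS_PER_DAY = 50
WORKING_DAYS_PER_MONTH = 20


def monthly_cost(cost_per_interaction: float, per_day: int, days: int) -> float:
    """Total spend for a month of agent interactions at a flat per-call cost."""
    return cost_per_interaction * per_day * days


opus_cost_per_interaction = DEEPSEEK_COST_PER_INTERACTION * OPUS_MULTIPLIER
deepseek_monthly = monthly_cost(
    DEEPSEEK_COST_PER_INTERACTION, INTERACTIONS_PER_DAY, WORKING_DAYS_PER_MONTH
)
opus_monthly = monthly_cost(
    opus_cost_per_interaction, INTERACTIONS_PER_DAY, WORKING_DAYS_PER_MONTH
)

print(f"Opus per interaction:  ${opus_cost_per_interaction:.3f}")
print(f"DeepSeek monthly:      ${deepseek_monthly:.2f}")
print(f"Opus monthly:          ${opus_monthly:.2f}")
```

Under those assumed volumes the difference is between tens and hundreds of dollars a month, which is why routine iteration is the obvious place to use the cheaper model.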

The models I've landed on: V4-Pro as default for routine Composer and chat work, Kimi K2.6 for experimentation, and manual Opus 4.7 escalation reserved for high-stakes decisions and client-facing changes. Cursor Tab autocomplete and Background Agents use Cursor's own model routing and are unaffected by the override.
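Written down as a rule, that selection policy looks roughly like the sketch below. Nothing here is Cursor configuration: the `Task` fields and the `pick_model` function are hypothetical names, and the model labels are plain descriptive strings rather than real API identifiers. In practice I apply this by hand in the model picker.

```python
from dataclasses import dataclass

# Descriptive labels only, not actual model identifiers for any API.
DEFAULT_MODEL = "DeepSeek V4-Pro"
EXPERIMENT_MODEL = "Kimi K2.6"
ESCALATION_MODEL = "Opus 4.7"


@dataclass
class Task:
    """Hypothetical description of a unit of work."""
    architectural: bool = False   # involves meaningful architectural decisions
    hard_to_undo: bool = False    # difficult to reverse once shipped
    client_facing: bool = False   # the client will see and react to it
    experimental: bool = False    # trying something out, quality not critical


def pick_model(task: Task) -> str:
    """Escalate for high-stakes work, otherwise stay on the cheap default."""
    if task.architectural or task.hard_to_undo or task.client_facing:
        return ESCALATION_MODEL
    if task.experimental:
        return EXPERIMENT_MODEL
    return DEFAULT_MODEL


print(pick_model(Task()))                    # routine work
print(pick_model(Task(client_facing=True)))  # high-stakes work
```

The ordering matters: the escalation check comes first, so an experiment that happens to be client-facing still gets the stronger model.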

One quirk: Composer 2 is policy-blocked when a custom API key is configured. I found this out empirically — it returns a clean error. Since V4-Pro handles that workload better anyway, it's not a practical loss.

When I Use the Director System vs Direct Cursor

I have an autonomous build system I call the Director — it runs a Claude API process on a Hetzner server that supervises Cursor agent sessions, issuing instructions turn by turn based on a task spec with acceptance criteria. When it works well, I can start a run, walk away, and come back to a completed feature branch.

The honest answer about when to use it versus working directly with Cursor: it's a shape question, not a size question.

Director runs are worth the setup overhead when the work is heavily spec-driven, the expected runtime is over an hour, and I'd be stepping away from the computer anyway. They're not the right choice when the work has high question-yield — when the agent is likely to hit decision points that benefit from my real-time input. Forcing those situations through Director just means the run stalls or makes locally optimal choices that are globally awkward.
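The same criteria as a checklist, sketched in code. This is not part of the Director itself: `WorkItem` and `should_use_director` are hypothetical names, and the 60-minute threshold is just the "over an hour" rule of thumb from above.

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    """Hypothetical summary of a piece of work being triaged."""
    spec_driven: bool               # acceptance criteria can be written up front
    expected_runtime_minutes: int   # rough estimate of agent runtime
    stepping_away: bool             # I would be away from the computer anyway
    high_question_yield: bool       # likely to hit decisions needing my input


def should_use_director(item: WorkItem) -> bool:
    """Return True when an unattended Director run fits the shape of the work."""
    # High question-yield rules a run out regardless of size.
    if item.high_question_yield:
        return False
    return (
        item.spec_driven
        and item.expected_runtime_minutes > 60
        and item.stepping_away
    )


big_but_ambiguous = WorkItem(True, 180, True, high_question_yield=True)
long_and_well_specified = WorkItem(True, 90, True, high_question_yield=False)
print(should_use_director(big_but_ambiguous))
print(should_use_director(long_and_well_specified))
```

Note that the first example is the larger job and still fails the check, which is the sense in which this is a shape question rather than a size question.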

The clearest failure mode I've documented: the Director produces work that passes its own acceptance criteria but misses things a human reviewer catches immediately. This isn't a flaw in the architecture — it's a calibration signal about when spec-driven execution is the right approach versus when you actually need the feedback loop.

For follow-up passes after a Director run — cleanup, testing, a recommendation sweep — I use Cursor directly. The context is already set up, the session is live, and the kind of work involved (judgment calls, code review, fixing specific findings) benefits from human-in-the-loop iteration.

The Stale Branch Problem

Something I discovered the hard way: branches that sit unmerged for more than a few days while other work lands on the trunk can produce silent content loss when you eventually merge them.

The failure mode is specific to markdown files and documentation: STATUS.md, CHANGELOG.md, append-only files where changes from multiple branches diverge. The auto-merger fails in two ways: a visible one (awkward concatenation, easy to spot) and a silent one (sections dropped entirely with no conflict markers). The second type is the dangerous one.

I now treat any branch older than three days as potentially stale. The protocol: rebase on current trunk first, which exposes conflicts in per-commit context where they're easier to resolve, rather than merging directly into trunk. For non-trivial merges, I extract the heading structure from the markdown files and diff it against the working tree before committing, specifically looking for silently dropped sections.
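The heading check is simple enough to script. The following is a minimal sketch rather than the exact tooling I run: it assumes ATX-style headings (lines starting with `#`), skips fenced code blocks, and the function names are made up for illustration. It takes the pre-merge text and the merged text and reports headings that disappeared.

```python
import re
from collections import Counter

# ATX headings only: one to six '#' characters followed by the heading text.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")
FENCE = "`" * 3  # fenced code block delimiter


def extract_headings(markdown_text: str) -> list[str]:
    """Return headings in document order, ignoring fenced code blocks."""
    headings = []
    in_fence = False
    for line in markdown_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING_RE.match(stripped)
        if match:
            headings.append(f"{match.group(1)} {match.group(2)}")
    return headings


def dropped_headings(before: str, after: str) -> list[str]:
    """Headings present in `before` with no counterpart left in `after`."""
    # A Counter handles repeated headings, e.g. many '### Fixed' entries
    # in a changelog.
    remaining = Counter(extract_headings(after))
    dropped = []
    for heading in extract_headings(before):
        if remaining[heading] > 0:
            remaining[heading] -= 1
        else:
            dropped.append(heading)
    return dropped


branch_version = "# Status\n\n## Done\n- a\n\n## In progress\n- b\n"
merged_version = "# Status\n\n## Done\n- a\n"
print(dropped_headings(branch_version, merged_version))
```

Running it once with the branch copy as `before` and once with the trunk copy as `before` covers both sides of the merge. An empty list does not prove the merge is clean, since body text under a surviving heading can still be lost, but a non-empty one is a reliable signal to stop and look.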

This took one significant incident to internalize. It's now just part of how I close out branches.

What Cursor Gets Right

Tab autocomplete, genuinely. The ergonomic improvement from having the next line or next block of obvious code appear as a suggestion — not the agent, just the completion — is substantial and compounds over a day of work. I'd keep Cursor for Tab alone.

The agent for well-scoped tasks with clear success conditions. "Add a loading state to this component," "refactor these four endpoints to use the new auth helper," "update the seed data to match the new schema" — these are high-value uses where the agent is reliably faster than doing it manually and the result is easy to verify.

Where I Override or Ignore It

The agent for tasks with implicit design requirements. When the work involves "make this feel right" or "match the existing visual language" — the agent will produce something that technically satisfies the spec and is aesthetically wrong. I've learned to recognize these tasks early and do them myself.

Planning. The agent is poor at the meta-level question of "what should we build and in what order." It will produce a plan if you ask, but the plan will optimize for completeness over quality and miss the actual constraints. I do planning with Claude in a strategy conversation, not through Cursor agent execution.

Anything that touches a file I haven't looked at in a while. The agent works from the context window it has, and that context can be stale. For files I'm uncertain about, I read them first.

The Honest Tradeoff

Using AI coding tools well requires being willing to not use them. The discipline is knowing which tasks actually benefit from automation and which ones need your judgment — and not defaulting to the agent because it's available.

The economics are now favorable enough that cost isn't the primary constraint anymore. The constraint is attention. Every agent run you kick off that produces output you then have to review carefully costs you something. The right question isn't "how much can the agent do" but "what can the agent do well enough that my review is fast."

That's the line I'm continually calibrating.