Conversion & Code

November 18th, 2025


For the last three months, I’ve been AI-pilled. I’m about six months late, which absolutely sucks but is also awesome, because you get to soak up a lot of YouTube videos on best practices.

After working on my Big Idea for years, I decided to start an entirely new, tightly scoped project for my adventures into agentic coding. It started off amazing: AI is great at bootstrapping a project from nothing.

It’s been going decently well. My understanding of, and control over, my repo has been gradually slipping away, but so it goes when you’re boiling a frog. During Thanksgiving weekend, I basically let Claude go hog wild in my repo, firing off massive specs and then walking off to spend time with family, and then feeling too exhausted to really review anything before merging it all in. It’s December now and I’m 100% feeling the slop hangover.

ted is sorta good enough for now, so I’m thinking of just letting the embers smolder, but I’ve definitely learned a few things.

Context

The context window has become everything. These little machines truly are box-tickers, and when reviewing their code changes, you’ll be amazed by the redundant variables they invent in their mad dash to the finish line. I finally understand why people are defining code review agents: AI isn’t dumb, it’s just messy and lazy.

Agents are good at connect-the-dots. Except code is the coast of Britain, and your agent is a slightly nearsighted first mate. You’ve got a destination in mind, but it’s less a GPS coordinate and more directions to Shangri-La overheard at the pub.

Spec’ing

Writing specs is crucial now. The spec is the map, but it isn’t the destination because you don’t know the destination. Most of the time, your destination is constantly being shaped as you build.

This is why agents are so damn good at cloning existing apps and so terrible at drawing Simon’s pelican riding a bicycle. Existing apps are the fully articulated map. Yes, agents still need to route and navigate us there, but it’s way clearer than the feature you and a customer hashed out over a Zoom call last week.

Babysitting

AI babysitting is the most frustrating part of using AI. I tend to spend way too much time thinking, “Is there something I could have done to have avoided this mess?”

Yes, Eric. You could have thought more in the beginning. Or you could think more now. Stop thinking about how you can think less…and just do the damn work.

For instance, in ted, change tracking sounded great, but the added complexity only became clear once I started implementation and began sketching out another feature: read-only columns for tables or views.

Because sorting is better done at the SQL layer, by the time values make their way to ted, they’re comparable but not ordered.

That sounded awful, until I realized that you need to scan a page from the database no matter what: you need it for the next refresh, since the rendered view shows the diffs, not the previous row data. Once you have all of the previous and current rows, diffing becomes a lot easier.
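To make that concrete, here’s a minimal sketch of page diffing in Go. None of this is ted’s actual code; `Row`, `DiffPages`, and the string-keyed cells are assumptions made up for illustration, and deletions are left out to keep the sketch short.

```go
package main

import "fmt"

// Row is a hypothetical representation of one table row: a primary
// key plus its rendered cell values, keyed by column name.
type Row struct {
	Key   string
	Cells map[string]string
}

// DiffPages compares the previously scanned page against the freshly
// scanned one and reports, per row key, which cells changed as
// {old, new} pairs. New rows show every cell changing from "".
func DiffPages(prev, curr []Row) map[string]map[string][2]string {
	prevByKey := make(map[string]Row, len(prev))
	for _, r := range prev {
		prevByKey[r.Key] = r
	}
	diffs := make(map[string]map[string][2]string)
	for _, r := range curr {
		old, seen := prevByKey[r.Key]
		changed := make(map[string][2]string)
		for col, v := range r.Cells {
			switch {
			case !seen:
				changed[col] = [2]string{"", v}
			case old.Cells[col] != v:
				changed[col] = [2]string{old.Cells[col], v}
			}
		}
		if len(changed) > 0 {
			diffs[r.Key] = changed
		}
	}
	return diffs
}

func main() {
	prev := []Row{{Key: "1", Cells: map[string]string{"name": "ada"}}}
	curr := []Row{{Key: "1", Cells: map[string]string{"name": "ada l."}}}
	fmt.Println(DiffPages(prev, curr))
}
```

Because both full pages are already in memory for the next refresh, the diff is just a per-key, per-column comparison; no ordering knowledge from the SQL layer is needed.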

Maybe I could have thought of that to begin with (I definitely felt stupid when I realized my error), but the process of building something is the clarifying process. You will always learn and change when actually fitting a design into its environment.

To that extent, ideas like React Grab seem directionally correct, letting you edit the design in its end environment. Bret Victor articulated this point beautifully: switching between contexts is hard for humans and AI alike, but switching between visual, symbolic, and data modes of thinking is particularly taxing for humans. The more native this idea-distillation phase can be to the end-result environment, the better.

Testing

Testing is more important than ever, especially E2E tests. You’re wearing the tech lead hat for a team of five remote junior engineers. Code review agents will help, but as things scale up to agent clusters, E2E tests and acceptance testing are probably the best control and monitoring point.

I messed this up. I chose tview over bubbletea because I really wanted a bar cursor, and bubbletea’s rendering system made that impossible. It’s clear now that was a mistake: most people don’t actually give a shit about bar cursors, and tview’s lack of end-to-end testing is awful for agentic coding.

Tools like Playwright or Kernel are going to be more essential than ever here.
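As a sketch of what that control point can look like for a CLI, here’s a tiny golden-transcript harness in Go. `RunE2E` is a made-up helper, not part of ted or any of the tools above: it runs a binary end-to-end with scripted stdin and fails only when observable output drifts from a golden transcript, which is exactly the property that survives agents churning the internals. It assumes a Unix-style `echo` for the smoke test.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// RunE2E runs bin with args, feeding stdin to it, and compares the
// combined output against a golden transcript. It returns an error
// when the program fails or its observable behavior drifts.
func RunE2E(bin string, args []string, stdin, golden string) error {
	cmd := exec.Command(bin, args...)
	cmd.Stdin = strings.NewReader(stdin)
	out, err := cmd.CombinedOutput()
	if err != nil {
		return fmt.Errorf("run failed: %w", err)
	}
	if string(out) != golden {
		return fmt.Errorf("output drifted:\ngot:  %q\nwant: %q", out, golden)
	}
	return nil
}

func main() {
	// Smoke-test the harness itself against a command that exists
	// almost everywhere: `echo` should reproduce its argument.
	if err := RunE2E("echo", []string{"hello"}, "", "hello\n"); err != nil {
		panic(err)
	}
	fmt.Println("golden transcript matched")
}
```

The point is less this particular helper than the shape of the check: a black-box run plus a transcript comparison is cheap to write, and it’s the kind of gate an agent can’t quietly satisfy with redundant variables.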

Future

I have a feeling that this post will look foolish in a year or even six months. For example, I still haven’t even touched subagents, so while I nod along to the YouTube videos, it’s almost 2026 and I’m still not using agent clusters and I don’t really get “agent fleets.”

I do believe that this productivity gain will create an era of “fast fashion SaaS.” When engineering is 5x more productive, companies should just have more engineers automating non-engineering jobs. How many integrations would make GTM teams’ lives better, if only engineering were willing to build them?

Context is a solvable problem, whether through better/more hardware or new techniques, so it’s clear that agents will only get better and better at connecting ever more distant dots.