<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Elevate]]></title><description><![CDATA[Addy Osmani's newsletter on elevating your effectiveness. Join his community of 600,000 readers across social media.]]></description><link>https://addyo.substack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!8WxC!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3704470-b6d5-48a9-a9d1-564bd833fc5c_1280x1280.png</url><title>Elevate</title><link>https://addyo.substack.com</link></image><generator>Substack</generator><lastBuildDate>Fri, 01 May 2026 06:58:35 GMT</lastBuildDate><atom:link href="https://addyo.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Addy Osmani]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[addyo@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[addyo@substack.com]]></itunes:email><itunes:name><![CDATA[Addy Osmani]]></itunes:name></itunes:owner><itunes:author><![CDATA[Addy Osmani]]></itunes:author><googleplay:owner><![CDATA[addyo@substack.com]]></googleplay:owner><googleplay:email><![CDATA[addyo@substack.com]]></googleplay:email><googleplay:author><![CDATA[Addy Osmani]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Long-running Agents]]></title><description><![CDATA[A long-running AI agent can keep making progress over hours, days, or weeks.]]></description><link>https://addyo.substack.com/p/long-running-agents</link><guid isPermaLink="false">https://addyo.substack.com/p/long-running-agents</guid><dc:creator><![CDATA[Addy Osmani]]></dc:creator><pubDate>Thu, 30 Apr 2026 
14:30:29 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!FqTC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf57c226-c3eb-4c62-86c4-083008b5f2f1_1376x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p><strong>A long-running AI agent can keep making progress over hours, days, or weeks. It can do this across many context windows and sandboxes, recover from failure, leave structured artifacts behind, and resume where it left off.</strong></p></blockquote><p>For two years the dominant image of an &#8220;AI agent&#8221; has been a chat window with a clever loop in it. You type a goal, the agent calls some tools, you watch tokens stream by, and you stop watching when your patience runs out or the context window fills up. That paradigm got us a long way, but it has a ceiling. The model forgets. It declares &#8220;task complete&#8221; when it isn&#8217;t. It re-introduces a bug it fixed nine turns ago. 
The whole thing is structured around a single sitting.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4O50!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcda5ccdb-7770-425c-9e92-c72938025a32_1375x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4O50!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcda5ccdb-7770-425c-9e92-c72938025a32_1375x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!4O50!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcda5ccdb-7770-425c-9e92-c72938025a32_1375x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!4O50!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcda5ccdb-7770-425c-9e92-c72938025a32_1375x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!4O50!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcda5ccdb-7770-425c-9e92-c72938025a32_1375x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4O50!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcda5ccdb-7770-425c-9e92-c72938025a32_1375x768.jpeg" width="1375" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cda5ccdb-7770-425c-9e92-c72938025a32_1375x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1375,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Long-running AI agents&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Long-running AI agents" title="Long-running AI agents" srcset="https://substackcdn.com/image/fetch/$s_!4O50!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcda5ccdb-7770-425c-9e92-c72938025a32_1375x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!4O50!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcda5ccdb-7770-425c-9e92-c72938025a32_1375x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!4O50!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcda5ccdb-7770-425c-9e92-c72938025a32_1375x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!4O50!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcda5ccdb-7770-425c-9e92-c72938025a32_1375x768.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" 
stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Long-running agents are what comes next. The idea is easy to state: an agent that keeps making forward progress on a goal across many sessions and many sandboxes, possibly many days or weeks, while leaving the workspace clean enough that the next session can pick up where the last one left off. The engineering is harder. You have to solve for persistence, recovery, and verification in a way that doesn&#8217;t just paper over the cracks. 
You have to build a state layer that lives outside the model&#8217;s context window, and you have to design the handoff between sessions so the agent doesn&#8217;t lose its mind when it wakes up and finds itself in a different sandbox with a different context window.</p><p>This post is my attempt to lay out what&#8217;s changed, who&#8217;s pushing on it, and how an engineer can use long-running agents today without writing the whole thing from scratch.</p><div><hr></div><h2><strong>What &#8220;long-running&#8221; actually means</strong></h2><p>&#8220;Long-running&#8221; gets used to mean at least three different things in practice, and it helps to keep them separate.</p><p><strong>Long-horizon reasoning.</strong> The agent has to plan and execute over many dependent steps. This is mostly a model-quality story: coherence, planning, the ability to recover from a wrong turn ten steps ago. METR has been tracking this with their <em>time horizon</em> metric, which estimates how long a task a frontier model can complete with 50% reliability. The headline finding is that the metric has been <a href="https://metr.org/time-horizons/">doubling roughly every seven months</a> since 2019, and their <a href="https://metr.org/blog/2026-1-29-time-horizon-1-1/">TH1.1 update</a> earlier this year doubled the count of 8-hour-plus tasks in the eval set. <strong>If that curve holds, frontier agents complete tasks at the day scale by 2028 and the year scale by 2034.</strong></p><p><strong>Long-running execution.</strong> The agent&#8217;s <em>process</em> runs for hours or days. Maybe it&#8217;s a coding job, maybe it&#8217;s a research sweep, maybe it&#8217;s a 24/7 monitoring service. The model might be invoked thousands of times across the run. This is mostly a <em>harness</em> story, and it&#8217;s the one this post is mostly about.</p><p><strong>Persistent agency.</strong> The agent has an identity that outlives any single task. 
It accumulates memory, learns user preferences, and is always available. This is the <a href="https://docs.cloud.google.com/agent-builder/agent-engine/memory-bank/overview">Memory Bank</a> flavor of long-running.</p><p>In practice the three blur together. A real production agent does long-horizon reasoning <em>inside</em> a long-running execution <em>backed by</em> persistent agency. But the engineering problems are different in each, and so are the products that solve them.</p><div><hr></div><h2><strong>Why this matters</strong></h2><p>There are two reasons I believe this work matters a lot right now.</p><p>The first is a phase change in what&#8217;s economically feasible to delegate. An agent that runs for ten minutes can answer a question, summarize a doc, fix a small bug. An agent that runs for ten hours can own an entire feature, finish a migration that was on the backlog for six quarters, or do the kind of overnight research sweep that used to require a junior analyst. One of Anthropic&#8217;s <a href="https://www.anthropic.com/news/claude-sonnet-4-5">Claude Sonnet announcements</a> put concrete numbers on this last fall: 30+ hours of autonomous coding in internal tests, including <a href="https://venturebeat.com/ai/anthropics-new-claude-can-code-for-30-hours-think-of-it-as-your-ai-coworker">one run</a> that produced an 11,000-line Slack-style app. <strong>That&#8217;s already past the threshold where the answer to &#8220;should I delegate this?&#8221; is no longer obvious.</strong></p><p>The second is that persistence changes what the agent <em>is</em>. A stateless agent answers your question and disappears. A long-running one accumulates context: which competitor moved which way last week, which test flaked twice on Tuesday, what you usually mean by &#8220;the dashboard.&#8221; Anthropic&#8217;s <a href="https://www.anthropic.com/research/project-vend-1">Project Vend</a> was the most public early demonstration of this. 
They had a Claude instance run an actual office vending business for a month, managing inventory, setting prices, talking to suppliers. It failed in informative ways, and <a href="https://www.anthropic.com/research/project-vend-2">the second phase</a> ran much better, but the point wasn&#8217;t profitability. The point was watching what kinds of weird coherence problems show up when an agent has to maintain identity across weeks instead of turns.</p><p>Those are the same problems every team building production agents now hits.</p><div><hr></div><h2><strong>The three walls every long-running agent hits</strong></h2><p>Three walls show up in basically every write-up I&#8217;ve read this year.</p><p><strong>Finite context.</strong> Even a 1M-token window fills. And <a href="https://addyosmani.com/blog/agent-harness-engineering/">context rot</a>, the steady degradation of model performance as the window gets full, kicks in well before the hard limit. A 24-hour run is not going to fit in any context window the field has on its roadmap. Something has to give.</p><p><strong>No persistent state.</strong> A new session starts blank. Anthropic&#8217;s framing in their <a href="https://www.anthropic.com/research/long-running-Claude">scientific computing post</a> is the cleanest version I&#8217;ve seen: <em>&#8220;imagine a software project staffed by engineers working in shifts, where each new engineer arrives with no memory of what happened on the previous shift.&#8221;</em> Without an explicit persistence story, every shift change is a productivity disaster.</p><p><strong>No self-verification.</strong> Models reliably skew positive when they grade their own work. Asked &#8220;are you done?&#8221; they answer &#8220;yes&#8221; more often than they should. Without a separate signal that the work meets a bar, you get the agent that ships at 30% complete with full confidence.</p><p>Long-running agent designs are mostly answers to these three problems. 
<strong>The major labs have converged on similar shapes of answer, but with very different surface area.</strong></p><div><hr></div><h2><strong>The Ralph loop: one of the simpler practitioner versions of long-running agents</strong></h2><p>The <strong>Ralph loop</strong> (sometimes called the Ralph Wiggum technique) is one of the &#8220;simpler&#8221; practitioner versions of long-running agents, popularized by <a href="https://ghuntley.com/ralph/">Geoffrey Huntley</a> and <a href="https://github.com/snarktank/ralph">Ryan Carson</a>. The reference implementation is <a href="https://ghuntley.com/ralph/">literally a bash script</a> that loops:</p><ol><li><p>Pick the next unfinished task from a list (<code>prd.json</code> or equivalent).</p></li><li><p>Build a prompt with the task, the relevant context, and any persistent notes.</p></li><li><p>Call the agent.</p></li><li><p>Run tests or other checks.</p></li><li><p>Append what happened to <code>progress.txt</code>.</p></li><li><p>Update the task list (done, failed, blocked).</p></li><li><p>Go back to step 1.</p></li></ol><p>The reason it works is the same reason any of the harnesses below work: state lives outside the agent&#8217;s context. <code>prd.json</code> is the plan, <code>progress.txt</code> is the lab notes, <code>AGENTS.md</code> is the rolling rulebook. <strong>The agent itself is amnesiac, but the filesystem isn&#8217;t.</strong> Each iteration starts fresh and reads enough state from disk to keep going. 
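</p><p>A minimal sketch of that loop, in Python rather than bash, looks like the following. The file names (<code>prd.json</code>, <code>progress.txt</code>) follow the convention above, but <code>run_agent</code> and <code>run_checks</code> are stand-ins for whatever agent CLI and test command you actually use, and the task fields are hypothetical.</p>

```python
import json
from pathlib import Path

PRD, NOTES = Path("prd.json"), Path("progress.txt")

def run_agent(prompt: str) -> None:
    # Stand-in: a real loop shells out to an agent CLI here, with a fresh context.
    print(f"[agent] {prompt.splitlines()[0]}")

def run_checks() -> bool:
    # Stand-in: a real loop runs the test suite and returns pass/fail.
    return True

def ralph_iteration() -> bool:
    """One pass: pick a task, prompt the agent, verify, record, update the list."""
    tasks = json.loads(PRD.read_text())
    todo = [t for t in tasks if t["status"] == "todo"]
    if not todo:
        return False                                   # nothing left; stop looping
    task = todo[0]
    prior = NOTES.read_text() if NOTES.exists() else "(none)"
    run_agent(f"Task: {task['desc']}\nPrior progress:\n{prior}\nMake incremental progress.")
    task["status"] = "done" if run_checks() else "failed"
    with NOTES.open("a") as f:                         # lab notes survive the session
        f.write(f"{task['id']}: {task['status']}\n")
    PRD.write_text(json.dumps(tasks, indent=2))        # the plan survives too
    return True

# Seed a tiny task list, then loop until it is exhausted.
PRD.write_text(json.dumps([{"id": "t1", "desc": "add login", "status": "todo"},
                           {"id": "t2", "desc": "fix flaky test", "status": "todo"}]))
while ralph_iteration():
    pass
```

<p>The real versions add QA gates, retry limits, and a blocked state, but the skeleton is genuinely this small.</p><p>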
Carson&#8217;s <a href="https://github.com/snarktank/compound-product">Compound Product</a> extends the idea by chaining multiple loops (an analysis loop that reads daily reports, a planning loop that emits a PRD, an execution loop that writes the code), which is roughly the open-source version of the planner-generator-evaluator triad Anthropic landed on independently.</p><p>I went deeper on all of this in <a href="https://addyosmani.com/blog/self-improving-agents/">Self-improving agents</a>: task list structure, progress files, QA gates, monitoring, the failure modes you&#8217;ll actually hit. The short version is that you can build a working long-running agent in an evening with a bash script and a JSON file. Most of what Google and Anthropic have productized is the work of making this pattern recoverable, secure, and observable at scale.</p><p>The big-lab stories below are different ways of paying for that production-readiness.</p><div><hr></div><h2><strong>Anthropic: harnesses, then the brain/hands/session split</strong></h2><p>Anthropic has been the most public about the engineering. Two posts are worth reading end-to-end.</p><p>The first is <a href="https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents">&#8220;Effective harnesses for long-running agents&#8221;</a>, which lays out a two-agent harness for autonomous full-stack development. An <strong>initializer agent</strong> runs once at the start of a project to set up the environment, expand the prompt into a structured <code>feature-list.json</code>, and write an <code>init.sh</code> that future sessions will run on boot. A <strong>coding agent</strong> is then woken up over and over, each session asked to make incremental progress on one feature, run tests, leave a <code>claude-progress.txt</code> note, and commit. 
A test ratchet (<em>&#8220;it is unacceptable to remove or edit tests because this could lead to missing or buggy functionality&#8221;</em>) sits in the prompt to stop the very common failure of an agent deleting failing tests to &#8220;make them pass.&#8221; <a href="https://www.infoq.com/news/2026/04/anthropic-three-agent-harness-ai/">InfoQ&#8217;s writeup</a> extends this into a planner, generator, and evaluator triad, on the same logic that separating generation from evaluation matters because models grade their own work too generously.</p><p>The second is <a href="https://www.anthropic.com/engineering/managed-agents">&#8220;Scaling Managed Agents: Decoupling the brain from the hands&#8221;</a>, the architectural post behind <a href="https://platform.claude.com/docs/en/managed-agents/overview">Claude Managed Agents</a> (Anthropic&#8217;s hosted runtime, launched in early April). The argument is that an agent has three components that should be independently replaceable. The Brain is the model and the harness loop that calls it. The Hands are sandboxed, ephemeral execution environments where tools actually run. The Session is an append-only event log of every thought, tool call, and observation.</p><p>This sounds abstract, but it isn&#8217;t. Anthropic&#8217;s framing: <em>&#8220;every component in a harness encodes an assumption about what the model can&#8217;t do on its own.&#8221;</em> When you couple them, an assumption that goes stale (e.g., the model used to need an explicit planner and now plans natively) means the whole system has to change at once. When you decouple them, the harness becomes stateless, sandboxes become <em>cattle, not pets</em>, and a brain crash doesn&#8217;t lose the run. A fresh container calls <code>wake(sessionId)</code> and reconstitutes the state from the log. 
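</p><p>The session-as-log idea fits in a screenful of code. The sketch below is illustrative only, not Anthropic&#8217;s implementation: the JSONL layout, the event kinds, and the <code>wake</code> signature are invented to show the shape of replay-based recovery.</p>

```python
import json
from pathlib import Path

LOG_DIR = Path("sessions")
LOG_DIR.mkdir(exist_ok=True)

def append_event(session_id: str, kind: str, **payload) -> None:
    # Durable and append-only: one JSON line per thought, tool call, or observation.
    with (LOG_DIR / f"{session_id}.jsonl").open("a") as f:
        f.write(json.dumps({"kind": kind, **payload}) + "\n")

def wake(session_id: str) -> dict:
    # Reconstitute agent state from the log alone; no live process is required.
    state = {"messages": [], "pending_tool_call": None}
    for line in (LOG_DIR / f"{session_id}.jsonl").read_text().splitlines():
        event = json.loads(line)
        if event["kind"] == "message":
            state["messages"].append(event["text"])
        elif event["kind"] == "tool_call":
            state["pending_tool_call"] = event["name"]   # issued...
        elif event["kind"] == "tool_result":
            state["pending_tool_call"] = None            # ...and answered
    return state

append_event("s1", "message", text="Plan the migration")
append_event("s1", "tool_call", name="run_tests")
# The container dies here. A fresh one replays the log and carries on:
state = wake("s1")
# state["pending_tool_call"] is still "run_tests", so the new brain re-issues it.
```

<p>Because the log is the source of truth, the harness process holds nothing that can be lost, and that is what makes sandboxes disposable.</p><p>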
They reported <a href="https://www.anthropic.com/engineering/managed-agents">time-to-first-token dropped ~60% at p50 and over 90% at p95</a> just from being able to start inference before the sandbox is ready.</p><p><strong>The session-as-event-log idea is the part most teams underappreciate.</strong> It is what makes a long-running agent recoverable. Without it, a container failure is a session failure and you&#8217;re debugging into a stale snapshot. With it, the agent&#8217;s memory is a queryable artifact that lives outside whatever process happens to be running at the moment.</p><p>For the scientific computing crowd, Anthropic&#8217;s <a href="https://www.anthropic.com/research/long-running-Claude">long-running Claude post</a> reduces all of this to a simpler stack: <code>CLAUDE.md</code> as a living plan the agent edits as it learns, <code>CHANGELOG.md</code> as portable lab notes, <code>tmux</code> plus <code>SLURM</code> plus <code>git</code> as the execution and coordination layer, and the <strong>Ralph loop</strong>, a <code>for</code> loop that kicks the agent back into context whenever it claims completion and asks if it&#8217;s <em>really</em> done. Their flagship case study is a Boltzmann solver Claude Opus built over a few days that reached sub-percent agreement with a reference CLASS implementation. Months-to-years of researcher time, compressed.</p><p>Same patterns across all three posts: an explicit plan file, an explicit progress file, structured handoffs between sessions, separate generation from evaluation, and a loop that refuses to let the agent stop early.</p><div><hr></div><h2><strong>Cursor: planners, workers, judges</strong></h2><p><a href="https://cursor.com/blog/scaling-agents">Cursor&#8217;s &#8220;Scaling long-running autonomous coding&#8221;</a> is the other essential read this year. 
They walked into walls that Anthropic mostly papered over.</p><p>Their first attempt was a flat coordination model: equal-status agents writing to shared files with locks. It became a bottleneck and made the agents risk-averse, churning rather than committing. Their second attempt swapped locks for optimistic concurrency control, which removed the bottleneck but didn&#8217;t fix the coordination problem. The third design is what&#8217;s running in production now and what they describe as solving most of the problem:</p><ul><li><p><strong>Planners</strong> continuously explore the codebase and emit tasks. They can recursively spawn sub-planners.</p></li><li><p><strong>Workers</strong> are focused executors. They don&#8217;t coordinate with each other and they don&#8217;t worry about the big picture.</p></li><li><p><strong>Judges</strong> decide when an iteration is finished and when to restart.</p></li></ul><p>Two things stand out from the post. One: <em>&#8220;a surprising amount of the system&#8217;s behavior comes down to how we prompt the agents&#8221;</em> more than the harness or the model. Two: different models slot into different roles. Their reported finding is that a GPT model was better than Opus for <em>extended autonomous work</em> specifically because Opus tended to stop early and take shortcuts. <strong>Same task, different role, different model.</strong> The matching is becoming part of the design surface.</p><p>This pairs with <a href="https://cursor.com/blog/composer">Composer 2</a> (their proprietary frontier coding model that ships in <a href="https://cursor.com/changelog/2-0">Cursor 3</a>) and their <strong>background cloud agents</strong>: long-running tasks that run on Anysphere&#8217;s cloud infrastructure rather than your laptop. Eight-hour refactors and codebase-wide migrations survive a closed lid. You can start a task locally, hit <em>run in cloud</em> when you realize it&#8217;ll take 30 minutes, and re-attach later from your phone. 
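</p><p>The planner/worker/judge triad is easy to render in miniature. In the sketch below the three roles are trivial Python stand-ins for illustration; in Cursor&#8217;s system each role is a separate model invocation with its own prompt, possibly a different model per role.</p>

```python
def planner(codebase: list[str]) -> list[str]:
    # Explores the repo and emits tasks; real planners can spawn sub-planners.
    return [f"migrate {f}" for f in codebase if f.endswith(".js")]

def worker(task: str) -> str:
    # Focused executor: one task, no coordination, no big picture.
    return task.replace(".js", ".ts")

def judge(result: str) -> bool:
    # Separate evaluator: decides "done" so the worker never grades itself.
    return result.endswith(".ts")

def run(codebase: list[str], max_rounds: int = 3) -> list[str]:
    done: list[str] = []
    for _ in range(max_rounds):
        tasks = [t for t in planner(codebase) if t not in done]
        if not tasks:
            break                        # judge-approved work covers the plan
        for task in tasks:
            result = worker(task)
            if judge(result):
                done.append(task)        # accepted; retire the task
            # else: leave it for the next round (or restart the worker)
    return done

print(run(["auth.js", "db.ts", "ui.js"]))  # → ['migrate auth.js', 'migrate ui.js']
```

<p>In production the loop above is distributed rather than in-process.</p><p>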
Each agent runs in an isolated git worktree and merges back via PR. The handoff between local and remote is the part most teams haven&#8217;t figured out yet, and Cursor&#8217;s bet is that it has to be its own product surface.</p><p>The shape ends up close to Anthropic&#8217;s: roles are split, sessions are durable, judges sit beside the worker, and a long task runs in a cloud sandbox with git as the coordination substrate.</p><div><hr></div><h2><strong>Google: long-running agents on the Agent Platform</strong></h2><p>Google&#8217;s announcement at <a href="https://cloud.google.com/blog/products/ai-machine-learning/introducing-gemini-enterprise-agent-platform">Cloud Next &#8216;26</a> two weeks ago folded Vertex AI into the <strong>Gemini Enterprise Agent Platform</strong> and turned long-running agents into a named product, with named SLAs.</p><p>The pieces that matter for this post:</p><ul><li><p><strong>Agent Runtime</strong> supports agents that <em>&#8220;run autonomously for days at a time&#8221;</em> with sub-second cold starts and on-demand sandbox provisioning. The launch post&#8217;s example use case is a sales prospecting sequence that takes a week to play out, which is roughly the right shape for it.</p></li><li><p><strong>Agent Sessions</strong> persist conversation and event history. You can pin them to a custom session ID that maps to your own CRM or DB record, so the agent&#8217;s state lives next to the business state instead of in a separate AI silo.</p></li><li><p><strong><a href="https://docs.cloud.google.com/gemini-enterprise-agent-platform/scale/memory-bank">Agent Memory Bank</a></strong> is the persistent long-term memory layer, generally available as of Next &#8216;26. It curates memories from sessions, scopes them to a user identity, and exposes a search API so the next agent invocation can pull what&#8217;s relevant. 
Payhawk reported that auto-submitting expenses through a Memory-Bank-backed agent cut submission time by over 50%.</p></li><li><p><strong>Agent Sandbox</strong> handles hardened code execution.</p></li><li><p><strong>Agent-to-Agent Orchestration</strong>, <strong>Agent Registry</strong>, <strong>Agent Identity</strong>, <strong>Agent Gateway</strong>, <strong>Agent Observability</strong>, and <strong>Agent Simulation</strong> cover basically every operational concern you&#8217;d otherwise build by hand for a production fleet, including the cryptographic-identity-and-audit-log story enterprises actually need to ship.</p></li></ul><p>Architecturally this is the same brain/hands/session split Anthropic described, just productized at platform scale and bundled with <a href="https://google.github.io/adk-docs/">ADK</a> (the code-first dev kit) and Agent Studio (the visual one). If you&#8217;re building inside Google Cloud, you don&#8217;t have to design a session log or a memory store from scratch anymore. You wire an ADK agent into Memory Bank and Sessions, deploy onto Agent Runtime, and the persistence question is answered.</p><p>Notice how much this looks like the pattern Anthropic and Cursor describe, just unbundled into named services with SLAs. Three years ago you&#8217;d have built all of this yourself. <strong>Now you pick which version of &#8220;decoupled brain, hands, and session&#8221; you want to rent.</strong></p><div><hr></div><h2><strong>Five patterns for long-running agents in production</strong></h2><p>Shubham Saboo and I <a href="https://x.com/GoogleCloudTech/status/2046989964077146490">wrote up</a> five design patterns we&#8217;ve seen separate working long-running agents from demos. They aren&#8217;t Google-specific, but they map cleanly onto the primitives Agent Runtime now exposes, so it&#8217;s worth walking through them here in shortened form.</p><p><strong>Checkpoint-and-resume.</strong> The most common multi-day failure is context loss. 
An agent processes 200 documents over four hours, hits an error on document 201, and without a checkpoint you start from scratch. Treat the agent like a long-running server process: write intermediate state to disk, checkpoint every N units of work, recover from failures. The Agent Runtime sandbox gives you a persistent filesystem, but choosing the right checkpoint granularity (not every step, not only the end) is on you.</p><p><strong>Delegated approval (human-in-the-loop).</strong> Most &#8220;human-in-the-loop&#8221; implementations are: serialize state to JSON, fire a webhook, hope someone responds. The state goes stale, the notification gets buried, the agent re-deserializes into a slightly different world. Long-running runtimes let the agent pause in place with full execution state intact: reasoning chain, working memory, tool history, pending action. Hours of human time pass, the agent consumes zero compute, and it resumes with sub-second latency. Mission Control is Google&#8217;s inbox for this. The pattern works regardless of vendor.</p><p><strong>Memory-layered context.</strong> A seven-day agent needs more than session state. Memory Bank handles long-term curated memory, Memory Profiles add low-latency lookups, and the failure mode you&#8217;ll hit in production is <strong>memory drift</strong>: the agent learns a procedural shortcut from a few atypical interactions and starts applying it broadly. <strong>Govern memory like you govern microservices.</strong> Agent Identity controls who can read and write which banks. Agent Registry tracks which version of which agent is running. Agent Gateway enforces policy on the wire. The auditing question stops being &#8220;what are my agents doing?&#8221; and becomes &#8220;what are my agents remembering, and how is that changing their behavior?&#8221;</p><p><strong>Ambient processing.</strong> Not every long-running agent talks to a human. 
Some sit on a Pub/Sub stream or a BigQuery table and act on events as they arrive: content moderation, anomaly detection, inbox triage. The architectural decision worth making early is to not hardcode policy into the agent. Define it in the Gateway and the fleet picks up policy changes without redeploys. Ambient agents run unsupervised for long stretches, and the only sane way to update a hundred of them is to update the policy layer once.</p><p><strong>Fleet orchestration.</strong> In real systems, you rarely have one agent. A coordinator delegates sub-tasks to specialists (a Lead Researcher Agent, a Scoring Agent, an Outreach Agent), each running independently for different durations. Each specialist gets its own Identity (so the Outreach Agent can&#8217;t read financial data meant for Scoring), its own policy enforcement, its own Registry entry. This is the same coordinator/worker shape distributed systems have used for decades. What&#8217;s new is that ADK handles it declaratively with graph-based workflows, and a bad deployment in one specialist doesn&#8217;t cascade to the others.</p><p>The patterns compose. A compliance system might use checkpointing for document processing, delegated approval for review gates, memory layering for cross-session knowledge, and fleet orchestration to coordinate the specialists. The opening question is always the same: <em>what&#8217;s the longest uninterrupted unit of work your agent needs to perform?</em> Minutes, and you don&#8217;t need long-running agents. Hours or days, and these patterns are where to start. 
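</p><p>To make the first pattern concrete, a minimal checkpoint-and-resume skeleton might look like the following. The names and the failure are illustrative; on Agent Runtime the checkpoint would live on the sandbox&#8217;s persistent filesystem.</p>

```python
import json
from pathlib import Path

CHECKPOINT = Path("checkpoint.json")

def process(doc: str) -> str:
    # Stand-in for the real unit of work (an LLM call, an extraction, a review).
    if doc == "doc-201":
        raise RuntimeError("transient failure on document 201")
    return doc.upper()

def run(docs: list[str], every: int = 50) -> list[str]:
    # Resume from the last checkpoint instead of restarting a four-hour run.
    if CHECKPOINT.exists():
        state = json.loads(CHECKPOINT.read_text())
    else:
        state = {"next": 0, "results": []}
    for i in range(state["next"], len(docs)):
        state["results"].append(process(docs[i]))
        state["next"] = i + 1
        if state["next"] % every == 0:            # every N units, not every step
            CHECKPOINT.write_text(json.dumps(state))
    CHECKPOINT.write_text(json.dumps(state))      # final checkpoint on success
    return state["results"]

docs = [f"doc-{n}" for n in range(1, 251)]
try:
    run(docs)                                     # dies on document 201...
except RuntimeError:
    pass
resume_point = json.loads(CHECKPOINT.read_text())["next"]
print(resume_point)  # → 200: a rerun resumes at doc-201 instead of doc-1
```

<p>The judgment call the pattern leaves to you is <code>every</code>: checkpoint too often and you pay constant I/O for state that is cheap to recompute; checkpoint only at the end and you lose the run.</p><p>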
The <a href="https://x.com/GoogleCloudTech/status/2046989964077146490">full write-up with code samples</a> covers each pattern in depth.</p><div><hr></div><h2><strong>So how do you actually build one today?</strong></h2><p>This is the practical question and it has a different answer depending on what you&#8217;re building.</p><p><strong>You&#8217;re a developer who wants long-running coding work on your own repo.</strong> Just use <a href="https://addyosmani.com/blog/agent-harness-engineering/">Claude Code</a> (or Antigravity, Cursor, or Codex). The harness is already there. Treat your <code>AGENTS.md</code> like a pilot&#8217;s checklist: short, every line earned by a real failure. Add hooks for typecheck and lint that surface failures back to the agent. Write a plan file before the agent starts. Use <a href="https://addyosmani.com/blog/self-improving-agents/">the Ralph loop</a> when the agent claims it&#8217;s done and you don&#8217;t believe it. For multi-hour or overnight jobs, run in a worktree so a closed laptop doesn&#8217;t kill the run, and have it commit progress every meaningful unit of work. <strong>This is the path most people should take, and it&#8217;s where the most leverage is right now.</strong></p><p><strong>You&#8217;re building a hosted agent product.</strong> Don&#8217;t build the runtime. Pick a managed one. The three real options today: <a href="https://cloud.google.com/products/gemini-enterprise-agent-platform">Google&#8217;s Agent Platform</a> (Agent Engine + Memory Bank + Sessions), <a href="https://platform.claude.com/docs/en/managed-agents/overview">Claude Managed Agents</a>, or roll something on top of <a href="https://google.github.io/adk-docs/">ADK</a>, the <a href="https://www.anthropic.com/engineering/building-agents-with-the-claude-agent-sdk">Claude Agent SDK</a>, or <a href="https://platform.openai.com/docs/codex">Codex SDK</a> and host it yourself. The trade-off is the usual one. 
Managed gets you the brain/hands/session split, observability, identity, and an audit trail out of the box. Self-hosted gets you control and the ability to use weird models for weird roles (Cursor&#8217;s pattern). For most teams, the right starting point is a managed runtime plus your own ADK or SDK code for the actual loop.</p><p><strong>You&#8217;re doing something autonomous and operational</strong> (monitoring, research, ops). Memory Bank-style persistence is what you want, and it&#8217;s the part that doesn&#8217;t exist in Claude Code. ADK + Memory Bank + Cloud Run + Cloud Scheduler is the cleanest stack I&#8217;ve seen for &#8220;agent runs every N hours, accumulates state, alerts on a threshold.&#8221; This is also where Cursor&#8217;s planner/worker/judge split starts to matter more than it does for IDE coding, because the work is genuinely parallel and the failure modes are different.</p><p>A few things matter regardless of which path you take.</p><p><em>Write down the done-condition before the agent starts.</em> This is the single highest-leverage move for long runs. The Anthropic harness post calls it the feature list; Cursor calls it the planner&#8217;s task spec. Either way, it&#8217;s an external file with explicit, testable completion criteria, and it exists so the agent can&#8217;t quietly redefine <em>done</em> mid-run.</p><p><em>Separate the evaluator from the generator.</em> Self-grading is the failure mode. A planner / worker / judge pipeline, or a generator / evaluator pair, is a real architectural pattern, not a stylistic preference, even if it&#8217;s the same model in different roles with different prompts.</p><p><em>Invest in the session log, not just the prompt.</em> The append-only event log is what makes the agent recoverable, debuggable, and auditable. 
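</p><p><em>A minimal sketch of that kind of log, as JSON Lines appended to a local file (the field names are illustrative; a real deployment would append to durable storage):</em></p>

```python
import json
import time

LOG_PATH = "session.jsonl"  # illustrative path; use durable storage in production

def log_event(event_type, payload):
    # One JSON record per line, appended and never rewritten.
    record = {"ts": time.time(), "type": event_type, "payload": payload}
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps(record) + "\n")

def replay():
    # Reconstruct the run by reading the log back in order.
    with open(LOG_PATH) as f:
        return [json.loads(line) for line in f]
```

<p>Because the file is append-only, replaying it in order reconstructs the run: every tool call, result, and decision, timestamped.</p><p>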
If you can&#8217;t reconstruct what the agent did in the last 24 hours from durable storage, what you have is a long-running shell script that happens to call an LLM, not a long-running agent.</p><p><em>Treat compaction and context resets as first-class.</em> Anthropic is explicit that summarization-as-compaction wasn&#8217;t enough for very long jobs; they had to do full context resets where the harness tears the session down and rebuilds it from a structured handoff file. It is essentially how humans onboard a new engineer.</p><div><hr></div><h2><strong>There are some real limitations right now</strong></h2><p>A few things are still genuinely unsolved.</p><p><strong>Cost.</strong> A 24-hour run with a frontier model and a few tools is not cheap. Without budgets, circuit breakers, and a hard cap on tool spend, an agent can quietly burn through a week&#8217;s API budget in an afternoon. This is solvable, but it&#8217;s an explicit step you have to take.</p><p><strong>Security.</strong> A long-running agent with API keys, cloud access, and the ability to run shell commands has a much larger attack surface than a chat session. The brain/hands separation pattern matters here too: credentials should be unreachable from the sandbox where model-generated code runs, which is one of the benefits Anthropic calls out for Managed Agents.</p><p><strong>Alignment drift.</strong> Over many context windows, agents drift. The original goal gets summarized, then re-summarized, then loses fidelity. This is the part hooks and judges exist to defend against. It is also the most common reason &#8220;the agent went off and did something I didn&#8217;t ask for.&#8221;</p><p><strong>Verification.</strong> Auditing 24 hours of autonomous activity is a real human-time problem. Observability and structured artifacts (PRs, commits, briefings, test runs) are how you make this tractable. 
Without them, you&#8217;re scrolling logs and you&#8217;ll miss what matters.</p><p><strong>The human role.</strong> Defining work crisply enough that an agent can run for a day on it is harder than doing the work yourself. The skill that&#8217;s appreciating in value isn&#8217;t writing code. It&#8217;s writing specs that survive contact with an autonomous executor.</p><div><hr></div><h2><strong>Where this is going</strong></h2><p>Google, Anthropic, and Cursor have converged on roughly the same shape. <strong>Separate the model loop from the execution sandbox from the durable session log. Split planning from generation from evaluation. Bake in compaction, hooks, and context resets. Expose memory as a managed service that any agent invocation can query.</strong></p><p>Surface area is what differs. Google&#8217;s Agent Platform is the enterprise-stack version, with the identity and audit trail story baked in. The patterns underneath are the same. Claude Managed Agents is &#8220;Anthropic&#8217;s harness, hosted.&#8221; Cursor&#8217;s background agents are &#8220;long-running coding, pulled out of the IDE and into the cloud.&#8221;</p><p>The harder problems for the next year aren&#8217;t in any of those layers individually. They&#8217;re in the coordination above them. Many long-running agents on a shared codebase. Agents that read their own traces and patch their own harnesses. Harnesses that assemble tools and context just-in-time for a task instead of being pre-configured at startup. That&#8217;s where the agent stops looking like a smarter chat window and starts looking like a colleague who&#8217;s been on the project longer than you have.</p><p>The model is still load-bearing. But the gap between a chat window and an agent you can leave running overnight is mostly in the state, sessions, and structured handoffs wrapped around it. 
That&#8217;s where I&#8217;d spend my learning time right now.</p><p><em>You might be interested in checking out some of my O&#8217;Reilly books such as <a href="https://beyond.addy.ie/">Beyond Vibe Coding</a>, <a href="https://www.oreilly.com/library/view/the-effective-software/9798341638167/">The Effective Software Engineer</a> or <a href="https://www.oreilly.com/library/view/web-performance-engineering/9798341660182/">Web Perf engineering in the age of AI</a>.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FqTC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf57c226-c3eb-4c62-86c4-083008b5f2f1_1376x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FqTC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf57c226-c3eb-4c62-86c4-083008b5f2f1_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!FqTC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf57c226-c3eb-4c62-86c4-083008b5f2f1_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!FqTC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf57c226-c3eb-4c62-86c4-083008b5f2f1_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!FqTC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf57c226-c3eb-4c62-86c4-083008b5f2f1_1376x768.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!FqTC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf57c226-c3eb-4c62-86c4-083008b5f2f1_1376x768.jpeg" width="1376" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bf57c226-c3eb-4c62-86c4-083008b5f2f1_1376x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:41180,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/195959711?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf57c226-c3eb-4c62-86c4-083008b5f2f1_1376x768.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FqTC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf57c226-c3eb-4c62-86c4-083008b5f2f1_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!FqTC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf57c226-c3eb-4c62-86c4-083008b5f2f1_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!FqTC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf57c226-c3eb-4c62-86c4-083008b5f2f1_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!FqTC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf57c226-c3eb-4c62-86c4-083008b5f2f1_1376x768.jpeg 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[The Agent Stack Bet]]></title><description><![CDATA[The bet every serious developer needs to make on their agent stack]]></description><link>https://addyo.substack.com/p/the-agent-stack-bet</link><guid isPermaLink="false">https://addyo.substack.com/p/the-agent-stack-bet</guid><dc:creator><![CDATA[Addy Osmani]]></dc:creator><pubDate>Sat, 18 Apr 2026 17:16:43 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!w6mN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F618c5adc-46c0-4142-9254-4ed4c5ab0eca_2556x1632.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Peek under the hood of most &#8220;production agents&#8221; shipping today and you won&#8217;t find intelligence. You&#8217;ll find custom plumbing, fragile session logic, shared service accounts, and a security model held together by hope. This can be so much better.</p><p>If you&#8217;ve spent the last 18 months putting agents into production, you already know the models and tools have gotten <em>dramatically</em> better. You also know the problems that are still burning your on-call rotation are not problems you can prompt your way out of. We are running into a <strong>stack ceiling</strong>, and it is quietly creating a <strong>governance</strong> and <strong>reliability gap</strong> that the next generation of agentic systems cannot grow through.</p><p>Right now the industry is living with what I&#8217;d call <em>excessive agency</em>: <strong>autonomous systems given broad permissions to get things done</strong>, then left to discover - at runtime, in production - that a schema drifted, an API changed, or a downstream service started returning PII it wasn&#8217;t supposed to. Agents mark tasks &#8220;complete&#8221; while leaving a trail of corrupted state behind them. The humans find out on Monday.</p><p>This is not a failure of the people building agents. 
It is a failure of the stack they&#8217;re building on.</p><p>Here are the four architectural bets I think every serious team has to make in the next twelve months.</p><h2><strong>1) Agents need identities, not shared credentials</strong></h2><p>Every engineer who has shipped agents to production knows this specific flavor of dread: you have agents doing useful work, and effectively zero visibility into which tools they touched, which data they moved, or which credentials they used to do it. I call this <em>governance debt</em> - the silent accumulation of security and audit risk that eventually forces a full rewrite, usually right after the first incident that reaches the CISO.</p><p>The root cause is that most agents today are ghosts. They don&#8217;t have identities. They borrow a service account, inherit a human&#8217;s OAuth token, and &#8220;promise&#8221; - in application code, in a prompt - to stay inside the lines. In a real enterprise environment, a promise in a prompt is not a policy.</p><p><strong>My bet is that agent identity has to move from the application layer down into the platform layer.</strong> </p><p>The difference is between bolted-on vs. embedded security. Bolted-on looks like middleware in front of every tool call, politely asking the agent to behave: easy to bypass, expensive in latency, and invisible to your existing IAM. Embedded looks like a badge reader welded into a steel frame. The agent has a distinct, unforgeable identity recognized at the network and platform level, and policy is enforced at the source. If the agent reaches for a database it isn&#8217;t cleared for, the connection never opens. 
No middleware, no vibes.</p><p>Done right, this turns &#8220;a fleet of liabilities&#8221; into something that looks a lot more like a managed workforce: every action attributable, every permission auditable, every agent revocable with one call.</p><h2><strong>2) Agents need universal context, not scraped windows</strong></h2><p>Context management is a tax every builder is currently paying. Teams are burning a huge share of their engineering hours (and tokens) on undifferentiated plumbing - custom serialization, bespoke session stores, hand-rolled memory layers - just to keep an agent from forgetting its mission halfway through a multi-step task.</p><p>Worse, the context agents <em>can</em> get their hands on is usually siloed. A browser-based agent can see the open tab. A desktop wrapper can see the files a user happened to drag in. Neither of them can easily reason across the systems where the business actually lives - the CRM, the ERP, the data warehouse, the ticketing system, the transcripts, the project plans - at the same time.</p><p><strong>Agents need universal context that integrates at the platform level.</strong> If we don&#8217;t fix this, we should be honest that the ceiling of agentic AI is &#8220;slightly better spreadsheet autocomplete,&#8221; and we should stop writing vision pieces about it.</p><h2><strong>3) Agents need to survive your laptop closing</strong></h2><p>Here&#8217;s the uncomfortable version of this: a lot of what ships today as &#8220;an agent&#8221; isn&#8217;t yet ready to deploy across a business. </p><p>I want to be precise, because the frontier has genuinely moved in the last six months. Environments like Claude Code, OpenClaw, and similar platforms are capable - persistent task state, scheduled execution, multi-agent coordination, and long-running sessions that survive disconnects are no longer aspirational. These are not toys. 
The question has moved on.</p><p>The question now is whether an agent can run for a week instead of an hour. Whether it can cross three handoffs, two credential rotations, and an approval gate without a human babysitting the session. Whether the work it did on Tuesday is auditable on Friday by someone who wasn&#8217;t in the room. A session that survives a dropped WebSocket is table stakes. A mission that survives a quarter is the bar enterprises actually need.</p><p>Real work doesn&#8217;t fit in a session, and most of it doesn&#8217;t fit in a day either. A procurement workflow spans weeks and a dozen handoffs. A compliance audit runs for a month. An incident investigation outlives three on-call rotations. </p><p><strong>Most agents today hit a hard ceiling - sometimes time-based, sometimes token-based, sometimes governance-based - and when they hit it, the mission fails and a human picks up the pieces from wherever the transcript ended.</strong></p><p>Enterprise-grade autonomy requires durable, cloud-native execution with a much higher floor than &#8220;the session stayed up.&#8221; Concretely, that means:</p><ul><li><p><strong>State</strong> and <strong>checkpointing</strong> that survives restarts, disconnects, redeploys, and model version changes by default - not bolted on with a local Redis and a prayer.</p></li><li><p><strong>Context that outlives the window</strong>: long-horizon memory, summarization, and handoff between agent instances, so a multi-week task doesn&#8217;t die because a single run exhausted its tokens.</p></li><li><p><strong>Missions that outlive sessions</strong>: agents that stay on the job across days, handoffs, and credential rotations, with an auditable trail of what happened while you were asleep.</p></li><li><p><strong>First-class human-in-the-loop primitives,</strong> so the agent can pause and ask for permission to do something new instead of silently deciding it has the authority.</p></li></ul><p>Persistence with guardrails. 
That&#8217;s the bar. Anything less and you&#8217;re building demos that happen to run for a long time.</p><h2><strong>4) Agents need platforms</strong></h2><p>The pattern I see most often in strong teams is the saddest one: brilliant engineers draining their bandwidth into stack problems that do not differentiate their product. Custom memory. Bespoke eval harnesses. Homegrown observability. Handwritten retry logic. A tracing system that almost works. None of this is the hard part of the agentic era, and none of it is what your users are paying you for.</p><p>The real value lives in domain reasoning and business logic - the judgment calls that are specific to your company, your customers, your regulatory environment. Everything underneath should be the platform you <em>build on</em>, not the plumbing you <em>build</em>.</p><p>This is why the maturation of open primitives matters right now. Open-source orchestration frameworks exist precisely so the scaffolding isn&#8217;t locked behind any single vendor&#8217;s roadmap. The model that worked for cloud compute, containers, and CI/CD - start local on open primitives, graduate to a managed platform when you&#8217;re ready to scale - is the model agent platforms need to copy. </p><p><strong>Teams should be able to prototype on their laptop with the same building blocks they&#8217;ll run in production, and cross that boundary without a rewrite.</strong></p><p>That&#8217;s the engineering standard that lets teams stop fighting plumbing and get back to the product.</p><h2><strong>The five-year horizon</strong></h2><p>The teams that pull ahead in the next five years will not pull ahead by being smarter at writing boilerplate. 
They&#8217;ll pull ahead by <strong>choosing the right agent foundation</strong> and spending their engineering hours on the problems <em><strong>only they can solve</strong></em>.</p><p>Every month spent rebuilding the common stack - identity, context, persistence, orchestration - is a month not spent on the logic that actually makes your agents worth deploying. </p><p><strong>The agent stack has to become a solved problem.</strong> The only real question is whether you want to solve it yourself, again, or build on a foundation that was engineered for agents from the ground up.</p><p>My bet is on the latter. I think yours should be too.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!w6mN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F618c5adc-46c0-4142-9254-4ed4c5ab0eca_2556x1632.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!w6mN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F618c5adc-46c0-4142-9254-4ed4c5ab0eca_2556x1632.jpeg 424w, https://substackcdn.com/image/fetch/$s_!w6mN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F618c5adc-46c0-4142-9254-4ed4c5ab0eca_2556x1632.jpeg 848w, https://substackcdn.com/image/fetch/$s_!w6mN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F618c5adc-46c0-4142-9254-4ed4c5ab0eca_2556x1632.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!w6mN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F618c5adc-46c0-4142-9254-4ed4c5ab0eca_2556x1632.jpeg 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!w6mN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F618c5adc-46c0-4142-9254-4ed4c5ab0eca_2556x1632.jpeg" width="1456" height="930" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/618c5adc-46c0-4142-9254-4ed4c5ab0eca_2556x1632.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:930,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:85329,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/194581773?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F618c5adc-46c0-4142-9254-4ed4c5ab0eca_2556x1632.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!w6mN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F618c5adc-46c0-4142-9254-4ed4c5ab0eca_2556x1632.jpeg 424w, https://substackcdn.com/image/fetch/$s_!w6mN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F618c5adc-46c0-4142-9254-4ed4c5ab0eca_2556x1632.jpeg 848w, https://substackcdn.com/image/fetch/$s_!w6mN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F618c5adc-46c0-4142-9254-4ed4c5ab0eca_2556x1632.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!w6mN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F618c5adc-46c0-4142-9254-4ed4c5ab0eca_2556x1632.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[Is the IDE dead?]]></title><description><![CDATA[How Agent orchestration is replacing the editor as the center of developer work]]></description><link>https://addyo.substack.com/p/death-of-the-ide</link><guid isPermaLink="false">https://addyo.substack.com/p/death-of-the-ide</guid><dc:creator><![CDATA[Addy 
Osmani]]></dc:creator><pubDate>Fri, 20 Mar 2026 14:31:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!wgTu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2368a63a-b3fc-4358-a6c5-57f2a33c6fd8_1376x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>The </strong><em><strong>center</strong></em><strong> of developer work is moving.</strong> Not disappearing - moving. Away from continuous, line-by-line editing inside a single window, and <strong>toward supervising agents</strong> that can plan, rewrite files, run tests, and propose changes for review. IDEs as we know them may stop being the primary tool for software work, or heavily evolve.</p><p>Across the tools many developers, including myself, are already using daily - <a href="https://conductor.build/">Conductor</a>, <a href="https://code.claude.com/docs/en/claude-code-on-the-web">Claude Code Web</a>, <a href="https://github.com/copilot/agents">GitHub Copilot Agent</a>, <a href="http://jules.google">Jules</a>, <a href="https://www.vibekanban.com/">Vibe KanBan</a>, even cmux - the same shift keeps showing up: <strong>the control plane is becoming the primary surface, and the editor is becoming one of several instruments underneath it.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Av7X!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbfdde15-b9fc-4cf8-a399-5769c44274e7_2400x1350.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Av7X!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbfdde15-b9fc-4cf8-a399-5769c44274e7_2400x1350.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!Av7X!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbfdde15-b9fc-4cf8-a399-5769c44274e7_2400x1350.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Av7X!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbfdde15-b9fc-4cf8-a399-5769c44274e7_2400x1350.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Av7X!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbfdde15-b9fc-4cf8-a399-5769c44274e7_2400x1350.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Av7X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbfdde15-b9fc-4cf8-a399-5769c44274e7_2400x1350.jpeg" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fbfdde15-b9fc-4cf8-a399-5769c44274e7_2400x1350.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:466191,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/191542117?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbfdde15-b9fc-4cf8-a399-5769c44274e7_2400x1350.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!Av7X!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbfdde15-b9fc-4cf8-a399-5769c44274e7_2400x1350.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Av7X!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbfdde15-b9fc-4cf8-a399-5769c44274e7_2400x1350.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Av7X!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbfdde15-b9fc-4cf8-a399-5769c44274e7_2400x1350.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Av7X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbfdde15-b9fc-4cf8-a399-5769c44274e7_2400x1350.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Cursor just shipped <a href="https://cursor.com/glass">Glass</a> - a new interface explicitly built to make &#8220;working with agents clear, intuitive, and in your control,&#8221; where agent management is the primary experience and the traditional editor is something you reach for when you need to go deeper. The <a href="https://x.com/F2aldi/status/2034801927041818823">reaction</a> from developers was immediate: </p><blockquote><p><em>Now Cursor feels more like an Agent Orchestrator than an IDE. Managing agents in parallel is easier</em></p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!P0AV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48500cf2-98ed-4280-8ae9-25e3ae8d0669_1920x1080.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!P0AV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48500cf2-98ed-4280-8ae9-25e3ae8d0669_1920x1080.png 424w, https://substackcdn.com/image/fetch/$s_!P0AV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48500cf2-98ed-4280-8ae9-25e3ae8d0669_1920x1080.png 848w, 
https://substackcdn.com/image/fetch/$s_!P0AV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48500cf2-98ed-4280-8ae9-25e3ae8d0669_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!P0AV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48500cf2-98ed-4280-8ae9-25e3ae8d0669_1920x1080.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!P0AV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48500cf2-98ed-4280-8ae9-25e3ae8d0669_1920x1080.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/48500cf2-98ed-4280-8ae9-25e3ae8d0669_1920x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:619387,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/191542117?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48500cf2-98ed-4280-8ae9-25e3ae8d0669_1920x1080.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!P0AV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48500cf2-98ed-4280-8ae9-25e3ae8d0669_1920x1080.png 424w, 
https://substackcdn.com/image/fetch/$s_!P0AV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48500cf2-98ed-4280-8ae9-25e3ae8d0669_1920x1080.png 848w, https://substackcdn.com/image/fetch/$s_!P0AV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48500cf2-98ed-4280-8ae9-25e3ae8d0669_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!P0AV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48500cf2-98ed-4280-8ae9-25e3ae8d0669_1920x1080.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>But Glass is one data point in a much larger pattern. Terminal UIs like <a href="https://cmux.com/">cmux</a> highlight how the surfaces we&#8217;re used to are evolving to better manage agent workflows.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BGo7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936597fd-b43d-4b9e-8e0c-f6ae8aaa9929_3840x2224.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><img src="https://substackcdn.com/image/fetch/$s_!BGo7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936597fd-b43d-4b9e-8e0c-f6ae8aaa9929_3840x2224.png" width="1456" height="843" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/936597fd-b43d-4b9e-8e0c-f6ae8aaa9929_3840x2224.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:843,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;cmux terminal app screenshot&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="cmux terminal app screenshot" title="cmux terminal app screenshot" srcset="https://substackcdn.com/image/fetch/$s_!BGo7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936597fd-b43d-4b9e-8e0c-f6ae8aaa9929_3840x2224.png 424w, https://substackcdn.com/image/fetch/$s_!BGo7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936597fd-b43d-4b9e-8e0c-f6ae8aaa9929_3840x2224.png 848w, https://substackcdn.com/image/fetch/$s_!BGo7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936597fd-b43d-4b9e-8e0c-f6ae8aaa9929_3840x2224.png 1272w, https://substackcdn.com/image/fetch/$s_!BGo7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936597fd-b43d-4b9e-8e0c-f6ae8aaa9929_3840x2224.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" 
stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h2>From editing files to steering workstreams</h2><p>Historically, IDEs optimized for a tight inner loop: open files &#8594; edit &#8594; build &#8594; debug &#8594; repeat. <strong>The &#8220;death&#8221; argument is that this loop is no longer the dominant unit of productivity once agents can execute most of it autonomously.</strong></p><p>The new loop looks like this: <strong>specify intent &#8594; delegate &#8594; observe &#8594; review diffs &#8594; merge</strong>. What makes it different from &#8220;autocomplete with a chat window&#8221; is tool-using autonomy combined with interfaces designed to make that autonomy governable.</p><p>You can see this playing out across tools already in heavy use. Claude Code Web (or Desktop) and Codex let developers hand off well-defined tasks to agents running in isolated cloud environments, with progress visible in a browser - no terminal, no local setup required. 
</p><p>GitHub Copilot&#8217;s Agents plan and implements multi-file changes independently, creates branches, runs tests, and surfaces a PR for review; the developer&#8217;s primary job becomes reviewing the outcome and iterating, not directing each step. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!L3L8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f9eef48-c593-4a33-a05c-cd2602ef85ff_3018x1664.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!L3L8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f9eef48-c593-4a33-a05c-cd2602ef85ff_3018x1664.png 424w, https://substackcdn.com/image/fetch/$s_!L3L8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f9eef48-c593-4a33-a05c-cd2602ef85ff_3018x1664.png 848w, https://substackcdn.com/image/fetch/$s_!L3L8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f9eef48-c593-4a33-a05c-cd2602ef85ff_3018x1664.png 1272w, https://substackcdn.com/image/fetch/$s_!L3L8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f9eef48-c593-4a33-a05c-cd2602ef85ff_3018x1664.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!L3L8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f9eef48-c593-4a33-a05c-cd2602ef85ff_3018x1664.png" width="549" height="302.779532967033" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8f9eef48-c593-4a33-a05c-cd2602ef85ff_3018x1664.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:803,&quot;width&quot;:1456,&quot;resizeWidth&quot;:549,&quot;bytes&quot;:549881,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/191542117?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f9eef48-c593-4a33-a05c-cd2602ef85ff_3018x1664.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!L3L8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f9eef48-c593-4a33-a05c-cd2602ef85ff_3018x1664.png 424w, https://substackcdn.com/image/fetch/$s_!L3L8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f9eef48-c593-4a33-a05c-cd2602ef85ff_3018x1664.png 848w, https://substackcdn.com/image/fetch/$s_!L3L8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f9eef48-c593-4a33-a05c-cd2602ef85ff_3018x1664.png 1272w, https://substackcdn.com/image/fetch/$s_!L3L8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f9eef48-c593-4a33-a05c-cd2602ef85ff_3018x1664.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Conductor takes a different approach: a desktop app for running multiple Claude Code agents simultaneously in isolated workspaces, with live progress monitoring across all of them. And Google&#8217;s Jules handles asynchronous background tasks - you assign work, it runs, you review the result when it&#8217;s done. </p><p>What these tools share is a mental model: <strong>the agent is the unit of work, not the file</strong>. 
The interface worth optimizing is the one that helps you direct, monitor, and review agents - not the one that helps you type faster.</p><div><hr></div><h2>The orchestration layer taking shape</h2><p>The displacement story becomes persuasive only when you look at the specific interface patterns converging across tools.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Uidv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e9d9a9-4541-4381-b6b8-14d07e95a2c2_2500x1774.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Uidv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e9d9a9-4541-4381-b6b8-14d07e95a2c2_2500x1774.avif 424w, https://substackcdn.com/image/fetch/$s_!Uidv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e9d9a9-4541-4381-b6b8-14d07e95a2c2_2500x1774.avif 848w, https://substackcdn.com/image/fetch/$s_!Uidv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e9d9a9-4541-4381-b6b8-14d07e95a2c2_2500x1774.avif 1272w, https://substackcdn.com/image/fetch/$s_!Uidv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e9d9a9-4541-4381-b6b8-14d07e95a2c2_2500x1774.avif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Uidv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e9d9a9-4541-4381-b6b8-14d07e95a2c2_2500x1774.avif" width="562" height="398.72664835164835" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/13e9d9a9-4541-4381-b6b8-14d07e95a2c2_2500x1774.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1033,&quot;width&quot;:1456,&quot;resizeWidth&quot;:562,&quot;bytes&quot;:136737,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/avif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/191542117?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e9d9a9-4541-4381-b6b8-14d07e95a2c2_2500x1774.avif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Uidv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e9d9a9-4541-4381-b6b8-14d07e95a2c2_2500x1774.avif 424w, https://substackcdn.com/image/fetch/$s_!Uidv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e9d9a9-4541-4381-b6b8-14d07e95a2c2_2500x1774.avif 848w, https://substackcdn.com/image/fetch/$s_!Uidv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e9d9a9-4541-4381-b6b8-14d07e95a2c2_2500x1774.avif 1272w, https://substackcdn.com/image/fetch/$s_!Uidv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e9d9a9-4541-4381-b6b8-14d07e95a2c2_2500x1774.avif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Work isolation as a primitive.</strong> Parallel agents need to not step on each other. Virtually every serious tool in this space has landed on git worktrees (or similar) as the answer. Conductor maps each agent session to its own isolated workspace. Vibe Kanban (shown above) does the same for its kanban-driven agent workflow. The pattern is near ubiquitous because the problem is real: without isolation, parallel agents produce chaos.</p><p><strong>Planning and task state as the primary UI.</strong> Tools like Vibe Kanban have replaced &#8220;tabs and files&#8221; with &#8220;tasks and states&#8221; as the top-level mental model. 
You create task cards (a landing page, a backend service, an email integration), assign each to an agent and a model, and manage the whole effort like a lightweight project board - except the &#8220;team&#8221; is running autonomously. This is a project management surface that happens to have agents doing the implementation.</p><p><strong>Background agents and async-first design.</strong> Some of the most interesting tools in this space don&#8217;t even try to keep you in the loop during execution. Cursor, Copilot and Antigravity support background agents that run without requiring your presence - you define intent, step away, and review when they&#8217;re done. Jules works similarly: assign a task, come back to a diff. The implicit promise is that your attention is too valuable to spend watching a progress bar. That&#8217;s a significant departure from the IDE&#8217;s real-time, synchronous feedback loop.</p><p><strong>Attention management for parallel agents.</strong> When many agents run concurrently, the real bottleneck becomes knowing which one needs you <em>right now</em>. This is why tools like Conductor surface live progress across sessions and cmux introduced notification rings and unread badges for terminal panes. &#8220;Agent needs attention&#8221; is becoming a first-class event in the developer environment - something to route and triage, not just notice.</p><p><strong>Agents embedded into the software lifecycle.</strong> GitHub&#8217;s Copilot coding agent is asynchronous, secured by a control layer, and powered by GitHub Actions - attached to how code actually ships (issues &#8594; PRs &#8594; CI &#8594; merge), not just how it gets written. </p><p>None of these tools claim IDEs are obsolete - many still interoperate with them. 
But the repeated patterns (parallel workspaces, diff-first review, task state, background execution, lifecycle integration) are precisely what &#8220;death of the IDE&#8221; proponents mean when they talk about a center-of-gravity shift.</p><div><hr></div><h2>Why developers still reach for an IDE</h2><p><strong>The best critique of &#8220;the IDE is dead&#8221; is that the IDE </strong><em><strong>still</strong></em><strong> compresses several genuinely hard problems into a high-fidelity feedback loop</strong>: precise navigation, local reasoning, interactive debugging, and the ability to <em>understand</em> a system by directly manipulating it.</p><p>Even the most ambitious orchestration tools keep a manual-edit escape hatch. For example, reviewing diffs in-thread, commenting on changes, and then opening the result in your editor for manual adjustments. That&#8217;s an acknowledgment that human intervention is part of the intended workflow.</p><p>Agent tooling itself highlights where the limits still are. Multi-file refactorings in large repositories remain among the toughest challenges for software engineering agents. These are exactly the situations where interactive code navigation and human judgment still matter most - where you need to hold a mental model of the system that the agent can&#8217;t fully reconstruct from context alone.</p><p>The failure mode that keeps developers anchored to IDE-level inspection is agents being <em>almost</em> right. When something is 90% correct and subtly broken, the cost of finding the issue often exceeds what it would have taken to write it yourself. 
For high-stakes changes, the IDE remains the best instrument for that kind of deep, precise inspection.</p><div><hr></div><h2>The new costs: review fatigue and governance overhead</h2><p>If development becomes &#8220;run many agents in parallel&#8221;, the workflow inherits problems that look less like text editing and more like distributed systems management - observability, permissions, isolation, and governance.</p><p>Agent workflows invert the labor. Instead of writing, you&#8217;re reviewing. That sounds like an improvement until you&#8217;re staring at twelve diffs from twelve parallel agents at the end of the day. Review fatigue is real, and it&#8217;s one of the reasons the most thoughtful tools in this space focus on attention routing, structured plans, and review-first gates rather than pushing for full autonomy by default.</p><p>The security surface also expands as agents gain access to more tools, repos, and external systems. Once agents can browse the web, query databases, write to filesystems, and trigger deploys, what they&#8217;re <em>allowed</em> to do becomes as important as what they&#8217;re <em>capable</em> of doing.</p><p>On observability and control, IDE-integrated agent modes are already pushing toward explicit tool logs and approval gates. The governance question isn&#8217;t optional once agents act asynchronously and touch CI pipelines.</p><div><hr></div><h2>What survives: the IDE, the control plane, or both</h2><p>A clear reading of the landscape is that &#8220;death of the IDE&#8221; is directionally right about the <em>center of gravity</em>, but wrong as a literal forecast.</p><p>The strongest version of the claim is this: <strong>the IDE stops being the primary workspace and becomes one of several subordinate instruments</strong> - used for targeted inspection, debugging, and final edits - while planning, orchestration, review, and agent management move into dashboards, issue trackers, observability terminals, and cloud control planes. 
</p><p>The &#8220;bigger IDE&#8221; framing is equally well-supported. The new &#8220;IDE&#8221; is a system that provides multi-agent orchestration, isolated workspaces, permissions and audit logs, diff-first review, reliable tool connectivity, and attention routing. <strong>The file editor is still there. It&#8217;s just no longer the front door.</strong></p><p>The IDE isn&#8217;t dying. It&#8217;s being <em>de-centered</em>. The work is moving outward - into orchestration surfaces where humans define intent, delegate to parallel agent runtimes, and spend more time supervising, reviewing, and governing than typing. </p><p><strong>The IDE remains critical for correctness, comprehension, and the hard problems agents still struggle with. But it&#8217;s no longer the only place where programming happens - and for a growing number, it&#8217;s no longer the first place they go.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wgTu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2368a63a-b3fc-4358-a6c5-57f2a33c6fd8_1376x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wgTu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2368a63a-b3fc-4358-a6c5-57f2a33c6fd8_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!wgTu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2368a63a-b3fc-4358-a6c5-57f2a33c6fd8_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!wgTu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2368a63a-b3fc-4358-a6c5-57f2a33c6fd8_1376x768.jpeg 
1272w, https://substackcdn.com/image/fetch/$s_!wgTu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2368a63a-b3fc-4358-a6c5-57f2a33c6fd8_1376x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wgTu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2368a63a-b3fc-4358-a6c5-57f2a33c6fd8_1376x768.jpeg" width="1376" height="768" class="sizing-normal" alt="" sizes="100vw" loading="lazy"></picture></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[14 More lessons from 14 years at Google]]></title><description><![CDATA[This 
time about teams, trust, and the systems around the code.]]></description><link>https://addyo.substack.com/p/14-more-lessons-from-14-years-at</link><guid isPermaLink="false">https://addyo.substack.com/p/14-more-lessons-from-14-years-at</guid><dc:creator><![CDATA[Addy Osmani]]></dc:creator><pubDate>Thu, 12 Feb 2026 15:30:32 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!4cMX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8f101eb-1d56-49a1-8b88-1c8890a86dbb_7838x7838.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A while back, I wrote down <a href="https://addyo.substack.com/p/21-lessons-from-14-years-at-google">21 lessons from my time at Google</a>. The response caught me off guard because of <em>which</em> ones stuck. It wasn&#8217;t tech-specific advice. It was the stuff about people, decisions, and the messy reality of building things together.</p><p>That made me realize I&#8217;d left a lot on the table. The first list skewed toward individual craft - how to write better code, how to think about your career. But some of the hardest lessons I&#8217;ve learned aren&#8217;t about how you work. They&#8217;re about how teams work: how decisions actually get made, where coordination breaks down, what separates the groups that ship from the ones that spin.</p><p>These lessons pick up where the first left off. They&#8217;re less about being a better individual engineer and more about the systems around the engineering.</p><h2><strong>1. 
The best engineers pick the right problems to solve.</strong></h2><p>Every yes is an implicit no to something else.</p><p>I&#8217;ve watched talented engineers burn out because they said yes to everything - every bug, every feature request, every &#8220;quick favor.&#8221; Their calendar filled up with other people&#8217;s priorities, and their own roadmap became a graveyard of half-finished ideas.</p><p>Sometimes it&#8217;s just because they truly do care so much about the product. Protect your bandwidth from &#8220;nice to haves&#8221; the same way you protect production from outages. The skill is doing the right things and letting the wrong things stay undone.</p><p>The engineers who create disproportionate impact aren&#8217;t necessarily faster or smarter. They&#8217;re more ruthless about what deserves their attention. They&#8217;ve learned that the opportunity cost of working on the wrong thing is not working on the right thing.</p><h2><strong>2. If you can&#8217;t say what decision you&#8217;re asking for, you&#8217;re not ready for the meeting.</strong></h2><p>Most meetings fail not because they&#8217;re unnecessary, but because they&#8217;re disguised journaling. I&#8217;ve sat through hundreds of hours where smart people talked around a problem without ever naming what they needed. The meeting ends with vibes and no owner.</p><p>I learned to start with the ask: approve, choose, unblock, or inform.</p><p>Just those four words changed how I prepare for every meeting. If I can&#8217;t pick one, I&#8217;m not ready to take anyone&#8217;s time. And when I&#8217;m on the receiving end, I&#8217;ve started asking &#8220;what decision do you need from me?&#8221; within the first two minutes. It sounds blunt, but people are usually relieved - they often didn&#8217;t realize they hadn&#8217;t defined it themselves.</p><p>The hidden cost of vague meetings isn&#8217;t just the hour you lose. 
It&#8217;s the week of drift that follows while everyone waits for clarity that never came.</p><h2><strong>3. &#8220;We should&#8221; is not a plan. &#8220;On Tuesday, I will&#8221; is a plan.</strong></h2><p>The difference between motion and progress is specificity.</p><p>Teams drown in intentions. I&#8217;ve watched roadmaps fill up with &#8220;we should improve the onboarding flow&#8221; and &#8220;we should reduce latency&#8221; and &#8220;we should document the API.&#8221; Months later, the same items are still there, gathering dust and guilt. You might think that&#8217;s a solved problem now that we have <a href="https://addyosmani.com/blog/agentic-engineering/">agentic engineering</a>, but not quite.</p><p>Convert talk into the smallest next action someone can actually do, then put a name and a date on it. Not &#8220;we should improve onboarding&#8221; but &#8220;On Tuesday, Sarah will run three user sessions and document the top friction points.&#8221;</p><p>This is about respecting that humans need traction to make progress. Vague intentions create anxiety. Specific commitments create momentum. The plan doesn&#8217;t have to be perfect - it just has to be concrete enough that someone can actually start.</p><h2><strong>4. Slow code is sometimes a symptom. Slow decisions are always a problem.</strong></h2><p>Speed is about removing the friction that makes smart people hesitate. &#8220;Bias towards action&#8221; when you can.</p><p>When a project drags, the instinct is to blame velocity: people aren&#8217;t working hard enough, the codebase is messy, there aren&#8217;t enough engineers. But in my experience, slow code is often a symptom. Slow decisions are the disease.</p><p>If decisions routinely take weeks or months, look deeper. Missing context means people can&#8217;t evaluate tradeoffs. Unclear ownership means everyone&#8217;s waiting for someone else to decide. 
Fear of accountability means people hedge instead of commit.</p><p>The fastest engineering team I ever worked with wasn&#8217;t the one with the best programmers. It was the one where decisions happened in hours instead of weeks because the authority was clear, the context was shared, and being wrong wasn&#8217;t a career risk.</p><h2><strong>5. Reliability is a product feature. Treat it like one.</strong></h2><p>Users don&#8217;t praise reliability but they do notice its absence.</p><p>This creates a dangerous dynamic: reliability work is invisible until it fails, which means it&#8217;s perpetually under-resourced compared to shiny new features.</p><p>Error budgets are one way to make the tradeoff explicit. If your service has an SLO of 99.9% uptime, you have a &#8220;budget&#8221; of 0.1% downtime to spend on innovation. Burn through it, and you focus on reliability until you&#8217;ve earned it back. This is a framework for having honest conversations about risk.</p><p>The teams that maintain both velocity and reliability don&#8217;t do it through heroics. They do it by treating reliability as a first-class product feature with its own roadmap, its own metrics, and its own advocates. </p><p>You wouldn&#8217;t ship a feature without product review. Don&#8217;t ship a system without some kind of reliability discussion.</p><h2><strong>6. You can&#8217;t &#8220;communication&#8221; your way out of a bad interface between teams.</strong></h2><p>Team interaction modes exist for a reason: collaboration (working closely together), service (clear API and SLAs), or facilitation (one team helping another build capability). </p><p>Most cross-team pain isn&#8217;t about effort or good intentions. It&#8217;s about unclear boundaries and messy contracts. I&#8217;ve watched teams &#8220;improve communication&#8221; by adding more meetings, more Slack channels, more syncs - and it doesn&#8217;t make things better.</p><p>The problem isn&#8217;t that people aren&#8217;t talking. 
It&#8217;s that the interface between teams is undefined. Who owns what? What&#8217;s the contract? What can team A depend on team B for, and vice versa?</p><p>Choose deliberately, and you&#8217;ll need fewer meetings to make things work. Try to paper over a bad interface with communication, and you&#8217;ll burn out your most collaborative people while the underlying dysfunction remains.</p><h2><strong>7. The best escalation comes with a proposal.</strong></h2><p>&#8220;Here&#8217;s the problem&#8221; is half the job. I used to think my role was to identify issues and bring them to leadership. That&#8217;s necessary but insufficient.</p><p>&#8220;Here are two options, the tradeoffs, and what I recommend&#8221; is how you get unblocked and earn trust. It shows you&#8217;ve done the thinking. It gives decision-makers something super specific to react to instead of an open-ended problem to solve.</p><p>It makes their job easier, which makes them more likely to give you what you need.</p><p>The difference between &#8220;I need help&#8221; and &#8220;I need you to choose between A and B, and here&#8217;s why I lean toward B&#8221; is the difference between being a problem-raiser and being a problem-solver.</p><p>Both identify issues. Only one earns increasing trust and autonomy.</p><h2><strong>8. Avoid hero culture. Build systems that don&#8217;t require heroes.</strong></h2><p>The hero is burned out, undocumented, and a single point of failure.</p><p>If one person saving the day is a recurring pattern, that&#8217;s a failure mode rather than a badge of honor. I&#8217;ve seen teams celebrate their heroes while ignoring the dysfunction that made heroism necessary.</p><p>When they leave - and they always leave eventually - the team discovers that no one else really knows how things work. The celebration of heroism masks a systemic problem: the path for &#8220;normal humans on a normal day&#8221; doesn&#8217;t work.</p><p>Make the normal path the default. 
Document the system. Spread the knowledge. Design for the average Tuesday, not the exceptional crisis. Heroes should be unnecessary, and if they&#8217;re necessary, you should be working to make them unnecessary.</p><h2><strong>9. Make observability part of the feature.</strong></h2><p>A feature without telemetry is a liability in disguise.</p><p>If you ship a feature without knowing how it behaves in production, you shipped uncertainty. </p><p>I&#8217;ve watched teams celebrate launches only to discover weeks later that their feature was silently failing for 20% of users. They had no logs, no metrics, no dashboards - just a gap where understanding should be. This can cause all kinds of pain if you want to fix it, including unshipping just to properly A/B test with observability in place.</p><p>Logs, traces, dashboards, and alerts aren&#8217;t &#8220;ops work.&#8221; They&#8217;re how you learn. They&#8217;re how you know whether the thing you built actually works for real people doing real things in real conditions.</p><p>The best engineers I know treat observability as part of the definition of done. Not &#8220;I wrote the code&#8221; but &#8220;I wrote the code and I can see it working.&#8221;</p><h2><strong>10. Small PRs are kindness. Especially if the PR is AI-generated.</strong></h2><p>Small changes are easier to review, easier to reason about, and easier to revert.</p><p>I used to write large pull requests. I liked the idea of a complete feature being reviewable at once. I was optimizing for my convenience at the expense of my reviewers&#8217; sanity. Smaller PRs are often better for everyone.</p><p>They ship faster because they don&#8217;t sit in a review queue while someone tries to find an hour to understand your thousand-line diff. If you want teammates to trust your pace, make your work reviewable.</p><p>The hidden benefit is that small PRs force you to think in increments. Instead of one monolithic change, you build up capability piece by piece. 
Each piece gets feedback. Each piece can be rolled back independently. It&#8217;s slower per-PR but faster to actual production.</p><h2><strong>11. When you add a team, you add edges, not just nodes.</strong></h2><p>Coordination cost grows faster than headcount.</p><p>This is why &#8220;just throw more people at the problem&#8221; often fails, and why adding heads late in a project can make it later. Every new person adds communication overhead with everyone they need to coordinate with. The graph gets denser, not just larger.</p><p>I&#8217;ve seen managers genuinely puzzled when a team doubled in size but output barely changed. The answer is always the same: the new edges ate the new capacity. More people meant more alignment meetings, more context-sharing, more waiting for decisions that now required more stakeholders.</p><p>The solution isn&#8217;t to stop hiring. It&#8217;s to be intentional about reducing edges. Clear ownership. Autonomous teams with minimal dependencies. Interfaces that let people work in parallel instead of in lockstep. The best organizations aren&#8217;t the ones with the most people - they&#8217;re the ones with the most leverage per person.</p><h2><strong>12. The migration is never just a migration</strong></h2><p>Every migration is a negotiation between the system you have, the system you want, and the people who didn&#8217;t ask for either.</p><p>I&#8217;ve seen migrations estimated at one quarter stretch to years. Not because the technical work was wrong, but because nobody accounted for the human work: convincing teams to prioritize your migration over their roadmap, supporting the long tail of edge cases nobody knew existed, and maintaining two systems in parallel while the old one refuses to die.</p><p>The technical plan is the easy part. The hard part is designing for coexistence. You will run old and new simultaneously for longer than you think. 
You will discover that the &#8220;legacy&#8221; system encodes decisions nobody documented and workflows nobody remembers designing but everyone depends on. You will need an adoption strategy that doesn&#8217;t require every team to drop what they&#8217;re doing at once.</p><p>The migrations that actually finish share three traits: a sponsor who stays engaged past the kickoff, a team that really owns the migration instead of treating it as a side quest, and a clear deprecation date that people believe is real. Without all three, you get a migration that&#8217;s perpetually &#8220;almost done&#8221; - which is worse than not starting, because now you&#8217;re paying the cost of two systems indefinitely.</p><p>If you&#8217;re not willing to fund the finish, don&#8217;t start the migration.</p><h2><strong>13. AI makes drafts cheap. Taste becomes expensive.</strong></h2><p>Everyone can generate code now. The barrier to producing code, content, designs - it&#8217;s largely collapsing. AI will write you ten versions of anything in the time it used to take to write one.</p><p>The differentiator is choosing: what to build, what to delete, what to simplify, what not to ship, and what &#8220;good&#8221; looks like. Taste - the ability to distinguish between options and pick the right one - becomes the scarce resource.</p><p>Use AI to explore options fast, then apply judgment ruthlessly. The engineers who thrive in this environment won&#8217;t be the ones who generate the most. They&#8217;ll be the ones who curate the best.</p><p>Production is cheap. Editing is expensive. Selection is everything.</p><h2><strong>14. Trust is a latency optimization for teams.</strong></h2><p>This is the highest-leverage thing you can build. Not a system but credibility.</p><p>When people trust you, they don&#8217;t need five meetings to approve a decision. They assume competence, good intent, and follow-through. 
Decisions that would take weeks in a low-trust environment take hours in a high-trust one.</p><p>Every time you deliver on a promise, every time you&#8217;re honest about a mistake, every time you make someone else&#8217;s life easier, you&#8217;re depositing into an account that will pay dividends for years.</p><p>I&#8217;ve watched engineers with modest technical skills accomplish enormous things because everyone trusted them. I&#8217;ve watched brilliant engineers accomplish little because nobody would take their calls.</p><p>The code doesn&#8217;t matter if you can&#8217;t get anyone to ship it with you.</p><h2><strong>A final thought</strong></h2><p>The first time around, I said these lessons come down to staying curious, staying humble, and remembering that the work is about people. I still believe that.</p><p>But if this second list has a through-line, it&#8217;s something more specific: the work is about making it easier for normal people to do extraordinary things on a normal day. 
A career in engineering gives you plenty of time to learn these things the hard way and I&#8217;ve certainly learned a lot during my time at Google so far.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IMBS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F797bdb7f-ae19-4038-a929-14265678f331_1181x1574.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IMBS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F797bdb7f-ae19-4038-a929-14265678f331_1181x1574.jpeg 424w, https://substackcdn.com/image/fetch/$s_!IMBS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F797bdb7f-ae19-4038-a929-14265678f331_1181x1574.jpeg 848w, https://substackcdn.com/image/fetch/$s_!IMBS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F797bdb7f-ae19-4038-a929-14265678f331_1181x1574.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!IMBS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F797bdb7f-ae19-4038-a929-14265678f331_1181x1574.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IMBS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F797bdb7f-ae19-4038-a929-14265678f331_1181x1574.jpeg" width="308" height="410.49280270956814" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/797bdb7f-ae19-4038-a929-14265678f331_1181x1574.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1574,&quot;width&quot;:1181,&quot;resizeWidth&quot;:308,&quot;bytes&quot;:234411,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/187716319?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F797bdb7f-ae19-4038-a929-14265678f331_1181x1574.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IMBS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F797bdb7f-ae19-4038-a929-14265678f331_1181x1574.jpeg 424w, https://substackcdn.com/image/fetch/$s_!IMBS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F797bdb7f-ae19-4038-a929-14265678f331_1181x1574.jpeg 848w, https://substackcdn.com/image/fetch/$s_!IMBS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F797bdb7f-ae19-4038-a929-14265678f331_1181x1574.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!IMBS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F797bdb7f-ae19-4038-a929-14265678f331_1181x1574.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I hope a few of them save you a scar or two. And if they do, share what you&#8217;ve figured out with someone earlier in the journey. 
</p><p>That&#8217;s how the good lessons travel.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4cMX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8f101eb-1d56-49a1-8b88-1c8890a86dbb_7838x7838.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4cMX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8f101eb-1d56-49a1-8b88-1c8890a86dbb_7838x7838.png 424w, https://substackcdn.com/image/fetch/$s_!4cMX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8f101eb-1d56-49a1-8b88-1c8890a86dbb_7838x7838.png 848w, https://substackcdn.com/image/fetch/$s_!4cMX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8f101eb-1d56-49a1-8b88-1c8890a86dbb_7838x7838.png 1272w, https://substackcdn.com/image/fetch/$s_!4cMX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8f101eb-1d56-49a1-8b88-1c8890a86dbb_7838x7838.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4cMX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8f101eb-1d56-49a1-8b88-1c8890a86dbb_7838x7838.png" width="1456" height="1456" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d8f101eb-1d56-49a1-8b88-1c8890a86dbb_7838x7838.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1246393,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/187716319?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8f101eb-1d56-49a1-8b88-1c8890a86dbb_7838x7838.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4cMX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8f101eb-1d56-49a1-8b88-1c8890a86dbb_7838x7838.png 424w, https://substackcdn.com/image/fetch/$s_!4cMX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8f101eb-1d56-49a1-8b88-1c8890a86dbb_7838x7838.png 848w, https://substackcdn.com/image/fetch/$s_!4cMX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8f101eb-1d56-49a1-8b88-1c8890a86dbb_7838x7838.png 1272w, https://substackcdn.com/image/fetch/$s_!4cMX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8f101eb-1d56-49a1-8b88-1c8890a86dbb_7838x7838.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p>]]></content:encoded></item><item><title><![CDATA[The 80% Problem in Agentic Coding]]></title><description><![CDATA[Managing comprehension debt when leaning on AI to code]]></description><link>https://addyo.substack.com/p/the-80-problem-in-agentic-coding</link><guid isPermaLink="false">https://addyo.substack.com/p/the-80-problem-in-agentic-coding</guid><dc:creator><![CDATA[Addy Osmani]]></dc:creator><pubDate>Wed, 28 Jan 2026 17:20:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!lmAD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9189021-cd66-44d9-8683-520663047835_2400x1350.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Andrej 
Karpathy&quot;,&quot;id&quot;:23972309,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f6d0938b-93a9-4ead-933f-26da5da1bafc_400x400.jpeg&quot;,&quot;uuid&quot;:&quot;17d7fb74-2769-43c2-9344-1afaa83fa8c8&quot;}" data-component-name="MentionToDOM"></span> said something this week that made me pause: </p><blockquote><p><strong>&#8220;I rapidly went from about 80% manual+autocomplete coding and 20% agents to 80% agent coding and 20% edits+touchups. I really am mostly programming in English now.&#8221;</strong></p></blockquote><p>The inversion happened over a few weeks in late 2025. While this may apply to new (greenfield) or personal projects more than existing or legacy apps, I imagine <strong>AI still takes you further than it did a year ago. </strong>You can thank models, specs, skills, MCPs and our workflows improving. </p><p>Boris Cherny, creator of Claude Code, has recently echoed similar sentiments:</p><blockquote><p><strong>&#8220;Pretty much 100% of our code is written by Claude Code + Opus 4.5. For me personally it has been 100% for two+ months now, I don&#8217;t even make small edits by hand. I shipped 22 PRs yesterday and 27 the day before, each one 100% written by Claude. I think most of the industry will see similar stats in the coming months - it will take more time for some vs others.&#8221;</strong></p></blockquote><p>Some time ago I wrote about &#8220;<a href="https://addyo.substack.com/p/the-70-problem-hard-truths-about">the 70% problem</a>&#8221; - where AI coding took you to 70% completion, then left the final 30% - the last mile - to humans. That framing may now be evolving. 
The percentage may shift to 80% or higher for certain kinds of projects, but the nature of the problem changed more dramatically than the numbers suggest.</p><p>Armin Ronacher&#8217;s <a href="https://x.com/mitsuhiko/status/2010446141817844207">poll</a> of 5,000 developers complements this story: 44% now write less than 10% of their code manually. Another 26% are in the 10-50% range. We&#8217;ve crossed a threshold. But here&#8217;s what the triumphalist narrative misses: the problems didn&#8217;t disappear, they shifted. And some got worse.</p><p><strong>I want to caveat: I&#8217;ve definitely felt the shift to 80%+ agent coding on new side-projects; however, this is </strong><em><strong>very</strong></em><strong> different in large or existing apps, especially where teams are involved. Expectations differ, but this is a taste of where we&#8217;re headed.</strong></p><h2>The mistakes changed</h2><p><strong>AI errors evolved from syntax bugs to conceptual failures - the kind a sloppy, hasty junior may make under time pressure.</strong></p><p>Karpathy catalogs what still breaks: </p><blockquote><p><strong>&#8220;The models make wrong assumptions on your behalf and run with them without checking. They don&#8217;t manage confusion, don&#8217;t seek clarifications, don&#8217;t surface inconsistencies, don&#8217;t present tradeoffs, don&#8217;t push back when they should. They&#8217;re still a little too sycophantic.&#8221;</strong></p></blockquote><p><strong>Assumption propagation</strong>: The model misunderstands something early and builds an entire feature on faulty premises. You don&#8217;t notice until you&#8217;re five PRs deep and the architecture is cemented. This is a kind of two-steps-back pattern.</p><p><strong>Abstraction bloat</strong>: Given free rein, agents can overcomplicate relentlessly. They&#8217;ll scaffold 1,000 lines where 100 would suffice, creating elaborate class hierarchies where a function would do. 
You have to actively push back: &#8220;Couldn&#8217;t you just...?&#8221; The response is always &#8220;Of course!&#8221; followed by immediate simplification. They&#8217;re optimizing for looking comprehensive, not for maintainability.</p><p><strong>Dead code accumulation</strong>: They often don&#8217;t clean up after themselves. Old implementations linger. Comments get removed as side effects. Code they don&#8217;t fully understand gets altered anyway because it was adjacent to the task.</p><p><strong>Sycophantic agreement</strong>: They don&#8217;t always push back. No &#8220;Are you sure?&#8221; or &#8220;Have you considered...?&#8221; Just enthusiastic execution of whatever you described, even if your description was incomplete or contradictory.</p><p><strong>It&#8217;s possible to mitigate some of this via Skills if you know what to watch for.</strong></p><p>These patterns persist despite system prompts, despite CLAUDE.md instructions, despite plan mode. They&#8217;re not bugs to be fixed - they&#8217;re sometimes inherent to how these systems work. </p><p><strong>Agents optimize for coherent output, not for questioning your premises.</strong></p><p>I&#8217;ve watched this happen on my own teams - code that looks right in review but breaks three commits later when someone touches an adjacent system. </p><p>If you&#8217;re data-minded, recent <a href="https://www.sonarsource.com/blog/ai-coding-trust-gap/">survey data</a> suggests a &#8220;verification bottleneck&#8221; has emerged: only 48% of developers consistently check AI-assisted code before committing it, even though 38% find that reviewing AI-generated logic actually requires more effort than reviewing human-written code. <strong>We&#8217;re generating correct code faster, but may be accumulating technical debt even faster.</strong></p><h2>Comprehension debt: a hidden cost we don&#8217;t track</h2><p><strong>Generation (writing code) and discrimination (reading code) are different cognitive capabilities. 
You can review code competently even after your ability to write it from scratch has atrophied. But there&#8217;s a threshold where &#8220;review&#8221; becomes &#8220;rubber stamping.&#8221;</strong></p><p><a href="https://x.com/jeremytwei/status/2015886793955229705">Jeremy Twei</a> coined the perfect term for this: <em>comprehension debt</em>. It&#8217;s certainly tempting to just move on when the LLM one-shotted something that seems to work. This is the insidious part. The agent doesn&#8217;t get tired. It will sprint through implementation after implementation with unwavering confidence. The code looks plausible. The tests pass (or seem to). You&#8217;re under pressure to ship. You move on.</p><p><strong>Over time, you may understand less of your own codebase.</strong></p><p>I caught myself doing this last week. Claude implemented a feature I&#8217;d been putting off for days. The tests passed. I skimmed it, nodded, merged. Three days later I couldn&#8217;t explain how it worked.</p><p>Yoko Li <a href="https://x.com/stuffyokodraws/status/2013373307291340870">captured</a> the addiction loop perfectly: </p><blockquote><p><strong>&#8220;The agent implements an amazing feature and got maybe 10% of the thing wrong, and you&#8217;re like &#8216;hey I can fix this if I just prompt it for 5 more mins.&#8217; And that was 5 hrs ago.&#8221;</strong></p></blockquote><p>You&#8217;re always <em>almost</em> there. The final 10% feels tantalizingly close. Just one more prompt. Just one more iteration. The psychological hook is real.</p><p>Someone <a href="https://news.ycombinator.com/user?id=vibeprofessor">else</a> put it differently: </p><blockquote><p>&#8220;I spend most of my time babysitting agents. The AGI vibes are real, but so is the micromanagement tax. You&#8217;re not coding anymore, you&#8217;re supervising. Watching. Redirecting. 
It&#8217;s a different kind of exhausting.&#8221;</p></blockquote><p><strong>The dangerous part: it&#8217;s trivially easy to review code you can no longer write from scratch.</strong> If your ability to &#8220;read&#8221; doesn&#8217;t scale with the agent&#8217;s ability to &#8220;output,&#8221; you&#8217;re not engineering anymore. You&#8217;re hoping.</p><h2>The productivity paradox: More code, same throughput</h2><p><strong>Individual output surged 98% in high-adoption teams, but PR review time increased by as much as 91%. </strong></p><p>The data from <a href="https://www.faros.ai/blog/key-takeaways-from-the-dora-report-2025">Faros AI</a> and Google&#8217;s <a href="https://dora.dev/research/2025/dora-report/">DORA report</a> are interesting:</p><ul><li><p>Teams with high AI adoption merged 98% more PRs</p></li><li><p>Those same teams saw review times balloon 91%</p></li><li><p>PR size increased 154% on average</p></li><li><p>Code review became the new bottleneck</p></li></ul><p>Atlassian&#8217;s 2025 survey captured the paradox in stark terms: 99% of AI-using developers reported saving 10+ hours per week, yet most reported <em>no decrease in overall workload</em>. The time saved writing code was consumed by organizational friction - more context switching, more coordination overhead, managing the higher volume of changes.</p><p><strong>We got faster cars, but the roads got more congested.</strong></p><p>We're producing more code but spending more time reviewing it. The bottleneck just moved. When you make a resource cheaper (in this case, code generation), consumption increases faster than efficiency improves, and total resource use goes up - the Jevons paradox. </p><p>We&#8217;re not writing less code. We&#8217;re writing <em>vastly</em> more code, and someone still has to understand much of it.
There are, of course, developers who argue that if AI can understand the code, humans no longer need to.</p><h2>Where the 80/20 split actually works</h2><p><strong>The 80% threshold is most accessible in greenfield contexts where you control the entire stack and comprehension debt stays manageable through small team size.</strong></p><p>This actually works in a few contexts. </p><ul><li><p>Personal projects where you control everything</p></li><li><p>MVPs where &#8220;good enough&#8221; is actually good enough</p></li><li><p>Startups in greenfield territory without legacy constraints</p></li><li><p>Teams small enough that comprehension debt stays manageable</p></li></ul><p>In these environments, the agent&#8217;s weaknesses matter less. You can scaffold rapidly, refactor aggressively, throw away code without political friction. The pace of iteration outweighs occasional misdirection.</p><p>In mature codebases with complex invariants, the calculus inverts. The agent doesn&#8217;t know what it doesn&#8217;t know. It can&#8217;t intuit the unwritten rules. Its confidence scales inversely with context understanding.</p><p>Someone pointed out the obvious thing I was tiptoeing around: the first 90% might be easy, but the last 10% can take a long time. 90% accuracy is fine for non-mission-critical stuff. For the parts that actually matter, it's nowhere close. Self-driving cars work great until they don't, and that's why L2 is everywhere but L4 is still mostly vaporware.</p><p>For non-engineers, the wall is lower but still real. Tools like AI Studio, v0 and Bolt can turn sketches into working prototypes instantly. But hardening that prototype for production - handling real user data at scale, ensuring security and compliance - still requires engineering fundamentals.
AI gets you 80% to an MVP; the last 20% requires patience, learning the fundamentals deeply, or hiring engineers.</p><h2>Two different populations</h2><p><strong>We&#8217;re not seeing a smooth curve of adoption - we&#8217;re seeing a split between those who&#8217;ve crossed the threshold and everyone else. The gap between early adopters and the rest is widening, not closing.</strong></p><p>Armin&#8217;s poll revealed what raw adoption numbers obscure: alongside the 44% who now write less than 10% of their code by hand, a large share still write most of theirs manually. We have a bimodal distribution, not a bell curve. On one side: people like Karpathy and the Claude Code team, shipping dozens of PRs daily with 100% AI-written code, iterating faster than ever before. On the other: the vast majority, incrementally adopting copilot-style tools but not fundamentally changing their workflow.</p><p>The age split may be visible in discourse too. Younger developers seem more willing to adapt their workflow radically. Older developers are more skeptical - not because they can't use the tools, but because they've seen enough cycles to know the difference between a temporary productivity boost and a sustainable practice. Both might be right.</p><p>Stack Overflow&#8217;s 2025 survey showed only 16% reported &#8220;great&#8221; productivity improvements. Half saw modest gains. The top frustrations: &#8220;AI solutions that are almost right, but not quite&#8221; (66%) and &#8220;debugging AI code takes longer than writing it myself&#8221; (45%).</p><p>The engineers who <em>appear</em> to be thriving in 2026 aren&#8217;t just using better tools. They&#8217;ve reconceptualized their role from <em>implementer</em> to <em>orchestrator</em>. They&#8217;ve learned to think declaratively rather than imperatively. They&#8217;ve accepted that their job is now architectural oversight and quality control, not line-by-line coding.</p><p>Those struggling are trying to use AI as a faster typewriter. They haven&#8217;t adapted their workflow.
They&#8217;re fighting the agent&#8217;s approach instead of redirecting its goals. They haven&#8217;t invested in learning to prompt effectively, which is now as critical as writing good documentation or design specs ever was.</p><p>There's an uncomfortable truth here: <strong>orchestrating agents feels a lot like <a href="https://addyosmani.com/blog/coding-agents-manager/">management</a></strong>. Delegating tasks. Reviewing output. Redirecting when things go sideways. If you became an engineer because you didn't want to be a manager, this shift might feel like a betrayal. The role changed underneath you.</p><p><strong>The gap seems to be widening. The people who&#8217;ve figured out how to work with these tools are shipping stuff I can barely keep up with. Everyone else is... still figuring it out.</strong></p><p>This split may make some uncomfortable. I&#8217;ve always said I&#8217;m a builder, but I also enjoyed programming. The idea that these are now diverging paths - that you have to pick one - feels reductive. Like we&#8217;re forcing a binary on something more complicated. Someone in the comments said it perfectly: both viewpoints are valid, just different wiring. Neither is wrong.</p><h2>From imperative to declarative: The real leverage</h2><p><strong>Don&#8217;t tell the AI what to do - give it success criteria and watch it loop. The magic isn&#8217;t in the agent writing code, it&#8217;s in the agent iterating until it satisfies conditions you specify.</strong></p><p>Karpathy&#8217;s observation about leverage cuts to the core: </p><blockquote><p><strong>&#8220;LLMs are exceptionally good at looping until they meet specific goals and this is where most of the &#8216;feel the AGI&#8217; magic is to be found.&#8221;</strong></p></blockquote><p>The shift from imperative to declarative development:</p><p><strong>Old model (imperative)</strong>: &#8220;Write a function that takes X and returns Y. Use this library. Handle these edge cases.
Make sure to...&#8221;</p><p><strong>New model (declarative)</strong>: &#8220;Here are the requirements. Here are the tests that must pass. Here are the success criteria. Figure out how.&#8221;</p><p>This works because agents never get demoralized. They&#8217;ll try approaches you wouldn&#8217;t have patience for. They iterate relentlessly. If you specify the destination clearly, they&#8217;ll navigate there - even if it takes 30 failed attempts.</p><p>The patterns that work:</p><ul><li><p>Write tests first, let the agent iterate until they pass</p></li><li><p>Hook it up to a browser via MCP, let it verify behavior visually</p></li><li><p>Implement the naive correct version, then optimize while preserving correctness</p></li><li><p>Define the API contract, let it implement to spec</p></li></ul><p>But this only works if your success criteria are actually correct. Garbage in, garbage out scales with capability.</p><p>The developers succeeding with this approach spend 70% of their time on problem definition and verification strategy, 30% on execution. The ratios inverted from traditional development, but the total time decreased dramatically.</p><h2>The slopacolypse question</h2><p><strong>When anyone can generate thousands of lines of code in minutes, the ability to say &#8216;we don&#8217;t need this&#8217; becomes more valuable.</strong></p><p>Karpathy warned: </p><blockquote><p>&#8220;<strong>I am bracing for 2026 as the year of the slopacolypse across all of github, substack, arxiv, X/instagram, and generally all digital media.&#8221;</strong></p></blockquote><p>The concern is straightforward: when anyone can generate arbitrarily large volumes of plausible-looking code, content, papers, or posts, how do we maintain signal-to-noise ratio?</p><p>Boris Cherny offers a counterpoint: &#8220;My bet is that there will be no slopcopolypse because the model will become better at writing less sloppy code and at fixing existing code issues.
In the meantime, what helps is having the model code review its code using a fresh context window.&#8221;</p><p>Both can be true simultaneously. The capability for slop exists at unprecedented scale. The tooling to prevent it is emerging. The question is which scales faster.</p><p><strong>The slopacolypse will be driven by people who mistake velocity for productivity.</strong> Agents are marathon runners with no sense of direction unless you give them one. They will sprint ten miles into a brick wall if you don&#8217;t audit the &#8220;code actions&#8221; where necessary.</p><p>The teams I&#8217;ve seen handle this well tend to do a few things:</p><ul><li><p>Fresh-context code reviews help, though it feels weird asking the same model to critique its own code. But it works - give it a clean slate and it catches its own mistakes.</p></li><li><p>Automated verification at every step (CI/CD, linters, type checkers, tests as guardrails)</p></li><li><p>Deliberate constraints on agent autonomy (bounded tasks, clear success criteria)</p></li><li><p>High emphasis on human-in-the-loop at architectural decision points</p></li></ul><p>The code quality problems Karpathy describes - overcomplication, abstraction bloat, dead code - these improve as models improve. But they won&#8217;t disappear. They&#8217;re emergent from how these systems approach problems.</p><h2>What actually works: practical patterns</h2><p><strong>The future belongs to those who can maintain a coherent mental model of the macro while agents handle the tactical drudgery of the micro.</strong></p><p>After watching teams adapt over the past year, effective patterns have crystallized:</p><p><strong>1. Agent-first drafts with tight iteration loops</strong> </p><p>Don&#8217;t use AI for one-off suggestions. Generate entire first drafts, then refine. The Claude Code team practice: have the model review its own code with a fresh context window. This catches issues before human review.</p><p><strong>2.
Declarative communication</strong> </p><p>Spend 70% of effort on problem definition, 30% on execution. Write comprehensive specs, define success criteria, provide test cases up front. Guide the agent&#8217;s goals, not its methods.</p><p><strong>3. Automated verification</strong> </p><p>If you repeatedly fix the same class of mistake, write a test or lint rule preemptively. Make the agent explain its code and flag potential problems before you review.</p><p><strong>4. Deliberate learning vs. pure production focus</strong> </p><p>Use AI as a learning tool, not a crutch (you&#8217;ve heard this a few times now). When the agent writes something you don&#8217;t understand, that&#8217;s a signal to dig deeper. Treat AI-generated code like code from a mentor - review it to learn, not just to ship.</p><p><strong>5. Architectural hygiene</strong> </p><p>More modularization, clearer API boundaries. Well-documented style guides fed into prompts. High-level architecture descriptions provided before coding begins. The planning phase expanded; the coding phase compressed; the review phase focused on design rather than syntax.</p><p><strong>The developers who thrive won&#8217;t be those who generate the most code. They&#8217;ll be those who know which code to generate, when to question the output, and how to maintain comprehension even as their hands leave the keyboard.</strong></p><h2>The uncomfortable truth about skill development</h2><p><strong>If your ability to &#8220;read&#8221; doesn&#8217;t scale at the same rate as the agent&#8217;s ability to &#8220;output,&#8221; you aren&#8217;t engineering anymore. You&#8217;re rubber stamping.</strong></p><blockquote><p>&#8220;It&#8217;s been like the boiling frog for me. Started by copy-pasting more into ChatGPT. Then more in-IDE prompting. Then agent tools. Suddenly I barely hand code anymore.
The transition was so gradual I didn&#8217;t notice until I was already there&#8221; [<a href="https://news.ycombinator.com/user?id=shawabawa3">HN</a>]</p></blockquote><p>There&#8217;s early evidence of skill atrophy in heavy AI users. Junior developers who rely on AI for everything report feeling less confident in problem-solving abilities over time. It&#8217;s the Google effect applied to coding - when you outsource constantly, your brain stops retaining.</p><p>I don&#8217;t know what the solution is, but I&#8217;ve been trying a few things:</p><ul><li><p>Use TDD: write tests (or think through test cases) before letting AI implement</p></li><li><p>Pair with seniors: discuss AI suggestions in real-time to learn the decision-making process</p></li><li><p>Ask for explanations: have the AI justify its approach, not just generate solutions</p></li><li><p>Alternate: write some features manually to maintain muscle memory</p></li></ul><p><strong>The risk is real: it&#8217;s dangerously easy to review code you can no longer write from scratch.</strong> When that happens, you&#8217;ve become dependent on the tool in a way that limits your growth.</p><p>The engineers who will thrive long-term are those who use AI to accelerate gaining experience, not to bypass it entirely. They maintain their fundamentals while leveraging AI to explore more territory faster.</p><h2>Where this leaves us</h2><p><strong>The shift from 70% to 80% isn&#8217;t about percentages - it&#8217;s about the gap between prototype and production-ready software. That gap is narrowing, but it hasn&#8217;t closed.</strong></p><p>Karpathy asks the right questions: </p><blockquote><p><strong>&#8220;What happens to the &#8216;10X engineer&#8217; - the ratio of productivity between the mean and the max engineer? It&#8217;s quite possible that this grows a lot. 
Armed with LLMs, do generalists increasingly outperform specialists?&#8221;</strong></p></blockquote><p>These questions will define the next few years.</p><p>One thing is certain: AI wrote 80% of code for early adopters in late 2025. Even if your percentage is much lower, it&#8217;s likely higher than a year ago. This places disproportionate emphasis on the human&#8217;s role: owning outcomes, maintaining quality bars, ensuring tests actually validate behavior.</p><p><strong>The danger isn&#8217;t that the agent fails. I think it&#8217;s that it succeeds so confidently in the wrong direction that you stop checking the compass.</strong></p><p>DORA&#8217;s 2025 report crystallized the reality: AI is an amplifier of your development practices. Good processes get better (high-performing teams saw 55-70% faster delivery). Bad processes get worse (accumulating debt at unprecedented speed). There is no silver bullet.</p><p>Karpathy&#8217;s final observation resonates most: </p><blockquote><p><strong>&#8220;I didn&#8217;t anticipate that with agents programming feels </strong><em><strong>more</strong></em><strong> fun because a lot of the fill in the blanks drudgery is removed and what remains is the creative part. I also feel less blocked/stuck and I experience a lot more courage because there&#8217;s almost always a way to work hand in hand with it to make some positive progress.&#8221;</strong></p></blockquote><p>He also notes: &#8220;LLM coding will split up engineers based on those who primarily liked coding and those who primarily liked building.&#8221;</p><p><strong>That&#8217;s probably the most insightful prediction about where this is headed.</strong></p><p>If you liked the act of writing code itself - the craft of it, the meditation of it - this transition might feel like loss. If you liked building things and code was the necessary means, this feels like liberation.</p><p>Neither response is wrong. 
But the tooling is optimizing for the latter.</p><h2>For the skeptics (you&#8217;re right to be skeptical)</h2><p>The productivity claims are often overhyped. AI still makes mistakes a competent junior wouldn&#8217;t. Comprehension debt is real and poorly understood. The slopacolypse risk is genuine.</p><p>But the shift is real. When Karpathy admits he barely writes code directly anymore, when the Claude Code team ships 20+ PRs daily with 100% AI-written code, we&#8217;re past the point of dismissing this as hype.</p><p><strong>As software engineers, our identity was never &#8220;the person who can write code&#8221; - it was &#8220;the person who can solve problems with software.&#8221;</strong></p><p>AI isn&#8217;t replacing engineers. It&#8217;s amplifying them - for better and for worse.</p><p>My advice: embrace the tools, but own the outcome. Use AI to accelerate learning, not skip it. Focus on the fundamentals: robust architecture, clean code, thorough tests, thoughtful UX. These remain as important as ever - maybe more so, since implementation is no longer the bottleneck.</p><p>I don&#8217;t know where this goes. Karpathy&#8217;s probably right that it&#8217;ll split people between those who liked coding and those who liked building.
</p><p>We&#8217;re all figuring this out in public, one PR at a time.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lmAD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9189021-cd66-44d9-8683-520663047835_2400x1350.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lmAD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9189021-cd66-44d9-8683-520663047835_2400x1350.png 424w, https://substackcdn.com/image/fetch/$s_!lmAD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9189021-cd66-44d9-8683-520663047835_2400x1350.png 848w, https://substackcdn.com/image/fetch/$s_!lmAD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9189021-cd66-44d9-8683-520663047835_2400x1350.png 1272w, https://substackcdn.com/image/fetch/$s_!lmAD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9189021-cd66-44d9-8683-520663047835_2400x1350.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lmAD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9189021-cd66-44d9-8683-520663047835_2400x1350.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d9189021-cd66-44d9-8683-520663047835_2400x1350.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2876019,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/185933546?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9189021-cd66-44d9-8683-520663047835_2400x1350.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lmAD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9189021-cd66-44d9-8683-520663047835_2400x1350.png 424w, https://substackcdn.com/image/fetch/$s_!lmAD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9189021-cd66-44d9-8683-520663047835_2400x1350.png 848w, https://substackcdn.com/image/fetch/$s_!lmAD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9189021-cd66-44d9-8683-520663047835_2400x1350.png 1272w, https://substackcdn.com/image/fetch/$s_!lmAD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9189021-cd66-44d9-8683-520663047835_2400x1350.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[How to write a good spec for AI agents]]></title><description><![CDATA[How to structure, plan, and iterate for high-performance coding agents]]></description><link>https://addyo.substack.com/p/how-to-write-a-good-spec-for-ai-agents</link><guid isPermaLink="false">https://addyo.substack.com/p/how-to-write-a-good-spec-for-ai-agents</guid><dc:creator><![CDATA[Addy Osmani]]></dc:creator><pubDate>Mon, 19 Jan 2026 15:31:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!qALe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f80e0c9-a1ae-468a-a20b-7bfdd64d1cab_2400x1350.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>TL;DR: Aim for a clear spec covering just enough nuance (this may 
include structure, style, testing, boundaries) to guide the AI without overwhelming it. Break large tasks into smaller ones vs. keeping everything in one large prompt. Plan first in read-only mode, then execute and iterate continuously.</strong></p><blockquote><p><em>&#8220;I&#8217;ve heard a lot about writing good specs for AI agents, but haven&#8217;t found a solid framework yet. I could write a spec that rivals an RFC, but at some point the context is too large and the model breaks down.&#8221;</em></p></blockquote><p>Many developers share this frustration. Simply throwing a massive spec at an AI agent doesn&#8217;t work - context window limits and the model&#8217;s &#8220;attention budget&#8221; get in the way. The key is to write smart specs: documents that guide the agent clearly, stay within practical context sizes, and evolve with the project. This guide distills best practices from my use of coding agents including Claude Code and Gemini CLI into a framework for spec-writing that keeps your AI agents focused and productive.</p><p>We&#8217;ll cover five principles for great AI agent specs, each starting with a bolded takeaway.</p><h2><strong>1. Start with a high-level vision and let the AI draft the details</strong></h2><p><strong>Kick off your project with a concise high-level spec, then have the AI expand it into a detailed plan.</strong></p><p>Instead of over-engineering upfront, begin with a clear goal statement and a few core requirements. Treat this as a &#8220;product brief&#8221; and let the agent generate a more elaborate spec from it. This leverages the AI&#8217;s strength in elaboration while you maintain control of the direction. This works well unless you already feel you have very specific technical requirements that must be met from the start.</p><p><strong>Why this works:</strong> LLM-based agents excel at fleshing out details when given a solid high-level directive, but they need a clear mission to avoid drifting off course. 
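</p><p>For illustration, the detailed spec an agent drafts from a short brief often takes a shape like the following. This is a hypothetical outline, not a prescribed template - the section names are my own, so adapt them to your project:</p><pre><code># SPEC.md (agent-drafted, human-reviewed)

## Overview
One-paragraph goal statement and target users.

## Features
Core feature list, each with a short rationale.

## Tech Stack
Proposed stack, flagged as suggestions for you to approve.

## Data Model
Key entities and relationships.

## Step-by-Step Plan
Ordered, reviewable milestones.
</code></pre><p>Treat every section as a draft to be reviewed, not a decision already made.</p><p>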
By providing a short outline or objective description and asking the AI to produce a full specification (e.g. a spec.md), you create a persistent reference for the agent. Planning in advance matters even more with an agent - you can iterate on the plan first, then hand it off to the agent to write the code. The spec becomes the first artifact you and the AI build together.</p><p><strong>Practical approach:</strong> Start a new coding session by prompting:</p><blockquote><p>&#8220;You are an AI software engineer. Draft a detailed specification for [project X] covering objectives, features, constraints, and a step-by-step plan.&#8221; </p><p>Keep your initial prompt high-level - e.g. &#8220;Build a web app where users can track tasks (to-do list), with user accounts, a database, and a simple UI&#8221;. </p></blockquote><p>The agent might respond with a structured draft spec: an overview, feature list, tech stack suggestions, data model, and so on. This spec then becomes the &#8220;source of truth&#8221; that both you and the agent can refer back to. GitHub&#8217;s AI team promotes <a href="https://github.blog/ai-and-ml/generative-ai/spec-driven-development-with-ai-get-started-with-a-new-open-source-toolkit/">spec-driven development</a> where &#8220;specs become the shared source of truth&#8230; living, executable artifacts that evolve with the project&#8221;. Before writing any code, review and refine the AI&#8217;s spec. Make sure it aligns with your vision and correct any hallucinations or off-target details.</p><p><strong>Use Plan Mode to enforce planning-first:</strong> Tools like Claude Code offer a <a href="https://code.claude.com/docs/en/common-workflows">Plan Mode</a> that restricts the agent to read-only operations - it can analyze your codebase and create detailed plans but won&#8217;t write any code until you&#8217;re ready. 
This is ideal for the planning phase: start in Plan Mode (Shift+Tab in Claude Code), describe what you want to build, and let the agent draft a spec while exploring your existing code. Ask it to clarify ambiguities by questioning you about the plan. Have it review the plan for architecture, best practices, security risks, and testing strategy. The goal is to refine the plan until there&#8217;s no room for misinterpretation. Only then do you exit Plan Mode and let the agent execute. This workflow prevents the common trap of jumping straight into code generation before the spec is solid.</p><p><strong>Use the spec as context:</strong> Once approved, save this spec (e.g. as SPEC.md) and feed relevant sections into the agent as needed. Many developers using a strong model do exactly this - the spec file persists between sessions, anchoring the AI whenever work resumes on the project. This mitigates the forgetfulness that can happen when the conversation history gets too long or when you have to restart an agent. It&#8217;s akin to how one would use a Product Requirements Document (PRD) in a team: a reference that everyone (human or AI) can consult to stay on track. Experienced folks often &#8220;<a href="https://simonwillison.net/2025/Oct/7/vibe-engineering/">write good documentation first</a> and the model may be able to build the matching implementation from that input alone&#8221; as one engineer observed. The spec is that documentation.</p><p><strong>Keep it goal-oriented:</strong> A high-level spec for an AI agent should focus on what and why, more than the nitty-gritty how (at least initially). Think of it like the user story and acceptance criteria: Who is the user? What do they need? What does success look like? (e.g. &#8220;User can add, edit, complete tasks; data is saved persistently; the app is responsive and secure&#8221;). This keeps the AI&#8217;s detailed spec grounded in user needs and outcome, not just technical to-dos. 
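</p><p>For example, the goal-oriented portion of a spec for the to-do app above might read like this (a hypothetical fragment, with wording of my own invention):</p><pre><code>## Users &amp; Goals
- Who: small teams tracking shared tasks
- Need: add, edit, and complete tasks from any device

## Success Criteria
- Tasks persist across sessions (saved to the database)
- The app stays responsive on mobile and desktop
- User data is protected (authentication, no secrets in client code)
</code></pre><p>Notice there is no &#8220;how&#8221; here yet - the criteria describe observable outcomes the agent can verify against.</p><p>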
As the <a href="https://github.blog/ai-and-ml/generative-ai/spec-driven-development-with-ai-get-started-with-a-new-open-source-toolkit/">GitHub Spec Kit docs</a> put it, provide a high-level description of what you&#8217;re building and why, and let the coding agent generate a detailed specification focusing on user experience and success criteria. Starting with this big-picture vision prevents the agent from losing sight of the forest for the trees when it later gets into coding.</p><h2><strong>2. Structure the spec like a professional PRD (or SRS)</strong></h2><p><strong>Treat your AI spec as a structured document (PRD) with clear sections, not a loose pile of notes.</strong></p><p>Many developers treat specs for agents much like traditional Product Requirement Documents (PRDs) or System Design docs - comprehensive, well-organized, and easy for a &#8220;literal-minded&#8221; AI to parse. This formal approach gives the agent a blueprint to follow and reduces ambiguity.</p><p><strong>The six core areas:</strong> GitHub&#8217;s analysis of <a href="https://github.blog/ai-and-ml/github-copilot/how-to-write-a-great-agents-md-lessons-from-over-2500-repositories/">over 2,500 agent configuration files</a> revealed a clear pattern: the most effective specs cover six areas. Use this as a checklist for completeness:</p><p><strong>1. Commands:</strong> Put executable commands early - not just tool names, but full commands with flags: <code>npm test</code>, <code>pytest -v</code>, <code>npm run build</code>. The agent will reference these constantly.</p><p><strong>2. Testing:</strong> How to run tests, what framework you use, where test files live, and what coverage expectations exist.</p><p><strong>3. Project structure:</strong> Where source code lives, where tests go, where docs belong. Be explicit: &#8220;<code>src/</code> for application code, <code>tests/</code> for unit tests, <code>docs/</code> for documentation.&#8221;</p><p><strong>4. 
Code style:</strong> One real code snippet showing your style beats three paragraphs describing it. Include naming conventions, formatting rules, and examples of good output.</p><p><strong>5. Git workflow:</strong> Branch naming, commit message format, PR requirements. The agent can follow these if you spell them out.</p><p><strong>6. Boundaries:</strong> What the agent should never touch - secrets, vendor directories, production configs, specific folders. &#8220;Never commit secrets&#8221; was the single most common helpful constraint in the GitHub study.</p><p><strong>Be specific about your stack:</strong> Say &#8220;React 18 with TypeScript, Vite, and Tailwind CSS&#8221; not &#8220;React project.&#8221; Include versions and key dependencies. Vague specs produce vague code.</p><p><strong>Use a consistent format:</strong> Clarity is king. Many devs use Markdown headings or even XML-like tags in the spec to delineate sections, because AI models handle well-structured text better than free-form prose. For example, you might structure the spec as:</p><pre><code><code># Project Spec: My team's tasks app

## Objective
- Build a web app for small teams to manage tasks...

## Tech Stack
- React 18+, TypeScript, Vite, Tailwind CSS
- Node.js/Express backend, PostgreSQL, Prisma ORM

## Commands
- Build: `npm run build` (compiles TypeScript, outputs to dist/)
- Test: `npm test` (runs Jest, must pass before commits)
- Lint: `npm run lint -- --fix` (auto-fixes ESLint errors; note the `--` so npm passes the flag through to ESLint)

## Project Structure
- `src/` &#8211; Application source code
- `tests/` &#8211; Unit and integration tests
- `docs/` &#8211; Documentation

## Boundaries
- &#9989; Always: Run tests before commits, follow naming conventions
- &#9888;&#65039; Ask first: Database schema changes, adding dependencies
- &#128683; Never: Commit secrets, edit node_modules/, modify CI config
</code></code></pre><p>This level of organization not only helps you think clearly, it helps the AI find information. Anthropic engineers recommend <a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents">organizing prompts into distinct sections</a> (like &lt;background&gt;, &lt;instructions&gt;, &lt;tools&gt;, &lt;output_format&gt; etc.) for exactly this reason - it gives the model strong cues about which info is which. And remember, &#8220;minimal does not necessarily mean short&#8221; - don&#8217;t shy away from detail in the spec if it matters, but keep it focused.</p><p><strong>Integrate specs into your toolchain:</strong> Treat specs as &#8220;executable artifacts&#8221; tied to version control and CI/CD. The <a href="https://github.blog/ai-and-ml/generative-ai/spec-driven-development-with-ai-get-started-with-a-new-open-source-toolkit/">GitHub Spec Kit</a> uses a four-phase, gated workflow that makes your specification the center of your engineering process. Instead of writing a spec and setting it aside, the spec drives the implementation, checklists, and task breakdowns. Your primary role is to steer; the coding agent does the bulk of the writing. 
Each phase has a specific job, and you don&#8217;t move to the next one until the current phase is fully validated:</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!Z2M7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49888e6f-2aaf-4689-aeab-22957a766bf9_2784x1536.jpeg" width="1456" height="803" alt="Spec Driven Development Workflow" title="Spec Driven Development Workflow" loading="lazy"></figure></div><p><strong>1. Specify:</strong> You provide a high-level description of what you&#8217;re building and why, and the coding agent generates a detailed specification. This isn&#8217;t about technical stacks or app design - it&#8217;s about user journeys, experiences, and what success looks like. Who will use this? What problem does it solve? How will they interact with it? Think of it as mapping the user experience you want to create, and letting the coding agent flesh out the details. This becomes a living artifact that evolves as you learn more.</p><p><strong>2. Plan:</strong> Now you get technical. You provide your desired stack, architecture, and constraints, and the coding agent generates a comprehensive technical plan. If your company standardizes on certain technologies, this is where you say so. 
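</p><p>A plan-phase input, for example, might enumerate constraints like these (the specific systems named are illustrative):</p><pre><code>## Constraints
- Standard stack: React 18 + TypeScript, Express, PostgreSQL
- Must integrate with the existing single sign-on service
- Audit logging required for all data changes (compliance)
- No new third-party dependencies without approval
</code></pre><p>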
If you&#8217;re integrating with legacy systems or have compliance requirements, all of that goes here. You can ask for multiple plan variations to compare approaches. If you make internal docs available, the agent can integrate your architectural patterns directly into the plan.</p><p><strong>3. Tasks:</strong> The coding agent takes the spec and plan and breaks them into actual work - small, reviewable chunks that each solve a specific piece of the puzzle. Each task should be something you can implement and test in isolation, almost like test-driven development for your AI agent. Instead of &#8220;build authentication,&#8221; you get concrete tasks like &#8220;create a user registration endpoint that validates email format.&#8221;</p><p><strong>4. Implement:</strong> Your coding agent tackles tasks one by one (or in parallel). Instead of reviewing thousand-line code dumps, you review focused changes that solve specific problems. The agent knows what to build (specification), how to build it (plan), and what to work on (task). Crucially, your role is to verify at each phase: Does the spec capture what you want? Does the plan account for constraints? Are there edge cases the AI missed? The process builds in checkpoints for you to critique, spot gaps, and course-correct before moving forward.</p><p>This gated workflow prevents what Willison calls &#8220;house of cards code&#8221; - fragile AI outputs that collapse under scrutiny. Anthropic&#8217;s Skills system offers a similar pattern, letting you define reusable Markdown-based behaviors that agents invoke. 
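</p><p>A minimal Skill, for instance, is a folder containing a SKILL.md whose YAML frontmatter tells the agent when to invoke it (the structure follows Anthropic's Skills format; the body text here is an illustrative sketch):</p><pre><code>---
name: review-against-spec
description: Compare a finished change against SPEC.md and report unmet requirements
---

When invoked, read SPEC.md, check each stated requirement against the
current change, and output a checklist marking every item met or unmet.
</code></pre><p>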
By embedding your spec in these workflows, you ensure the agent can&#8217;t proceed until the spec is validated, and changes propagate automatically to task breakdowns and tests.</p><p><strong>Consider agents.md for specialized personas:</strong> For tools like GitHub Copilot, you can create <a href="https://github.blog/ai-and-ml/github-copilot/how-to-write-a-great-agents-md-lessons-from-over-2500-repositories/">agents.md files</a> that define specialized agent personas - a @docs-agent for technical writing, a @test-agent for QA, a @security-agent for code review. Each file acts as a focused spec for that persona&#8217;s behavior, commands, and boundaries. This is particularly useful when you want different agents for different tasks rather than one general-purpose assistant.</p><p><strong>Design for Agent Experience (AX):</strong> Just as we design APIs for developer experience (DX), consider designing specs for &#8220;Agent Experience.&#8221; This means clean, parseable formats: OpenAPI schemas for any APIs the agent will consume, llms.txt files that summarize documentation for LLM consumption, and explicit type definitions. The Agentic AI Foundation (AAIF) is standardizing protocols like MCP (Model Context Protocol) for tool integration - specs that follow these patterns are easier for agents to consume and act on reliably.</p><p><strong>PRD vs SRS mindset:</strong> It helps to borrow from established documentation practices. For AI agent specs, you&#8217;ll often blend these into one document (as illustrated above), but covering both angles serves you well. Writing it like a PRD ensures you include user-centric context (&#8220;the why behind each feature&#8221;) so the AI doesn&#8217;t optimize for the wrong thing. Expanding it like an SRS ensures you nail down the specifics the AI will need to actually generate correct code (like what database or API to use). 
Developers have found that this extra upfront effort pays off by drastically reducing miscommunications with the agent later.</p><p><strong>Make the spec a &#8220;living document&#8221;:</strong> Don&#8217;t write it and forget it. Update the spec as you and the agent make decisions or discover new info. If the AI had to change the data model or you decided to cut a feature, reflect that in the spec so it remains the ground truth. Think of it as version-controlled documentation. In <a href="https://github.blog/ai-and-ml/generative-ai/spec-driven-development-with-ai-get-started-with-a-new-open-source-toolkit/">spec-driven workflows</a>, the spec drives implementation, tests, and task breakdowns, and you don&#8217;t move to coding until the spec is validated. This habit keeps the project coherent, especially if you or the agent step away and come back later. Remember, the spec isn&#8217;t just for the AI - it helps you as the developer maintain oversight and ensure the AI&#8217;s work meets the real requirements.</p><h2><strong>3. Break tasks into modular prompts and context, not one big prompt</strong></h2><p><strong>Divide and conquer: give the AI one focused task at a time rather than a monolithic prompt with everything at once.</strong></p><p>Experienced AI engineers have learned that trying to stuff the entire project (all requirements, all code, all instructions) into a single prompt or agent message is a recipe for confusion. Not only do you risk hitting token limits, you also risk the model losing focus due to the &#8220;<a href="https://maxpool.dev/research-papers/curse_of_instructions_report.html">curse of instructions</a>&#8221; - too many directives causing it to follow none of them well. 
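</p><p>In practice, decomposition means trading one overloaded prompt for a short sequence of focused ones (the prompts below are illustrative):</p><pre><code># Instead of one prompt carrying ten rules at once:
"Build auth with JWT, validate emails, rate-limit login, log errors,
write tests, follow the style guide, never touch CI config, ..."

# Issue focused prompts in sequence:
1. "Implement the registration endpoint per SPEC.md section 3.1."
2. "Add email-format validation to registration and update its tests."
3. "Add rate limiting to the auth routes per the security constraints."
</code></pre><p>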
The solution is to design your spec and workflow in a modular way, tackling one piece at a time and pulling in only the context needed for that piece.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!BNjq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f49a9b2-c2fd-4e49-9452-73c9d3f88901_2784x1536.jpeg" width="1456" height="803" alt="Modular AI Specs" title="Modular AI Specs" loading="lazy"></figure></div><p><strong>The curse of too much context/instructions:</strong> Research has confirmed what many devs anecdotally saw: as you pile on more instructions or data into the prompt, the model&#8217;s performance in adhering to each one <a href="https://openreview.net/pdf/848f1332e941771aa491f036f6350af2effe0513.pdf">drops significantly</a>. One study dubbed this the &#8220;curse of instructions&#8221;, showing that even GPT-4 and Claude struggle when asked to satisfy many requirements simultaneously. In practical terms, if you present 10 bullet points of detailed rules, the AI might obey the first few and start overlooking others. The better strategy is iterative focus. <a href="https://maxpool.dev/research-papers/curse_of_instructions_report.html">Guidelines from industry</a> suggest decomposing complex requirements into sequential, simple instructions as a best practice. Focus the AI on one sub-problem at a time, get that done, then move on. 
This keeps the quality high and errors manageable.</p><p><strong>Divide the spec into phases or components:</strong> If your spec document is very long or covers a lot of ground, consider splitting it into parts (either physically separate files or clearly separate sections). For example, you might have a section for &#8220;Backend API Spec&#8221; and another for &#8220;Frontend UI Spec.&#8221; You don&#8217;t need to always feed the frontend spec to the AI when it&#8217;s working on the backend, and vice versa. Many devs using multi-agent setups even create separate agents or sub-processes for each part - e.g. one agent works on database/schema, another on API logic, another on frontend - each with the relevant slice of the spec. Even if you use a single agent, you can emulate this by copying only the relevant spec section into the prompt for that task. Avoid context overload: Don&#8217;t mix authentication tasks with database schema changes in one go, as the <a href="https://docs.digitalocean.com/products/gradient-ai-platform/concepts/context-management/">DigitalOcean AI guide</a> warns. Keep each prompt tightly scoped to the current goal.</p><p><strong>Extended TOC / Summaries for large specs:</strong> One clever technique is to have the agent build an extended Table of Contents with summaries for the spec. This is essentially a &#8220;spec summary&#8221; that condenses each section into a few key points or keywords, and references where details can be found. For example, if your full spec has a section on &#8220;Security Requirements&#8221; spanning 500 words, you might have the agent summarize it to: &#8220;Security: use HTTPS, protect API keys, implement input validation (see full spec &#167;4.2)&#8221;. By creating a hierarchical summary in the planning phase, you get a bird&#8217;s-eye view that can stay in the prompt, while the fine details remain offloaded unless needed. 
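</p><p>An extended TOC for the tasks-app spec might look like this sketch (section numbers are illustrative):</p><pre><code># Spec Index
1. Objective - task tracker for small teams; persistence required (§1)
2. Tech Stack - React 18 + TypeScript, Express, PostgreSQL (§2)
3. API - REST endpoints, JWT auth; contracts listed in full (§3)
4. Security - HTTPS only, protect API keys, validate all input (§4.2)
5. Testing - Jest; all tests must pass before commit (§5)
</code></pre><p>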
This extended TOC acts as an index: the agent can consult it and say &#8220;aha, there&#8217;s a security section I should look at&#8221;, and you can then provide that section on demand. It&#8217;s similar to how a human developer skims an outline and then flips to the relevant page of a spec document when working on a specific part.</p><p>To implement this, you can prompt the agent after writing the spec: &#8220;Summarize the spec above into a very concise outline with each section&#8217;s key points and a reference tag.&#8221; The result might be a list of sections with one or two sentence summaries. That summary can be kept in the system or assistant message to guide the agent&#8217;s focus without eating up too many tokens. This <a href="https://addyo.substack.com/p/context-engineering-bringing-engineering">hierarchical summarization approach</a> is known to help LLMs maintain long-term context by focusing on the high-level structure. The agent carries a &#8220;mental map&#8221; of the spec.</p><p><strong>Utilize sub-agents or &#8220;skills&#8221; for different spec parts:</strong> Another advanced approach is using multiple specialized agents (what Anthropic calls subagents or what you might call &#8220;skills&#8221;). Each subagent is configured for a specific area of expertise and given the portion of the spec relevant to that area. For instance, you might have a Database Designer subagent that only knows about the data model section of the spec, and an API Coder subagent that knows the API endpoints spec. The main agent (or an orchestrator) can route tasks to the appropriate subagent automatically. The benefit is each agent has a smaller context window to deal with and a more focused role, which can <a href="https://10xdevelopers.dev/structured/claude-code-with-subagents/">boost accuracy and allow parallel work</a> on independent tasks. Anthropic&#8217;s Claude Code supports this by letting you define subagents with their own system prompts and tools. 
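</p><p>In Claude Code, a subagent is defined in a Markdown file with YAML frontmatter (the file location and fields follow Claude Code's subagent format; the prompt body is an illustrative sketch):</p><pre><code># .claude/agents/db-designer.md
---
name: db-designer
description: Designs and migrates the database schema. Use for any data-model task.
tools: Read, Edit, Bash
---

You are the database specialist. Work only from the "Data Model" section
of SPEC.md. Never modify API handlers or frontend code.
</code></pre><p>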
&#8220;Each subagent has a specific purpose and expertise area, uses its own context window separate from the main conversation, and has a custom system prompt guiding its behavior,&#8221; as their docs describe. When a task comes up that matches a subagent&#8217;s domain, Claude can delegate that task to it, with the subagent returning results independently.</p><p><strong>Parallel agents for throughput:</strong> Running multiple agents simultaneously is emerging as &#8220;the next big thing&#8221; for developer productivity. Rather than waiting for one agent to finish before starting another task, you can spin up parallel agents for non-overlapping work. Willison describes this as &#8220;<a href="https://simonwillison.net/2025/Oct/7/vibe-engineering/">embracing parallel coding agents</a>&#8221; and notes it&#8217;s &#8220;surprisingly effective, if mentally exhausting&#8221;. The key is scoping tasks so agents don&#8217;t step on each other - one agent codes a feature while another writes tests, or separate components get built concurrently. Orchestration frameworks like LangGraph or OpenAI Swarm can help coordinate these agents, and shared memory via vector databases (like Chroma) lets them access common context without redundant prompting.</p><p><strong>Single vs. 
multi-agent: when to use each</strong></p><table><thead><tr><th>Aspect</th><th>Single Agent</th><th>Parallel/Multi-Agent</th></tr></thead><tbody><tr><td><strong>Strengths</strong></td><td>Simpler setup; lower overhead; easier to debug and follow</td><td>Higher throughput; handles complex interdependencies; specialists per domain</td></tr><tr><td><strong>Challenges</strong></td><td>Context overload on big projects; slower iteration; single point of failure</td><td>Coordination overhead; potential conflicts; needs shared memory (e.g., vector DBs)</td></tr><tr><td><strong>Best For</strong></td><td>Isolated modules; small-to-medium projects; early prototyping</td><td>Large codebases; one codes + one tests + one reviews; independent features</td></tr><tr><td><strong>Tips</strong></td><td>Use spec summaries; refresh context per task; start fresh sessions often</td><td>Limit to 2-3 agents initially; use MCP for tool sharing; define clear boundaries</td></tr></tbody></table><p>In practice, using subagents or skill-specific prompts might look like: you maintain multiple spec files (or prompt templates) - e.g. SPEC_backend.md, SPEC_frontend.md - and you tell the AI, &#8220;For backend tasks, refer to SPEC_backend; for frontend tasks refer to SPEC_frontend.&#8221; Or in a tool like Cursor/Claude, you actually spin up a subagent for each. This is certainly more complex to set up than a single-agent loop, but it mimics what human developers do - we mentally compartmentalize a large spec into relevant chunks (you don&#8217;t keep the whole 50-page spec in your head at once; you recall the part you need for the task at hand, and have a general sense of the overall architecture). The challenge, as noted, is managing interdependencies: the subagents must still coordinate (the frontend needs to know the API contract from the backend spec, etc.). A central overview (or an &#8220;architect&#8221; agent) can help by referencing the sub-specs and ensuring consistency.</p><p><strong>Focus each prompt on one task/section:</strong> Even without fancy multi-agent setups, you can manually enforce modularity. 
For example, after the spec is written, your next move might be: &#8220;Step 1: Implement the database schema.&#8221; You feed the agent the Database section of the spec only, plus any global constraints from the spec (like tech stack). The agent works on that. Then for Step 2, &#8220;Now implement the authentication feature&#8221;, you provide the Auth section of the spec and maybe the relevant parts of the schema if needed. By refreshing the context for each major task, you ensure the model isn&#8217;t carrying a lot of stale or irrelevant information that could distract it. As one guide suggests: &#8220;<a href="https://docs.digitalocean.com/products/gradient-ai-platform/concepts/context-management/">Start fresh: begin new sessions</a> to clear context when switching between major features&#8221;. You can always remind the agent of critical global rules (from the spec&#8217;s Constraints section) each time, but don&#8217;t shove the entire spec in if it&#8217;s not all needed.</p><p><strong>Use in-line directives and code TODOs:</strong> Another modularity trick is to use your code or spec as an active part of the conversation. For instance, scaffold your code with // TODO comments that describe what needs to be done, and have the agent fill them one by one. Each TODO essentially acts as a mini-spec for a small task. This keeps the AI laser-focused (&#8220;implement this specific function according to this spec snippet&#8221;) and you can iterate in a tight loop. It&#8217;s similar to giving the AI a checklist item to complete rather than the whole checklist at once.</p><p>The bottom line: small, focused context beats one giant prompt. This improves quality and keeps the AI from getting &#8220;overwhelmed&#8221; by too much at once. As one set of best practices sums up, provide &#8220;One Task Focus&#8221; and &#8220;Relevant info only&#8221; to the model, and avoid dumping everything everywhere. 
By structuring the work into modules - and using strategies like spec summaries or sub-spec agents - you&#8217;ll navigate around context size limits and the AI&#8217;s short-term memory cap. Remember, a well-fed AI is like a well-fed function: give it only the <a href="https://addyo.substack.com/p/context-engineering-bringing-engineering">inputs it needs for the job at hand</a>.</p><h2><strong>4. Build in self-checks, constraints, and human expertise</strong></h2><p><strong>Make your spec not just a to-do list for the agent, but also a guide for quality control - and don&#8217;t be afraid to inject your own expertise.</strong></p><p>A good spec for an AI agent anticipates where the AI might go wrong and sets up guardrails. It also takes advantage of what you know (domain knowledge, edge cases, &#8220;gotchas&#8221;) so the AI doesn&#8217;t operate in a vacuum. Think of the spec as both coach and referee for the AI: it should encourage the right approach and call out fouls.</p><p><strong>Use three-tier boundaries:</strong> The <a href="https://github.blog/ai-and-ml/github-copilot/how-to-write-a-great-agents-md-lessons-from-over-2500-repositories/">GitHub analysis of 2,500+ agent files</a> found that the most effective specs use a three-tier boundary system rather than a simple list of don&#8217;ts. 
This gives the agent clearer guidance on when to proceed, when to pause, and when to stop:</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!7iHk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff789bc70-7a7d-498b-b2c8-8e003e43682a_2784x1536.jpeg" width="1456" height="803" alt="Three tier boundaries for AI agent specs" title="Three tier boundaries for AI agent specs" loading="lazy"></figure></div><p><strong>&#9989; Always do:</strong> Actions the agent should take without asking. &#8220;Always run tests before commits.&#8221; &#8220;Always follow the naming conventions in the style guide.&#8221; &#8220;Always log errors to the monitoring service.&#8221;</p><p><strong>&#9888;&#65039; Ask first:</strong> Actions that require human approval. &#8220;Ask before modifying database schemas.&#8221; &#8220;Ask before adding new dependencies.&#8221; &#8220;Ask before changing CI/CD configuration.&#8221; This tier catches high-impact changes that might be fine but warrant a human check.</p><p><strong>&#128683; Never do:</strong> Hard stops. 
&#8220;Never commit secrets or API keys.&#8221; &#8220;Never edit node_modules/ or vendor/.&#8221; &#8220;Never remove a failing test without explicit approval.&#8221; &#8220;Never commit secrets&#8221; was the single most common helpful constraint in the study.</p><p>This three-tier approach is more nuanced than a flat list of rules. It acknowledges that some actions are always safe, some need oversight, and some are categorically off-limits. The agent can proceed confidently on &#8220;Always&#8221; items, flag &#8220;Ask first&#8221; items for review, and hard-stop on &#8220;Never&#8221; items.</p><p><strong>Encourage self-verification:</strong> One powerful pattern is to have the agent verify its work against the spec automatically. If your tooling allows, you can integrate checks like unit tests or linting that the AI can run after generating code. But even at the spec/prompt level, you can instruct the AI to double-check: e.g. &#8220;After implementing, compare the result with the spec and confirm all requirements are met. List any spec items that are not addressed.&#8221; This pushes the LLM to reflect on its output relative to the spec, catching omissions. It&#8217;s a form of self-audit built into the process.</p><p>For instance, you might append to a prompt: &#8220;(After writing the function, review the above requirements list and ensure each is satisfied, marking any missing ones).&#8221; The model will then (ideally) output the code followed by a short checklist indicating if it met each requirement. This reduces the chance it forgets something before you even run tests. 
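</p><p>At the prompt level you can template that footer so it is never forgotten. A minimal sketch in Python - the function name, requirement list, and exact wording here are illustrative, not any particular tool&#8217;s API:</p>

```python
def build_prompt(task: str, requirements: list[str]) -> str:
    """Wrap a task prompt with an explicit self-verification footer."""
    numbered = "\n".join(f"{i}. {r}" for i, r in enumerate(requirements, 1))
    return (
        f"{task}\n\n"
        f"Requirements:\n{numbered}\n\n"
        "After implementing, compare the result with the requirements above "
        "and list each one, marking it MET or MISSING, before you finish."
    )

# Hypothetical task and requirements, for illustration only.
prompt = build_prompt(
    "Implement a password-reset endpoint.",
    ["Hash tokens with bcrypt", "Expire tokens after 15 minutes"],
)
```

<p>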
It&#8217;s not foolproof, but it helps.</p><p><strong>LLM-as-a-Judge for subjective checks:</strong> For criteria that are hard to test automatically - code style, readability, adherence to architectural patterns - consider using &#8220;LLM-as-a-Judge.&#8221; This means having a second agent (or a separate prompt) review the first agent&#8217;s output against your spec&#8217;s quality guidelines. Anthropic and others have found this effective for subjective evaluation. You might prompt: &#8220;Review this code for adherence to our style guide. Flag any violations.&#8221; The judge agent returns feedback that either gets incorporated or triggers a revision. This adds a layer of semantic evaluation beyond syntax checks.</p><p><strong>Conformance testing:</strong> Willison advocates building conformance suites - language-independent tests (often YAML-based) that any implementation must pass. These act as a contract: if you&#8217;re building an API, the conformance suite specifies expected inputs/outputs, and the agent&#8217;s code must satisfy all cases. This is more rigorous than ad-hoc unit tests because it&#8217;s derived directly from the spec and can be reused across implementations. Include conformance criteria in your spec&#8217;s Success section (e.g., &#8220;Must pass all cases in conformance/api-tests.yaml&#8221;).</p><p><strong>Leverage testing in the spec:</strong> If possible, incorporate a test plan or even actual tests in your spec and prompt flow. In traditional development, we use TDD or write test cases to clarify requirements - you can do the same with AI. For example, in the spec&#8217;s Success Criteria, you might say &#8220;These sample inputs should produce these outputs&#8230;&#8221; or &#8220;the following unit tests should pass.&#8221; The agent can be prompted to run through those cases in its head or actually execute them if it has that capability. 
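</p><p>The conformance idea is easy to sketch. In practice the cases would live in a language-independent file like conformance/api-tests.yaml; they are inlined here so the sketch is self-contained, and the handle function is a stand-in for the implementation under test:</p>

```python
# Inlined stand-in for a YAML conformance suite such as conformance/api-tests.yaml.
CASES = [
    {"input": {"path": "/health"}, "expect": {"status": 200}},
    {"input": {"path": "/missing"}, "expect": {"status": 404}},
]

def handle(request: dict) -> dict:
    """Stand-in for the implementation under test."""
    return {"status": 200 if request["path"] == "/health" else 404}

def run_conformance(handler, cases) -> list[str]:
    """Return one failure message per case that does not match its expectation."""
    failures = []
    for i, case in enumerate(cases):
        got = handler(case["input"])
        if got != case["expect"]:
            failures.append(f"case {i}: expected {case['expect']}, got {got}")
    return failures
```

An empty failure list means the implementation satisfies the contract, and the same cases can be replayed unchanged against any other implementation.<p>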
Simon Willison noted that having a <a href="https://simonwillison.net/2025/Oct/7/vibe-engineering/">robust test suite</a> is like giving the agents superpowers - they can validate and iterate quickly when tests fail. In an AI coding context, writing a bit of pseudocode for tests or expected outcomes in the spec can guide the agent&#8217;s implementation. Additionally, you can use a dedicated &#8220;<a href="https://10xdevelopers.dev/structured/claude-code-with-subagents/">test agent</a>&#8221; in a subagent setup that takes the spec&#8217;s criteria and continuously verifies the &#8220;code agent&#8217;s&#8221; output.</p><p><strong>Bring your domain knowledge:</strong> Your spec should reflect insights that only an experienced developer or someone with context would know. For example, if you&#8217;re building an e-commerce agent and you know that &#8220;products&#8221; and &#8220;categories&#8221; have a many-to-many relationship, state that clearly (don&#8217;t assume the AI will infer it - it might not). If a certain library is notoriously tricky, mention pitfalls to avoid. Essentially, pour your mentorship into the spec. The spec can contain advice like &#8220;If using library X, watch out for memory leak issue in version Y (apply workaround Z).&#8221; This level of detail is what turns an average AI output into a truly robust solution, because you&#8217;ve steered the AI away from common traps.</p><p>Also, if you have preferences or style guidelines (say, &#8220;use functional components over class components in React&#8221;), encode that in the spec. The AI will then emulate your style. Many engineers even include small examples in the spec, e.g., &#8220;All API responses should be JSON. E.g. 
{"error": "message"} for errors.&#8221; By giving a quick example, you anchor the AI to the exact format you want.</p><p><strong>Minimalism for simple tasks:</strong> While we advocate thorough specs, part of expertise is knowing when to keep it simple. For relatively simple, isolated tasks, an overbearing spec can actually confuse more than help. If you&#8217;re asking the agent to do something straightforward (like &#8220;center a div on the page&#8221;), you might just say, &#8220;Make sure to keep the solution concise and do not add extraneous markup or styles.&#8221; No need for a full PRD there. Conversely, for complex tasks (like &#8220;implement an OAuth flow with token refresh and error handling&#8221;), that&#8217;s when you break out the detailed spec. A good rule of thumb: adjust spec detail to task complexity. Don&#8217;t under-spec a hard problem (the agent will flail or go off-track), but don&#8217;t over-spec a trivial one (the agent might get tangled or use up context on unnecessary instructions).</p><p><strong>Maintain the AI&#8217;s &#8220;persona&#8221; if needed:</strong> Sometimes, part of your spec is defining how the agent should behave or respond, especially if the agent interacts with users. For example, if building a customer support agent, your spec might include guidelines like &#8220;Use a friendly and professional tone&#8221; and &#8220;If you don&#8217;t know the answer, ask for clarification or offer to follow up, rather than guessing.&#8221; These kinds of rules (often included in system prompts) help keep the AI&#8217;s outputs aligned with expectations. They are essentially spec items for AI behavior. Keep them consistent and remind the model of them if needed in long sessions (LLMs can &#8220;drift&#8221; in style over time if not kept on a leash).</p><p><strong>You remain the exec in the loop:</strong> The spec empowers the agent, but you remain the ultimate quality filter.
If the agent produces something that technically meets the spec but doesn&#8217;t feel right, trust your judgement. Either refine the spec or directly adjust the output. The great thing about AI agents is they don&#8217;t get offended - if they deliver a design that&#8217;s off, you can say, &#8220;Actually, that&#8217;s not what I intended, let&#8217;s clarify the spec and redo it.&#8221; The spec is a living artifact in collaboration with the AI, not a one-time contract you can&#8217;t change.</p><p>Simon Willison humorously likened working with AI agents to &#8220;a very weird form of management&#8221; and even &#8220;getting good results out of a coding agent feels <a href="https://simonwillison.net/2025/Oct/7/vibe-engineering/">uncomfortably close to managing a human intern</a>&#8221;. You need to provide clear instructions (the spec), ensure they have the necessary context (the spec and relevant data), and give actionable feedback. The spec sets the stage, but monitoring and feedback during execution are key. If an AI was a &#8220;weird digital intern who will absolutely cheat if you give them a chance&#8221;, the spec and constraints you write are how you prevent that cheating and keep them on task.</p><p>Here&#8217;s the payoff: a good spec doesn&#8217;t just tell the AI what to build, it also helps it self-correct and stay within safe boundaries. By baking in verification steps, constraints, and your hard-earned knowledge, you drastically increase the odds that the agent&#8217;s output is correct on the first try (or at least much closer to correct). This reduces iterations and those &#8220;why on Earth did it do that?&#8221; moments.</p><h2><strong>5. 
Test, iterate, and evolve the spec (and use the right tools)</strong></h2><p><strong>Think of spec-writing and agent-building as an iterative loop: test early, gather feedback, refine the spec, and leverage tools to automate checks.</strong></p><p>The initial spec is not the end - it&#8217;s the beginning of a cycle. The best outcomes come when you continually verify the agent&#8217;s work against the spec and adjust accordingly. Also, modern AI devs use various tools to support this process (from CI pipelines to context management utilities).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nsmd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2a7534e-6004-4a41-b28e-c3e437c692af_1024x459.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nsmd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2a7534e-6004-4a41-b28e-c3e437c692af_1024x459.jpeg 424w, https://substackcdn.com/image/fetch/$s_!nsmd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2a7534e-6004-4a41-b28e-c3e437c692af_1024x459.jpeg 848w, https://substackcdn.com/image/fetch/$s_!nsmd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2a7534e-6004-4a41-b28e-c3e437c692af_1024x459.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!nsmd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2a7534e-6004-4a41-b28e-c3e437c692af_1024x459.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!nsmd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2a7534e-6004-4a41-b28e-c3e437c692af_1024x459.jpeg" width="1024" height="459" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a2a7534e-6004-4a41-b28e-c3e437c692af_1024x459.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:459,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Spec iteration loop: Test, Feedback, Refine, Tools&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Spec iteration loop: Test, Feedback, Refine, Tools" title="Spec iteration loop: Test, Feedback, Refine, Tools" srcset="https://substackcdn.com/image/fetch/$s_!nsmd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2a7534e-6004-4a41-b28e-c3e437c692af_1024x459.jpeg 424w, https://substackcdn.com/image/fetch/$s_!nsmd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2a7534e-6004-4a41-b28e-c3e437c692af_1024x459.jpeg 848w, https://substackcdn.com/image/fetch/$s_!nsmd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2a7534e-6004-4a41-b28e-c3e437c692af_1024x459.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!nsmd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2a7534e-6004-4a41-b28e-c3e437c692af_1024x459.jpeg 1456w" sizes="100vw" 
loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Continuous testing:</strong> Don&#8217;t wait until the end to see if the agent met the spec. After each major milestone or even each function, run tests or at least do quick manual checks. If something fails, update the spec or prompt before proceeding. For example, if the spec said &#8220;passwords must be hashed with bcrypt&#8221; and you see the agent&#8217;s code storing plain text - stop and correct it (and remind the spec or prompt about the rule). Automated tests shine here: if you provided tests (or write them as you go), let the agent run them. 
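</p><p>One minimal shape for that loop, with run_tests and ask_agent as hypothetical stand-ins for your test runner and your model call:</p>

```python
def agentic_loop(ask_agent, run_tests, max_rounds: int = 3) -> bool:
    """Code -> test -> fix -> repeat, until the tests pass or we give up."""
    feedback = None
    for _ in range(max_rounds):
        ask_agent(feedback)       # generate or revise code
        failures = run_tests()    # e.g. shell out to your test command
        if not failures:
            return True           # the spec's "done" criteria are met
        # Feed the failures back so the next round targets them.
        feedback = "Your output didn't meet spec:\n" + "\n".join(failures)
    return False
```

<p>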
In many coding agent setups, you can have an agent run npm test or similar after finishing a task. The results (failures) can then feed back into the next prompt, effectively telling the agent &#8220;your output didn&#8217;t meet spec on X, Y, Z - fix it.&#8221; This kind of agentic loop (code -&gt; test -&gt; fix -&gt; repeat) is extremely powerful and is how tools like Claude Code or Copilot Labs are evolving to handle larger tasks. Always define what &#8220;done&#8221; means (via tests or criteria) and check for it.</p><p><strong>Iterate on the spec itself:</strong> If you discover that the spec was incomplete or unclear (maybe the agent misunderstood something or you realized you missed a requirement), update the spec document. Then explicitly re-sync the agent with the new spec: &#8220;I have updated the spec as follows&#8230; Given the updated spec, adjust the plan or refactor the code accordingly.&#8221; This way the spec remains the single source of truth. It&#8217;s similar to how we handle changing requirements in normal dev - but in this case you&#8217;re also the product manager for your AI agent. Keep version history if possible (even just via commit messages or notes), so you know what changed and why.</p><p><strong>Utilize context-management and memory tools:</strong> There&#8217;s a growing ecosystem of tools to help manage AI agent context and knowledge. For instance, retrieval-augmented generation (RAG) is a pattern where the agent can pull in relevant chunks of data from a knowledge base (like a vector database) on the fly. If your spec is huge, you could embed sections of it and let the agent retrieve the most relevant parts when needed, instead of always providing the whole thing. There are also frameworks implementing the Model Context Protocol (MCP), which automates feeding the right context to the model based on the current task. 
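</p><p>A rudimentary version of that retrieval can be a few lines: score each spec section by word overlap with the current task. This is a toy sketch - production systems use embeddings, and the section names below are invented:</p>

```python
import re

def words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def most_relevant_section(task: str, sections: dict[str, str]) -> str:
    """Pick the section whose title plus body shares the most words with the task."""
    task_words = words(task)
    return max(sections, key=lambda t: len(task_words & words(t + " " + sections[t])))

# Hypothetical spec, split on its headings.
spec = {
    "Payments": "Charge cards via the payment gateway; retry on timeout.",
    "Auth": "Sessions expire after 30 minutes; hash passwords with bcrypt.",
}
```

<p>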
One example is <a href="https://docs.digitalocean.com/products/gradient-ai-platform/concepts/context-management/">Context7</a> (context7.com), which can auto-fetch relevant context snippets from docs based on what you&#8217;re working on. In practice, this might mean the agent notices you&#8217;re working on &#8220;payment processing&#8221; and it pulls the &#8220;Payments&#8221; section of your spec or documentation into the prompt. Consider leveraging such tools or setting up a rudimentary version (even a simple search in your spec document).</p><p><strong>Parallelize carefully:</strong> Some developers run multiple agent instances in parallel on different tasks (as mentioned earlier with subagents). This can speed up development - e.g., one agent generates code while another simultaneously writes tests, or two features are built concurrently. If you go this route, ensure the tasks are truly independent or clearly separated to avoid conflicts (the spec should note any dependencies). For example, don&#8217;t have two agents writing to the same file at once. One workflow is to have an agent generate code and another review it in parallel, or to have separate components built that integrate later. This is advanced usage and can be mentally taxing to manage (as Willison admitted, running multiple agents is <a href="https://simonwillison.net/2025/Oct/7/vibe-engineering/">surprisingly effective, if mentally exhausting</a>!). Start with at most 2-3 agents to keep things manageable.</p><p><strong>Version control and spec locks:</strong> Use Git or your version control of choice to track what the agent does. <a href="https://simonwillison.net/2025/Oct/7/vibe-engineering/">Good version control habits</a> matter even more with AI assistance. Commit the spec file itself to the repo. This not only preserves history, but the agent can even use git diff or blame to understand changes (LLMs are quite capable of reading diffs). 
Some advanced agent setups let the agent query the VCS history to see when something was introduced - surprisingly, models can be &#8220;fiercely competent at Git&#8221;. By keeping your spec in the repo, you allow both you and the AI to track evolution. There are tools (like GitHub Spec Kit mentioned earlier) that integrate spec-driven development into the git workflow - for instance, gating merges on updated specs or generating checklists from spec items. While you don&#8217;t need those tools to succeed, the takeaway is to treat the spec like code - maintain it diligently.</p><p><strong>Cost and speed considerations:</strong> Working with large models and long contexts can be slow and expensive. A practical tip is to use model selection and batching smartly. Perhaps use a cheaper/faster model for initial drafts or repetitions, and reserve the most capable (and expensive) model for final outputs or complex reasoning. Some developers use GPT-4 or Claude for planning and critical steps, but offload simpler expansions or refactors to a local model or a smaller API model. If using multiple agents, maybe not all need to be top-tier; a test-running agent or a linter agent could be a smaller model. Also consider throttling context size: don&#8217;t feed 20k tokens if 5k will do. As we discussed, <a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents">more tokens can mean diminishing returns</a>.</p><p><strong>Monitor and log everything:</strong> In complex agent workflows, logging the agent&#8217;s actions and outputs is essential. Check the logs to see if the agent is deviating or encountering errors. Many frameworks provide trace logs or allow printing the agent&#8217;s chain-of-thought (especially if you prompt it to think step-by-step). Reviewing these logs can highlight where the spec or instructions might have been misinterpreted. 
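</p><p>Even a thin wrapper that records every tool call pays for itself the first time something goes wrong. A sketch, with read_file as a hypothetical agent tool:</p>

```python
import time

def logged(tool, log: list):
    """Record each call's tool name, arguments, result or error, and timestamp."""
    def wrapper(*args, **kwargs):
        entry = {"tool": tool.__name__, "args": args, "ts": time.time()}
        try:
            entry["result"] = tool(*args, **kwargs)
            return entry["result"]
        except Exception as exc:
            entry["error"] = repr(exc)
            raise
        finally:
            log.append(entry)
    return wrapper

def read_file(path: str) -> str:  # hypothetical agent tool
    return f"<contents of {path}>"

log: list[dict] = []
read_file = logged(read_file, log)
read_file("spec.md")
```

<p>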
It&#8217;s not unlike debugging a program - except the &#8220;program&#8221; is the conversation/prompt chain. If something weird happens, go back to the spec/instructions to see if there was ambiguity.</p><p><strong>Learn and improve:</strong> Finally, treat each project as a learning opportunity to refine your spec-writing skill. Maybe you&#8217;ll discover that a certain phrasing consistently confuses the AI, or that organizing spec sections in a certain way yields better adherence. Incorporate those lessons into the next spec. The field of AI agents is rapidly evolving, so new best practices (and tools) emerge constantly. Stay updated via blogs (like the ones by Simon Willison, Andrej Karpathy, etc.), and don&#8217;t hesitate to experiment.</p><p>A spec for an AI agent isn&#8217;t &#8220;write once, done.&#8221; It&#8217;s part of a continuous cycle of instructing, verifying, and refining. The payoff for this diligence is substantial: by catching issues early and keeping the agent aligned, you avoid costly rewrites or failures later. As one AI engineer quipped, using these practices can feel like having &#8220;an army of interns&#8221; working for you, but you have to manage them well. A good spec, continuously maintained, is your management tool.</p><h2><strong>Avoid common pitfalls</strong></h2><p>Before wrapping up, it&#8217;s worth calling out anti-patterns that can derail even well-intentioned spec-driven workflows. The <a href="https://github.blog/ai-and-ml/github-copilot/how-to-write-a-great-agents-md-lessons-from-over-2500-repositories/">GitHub study of 2,500+ agent files</a> revealed a stark divide: &#8220;Most agent files fail because they&#8217;re too vague.&#8221; Here are the mistakes to avoid:</p><p><strong>Vague prompts:</strong> &#8220;Build me something cool&#8221; or &#8220;Make it work better&#8221; gives the agent nothing to anchor on. 
As Baptiste Studer puts it: &#8220;Vague prompts mean wrong results.&#8221; Be specific about inputs, outputs, and constraints. &#8220;You are a helpful coding assistant&#8221; doesn&#8217;t work. &#8220;You are a test engineer who writes tests for React components, follows these examples, and never modifies source code&#8221; does.</p><p><strong>Overlong contexts without summarization:</strong> Dumping 50 pages of documentation into a prompt and hoping the model figures it out rarely works. Use hierarchical summaries (as discussed in Principle 3) or RAG to surface only what&#8217;s relevant. Context length is not a substitute for context quality.</p><p><strong>Skipping human review:</strong> Willison has a personal rule: &#8220;I won&#8217;t commit code I couldn&#8217;t explain to someone else.&#8221; Just because the agent produced something that passes tests doesn&#8217;t mean it&#8217;s correct, secure, or maintainable. Always review critical code paths. The &#8220;house of cards&#8221; metaphor applies: AI-generated code can look solid but collapse under edge cases you didn&#8217;t test.</p><p><strong>Conflating vibe coding with production engineering:</strong> Rapid prototyping with AI (&#8220;vibe coding&#8221;) is great for exploration and throwaway projects. But shipping that code to production without rigorous specs, tests, and review is asking for trouble. I distinguish &#8220;vibe coding&#8221; from &#8220;AI-assisted engineering&#8221; - the latter requires the discipline this guide describes. Know which mode you&#8217;re in.</p><p><strong>Ignoring the &#8220;lethal trifecta&#8221;:</strong> Willison&#8217;s &#8220;lethal trifecta&#8221; is an agent that combines access to private data, exposure to untrusted content, and the ability to communicate externally - a recipe for prompt injection and data exfiltration, so constrain at least one leg of it. Alongside that, three practical properties make agents risky: speed (they work faster than you can review), non-determinism (same input, different outputs), and cost pressure (encouraging corner-cutting on verification). Your spec and review process must account for all of these.
Don&#8217;t let speed outpace your ability to verify.</p><p><strong>Missing the six core areas:</strong> If your spec doesn&#8217;t cover commands, testing, project structure, code style, git workflow, and boundaries, you&#8217;re likely missing something the agent needs. Use the six-area checklist from Section 2 as a sanity check before handing off to the agent.</p><h2><strong>Conclusion</strong></h2><p>Writing an effective spec for AI coding agents requires solid software engineering principles combined with adaptation to LLM quirks. Start with clarity of purpose and let the AI help expand the plan. Structure the spec like a serious design document - covering the six core areas and integrating it into your toolchain so it becomes an executable artifact, not just prose. Keep the agent&#8217;s focus tight by feeding it one piece of the puzzle at a time (and consider clever tactics like summary TOCs, subagents, or parallel orchestration to handle big specs). Anticipate pitfalls by including three-tier boundaries (Always/Ask first/Never), self-checks, and conformance tests - essentially, teach the AI how to not fail. And treat the whole process as iterative: use tests and feedback to refine both the spec and the code continuously.</p><p>Follow these guidelines and your AI agent will be far less likely to &#8220;break down&#8221; under large contexts or wander off into nonsense.</p><p>Happy spec-writing!</p><div><hr></div><p><em>I&#8217;m excited to share I&#8217;ve released a new <a href="https://beyond.addy.ie/">AI-assisted engineering book</a> with O&#8217;Reilly. 
There are a number of free tips on the book site in case interested.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qALe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f80e0c9-a1ae-468a-a20b-7bfdd64d1cab_2400x1350.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qALe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f80e0c9-a1ae-468a-a20b-7bfdd64d1cab_2400x1350.png 424w, https://substackcdn.com/image/fetch/$s_!qALe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f80e0c9-a1ae-468a-a20b-7bfdd64d1cab_2400x1350.png 848w, https://substackcdn.com/image/fetch/$s_!qALe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f80e0c9-a1ae-468a-a20b-7bfdd64d1cab_2400x1350.png 1272w, https://substackcdn.com/image/fetch/$s_!qALe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f80e0c9-a1ae-468a-a20b-7bfdd64d1cab_2400x1350.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qALe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f80e0c9-a1ae-468a-a20b-7bfdd64d1cab_2400x1350.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f80e0c9-a1ae-468a-a20b-7bfdd64d1cab_2400x1350.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:545033,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/184990361?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f80e0c9-a1ae-468a-a20b-7bfdd64d1cab_2400x1350.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qALe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f80e0c9-a1ae-468a-a20b-7bfdd64d1cab_2400x1350.png 424w, https://substackcdn.com/image/fetch/$s_!qALe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f80e0c9-a1ae-468a-a20b-7bfdd64d1cab_2400x1350.png 848w, https://substackcdn.com/image/fetch/$s_!qALe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f80e0c9-a1ae-468a-a20b-7bfdd64d1cab_2400x1350.png 1272w, https://substackcdn.com/image/fetch/$s_!qALe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f80e0c9-a1ae-468a-a20b-7bfdd64d1cab_2400x1350.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p>]]></content:encoded></item><item><title><![CDATA[Code Review in the Age of AI]]></title><description><![CDATA[AI writes faster. Humans still have to prove it works.]]></description><link>https://addyo.substack.com/p/code-review-in-the-age-of-ai</link><guid isPermaLink="false">https://addyo.substack.com/p/code-review-in-the-age-of-ai</guid><dc:creator><![CDATA[Addy Osmani]]></dc:creator><pubDate>Mon, 05 Jan 2026 15:30:39 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ggv3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe894c835-4aef-4558-b047-83acb3be2053_7838x7838.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>AI did not kill code review. It made the burden of proof explicit. 
Ship changes with evidence like manual verification and automated tests, then use review for risk, intent, and accountability. Solo developers lean on automation to keep up with AI speed, while teams use review to build shared context and ownership.</strong></p><p>If your pull request doesn&#8217;t contain evidence that it works, you&#8217;re not shipping faster - you&#8217;re just moving work downstream.</p><p>By early 2026,<a href="https://www.infoworld.com/article/4049949/senior-developers-let-ai-do-more-of-the-coding-survey.html"> over 30% of senior developers</a> report shipping mostly AI-generated code. The challenge? AI excels at drafting features but falters on logic, security, and edge cases - <a href="https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report">logic errors alone are 75% more common than in human-written code</a>. This splits workflows: solos &#8220;vibe&#8221; at inference speed with test suites as backstops, while teams demand human eyes for context and compliance. Done right, both treat AI as an accelerator, but verification - who, what, and when - defines the difference.</p><p>As I&#8217;ve said before: if you haven&#8217;t seen the code do the right thing yourself, it doesn&#8217;t work.
AI amplifies this rule, not excuses it.</p><h2><strong>How developers use AI for review</strong></h2><ul><li><p><strong>Ad-hoc LLM checks</strong>: Paste diffs into Claude, Gemini or GPT for quick bug/style scans before committing.</p></li><li><p><strong>IDE integrations</strong>: Tools like Cursor, Claude Code, or Gemini CLI for inline suggestions and refactors during coding.</p></li><li><p><strong>PR bots and scanners</strong>: GitHub Copilot or custom agents to flag issues in PRs; pair with static/dynamic analysis like Snyk for security.</p></li><li><p><strong>Automated testing loops</strong>: Use AI to generate and run tests, enforcing coverage &gt;70% as a gate.</p></li><li><p><strong>Multi-model reviews</strong>: Run code through different LLMs (e.g., Claude for generation, a security-focused model for audit) to catch biases.</p></li></ul><p>The workflow and mindset differ dramatically depending on whether you&#8217;re solo or working in a team where others maintain your code.</p><h2><strong>Solo vs. 
Team: A quick comparison</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NNHx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60d29555-37cb-44ed-b012-3fb4d5d73fb6_2528x896.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NNHx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60d29555-37cb-44ed-b012-3fb4d5d73fb6_2528x896.png 424w, https://substackcdn.com/image/fetch/$s_!NNHx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60d29555-37cb-44ed-b012-3fb4d5d73fb6_2528x896.png 848w, https://substackcdn.com/image/fetch/$s_!NNHx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60d29555-37cb-44ed-b012-3fb4d5d73fb6_2528x896.png 1272w, https://substackcdn.com/image/fetch/$s_!NNHx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60d29555-37cb-44ed-b012-3fb4d5d73fb6_2528x896.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NNHx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60d29555-37cb-44ed-b012-3fb4d5d73fb6_2528x896.png" width="1456" height="516" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/60d29555-37cb-44ed-b012-3fb4d5d73fb6_2528x896.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:516,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:247807,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/183169342?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60d29555-37cb-44ed-b012-3fb4d5d73fb6_2528x896.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NNHx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60d29555-37cb-44ed-b012-3fb4d5d73fb6_2528x896.png 424w, https://substackcdn.com/image/fetch/$s_!NNHx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60d29555-37cb-44ed-b012-3fb4d5d73fb6_2528x896.png 848w, https://substackcdn.com/image/fetch/$s_!NNHx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60d29555-37cb-44ed-b012-3fb4d5d73fb6_2528x896.png 1272w, https://substackcdn.com/image/fetch/$s_!NNHx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60d29555-37cb-44ed-b012-3fb4d5d73fb6_2528x896.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2><strong>Solo Devs: Shipping at &#8220;inference speed&#8221;</strong></h2><p><strong>Solo developers increasingly &#8220;trust the vibe&#8221; of AI-generated code - shipping features rapidly by reviewing only the key parts and relying on tests to catch issues.</strong></p><p>This workflow treats coding agents as powerful interns that can handle massive refactors largely on their own. As<a href="https://blog.kilo.ai/p/senior-engineers-use-ai-now"> Peter Steinberger admits</a>: <em>&#8220;I don&#8217;t read much code anymore.
I watch the stream and sometimes look at key parts, but most code I don&#8217;t read.&#8221;</em> The bottleneck becomes<a href="https://steipete.me/posts/2025/shipping-at-inference-speed"> inference time</a> - waiting for the AI to generate output - not typing.</p><p><strong>There&#8217;s a catch: perceived speed gains vanish without strong testing practices.</strong> Build those first. If you skip review, you don&#8217;t eliminate work - you defer it. The developers who succeed with AI at high velocity aren&#8217;t the ones who blindly trust it; they&#8217;re the ones who&#8217;ve built verification systems that catch issues before they reach production.</p><p>That isn&#8217;t to say solos throw caution to the wind. The responsible ones employ <strong>extensive automated testing as a safety net</strong> - aiming for high coverage (often &gt;70%) and using AI to generate tests that catch bugs in real-time. Modern coding agents are surprisingly good at designing sophisticated end-to-end tests.</p><p><strong>For solos, the game-changer is language-independent, data-driven tests.</strong> If comprehensive, they let an agent build (or fix) implementations in any language, verifying as it goes. I start projects with a spec.md the AI drafts, approve it, then loop: write &#8594; test &#8594; fix.</p><p>Crucially, solo coders still do <strong>manual testing and critical reasoning</strong> on the final product. Run the application, click through the UI, use the feature yourself. When higher stakes are involved, read more code and add extra checks. 
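</p><p><em>The language-independent, data-driven loop described above can be sketched in miniature: the cases live in plain data, so an agent can reimplement the function in any language and verify against the same cases. This is an illustrative sketch, not code from the article - the function, case file, and field names are all hypothetical.</em></p>

```python
import json

# Language-neutral test data. In a real project this would live in a
# cases.json file next to the spec.md; it is inlined here for illustration.
CASES = json.loads("""
[
  {"name": "adds_tax",  "input": {"price": 100, "rate": 0.2}, "expected": 120.0},
  {"name": "zero_rate", "input": {"price": 50,  "rate": 0.0}, "expected": 50.0}
]
""")

def price_with_tax(price, rate):
    # Implementation under test. An agent can rewrite this in any
    # language, as long as the same cases still pass.
    return round(price * (1 + rate), 2)

def run_cases():
    # Collect (name, got, expected) for every failing case.
    failures = []
    for case in CASES:
        got = price_with_tax(**case["input"])
        if got != case["expected"]:
            failures.append((case["name"], got, case["expected"]))
    return failures

if __name__ == "__main__":
    print(run_cases())  # an empty list means every case passed
```

<p>Because the cases are data rather than code, the same file can gate a rewrite in TypeScript, Go, or anything else the agent chooses.</p><p>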
And despite moving fast, fix ugly code when you see it rather than letting the mess accumulate.</p><p>Even in this bleeding-edge paradigm:<a href="https://simonwillison.net/2025/Dec/18/code-proven-to-work/"> </a><em><a href="https://simonwillison.net/2025/Dec/18/code-proven-to-work/">your job is to deliver code you have proven to work</a>.</em></p><h2><strong>Teams: AI shifts review bottlenecks</strong></h2><p><strong>In team settings, AI is a powerful assistant for code review, but </strong><em><strong>cannot replace</strong></em><strong> the human judgment needed for quality, security, and maintainability.</strong></p><p>When multiple engineers collaborate, the cost of mistakes and longevity of code are much higher concerns. Teams have started using AI-based review bots for an initial pass on PRs, but they still require a human to sign off. As<a href="https://devclass.com/2025/03/19/graphite-debuts-diamond-ai-code-reviewer-insists-ai-will-never-replace-human-code-review/"> Greg Foster of Graphite</a> puts it: <em>&#8220;I don&#8217;t ever see [AI agents] becoming a stand-in for an actual human engineer signing off on a pull request.&#8221;</em></p><p><strong>The biggest practical problem isn&#8217;t that AI reviewers miss style issues - it&#8217;s that AI increases volume and shifts the burden onto humans.</strong><a href="https://jellyfish.co/blog/ai-assisted-pull-requests-are-18-larger/"> PRs are getting larger</a> (~18% more additions as AI adoption increases),<a href="https://www.cortex.io/post/ai-is-making-engineering-faster-but-not-better-state-of-ai-benchmark-2026"> incidents per PR are up ~24%, and change failure rates up ~30%</a>. When output increases faster than verification capacity, review becomes the rate limiter. 
As Foster notes: <em>&#8220;If we&#8217;re shipping code that&#8217;s never actually read or understood by a fellow human, we&#8217;re running a huge risk.&#8221;</em></p><p>In teams, AI floods reviewers with volume, so enforce incrementalism: break agent output into digestible commits. Human sign-off isn&#8217;t going away - it&#8217;s evolving to focus on what AI misses, like roadmap alignment and institutional context that AI can&#8217;t grasp.</p><h3><strong>Security: AI&#8217;s predictable weaknesses</strong></h3><p><strong>One area where human oversight is absolutely non-negotiable is security.</strong><a href="https://www.veracode.com/blog/ai-generated-code-security-risks/"> Approximately 45% of AI-generated code contains security flaws</a>.<a href="https://dl.acm.org/doi/10.1145/3716848"> Logic errors appear at 1.75&#215; the rate of human-written code, and XSS vulnerabilities occur at 2.74&#215; higher frequency</a>.</p><p>Beyond code issues,<a href="https://www.tomshardware.com/tech-industry/cyber-security/researchers-uncover-critical-ai-ide-flaws-exposing-developers-to-data-theft-and-rce"> agentic tooling and AI-integrated IDEs have created new attack paths</a> - prompt injection, data exfiltration, even RCE vulnerabilities. AI expands attack surfaces, so hybrid approaches win: AI flags, humans verify.</p><p><strong>Rule: If code touches auth, payments, secrets, or untrusted input, treat AI as a high-speed intern and require a human threat model review plus a security tool pass before merge.</strong></p><h3><strong>Review as knowledge transfer</strong></h3><p><strong>Code review is also how teams share system context. If AI writes the code and nobody can explain it, on-call becomes expensive.</strong></p><p>When a developer submits AI-generated code they don&#8217;t fully understand, they&#8217;re breaking the knowledge transfer mechanism that makes teams resilient.
If the original author can&#8217;t explain why the code works, how will the on-call engineer debug it at 2 AM?</p><p><a href="https://devclass.com/2025/11/27/ocaml-maintainers-reject-massive-ai-generated-pull-request/">The OCaml maintainers&#8217; rejection of a 13,000-line AI-generated PR</a> crystallizes this issue. The code wasn&#8217;t necessarily bad, but no one had bandwidth to review such a huge change, and reviewing AI-generated code is <em>&#8220;more taxing&#8221;</em> than reviewing human code. The lesson: <strong>AI can flood you with code, but teams must manage volume to avoid a review bottleneck.</strong></p><h3><strong>Making AI review tools work</strong></h3><p>User experiences with AI review tools are decidedly mixed. On the positive side, teams report catching 95%+ of bugs in some cases - null pointer exceptions, missing test coverage, anti-patterns. On the negative side, some developers dismiss AI review comments as &#8220;text noise&#8221; - generic observations that add no value.</p><p><strong>The lesson: AI review tools require thoughtful configuration.</strong> Tune sensitivity levels, disable unhelpful comment types, and establish clear opt-in/opt-out policies. Properly configured,<a href="https://graphite.dev/guides/what-is-ai-code-review"> AI reviewers can catch 70-80% of low-hanging fruit</a>, freeing humans to focus on architecture and business logic.</p><p>Many teams encourage <strong>smaller, stackable pull requests</strong> even if AI could do a giant change all at once.<a href="https://medium.com/@addyosmani/my-llm-coding-workflow-going-into-2026-52fe1681325e"> Commit early and often</a> - treat each self-contained change as a separate commit/PR with clear messages.</p><p>Importantly, <strong>teams maintain a hard line of human accountability.</strong> No matter how much AI contributed, a human must take responsibility. As an old IBM training saying goes: <em>&#8220;A computer can never be held accountable. 
That&#8217;s your job as the human in the loop.&#8221;</em></p><h2><strong>The PR Contract: What authors owe reviewers</strong></h2><p><strong>Whether solo or in a team, the emerging best practice is to<a href="https://addyo.substack.com/p/treat-ai-generated-code-as-a-draft"> treat AI-generated code as a helpful draft</a> that </strong><em><strong>must</strong></em><strong> be verified.</strong></p><p>The most successful teams have converged on a simple framework:</p><h3><strong>PR Contract</strong></h3><ol><li><p><strong>What/why</strong>: Intent in 1-2 sentences.</p></li><li><p><strong>Proof it works</strong>: Tests passed, manual steps (screenshots/logs).</p></li><li><p><strong>Risk + AI role</strong>: Tier and which parts were AI-generated (e.g., high=payments).</p></li><li><p><strong>Review focus</strong>: 1-2 areas for human input (e.g., architecture).</p></li></ol><p>This isn&#8217;t bureaucracy - it&#8217;s respect for reviewer time and a forcing function for author accountability. If you can&#8217;t fill this out, you don&#8217;t understand your own change well enough to ask someone else to approve it.</p><h3><strong>Core Principles</strong></h3><p><strong>Insist on proof, not promises.</strong> Make &#8220;working code&#8221; the baseline. Prompt AI agents to execute code or run unit tests after generation. Demand evidence: logs, screenshots, results. <strong>No PR goes up without either new tests or a demo of the change working.</strong></p><p><strong>Use AI as first-pass reviewer, not final arbiter.</strong> Treat AI review output as advisory - a dialog where one AI writes code, another reviews it, and the human orchestrates fixes. Think of AI reviews as spellcheck, not an editor.</p><p><strong>Focus human review on what AI misses.</strong> Does the change introduce a security hole? Does it duplicate existing code (a common AI flaw)? Is the approach maintainable? 
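</p><p><em>Codified, the four-point PR Contract above becomes a pull request template. A minimal sketch - the section wording is illustrative, not a standard:</em></p>

```markdown
## What / why
<!-- Intent in 1-2 sentences -->

## Proof it works
- [ ] Tests added or updated, and passing (link the CI run)
- [ ] Manual verification done (attach screenshots or logs)

## Risk + AI role
<!-- Risk tier (low / medium / high) and which parts were AI-generated -->

## Review focus
<!-- 1-2 areas where human judgment is most needed -->
```

<p>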
<strong>AI triages the easy stuff; humans tackle the hard stuff.</strong></p><p><strong>Enforce incremental development.</strong> Break work into small pieces - easier for AI to produce and for humans to review. Small commits with clear messages serve as checkpoints. <strong>Never commit code you can&#8217;t explain.</strong></p><p><strong>Maintain high testing standards.</strong><a href="https://medium.com/@addyosmani/my-llm-coding-workflow-going-into-2026-52fe1681325e"> Those who get the most out of coding agents</a> have strong testing practices. Ask AI to draft tests - it&#8217;s good at generating edge-case tests you might not think of.</p><h2><strong>Looking Ahead: The bottleneck has moved</strong></h2><p><strong>AI is transforming code review from line-by-line gatekeeping into higher-level quality control - but human judgment remains the safety-critical component.</strong></p><p>What we&#8217;re seeing is workflow evolution, not elimination. Code reviews now involve reviewing a <em>conversation</em> or <em>plan</em> between AI and author as much as the code diff itself. The human reviewer&#8217;s role becomes more like an editor or architect: focusing on what&#8217;s important and trusting automation for mundane checks.</p><p>For solo developers, the path ahead is exhilarating - new tools will further streamline development. Even then, the wise developer will &#8220;trust but verify.&#8221;</p><p>In larger teams, expect growing emphasis on AI governance. Companies will formalize policies about AI contributions, requiring sign-offs that code was reviewed by an employee. Roles like &#8220;AI code auditor&#8221; will emerge. Enterprise platforms will evolve to offer better multi-repository context and custom policy enforcement.</p><p><strong>No matter the advances, the core principle remains</strong>: code review ensures software meets requirements, is secure, robust, and maintainable. 
AI doesn&#8217;t change those fundamentals - it just changes how we get there.</p><p>The bottleneck moved from writing code to proving it works. The best code reviewers in the age of AI will embrace this shift - letting AI accelerate the mechanical work while holding the line on accountability. They&#8217;ll let AI <strong>accelerate</strong> the process, never <strong>abdicate</strong> it. As engineers are learning, it&#8217;s about<a href="https://blog.kilo.ai/p/senior-engineers-use-ai-now"> </a><em><a href="https://blog.kilo.ai/p/senior-engineers-use-ai-now">&#8220;proof over vibes&#8221;</a></em> in coding.</p><p>Code review isn&#8217;t dead but it&#8217;s becoming more <strong>strategic</strong>. And whether you&#8217;re a solo hacker deploying at 2 AM or a team lead signing off a critical system change, one truth holds: the <em>human</em> is ultimately responsible for what the AI delivers.</p><p>Embrace the AI, but never forget to <strong>double-check the work.</strong></p><div><hr></div><p><em>I&#8217;m excited to share I&#8217;ve released a new<a href="https://beyond.addy.ie/"> AI-assisted engineering book</a> with O&#8217;Reilly. 
There are a number of free tips on the book site if you&#8217;re interested.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ggv3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe894c835-4aef-4558-b047-83acb3be2053_7838x7838.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ggv3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe894c835-4aef-4558-b047-83acb3be2053_7838x7838.png 424w, https://substackcdn.com/image/fetch/$s_!ggv3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe894c835-4aef-4558-b047-83acb3be2053_7838x7838.png 848w, https://substackcdn.com/image/fetch/$s_!ggv3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe894c835-4aef-4558-b047-83acb3be2053_7838x7838.png 1272w, https://substackcdn.com/image/fetch/$s_!ggv3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe894c835-4aef-4558-b047-83acb3be2053_7838x7838.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ggv3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe894c835-4aef-4558-b047-83acb3be2053_7838x7838.png" width="1456" height="1456"
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e894c835-4aef-4558-b047-83acb3be2053_7838x7838.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12838196,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/183169342?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe894c835-4aef-4558-b047-83acb3be2053_7838x7838.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ggv3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe894c835-4aef-4558-b047-83acb3be2053_7838x7838.png 424w, https://substackcdn.com/image/fetch/$s_!ggv3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe894c835-4aef-4558-b047-83acb3be2053_7838x7838.png 848w, https://substackcdn.com/image/fetch/$s_!ggv3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe894c835-4aef-4558-b047-83acb3be2053_7838x7838.png 1272w, https://substackcdn.com/image/fetch/$s_!ggv3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe894c835-4aef-4558-b047-83acb3be2053_7838x7838.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p>]]></content:encoded></item><item><title><![CDATA[How Good Is AI at Coding React (Really)?]]></title><description><![CDATA[A data-driven look at what AI can and can&#8217;t do for React developers - and what you can do about it]]></description><link>https://addyo.substack.com/p/how-good-is-ai-at-coding-react-really</link><guid isPermaLink="false">https://addyo.substack.com/p/how-good-is-ai-at-coding-react-really</guid><dc:creator><![CDATA[Addy Osmani]]></dc:creator><pubDate>Mon, 29 Dec 2025 15:31:06 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!4Pug!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdaf33944-9ed1-4ea4-9531-0b9d47869b1c_1880x1046.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>tl;dr: AI coding benchmarks
show models excel at isolated React tasks like scaffolding components or implementing explicit specs, achieving ~40% success, but success drops to ~25% on multi-step integrations due to a &#8220;complexity cliff&#8221; in state management and design taste. The gap between &#8220;AI helped me ship&#8221; and &#8220;AI gave me a mess&#8221; is context engineering and explicit constraints. Deep React and domain knowledge enable you to spot when AI goes off the rails and understand </strong><em><strong>why</strong></em><strong> it repeats mistakes. Guide it without blindly accepting the output.</strong></p><p>This article is based on my closing keynote at React Summit by <a href="https://gitnation.com/contents/how-good-is-ai-at-coding-react-really">GitNation</a> (video).</p><div id="youtube2-jgAqpk3ZI6E" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;jgAqpk3ZI6E&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/jgAqpk3ZI6E?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><div><hr></div><h2><strong>The Question everyone&#8217;s asking (but nobody&#8217;s answering well)</strong></h2><p>Let me be direct: most conversations about AI and coding are stuck on vibes. Either AI is magic that will replace us all, or it&#8217;s garbage that can&#8217;t do anything useful, or it&#8217;s perpetually &#8220;one prompt away&#8221; from shipping production code.
All three takes miss what&#8217;s actually interesting.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4Pug!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdaf33944-9ed1-4ea4-9531-0b9d47869b1c_1880x1046.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4Pug!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdaf33944-9ed1-4ea4-9531-0b9d47869b1c_1880x1046.png 424w, https://substackcdn.com/image/fetch/$s_!4Pug!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdaf33944-9ed1-4ea4-9531-0b9d47869b1c_1880x1046.png 848w, https://substackcdn.com/image/fetch/$s_!4Pug!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdaf33944-9ed1-4ea4-9531-0b9d47869b1c_1880x1046.png 1272w, https://substackcdn.com/image/fetch/$s_!4Pug!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdaf33944-9ed1-4ea4-9531-0b9d47869b1c_1880x1046.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4Pug!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdaf33944-9ed1-4ea4-9531-0b9d47869b1c_1880x1046.png" width="1456" height="810" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/daf33944-9ed1-4ea4-9531-0b9d47869b1c_1880x1046.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:810,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1438853,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdaf33944-9ed1-4ea4-9531-0b9d47869b1c_1880x1046.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4Pug!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdaf33944-9ed1-4ea4-9531-0b9d47869b1c_1880x1046.png 424w, https://substackcdn.com/image/fetch/$s_!4Pug!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdaf33944-9ed1-4ea4-9531-0b9d47869b1c_1880x1046.png 848w, https://substackcdn.com/image/fetch/$s_!4Pug!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdaf33944-9ed1-4ea4-9531-0b9d47869b1c_1880x1046.png 1272w, https://substackcdn.com/image/fetch/$s_!4Pug!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdaf33944-9ed1-4ea4-9531-0b9d47869b1c_1880x1046.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>After spending a year analyzing benchmarks, building with these tools at Google, and watching React developers struggle (and succeed) with AI assistants, here&#8217;s what I&#8217;ve learned: AI is <em>already</em> useful for React developers, but its usefulness is extremely uneven.
The unevenness is predictable if you know what to look for.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9LSy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33d19e66-2f44-4dfb-b88c-bec57fb63470_1886x1054.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9LSy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33d19e66-2f44-4dfb-b88c-bec57fb63470_1886x1054.png 424w, https://substackcdn.com/image/fetch/$s_!9LSy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33d19e66-2f44-4dfb-b88c-bec57fb63470_1886x1054.png 848w, https://substackcdn.com/image/fetch/$s_!9LSy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33d19e66-2f44-4dfb-b88c-bec57fb63470_1886x1054.png 1272w, https://substackcdn.com/image/fetch/$s_!9LSy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33d19e66-2f44-4dfb-b88c-bec57fb63470_1886x1054.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9LSy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33d19e66-2f44-4dfb-b88c-bec57fb63470_1886x1054.png" width="1456" height="814" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/33d19e66-2f44-4dfb-b88c-bec57fb63470_1886x1054.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:814,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:553092,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33d19e66-2f44-4dfb-b88c-bec57fb63470_1886x1054.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9LSy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33d19e66-2f44-4dfb-b88c-bec57fb63470_1886x1054.png 424w, https://substackcdn.com/image/fetch/$s_!9LSy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33d19e66-2f44-4dfb-b88c-bec57fb63470_1886x1054.png 848w, https://substackcdn.com/image/fetch/$s_!9LSy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33d19e66-2f44-4dfb-b88c-bec57fb63470_1886x1054.png 1272w, https://substackcdn.com/image/fetch/$s_!9LSy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33d19e66-2f44-4dfb-b88c-bec57fb63470_1886x1054.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>More importantly - and this is the part most articles skip - you have far more control over outcomes than you think.</p><h2><strong>The Core Thesis: Two sides of the same coin</strong></h2><p>This article covers two critical angles:</p><p><strong>What the data tells us:</strong> Benchmarks like <a href="https://designarena.ai/">Design Arena</a>, <a href="https://lmarena.ai/leaderboard/webdev">Web Dev Arena</a>, <a href="https://openai.com/index/introducing-swe-bench-verified/">SWE-Bench</a>, and <a href="https://huggingface.co/spaces/bytedance-research/Web-Bench-Leaderboard">Web-Bench</a> reveal clear patterns about where AI excels (isolated components, scaffolding, implementing explicit requirements) and where it struggles (multi-step integration, design taste, complex state management). 
Understanding these patterns means you can predict what will work before you waste time.</p><p><strong>What you can control:</strong> The difference between &#8220;AI helped me ship&#8221; and &#8220;AI gave me a mess to untangle&#8221; almost never comes down to just model selection. It comes down to context engineering, prompt specificity, workflow structure, and guardrails. These are all in your power to fix.</p><p>Let&#8217;s start with the foundation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mv8Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f84c291-2bb3-448e-a0fe-09ca43f307a4_1878x1044.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mv8Y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f84c291-2bb3-448e-a0fe-09ca43f307a4_1878x1044.png 424w, https://substackcdn.com/image/fetch/$s_!mv8Y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f84c291-2bb3-448e-a0fe-09ca43f307a4_1878x1044.png 848w, https://substackcdn.com/image/fetch/$s_!mv8Y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f84c291-2bb3-448e-a0fe-09ca43f307a4_1878x1044.png 1272w, https://substackcdn.com/image/fetch/$s_!mv8Y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f84c291-2bb3-448e-a0fe-09ca43f307a4_1878x1044.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!mv8Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f84c291-2bb3-448e-a0fe-09ca43f307a4_1878x1044.png" width="1456" height="809" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2f84c291-2bb3-448e-a0fe-09ca43f307a4_1878x1044.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:540281,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f84c291-2bb3-448e-a0fe-09ca43f307a4_1878x1044.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mv8Y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f84c291-2bb3-448e-a0fe-09ca43f307a4_1878x1044.png 424w, https://substackcdn.com/image/fetch/$s_!mv8Y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f84c291-2bb3-448e-a0fe-09ca43f307a4_1878x1044.png 848w, https://substackcdn.com/image/fetch/$s_!mv8Y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f84c291-2bb3-448e-a0fe-09ca43f307a4_1878x1044.png 1272w, https://substackcdn.com/image/fetch/$s_!mv8Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f84c291-2bb3-448e-a0fe-09ca43f307a4_1878x1044.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>90% of developers use AI for coding in some way. 
In an AI-assisted world, <strong>the value of frameworks like React depends on how effectively AI can use them.</strong> If AI can&#8217;t handle a framework well, the <strong>quality of experiences </strong>you can build without a lot of manual work is limited.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6tIw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F231f9c94-674d-4c2e-a1b8-d5b8aeeaa4e5_1884x1052.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6tIw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F231f9c94-674d-4c2e-a1b8-d5b8aeeaa4e5_1884x1052.png 424w, https://substackcdn.com/image/fetch/$s_!6tIw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F231f9c94-674d-4c2e-a1b8-d5b8aeeaa4e5_1884x1052.png 848w, https://substackcdn.com/image/fetch/$s_!6tIw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F231f9c94-674d-4c2e-a1b8-d5b8aeeaa4e5_1884x1052.png 1272w, https://substackcdn.com/image/fetch/$s_!6tIw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F231f9c94-674d-4c2e-a1b8-d5b8aeeaa4e5_1884x1052.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6tIw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F231f9c94-674d-4c2e-a1b8-d5b8aeeaa4e5_1884x1052.png" width="1456" height="813" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/231f9c94-674d-4c2e-a1b8-d5b8aeeaa4e5_1884x1052.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:605984,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F231f9c94-674d-4c2e-a1b8-d5b8aeeaa4e5_1884x1052.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6tIw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F231f9c94-674d-4c2e-a1b8-d5b8aeeaa4e5_1884x1052.png 424w, https://substackcdn.com/image/fetch/$s_!6tIw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F231f9c94-674d-4c2e-a1b8-d5b8aeeaa4e5_1884x1052.png 848w, https://substackcdn.com/image/fetch/$s_!6tIw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F231f9c94-674d-4c2e-a1b8-d5b8aeeaa4e5_1884x1052.png 1272w, https://substackcdn.com/image/fetch/$s_!6tIw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F231f9c94-674d-4c2e-a1b8-d5b8aeeaa4e5_1884x1052.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A lot of AI code quality comes down to context. <strong>You want to squeeze the most value out of the token budget in your context window</strong>. But the model and the tools that sit on top of it - all of these layers play an important role.</p><h2><strong>AI Changes what is easy, not what is true</strong></h2><p>AI is a force multiplier. It amplifies everything: good requirements, good architecture, good taste. It also amplifies the bad: vague specs, messy state, and the temptation to ship something you haven&#8217;t really understood. 
Give it a weak brief, and it will happily hand you a 10,000-line maze you&#8217;ll later delete.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!--eg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c5eb255-4302-4d6c-8c2b-6090cb07cc6d_1878x1050.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!--eg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c5eb255-4302-4d6c-8c2b-6090cb07cc6d_1878x1050.png 424w, https://substackcdn.com/image/fetch/$s_!--eg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c5eb255-4302-4d6c-8c2b-6090cb07cc6d_1878x1050.png 848w, https://substackcdn.com/image/fetch/$s_!--eg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c5eb255-4302-4d6c-8c2b-6090cb07cc6d_1878x1050.png 1272w, https://substackcdn.com/image/fetch/$s_!--eg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c5eb255-4302-4d6c-8c2b-6090cb07cc6d_1878x1050.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!--eg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c5eb255-4302-4d6c-8c2b-6090cb07cc6d_1878x1050.png" width="1456" height="814" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9c5eb255-4302-4d6c-8c2b-6090cb07cc6d_1878x1050.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:814,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1151719,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c5eb255-4302-4d6c-8c2b-6090cb07cc6d_1878x1050.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!--eg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c5eb255-4302-4d6c-8c2b-6090cb07cc6d_1878x1050.png 424w, https://substackcdn.com/image/fetch/$s_!--eg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c5eb255-4302-4d6c-8c2b-6090cb07cc6d_1878x1050.png 848w, https://substackcdn.com/image/fetch/$s_!--eg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c5eb255-4302-4d6c-8c2b-6090cb07cc6d_1878x1050.png 1272w, https://substackcdn.com/image/fetch/$s_!--eg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c5eb255-4302-4d6c-8c2b-6090cb07cc6d_1878x1050.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I draw a hard line between <strong><a href="https://addyo.substack.com/p/vibe-coding-is-not-the-same-as-ai">AI-assisted vibe coding</a></strong><a href="https://addyo.substack.com/p/vibe-coding-is-not-the-same-as-ai"> and </a><strong><a href="https://addyo.substack.com/p/vibe-coding-is-not-the-same-as-ai">AI-assisted engineering</a></strong>. Vibe coding is trusting high-level prompts and prioritizing speed over review. AI-assisted engineering is integrating AI inside a structured process where the human stays in control and accountable for the output.</p><p>Why does this distinction matter for React?</p><p>Because React apps aren&#8217;t just code. They&#8217;re product behavior, user experience, reliability, security, performance, accessibility, and long-term maintenance. 
AI can help with all of that - but only if you treat it like a teammate you&#8217;re pairing with and have <strong>oversight over</strong>, not a vending machine dispensing code.</p><h2><strong>The Monoculture problem (and opportunity)</strong></h2><p>One of the most under-discussed parts of the AI coding story: &#8220;how well AI codes&#8221; is not a universal property. It depends on what the model has seen in training, what tools it has access to, and what the ecosystem has standardized on.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!d39I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae9dce94-8531-47b8-9cef-6171c199a55f_1878x1054.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!d39I!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae9dce94-8531-47b8-9cef-6171c199a55f_1878x1054.png 424w, https://substackcdn.com/image/fetch/$s_!d39I!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae9dce94-8531-47b8-9cef-6171c199a55f_1878x1054.png 848w, https://substackcdn.com/image/fetch/$s_!d39I!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae9dce94-8531-47b8-9cef-6171c199a55f_1878x1054.png 1272w, https://substackcdn.com/image/fetch/$s_!d39I!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae9dce94-8531-47b8-9cef-6171c199a55f_1878x1054.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!d39I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae9dce94-8531-47b8-9cef-6171c199a55f_1878x1054.png" width="1456" height="817" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ae9dce94-8531-47b8-9cef-6171c199a55f_1878x1054.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:817,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1100840,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae9dce94-8531-47b8-9cef-6171c199a55f_1878x1054.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!d39I!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae9dce94-8531-47b8-9cef-6171c199a55f_1878x1054.png 424w, https://substackcdn.com/image/fetch/$s_!d39I!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae9dce94-8531-47b8-9cef-6171c199a55f_1878x1054.png 848w, https://substackcdn.com/image/fetch/$s_!d39I!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae9dce94-8531-47b8-9cef-6171c199a55f_1878x1054.png 1272w, https://substackcdn.com/image/fetch/$s_!d39I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae9dce94-8531-47b8-9cef-6171c199a55f_1878x1054.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Large language models effectively set the ceiling for how much leverage you get out of a framework in an AI-assisted workflow. If AI struggles with a framework, you feel that as friction and quality limits.</p><p>Most AI tools converge on a stack that looks like: React, TypeScript, Tailwind, shadcn/ui. 
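</p><p>If your stack strays from that list, it can help to hand the model explicit constraints before it writes a line of code. Here is a minimal sketch of a rules file you might feed your AI tool - the filename, stack, and paths are hypothetical, purely illustrative:</p><pre><code># ai-rules.md (hypothetical - adapt to your tool's rules-file format)

- Framework: SolidJS + TypeScript. Never emit React APIs
  (no useState/useEffect); use createSignal/createEffect instead.
- Styling: CSS Modules only. No Tailwind utility classes.
- Data fetching: go through src/lib/api.ts; never call fetch() directly.
- If an API isn't in the docs provided, ask instead of guessing.
</code></pre><p>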
That stack dominates training data and tool optimization, so models are competent there and noticeably shakier off the beaten path.</p><p><strong>This has two implications for practicing React developers:</strong></p><ol><li><p><strong>If you&#8217;re on the mainstream stack, your &#8220;AI assistance ceiling&#8221; is higher.</strong> You&#8217;ll get better scaffolds, better component generation, and fewer hallucinated APIs.</p></li><li><p><strong>If you&#8217;re not, you need to compensate</strong> with better context, doc retrieval, and stricter constraints - or you&#8217;ll watch the model confidently build an alternate universe version of your app.</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Iedm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87c6d29-4dea-4be6-923b-f10ab41afee5_1880x1054.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Iedm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87c6d29-4dea-4be6-923b-f10ab41afee5_1880x1054.png 424w, https://substackcdn.com/image/fetch/$s_!Iedm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87c6d29-4dea-4be6-923b-f10ab41afee5_1880x1054.png 848w, https://substackcdn.com/image/fetch/$s_!Iedm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87c6d29-4dea-4be6-923b-f10ab41afee5_1880x1054.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Iedm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87c6d29-4dea-4be6-923b-f10ab41afee5_1880x1054.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Iedm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87c6d29-4dea-4be6-923b-f10ab41afee5_1880x1054.png" width="1456" height="816" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b87c6d29-4dea-4be6-923b-f10ab41afee5_1880x1054.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:816,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1124916,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87c6d29-4dea-4be6-923b-f10ab41afee5_1880x1054.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Iedm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87c6d29-4dea-4be6-923b-f10ab41afee5_1880x1054.png 424w, https://substackcdn.com/image/fetch/$s_!Iedm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87c6d29-4dea-4be6-923b-f10ab41afee5_1880x1054.png 848w, 
https://substackcdn.com/image/fetch/$s_!Iedm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87c6d29-4dea-4be6-923b-f10ab41afee5_1880x1054.png 1272w, https://substackcdn.com/image/fetch/$s_!Iedm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87c6d29-4dea-4be6-923b-f10ab41afee5_1880x1054.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>There&#8217;s also a second-order effect: monoculture can slow innovation. 
React skills are likely to stay very relevant (comforting), but it also means that newer frameworks and alternative patterns can face headwinds until models and tooling catch up.</p><p>The good news? If a framework gains traction, AI makers will fine-tune models on it. Docs MCPs can bridge gaps in the interim. But short-term, React&#8217;s position is extremely strong because AI &#8220;knows&#8221; it best.</p><div><hr></div><h2><strong>The big reality check: The complexity cliff</strong></h2><p>If you remember nothing else from this article, remember this: <strong>AI handles simple tasks well and then falls off a cliff as complexity rises.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OchZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c900599-9f98-4fed-836c-bdadea0d05e3_1872x1056.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OchZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c900599-9f98-4fed-836c-bdadea0d05e3_1872x1056.png 424w, https://substackcdn.com/image/fetch/$s_!OchZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c900599-9f98-4fed-836c-bdadea0d05e3_1872x1056.png 848w, https://substackcdn.com/image/fetch/$s_!OchZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c900599-9f98-4fed-836c-bdadea0d05e3_1872x1056.png 1272w, https://substackcdn.com/image/fetch/$s_!OchZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c900599-9f98-4fed-836c-bdadea0d05e3_1872x1056.png 1456w"
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OchZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c900599-9f98-4fed-836c-bdadea0d05e3_1872x1056.png" width="1456" height="821" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6c900599-9f98-4fed-836c-bdadea0d05e3_1872x1056.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:821,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:513018,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c900599-9f98-4fed-836c-bdadea0d05e3_1872x1056.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OchZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c900599-9f98-4fed-836c-bdadea0d05e3_1872x1056.png 424w, https://substackcdn.com/image/fetch/$s_!OchZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c900599-9f98-4fed-836c-bdadea0d05e3_1872x1056.png 848w, https://substackcdn.com/image/fetch/$s_!OchZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c900599-9f98-4fed-836c-bdadea0d05e3_1872x1056.png 1272w, 
https://substackcdn.com/image/fetch/$s_!OchZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c900599-9f98-4fed-836c-bdadea0d05e3_1872x1056.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>A form component, a utility function, a small isolated widget: great. Multi-step work across a real codebase: much less reliable.</p><p>I used a mix of <strong>objective</strong> and <strong>human-rated benchmarks</strong> to show this pattern.
We need both, because pass/fail benchmarks tell you &#8220;can it solve the issue,&#8221; while human-rated arenas tell you something equally important for frontend work: &#8220;do humans actually want to use what it builds.&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Gofe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fd3c45e-20ec-4b3f-abc4-3ba98b9941a5_1882x1056.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Gofe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fd3c45e-20ec-4b3f-abc4-3ba98b9941a5_1882x1056.png 424w, https://substackcdn.com/image/fetch/$s_!Gofe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fd3c45e-20ec-4b3f-abc4-3ba98b9941a5_1882x1056.png 848w, https://substackcdn.com/image/fetch/$s_!Gofe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fd3c45e-20ec-4b3f-abc4-3ba98b9941a5_1882x1056.png 1272w, https://substackcdn.com/image/fetch/$s_!Gofe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fd3c45e-20ec-4b3f-abc4-3ba98b9941a5_1882x1056.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Gofe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fd3c45e-20ec-4b3f-abc4-3ba98b9941a5_1882x1056.png" width="1456" height="817" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6fd3c45e-20ec-4b3f-abc4-3ba98b9941a5_1882x1056.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:817,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:841476,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fd3c45e-20ec-4b3f-abc4-3ba98b9941a5_1882x1056.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Gofe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fd3c45e-20ec-4b3f-abc4-3ba98b9941a5_1882x1056.png 424w, https://substackcdn.com/image/fetch/$s_!Gofe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fd3c45e-20ec-4b3f-abc4-3ba98b9941a5_1882x1056.png 848w, https://substackcdn.com/image/fetch/$s_!Gofe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fd3c45e-20ec-4b3f-abc4-3ba98b9941a5_1882x1056.png 1272w, https://substackcdn.com/image/fetch/$s_!Gofe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fd3c45e-20ec-4b3f-abc4-3ba98b9941a5_1882x1056.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3><strong>What the numbers show</strong></h3><p><strong>On objective benchmarks, the complexity cliff is visible:</strong></p><ul><li><p><strong>Next.js eval tasks:</strong> The best models land around 42% success (roughly 21 of 50 tasks completed). Even on framework-specific challenges, failures are common.</p></li><li><p><strong>Web-Bench multi-step full-stack tasks:</strong> Around 25% of tasks solved. Failures multiply as steps chain together.</p></li><li><p><strong>SWE-Bench Pro:</strong> Around 20-43% on the Pro public set, versus over 70% on SWE-Bench Verified.
Increasing complexity collapses performance.</p></li></ul><p>The gap between benchmark performance and your real codebase is the important thing to calibrate to.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!31x7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1842ca46-0566-4117-9ae4-8dbd7e30071c_1888x1050.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!31x7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1842ca46-0566-4117-9ae4-8dbd7e30071c_1888x1050.png 424w, https://substackcdn.com/image/fetch/$s_!31x7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1842ca46-0566-4117-9ae4-8dbd7e30071c_1888x1050.png 848w, https://substackcdn.com/image/fetch/$s_!31x7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1842ca46-0566-4117-9ae4-8dbd7e30071c_1888x1050.png 1272w, https://substackcdn.com/image/fetch/$s_!31x7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1842ca46-0566-4117-9ae4-8dbd7e30071c_1888x1050.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!31x7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1842ca46-0566-4117-9ae4-8dbd7e30071c_1888x1050.png" width="1456" height="810" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1842ca46-0566-4117-9ae4-8dbd7e30071c_1888x1050.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:810,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:358686,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1842ca46-0566-4117-9ae4-8dbd7e30071c_1888x1050.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!31x7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1842ca46-0566-4117-9ae4-8dbd7e30071c_1888x1050.png 424w, https://substackcdn.com/image/fetch/$s_!31x7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1842ca46-0566-4117-9ae4-8dbd7e30071c_1888x1050.png 848w, https://substackcdn.com/image/fetch/$s_!31x7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1842ca46-0566-4117-9ae4-8dbd7e30071c_1888x1050.png 1272w, https://substackcdn.com/image/fetch/$s_!31x7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1842ca46-0566-4117-9ae4-8dbd7e30071c_1888x1050.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uI5j!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0098d15d-8b6a-43ea-a42c-87d6e25d8119_1880x1056.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uI5j!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0098d15d-8b6a-43ea-a42c-87d6e25d8119_1880x1056.png 424w,
https://substackcdn.com/image/fetch/$s_!uI5j!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0098d15d-8b6a-43ea-a42c-87d6e25d8119_1880x1056.png 848w, https://substackcdn.com/image/fetch/$s_!uI5j!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0098d15d-8b6a-43ea-a42c-87d6e25d8119_1880x1056.png 1272w, https://substackcdn.com/image/fetch/$s_!uI5j!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0098d15d-8b6a-43ea-a42c-87d6e25d8119_1880x1056.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uI5j!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0098d15d-8b6a-43ea-a42c-87d6e25d8119_1880x1056.png" width="1456" height="818" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0098d15d-8b6a-43ea-a42c-87d6e25d8119_1880x1056.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:818,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:405164,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0098d15d-8b6a-43ea-a42c-87d6e25d8119_1880x1056.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!uI5j!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0098d15d-8b6a-43ea-a42c-87d6e25d8119_1880x1056.png 424w, https://substackcdn.com/image/fetch/$s_!uI5j!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0098d15d-8b6a-43ea-a42c-87d6e25d8119_1880x1056.png 848w, https://substackcdn.com/image/fetch/$s_!uI5j!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0098d15d-8b6a-43ea-a42c-87d6e25d8119_1880x1056.png 1272w, https://substackcdn.com/image/fetch/$s_!uI5j!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0098d15d-8b6a-43ea-a42c-87d6e25d8119_1880x1056.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>My practical translation of the complexity cliff for React developers:</strong></p><ul><li><p>AI is <strong>great</strong> at first drafts</p></li><li><p>AI is <strong>mediocre</strong> at integration</p></li><li><p>AI is <strong>unreliable</strong> at long multi-step changes unless you give it strong tooling and context</p></li><li><p>AI gets you to &#8220;it works&#8221; faster than it gets you to &#8220;it&#8217;s a codebase I want to own&#8221;</p></li></ul><div><hr></div><h2><strong>Design Arena and Web Dev Arena: Where React developers should pay attention</strong></h2><p>React developers spend a lot of time in the space between &#8220;the code runs&#8221; and &#8220;this is good.&#8221; That space includes UI quality, hierarchy, spacing, accessibility, and whether the end result feels intentional.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dCGE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dfa85d9-93db-47ee-9ef5-cec9d0f1d6a9_2294x1482.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dCGE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dfa85d9-93db-47ee-9ef5-cec9d0f1d6a9_2294x1482.png 424w,
https://substackcdn.com/image/fetch/$s_!dCGE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dfa85d9-93db-47ee-9ef5-cec9d0f1d6a9_2294x1482.png 848w, https://substackcdn.com/image/fetch/$s_!dCGE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dfa85d9-93db-47ee-9ef5-cec9d0f1d6a9_2294x1482.png 1272w, https://substackcdn.com/image/fetch/$s_!dCGE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dfa85d9-93db-47ee-9ef5-cec9d0f1d6a9_2294x1482.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dCGE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dfa85d9-93db-47ee-9ef5-cec9d0f1d6a9_2294x1482.png" width="1456" height="941" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3dfa85d9-93db-47ee-9ef5-cec9d0f1d6a9_2294x1482.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:941,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:337700,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dfa85d9-93db-47ee-9ef5-cec9d0f1d6a9_2294x1482.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!dCGE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dfa85d9-93db-47ee-9ef5-cec9d0f1d6a9_2294x1482.png 424w, https://substackcdn.com/image/fetch/$s_!dCGE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dfa85d9-93db-47ee-9ef5-cec9d0f1d6a9_2294x1482.png 848w, https://substackcdn.com/image/fetch/$s_!dCGE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dfa85d9-93db-47ee-9ef5-cec9d0f1d6a9_2294x1482.png 1272w, https://substackcdn.com/image/fetch/$s_!dCGE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dfa85d9-93db-47ee-9ef5-cec9d0f1d6a9_2294x1482.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Design Arena is interesting because it&#8217;s explicitly human preference&#8211;driven. Here&#8217;s how it works:</p><ul><li><p>Users come to the platform to explore and use AI-powered tools for tasks like website and game generation.</p></li><li><p>Design Arena presents multiple versions of the same experience (a website, agent, or builder), and users save their favorite.</p></li><li><p>Rankings emerge from these aggregated choices across categories like website generation, agents, and builders, reflecting real usage preferences rather than curated rubrics.</p></li><li><p>Elo-style scores are calculated using a Bradley&#8211;Terry model, with models below a minimum comparison threshold filtered out.</p></li><li><p>The leaderboard updates live (every three hours, per their methodology), powered by interactions from over 850,000 users across 145 countries.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qEMX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d8de171-6e05-437b-b8ff-eedd03ea2b15_1882x1058.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qEMX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d8de171-6e05-437b-b8ff-eedd03ea2b15_1882x1058.png 424w,
https://substackcdn.com/image/fetch/$s_!qEMX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d8de171-6e05-437b-b8ff-eedd03ea2b15_1882x1058.png 848w, https://substackcdn.com/image/fetch/$s_!qEMX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d8de171-6e05-437b-b8ff-eedd03ea2b15_1882x1058.png 1272w, https://substackcdn.com/image/fetch/$s_!qEMX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d8de171-6e05-437b-b8ff-eedd03ea2b15_1882x1058.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qEMX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d8de171-6e05-437b-b8ff-eedd03ea2b15_1882x1058.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0d8de171-6e05-437b-b8ff-eedd03ea2b15_1882x1058.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:884780,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d8de171-6e05-437b-b8ff-eedd03ea2b15_1882x1058.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!qEMX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d8de171-6e05-437b-b8ff-eedd03ea2b15_1882x1058.png 424w, https://substackcdn.com/image/fetch/$s_!qEMX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d8de171-6e05-437b-b8ff-eedd03ea2b15_1882x1058.png 848w, https://substackcdn.com/image/fetch/$s_!qEMX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d8de171-6e05-437b-b8ff-eedd03ea2b15_1882x1058.png 1272w, https://substackcdn.com/image/fetch/$s_!qEMX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d8de171-6e05-437b-b8ff-eedd03ea2b15_1882x1058.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><em>By using a pairwise comparison system, Design Arena generates leaderboards that rank AI models based on human preferences, helping to measure and drive improvements in design quality, usability, and aesthetics.</em></p><p>Similarly, <a href="https://web.lmarena.ai">Web Dev Arena</a> is an open-source benchmarking platform from LMArena designed to evaluate LLMs on their ability to build functional, interactive web applications. Users submit a prompt and compare anonymous AI models generating code side-by-side, contributing to a community-driven Elo leaderboard that ranks top models for complex web development tasks.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oS8c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bb5604d-1a65-4da2-8988-5739c3953e23_3012x1504.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oS8c!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bb5604d-1a65-4da2-8988-5739c3953e23_3012x1504.png 424w, https://substackcdn.com/image/fetch/$s_!oS8c!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bb5604d-1a65-4da2-8988-5739c3953e23_3012x1504.png 848w,
https://substackcdn.com/image/fetch/$s_!oS8c!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bb5604d-1a65-4da2-8988-5739c3953e23_3012x1504.png 1272w, https://substackcdn.com/image/fetch/$s_!oS8c!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bb5604d-1a65-4da2-8988-5739c3953e23_3012x1504.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oS8c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bb5604d-1a65-4da2-8988-5739c3953e23_3012x1504.png" width="1456" height="727" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1bb5604d-1a65-4da2-8988-5739c3953e23_3012x1504.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:727,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:480229,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bb5604d-1a65-4da2-8988-5739c3953e23_3012x1504.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oS8c!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bb5604d-1a65-4da2-8988-5739c3953e23_3012x1504.png 424w, 
https://substackcdn.com/image/fetch/$s_!oS8c!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bb5604d-1a65-4da2-8988-5739c3953e23_3012x1504.png 848w, https://substackcdn.com/image/fetch/$s_!oS8c!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bb5604d-1a65-4da2-8988-5739c3953e23_3012x1504.png 1272w, https://substackcdn.com/image/fetch/$s_!oS8c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bb5604d-1a65-4da2-8988-5739c3953e23_3012x1504.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>So what do React developers learn from such arenas?</p><h3><strong>The Core Finding: AI has mastered logic, but not taste</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SbhK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db0f7bd-5c10-44f6-9b76-93bae9c1e1e1_1882x1056.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SbhK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db0f7bd-5c10-44f6-9b76-93bae9c1e1e1_1882x1056.png 424w, https://substackcdn.com/image/fetch/$s_!SbhK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db0f7bd-5c10-44f6-9b76-93bae9c1e1e1_1882x1056.png 848w, https://substackcdn.com/image/fetch/$s_!SbhK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db0f7bd-5c10-44f6-9b76-93bae9c1e1e1_1882x1056.png 1272w, https://substackcdn.com/image/fetch/$s_!SbhK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db0f7bd-5c10-44f6-9b76-93bae9c1e1e1_1882x1056.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SbhK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db0f7bd-5c10-44f6-9b76-93bae9c1e1e1_1882x1056.png" width="1456" height="817" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8db0f7bd-5c10-44f6-9b76-93bae9c1e1e1_1882x1056.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:817,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:611725,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db0f7bd-5c10-44f6-9b76-93bae9c1e1e1_1882x1056.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SbhK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db0f7bd-5c10-44f6-9b76-93bae9c1e1e1_1882x1056.png 424w, https://substackcdn.com/image/fetch/$s_!SbhK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db0f7bd-5c10-44f6-9b76-93bae9c1e1e1_1882x1056.png 848w, https://substackcdn.com/image/fetch/$s_!SbhK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db0f7bd-5c10-44f6-9b76-93bae9c1e1e1_1882x1056.png 1272w, https://substackcdn.com/image/fetch/$s_!SbhK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db0f7bd-5c10-44f6-9b76-93bae9c1e1e1_1882x1056.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This slide is the thesis of the whole talk. 
Models can solve hard reasoning problems and still produce UIs with basic design failures: off-key color choices, inconsistent spacing, and weak visual hierarchy.</p><p>I call this the <strong>capability divide:</strong></p><ul><li><p>AI is <strong>strong</strong> at logic, data flow, and implementing explicit requirements</p></li><li><p>AI is <strong>weak</strong> at taste, usability awareness, and aesthetic judgment</p></li></ul><p><strong>If you&#8217;re a React developer, this should change how you delegate:</strong></p><ul><li><p>Delegate boilerplate and mechanical implementation</p></li><li><p>Keep design intent, API design, and architecture decisions human-led</p></li><li><p>Treat &#8220;pretty&#8221; as an explicit requirement, not a default outcome</p></li></ul><h3><strong>The Surprise: Tools and scaffolding matter more than you think</strong></h3><p>Design Arena also found something counterintuitive: <strong>general agents are more variable than specialists</strong>, and the scaffolding and workflow around the base model drive a lot of the performance spread.</p><p>Put differently: two products can wrap the same base model and feel wildly different because of tooling, retrieval, iteration loops, and guardrails.</p><p>This is great news, because it means <strong>you have leverage even when you don&#8217;t control the base model.</strong></p><div><hr></div><h2><strong>Arena by Arena: What React developers should steal from the data</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4uWz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc722a14-d454-4ea6-b7cb-44a71435f883_1874x1048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp"
srcset="https://substackcdn.com/image/fetch/$s_!4uWz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc722a14-d454-4ea6-b7cb-44a71435f883_1874x1048.png 424w, https://substackcdn.com/image/fetch/$s_!4uWz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc722a14-d454-4ea6-b7cb-44a71435f883_1874x1048.png 848w, https://substackcdn.com/image/fetch/$s_!4uWz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc722a14-d454-4ea6-b7cb-44a71435f883_1874x1048.png 1272w, https://substackcdn.com/image/fetch/$s_!4uWz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc722a14-d454-4ea6-b7cb-44a71435f883_1874x1048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4uWz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc722a14-d454-4ea6-b7cb-44a71435f883_1874x1048.png" width="1456" height="814" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bc722a14-d454-4ea6-b7cb-44a71435f883_1874x1048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:814,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:358564,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc722a14-d454-4ea6-b7cb-44a71435f883_1874x1048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4uWz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc722a14-d454-4ea6-b7cb-44a71435f883_1874x1048.png 424w, https://substackcdn.com/image/fetch/$s_!4uWz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc722a14-d454-4ea6-b7cb-44a71435f883_1874x1048.png 848w, https://substackcdn.com/image/fetch/$s_!4uWz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc722a14-d454-4ea6-b7cb-44a71435f883_1874x1048.png 1272w, https://substackcdn.com/image/fetch/$s_!4uWz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc722a14-d454-4ea6-b7cb-44a71435f883_1874x1048.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Let me walk through five arenas and extract the practical lessons for each one.</p><h3><strong>1. Website Arena: Prompt to website (and why Purple keeps happening)</strong></h3><p>The Website Arena measures how well models generate complete single-page sites from a prompt, with instructions to add modern UI/UX practices, accessibility, and responsive design.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oRLU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7854a835-eee7-4830-ad6a-a4fa28576d26_1882x1064.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oRLU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7854a835-eee7-4830-ad6a-a4fa28576d26_1882x1064.png 424w, https://substackcdn.com/image/fetch/$s_!oRLU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7854a835-eee7-4830-ad6a-a4fa28576d26_1882x1064.png 848w, https://substackcdn.com/image/fetch/$s_!oRLU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7854a835-eee7-4830-ad6a-a4fa28576d26_1882x1064.png 1272w, 
https://substackcdn.com/image/fetch/$s_!oRLU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7854a835-eee7-4830-ad6a-a4fa28576d26_1882x1064.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oRLU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7854a835-eee7-4830-ad6a-a4fa28576d26_1882x1064.png" width="1456" height="823" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7854a835-eee7-4830-ad6a-a4fa28576d26_1882x1064.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:823,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:439242,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7854a835-eee7-4830-ad6a-a4fa28576d26_1882x1064.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oRLU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7854a835-eee7-4830-ad6a-a4fa28576d26_1882x1064.png 424w, https://substackcdn.com/image/fetch/$s_!oRLU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7854a835-eee7-4830-ad6a-a4fa28576d26_1882x1064.png 848w, 
https://substackcdn.com/image/fetch/$s_!oRLU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7854a835-eee7-4830-ad6a-a4fa28576d26_1882x1064.png 1272w, https://substackcdn.com/image/fetch/$s_!oRLU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7854a835-eee7-4830-ad6a-a4fa28576d26_1882x1064.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The important nuance is how winners tend to win: it&#8217;s not always the flashiest layout, it&#8217;s often <strong>the most
coherent and complete page.</strong></p><p>If your goal is something shippable, bias your prompts toward coherence and structure, not &#8220;make it look cool&#8221;.</p><h4><strong>Why is there so much purple?</strong></h4><p>I joked about this in the talk because once you see it, you can&#8217;t unsee it: models converge on safe, generic design patterns, and &#8220;purple gradient plus glassmorphism&#8221; is one of those defaults.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7cgp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7036a6ef-4aa0-4f8e-a63d-9cd7e33f0cd2_1874x1046.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7cgp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7036a6ef-4aa0-4f8e-a63d-9cd7e33f0cd2_1874x1046.png 424w, https://substackcdn.com/image/fetch/$s_!7cgp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7036a6ef-4aa0-4f8e-a63d-9cd7e33f0cd2_1874x1046.png 848w, https://substackcdn.com/image/fetch/$s_!7cgp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7036a6ef-4aa0-4f8e-a63d-9cd7e33f0cd2_1874x1046.png 1272w, https://substackcdn.com/image/fetch/$s_!7cgp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7036a6ef-4aa0-4f8e-a63d-9cd7e33f0cd2_1874x1046.png 1456w" sizes="100vw"><img
src="https://substackcdn.com/image/fetch/$s_!7cgp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7036a6ef-4aa0-4f8e-a63d-9cd7e33f0cd2_1874x1046.png" width="1456" height="813" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7036a6ef-4aa0-4f8e-a63d-9cd7e33f0cd2_1874x1046.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:340995,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7036a6ef-4aa0-4f8e-a63d-9cd7e33f0cd2_1874x1046.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7cgp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7036a6ef-4aa0-4f8e-a63d-9cd7e33f0cd2_1874x1046.png 424w, https://substackcdn.com/image/fetch/$s_!7cgp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7036a6ef-4aa0-4f8e-a63d-9cd7e33f0cd2_1874x1046.png 848w, https://substackcdn.com/image/fetch/$s_!7cgp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7036a6ef-4aa0-4f8e-a63d-9cd7e33f0cd2_1874x1046.png 1272w, https://substackcdn.com/image/fetch/$s_!7cgp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7036a6ef-4aa0-4f8e-a63d-9cd7e33f0cd2_1874x1046.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>That&#8217;s not just a meme.
It&#8217;s <strong>distributional convergence</strong>: under uncertainty, models gravitate toward common patterns in the data.</p><h4><strong>How do you fix it?</strong></h4><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vuc-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b8dcfd-e8a0-4293-afef-771b1a00755c_1886x1060.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vuc-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b8dcfd-e8a0-4293-afef-771b1a00755c_1886x1060.png 424w, https://substackcdn.com/image/fetch/$s_!vuc-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b8dcfd-e8a0-4293-afef-771b1a00755c_1886x1060.png 848w, https://substackcdn.com/image/fetch/$s_!vuc-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b8dcfd-e8a0-4293-afef-771b1a00755c_1886x1060.png 1272w, https://substackcdn.com/image/fetch/$s_!vuc-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b8dcfd-e8a0-4293-afef-771b1a00755c_1886x1060.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vuc-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b8dcfd-e8a0-4293-afef-771b1a00755c_1886x1060.png" width="1456" height="818" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/69b8dcfd-e8a0-4293-afef-771b1a00755c_1886x1060.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:818,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:846531,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b8dcfd-e8a0-4293-afef-771b1a00755c_1886x1060.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!vuc-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b8dcfd-e8a0-4293-afef-771b1a00755c_1886x1060.png 424w, https://substackcdn.com/image/fetch/$s_!vuc-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b8dcfd-e8a0-4293-afef-771b1a00755c_1886x1060.png 848w, https://substackcdn.com/image/fetch/$s_!vuc-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b8dcfd-e8a0-4293-afef-771b1a00755c_1886x1060.png 1272w, https://substackcdn.com/image/fetch/$s_!vuc-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b8dcfd-e8a0-4293-afef-771b1a00755c_1886x1060.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>One approach is tooling rather than &#8220;better prompts forever.&#8221; Anthropic pushed some of this into Skills (markdown files that Claude reads on demand) instead of trying to brute force it through training. 
Their <a href="https://github.com/anthropics/claude-code/blob/main/plugins/frontend-design/skills/frontend-design/SKILL.md">frontend-design skill</a> is worth checking out.</p><p>Even if you never use Claude Skills specifically, the lesson is broader:</p><ul><li><p>Some failures are better solved by scaffolding and constraints than by model selection</p></li><li><p>You want repeatable, shareable &#8220;taste primitives&#8221; that don&#8217;t require rewriting your entire prompt every time</p></li></ul><h4><strong>My website generation checklist for React teams</strong></h4><p><strong>What I ask for up front:</strong></p><ul><li><p><strong>Anchor the layout first:</strong> Specify the page sections you want before code</p></li><li><p><strong>Specify stack and routing:</strong> Call out Next App Router, file names, and RSC vs client components so it doesn&#8217;t invent structure</p></li><li><p><strong>Describe content density:</strong> Minimal landing page vs long-form so spacing doesn&#8217;t default to sludge</p></li><li><p><strong>Ask for responsive constraints:</strong> Breakpoints and collapse behavior</p></li><li><p><strong>Bake in accessibility:</strong> Semantic landmarks, skip links, labels, safe contrast</p></li><li><p><strong>Convert HTML to real React files:</strong> Map sections to components and wire them up in page.tsx</p></li></ul><p><strong>What I do after generation, before trusting it:</strong></p><ul><li><p><strong>Strip inline scripts</strong> and move DOM logic into client components with hooks and typed props</p></li><li><p><strong>Normalize layout primitives</strong> and refactor div soup into your real Shell, Container, Stack components</p></li><li><p><strong>Run a11y and perf checks:</strong> Lint, Lighthouse, and add tests for critical flows</p></li><li><p><strong>Freeze the visual system:</strong> Snap palette, spacing, typography into Tailwind config or tokens</p></li><li><p><strong>Keep the model on a leash:</strong> Use it for slices 
and variants, not wholesale rewrites of a tuned page</p></li></ul><p><strong>Single sentence summary:</strong> Be radically explicit in your instructions, and enforce your design system and coding standards so the model can&#8217;t drift.</p><p><strong>Poor prompt:</strong></p><pre><code><code>Make a landing page for a SaaS product</code></code></pre><p><strong>Strong prompt:</strong></p><pre><code><code>Create a Next.js App Router landing page (app/page.tsx) for a developer tools SaaS:

Layout sections:
1. Hero with headline, subheadline, CTA
2. Features (3 columns, icon + title + description each)
3. Social proof (logos grid)
4. CTA

Stack: Next.js 15, TypeScript, Tailwind
Density: Spacious landing page (not cramped)
Colors: Avoid purple/pink gradients - use neutral gray with blue accent
Responsive: Stack features vertically below 768px

Accessibility:
- Semantic HTML (header, main, section)
- Alt text for all images
- Sufficient color contrast (WCAG AA)</code></code></pre><div><hr></div><h3><strong>2. Agent Arena: Most failures are context failures now</strong></h3><p>The Agent Arena is a step up: multi-step tasks like writing code, fixing bugs, running tests, running browsers, debugging. This is where &#8220;agent loops&#8221; show up.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-pUd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87110bdc-9ffa-4bba-aa17-4dd41f10d7ee_1884x1054.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-pUd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87110bdc-9ffa-4bba-aa17-4dd41f10d7ee_1884x1054.png 424w, https://substackcdn.com/image/fetch/$s_!-pUd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87110bdc-9ffa-4bba-aa17-4dd41f10d7ee_1884x1054.png 848w, https://substackcdn.com/image/fetch/$s_!-pUd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87110bdc-9ffa-4bba-aa17-4dd41f10d7ee_1884x1054.png 1272w, https://substackcdn.com/image/fetch/$s_!-pUd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87110bdc-9ffa-4bba-aa17-4dd41f10d7ee_1884x1054.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-pUd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87110bdc-9ffa-4bba-aa17-4dd41f10d7ee_1884x1054.png" width="1456" height="815" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/87110bdc-9ffa-4bba-aa17-4dd41f10d7ee_1884x1054.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:815,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:374430,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87110bdc-9ffa-4bba-aa17-4dd41f10d7ee_1884x1054.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-pUd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87110bdc-9ffa-4bba-aa17-4dd41f10d7ee_1884x1054.png 424w, https://substackcdn.com/image/fetch/$s_!-pUd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87110bdc-9ffa-4bba-aa17-4dd41f10d7ee_1884x1054.png 848w, https://substackcdn.com/image/fetch/$s_!-pUd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87110bdc-9ffa-4bba-aa17-4dd41f10d7ee_1884x1054.png 1272w, https://substackcdn.com/image/fetch/$s_!-pUd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87110bdc-9ffa-4bba-aa17-4dd41f10d7ee_1884x1054.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Here&#8217;s the biggest trap: when agents fail, it often looks like &#8220;the model is dumb.&#8221; Increasingly, that&#8217;s not true.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!a_vj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f3b6d65-19dc-4171-a84d-e27daa2bfcfd_1872x1050.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!a_vj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f3b6d65-19dc-4171-a84d-e27daa2bfcfd_1872x1050.png 424w, 
https://substackcdn.com/image/fetch/$s_!a_vj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f3b6d65-19dc-4171-a84d-e27daa2bfcfd_1872x1050.png 848w, https://substackcdn.com/image/fetch/$s_!a_vj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f3b6d65-19dc-4171-a84d-e27daa2bfcfd_1872x1050.png 1272w, https://substackcdn.com/image/fetch/$s_!a_vj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f3b6d65-19dc-4171-a84d-e27daa2bfcfd_1872x1050.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!a_vj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f3b6d65-19dc-4171-a84d-e27daa2bfcfd_1872x1050.png" width="1456" height="817" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4f3b6d65-19dc-4171-a84d-e27daa2bfcfd_1872x1050.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:817,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:372295,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f3b6d65-19dc-4171-a84d-e27daa2bfcfd_1872x1050.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!a_vj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f3b6d65-19dc-4171-a84d-e27daa2bfcfd_1872x1050.png 424w, https://substackcdn.com/image/fetch/$s_!a_vj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f3b6d65-19dc-4171-a84d-e27daa2bfcfd_1872x1050.png 848w, https://substackcdn.com/image/fetch/$s_!a_vj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f3b6d65-19dc-4171-a84d-e27daa2bfcfd_1872x1050.png 1272w, https://substackcdn.com/image/fetch/$s_!a_vj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f3b6d65-19dc-4171-a84d-e27daa2bfcfd_1872x1050.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Most agent failures are context failures.</strong> If the agent doesn&#8217;t see the right logs, tests, or constraints, it makes confident but wrong changes. Fixing context is often higher leverage than switching models.</p><p>I also called out something that will resonate if you&#8217;ve ever spent time tweaking prompts: <strong>prompt engineering failures often come from context mismanagement</strong>, not &#8220;the wrong magic words.&#8221;</p><h4><strong>How I run agents like a React team lead</strong></h4><p><strong>Treat agents like junior hires:</strong></p><ul><li><p>Give a written task brief, acceptance criteria, and constraints</p></li><li><p>Declare the sandbox: disposable branch, test DB, temporary env vars</p></li><li><p>Ask for a plan first: files it will touch, tools it will call, risks it sees</p></li><li><p>Cap blast radius: constrain write access to app, src, config</p></li><li><p>Require tests as part of fixes: reproduce bug first, then patch</p></li><li><p>Force small PRs: reviewable commits, not a mega diff</p></li></ul><p><strong>Then add operational guardrails:</strong></p><ul><li><p>Point them at logs and monitors: build logs, Sentry traces, Playwright failures</p></li><li><p>Snap to house style: ESLint config, Prettier rules, naming conventions</p></li><li><p>Disable auto-merge and require human approval for agent changes</p></li></ul><p>That&#8217;s the difference between &#8220;agentic coding&#8221; and &#8220;outsourcing your codebase to a stochastic parrot.&#8221;</p><p><strong>Poor prompt:</strong></p><pre><code><code>Fix the bug in the checkout
flow</code></code></pre><p><strong>Strong prompt:</strong></p><pre><code><code>Task: Fix abandoned cart bug in checkout

Context:
- File: app/checkout/page.tsx
- Error: Cart resets on page refresh
- Expected: Cart persists via localStorage
- Test: Run `npm test checkout.test.tsx` to verify

Plan required before implementation:
1. Identify where cart state is managed
2. Add localStorage persistence
3. Add hydration logic
4. Update tests
5. Verify in Playwright

Constraints:
- Only modify app/checkout/* and lib/cart.ts
- Maintain existing TypeScript types
- Follow our ESLint rules</code></code></pre><div><hr></div><h3><strong>3. Context Engineering: The highest leverage skill for agentic React</strong></h3><p>If context is the bottleneck, then <strong>context engineering</strong> is the discipline.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GJFk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3af5800-ea23-4d90-8117-3d88e3f6142c_1886x1054.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GJFk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3af5800-ea23-4d90-8117-3d88e3f6142c_1886x1054.png 424w, https://substackcdn.com/image/fetch/$s_!GJFk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3af5800-ea23-4d90-8117-3d88e3f6142c_1886x1054.png 848w, https://substackcdn.com/image/fetch/$s_!GJFk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3af5800-ea23-4d90-8117-3d88e3f6142c_1886x1054.png 1272w, https://substackcdn.com/image/fetch/$s_!GJFk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3af5800-ea23-4d90-8117-3d88e3f6142c_1886x1054.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GJFk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3af5800-ea23-4d90-8117-3d88e3f6142c_1886x1054.png" width="1456" height="814" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c3af5800-ea23-4d90-8117-3d88e3f6142c_1886x1054.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:814,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:450343,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3af5800-ea23-4d90-8117-3d88e3f6142c_1886x1054.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GJFk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3af5800-ea23-4d90-8117-3d88e3f6142c_1886x1054.png 424w, https://substackcdn.com/image/fetch/$s_!GJFk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3af5800-ea23-4d90-8117-3d88e3f6142c_1886x1054.png 848w, https://substackcdn.com/image/fetch/$s_!GJFk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3af5800-ea23-4d90-8117-3d88e3f6142c_1886x1054.png 1272w, https://substackcdn.com/image/fetch/$s_!GJFk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3af5800-ea23-4d90-8117-3d88e3f6142c_1886x1054.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In the talk I described context engineering as the art and science of filling the context window with just the right information to guide the agent&#8217;s performance. 
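One way to make &#8220;just the right information&#8221; concrete: score candidate context snippets, then pack only the highest-signal ones into a fixed token budget instead of pasting everything into the prompt. The sketch below is my own illustration, not any framework&#8217;s API; the four-characters-per-token estimate and the hand-assigned signal scores are assumptions.

```typescript
// Minimal sketch: pick the smallest high-signal set of context snippets
// that fits a token budget, instead of dumping everything into the prompt.
type Snippet = { label: string; text: string; signal: number };

// Crude heuristic (assumption): roughly 4 characters per token for English prose.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function packContext(snippets: Snippet[], budgetTokens: number): string[] {
  const packed: string[] = [];
  let used = 0;
  // Highest-signal snippets first; skip anything that would blow the budget.
  for (const s of [...snippets].sort((a, b) => b.signal - a.signal)) {
    const cost = estimateTokens(s.text);
    if (used + cost > budgetTokens) continue;
    used += cost;
    packed.push(`## ${s.label}\n${s.text}`);
  }
  return packed;
}

// Hypothetical inputs: the failing test and the constraint fit the budget;
// the low-signal README dump does not and gets dropped.
const context = packContext(
  [
    { label: "Failing test output", text: "checkout.test.tsx: expected cart to persist after reload", signal: 0.9 },
    { label: "Constraint", text: "Only modify app/checkout/* and lib/cart.ts", signal: 0.8 },
    { label: "Unrelated README dump", text: "x".repeat(4000), signal: 0.1 },
  ],
  200,
);
```

The same shape works for logs, failing tests, and architectural constraints: score them, pack the winners, drop the rest.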
It&#8217;s more than clever prompting.</p><p><strong>Two specific tips I want most React developers to internalize:</strong></p><ol><li><p><strong>Visual context is powerful.</strong> Screenshots can enable one-shot solutions for UI bugs or design tasks.</p></li><li><p><strong>Structure beats volume.</strong> Unstructured dumps confuse the model, competing information distracts it, and overload overwhelms it.</p></li></ol><p>Under the hood, this ties back to a core principle: <strong>&#8220;Find the smallest possible set of high-signal tokens that maximize the likelihood of your desired outcome.&#8221;</strong></p><p>Every token you waste is context you cannot spend on:</p><ul><li><p>The actual API surface you need</p></li><li><p>The architectural constraints you care about</p></li><li><p>The failing test output that would prevent a bad patch</p></li></ul><div><hr></div><h3><strong>4. Tooling: If you can&#8217;t control the base model, control the layer around it</strong></h3><p>As I said in the &#8220;mastering the tools&#8221; section: you probably don&#8217;t control the base model, but you can absolutely steer the tooling around it.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jWTi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ed042fe-f7de-4dee-b591-b75ef2509300_1872x1038.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jWTi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ed042fe-f7de-4dee-b591-b75ef2509300_1872x1038.png 424w,
https://substackcdn.com/image/fetch/$s_!jWTi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ed042fe-f7de-4dee-b591-b75ef2509300_1872x1038.png 848w, https://substackcdn.com/image/fetch/$s_!jWTi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ed042fe-f7de-4dee-b591-b75ef2509300_1872x1038.png 1272w, https://substackcdn.com/image/fetch/$s_!jWTi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ed042fe-f7de-4dee-b591-b75ef2509300_1872x1038.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jWTi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ed042fe-f7de-4dee-b591-b75ef2509300_1872x1038.png" width="1456" height="807" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4ed042fe-f7de-4dee-b591-b75ef2509300_1872x1038.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:807,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:365503,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ed042fe-f7de-4dee-b591-b75ef2509300_1872x1038.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!jWTi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ed042fe-f7de-4dee-b591-b75ef2509300_1872x1038.png 424w, https://substackcdn.com/image/fetch/$s_!jWTi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ed042fe-f7de-4dee-b591-b75ef2509300_1872x1038.png 848w, https://substackcdn.com/image/fetch/$s_!jWTi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ed042fe-f7de-4dee-b591-b75ef2509300_1872x1038.png 1272w, https://substackcdn.com/image/fetch/$s_!jWTi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ed042fe-f7de-4dee-b591-b75ef2509300_1872x1038.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A concrete example is doc and state retrieval. Let me show you a few tools that demonstrate this pattern:</p><h4><strong>Context7 MCP</strong></h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FkZp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8dc6d9b-6818-4e43-946c-3ecdf64773ad_1886x1054.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FkZp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8dc6d9b-6818-4e43-946c-3ecdf64773ad_1886x1054.png 424w, https://substackcdn.com/image/fetch/$s_!FkZp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8dc6d9b-6818-4e43-946c-3ecdf64773ad_1886x1054.png 848w, https://substackcdn.com/image/fetch/$s_!FkZp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8dc6d9b-6818-4e43-946c-3ecdf64773ad_1886x1054.png 1272w, https://substackcdn.com/image/fetch/$s_!FkZp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8dc6d9b-6818-4e43-946c-3ecdf64773ad_1886x1054.png 1456w" sizes="100vw"><img
src="https://substackcdn.com/image/fetch/$s_!FkZp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8dc6d9b-6818-4e43-946c-3ecdf64773ad_1886x1054.png" width="1456" height="814" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e8dc6d9b-6818-4e43-946c-3ecdf64773ad_1886x1054.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:814,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:304211,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8dc6d9b-6818-4e43-946c-3ecdf64773ad_1886x1054.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FkZp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8dc6d9b-6818-4e43-946c-3ecdf64773ad_1886x1054.png 424w, https://substackcdn.com/image/fetch/$s_!FkZp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8dc6d9b-6818-4e43-946c-3ecdf64773ad_1886x1054.png 848w, https://substackcdn.com/image/fetch/$s_!FkZp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8dc6d9b-6818-4e43-946c-3ecdf64773ad_1886x1054.png 1272w, https://substackcdn.com/image/fetch/$s_!FkZp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8dc6d9b-6818-4e43-946c-3ecdf64773ad_1886x1054.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong><a href="https://context7.com/">Context7</a></strong> pulls fresh, version-specific docs and examples from source sites and injects them into the model&#8217;s working set, reducing guessing and stale snippets.
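The underlying retrieval shape is easy to sketch. What follows is an illustrative mock, not Context7&#8217;s actual API: a version-pinned doc index filtered by topic, with a cap on how many snippets get injected. The index entries and function names here are invented for the example.

```typescript
// Illustrative mock of the retrieval pattern, NOT Context7's actual API:
// look up version-pinned doc entries, filter by topic, and cap how many
// snippets are injected so docs don't crowd out the rest of the context.
type DocEntry = { library: string; version: string; topic: string; snippet: string };

// Hypothetical local index standing in for a real docs source.
const docsIndex: DocEntry[] = [
  { library: "next", version: "15", topic: "routing", snippet: "app/page.tsx defines the root route in the App Router." },
  { library: "next", version: "15", topic: "routing", snippet: "Dynamic segments use [slug] directory names." },
  { library: "next", version: "15", topic: "hooks", snippet: "useRouter is imported from next/navigation, not next/router." },
];

function retrieveDocs(library: string, version: string, topic: string, maxSnippets: number): string[] {
  return docsIndex
    .filter((d) => d.library === library && d.version === version && d.topic === topic)
    .slice(0, maxSnippets) // the "cap how much to bring in" knob
    .map((d) => `[${d.library}@${d.version}] ${d.snippet}`);
}

const routingDocs = retrieveDocs("next", "15", "routing", 1);
```

Swapping the in-memory index for a live docs fetch is the tool's job; the topic filter and the cap are what keep the injected context high-signal.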
You can nudge it toward topics like routing or hooks and cap how much to bring in.</p><h4><strong>Next.js DevTools MCP</strong></h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hEVK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d50070d-ffb6-4bf8-838b-dd65d93c4f16_1882x1058.png"><div class="image2-inset"><picture><img src="https://substackcdn.com/image/fetch/$s_!hEVK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d50070d-ffb6-4bf8-838b-dd65d93c4f16_1882x1058.png" width="1456" height="819" class="sizing-normal" alt="" loading="lazy"></picture></div></a></figure></div><p>The modern Next dev server exposes a built-in MCP endpoint.
The <strong><a href="https://nextjs.org/docs/app/guides/mcp">Next.js DevTools MCP server</a></strong> connects to it so an agent can ask for real data about your running app:</p><ul><li><p>Current build or runtime errors</p></li><li><p>Routes and layouts</p></li><li><p>Component metadata</p></li><li><p>Server actions and dev logs</p></li><li><p>Playwright paths for simple browser checks</p></li></ul><p>It also ships with a Next-specific knowledge base and helpers for common tasks like upgrades.</p><h4><strong>Chrome DevTools MCP</strong></h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hkSK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f4afb67-0ecf-4177-9330-664d2be8a881_1882x1050.png"><div class="image2-inset"><picture><img src="https://substackcdn.com/image/fetch/$s_!hkSK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f4afb67-0ecf-4177-9330-664d2be8a881_1882x1050.png" width="1456" height="812" class="sizing-normal" alt="" loading="lazy"></picture></div></a></figure></div><p><strong><a href="https://github.com/ChromeDevTools/chrome-devtools-mcp">Chrome DevTools MCP</a></strong> gives the agent eyes and hands in a real browser. It can open pages, click through flows, read console and network logs, take screenshots, and record performance traces to investigate things like high LCP or blocking time. Under the hood it rides on Chrome DevTools and Puppeteer, so you get reliable automation instead of brittle scripts.
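</p><p>Wiring it into an agent is typically one entry in your MCP client&#8217;s config file. Here is a minimal sketch; the package name comes from the project&#8217;s README, while the <code>--isolated</code> flag (which keeps sessions in a throwaway browser profile) is an assumption you should verify against the current docs:</p>

```json
{
  "mcpServers": {
    "chrome-devtools": {
      "command": "npx",
      "args": ["chrome-devtools-mcp@latest", "--isolated"]
    }
  }
}
```

<p>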
Because it can see page content, you still want sensible flags and isolation from personal browsing, but treated as scoped tooling it is very powerful.</p><h4><strong>How MCPs fit together</strong></h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!r5oN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbf2b9db-998e-41c8-9803-c7445bdade06_1860x1052.png"><div class="image2-inset"><picture><img src="https://substackcdn.com/image/fetch/$s_!r5oN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbf2b9db-998e-41c8-9803-c7445bdade06_1860x1052.png" width="1456" height="824" class="sizing-normal" alt="" loading="lazy"></picture></div></a></figure></div><p>Context7 gives your assistant the right external knowledge. Next DevTools MCP gives it your app&#8217;s truth. Chrome DevTools MCP proves the result in a real browser. Used together, you turn a guessing assistant into a closed&#8209;loop coder and debugger that cites sources, places changes correctly, and verifies outcomes before you hit commit.</p><p>This is the pattern I expect more React teams to adopt: rather than hoping the model remembers today&#8217;s Next.js behavior, wire it to an always-correct source of truth.</p><div><hr></div><h3><strong>5.
Builder Arena: Vibe Coding tools used responsibly</strong></h3><p>Builder tools are designed for rapid prompt-driven product creation, not just &#8220;write me a component.&#8221; They optimize for cohesion and perceived completeness.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BEzj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9ec6cea-d6ff-411a-9102-de485745452e_1882x1046.png"><div class="image2-inset"><picture><img src="https://substackcdn.com/image/fetch/$s_!BEzj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9ec6cea-d6ff-411a-9102-de485745452e_1882x1046.png" width="1456" height="809" class="sizing-normal" alt="" loading="lazy"></picture></div></a></figure></div><p>Design Arena&#8217;s builder results were surprising precisely because builders are not just base models.
They&#8217;re base models plus scaffolding plus UX and post-processing.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!khIi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d42ef91-5ccb-4fbe-9f54-0b1c963cdb42_1878x1050.png"><div class="image2-inset"><picture><img src="https://substackcdn.com/image/fetch/$s_!khIi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d42ef91-5ccb-4fbe-9f54-0b1c963cdb42_1878x1050.png" width="1456" height="814" class="sizing-normal" alt="" loading="lazy"></picture></div></a></figure></div><p><strong>My guidance for React developers:</strong></p><ul><li><p><strong>Use builders as idea generators.</strong> Harvest layout, copy, micro-interactions, then rebuild cleanly in your codebase</p></li><li><p><strong>Normalize APIs.</strong> Refactor generated fetch calls, hooks, stores to your patterns</p></li><li><p><strong>Consolidate CSS.</strong> Pull scattered styles into tokens and your component library to avoid spawning a second design system</p></li><li><p><strong>Archive failure cases.</strong> Save screenshots and diffs to refine prompts and tool settings over time</p></li></ul><p><strong>And if you want the &#8220;before you even start&#8221; checklist:</strong></p><ul><li><p>Start with a written product spec: features, user types, flows</p></li><li><p>Lock your design system: your existing shadcn, Radix, or in-house
primitives</p></li><li><p>Describe the vibe in concrete terms: reference sites, adjectives, motion levels</p></li><li><p>Limit surface area: use builders for a single flow rather than your entire shell</p></li></ul><p>If you treat builder output as production code by default, you&#8217;ll end up maintaining a foreign codebase you never chose.</p><div><hr></div><h3><strong>6. UI Components Arena: where React developers win</strong></h3><p>The UI Components Arena is the most directly applicable to React teams: generating isolated, reusable components. Scope is focused, success rates are high, and output can be close to production-ready.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kUQw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcda54739-ee8a-4dbc-b532-73d77639d270_1890x1060.png"><div class="image2-inset"><picture><img src="https://substackcdn.com/image/fetch/$s_!kUQw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcda54739-ee8a-4dbc-b532-73d77639d270_1890x1060.png" width="1456" height="817" class="sizing-normal" alt="" loading="lazy"></picture></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nNLz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd207b9b-7778-4047-befc-ee4a91ed9617_1888x1052.png"><div class="image2-inset"><picture><img src="https://substackcdn.com/image/fetch/$s_!nNLz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd207b9b-7778-4047-befc-ee4a91ed9617_1888x1052.png" width="1456" height="811" class="sizing-normal" alt="" loading="lazy"></picture></div></a></figure></div><p>It&#8217;s also where the &#8220;logic not taste&#8221; lesson shows up cleanly: models can wire up props and state and still make ugly, inconsistent decisions.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6zux!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32f037f5-be58-4556-86c3-a0d2c1dbe3ac_1888x1050.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6zux!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32f037f5-be58-4556-86c3-a0d2c1dbe3ac_1888x1050.png 424w,
https://substackcdn.com/image/fetch/$s_!6zux!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32f037f5-be58-4556-86c3-a0d2c1dbe3ac_1888x1050.png 848w, https://substackcdn.com/image/fetch/$s_!6zux!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32f037f5-be58-4556-86c3-a0d2c1dbe3ac_1888x1050.png 1272w, https://substackcdn.com/image/fetch/$s_!6zux!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32f037f5-be58-4556-86c3-a0d2c1dbe3ac_1888x1050.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6zux!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32f037f5-be58-4556-86c3-a0d2c1dbe3ac_1888x1050.png" width="1456" height="810" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/32f037f5-be58-4556-86c3-a0d2c1dbe3ac_1888x1050.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:810,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:557841,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32f037f5-be58-4556-86c3-a0d2c1dbe3ac_1888x1050.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!6zux!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32f037f5-be58-4556-86c3-a0d2c1dbe3ac_1888x1050.png 424w, https://substackcdn.com/image/fetch/$s_!6zux!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32f037f5-be58-4556-86c3-a0d2c1dbe3ac_1888x1050.png 848w, https://substackcdn.com/image/fetch/$s_!6zux!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32f037f5-be58-4556-86c3-a0d2c1dbe3ac_1888x1050.png 1272w, https://substackcdn.com/image/fetch/$s_!6zux!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32f037f5-be58-4556-86c3-a0d2c1dbe3ac_1888x1050.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>So I use AI here heavily, but with a specific protocol.</p><h4><strong>Start by forcing a Component contract</strong></h4><p>These are the things I want in the prompt before the model writes JSX:</p><ul><li><p>Define prop names, types, variants, and states</p></li><li><p>Provide examples and edge cases, including weird inputs</p></li><li><p>Demand accessibility by default: keyboard nav, ARIA, focus management, error messaging</p></li><li><p>Avoid anonymous div wrappers; use semantic structure where it matters</p></li><li><p>Separate styling concerns: Tailwind classes or your utility system, not inline styles</p></li><li><p>Request story files: Storybook or MDX usage examples</p></li></ul><h4><strong>Then do the React integration work AI is usually bad at</strong></h4><p>Once you have a plausible component, integrate it like a senior engineer:</p><ul><li><p><strong>Convert state to idiomatic React:</strong> Replace query selectors and global variables with hooks and props</p></li><li><p><strong>Make behavior composable:</strong> Refactor complex pieces into hooks you own</p></li><li><p><strong>Test the contract, not the implementation:</strong> Focus tests on props and events so internals can evolve</p></li><li><p><strong>Snap components into your design system</strong> before exposing them broadly</p></li></ul><p>This is the pattern I recommend to teams: let AI get you 70% of the way on structure, then deliberately take ownership of API shape, composition, and design tokens.</p><p><strong>Poor prompt:</strong></p><pre><code><code>Create a sign-up button component with different variants</code></code></pre><p><strong>Strong 
prompt:</strong></p><pre><code><code>Create a sign-up Button component with:

Props:
- variant: 'primary' | 'secondary' | 'ghost'
- size: 'sm' | 'md' | 'lg'
- disabled: boolean
- loading: boolean

Requirements:
- Use Tailwind classes
- Show loading spinner when loading=true
- Disable pointer events when disabled
- Support keyboard navigation (Enter/Space)
- Include focus-visible ring
- ARIA: use aria-disabled, aria-busy

Example usage:
&lt;Button variant="primary" size="md" loading={isSubmitting}&gt;
  Submit
&lt;/Button&gt;</code></code></pre><p>Put the two prompts side by side and the difference is immediate: the poor one generates inconsistent spacing, misses accessibility, and falls back to inline styles, while the strong one hits every requirement.</p><div><hr></div><h3><strong>7. 3D and Data Viz: Let AI generate assets and data, not your entire integration</strong></h3><p>The 3D and Data Viz arenas stress more structured generation tasks, relevant for interactive dashboards, WebGL, and data-heavy apps.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XKdS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ae074fe-d108-4a11-a586-158cec594707_1890x1052.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XKdS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ae074fe-d108-4a11-a586-158cec594707_1890x1052.png 424w, https://substackcdn.com/image/fetch/$s_!XKdS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ae074fe-d108-4a11-a586-158cec594707_1890x1052.png 848w, https://substackcdn.com/image/fetch/$s_!XKdS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ae074fe-d108-4a11-a586-158cec594707_1890x1052.png 1272w, https://substackcdn.com/image/fetch/$s_!XKdS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ae074fe-d108-4a11-a586-158cec594707_1890x1052.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!XKdS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ae074fe-d108-4a11-a586-158cec594707_1890x1052.png" width="1456" height="810" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3ae074fe-d108-4a11-a586-158cec594707_1890x1052.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:810,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:438825,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ae074fe-d108-4a11-a586-158cec594707_1890x1052.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XKdS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ae074fe-d108-4a11-a586-158cec594707_1890x1052.png 424w, https://substackcdn.com/image/fetch/$s_!XKdS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ae074fe-d108-4a11-a586-158cec594707_1890x1052.png 848w, https://substackcdn.com/image/fetch/$s_!XKdS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ae074fe-d108-4a11-a586-158cec594707_1890x1052.png 1272w, https://substackcdn.com/image/fetch/$s_!XKdS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ae074fe-d108-4a11-a586-158cec594707_1890x1052.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2dCT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82714f0f-f3ac-4226-a70e-cbe700f5593c_1884x1056.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!2dCT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82714f0f-f3ac-4226-a70e-cbe700f5593c_1884x1056.png 424w, https://substackcdn.com/image/fetch/$s_!2dCT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82714f0f-f3ac-4226-a70e-cbe700f5593c_1884x1056.png 848w, https://substackcdn.com/image/fetch/$s_!2dCT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82714f0f-f3ac-4226-a70e-cbe700f5593c_1884x1056.png 1272w, https://substackcdn.com/image/fetch/$s_!2dCT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82714f0f-f3ac-4226-a70e-cbe700f5593c_1884x1056.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2dCT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82714f0f-f3ac-4226-a70e-cbe700f5593c_1884x1056.png" width="1456" height="816" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/82714f0f-f3ac-4226-a70e-cbe700f5593c_1884x1056.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:816,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:471845,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82714f0f-f3ac-4226-a70e-cbe700f5593c_1884x1056.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2dCT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82714f0f-f3ac-4226-a70e-cbe700f5593c_1884x1056.png 424w, https://substackcdn.com/image/fetch/$s_!2dCT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82714f0f-f3ac-4226-a70e-cbe700f5593c_1884x1056.png 848w, https://substackcdn.com/image/fetch/$s_!2dCT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82714f0f-f3ac-4226-a70e-cbe700f5593c_1884x1056.png 1272w, https://substackcdn.com/image/fetch/$s_!2dCT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82714f0f-f3ac-4226-a70e-cbe700f5593c_1884x1056.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The lesson from these arenas is not &#8220;AI writes your Three.js app.&#8221; It&#8217;s:</p><ul><li><p><strong>Decide what AI should generate:</strong> Ask for geometries, datasets, configuration, not full integration code</p></li><li><p><strong>Specify the target library:</strong> React Three Fiber, Drei, Recharts, Victory, Visx</p></li><li><p><strong>Request low poly first</strong> and iterate toward fidelity once performance is proven</p></li><li><p><strong>Keep performance under control:</strong> Lazy load heavy assets, guard frame rate, keep fallbacks</p></li></ul><p>In practice, this is how you avoid an &#8220;AI demo&#8221; becoming a performance incident.</p><div><hr></div><h2><strong>React-Specific tips I want more teams to operationalize</strong></h2><p>The talk includes a slide of &#8220;React AI coding tips&#8221; that I keep coming back to because it captures what actually works in practice.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0aF5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eda09af-dbe7-40a3-91d9-243c1ecbd735_1884x1050.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0aF5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eda09af-dbe7-40a3-91d9-243c1ecbd735_1884x1050.png 424w, 
https://substackcdn.com/image/fetch/$s_!0aF5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eda09af-dbe7-40a3-91d9-243c1ecbd735_1884x1050.png 848w, https://substackcdn.com/image/fetch/$s_!0aF5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eda09af-dbe7-40a3-91d9-243c1ecbd735_1884x1050.png 1272w, https://substackcdn.com/image/fetch/$s_!0aF5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eda09af-dbe7-40a3-91d9-243c1ecbd735_1884x1050.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0aF5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eda09af-dbe7-40a3-91d9-243c1ecbd735_1884x1050.png" width="1456" height="811" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0eda09af-dbe7-40a3-91d9-243c1ecbd735_1884x1050.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:811,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:629248,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eda09af-dbe7-40a3-91d9-243c1ecbd735_1884x1050.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!0aF5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eda09af-dbe7-40a3-91d9-243c1ecbd735_1884x1050.png 424w, https://substackcdn.com/image/fetch/$s_!0aF5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eda09af-dbe7-40a3-91d9-243c1ecbd735_1884x1050.png 848w, https://substackcdn.com/image/fetch/$s_!0aF5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eda09af-dbe7-40a3-91d9-243c1ecbd735_1884x1050.png 1272w, https://substackcdn.com/image/fetch/$s_!0aF5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eda09af-dbe7-40a3-91d9-243c1ecbd735_1884x1050.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Here are the ones I see paying off immediately on real teams:</p><ul><li><p><strong>Start prompts with the component API:</strong> Declare props, variants, states, and then tell the model to implement exactly that</p></li><li><p><strong>Name interactive states explicitly:</strong> hover, focus, loading, disabled</p></li><li><p><strong>Ask for a plan, then generate in steps:</strong> Small increments beat one big shot, and help avoid the complexity cliff</p></li><li><p><strong>Codify taste so AI can follow it:</strong> Lock spacing, colors, components in Tailwind config and your design system</p></li><li><p><strong>Be explicit about routes, layouts, server actions, loading and error boundaries</strong></p></li><li><p><strong>Bake conventions into the repo:</strong> Document App Router defaults, Server Components, Suspense so assistants align automatically</p></li><li><p><strong>Run checks only on what changed:</strong> Husky with lint-staged to run typecheck, lint, tests on staged files</p></li><li><p><strong>Control cache behavior explicitly:</strong> Fetch cache options and revalidation windows as part of the prompt, so the model doesn&#8217;t guess your policy</p></li></ul><p><strong>The meta-message is the same:</strong> The difference between &#8220;AI helped me ship&#8221; and &#8220;AI gave me a mess&#8221; is almost always the level of specificity and the strength of your guardrails.</p><div><hr></div><h2><strong>How I debug AI coding failures: it&#8217;s a pipeline, not a model</strong></h2><p>Once you accept the complexity cliff, the question becomes: how do you consistently get good outcomes?</p><p>I use a mental 
model I showed near the end of the talk: <strong>when AI code works or fails, it&#8217;s rarely &#8220;just the model.&#8221; It&#8217;s the whole pipeline.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!na-W!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b51dd38-b014-4122-bc97-32ece34d1cb1_1870x1052.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!na-W!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b51dd38-b014-4122-bc97-32ece34d1cb1_1870x1052.png 424w, https://substackcdn.com/image/fetch/$s_!na-W!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b51dd38-b014-4122-bc97-32ece34d1cb1_1870x1052.png 848w, https://substackcdn.com/image/fetch/$s_!na-W!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b51dd38-b014-4122-bc97-32ece34d1cb1_1870x1052.png 1272w, https://substackcdn.com/image/fetch/$s_!na-W!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b51dd38-b014-4122-bc97-32ece34d1cb1_1870x1052.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!na-W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b51dd38-b014-4122-bc97-32ece34d1cb1_1870x1052.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1b51dd38-b014-4122-bc97-32ece34d1cb1_1870x1052.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:332597,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b51dd38-b014-4122-bc97-32ece34d1cb1_1870x1052.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!na-W!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b51dd38-b014-4122-bc97-32ece34d1cb1_1870x1052.png 424w, https://substackcdn.com/image/fetch/$s_!na-W!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b51dd38-b014-4122-bc97-32ece34d1cb1_1870x1052.png 848w, https://substackcdn.com/image/fetch/$s_!na-W!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b51dd38-b014-4122-bc97-32ece34d1cb1_1870x1052.png 1272w, https://substackcdn.com/image/fetch/$s_!na-W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b51dd38-b014-4122-bc97-32ece34d1cb1_1870x1052.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The pipeline:</p><ol><li><p>Base model</p></li><li><p>System prompt and instructions</p></li><li><p>Your user prompt</p></li><li><p>Fine-tuning and code training</p></li><li><p>Tools and retrieval (RAG)</p></li><li><p>Agent loops (iteration)</p></li><li><p>Post-processing</p></li></ol><p>If you&#8217;re disappointed, you can almost always point to a weak link: wrong model for task, vague prompt, missing context, no iteration.</p><p>When things work, it&#8217;s usually because multiple layers aligned well: strong model, good prompt, necessary context, and iteration to iron out kinks.</p><p><strong>This is actionable, because most of those layers are under your control as a user</strong>, even if you don&#8217;t own the base model.</p><div><hr></div><h2><strong>The workflow I recommend for agentic React coding</strong></h2><p>I 
summarized it in the deck as the <strong>&#8220;new flow state&#8221;:</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uQ4k!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd674a8c-c225-46c6-b653-77a4ee119ea3_1864x1048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uQ4k!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd674a8c-c225-46c6-b653-77a4ee119ea3_1864x1048.png 424w, https://substackcdn.com/image/fetch/$s_!uQ4k!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd674a8c-c225-46c6-b653-77a4ee119ea3_1864x1048.png 848w, https://substackcdn.com/image/fetch/$s_!uQ4k!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd674a8c-c225-46c6-b653-77a4ee119ea3_1864x1048.png 1272w, https://substackcdn.com/image/fetch/$s_!uQ4k!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd674a8c-c225-46c6-b653-77a4ee119ea3_1864x1048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uQ4k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd674a8c-c225-46c6-b653-77a4ee119ea3_1864x1048.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cd674a8c-c225-46c6-b653-77a4ee119ea3_1864x1048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:576421,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180999655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd674a8c-c225-46c6-b653-77a4ee119ea3_1864x1048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uQ4k!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd674a8c-c225-46c6-b653-77a4ee119ea3_1864x1048.png 424w, https://substackcdn.com/image/fetch/$s_!uQ4k!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd674a8c-c225-46c6-b653-77a4ee119ea3_1864x1048.png 848w, https://substackcdn.com/image/fetch/$s_!uQ4k!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd674a8c-c225-46c6-b653-77a4ee119ea3_1864x1048.png 1272w, https://substackcdn.com/image/fetch/$s_!uQ4k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd674a8c-c225-46c6-b653-77a4ee119ea3_1864x1048.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ol><li><p>Define clear requirements (maybe write tests)</p></li><li><p>Prompt with context (stack, docs, examples)</p></li><li><p>Ask for plan, review</p></li><li><p>Generate code in small steps</p></li><li><p>Run, test, refine</p></li><li><p>Iterate until production-ready</p></li></ol><p>This sounds like &#8220;normal engineering,&#8221; and <strong>that&#8217;s the point.</strong> The best teams I see using AI are not doing anything mystical. 
They&#8217;re turning implicit engineering discipline into explicit instructions and then using AI to accelerate the boring parts.</p><p><strong>If you want one sentence:</strong> You&#8217;re not just typing code anymore; you&#8217;re orchestrating code creation.</p><div><hr></div><h2><strong>So, how good is AI at coding React, really?</strong></h2><p>Where I land after looking at these benchmarks and using these tools day to day:</p><p>&#9989; <strong>AI is genuinely strong at:</strong></p><ul><li><p>Isolated React components</p></li><li><p>Scaffolding</p></li><li><p>Converting clearly specified requirements into working code</p></li></ul><p>&#9888;&#65039; <strong>AI is still unreliable at:</strong></p><ul><li><p>Multi-step integration tasks without strong tooling, strong context, and iteration loops</p></li></ul><p>&#10060; <strong>AI is consistently weaker at:</strong></p><ul><li><p>Taste, hierarchy, and nuanced UX decisions - it is far stronger at &#8220;code that runs&#8221; than at design judgment</p></li><li><p>The aesthetic gap is real</p></li></ul><p>&#128161; <strong>The highest-leverage strategy is not &#8220;pick the best model.&#8221;</strong> It&#8217;s:</p><ul><li><p>Reduce context failures</p></li><li><p>Codify your conventions</p></li><li><p>Force stepwise work</p></li></ul><p><strong>And the part I find most exciting:</strong> The opportunity keeps expanding. Models and tools change fast, but the underlying skills that make you effective don&#8217;t. 
<strong>You are still the architect.</strong></p><div><hr></div><h2><strong>Learn More</strong></h2><p>If you want to keep exploring this space:</p><ul><li><p><strong>Talk video:</strong> <a href="https://gitnation.com/contents/how-good-is-ai-at-coding-react-really">https://gitnation.com/contents/how-good-is-ai-at-coding-react-really</a></p></li><li><p><strong>Design Arena leaderboard:</strong> <a href="https://www.designarena.ai/leaderboard">https://www.designarena.ai/leaderboard</a></p></li><li><p><strong>Design Arena methodology:</strong> <a href="https://notes.designarena.ai/in-pursuit-of-a-benchmark-for-human-taste/">https://notes.designarena.ai/in-pursuit-of-a-benchmark-for-human-taste/</a></p></li></ul><p><em>And if you want to dive deeper into related topics, I&#8217;ve written two books on the topic: &#8220;<a href="https://beyond.addy.ie">Beyond Vibe Coding</a>&#8221; and &#8220;<a href="https://largeapps.dev/">Building large-scale web apps with React</a>&#8221;</em></p><p></p><div><hr></div><p></p>]]></content:encoded></item><item><title><![CDATA[My LLM coding workflow going into 2026]]></title><description><![CDATA[Best practices for staying in control while coding with AI]]></description><link>https://addyo.substack.com/p/my-llm-coding-workflow-going-into</link><guid isPermaLink="false">https://addyo.substack.com/p/my-llm-coding-workflow-going-into</guid><dc:creator><![CDATA[Addy Osmani]]></dc:creator><pubDate>Thu, 18 Dec 2025 15:30:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ukkU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc5ca0ef-614c-42e0-85f3-3663e9871580_7838x7838.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>AI coding assistants became game-changers this year, but harnessing them effectively takes skill and structure.</strong> These tools dramatically increased what LLMs can do for real-world coding, and many developers 
(myself included) embraced them.</p><p>At Anthropic, for example, engineers adopted Claude Code so heavily that <strong><a href="https://newsletter.pragmaticengineer.com/p/software-engineering-with-llms-in-2025#:~:text=,%E2%80%9D">today</a> ~90% of the code for Claude Code is written by Claude Code itself</strong>. Yet, using LLMs for programming is <em>not</em> a push-button magic experience - it&#8217;s &#8220;difficult and unintuitive&#8221; and getting great results requires learning new patterns. <a href="https://addyo.substack.com/p/critical-thinking-during-the-age">Critical thinking</a> remains key. Over a year of projects, I&#8217;ve converged on a workflow similar to what many experienced devs are discovering: treat the LLM as a powerful pair programmer that <strong>requires clear direction, context and oversight</strong> rather than autonomous judgment.</p><p>In this article, I&#8217;ll share how I plan, code, and collaborate with AI going into 2026, distilling tips and best practices from my experience and the community&#8217;s collective learning. 
It&#8217;s a more disciplined <strong>&#8220;AI-assisted engineering&#8221;</strong> approach - leveraging AI aggressively while <strong>staying proudly accountable for the software produced</strong>.</p><div id="youtube2-FoXHScf1mjA" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;FoXHScf1mjA&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/FoXHScf1mjA?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>If you&#8217;re interested in more on my workflow, see &#8220;The AI-Native Software Engineer&#8221;, otherwise let&#8217;s dive straight into some of the lessons I learned.</p><h2>Start with a clear plan (specs before code)</h2><p><strong>Don&#8217;t just throw wishes at the LLM - begin by defining the problem and planning a solution.</strong> </p><p>One common mistake is diving straight into code generation with a vague prompt. In my workflow, and in many others&#8217;, the first step is <strong>brainstorming a detailed specification</strong> <em>with</em> the AI, then outlining a step-by-step plan, <em>before</em> writing any actual code. For a new project, I&#8217;ll describe the idea and ask the LLM to <strong>iteratively ask me questions</strong> until we&#8217;ve fleshed out requirements and edge cases. By the end, we compile this into a comprehensive <strong>spec.md</strong> - containing requirements, architecture decisions, data models, and even a testing strategy. This spec forms the foundation for development.</p><p>Next, I feed the spec into a reasoning-capable model and prompt it to <strong>generate a project plan</strong>: break the implementation into logical, bite-sized tasks or milestones. 
The AI essentially helps me do a mini &#8220;design doc&#8221; or project plan. I often iterate on this plan - editing and asking the AI to critique or refine it - until it&#8217;s coherent and complete. <em>Only then</em> do I proceed to coding. This upfront investment might feel slow, but it pays off enormously. As Les Orchard <a href="https://blog.lmorchard.com/2025/06/07/semi-automatic-coding/#:~:text=Accidental%20waterfall%20">put it</a>, it&#8217;s like doing a <strong>&#8220;waterfall in 15 minutes&#8221;</strong> - a rapid structured planning phase that makes the subsequent coding much smoother. </p><p>Having a clear spec and plan means when we unleash the codegen, both the human and the LLM know exactly what we&#8217;re building and why. In short, <strong>planning first</strong> forces you and the AI onto the same page and prevents wasted cycles. It&#8217;s a step many people are tempted to skip, but experienced LLM developers now treat a robust spec/plan as the cornerstone of the workflow.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xGPR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F279d764d-c05b-4b6e-848e-4b481a8c0eeb_1894x1052.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xGPR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F279d764d-c05b-4b6e-848e-4b481a8c0eeb_1894x1052.png 424w, https://substackcdn.com/image/fetch/$s_!xGPR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F279d764d-c05b-4b6e-848e-4b481a8c0eeb_1894x1052.png 848w, 
https://substackcdn.com/image/fetch/$s_!xGPR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F279d764d-c05b-4b6e-848e-4b481a8c0eeb_1894x1052.png 1272w, https://substackcdn.com/image/fetch/$s_!xGPR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F279d764d-c05b-4b6e-848e-4b481a8c0eeb_1894x1052.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xGPR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F279d764d-c05b-4b6e-848e-4b481a8c0eeb_1894x1052.png" width="1456" height="809" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/279d764d-c05b-4b6e-848e-4b481a8c0eeb_1894x1052.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1468306,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/181957927?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F279d764d-c05b-4b6e-848e-4b481a8c0eeb_1894x1052.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xGPR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F279d764d-c05b-4b6e-848e-4b481a8c0eeb_1894x1052.png 424w, 
https://substackcdn.com/image/fetch/$s_!xGPR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F279d764d-c05b-4b6e-848e-4b481a8c0eeb_1894x1052.png 848w, https://substackcdn.com/image/fetch/$s_!xGPR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F279d764d-c05b-4b6e-848e-4b481a8c0eeb_1894x1052.png 1272w, https://substackcdn.com/image/fetch/$s_!xGPR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F279d764d-c05b-4b6e-848e-4b481a8c0eeb_1894x1052.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2>Break work into small, iterative chunks</h2><p><strong>Scope management is everything - feed the LLM manageable tasks, not the whole codebase at once.</strong> </p><p>A crucial lesson I&#8217;ve learned is to avoid asking the AI for large, monolithic outputs. Instead, we <strong>break the project into iterative steps or tickets</strong> and tackle them <a href="https://blog.fsck.com/2025/10/05/how-im-using-coding-agents-in-september-2025/#:~:text=please%20write%20out%20this%20plan%2C,in%20full%20detail%2C%20into%20docs%2Fplans">one by one</a>. This mirrors good software engineering practice, but it&#8217;s even more important with AI in the loop. LLMs do best when given focused prompts: implement one function, fix one bug, add one feature at a time. For example, after planning, I will prompt the codegen model: <em>&#8220;Okay, let&#8217;s implement Step 1 from the plan&#8221;</em>. We code that, test it, then move to Step 2, and so on. Each chunk is small enough that the AI can handle it within context and you can understand the code it produces.</p><p>This approach guards against the model going off the rails. If you ask for too much in one go, it&#8217;s likely to get confused or produce a <strong>&#8220;jumbled mess&#8221;</strong> that&#8217;s hard to untangle. Developers <a href="https://albertofortin.com/writing/coding-with-ai#:~:text=No%20consistency%2C%20no%20overarching%20plan,the%20other%209%20were%20doing">report</a> that when they tried to have an LLM generate huge swaths of an app, they ended up with inconsistency and duplication - &#8220;like 10 devs worked on it without talking to each other,&#8221; one said. I&#8217;ve felt that pain; the fix is to <strong>stop, back up, and split the problem into smaller pieces</strong>. Each iteration, we carry forward the context of what&#8217;s been built and incrementally add to it. 
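</line>
To make the one-task-at-a-time loop concrete, here is a minimal sketch of feeding an agent one plan step at a time while carrying forward what has already been built (the plan text and function names are illustrative, not taken from any specific tool):

```python
# Hypothetical plan: one focused, reviewable task per step.
PLAN = [
    "Step 1: Add a failing test for the discount calculation.",
    "Step 2: Implement calculate_discount() so the test passes.",
    "Step 3: Handle the zero-quantity edge case and refactor.",
]

def next_prompt(plan, completed):
    """Build the prompt for the next step, summarizing progress so far."""
    if completed >= len(plan):
        return None  # plan finished
    done = "\n".join(plan[:completed]) or "(nothing yet)"
    return (
        f"Completed so far:\n{done}\n\n"
        f"Now do only this step, nothing more:\n{plan[completed]}"
    )
```

Each prompt stays small enough for the model to hold in context, and each step's output can be reviewed before moving on.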
This also fits nicely with a <strong>test-driven development (TDD)</strong> approach - we can write or generate tests for each piece as we go (more on testing soon).</p><p>Several coding-agent tools now explicitly support this chunked workflow. For instance, I often generate a structured <strong>&#8220;prompt plan&#8221;</strong> file that contains a sequence of prompts for each task, so that tools like Cursor can execute them one by one. The key point is to <strong>avoid huge leaps</strong>. By iterating in small loops, we greatly reduce the chance of catastrophic errors and can course-correct quickly. LLMs excel at quick, contained tasks - use that to your advantage.</p><h2>Provide extensive context and guidance</h2><p><strong>LLMs are only as good as the context you provide - </strong><em><strong>show them</strong></em><strong> the relevant code, docs, and constraints.</strong> </p><p>When working on a codebase, I make sure to <strong>feed the AI all the information it needs</strong> to perform well. That includes the code it should modify or refer to, the project&#8217;s technical constraints, and any known pitfalls or preferred approaches. Modern tools help with this: for example, Anthropic&#8217;s Claude can import an entire GitHub repo into its context in &#8220;Projects&#8221; mode, and IDE assistants like Cursor or Copilot auto-include open files in the prompt. But I often go further - I will either use an MCP server like <a href="https://context7.com/">Context7</a> or manually copy important pieces of the codebase or API docs into the conversation if I suspect the model doesn&#8217;t have them.</p><p>Expert LLM users emphasize this &#8220;context packing&#8221; step: for example, doing a <strong>&#8220;brain dump&#8221;</strong> of everything the model should know before coding - high-level goals and invariants, examples of good solutions, and warnings about approaches to avoid. 
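In the spirit of repo-dump utilities, the context-packing step can be sketched as a few lines that bundle the relevant files, labeled by path, into one prompt-ready text file (a minimal sketch; the output name and size cap are illustrative):

```python
from pathlib import Path

MAX_CHARS = 200_000  # crude guard so the bundle can't blow the context window

def pack_context(paths, out="context.txt"):
    """Concatenate the given source files, labeled by path, into one bundle."""
    chunks = []
    for p in paths:
        chunks.append(f"=== {p} ===\n{Path(p).read_text(encoding='utf-8')}")
    bundle = "\n\n".join(chunks)[:MAX_CHARS]
    Path(out).write_text(bundle, encoding="utf-8")
    return bundle
```

If a bug fix touches four modules, pack those four modules - and nothing else - so the model sees complete but focused information.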
If I&#8217;m asking an AI to implement a tricky solution, I might tell it which naive solutions are too slow, or provide a reference implementation from elsewhere. If I&#8217;m using a niche library or a brand-new API, I&#8217;ll paste in the official docs or README so the AI isn&#8217;t flying blind. All of this upfront context dramatically improves the quality of its output, because the model isn&#8217;t guessing - it has the facts and constraints in front of it.</p><p>There are now utilities to automate context packaging. I&#8217;ve experimented with tools like <strong><a href="https://gitingest.com/">gitingest</a></strong> or <strong><a href="https://github.com/abinthomasonline/repo2txt">repo2txt</a></strong>, which essentially <strong>&#8220;dump&#8221; the relevant parts of your codebase into a text file for the LLM to read</strong>. These can be a lifesaver when dealing with a large project - you generate an output.txt bundle of key source files and let the model ingest that. The principle is: <strong>don&#8217;t make the AI operate on partial information</strong>. If a bug fix requires understanding four different modules, show it those four modules. Yes, we must watch token limits, but current frontier models have huge context windows (hundreds of thousands of tokens or more). Use them wisely. I often selectively include just the portions of code relevant to the task at hand, and explicitly tell the AI what <em>not</em> to focus on if something is out of scope (to save tokens).</p><p>I think <strong><a href="https://github.com/anthropics/skills">Claude Skills</a></strong> have potential because they turn what used to be fragile repeated prompting into something <strong>durable and reusable</strong> by packaging instructions, scripts, and domain-specific expertise into modular capabilities that tools can automatically apply when a request matches the Skill. 
This means you get more reliable, context-aware results than a generic prompt ever could, and you move away from one-off interactions toward workflows that encode repeatable procedures and team knowledge in a consistent way. A number of community-curated <a href="https://www.x-cmd.com/skill/">Skills collections</a> exist, but one of my favorite examples is the <a href="https://x.com/trq212/status/1989061937590837678">frontend-design</a> skill, which can &#8220;end&#8221; the purple design aesthetic prevalent in LLM-generated UIs. Until more tools support Skills officially, <a href="https://github.com/intellectronica/skillz">workarounds</a> exist.</p><p>Finally, <strong>guide the AI with comments and rules inside the prompt</strong>. I might precede a code snippet with: &#8220;Here is the current implementation of X. We need to extend it to do Y, but be careful not to break Z.&#8221; These little hints go a long way. LLMs are <strong>literalists</strong> - they&#8217;ll follow your instructions to the letter, so make those instructions detailed and contextual. 
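The &#8220;comments and rules inside the prompt&#8221; pattern can be sketched as a tiny template that pairs the code with an explicit goal and guardrail (illustrative only; the wording is not a fixed recipe):

```python
def guarded_prompt(current_code, goal, invariant):
    """Assemble a change request with explicit context and a guardrail."""
    return (
        "Here is the current implementation:\n"
        "-----\n" + current_code + "\n-----\n"
        f"Extend it to: {goal}\n"
        f"Be careful not to break: {invariant}\n"
        "Return only the modified code."
    )
```

Spelling out the invariant ("don't break Z") is what keeps the model from "helpfully" rewriting parts of the code you wanted left alone.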
By proactively providing context and guidance, we minimize hallucinations and off-base suggestions and get code that fits our project&#8217;s needs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gnQO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff92ab8a8-fa3b-49cb-ad4c-7e3059a8e2de_1884x1050.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gnQO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff92ab8a8-fa3b-49cb-ad4c-7e3059a8e2de_1884x1050.png 424w, https://substackcdn.com/image/fetch/$s_!gnQO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff92ab8a8-fa3b-49cb-ad4c-7e3059a8e2de_1884x1050.png 848w, https://substackcdn.com/image/fetch/$s_!gnQO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff92ab8a8-fa3b-49cb-ad4c-7e3059a8e2de_1884x1050.png 1272w, https://substackcdn.com/image/fetch/$s_!gnQO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff92ab8a8-fa3b-49cb-ad4c-7e3059a8e2de_1884x1050.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gnQO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff92ab8a8-fa3b-49cb-ad4c-7e3059a8e2de_1884x1050.png" width="1456" height="811" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f92ab8a8-fa3b-49cb-ad4c-7e3059a8e2de_1884x1050.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:811,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:752827,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/181957927?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff92ab8a8-fa3b-49cb-ad4c-7e3059a8e2de_1884x1050.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gnQO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff92ab8a8-fa3b-49cb-ad4c-7e3059a8e2de_1884x1050.png 424w, https://substackcdn.com/image/fetch/$s_!gnQO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff92ab8a8-fa3b-49cb-ad4c-7e3059a8e2de_1884x1050.png 848w, https://substackcdn.com/image/fetch/$s_!gnQO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff92ab8a8-fa3b-49cb-ad4c-7e3059a8e2de_1884x1050.png 1272w, https://substackcdn.com/image/fetch/$s_!gnQO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff92ab8a8-fa3b-49cb-ad4c-7e3059a8e2de_1884x1050.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Choose the right model (and use multiple when needed)</h2><p><strong>Not all coding LLMs are equal - pick your tool with intention, and don&#8217;t be afraid to swap models mid-stream.</strong> </p><p>In 2025 we&#8217;ve been spoiled with a variety of capable code-focused LLMs. Part of my workflow is <strong>choosing the model or service best suited to each task</strong>. Sometimes it can be valuable to even try two or more LLMs in parallel to cross-check how they might approach the same problem differently.</p><p>Each model has its own &#8220;personality&#8221;. The key is: <strong>if one model gets stuck or gives mediocre outputs, try another.</strong> I&#8217;ve literally copied the same prompt from one chat into another service to see if it can handle it better. 
This &#8220;<a href="https://blog.lmorchard.com/2025/06/07/semi-automatic-coding/#:~:text=I%20bounced%20between%20Claude%20Sonnet,Each%20had%20its%20own%20personality">model musical chairs</a>&#8221; can rescue you when you hit a model&#8217;s blind spot.</p><p>Also, make sure you&#8217;re using <em>the best version</em> available. If you can, use the newest &#8220;pro&#8221; tier models - because quality matters. And yes, it often means paying for access, but the productivity gains can justify it. Ultimately, pick the AI pair programmer whose <strong>&#8220;vibe&#8221; meshes with you</strong>. I know folks who prefer one model simply because they like how its responses <em>feel</em>. That&#8217;s valid - when you&#8217;re essentially in a constant dialogue with an AI, the UX and tone make a difference. </p><p>Personally, I gravitate towards Gemini for a lot of coding work these days because the interaction feels more natural and it often understands my requests on the first try. But I will not hesitate to switch to another model if needed; sometimes a second opinion helps the solution emerge. In summary: <strong>use the best tool for the job, and remember you have an arsenal of AIs at your disposal.</strong></p><h2>Leverage AI coding across the lifecycle</h2><p><strong>Supercharge your workflow with coding-specific AI help across the SDLC.</strong> </p><p>On the command line, new AI agents have emerged. <strong>Claude Code, OpenAI&#8217;s Codex CLI</strong> and <strong>Google&#8217;s Gemini CLI</strong> are CLI tools you can chat with directly in your project directory - they can read files, run tests, and even work through multi-step fixes. I&#8217;ve used Google&#8217;s <strong>Jules</strong> and GitHub&#8217;s <strong>Copilot Agent</strong> as well - these are <strong>asynchronous coding agents</strong> that actually clone your repo into a cloud VM and work on tasks in the background (writing tests, fixing bugs, then opening a PR for you). 
It&#8217;s a bit eerie to witness: you issue a command like &#8220;refactor the payment module for X&#8221; and a little while later you get a pull request with code changes and passing tests. We are truly living in the future. You can read more about this in <a href="https://addyo.substack.com/p/conductors-to-orchestrators-the-future">conductors to orchestrators</a>.</p><p>That said, <strong>these tools are not infallible, and you must understand their limits</strong>. They accelerate the mechanical parts of coding - generating boilerplate, applying repetitive changes, running tests automatically - but they still benefit greatly from your guidance. For instance, when I use an agent like Claude or Copilot to implement something, I often supply it with the plan or to-do list from earlier steps so it knows the exact sequence of tasks. If the agent supports it, I&#8217;ll load up my spec.md or plan.md in the context before telling it to execute. This keeps it on track.</p><p><strong>We&#8217;re not at the stage of letting an AI agent code an entire feature unattended</strong> and expecting perfect results. Instead, I use these tools in a supervised way: I&#8217;ll let them generate and even run code, but I keep an eye on each step, ready to step in when something looks off. There are also orchestration tools like <strong>Conductor</strong> that let you run multiple agents in parallel on different tasks (essentially a way to scale up AI help) - some engineers are experimenting with running 3-4 agents at once on separate features. I&#8217;ve dabbled in this &#8220;massively parallel&#8221; approach; it&#8217;s surprisingly effective at getting a lot done quickly, but it&#8217;s also mentally taxing to monitor multiple AI threads! 
For most cases, I stick to one main agent at a time and maybe a secondary one for reviews (discussed below).</p><p>Just remember these are power tools - you still control the trigger and guide the outcome.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!F31O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8398cc-f2c9-4c04-b3a1-37437dd3d82a_1888x1054.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!F31O!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8398cc-f2c9-4c04-b3a1-37437dd3d82a_1888x1054.png 424w, https://substackcdn.com/image/fetch/$s_!F31O!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8398cc-f2c9-4c04-b3a1-37437dd3d82a_1888x1054.png 848w, https://substackcdn.com/image/fetch/$s_!F31O!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8398cc-f2c9-4c04-b3a1-37437dd3d82a_1888x1054.png 1272w, https://substackcdn.com/image/fetch/$s_!F31O!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8398cc-f2c9-4c04-b3a1-37437dd3d82a_1888x1054.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!F31O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8398cc-f2c9-4c04-b3a1-37437dd3d82a_1888x1054.png" width="1456" height="813" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3e8398cc-f2c9-4c04-b3a1-37437dd3d82a_1888x1054.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:388469,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/181957927?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8398cc-f2c9-4c04-b3a1-37437dd3d82a_1888x1054.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!F31O!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8398cc-f2c9-4c04-b3a1-37437dd3d82a_1888x1054.png 424w, https://substackcdn.com/image/fetch/$s_!F31O!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8398cc-f2c9-4c04-b3a1-37437dd3d82a_1888x1054.png 848w, https://substackcdn.com/image/fetch/$s_!F31O!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8398cc-f2c9-4c04-b3a1-37437dd3d82a_1888x1054.png 1272w, https://substackcdn.com/image/fetch/$s_!F31O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8398cc-f2c9-4c04-b3a1-37437dd3d82a_1888x1054.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>A full overview of where AI can improve the developer experience. This spans design, inner, submit, and outer loops - highlighting every point where AI can meaningfully reduce toil.</em></p><h2>Keep a human in the loop - verify, test, and review everything</h2><p><strong>AI will happily produce plausible-looking code, but </strong><em><strong>you</strong></em><strong> are responsible for quality - always review and test thoroughly.</strong> One of my cardinal rules is never to blindly trust an LLM&#8217;s output. As Simon Willison aptly <a href="https://simonwillison.net/2025/Mar/11/using-llms-for-code/#:~:text=Instead%2C%20use%20them%20to%20augment,on%20tedious%20tasks%20without%20complaint">says</a>, think of an LLM pair programmer as <strong>&#8220;over-confident and prone to mistakes&#8221;</strong>. 
It writes code with complete conviction - including bugs or nonsense - and won&#8217;t tell you something is wrong unless you catch it. So I treat every AI-generated snippet as if it came from a junior developer: I read through the code, run it, and test it as needed. <strong>You absolutely have to test what it writes</strong> - run those unit tests, or manually exercise the feature, to ensure it does what it claims. Read more about this in <a href="https://addyo.substack.com/p/vibe-coding-is-not-an-excuse-for">vibe coding is not an excuse for low-quality work</a>.</p><p>In fact, I weave testing into the workflow itself. My earlier planning stage often includes generating a list of tests or a testing plan for each step. If I&#8217;m using a tool like Claude Code, I&#8217;ll instruct it to run the test suite after implementing a task, and have it debug failures if any occur. This kind of tight feedback loop (write code &#8594; run tests &#8594; fix) is something AI excels at <em>as long as the tests exist</em>. It&#8217;s no surprise that those who get the most out of coding agents tend to be those with strong testing practices. An agent like Claude can &#8220;fly&#8221; through a project with a good test suite as a safety net. Without tests, the agent might blithely assume everything is fine (&#8220;sure, all good!&#8221;) when in reality it&#8217;s broken several things. So, <strong>invest in tests</strong> - they amplify the AI&#8217;s usefulness and your confidence in the result.</p><p>Even beyond automated tests, <strong>do code reviews - both manual and AI-assisted</strong>. I routinely pause and review the code that&#8217;s been generated so far, line by line. Sometimes I&#8217;ll spawn a second AI session (or a different model) and ask <em>it</em> to critique or review code produced by the first. 
For example, I might have Claude write the code and then ask Gemini, &#8220;Can you review this function for any errors or improvements?&#8221; This can catch subtle issues. The key is to <em>not</em> skip the review just because an AI wrote the code. If anything, AI-written code needs <strong>extra scrutiny</strong>, because it can sometimes be superficially convincing while hiding flaws that a human might not immediately notice.</p><p>I also use <a href="https://github.com/chromeDevTools/chrome-devtools-mcp/">Chrome DevTools MCP</a>, which my last team and I built, in my <strong>debugging and quality loop</strong> to bridge the gap between static code analysis and live browser execution. It &#8220;gives your agent eyes&#8221;. It lets my AI tools see what the browser sees: they can inspect the DOM and pull performance traces, console logs, and network activity. This integration eliminates the friction of manual context switching, allowing for automated UI testing directly through the LLM. It means bugs can be diagnosed and fixed with high precision based on actual runtime data. </p><p>The dire consequences of skipping human oversight have been documented. One developer who leaned heavily on AI generation for a rush project <a href="https://albertofortin.com/writing/coding-with-ai#:~:text=No%20consistency%2C%20no%20overarching%20plan,the%20other%209%20were%20doing">described</a> the result as an inconsistent mess - duplicate logic, mismatched method names, no coherent architecture. He realized he&#8217;d been &#8220;building, building, building&#8221; without stepping back to really see what the AI had woven together. The fix was a painful refactor and a vow to never let things get that far out of hand again. I&#8217;ve taken that to heart. <strong>No matter how much AI I use, I remain the accountable engineer</strong>.</p><p>In practical terms, that means I only merge or ship code after I&#8217;ve understood it. 
If the AI generates something convoluted, I&#8217;ll ask it to add comments explaining it, or I&#8217;ll rewrite it in simpler terms. If something doesn&#8217;t feel right, I dig in - just as I would if a human colleague contributed code that raised red flags.</p><p>It&#8217;s all about mindset: <strong>the LLM is an assistant, not an autonomously reliable coder</strong>. I am the senior dev; the LLM is there to accelerate me, not replace my judgment. Maintaining this stance not only results in better code, it also protects your own growth as a developer. (I&#8217;ve heard some express concern that relying too much on AI might dull their skills - I think as long as you stay in the loop, actively reviewing and understanding everything, you&#8217;re still sharpening your instincts, just at a higher velocity.) In short: <strong>stay alert, test often, review always.</strong> It&#8217;s still your codebase at the end of the day.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-yfn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd35bc06-47df-4660-9f2f-d0beaa489165_1858x974.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-yfn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd35bc06-47df-4660-9f2f-d0beaa489165_1858x974.png 424w, https://substackcdn.com/image/fetch/$s_!-yfn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd35bc06-47df-4660-9f2f-d0beaa489165_1858x974.png 848w, 
https://substackcdn.com/image/fetch/$s_!-yfn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd35bc06-47df-4660-9f2f-d0beaa489165_1858x974.png 1272w, https://substackcdn.com/image/fetch/$s_!-yfn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd35bc06-47df-4660-9f2f-d0beaa489165_1858x974.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-yfn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd35bc06-47df-4660-9f2f-d0beaa489165_1858x974.png" width="1456" height="763" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fd35bc06-47df-4660-9f2f-d0beaa489165_1858x974.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:763,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:259438,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/181957927?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd35bc06-47df-4660-9f2f-d0beaa489165_1858x974.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-yfn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd35bc06-47df-4660-9f2f-d0beaa489165_1858x974.png 424w, 
https://substackcdn.com/image/fetch/$s_!-yfn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd35bc06-47df-4660-9f2f-d0beaa489165_1858x974.png 848w, https://substackcdn.com/image/fetch/$s_!-yfn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd35bc06-47df-4660-9f2f-d0beaa489165_1858x974.png 1272w, https://substackcdn.com/image/fetch/$s_!-yfn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd35bc06-47df-4660-9f2f-d0beaa489165_1858x974.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Commit often and use version control as a safety net. Never commit code you can&#8217;t explain.</h2><p><strong>Frequent commits are your save points - they let you undo AI missteps and understand changes.</strong> </p><p>When working with an AI that can generate a lot of code quickly, it&#8217;s easy for things to veer off course. I mitigate this by adopting ultra-granular version control habits. I commit early and often, even more than I would in normal hand-coding. After each small task or each successful automated edit, I&#8217;ll make a git commit with a clear message. This way, if the AI&#8217;s next suggestion introduces a bug or a messy change, I have a recent checkpoint to revert to (or cherry-pick from) without losing hours of work. One practitioner likened it to treating commits as <strong>&#8220;save points in a game&#8221;</strong> - if an LLM session goes sideways, you can always roll back to the last stable commit. I&#8217;ve found that advice incredibly useful. It&#8217;s much less stressful to experiment with a bold AI refactor when you know you can undo it with a git reset if needed.</p><p>Proper version control also helps when collaborating with the AI. Since I can&#8217;t rely on the AI to remember everything it&#8217;s done (context window limitations, etc.), the git history becomes a valuable log. I often scan my recent commits to brief the AI (or myself) on what changed. In fact, LLMs themselves can leverage your commit history if you provide it - I&#8217;ve pasted git diffs or commit logs into the prompt so the AI knows what code is new or what the previous state was. Amusingly, LLMs are <em>really</em> good at parsing diffs and using tools like git bisect to find where a bug was introduced. They have infinite patience to traverse commit histories, which can augment your debugging. 
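The save-point habit itself is plain git discipline; a runnable sketch (file contents and commit messages are invented):

```shell
# Each small task gets its own commit - a save point you can roll back to.
git init -q demo && cd demo
git config user.email "dev@example.com" && git config user.name "Dev"

echo "fee = total * 0.02" > fees.py
git add -A && git commit -q -m "Task 1: extract fee calculation"

echo "fee = total * 99" > fees.py         # suppose the next AI edit goes wrong
git add -A && git commit -q -m "Task 2: AI refactor (broken)"

git reset -q --hard HEAD~1                # roll back to the last good save point
cat fees.py                               # prints: fee = total * 0.02
```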
But this only works if you have a tidy commit history to begin with.</p><p>Another benefit: small commits with good messages essentially document the development process, which helps when doing code review (AI or human). If an AI agent made five changes in one go and something broke, having those changes in separate commits makes it easier to pinpoint which commit caused the issue. If everything is in one giant commit titled &#8220;AI changes&#8221;, good luck! So I discipline myself: <em>finish task, run tests, commit.</em> This also meshes well with the earlier tip about breaking work into small chunks - each chunk ends up as its own commit or PR.</p><p>Finally, don&#8217;t be afraid to <strong>use branches or worktrees</strong> to isolate AI experiments. One advanced workflow I&#8217;ve adopted (inspired by folks like Jesse Vincent) is to spin up a fresh git worktree for a new feature or sub-project. This lets me run multiple AI coding sessions in parallel on the same repo without them interfering, and I can later merge the changes. It&#8217;s a bit like having each AI task in its own sandbox branch. If one experiment fails, I throw away that worktree and nothing is lost in main. If it succeeds, I merge it in. This approach has been crucial when I&#8217;m, say, letting an AI implement Feature A while I (or another AI) work on Feature B simultaneously. Version control is what makes this coordination possible. 
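Worktrees are a built-in git feature; the sandbox-per-experiment flow looks roughly like this (branch and file names are invented):

```shell
# One checkout per experiment: each agent gets its own worktree and branch.
git init -q app && cd app
git config user.email "dev@example.com" && git config user.name "Dev"
echo "v1" > main.txt && git add -A && git commit -q -m "baseline"

git worktree add ../feature-a -b feature-a   # sandbox for Feature A
git worktree add ../feature-b -b feature-b   # sandbox for Feature B

echo "feature A work" > ../feature-a/a.txt   # the agents edit independently
echo "feature B work" > ../feature-b/b.txt

# Feature A succeeds: commit and merge it. Feature B fails: just discard it.
(cd ../feature-a && git add -A && git commit -q -m "Feature A")
git merge -q feature-a
git worktree remove --force ../feature-b
git branch -D feature-b
```

Nothing from the discarded worktree ever touches main, and the merged one arrives as ordinary commits.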
In short: <strong>commit often, organize your work with branches, and embrace git</strong> as the control mechanism to keep AI-generated changes manageable and reversible.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OfO1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2702481a-ef4d-4dc7-9247-0b331eb70568_1886x1042.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OfO1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2702481a-ef4d-4dc7-9247-0b331eb70568_1886x1042.png 424w, https://substackcdn.com/image/fetch/$s_!OfO1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2702481a-ef4d-4dc7-9247-0b331eb70568_1886x1042.png 848w, https://substackcdn.com/image/fetch/$s_!OfO1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2702481a-ef4d-4dc7-9247-0b331eb70568_1886x1042.png 1272w, https://substackcdn.com/image/fetch/$s_!OfO1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2702481a-ef4d-4dc7-9247-0b331eb70568_1886x1042.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OfO1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2702481a-ef4d-4dc7-9247-0b331eb70568_1886x1042.png" width="1456" height="804" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2702481a-ef4d-4dc7-9247-0b331eb70568_1886x1042.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:804,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1039878,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/181957927?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2702481a-ef4d-4dc7-9247-0b331eb70568_1886x1042.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OfO1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2702481a-ef4d-4dc7-9247-0b331eb70568_1886x1042.png 424w, https://substackcdn.com/image/fetch/$s_!OfO1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2702481a-ef4d-4dc7-9247-0b331eb70568_1886x1042.png 848w, https://substackcdn.com/image/fetch/$s_!OfO1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2702481a-ef4d-4dc7-9247-0b331eb70568_1886x1042.png 1272w, https://substackcdn.com/image/fetch/$s_!OfO1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2702481a-ef4d-4dc7-9247-0b331eb70568_1886x1042.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Customize the AI&#8217;s behavior with rules and examples</h2><p><strong>Steer your AI assistant by providing style guides, examples, and even &#8220;rules files&#8221; - a little upfront tuning yields much better outputs.</strong></p><p>One thing I learned is that you don&#8217;t have to accept the AI&#8217;s default style or approach - you can influence it heavily by giving it guidelines. For instance, I have a <strong>CLAUDE.md</strong> file that I update periodically, which contains process rules and preferences for Claude (Anthropic&#8217;s model) to follow (and similarly a GEMINI.md when using Gemini CLI). This includes things like &#8220;write code in our project&#8217;s style, follow our lint rules, don&#8217;t use certain functions, prefer functional style over OOP,&#8221; etc. 
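Mine evolves with the project, but a trimmed, illustrative CLAUDE.md might look like this (every rule below is an example, not a prescription):

```markdown
# CLAUDE.md - project rules for the agent

## Code style
- Follow the ESLint config in the repo; never disable rules inline.
- Prefer small pure functions over classes; no new OOP hierarchies.
- Use descriptive variable names; no single-letter names outside loops.

## Process
- Work one plan.md task at a time; stop after each for review.
- Run the test suite after every change; never report "done" with failing tests.
- If context is missing, ask - do not invent files, APIs, or dependencies.
```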
When I start a session, I feed this file to Claude to align it with our conventions. It&#8217;s surprising how well this works to keep the model &#8220;on track&#8221;, as Jesse Vincent <a href="https://blog.fsck.com/2025/10/05/how-im-using-coding-agents-in-september-2025/#:~:text=I%27m%20still%20primarily%20using%20Claude,Code">noted</a> - it reduces the tendency of the AI to go off-script or introduce patterns we don&#8217;t want.</p><p>Even without a fancy rules file, you can <strong>set the tone with custom instructions or system prompts</strong>. GitHub Copilot and Cursor both introduced features to let you configure the AI&#8217;s behavior <a href="https://benjamincongdon.me/blog/2025/02/02/How-I-Use-AI-Early-2025/#:~:text=stuck,my%20company%E2%80%99s%20%2F%20team%E2%80%99s%20codebase">globally</a> for your project. I&#8217;ve taken advantage of that by writing a short paragraph about our coding style, e.g. &#8220;Use 4-space indentation, avoid arrow functions in React, prefer descriptive variable names, code should pass ESLint.&#8221; With those instructions in place, the AI&#8217;s suggestions adhere much more closely to what a human teammate might write. Ben Congdon <a href="https://benjamincongdon.me/blog/2025/02/02/How-I-Use-AI-Early-2025/#:~:text=roughly%20on%20par,get%20past%20a%20logical%20impasse">mentioned</a> how shocked he was that few people use <strong>Copilot&#8217;s custom instructions</strong>, given how effective they are - he could guide the AI to output code matching his team&#8217;s idioms by providing some examples and preferences upfront. I echo that: take the time to teach the AI your expectations.</p><p>Another powerful technique is providing <strong>in-line examples</strong> of the output format or approach you want. 
If I want the AI to write a function in a very specific way, I might first show it a similar function already in the codebase: &#8220;Here&#8217;s how we implemented X, use a similar approach for Y.&#8221; If I want a certain commenting style, I might write a comment myself and ask the AI to continue in that style. Essentially, <em>prime</em> the model with the pattern to follow. LLMs are great at mimicry - show them one or two examples and they&#8217;ll continue in that vein.</p><p>The community has also come up with creative &#8220;rulesets&#8221; to tame LLM behavior. You might have heard of the <a href="https://harper.blog/2025/04/17/an-llm-codegen-heros-journey/#:~:text=repository,it%20in%20a%20few%20steps">&#8220;Big Daddy&#8221; rule</a> or adding a &#8220;no hallucination/no deception&#8221; clause to prompts. These are basically tricks to remind the AI to be truthful and not fabricate code or APIs that don&#8217;t exist. For example, I sometimes prepend a prompt with: &#8220;If you are unsure about something or the codebase context is missing, ask for clarification rather than making up an answer.&#8221; This reduces hallucinations. Another rule I use is: &#8220;Always explain your reasoning briefly in comments when fixing a bug.&#8221; This way, when the AI generates a fix, it will also leave a comment like &#8220;// Fixed: Changed X to Y to prevent Z (as per spec).&#8221; That&#8217;s super useful for later review.</p><p>In summary, <strong>don&#8217;t treat the AI as a black box - tune it</strong>. By configuring system instructions, sharing project docs, or writing down explicit rules, you turn the AI into a more specialized developer on your team. It&#8217;s akin to onboarding a new hire: you&#8217;d give them the style guide and some starter tips, right? Do the same for your AI pair programmer. 
The return on investment is huge: you get outputs that need less tweaking and integrate more smoothly with your codebase.</p><h2>Embrace testing and automation as force multipliers</h2><p><strong>Use your CI/CD, linters, and code review bots - AI will work best in an environment that catches mistakes automatically.</strong> </p><p>This is a corollary to staying in the loop and providing context: a well-oiled development pipeline enhances AI productivity. I ensure that any repository where I use heavy AI coding has a robust <strong>continuous integration setup</strong>. That means automated tests run on every commit or PR, code style checks (like ESLint, Prettier, etc.) are enforced, and ideally a staging deployment is available for any new branch. Why? Because I can let the AI trigger these and evaluate the results. For instance, if the AI opens a pull request via a tool like Jules or GitHub Copilot Agent, our CI will run tests and report failures. I can feed those failure logs back to the AI: &#8220;The integration tests failed with XYZ, let&#8217;s debug this.&#8221; It turns bug-fixing into a collaborative loop with quick feedback, which AIs handle quite well (they&#8217;ll suggest a fix, we run CI again, and iterate).</p><p>Automated code quality checks (linters, type checkers) also guide the AI. I actually include linter output in the prompt sometimes. If the AI writes code that doesn&#8217;t pass our linter, I&#8217;ll copy the linter errors into the chat and say &#8220;please address these issues.&#8221; The model then knows exactly what to do. It&#8217;s like having a strict teacher looking over the AI&#8217;s shoulder. In my experience, once the AI is aware of a tool&#8217;s output (like a failing test or a lint warning), it will try very hard to correct it - after all, it &#8220;wants&#8221; to produce the right answer. This ties back to providing context: give the AI the results of its actions in the environment (test failures, etc.) 
and it will learn from them.</p><p>AI coding agents themselves are increasingly incorporating automation hooks. Some agents will refuse to say a code task is &#8220;done&#8221; until all tests pass, which is exactly the diligence you want. Code review bots (AI or otherwise) act as another filter - I treat their feedback as additional prompts for improvement. For example, if CodeRabbit or another reviewer comments &#8220;This function is doing X, which is not ideal,&#8221; I will ask the AI, &#8220;Can you refactor based on this feedback?&#8221;</p><p>By combining AI with automation, you start to get a virtuous cycle. The AI writes code, the automated tools catch issues, the AI fixes them, and so forth, with you overseeing the high-level direction. It feels like having an extremely fast junior dev whose work is instantly checked by a tireless QA engineer. But remember, <em>you</em> set up that environment. If your project lacks tests or other automated checks, subtle bugs and quality problems in the AI&#8217;s work can slip through, only surfacing much later. </p><p>So as we head into 2026, one of my goals is to bolster the quality gates around AI code contribution: more tests, more monitoring, perhaps even AI-on-AI code reviews. It might sound paradoxical (AIs reviewing AIs), but I&#8217;ve seen it catch things one model missed. 
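Feeding tool output back works the same whatever the tool; here is a sketch in which a stand-in script plays the linter/CI role (the script and its error message are invented for illustration):

```shell
# Stand-in for a real linter or CI job - emits a failure the way a real run would.
cat > check.sh <<'EOF'
#!/bin/sh
echo "error: unused variable 'x' at src/app.js:12"
exit 1
EOF
chmod +x check.sh

# Capture the output; a nonzero exit means there is something to feed back.
if ! ./check.sh > lint.log 2>&1; then
  echo "check failed - paste lint.log into the agent prompt:"
  cat lint.log
fi
```

The captured log, not a paraphrase of it, is what goes into the prompt - the model does best with the tool's exact words.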
Bottom line: <strong>an AI-friendly workflow is one with strong automation - use those tools to keep the AI honest</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!T25F!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01a7a816-646f-4723-a44e-7177e2dbc2ae_1882x1048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!T25F!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01a7a816-646f-4723-a44e-7177e2dbc2ae_1882x1048.png 424w, https://substackcdn.com/image/fetch/$s_!T25F!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01a7a816-646f-4723-a44e-7177e2dbc2ae_1882x1048.png 848w, https://substackcdn.com/image/fetch/$s_!T25F!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01a7a816-646f-4723-a44e-7177e2dbc2ae_1882x1048.png 1272w, https://substackcdn.com/image/fetch/$s_!T25F!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01a7a816-646f-4723-a44e-7177e2dbc2ae_1882x1048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!T25F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01a7a816-646f-4723-a44e-7177e2dbc2ae_1882x1048.png" width="1456" height="811" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/01a7a816-646f-4723-a44e-7177e2dbc2ae_1882x1048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:811,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2234413,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/181957927?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01a7a816-646f-4723-a44e-7177e2dbc2ae_1882x1048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!T25F!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01a7a816-646f-4723-a44e-7177e2dbc2ae_1882x1048.png 424w, https://substackcdn.com/image/fetch/$s_!T25F!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01a7a816-646f-4723-a44e-7177e2dbc2ae_1882x1048.png 848w, https://substackcdn.com/image/fetch/$s_!T25F!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01a7a816-646f-4723-a44e-7177e2dbc2ae_1882x1048.png 1272w, https://substackcdn.com/image/fetch/$s_!T25F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01a7a816-646f-4723-a44e-7177e2dbc2ae_1882x1048.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Continuously learn and adapt (AI amplifies your skills)</h2><p><strong>Treat every AI coding session as a learning opportunity - the more you know, the more the AI can help you, creating a virtuous cycle.</strong></p><p>One of the most exciting aspects of using LLMs in development is how much <em>I</em> have learned in the process. Rather than replacing my need to know things, AIs have actually exposed me to new languages, frameworks, and techniques I might not have tried on my own.</p><p>This pattern holds generally: if you come to the table with solid software engineering fundamentals, the AI will <strong>amplify</strong> your productivity multifold. If you lack that foundation, the AI might just amplify confusion. 
Seasoned devs have observed that LLMs &#8220;reward existing best practices&#8221; - things like writing clear specs, having good tests, doing code reviews, etc., all become even more powerful when an AI is involved. In my experience, the AI lets me operate at a higher level of abstraction (focusing on design, interface, architecture) while it churns out the boilerplate, but I need to <em>have</em> those high-level skills first. As Simon Willison notes, almost everything that makes someone a <strong>senior engineer</strong> (designing systems, managing complexity, knowing what to automate vs hand-code) is what now yields the best outcomes with AI. So using AIs has actually pushed me to <strong>up my engineering game</strong> - I&#8217;m more rigorous about planning and more conscious of architecture, because I&#8217;m effectively &#8220;managing&#8221; a very fast but somewhat na&#239;ve coder (the AI).</p><p>For those worried that using AI might degrade their abilities: I&#8217;d argue the opposite, if done right. By reviewing AI code, I&#8217;ve been exposed to new idioms and solutions. By debugging AI mistakes, I&#8217;ve deepened my understanding of the language and problem domain. I often ask the AI to explain its code or the rationale behind a fix - kind of like constantly interviewing a candidate about their code - and I pick up insights from its answers. I also use AI as a research assistant: if I&#8217;m not sure about a library or approach, I&#8217;ll ask it to enumerate options or compare trade-offs. It&#8217;s like having an encyclopedic mentor on call. All of this has made me a more knowledgeable programmer.</p><p>The big picture is that <strong>AI tools amplify your expertise</strong>. Going into 2026, I&#8217;m not afraid of them &#8220;taking my job&#8221; - I&#8217;m excited that they free me from drudgery and allow me to spend more time on creative and complex aspects of software engineering. 
But I&#8217;m also aware that for those without a solid base, AI can lead to Dunning-Kruger on steroids (it may <em>seem</em> like you built something great, until it falls apart). So my advice: continue honing your craft, and use the AI to accelerate that process. Be intentional about periodically coding without AI too, to keep your raw skills sharp. In the end, the developer + AI duo is far more powerful than either alone, and the <em>developer</em> half of that duo has to hold up their end.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1K_1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6831a588-c75e-4992-9813-84dee28de46d_1876x1054.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1K_1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6831a588-c75e-4992-9813-84dee28de46d_1876x1054.png 424w, https://substackcdn.com/image/fetch/$s_!1K_1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6831a588-c75e-4992-9813-84dee28de46d_1876x1054.png 848w, https://substackcdn.com/image/fetch/$s_!1K_1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6831a588-c75e-4992-9813-84dee28de46d_1876x1054.png 1272w, https://substackcdn.com/image/fetch/$s_!1K_1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6831a588-c75e-4992-9813-84dee28de46d_1876x1054.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!1K_1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6831a588-c75e-4992-9813-84dee28de46d_1876x1054.png" width="1456" height="818" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6831a588-c75e-4992-9813-84dee28de46d_1876x1054.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:818,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:427794,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/181957927?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6831a588-c75e-4992-9813-84dee28de46d_1876x1054.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1K_1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6831a588-c75e-4992-9813-84dee28de46d_1876x1054.png 424w, https://substackcdn.com/image/fetch/$s_!1K_1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6831a588-c75e-4992-9813-84dee28de46d_1876x1054.png 848w, https://substackcdn.com/image/fetch/$s_!1K_1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6831a588-c75e-4992-9813-84dee28de46d_1876x1054.png 1272w, https://substackcdn.com/image/fetch/$s_!1K_1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6831a588-c75e-4992-9813-84dee28de46d_1876x1054.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Conclusion</h2><p>As we enter 2026, I&#8217;ve fully embraced AI in my development workflow - but in a considered, expert-driven way. My approach is essentially <strong>&#8220;AI-augmented software engineering&#8221;</strong> rather than AI-automated software engineering.</p><p>I&#8217;ve learned: <strong>the best results come when you apply classic software engineering discipline to your AI collaborations</strong>. 
It turns out all our hard-earned practices - design before coding, write tests, use version control, maintain standards - not only still apply, but are even more important when an AI is writing half your code.</p><p>I&#8217;m excited for what&#8217;s next. The tools keep improving and my workflow will surely evolve alongside them. Perhaps fully autonomous &#8220;AI dev interns&#8221; will tackle more grunt work while we focus on higher-level tasks. Perhaps new paradigms of debugging and code exploration will emerge. No matter what, I plan to stay <em>in the loop</em> - guiding the AIs, learning from them, and amplifying my productivity responsibly.</p><p>The bottom line for me: <strong>AI coding assistants are incredible force multipliers, but the human engineer remains the director of the show.</strong></p><p>With that&#8230;happy building in 2026! &#128640;</p><p><em>I&#8217;m excited to share I&#8217;ve released a new <a href="https://beyond.addy.ie/">AI-assisted engineering book</a> with O&#8217;Reilly. 
There are a number of free tips on the book site in case interested.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ukkU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc5ca0ef-614c-42e0-85f3-3663e9871580_7838x7838.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ukkU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc5ca0ef-614c-42e0-85f3-3663e9871580_7838x7838.png 424w, https://substackcdn.com/image/fetch/$s_!ukkU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc5ca0ef-614c-42e0-85f3-3663e9871580_7838x7838.png 848w, https://substackcdn.com/image/fetch/$s_!ukkU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc5ca0ef-614c-42e0-85f3-3663e9871580_7838x7838.png 1272w, https://substackcdn.com/image/fetch/$s_!ukkU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc5ca0ef-614c-42e0-85f3-3663e9871580_7838x7838.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ukkU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc5ca0ef-614c-42e0-85f3-3663e9871580_7838x7838.png" width="1456" height="1456" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dc5ca0ef-614c-42e0-85f3-3663e9871580_7838x7838.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2203490,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/181957927?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc5ca0ef-614c-42e0-85f3-3663e9871580_7838x7838.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ukkU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc5ca0ef-614c-42e0-85f3-3663e9871580_7838x7838.png 424w, https://substackcdn.com/image/fetch/$s_!ukkU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc5ca0ef-614c-42e0-85f3-3663e9871580_7838x7838.png 848w, https://substackcdn.com/image/fetch/$s_!ukkU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc5ca0ef-614c-42e0-85f3-3663e9871580_7838x7838.png 1272w, https://substackcdn.com/image/fetch/$s_!ukkU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc5ca0ef-614c-42e0-85f3-3663e9871580_7838x7838.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p></p>]]></content:encoded></item><item><title><![CDATA[21 Lessons from 14 Years at Google]]></title><description><![CDATA[On code, careers, and the human side of engineering]]></description><link>https://addyo.substack.com/p/21-lessons-from-14-years-at-google</link><guid isPermaLink="false">https://addyo.substack.com/p/21-lessons-from-14-years-at-google</guid><dc:creator><![CDATA[Addy Osmani]]></dc:creator><pubDate>Thu, 04 Dec 2025 15:30:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-kh4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F746cd1ff-8111-4f8f-b7f9-84db223f998f_7838x7838.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When I joined Google ~14 years ago, I thought the job was about writing great code. 
I was partly right. But the longer I&#8217;ve stayed, the more I&#8217;ve realized that the engineers who thrive aren&#8217;t necessarily the best programmers - they&#8217;re the ones who&#8217;ve figured out how to navigate everything around the code: the people, the politics, the alignment, the ambiguity.</p><p>These lessons are what I wish I&#8217;d known earlier. Some would have saved me months of frustration. Others took years to fully understand. None of them are about specific technologies - those change too fast to matter. They&#8217;re about the patterns that keep showing up, project after project, team after team.</p><p>I&#8217;m sharing them because I&#8217;ve benefited enormously from engineers who did the same for me. Consider this my attempt to pay it forward.</p><h2>1. The best engineers are obsessed with solving user problems.</h2><p>It&#8217;s seductive to fall in love with a technology and go looking for places to apply it. I&#8217;ve done it. Everyone has. But the engineers who create the most value work backwards: they become obsessed with understanding user problems deeply, and let solutions emerge from that understanding.</p><p>User obsession means spending time in support tickets, talking to users, watching users struggle, asking &#8220;why&#8221; until you hit bedrock. The engineer who truly understands the problem often finds that the elegant solution is simpler than anyone expected.</p><p>The engineer who starts with a solution tends to build complexity in search of a justification.</p><h2>2. Being right is cheap. Getting to right together is the real work.</h2><p>You can win every technical argument and lose the project. I&#8217;ve watched brilliant engineers accrue silent resentment by always being the smartest person in the room. The cost shows up later as &#8220;mysterious execution issues&#8221; and &#8220;strange resistance.&#8221;</p><p>The skill isn&#8217;t being right. 
It&#8217;s entering discussions to align on the problem, creating space for others, and remaining skeptical of your own certainty.</p><p>Strong opinions, weakly held - not because you lack conviction, but because decisions made under uncertainty shouldn&#8217;t be welded to identity.</p><h2>3. Bias towards action. Ship. You can edit a bad page, but you can&#8217;t edit a blank one.</h2><p>The quest for perfection is paralyzing. I&#8217;ve watched engineers spend weeks debating the ideal architecture for something they&#8217;ve never built. The perfect solution rarely emerges from thought alone - it emerges from contact with reality. AI can in many ways help here.</p><p>First do it, then do it right, then do it better. Get the ugly prototype in front of users. Write the messy first draft of the design doc. Ship the MVP that embarrasses you slightly. You&#8217;ll learn more from one week of real feedback than a month of theoretical debate.</p><p>Momentum creates clarity. Analysis paralysis creates nothing.</p><h2>4. Clarity is seniority. Cleverness is overhead.</h2><p>The instinct to write clever code is almost universal among engineers. It feels like proof of competence. </p><p>But software engineering is what happens when you add time and other programmers. In that environment, clarity isn&#8217;t a style preference - it&#8217;s operational risk reduction.</p><p>Your code is a strategy memo to strangers who will maintain it at 2am during an outage. Optimize for their comprehension, not your elegance. The senior engineers I respect most have learned to trade cleverness for clarity, every time.</p><h2>5. Novelty is a loan you repay in outages, hiring, and cognitive overhead.</h2><p>Treat your technology choices like an organization with a small &#8220;innovation token&#8221; budget. Spend one each time you adopt something materially non-standard. 
You can&#8217;t afford many.</p><p>The punchline isn&#8217;t &#8220;never innovate.&#8221; It&#8217;s &#8220;innovate only where you&#8217;re uniquely paid to innovate.&#8221; Everything else should default to boring, because boring has known failure modes. </p><p>The &#8220;best tool for the job&#8221; is often the &#8220;least-worst tool across many jobs&#8221; - because operating a zoo becomes the real tax.</p><div><hr></div><h2>6. Your code doesn&#8217;t advocate for you. People do.</h2><p>Early in my career, I believed great work would speak for itself. I was wrong. Code sits silently in a repository. Your manager mentions you in a meeting, or they don&#8217;t. A peer recommends you for a project, or someone else.</p><p>In large organizations, decisions get made in meetings you&#8217;re not invited to, using summaries you didn&#8217;t write, by people who have five minutes and twelve priorities. If no one can articulate your impact when you&#8217;re not in the room, your impact is effectively optional.</p><p>This isn&#8217;t strictly about self-promotion. It&#8217;s about making the value chain legible to everyone - including yourself.</p><h2>7. The best code is the code you never had to write.</h2><p>We celebrate creation in engineering culture. Nobody gets promoted for deleting code, even though deletion often improves a system more than addition. Every line of code you don&#8217;t write is a line you never have to debug, maintain, or explain.</p><p>Before you build, exhaust the question: &#8220;What would happen if we just&#8230; didn&#8217;t?&#8221; Sometimes the answer is &#8220;nothing bad,&#8221; and that&#8217;s your solution. </p><p>The problem isn&#8217;t that engineers can&#8217;t write code or use AI to do so. It&#8217;s that we&#8217;re so good at writing it that we forget to ask whether we should.</p><div><hr></div><h2>8. 
At scale, even your bugs have users.</h2><p>With enough users, every observable behavior becomes a dependency - regardless of what you promised. Someone is scraping your API, automating your quirks, caching your bugs.</p><p>This creates a career-level insight: you can&#8217;t treat compatibility work as &#8220;maintenance&#8221; and new features as &#8220;real work.&#8221; Compatibility is product. </p><p>Design your deprecations as migrations with time, tooling, and empathy. Most &#8220;API design&#8221; is actually &#8220;API retirement.&#8221;</p><h2>9. Most &#8220;slow&#8221; teams are actually misaligned teams.</h2><p>When a project drags, the instinct is to blame execution: people aren&#8217;t working hard enough, the technology is wrong, there aren&#8217;t enough engineers. Usually none of that is the real problem.</p><p>In large companies, teams are your unit of concurrency, but coordination costs grow geometrically as teams multiply. Most slowness is actually alignment failure - people building the wrong things, or the right things in incompatible ways. </p><p>Senior engineers spend more time clarifying direction, interfaces, and priorities than &#8220;writing code faster&#8221; because that&#8217;s where the actual bottleneck lives.</p><h2>10. Focus on what you can control. Ignore what you can&#8217;t.</h2><p>In a large company, countless variables are outside your control - organizational changes, management decisions, market shifts, product pivots. Dwelling on these creates anxiety without agency.</p><p>The engineers who stay sane and effective zero in on their sphere of influence. You can&#8217;t control whether a reorg happens. You can control the quality of your work, how you respond, and what you learn. When faced with uncertainty, break problems into pieces and identify the specific actions available to you. </p><p>This isn&#8217;t passive acceptance - it&#8217;s strategic focus. 
Energy spent on what you can&#8217;t change is energy stolen from what you can.</p><h2>11. Abstractions don&#8217;t remove complexity. They move it to the day you&#8217;re on call.</h2><p>Every abstraction is a bet that you won&#8217;t need to understand what&#8217;s underneath. Sometimes you win that bet. But something always leaks, and when it does, you need to know what you&#8217;re standing on.</p><p>Senior engineers keep learning &#8220;lower level&#8221; things even as stacks get higher. Not out of nostalgia, but out of respect for the moment when the abstraction fails and you&#8217;re alone with the system at 3am. Use your stack. </p><p>But keep a working model of its underlying failure modes.</p><h2>12. Writing forces clarity. The fastest way to learn something better is to try teaching it.</h2><p>Writing forces clarity. When I explain a concept to others - in a doc, a talk, a code review comment, even just chatting with AI - I discover the gaps in my own understanding. The act of making something legible to someone else makes it more legible to me.</p><p>This doesn&#8217;t mean that you&#8217;re going to learn how to be a surgeon by teaching it, but the premise still holds largely true in the software engineering domain.</p><p>This isn&#8217;t just about being generous with knowledge. It&#8217;s a selfish learning hack. If you think you understand something, try to explain it simply. The places where you stumble are the places where your understanding is shallow. </p><p>Teaching is debugging your own mental models.</p><h2>13. The work that makes other work possible is priceless - and invisible.</h2><p>Glue work - documentation, onboarding, cross-team coordination, process improvement - is vital. But if you do it unconsciously, it can stall your technical trajectory and burn you out. The trap is doing it as &#8220;helpfulness&#8221; rather than treating it as deliberate, bounded, visible impact.</p><p>Timebox it. Rotate it. 
Turn it into artifacts: docs, templates, automation. And make it legible as impact, not as personality trait. </p><p>Priceless and invisible is a dangerous combination for your career.</p><h2>14. If you win every debate, you&#8217;re probably accumulating silent resistance.</h2><p>I&#8217;ve learned to be suspicious of my own certainty. When I &#8220;win&#8221; too easily, something is usually wrong. People stop fighting you not because you&#8217;ve convinced them, but because they&#8217;ve given up trying - and they&#8217;ll express that disagreement in execution, not meetings.</p><p>Real alignment takes longer. You have to actually understand other perspectives, incorporate feedback, and sometimes change your mind publicly. </p><p>The short-term feeling of being right is worth much less than the long-term reality of building things with willing collaborators.</p><h2>15. When a measure becomes a target, it stops measuring.</h2><p>Every metric you expose to management will eventually be gamed. Not through malice, but because humans optimize for what&#8217;s measured. </p><p>If you track lines of code, you&#8217;ll get more lines. If you track velocity, you&#8217;ll get inflated estimates.  </p><p>The senior move: respond to every metric request with a pair. One for speed. One for quality or risk. Then insist on interpreting trends, not worshiping thresholds. The goal is insight, not surveillance.</p><h2>16. Admitting what you don&#8217;t know creates more safety than pretending you do.</h2><p>Senior engineers who say &#8220;I don&#8217;t know&#8221; aren&#8217;t showing weakness - they&#8217;re creating permission. When a leader admits uncertainty, it signals that the room is safe for others to do the same. The alternative is a culture where everyone pretends to understand and problems stay hidden until they explode.</p><p>I&#8217;ve seen teams where the most senior person never admitted confusion, and I&#8217;ve seen the damage. Questions don&#8217;t get asked. 
Assumptions don&#8217;t get challenged. Junior engineers stay silent because they assume everyone else gets it. </p><p>Model curiosity, and you get a team that actually learns.</p><h2>17. Your network outlasts every job you&#8217;ll ever have.</h2><p>Early in my career, I focused on the work and neglected networking. In hindsight, this was a mistake. Colleagues who invested in relationships - inside and outside the company - reaped benefits for decades. </p><p>They heard about opportunities first, could build bridges faster, got recommended for roles, and co-founded ventures with people they&#8217;d built trust with over years.</p><p>Your job isn&#8217;t forever, but your network is. Approach it with curiosity and generosity, not transactional hustle. </p><p>When the time comes to move on, it&#8217;s often relationships that open the door.</p><h2>18. Most performance wins come from removing work, not adding cleverness.</h2><p>When systems get slow, the instinct is to add: caching layers, parallel processing, smarter algorithms. Sometimes that&#8217;s right. But I&#8217;ve seen more performance wins from asking &#8220;what are we computing that we don&#8217;t need?&#8221;</p><p>Deleting unnecessary work is almost always more impactful than doing necessary work faster. The fastest code is code that never runs. </p><p>Before you optimize, question whether the work should exist at all.</p><h2>19. Process exists to reduce uncertainty, not to create paper trails.</h2><p>The best process makes coordination easier and failures cheaper. The worst process is bureaucratic theater - it exists not to help but to assign blame when things go wrong.</p><p>If you can&#8217;t explain how a process reduces risk or increases clarity, it&#8217;s probably just overhead. </p><p>And if people are spending more time documenting their work than doing it, something has gone deeply wrong.</p><h2>20. Eventually, time becomes worth more than money. 
Act accordingly.</h2><p>Early in your career, you trade time for money - and that&#8217;s fine. But at some point, the calculus inverts. You start to realize that time is the non-renewable resource.</p><p>I&#8217;ve watched senior engineers burn out chasing the next promo level, optimizing for a few more percentage points of compensation. Some of them got it. Most of them wondered, afterward, if it was worth what they gave up.</p><p>The answer isn&#8217;t &#8220;don&#8217;t work hard.&#8221; It&#8217;s &#8220;know what you&#8217;re trading, and make the trade deliberately.&#8221;</p><h2>21. There are no shortcuts, but there is compounding.</h2><p>Expertise comes from deliberate practice - pushing slightly beyond your current skill, reflecting, repeating. For years. There&#8217;s no condensed version.</p><p>But here&#8217;s the hopeful part: learning compounds when it creates new options, not just new trivia. Write - not for engagement, but for clarity. Build reusable primitives. Collect scar tissue into playbooks.</p><p>The engineer who treats their career as compound interest, not lottery tickets, tends to end up much further ahead.</p><h2>A final thought</h2><p>Twenty-one lessons sounds like a lot, but they really come down to a few core ideas: stay curious, stay humble, and remember that the work is always about people - the users you&#8217;re building for and the teammates you&#8217;re building with.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-JAK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff24d22-2b08-4900-b733-bb857e7e4459_2736x2737.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!-JAK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff24d22-2b08-4900-b733-bb857e7e4459_2736x2737.jpeg 424w, https://substackcdn.com/image/fetch/$s_!-JAK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff24d22-2b08-4900-b733-bb857e7e4459_2736x2737.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-JAK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff24d22-2b08-4900-b733-bb857e7e4459_2736x2737.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-JAK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff24d22-2b08-4900-b733-bb857e7e4459_2736x2737.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-JAK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff24d22-2b08-4900-b733-bb857e7e4459_2736x2737.jpeg" width="360" height="360.24725274725273" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2ff24d22-2b08-4900-b733-bb857e7e4459_2736x2737.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1457,&quot;width&quot;:1456,&quot;resizeWidth&quot;:360,&quot;bytes&quot;:4066966,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180675155?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff24d22-2b08-4900-b733-bb857e7e4459_2736x2737.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-JAK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff24d22-2b08-4900-b733-bb857e7e4459_2736x2737.jpeg 424w, https://substackcdn.com/image/fetch/$s_!-JAK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff24d22-2b08-4900-b733-bb857e7e4459_2736x2737.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-JAK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff24d22-2b08-4900-b733-bb857e7e4459_2736x2737.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-JAK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff24d22-2b08-4900-b733-bb857e7e4459_2736x2737.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A career in engineering is long enough to make plenty of mistakes and still come out ahead. The engineers I admire most aren&#8217;t the ones who got everything right - they&#8217;re the ones who learned from what went wrong, shared what they discovered, and kept showing up.</p><p>If you&#8217;re early in your journey, know that it gets richer with time. 
If you&#8217;re deep into it, I hope some of these resonate.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-kh4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F746cd1ff-8111-4f8f-b7f9-84db223f998f_7838x7838.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-kh4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F746cd1ff-8111-4f8f-b7f9-84db223f998f_7838x7838.png 424w, https://substackcdn.com/image/fetch/$s_!-kh4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F746cd1ff-8111-4f8f-b7f9-84db223f998f_7838x7838.png 848w, https://substackcdn.com/image/fetch/$s_!-kh4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F746cd1ff-8111-4f8f-b7f9-84db223f998f_7838x7838.png 1272w, https://substackcdn.com/image/fetch/$s_!-kh4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F746cd1ff-8111-4f8f-b7f9-84db223f998f_7838x7838.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-kh4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F746cd1ff-8111-4f8f-b7f9-84db223f998f_7838x7838.png" width="1456" height="1456" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/746cd1ff-8111-4f8f-b7f9-84db223f998f_7838x7838.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1216515,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/180675155?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F746cd1ff-8111-4f8f-b7f9-84db223f998f_7838x7838.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-kh4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F746cd1ff-8111-4f8f-b7f9-84db223f998f_7838x7838.png 424w, https://substackcdn.com/image/fetch/$s_!-kh4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F746cd1ff-8111-4f8f-b7f9-84db223f998f_7838x7838.png 848w, https://substackcdn.com/image/fetch/$s_!-kh4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F746cd1ff-8111-4f8f-b7f9-84db223f998f_7838x7838.png 1272w, https://substackcdn.com/image/fetch/$s_!-kh4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F746cd1ff-8111-4f8f-b7f9-84db223f998f_7838x7838.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p>]]></content:encoded></item><item><title><![CDATA[Treat AI-Generated code as a draft]]></title><description><![CDATA[Keep human eyes, judgment, and ownership at the center of AI written code]]></description><link>https://addyo.substack.com/p/treat-ai-generated-code-as-a-draft</link><guid isPermaLink="false">https://addyo.substack.com/p/treat-ai-generated-code-as-a-draft</guid><dc:creator><![CDATA[Addy Osmani]]></dc:creator><pubDate>Tue, 25 Nov 2025 16:43:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!esjQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c76c36d-5242-4274-a946-99821cd84da8_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>tl;dr:</strong> <strong>Treat AI-generated code as a draft. 
It can write the first version, but never outsource the reading.</strong> No human review means no reliable trace from behavior back to intent. When you stop reviewing AI drafts, you stop knowing why the code works at all. Practically, hold AI-written code to the same standards as code written by human teammates.</p><h2><strong>Never outsource the reading - always review AI&#8217;s first draft</strong></h2><p><strong>AI can write a first version of code, but </strong><em><strong>humans must do the reading and reviewing</strong></em><strong> to ensure intent and quality.</strong> </p><p>If you stop reviewing AI-generated drafts, you stop knowing why the code works (or if it truly does) &#8211; there&#8217;s no reliable trace from behavior back to intent. In other words, <em>LLMs don&#8217;t ship bad code, teams do</em>. When no one takes responsibility for checking AI-written code, bad code slips through not because the model failed, but because the workflow failed to demand a higher standard <a href="https://asymm.com/the-new-rules-of-ai-generated-code-accountability/#:~:text=LLMs%20Don%E2%80%99t%20Ship%20Bad%20Code%2C,Teams%20Do">[1]</a>. </p><p>Treat the AI&#8217;s output as <strong>untrusted input</strong> &#8211; it might be syntactically correct and even pass tests, but it hasn&#8217;t earned your trust until a human verifies it. AI models often produce <em>plausible-looking but subtly flawed code</em>, including hallucinated functions or insecure patterns <a href="https://www.metacto.com/blogs/establishing-code-review-standards-for-ai-generated-code#:~:text=The%20Phantom%20Menace%20of%20%E2%80%9CHallucinated%E2%80%9D,Code">[2]</a>. So never merge code that hasn&#8217;t been read and understood by a human. 
As one engineer put it, blindly trusting AI output without verification risks immediate bugs <em>and</em> &#8220;systematically degrades our ability to catch these errors&#8221; because the very skills needed to validate code atrophy from disuse <a href="https://codebytom.blog/2025/07/09/the-hidden-cost-of-ai-reliance/comment-page-1/#:~:text=When%20we%20blindly%20trust%20AI,ones%20that%20atrophy%20from%20disuse">[3]</a>. </p><p>In short, always insist on a human-in-the-loop: AI can draft, but only a human can ensure the code&#8217;s behavior matches the intended purpose.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QEAT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf803e1f-1e2a-4e71-b3da-3ee98dc891b6_1892x1058.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QEAT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf803e1f-1e2a-4e71-b3da-3ee98dc891b6_1892x1058.png 424w, https://substackcdn.com/image/fetch/$s_!QEAT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf803e1f-1e2a-4e71-b3da-3ee98dc891b6_1892x1058.png 848w, https://substackcdn.com/image/fetch/$s_!QEAT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf803e1f-1e2a-4e71-b3da-3ee98dc891b6_1892x1058.png 1272w, https://substackcdn.com/image/fetch/$s_!QEAT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf803e1f-1e2a-4e71-b3da-3ee98dc891b6_1892x1058.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!QEAT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf803e1f-1e2a-4e71-b3da-3ee98dc891b6_1892x1058.png" width="1456" height="814" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cf803e1f-1e2a-4e71-b3da-3ee98dc891b6_1892x1058.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:814,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:264046,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/179880341?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf803e1f-1e2a-4e71-b3da-3ee98dc891b6_1892x1058.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QEAT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf803e1f-1e2a-4e71-b3da-3ee98dc891b6_1892x1058.png 424w, https://substackcdn.com/image/fetch/$s_!QEAT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf803e1f-1e2a-4e71-b3da-3ee98dc891b6_1892x1058.png 848w, https://substackcdn.com/image/fetch/$s_!QEAT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf803e1f-1e2a-4e71-b3da-3ee98dc891b6_1892x1058.png 1272w, https://substackcdn.com/image/fetch/$s_!QEAT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf803e1f-1e2a-4e71-b3da-3ee98dc891b6_1892x1058.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2><strong>Blind reliance on AI erodes critical thinking and skills</strong></h2><p><strong>Engineering leaders worry that developers who blindly accept AI-generated code will lose their critical thinking abilities.</strong> </p><p>The concern isn&#8217;t hypothetical &#8211; early research bears it out. Studies have found that heavy use of AI assistants correlates with <em>lower brain engagement and reduced critical thinking performance </em><a href="https://codebytom.blog/2025/07/09/the-hidden-cost-of-ai-reliance/comment-page-1/#:~:text=Research%20consistently%20points%20to%20concerning,These">[4]</a>. In practice, developers dependent on AI may skip fundamental tasks like reading documentation or debugging errors themselves. One veteran engineer confessed that using AI&#8217;s instant answers made him &#8220;worse at his own craft.&#8221; </p><p>He stopped reading docs (&#8220;why bother when an LLM can explain it instantly?&#8221;) and even stopped analyzing errors &#8211; instead, he&#8217;d copy-paste stack traces into the AI and paste the AI&#8217;s answers back into the code. &#8220;I&#8217;ve become a human clipboard,&#8221; he lamented <a href="https://codebytom.blog/2025/07/09/the-hidden-cost-of-ai-reliance/comment-page-1/#:~:text=What%20does%20this%20look%20like,and%20solutions%20back%20to%20code">[5]</a>. This kind of cognitive offloading means the developer isn&#8217;t reasoning through problems anymore; the AI is doing the thinking, and the human is just transcribing. 
The result is not only diminished skill, but also less vigilance &#8211; if developers assume the AI is always right, they may miss subtle bugs or security issues they would have caught before. In fact, the ease and polish of AI output can lull engineers into a false sense of security, lowering their skepticism during reviews <a href="https://asymm.com/the-new-rules-of-ai-generated-code-accountability/#:~:text=1,Skepticism">[6]</a>. </p><p>The irony is that AI was supposed to boost productivity, but over-reliance can make individuals <em>less</em> capable. &#8220;We&#8217;re not becoming 10&#215; developers with AI, we&#8217;re becoming 10&#215; dependent on AI,&#8221; as one author observed &#8211; trading long-term understanding for short-term speed <a href="https://codebytom.blog/2025/07/09/the-hidden-cost-of-ai-reliance/comment-page-1/#:~:text=When%20we%20consistently%20choose%20the,that%20leads%20to%20breakthrough%20innovations">[7]</a>. The takeaway: to maintain your engineering sharpness, you must stay intellectually engaged with the code. Use AI as a tool, not a crutch &#8211; always challenge and verify its solutions rather than accepting them blindly.</p><h2><strong>Skipping the learning process in favor of speed hurts growth</strong></h2><p><strong>Many teams have leapt straight into using AI for speed, bypassing the learning and understanding that should accompany its use.</strong> </p><p>The promise of AI coding tools is high velocity &#8211; <em>generate, generate, generate</em> &#8211; but this often comes at the expense of developers truly grasping what they&#8217;re building. When you rely on AI to write code you don&#8217;t fully understand, you are <em>skipping the essential learning process</em> that makes you a better engineer <a href="https://matthewmartin.dev/posts/20250202-dont-outsource-what-you-dont-understand/#:~:text=Worse%20still%2C%20is%20that%20this,because%20you%20use%20AI%20without">[8]</a>. 
</p><p>The mistakes, trial-and-error, and research that traditionally accompany coding aren&#8217;t just hurdles &#8211; they are the training ground where critical skills develop. By outsourcing the heavy lifting to AI, junior devs in particular may never acquire the depth of knowledge to assess or improve the code being produced. </p><p>This creates a vicious cycle: <em>you produce poor code because you use AI without experience, and you never gain experience because you keep using AI</em><a href="https://matthewmartin.dev/posts/20250202-dont-outsource-what-you-dont-understand/#:~:text=Worse%20still%2C%20is%20that%20this,because%20you%20use%20AI%20without">[8]</a>. As one commentator bluntly asked, <em>if your role is reduced to just prompting AI for code you don&#8217;t understand, what value are you adding?</em><a href="https://matthewmartin.dev/posts/20250202-dont-outsource-what-you-dont-understand/#:~:text=The%20mistakes%2C%20the%20trial%20and,value%20are%20you%20really%20adding">[9]</a>.</p><p>We&#8217;ve largely skipped the phase where AI could be used as a learning aid or tutor, and jumped straight to using it as an auto-coder for output. Ideally, developers would use AI to <strong>improve understanding</strong> &#8211; for example, asking an AI to explain a tricky piece of code, or to suggest why a solution works &#8211; and even do a local &#8220;self review&#8221; with the AI before handing code to others. But in practice, many are just hitting &#8220;accept&#8221; on suggestions and moving on. This means they might deliver a feature faster, but with only shallow knowledge of how it works or why certain patterns were used. </p><p>Over time, that lack of understanding accumulates into a serious skill gap. Senior engineers worry about newcomers who can pump out code with AI assistance yet struggle to debug or extend it, because they never <em>learned</em> the underlying concepts. 
Indeed, engineering leaders report that while juniors now ship features faster than ever, when something breaks &#8220;they struggle to debug code they don&#8217;t understand&#8221; <a href="https://www.softwareseni.com/why-ai-coding-speed-gains-disappear-in-code-reviews/#:~:text=AI,debug%20code%20they%20don%E2%80%99t%20understand">[10]</a>. </p><p>The craft of software engineering is about far more than producing code that <em>runs</em> &#8211; it&#8217;s about knowing <em>why</em> the code is written that way, and how to evolve it. If we sidestep that journey, we risk creating a generation of programmers who can only operate with an AI on autopilot. To counteract this, treat AI output as an opportunity to learn: don&#8217;t just copy-paste answers, <strong>read them, question them, and ensure you could explain them</strong> to a colleague. Use AI to accelerate your work, not bypass your growth as an engineer <a href="https://matthewmartin.dev/posts/20250202-dont-outsource-what-you-dont-understand/#:~:text=Worse%20still%2C%20is%20that%20this,because%20you%20use%20AI%20without">[11]</a><a href="https://matthewmartin.dev/posts/20250202-dont-outsource-what-you-dont-understand/#:~:text=The%20mistakes%2C%20the%20trial%20and,AI%20for%20code%20you%20don%E2%80%99t">[12]</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!z54T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb041a203-0570-4199-b680-9d464a63ab3f_1824x1020.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!z54T!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb041a203-0570-4199-b680-9d464a63ab3f_1824x1020.png 424w, 
https://substackcdn.com/image/fetch/$s_!z54T!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb041a203-0570-4199-b680-9d464a63ab3f_1824x1020.png 848w, https://substackcdn.com/image/fetch/$s_!z54T!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb041a203-0570-4199-b680-9d464a63ab3f_1824x1020.png 1272w, https://substackcdn.com/image/fetch/$s_!z54T!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb041a203-0570-4199-b680-9d464a63ab3f_1824x1020.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!z54T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb041a203-0570-4199-b680-9d464a63ab3f_1824x1020.png" width="1456" height="814" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b041a203-0570-4199-b680-9d464a63ab3f_1824x1020.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:814,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:779761,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/179880341?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb041a203-0570-4199-b680-9d464a63ab3f_1824x1020.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!z54T!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb041a203-0570-4199-b680-9d464a63ab3f_1824x1020.png 424w, https://substackcdn.com/image/fetch/$s_!z54T!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb041a203-0570-4199-b680-9d464a63ab3f_1824x1020.png 848w, https://substackcdn.com/image/fetch/$s_!z54T!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb041a203-0570-4199-b680-9d464a63ab3f_1824x1020.png 1272w, https://substackcdn.com/image/fetch/$s_!z54T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb041a203-0570-4199-b680-9d464a63ab3f_1824x1020.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>I&#8217;ve heard of seniors sending back PRs where it&#8217;s clear AI was used but the person didn&#8217;t understand what they were doing. When a junior submits an AI-generated PR, the review becomes the primary venue for mentorship. Ask Socratic questions that force them to explain the AI&#8217;s output. This ensures understanding, not just functionality. Reviews become about comprehension, not just correctness.</em></p><h2><strong>Code reviews are straining under AI-generated code</strong></h2><p><strong>Traditional code review practices are struggling to cope with AI-generated code, leaving teams unsure how to maintain quality.</strong> </p><p>Code reviews have always been the safety net for catching errors and ensuring code quality. But AI assistance changes the game: AI can produce <em>much larger diffs</em> in an instant, often touching many lines or files, which means reviewers face more volume and potentially more complexity in each pull request. In fact, studies found that pull requests heavy with Copilot-generated code take about <strong>26% longer to review on average</strong>, because reviewers must untangle unfamiliar patterns and double-check for AI-specific mistakes <a href="https://thenewstack.io/how-to-measure-the-roi-of-ai-coding-assistants/">[13]</a>. 
</p><p>Reviewers also report a psychological effect: when examining code they didn&#8217;t write, especially if it&#8217;s syntactically polished, their confidence drops &#8211; they take longer to validate logic and may second-guess their understanding <a href="https://www.softwareseni.com/why-ai-coding-speed-gains-disappear-in-code-reviews/#:~:text=requests%2C%20unfamiliar%20code%20patterns%2C%20and,take%20longer%20to%20validate%20logic">[14]</a>. AI can churn out code that <em>looks</em> clean and modern (consistent naming, proper formatting) which can lower reviewers&#8217; skepticism <a href="https://asymm.com/the-new-rules-of-ai-generated-code-accountability/#:~:text=1,Skepticism">[6]</a>. It&#8217;s easy to assume the code is sound if it &#8220;looks professional,&#8221; making it more likely that subtle bugs or design flaws slip through.</p><p>Another complication is <strong>lost intent</strong>. In a traditional review, the reviewer can discuss &#8220;what the author meant to do&#8221; &#8211; there&#8217;s a human intention to compare against the implementation. With AI-generated code, the code&#8217;s author might not fully grasp the intent behind every line, because they didn&#8217;t <em>write</em> it in the conventional sense. The original prompt given to the AI is essentially the spec, but reviewers often don&#8217;t see that prompt <a href="https://asymm.com/the-new-rules-of-ai-generated-code-accountability/#:~:text=2,the%20Prompt%20Asked%20For">[15]</a>. This means a reviewer is left guessing at the requirements and whether the AI&#8217;s solution actually meets them, rather than just reviewing whether the code works. </p><p>As one report noted, <em>reviewers are no longer assessing what the developer meant to do, but rather what the model actually did </em><a href="https://asymm.com/the-new-rules-of-ai-generated-code-accountability/#:~:text=functionally%20equivalent%20to%20writing%20a,inputs%2C%20implicit%20assumptions%2C%20or%20insecure">[16]</a>. 
Traditional code review checklists (focused on style, obvious logic errors, etc.) aren&#8217;t enough, because AI code can fail in non-traditional ways &#8211; e.g. using an outdated algorithm that a junior dev wouldn&#8217;t know, or introducing an edge-case bug that isn&#8217;t immediately obvious.</p><p>Teams are also encountering <strong>review overload</strong>. An AI pair programmer can generate code faster than a human, which means a single developer can open very large pull requests or many pull requests in a short time. This &#8220;velocity&#8221; can overwhelm the team&#8217;s capacity to give thorough reviews. It&#8217;s akin to slop in code form &#8211; flooding the reviewer with so much output that it&#8217;s hard to pinpoint the issues <a href="https://www.reddit.com/r/SoftwareEngineering/comments/1kjwiso/maintaining_code_quality_with_widespread_ai/#:~:text=The%20code%20lacks%20clear%20architecture%3F,Suggest%20refactoring">[17]</a>. In such cases, some organizations have instituted new policies: for example, if a PR is more than 30% AI-generated (by lines or content), it might trigger a required extra level of review or a more senior reviewer <a href="https://www.softwareseni.com/why-ai-coding-speed-gains-disappear-in-code-reviews/#:~:text=Harness%E2%80%99s%20engineering%20teams%20report%20that,pattern%20usage%20and%20architectural%20misalignment">[18]</a>. </p><p>The idea is to acknowledge that AI-heavy code needs <em>different</em> scrutiny levels, not business-as-usual. Another emerging practice is labeling AI contributions: explicitly marking in the pull request or commit message that &#8220;this code was assisted by AI.&#8221; This can cue reviewers to be extra vigilant. 
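</p><p>Such a 30% policy can even be automated as a merge gate. The sketch below is purely illustrative &#8211; the per-line provenance labels and the exact cutoff are assumptions, since no standard tooling exposes them today:</p>

```python
# Hypothetical sketch of a "too much AI code" review gate.
# Assumes each changed line in a PR carries a provenance label
# ("ai" or "human"), e.g. gathered from editor telemetry or commit
# metadata; the 30% threshold mirrors the policy described above.

def needs_extra_review(line_provenance, threshold=0.30):
    """Return True if the AI-generated share of lines exceeds threshold."""
    if not line_provenance:
        return False
    ai_lines = sum(1 for label in line_provenance if label == "ai")
    return ai_lines / len(line_provenance) > threshold

# A PR where 200 of 500 changed lines (40%) were AI-drafted is escalated:
print(needs_extra_review(["ai"] * 200 + ["human"] * 300))  # True
```

<p>However the provenance signal is gathered, the point is that scrutiny becomes an explicit, checkable policy rather than reviewer intuition. 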
Indeed, experts recommend <strong>tagging and tracking AI-generated code</strong> for accountability &#8211; it helps reviewers know what to look for and helps teams trace bugs later (&#8220;was this bug from AI-written code?&#8221;)<a href="https://asymm.com/the-new-rules-of-ai-generated-code-accountability/#:~:text=4,Contributions">[19]</a>.</p><p>However, openly tagging AI involvement comes with a cultural challenge: developers must feel <strong>psychologically safe</strong> to disclose AI usage. If people fear judgment for using AI (&#8220;will my team think I&#8217;m lazy or less competent?&#8221;), they may hide it &#8211; and that&#8217;s worse for the team. Hidden AI usage means the team doesn&#8217;t know where potential risk lies and can&#8217;t adjust their reviews accordingly <a href="https://jellyfish.co/library/ai-in-software-development/responsibility-of-developers-generative-ai/#:~:text=">[20]</a>. To counter this, forward-thinking teams encourage transparency without stigma. </p><p>Using AI should be treated like using any tool &#8211; it&#8217;s fine to use it, but you must own the output. As one guide put it, <em>never blame the AI for bugs</em> or quality issues; the engineer who committed the code owns it, period <a href="https://jellyfish.co/library/ai-in-software-development/responsibility-of-developers-generative-ai/#:~:text=What%20developers%20must%20do%3A">[21]</a>. If everyone embraces that mindset, then saying &#8220;I used Cursor to help with this module&#8221; is simply a factual statement, not an admission of guilt. It allows the team to collectively ensure the AI-generated sections get proper attention. </p><p>Right now, our code review tools and norms are still catching up to these needs. We don&#8217;t yet have widespread automated detectors for AI code in PRs, and most diff viewers don&#8217;t show the AI&#8217;s prompt or reasoning. 
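</p><p>One lightweight bridge, in the meantime, is a team convention of declaring AI assistance in the commit message &#8211; the &#8220;AI-Assisted:&#8221; trailer below is an invented convention for illustration, not a git standard &#8211; so the team can filter for those changes when triaging:</p>

```python
# Hypothetical convention: commits declare AI involvement via an
# "AI-Assisted:" trailer line in the message body. This helper filters
# commit messages for that trailer so a team can trace which changes
# involved AI when debugging later.

def ai_assisted(commit_messages):
    """Return the messages that declare AI assistance via a trailer."""
    return [
        msg for msg in commit_messages
        if any(line.startswith("AI-Assisted:") for line in msg.splitlines())
    ]

history = [
    "Add retry logic to payment client\n\nAI-Assisted: yes (reviewed by author)",
    "Fix typo in README",
]
for msg in ai_assisted(history):
    print(msg.splitlines()[0])  # -> Add retry logic to payment client
```

<p>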
So, we need to rely on process and team agreements to fill the gap &#8211; explicitly calling out AI-written code, reviewing tests more rigorously, and possibly setting size limits on what we&#8217;ll accept from an AI without checkpoints for human review. </p><p><strong>If questionable code is making it past PR review unchallenged, the issue is not just AI &#8211; it&#8217;s that the review process isn&#8217;t robust enough</strong> to catch these problems <a href="https://www.reddit.com/r/SoftwareEngineering/comments/1kjwiso/maintaining_code_quality_with_widespread_ai/#:~:text=darknessgp">[22]</a>. It&#8217;s a call to action: code review practices must evolve alongside AI adoption.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7oR8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd72ae4b8-4580-4898-98c4-a9ba25b5d2a0_1824x1026.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7oR8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd72ae4b8-4580-4898-98c4-a9ba25b5d2a0_1824x1026.png 424w, https://substackcdn.com/image/fetch/$s_!7oR8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd72ae4b8-4580-4898-98c4-a9ba25b5d2a0_1824x1026.png 848w, https://substackcdn.com/image/fetch/$s_!7oR8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd72ae4b8-4580-4898-98c4-a9ba25b5d2a0_1824x1026.png 1272w, 
https://substackcdn.com/image/fetch/$s_!7oR8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd72ae4b8-4580-4898-98c4-a9ba25b5d2a0_1824x1026.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7oR8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd72ae4b8-4580-4898-98c4-a9ba25b5d2a0_1824x1026.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d72ae4b8-4580-4898-98c4-a9ba25b5d2a0_1824x1026.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:460018,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/179880341?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd72ae4b8-4580-4898-98c4-a9ba25b5d2a0_1824x1026.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7oR8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd72ae4b8-4580-4898-98c4-a9ba25b5d2a0_1824x1026.png 424w, https://substackcdn.com/image/fetch/$s_!7oR8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd72ae4b8-4580-4898-98c4-a9ba25b5d2a0_1824x1026.png 848w, 
https://substackcdn.com/image/fetch/$s_!7oR8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd72ae4b8-4580-4898-98c4-a9ba25b5d2a0_1824x1026.png 1272w, https://substackcdn.com/image/fetch/$s_!7oR8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd72ae4b8-4580-4898-98c4-a9ba25b5d2a0_1824x1026.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><em>66% of developers in a <a href="https://survey.stackoverflow.co/2025">Stack Overflow survey</a> said the most common frustration 
with AI assistants is that the code is &#8220;almost right, but not quite.&#8221; 45% of developers in the Stack Overflow survey reported that time spent debugging AI-generated code was their biggest time sink. Quality gates and validation are now the critical path.</em></p><h2><strong>Best practices: treating AI-generated code as a draft</strong></h2><p>To use AI coding tools effectively, we must adjust our habits and processes. Think of AI output as a <em>first draft from a junior developer</em> &#8211; valuable, but in need of careful review and refinement. Here are some pragmatic best practices to ensure that AI-generated code boosts productivity <em>without</em> sacrificing quality or understanding:</p><ul><li><p><strong>Never merge code you don&#8217;t understand.</strong> If an AI helped produce some code, the onus is on <em>you</em> (the developer) to read every line and make sure you get it. You should be able to explain what the code does and why. If there&#8217;s any part of the AI-generated snippet that you can&#8217;t follow, treat that as a red flag &#8211; either refine the prompt, have the AI explain it, or rewrite that part yourself. Some open-source projects explicitly require that contributors <em>certify they understand the code they submit</em>, even if AI wrote it. In professional settings, the same principle applies: take full ownership of any code you commit, regardless of who (or what) authored it<a href="https://jellyfish.co/library/ai-in-software-development/responsibility-of-developers-generative-ai/#:~:text=What%20developers%20must%20do%3A">[21]</a>. In practice, this means <strong>running the code, writing or reviewing tests for it, and stepping through its logic</strong> before it ever hits your team&#8217;s repository.</p></li><li><p><strong>Treat AI code like an intern&#8217;s code &#8211; don&#8217;t trust, verify.</strong> AI doesn&#8217;t possess context or wisdom; it&#8217;s more like a very fast, eager junior developer. 
It will confidently produce a solution, but that solution might be overly simplistic, miss edge cases, or use patterns that are out of place for your codebase. As a best practice, approach AI contributions with healthy skepticism. Check boundary conditions, look for off-by-one errors, thread safety issues, or other corner cases that a less-experienced coder might overlook <a href="https://blog.bonfy.ai/code-review-in-the-age-of-ai-best-practices-for-reviewing-ai-generated-code#:~:text=AI,It%20may%20work%2C%20but">[23]</a><a href="https://blog.bonfy.ai/code-review-in-the-age-of-ai-best-practices-for-reviewing-ai-generated-code#:~:text=,that%20can%20not%20be%20tested">[24]</a>. Often, AI will do exactly what you asked, <em>not necessarily what you truly need</em>. So cross-verify the output against the requirements. If it&#8217;s a complex or critical piece of code, consider manually reimplementing it after seeing the AI&#8217;s draft &#8211; you might catch nuances the AI missed. Remember the mantra for AI output: <strong>&#8220;Don&#8217;t trust. Verify.&#8221; </strong><a href="https://blog.bonfy.ai/code-review-in-the-age-of-ai-best-practices-for-reviewing-ai-generated-code#:~:text=Best%20practice%3A%20Don%E2%80%99t%20trust">[25]</a></p></li><li><p><strong>Use AI as a coding assistant, not an author &#8211; incorporate it into your own thinking.</strong> Instead of just asking AI to spit out code and blindly pasting it, use it in a conversational, explanatory way. For example, you can ask the AI to <em>explain</em> the code it just suggested, or to generate comments for it. You can have it suggest test cases for the code, which you then run to see if the code truly works. AI can also help by summarizing a large diff or identifying potential problem areas in a PR (some advanced code review tools now offer AI-generated summaries). All these uses keep you, the human, in the driver&#8217;s seat. You&#8217;re leveraging AI to augment your understanding, not replace it. 
One recommended practice is to <strong>review tests first</strong> for AI-generated changes <a href="https://blog.bonfy.ai/code-review-in-the-age-of-ai-best-practices-for-reviewing-ai-generated-code#:~:text=1,Mindset%3A%20Review%20Tests%20First">[26]</a> &#8211; ensure there&#8217;s a solid test suite covering the new code. If tests are weak or missing, that&#8217;s your cue to write more before trusting the code. Also, use strict linting and static analysis on AI code: AI might not follow your team&#8217;s idioms out-of-the-box, so enforce style and architecture rules with automated tools <a href="https://blog.bonfy.ai/code-review-in-the-age-of-ai-best-practices-for-reviewing-ai-generated-code#:~:text=2,Enhanced%20Linting%20Rules">[27]</a><a href="https://blog.bonfy.ai/code-review-in-the-age-of-ai-best-practices-for-reviewing-ai-generated-code#:~:text=,rules%20to%20reflect%20your%20norms">[28]</a>. If the AI suggests something that doesn&#8217;t fit your usual patterns, don&#8217;t hesitate to refactor it. Essentially, make AI your <em>pair programmer</em> who writes draft code and gives ideas, but <em>you</em> still make all final edits and decisions.</p></li><li><p><strong>Thoroughly test and secure AI-generated code.</strong> It&#8217;s crucial to apply the same (or higher) level of testing to AI-written code as you would to handmade code. Write unit tests and integration tests to cover the functionality. Specifically look for edge cases and potential failure modes &#8211; AI is notorious for handling the &#8220;happy path&#8221; but ignoring unusual inputs or error handling. 
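</p><p>Concretely, that means writing the unhappy-path checks yourself. A minimal sketch &#8211; <code>normalize_email</code> here is a hypothetical stand-in for whatever helper the AI drafted, with invented validation rules:</p>

```python
# normalize_email stands in for any AI-drafted helper you are vetting.
# The function body covers the happy path; the checks below probe the
# edge cases a reviewer must add before trusting it.

def normalize_email(addr: str) -> str:
    """Lowercase and trim an address; reject clearly malformed input."""
    cleaned = addr.strip().lower()
    if "@" not in cleaned or cleaned.startswith("@") or cleaned.endswith("@"):
        raise ValueError(f"invalid email: {addr!r}")
    return cleaned

# Happy path -- the part AI output usually gets right:
assert normalize_email("  Alice@Example.COM ") == "alice@example.com"

# Edge cases -- the part you must verify yourself:
for bad in ["", "no-at-sign", "@missing-local", "missing-domain@"]:
    try:
        normalize_email(bad)
        raise AssertionError(f"accepted malformed input: {bad!r}")
    except ValueError:
        pass
```

<p>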
Also consider security: common vulnerabilities like SQL injection, XSS, insecure deserialization, etc., might slip in if the AI drew from a code example with a flaw <a href="https://blog.bonfy.ai/code-review-in-the-age-of-ai-best-practices-for-reviewing-ai-generated-code#:~:text=4">[29]</a><a href="https://blog.bonfy.ai/code-review-in-the-age-of-ai-best-practices-for-reviewing-ai-generated-code#:~:text=,functions">[30]</a>. Use security linters or scanners (tools like Semgrep or Bandit can catch obvious issues <a href="https://blog.bonfy.ai/code-review-in-the-age-of-ai-best-practices-for-reviewing-ai-generated-code#:~:text=">[31]</a>). If the AI generated any dependency or configuration, ensure you review those for secrets or insecure defaults. Treat the AI&#8217;s code as if you hired a contractor whose work you don&#8217;t fully trust &#8211; double-check everything, because ultimately <strong>your team is accountable for any bugs or security holes</strong>, no matter who wrote the code.</p></li><li><p><strong>Leverage AI for self-review before seeking peer review.</strong> One productive pattern is to ask the AI to critique its own output <em>before</em> you open a pull request. For example, after getting a code suggestion, you might prompt, &#8220;What potential issues do you see in this code? Any edge cases or improvements?&#8221; The AI might point out a condition you didn&#8217;t consider or a more idiomatic approach. It&#8217;s like a spell-check for logic &#8211; not infallible, but it can catch low-hanging fruit. This doesn&#8217;t replace a human review, but it can help you <strong>clean up the draft</strong> so that your peers aren&#8217;t distracted by obvious problems. Think of it as you collaborating with the AI to polish the code, then handing it to your team. This also helps you learn, as the AI&#8217;s review comments can highlight areas you need to think about. 
Just remember to verify any AI feedback; sometimes it might &#8220;hallucinate&#8221; problems that aren&#8217;t real, so use your judgment.</p></li><li><p><strong>If an AI-generated change is too large or confusing, break it down.</strong> Don&#8217;t let the AI&#8217;s speed force you into merging giant, monolithic changes. If Cursor spews out 500 lines of mixed modifications, it might be better to treat that as a prototype. Perhaps run the code to see if the approach works, then <em>reimplement the solution in smaller, comprehensible pieces</em>. One developer likened an initial AI-generated draft to a <strong>spike solution</strong> &#8211; a quick and dirty implementation to prove a concept<a href="https://www.reddit.com/r/SoftwareEngineering/comments/1kjwiso/maintaining_code_quality_with_widespread_ai/#:~:text=In%20my%20experience%2C%20it%20tends,as%20a%20kind%20of%20spike">[32]</a>. You wouldn&#8217;t merge a spike into production; you&#8217;d refine it. Similarly, take the AI draft and iteratively improve it: maybe split that big PR into multiple commits or pull requests that are easier to review. Often the second draft (written with the insight gained from the first) is much cleaner and more maintainable<a href="https://www.reddit.com/r/SoftwareEngineering/comments/1kjwiso/maintaining_code_quality_with_widespread_ai/#:~:text=In%20my%20experience%2C%20it%20tends,as%20a%20kind%20of%20spike">[32]</a>. This disciplined approach prevents the &#8220;gish gallop&#8221; effect where the AI dumps so much code that reviewers can&#8217;t effectively review it. By breaking it down, you ensure that each piece gets adequate human attention.</p></li><li><p><strong>Document and label AI contributions when sharing with the team.</strong> In your pull request description or code comments, it can be helpful to note which parts were generated by AI or if you relied heavily on an AI for a solution. 
For example: &#8220;Used Gemini/Opus/GPT to generate the initial implementation of this sorting algorithm; reviewed and modified the result.&#8221; This kind of transparency helps reviewers know where to focus. It&#8217;s not about blaming the AI or you but about <em>context</em>. In fact, marking AI-generated code with clear comments or annotations is encouraged as a way to create accountability and traceability<a href="https://jellyfish.co/library/ai-in-software-development/responsibility-of-developers-generative-ai/#:~:text=The%20problem%3A%20When%20developers%20hide,which%20AI%20tool%20generated%20it">[33]</a>. If an odd bug appears later, the team can trace it back and see, &#8220;Oh, this chunk was AI-written based on prompt X&#8221; and that might make debugging easier. Of course, do this in a supportive culture (see next section) &#8211; the goal is to collectively safeguard quality, not to call someone out. Some teams even keep a log of AI-assisted changes for auditing purposes <a href="https://jellyfish.co/library/ai-in-software-development/responsibility-of-developers-generative-ai/#:~:text=The%20problem%3A%20When%20developers%20hide,which%20AI%20tool%20generated%20it">[33]</a>. At the very least, consider sharing the prompt you used with your reviewers, e.g. in a PR comment. That way the reviewer understands <em>what you asked for</em> and can judge if the AI&#8217;s code actually matches the intent<a href="https://asymm.com/the-new-rules-of-ai-generated-code-accountability/#:~:text=2,the%20Prompt%20Asked%20For">[15]</a>. This prompt-as-spec technique can bridge the gap between intention and implementation.</p></li></ul><p>In summary, treating AI code as a draft means <em>applying all the same rigor you would to a human novice&#8217;s code</em>: you review it deeply, test it thoroughly, and don&#8217;t assume anything is correct until proven. 
The AI can drastically speed up writing boilerplate and even suggest solutions, but <strong>you are the engineer</strong> &#8211; you must integrate those suggestions into the codebase responsibly.</p><h2><strong>Establish team agreements for AI-generated code</strong></h2><p><strong>To successfully integrate AI into development, teams should set clear guidelines &#8211; essentially a &#8220;contract&#8221; &#8211; on how to handle AI-generated code.</strong> This is a new frontier, and misalignment can cause friction or quality issues. A team working agreement might include rules, responsibilities, and cultural norms around AI usage. Here are some key elements teams are adopting:</p><ul><li><p><strong>Ensure accountability doesn&#8217;t lapse.</strong> Make it explicit that whoever integrates AI-generated code into the codebase is responsible for it, full stop. No pointing fingers at the AI. If a bug is introduced, it&#8217;s treated like any other bug you&#8217;d introduce. This principle, supported by industry guides, says developers must <em>take full ownership of any code they commit, regardless of who wrote it, and test AI-generated code as thoroughly as their own </em><a href="https://jellyfish.co/library/ai-in-software-development/responsibility-of-developers-generative-ai/#:~:text=What%20developers%20must%20do%3A">[21]</a>. Management should reinforce that using AI is not an excuse for lower quality. Code reviewers and approvers also share responsibility &#8211; if you approve a change, you&#8217;re vouching for it as usual. Essentially, AI doesn&#8217;t change the definition of &#8220;code owner.&#8221;</p></li><li><p><strong>Define how and when AI should be used.</strong> As a team, discuss what types of tasks are appropriate for AI assistance. 
For example, you might agree that AI is great for generating unit tests, boilerplate, scaffolding, or exploring multiple approaches &#8211; but perhaps you&#8217;ll avoid using it for core complex algorithms without additional review. Some teams may forbid AI use for security-sensitive code or critical algorithms, unless a senior engineer supervises closely. Others might say it&#8217;s fine to use AI for anything as long as you follow the other rules (understand it, test it, etc.). The key is to set expectations. This also ties into <strong>ethical and legal considerations</strong> (e.g. ensuring AI output doesn&#8217;t include copied licensed code, or doesn&#8217;t introduce biases), but that&#8217;s another essay in itself. The point is, an agreed policy prevents misunderstandings like one dev merging huge AI-written chunks that others aren&#8217;t comfortable with.</p></li><li><p><strong>Emphasize transparency and psychological safety.</strong> The team contract should encourage developers to be open about AI involvement. For instance, a guideline could be: &#8220;If AI assisted significantly in a change, mention it in the PR.&#8221; Leaders must foster an environment where this admission is seen positively (as due diligence), not negatively. A lack of transparency can lead to &#8220;shadow AI&#8221; in your codebase &#8211; code that is AI-written but nobody realizes it, making debugging and maintenance harder <a href="https://jellyfish.co/library/ai-in-software-development/responsibility-of-developers-generative-ai/#:~:text=">[20]</a>. To avoid that, make transparency the norm. One practice is adding a simple comment in the code like // Code generated with AI assistance or using a tag in PRs. 
The team might also agree on documenting prompts in the project wiki or in the code review for future reference <a href="https://jellyfish.co/library/ai-in-software-development/responsibility-of-developers-generative-ai/#:~:text=The%20problem%3A%20When%20developers%20hide,which%20AI%20tool%20generated%20it">[33]</a>. If someone feels they don&#8217;t fully understand an AI-generated section, they should feel safe to say so and ask for help or extra review <a href="https://jellyfish.co/library/ai-in-software-development/responsibility-of-developers-generative-ai/#:~:text=The%20problem%3A%20When%20developers%20hide,which%20AI%20tool%20generated%20it">[33]</a>. It&#8217;s far better to admit &#8220;I&#8217;m not 100% confident in what Copilot produced here&#8221; than to pretend everything is fine. Psychological safety ensures people speak up, which ultimately protects the code quality and the developers&#8217; growth.</p></li><li><p><strong>Integrate AI-awareness into the review process.</strong> Teams should update their code review checklists or definitions-of-done to account for AI. For example, a review checklist might add items like &#8220;If code was AI-generated, has the author provided the prompt or described the intent?&#8221; or &#8220;For AI-generated code, double-check for common issues (edge cases, security, style consistency).&#8221; Some organizations formalize this by requiring an extra pair of eyes on AI-heavy code, as noted earlier<a href="https://www.softwareseni.com/why-ai-coding-speed-gains-disappear-in-code-reviews/#:~:text=To%20address%20this%20challenge%2C%20some,code%20requires%20different%20scrutiny%20levels">[34]</a>. Training sessions can help too &#8211; a team might do a brownbag meeting on &#8220;typical AI mistakes&#8221; so all reviewers know what to watch for (e.g. unnecessary complexity, missing null checks, etc.). The team could also adopt tools to assist, like AI-powered code analysis that flags likely problematic code patterns. 
Ultimately, the whole review culture may shift to treat AI contributions with a bit more rigor. As a shared rule, you might say: <em>No AI-generated code gets merged without thorough human review, no exceptions</em>. It seems obvious, but stating it sets the tone that speed will not trump quality.</p></li><li><p><strong>Support continuous learning and skill development.</strong> To address the critical thinking atrophy issue, a team agreement can explicitly encourage practices that keep skills sharp. For instance, pair programming sessions where one person doesn&#8217;t use AI and explains their thought process, or rotations on challenging bug fixes without AI. Or even simply encouraging developers to occasionally implement things &#8220;the hard way&#8221; first, before using AI to optimize. Some companies have gone as far as tracking how AI impacts debugging time and making sure employees still know how to troubleshoot without the tool<a href="https://www.softwareseni.com/why-ai-coding-speed-gains-disappear-in-code-reviews/#:~:text=AI,debug%20code%20they%20don%E2%80%99t%20understand">[10]</a>. An agreement could be: &#8220;We use AI to speed up routine tasks, but we still expect engineers to understand and be able to manually handle the complex parts.&#8221; By acknowledging this in your team principles, you validate the importance of human expertise. Leads and managers in particular should lead by example &#8211; demonstrating in code reviews that they scrutinize AI-generated code just as they would any code, asking thoughtful questions. Junior devs will take cues from that and learn that AI is not a get-out-of-thinking-free card.</p></li></ul><p>In essence, a team&#8217;s AI code agreement is about <strong>maintaining quality, clarity, and trust</strong>. Everyone should know how AI is being used and agree on the standards its output must meet. This &#8220;contract&#8221; might be a living document that evolves as you gain experience. 
The goal is to prevent the scenario where AI quietly degrades your codebase or your engineers&#8217; skills. Instead, with rules in place, AI can be harnessed as a powerful accelerator <em>with guardrails</em>. It forces conversations about topics that were previously implicit (like &#8220;do you understand what you committed?&#8221;), making them explicit.</p><h2><strong>Conclusion: AI is not a replacement for understanding</strong></h2><p>AI coding tools are here to stay, and they <strong>excel at generating drafts</strong> &#8211; the scaffolding, the boilerplate, even complex code that might take a human much longer to write from scratch. Embracing them can lead to huge gains in productivity and free developers from drudgery. But the moment we start treating AI-generated code as &#8220;fire-and-forget,&#8221; we undermine the very benefits we seek. </p><p>The true value of AI in software engineering comes when we pair its speed with our judgment. <strong>That means always reviewing AI output with a critical eye, staying curious about </strong><em><strong>why</strong></em><strong> the code works, and insisting on clarity and correctness.</strong> When you treat AI-generated code as a draft, you acknowledge it&#8217;s a work in progress &#8211; to be massaged and perfected by human insight.</p><p>By maintaining high standards for code quality and developer education, we ensure that AI is a tool that <strong>augments our capabilities rather than atrophying them</strong>. We keep the &#8220;why&#8221; and &#8220;how&#8221; in focus even as the &#8220;what&#8221; is delivered to us on a platter. In practical terms: don&#8217;t stop reading code. </p><p>Whether written by an intern, an AI, or a seasoned colleague, code must be understood to be trusted. If you never outsource the reading and thinking, you retain the ability to connect code&#8217;s behavior back to the intent behind it &#8211; which is the essence of software engineering. 
</p><p><strong>Use AI to move faster, by all means, but </strong><em><strong>keep your hands on the wheel</strong></em><strong>.</strong> </p><p>The code that lands in production should always have a human&#8217;s eyes (and heart) behind it. That way, we get the best of both worlds: the efficiency of AI-generated first drafts and the reliability of human-reviewed, well-understood final code.</p><p><em>I&#8217;m excited to share that I&#8217;ve released a new <a href="https://beyond.addy.ie/">AI-assisted engineering book</a> with O&#8217;Reilly. There are a number of free tips on the book site in case you&#8217;re interested.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!esjQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c76c36d-5242-4274-a946-99821cd84da8_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!esjQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c76c36d-5242-4274-a946-99821cd84da8_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!esjQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c76c36d-5242-4274-a946-99821cd84da8_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!esjQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c76c36d-5242-4274-a946-99821cd84da8_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!esjQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c76c36d-5242-4274-a946-99821cd84da8_1024x1024.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!esjQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c76c36d-5242-4274-a946-99821cd84da8_1024x1024.png" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2c76c36d-5242-4274-a946-99821cd84da8_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:353186,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/179880341?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c76c36d-5242-4274-a946-99821cd84da8_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!esjQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c76c36d-5242-4274-a946-99821cd84da8_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!esjQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c76c36d-5242-4274-a946-99821cd84da8_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!esjQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c76c36d-5242-4274-a946-99821cd84da8_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!esjQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c76c36d-5242-4274-a946-99821cd84da8_1024x1024.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[Critical Thinking during the age of AI]]></title><description><![CDATA[Who, what, where, when, why, how]]></description><link>https://addyo.substack.com/p/critical-thinking-during-the-age</link><guid isPermaLink="false">https://addyo.substack.com/p/critical-thinking-during-the-age</guid><dc:creator><![CDATA[Addy Osmani]]></dc:creator><pubDate>Fri, 21 Nov 2025 15:31:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!54oK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e90581d-19f6-4a27-bec6-26e9ce06b3c5_5246x3496.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In a time when AI can generate code, design ideas, and occasionally plausible answers on demand, the need for <strong>human critical thinking</strong> is greater than ever. Even the smartest automation can&#8217;t replace the ability to ask the right questions, challenge assumptions, and think independently.</p><p>This essay explores the importance of critical thinking skills for software engineers and technical teams, using the classic <strong>&#8220;Who, what, where, when, why, how&#8221;</strong> framework to structure pragmatic guidance.
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!b1bu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22c3b29f-fdec-4f47-8ab2-ca0804a30ed4_1536x1536.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!b1bu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22c3b29f-fdec-4f47-8ab2-ca0804a30ed4_1536x1536.jpeg 424w, https://substackcdn.com/image/fetch/$s_!b1bu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22c3b29f-fdec-4f47-8ab2-ca0804a30ed4_1536x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!b1bu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22c3b29f-fdec-4f47-8ab2-ca0804a30ed4_1536x1536.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!b1bu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22c3b29f-fdec-4f47-8ab2-ca0804a30ed4_1536x1536.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!b1bu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22c3b29f-fdec-4f47-8ab2-ca0804a30ed4_1536x1536.jpeg" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/22c3b29f-fdec-4f47-8ab2-ca0804a30ed4_1536x1536.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;No 
alternative text description for this image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="No alternative text description for this image" title="No alternative text description for this image" srcset="https://substackcdn.com/image/fetch/$s_!b1bu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22c3b29f-fdec-4f47-8ab2-ca0804a30ed4_1536x1536.jpeg 424w, https://substackcdn.com/image/fetch/$s_!b1bu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22c3b29f-fdec-4f47-8ab2-ca0804a30ed4_1536x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!b1bu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22c3b29f-fdec-4f47-8ab2-ca0804a30ed4_1536x1536.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!b1bu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22c3b29f-fdec-4f47-8ab2-ca0804a30ed4_1536x1536.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><strong>tl;dr: Critical thinking checklist for AI-augmented teams</strong></p><ul><li><p><strong>Who:</strong> Don&#8217;t rely on AI as an oracle. Verify its output.</p></li><li><p><strong>What:</strong> Define the <em>real</em> problem before rushing to a solution.</p></li><li><p><strong>Where:</strong> Context is king. A fix that works in a sandbox might break in production.</p></li><li><p><strong>When:</strong> Know when to use a quick heuristic (triage) vs. deep analysis (root cause).</p></li><li><p><strong>Why:</strong> Use the &#8220;5 Whys&#8221; technique to uncover underlying causes.</p></li><li><p><strong>How:</strong> Communicate with evidence and data, not just opinions.</p></li></ul><p>We&#8217;ll dive into how each of these question categories applies to decision-making in an AI-augmented world, with concrete examples and common pitfalls. The goal is to show how humble curiosity and evidence-based reasoning can keep projects on track and avoid downstream issues.</p><h2>Who: involve the right people and perspectives</h2><p><strong>Who is involved in defining or solving the problem?</strong> In technical projects, critical thinking starts by identifying the <em>who</em>.
</p><p>This means knowing who the stakeholders are (e.g. engineers, product managers, users, domain experts) and making sure the right people are engaged in decision-making. Problems in engineering, for example, are rarely solved in isolation &#8211; they affect users and often span multiple teams. A critical thinker asks: <em>Who should we consult or inform? Who might have relevant expertise or a different perspective?</em> Including diverse viewpoints is essential. </p><p>Otherwise, teams risk falling into <strong>groupthink</strong>, where everyone converges on the same idea and dissenting opinions are silenced. Groupthink can fool a team into validating only their aligned views without questioning whether those views rest on good data or sound assumptions. To counter this, effective teams encourage questions from all members and even bring in outsiders for fresh eyes. In short, <em>who</em> is in the room (and <em>who</em> isn&#8217;t) can make or break the objectivity of technical decisions.</p><p><strong>Who should we listen to &#8211; human or AI?</strong> In the age of AI assistants, we must also critically assess <em>who</em> an answer is coming from. Is it the output of a large language model or a seasoned colleague? An AI might confidently provide an answer that sounds authoritative, but remember it&#8217;s a statistical engine. <strong>Who &#8220;said&#8221; it matters.</strong> </p><p>A critical thinker treats an AI&#8217;s output as just another input to examine, not an oracle. If an entity (like an AI) hands us a plausible-sounding answer, our human tendency is to accept it and not dig deeper. This cognitive laziness isn&#8217;t new &#8211; it&#8217;s a general human weakness to take the easy, <em>&#8220;sounds good&#8221;</em> solution and run with it. But in engineering, blindly trusting an answer can be dangerous. If an AI code assistant suggests a code snippet or architecture, ask: <em>Who authored this suggestion?
Does the AI actually understand our context?</em> Treat AI outputs as if coming from an inexperienced intern &#8211; verify everything. </p><p>For example, if Cursor provides a code fix for a bug, a critical engineer would review that code thoroughly and test it, just as they would review a junior developer&#8217;s work. The <em>who</em> question reminds us that <strong>accountability and understanding lie with the humans</strong>, regardless of AI involvement.</p><p><strong>Who is responsible and who is affected?</strong> Finally, critical thinking means staying aware of who will be affected by technical decisions. Shipping a quick-and-dirty fix might satisfy a manager in the short term, but who will maintain the code later? If a system fails, who bears the cost &#8211; is it the end-users, the on-call engineers, the company&#8217;s reputation? </p><p>Considering the human impact grounds our problem-solving in reality. It fosters <em>humility</em> &#8211; a recognition that our decisions affect real people and that we ourselves don&#8217;t have all the answers. Great engineers and product people cultivate this humility. They know there&#8217;s always more to learn and that no single person has the complete picture. 
By adopting a posture of learning and asking colleagues questions, they fill in their knowledge gaps and catch mistakes early.</p><p>In practice, this might mean a backend developer double-checks a feature&#8217;s impact with a frontend teammate (&#8220;Could this API change break the mobile app?&#8221;) or a developer seeks a security review from the infosec team rather than assuming &#8220;it&#8217;s probably fine.&#8221; In short, critical thinking in teams is a social endeavor: it thrives when <em>who</em> is involved includes a mix of people willing to question each other and themselves.</p><h2>What: define the real problem and gather evidence</h2><p><strong>What problem are we actually trying to solve?</strong> This is perhaps the most important question. A classic pitfall in engineering is rushing to solve <em>something</em> without confirming it&#8217;s the <em>right</em> thing. </p><p>In fact, Harvard Business Review has <a href="https://hbr.org/2012/09/the-power-of-defining-the-prob">emphasized</a> that rigorously defining the problem upfront ensures we address the right challenges and avoid wasted effort. In practice, this means taking time to clarify requirements and success criteria. Imagine a scenario: a product manager requests <em>&#8220;an AI feature to summarize user data&#8221;</em>. Jumping straight to coding a summarization algorithm would be premature without first asking <em>what</em> the end goal is. Is the goal to help users understand their data trends? If so, maybe the &#8220;right problem&#8221; is actually that users are overwhelmed by raw data, and the solution might involve better visualization rather than just a summary. </p><p>Critical thinking urges us to explicitly articulate the problem and question initial assumptions.
Our natural instinct is to go into <strong>&#8220;<a href="https://www.sngular.com/insights/337/are-we-solving-the-right-problems">problem-solving mode</a>&#8221;</strong> immediately, but this <em>tends to lead us to quick, surface-level fixes rather than more strategic and thoughtful solutions.</em> In other words, if we don&#8217;t slow down to define <em>what</em> needs solving, we risk fixing a symptom or a poorly understood issue. A thoughtful engineer will ask early: <em>&#8220;How do we know we&#8217;re solving the right problem?&#8221;</em> &#8211; a simple question that can save immense time and prevent downstream headaches.</p><p>Concretely, defining <em>what</em> the problem is involves gathering evidence and facts. For example, suppose users are complaining that a system is &#8220;slow.&#8221; Rather than blindly optimizing random code, a critical thinker will ask: <em>What is slow, exactly?</em> Is it page load time, a specific query, or the entire app? <em>What evidence do we have?</em> Maybe logs show one database query is taking 5 seconds. That frames the problem clearly: improving that query&#8217;s performance. Similarly, in debugging scenarios, asking <em>&#8220;What changed?&#8221;</em> when something broke often points to the cause &#8211; a recent deployment or config update. This investigative mindset ensures we tackle the <em>actual</em> cause rather than just the first guess.</p><p><strong>What evidence supports our solution or conclusion?</strong> Critical thinking in engineering is fundamentally about evidence-based decision making. It&#8217;s not enough to have an idea; we need to justify it with data or logical reasoning. Always ask: <em>&#8220;Does the evidence support this conclusion?&#8221;</em> For instance, if an AI model suggests that a bug is due to a null pointer exception, don&#8217;t accept it at face value &#8211; check the logs or write a unit test to confirm.
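</p><p>One way to make that habit concrete is to turn the AI&#8217;s claimed diagnosis into a small regression test and let it pass or fail, rather than trusting the explanation. The function and scenario below are hypothetical &#8211; a sketch of the practice, not anyone&#8217;s real codebase.</p>

```python
# Encode the AI's claim ("the bug is a missing-user check") as a test
# instead of accepting it at face value. All names here are invented.

def summarize_user(user):
    """AI-suggested fix: guard against a missing user record."""
    if user is None:  # the claimed root cause
        return "unknown user"
    return f"{user.get('name', 'anonymous')} ({user.get('plan', 'free')})"

def test_claimed_fix():
    # Reproduce the reported failure first...
    assert summarize_user(None) == "unknown user"
    # ...then confirm the fix didn't break the happy path...
    assert summarize_user({"name": "Ada", "plan": "pro"}) == "Ada (pro)"
    # ...and probe a nearby edge case the explanation never mentioned.
    assert summarize_user({}) == "anonymous (free)"

test_claimed_fix()
```

<p>If the test fails, the plausible-sounding diagnosis was wrong, and that failure is worth more than any amount of confident prose. 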
If a performance test indicates improvement, verify the results on multiple runs or environments. In modern AI-assisted development, this is especially vital. </p><p>Large language models (LLMs) often produce answers that <strong>sound</strong> correct. They&#8217;re excellent at sounding confident, which can trick even experienced engineers.</p><blockquote><p>If some entity gives you a good enough result, probably you aren&#8217;t going to spend much time improving it unless there is a good reason to do so. Likewise you probably aren&#8217;t going to spend a lot of time researching something that AI tells you if it sounds plausible. This is certainly a weakness, but it&#8217;s a general weakness in human cognition, and has little to do with AI in and of itself. - <a href="https://news.ycombinator.com/item?id=43057907#:~:text=,sounds%20plausible">Hacker News</a></p></blockquote><p>However, a plausible answer isn&#8217;t necessarily a true one. LLM answers are &#8220;almost always&#8221; <em>plausible-sounding but with no guarantee of being correct &#8211; a tremendous flaw with real consequences</em>. A critical thinker treats any proposed solution (whether from AI or a teammate) as a hypothesis to be tested, not a fact. They gather evidence to confirm or refute it. This might involve running an experiment, collecting metrics, or searching for analogous past incidents.</p><p>Consider the example of evaluating an AI-generated code snippet. Suppose Cursor provides a solution for timezone conversions. Instead of simply copy-pasting and assuming validity, a critical developer tests it against various formats and edge cases. If they discover the code fails on complex offsets, this evidence dictates the next step &#8211; perhaps switching to a dedicated library. By asking &#8220;What data supports this?&#8221;, engineers avoid the trap of confirmation bias.</p><p>Instead, they actively look for <em>falsifying</em> evidence.
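</p><p>The timezone example shows what looking for falsifying evidence means in practice: probe the suggestion with awkward offsets, not just the cases it was written for. Here <em>convert_to_utc</em> stands in for the AI-suggested snippet; the name and scenario are invented for illustration.</p>

```python
# Try to falsify an AI-suggested timezone conversion with awkward,
# non-whole-hour offsets rather than only the obvious happy path.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def convert_to_utc(naive, tz_name):
    """Stand-in for the suggested snippet: localize, then convert to UTC."""
    return naive.replace(tzinfo=ZoneInfo(tz_name)).astimezone(timezone.utc)

# UTC+05:45 -- a 45-minute offset many hand-rolled converters get wrong.
kathmandu = convert_to_utc(datetime(2024, 6, 1, 12, 0), "Asia/Kathmandu")
assert (kathmandu.hour, kathmandu.minute) == (6, 15)

# UTC+05:30 -- a half-hour offset.
kolkata = convert_to_utc(datetime(2024, 6, 1, 12, 0), "Asia/Kolkata")
assert (kolkata.hour, kolkata.minute) == (6, 30)
```

<p>A single failing probe like this outweighs a dozen passing happy-path runs. 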
In technical debates, confirmation bias might lead someone to defend their initial design choice and ignore alternative approaches. The antidote is to seek out data or feedback that challenges your idea: if you believe a new feature improved load time, also look at any cases where it regressed performance. <strong>What</strong> we know (and don&#8217;t know) should drive decisions, not just what we <em>feel</em>. Good critical thinkers are almost like scientists &#8211; they gather facts, run tests, and let evidence rather than ego determine the path forward.</p><h2>Where: consider context and scope</h2><p><strong>Where does this problem occur, and where will a solution apply?</strong> Context is everything in engineering. A fix that works perfectly in one environment might fail in another. Critical thinking means being mindful of <em>where</em> our assumptions hold true. Engineers should ask: <em>Where is the boundary of this issue? Where in the system or workflow are we seeing the effects?</em> </p><p>For example, if an AI ops tool flags an anomaly in system metrics, we should pinpoint where &#8211; which server, which module &#8211; before reacting. A spike in CPU on one microservice doesn&#8217;t mean the whole system is failing. By localizing <em>where</em> the problem lives, we avoid over-generalizing or deploying unnecessary global &#8220;fixes.&#8221; Similarly, consider <em>where</em> a solution will be used. Is the code running on a user&#8217;s low-powered smartphone or on a beefy cloud server? The context might dictate very different approaches. A critical thinker is always aware of the environment: <em>&#8220;Where will this code run? Where are the users encountering difficulties?&#8221;</em></p><p><strong>Where are the gaps in our knowledge?</strong> Asking &#8220;where&#8221; also means identifying where we need more information. 
If we&#8217;re debugging a distributed system, we might realize we don&#8217;t know where a specific request fails &#8211; is it at the client, the API gateway, or the database? That&#8217;s a cue to gather more data (e.g. add logging at various points) to determine the location of failure. Similarly, if a product idea is being discussed, critical thinking prompts us to ask <em>where in the user journey this idea fits</em>. This prevents solving a non-issue; perhaps the &#8220;cool feature&#8221; is addressing a part of the app that users rarely visit. Knowing <em>where</em> helps allocate effort to where it matters most.</p><p>To illustrate, imagine planning an experiment for a new feature rollout. A critical question is: <em>Where will we test it &#8211; in a staging environment, with internal users, or as a small percentage A/B test in production?</em> Each context has pros and cons. Testing in a realistic environment (like a small percentage of live users) may reveal issues that an isolated lab test won&#8217;t. On the other hand, some experiments should stay in a sandbox to avoid impacting real users. By explicitly considering <em>where</em> an experiment runs, engineers ensure they approach testing with appropriate rigor given the constraints. It&#8217;s easy to get false confidence from a perfect lab result that doesn&#8217;t hold in the messy real world.</p><p>Finally, &#8220;where&#8221; can be metaphorical: <em>Where could this solution cause side effects? Where might this decision have downstream impact?</em> Thinking a few steps ahead is a hallmark of seasoned engineers. For example, when modifying a shared library, ask where else that library is used. This way, you anticipate ripple effects and can check those places or alert those teams before problems occur. In sum, <strong>contextual awareness</strong> &#8211; spatial, environmental, and systemic &#8211; is a key part of critical thinking. It prevents tunnel vision. 
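</p><p>The &#8220;add logging at various points&#8221; tactic mentioned above can be sketched concretely. The stage names and the failing hop below are invented for illustration; the point is that each hop reports itself, so the logs answer <em>where</em> a request died instead of leaving you to guess.</p>

```python
# Localize *where* a request fails by logging each hop it passes through.
import logging

logging.basicConfig(format="%(levelname)s %(message)s")
log = logging.getLogger("request-trace")
log.setLevel(logging.INFO)

failed_stages = []  # recorded so the failure site is queryable, not just logged

def trace(stage):
    """Wrap one hop of the request path with entry/success/failure logging."""
    def wrap(fn):
        def inner(payload):
            log.info("enter %s", stage)
            try:
                result = fn(payload)
                log.info("ok %s", stage)
                return result
            except Exception:
                log.error("FAIL %s", stage)  # this line answers "where?"
                failed_stages.append(stage)
                raise
        return inner
    return wrap

@trace("client")
def client(payload):
    return payload

@trace("api-gateway")
def gateway(payload):
    return payload

@trace("database")
def database(payload):
    raise TimeoutError("query timed out")  # the hidden failure site

try:
    database(gateway(client({"id": 1})))
except TimeoutError:
    pass

# The trace pinpoints the failing hop without guessing.
assert failed_stages == ["database"]
```

<p>Once the logs name the failing hop, effort goes to the right place instead of a global &#8220;fix.&#8221; 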
Great engineers don&#8217;t just solve <em>a</em> problem; they solve the problem <em>in the right place</em> and with full awareness of the setting.</p><h2>When: timing, timelines, and when to dive deep</h2><p><strong>When did or will something happen?</strong> The dimension of time is crucial in technical work. Critical thinking involves asking <em>when</em> both in diagnosing issues and in planning work. In troubleshooting, understanding <em>when</em> a bug first appeared or <em>when</em> a system behaves differently often reveals the cause. (&#8220;The system crashed at 3 AM last night &#8211; what happened around that time?&#8221; Perhaps a nightly job or a deployment coincided with the crash.) Experienced engineers habitually ask: <em>&#8220;When did it last work? What&#8217;s changed since then?&#8221;</em> This line of questioning is often more effective at finding the root cause than blindly guessing. It ties into evidence gathering &#8211; a deploy timeline or version history might show exactly when a faulty piece of code went live.</p><p><strong>When should we apply more rigor, and when is a quick heuristic enough?</strong> Not every decision warrants days of analysis; part of critical thinking is knowing <em>when</em> to go deep. In engineering, we constantly balance thoroughness against time constraints. Project deadlines and on-call incidents can create immense pressure to act quickly. Under stress or tight timelines, humans tend to rely on intuition and mental shortcuts &#8211; what cognitive scientists call <em>heuristics</em>. These are useful, but they also open the door to biases and mistakes. 
</p><p>Research at NASA has <a href="https://appel.nasa.gov/2018/04/11/mitigating-cognitive-bias-in-engineering-decision-making/#:~:text=Although%20cognitive%20bias%20is%20a,likelihood%20that%20bias%20will%20occur">noted</a> that when engineers are under stress or have limited time, they make faster decisions that are <strong>more prone to error</strong> than those made with time to reflect. This doesn&#8217;t mean we can always avoid urgency, but it means we should <strong>acknowledge the risk</strong>. A critical thinker under time pressure will consciously slow down on the most crucial aspects of the decision. For instance, if you&#8217;re debugging a production outage at 2 AM, you might use a quick fix to get the system running (that&#8217;s a heuristic &#8211; e.g. restart a service). But a critical mindset means you&#8217;ll also note, <em>&#8220;This is a band-aid. I need to investigate the root cause in the morning.&#8221;</em> In other words, know <em>when</em> to apply a temporary fix and <em>when</em> to invest in a permanent solution.</p><p>Approaching rigor with limited time often involves triage: prioritizing which questions need deep answers now and which can be answered later. A useful prompt is, <em>&#8220;How do we approach this with rigor given time constraints?&#8221;</em> For example, in planning a new feature under a tight deadline, critical thinking might lead a team to identify the riskiest assumption and test it early (even in a quick-and-dirty way), rather than trying to perfect every detail. They focus on <em>when</em> each piece of information is needed. Is it okay to decide the UI later, but crucial to validate the algorithm now? If so, time is allocated accordingly.</p><p>Good critical thinkers also develop a sense of timing for interventions. 
<em>When should we ask for help?</em> If a problem remains unsolved after a certain amount of time, a critical engineer knows it might be time to get a second pair of eyes or escalate to a wider team discussion. <em>When should we pause and reconsider?</em> On teams practicing Agile, this might be at sprint boundaries or before major releases &#8211; essentially built-in &#8220;when&#8221; checkpoints to ask if they&#8217;re on the right track. And <em>when have we done enough analysis?</em> There is a point of diminishing returns. </p><p>Being rigorous doesn&#8217;t mean being paralyzed by analysis. It means doing the <em>right amount</em> of thinking for the decision at hand. As an example, if you have a day to debug an issue, spending the first 4 hours to methodically gather data is wise; spending 23 hours to get a perfect answer might mean missing the deadline. Critical thinking helps balance these through self-awareness: knowing when you&#8217;re falling into analysis paralysis versus when you&#8217;re leaping to conclusions too soon.</p><h2>Why: questioning motives, causes, and rationale</h2><p><strong>Why are we doing this?</strong> The &#8220;why&#8221; questions get to the heart of motivation and causality. In an engineering context, constantly asking <em>why</em> serves two big purposes: (1) ensuring there&#8217;s a sound rationale for actions (so we&#8217;re not just doing things because &#8220;someone said so&#8221;), and (2) drilling down to find root causes of problems rather than treating symptoms. A critical thinker faced with a task &#8211; say, implementing a new AI tool &#8211; will ask: <em>&#8220;Why do we need this tool? What problem will it solve and why is that important?&#8221;</em></p><p>If the best answer the team has is &#8220;because it&#8217;s trendy&#8221; or &#8220;our competitor has it,&#8221; that should spark concern. 
Chasing buzzwords without a clear <em>why</em> can lead teams to invest in solutions that don&#8217;t actually address their users&#8217; needs. On the other hand, articulating a strong why (e.g. &#8220;to reduce the time users spend analyzing their data by automating summaries&#8221;) aligns the team on the real goal. It fosters independent thinking &#8211; an engineer confident in the <em>why</em> can independently make better decisions during implementation, because they understand the end goal deeply rather than just following orders.</p><p><strong>Why did this happen?</strong> When something goes wrong (or right), asking &#8220;why&#8221; repeatedly is a proven technique to get beyond superficial answers. In fact, the <strong><a href="https://reliability.com/resources/articles/5-whys-technique-root-cause-analysis-example-and-template/#:~:text=The%205%20Whys%20Technique%20is,level%20symptoms">Five Whys</a></strong> technique in root cause analysis is essentially institutionalized critical thinking &#8211; it forces you to peel back causes layer by layer. The idea is to <a href="https://www.qualitygurus.com/five-whys-analysis-how-to-use-it-to-improve-business-performance/">avoid jumping</a> on the first explanation and instead uncover the chain of causality. 
</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!-ez0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18628494-1acb-4eda-bcf0-e4500d5ff54d_1920x1080.webp" width="1456" height="819" alt="" loading="lazy"></figure></div><p>For example, imagine a machine learning model&#8217;s accuracy suddenly drops. A naive response might be, <em>&#8220;The model is bad, retrain it.&#8221;</em> A critical approach would ask: <em>&#8220;Why did the accuracy drop? Because the input data distribution changed.&#8221;</em> Why did that happen? Perhaps a new data source was added. Why was that not accounted for? Maybe the data pipeline lacked validation for distribution shifts. By the time you&#8217;ve asked &#8220;why&#8221; five times (or as many times as needed), you likely have a much clearer picture &#8211; maybe the real root cause was a flawed monitoring process that failed to catch the data drift early. 
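</p><p>The fix that this Five Whys chain points at - validating for distribution shifts in the pipeline - can start small. Below is a minimal sketch in plain Python (the function name and the z-score threshold are illustrative, not from any particular pipeline): it flags an incoming batch whose mean sits too many baseline standard deviations from the baseline mean.</p>

```python
import statistics

def check_distribution_shift(baseline, current, max_z=3.0):
    """Flag a batch whose mean drifts too far from the baseline.

    Deliberately simple: measure the new batch's mean in units of the
    baseline's standard deviation. A real pipeline would use a proper
    statistical test, but even this catches gross drift.
    """
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline) or 1e-9  # guard a zero-variance baseline
    z = abs(statistics.mean(current) - mu) / sigma
    return z > max_z

baseline = [0.48, 0.51, 0.50, 0.49, 0.52, 0.50]
print(check_distribution_shift(baseline, [0.50, 0.49, 0.51]))  # stable batch: False
print(check_distribution_shift(baseline, [0.90, 0.95, 0.92]))  # shifted batch: True
```

<p>A guard like this, wired into the ingestion step, fails loudly before a degraded model ships - exactly the upstream check the monitoring fix implies. 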
</p><p>The difference is huge: a quick fix might just retrain the model (addressing the symptom of low accuracy temporarily), but the Five Whys approach might lead you to improve the monitoring system, preventing the issue from recurring. As one guide on the 5 Whys explains, this method <em>&#8220;aims to get to the heart of the matter rather than just addressing surface-level symptoms,&#8221;</em> encouraging teams to move beyond quick fixes to sustainable solutions.</p><p>However, <em>why</em> can be a double-edged sword if we&#8217;re not careful. Humans are prone to biases when answering why. One common pitfall is <strong>confirmation bias</strong> &#8211; we might latch onto a convenient explanation that fits our preconceptions and <a href="https://www.sngular.com/insights/337/are-we-solving-the-right-problems">stop</a> investigating further. For instance, an engineer might assume <em>&#8220;The server crashed because of a memory leak, which has happened before,&#8221;</em> and not consider other causes like a new configuration change, simply because the memory leak fits their mental model. If they don&#8217;t seek evidence to <em>disconfirm</em> their memory leak hypothesis, they might miss the real cause. </p><p>The earlier-mentioned <em>plunging-in bias</em> is another &#8220;why&#8221; trap: it&#8217;s the tendency to rush into solving a perceived problem without fully understanding it. Studies have found that this bias &#8211; jumping to conclusions and imposing pre-determined solutions &#8211; leads to addressing symptoms rather than root causes in about half of failed decisions studied. In other words, not asking &#8220;why&#8221; enough (or not the right why) can sink projects. The Harley-Davidson company in the 1980s famously misdiagnosed why they were losing market share, blaming external factors and thus implementing wrong solutions, when the real issues were internal practices. 
It took years for them to correct course, exemplifying how failing to pin down the true &#8220;why&#8221; can prolong pain.</p><p>Good critical thinkers are almost relentlessly curious about <em>why</em>. They maintain a <strong>humble curiosity</strong> &#8211; an openness to finding out that their initial assumption was wrong. They ask questions like: <em>&#8220;Why do we believe this approach will work? Is it because of actual data or just our gut? Why are users asking for this feature &#8211; what&#8217;s the underlying need?&#8221;</em> By drilling into reasons, they often catch logical gaps or uncover hidden requirements. Importantly, they also communicate the <em>why</em> behind decisions to others, which helps teams stay aligned and spot flaws. If you can&#8217;t clearly explain <em>why</em> a particular technical decision was made, that&#8217;s a red flag &#8211; either the decision lacks solid reasoning or that reasoning isn&#8217;t shared (both are dangerous). On the flip side, when everyone understands the rationale (the why), they can independently verify if new developments still support that rationale or if it needs revisiting.</p><h2>How: apply rigor and communicate clearly</h2><p>After exploring &#8220;who, what, where, when, why,&#8221; the final question is <strong>&#8220;How do we actually practice critical thinking day to day?&#8221;</strong> This is about the methods and mindset &#8211; how to approach problems rigorously yet efficiently, and how to carry solutions through with clear communication. Good critical thinkers tend to have a systematic approach. They <strong>formulate questions clearly</strong>, <strong>validate evidence</strong>, and <strong>communicate solutions logically</strong>. Let&#8217;s break this down:</p><ul><li><p><strong>How to approach problems methodically:</strong> Often it starts with asking better questions. Instead of vague queries, they ask specific, open-ended questions that lead to insight. 
For example, rather than &#8220;Is this design good?&#8221;, a critical thinker might ask &#8220;How does this design address the user&#8217;s primary need and how could it fail?&#8221; It&#8217;s important to avoid loaded or leading questions that just confirm what we already think. Maintaining an open mind and <a href="https://daily.dev/blog/critical-thinking-key-skill-for-software-developers">probing</a> for details yields much more useful information. A practical habit is to enumerate what you know and what you <strong>don&#8217;t</strong> know, then plan how to test or learn the latter. Think like a scientist: if you have a hypothesis (e.g. &#8220;the database is the bottleneck&#8221;), figure out how to prove or disprove it (perhaps by profiling or looking at query times). This structured interrogation of problems is at the core of critical thinking.</p></li><li><p><strong>How to validate evidence and avoid bias:</strong> Once you have data or answers, a critical thinker validates them. Does the data actually support the conclusion, or are there alternative interpretations? This might mean cross-checking metrics from two sources, reproducing a bug in a test environment to ensure it&#8217;s not a fluke, or getting a code review for an assumption you&#8217;ve made. It also means being aware of your own biases. As discussed, if you find yourself gravitating to an explanation too quickly, pause and ask, <em>&#8220;Am I considering all the evidence, or just the bits that confirm my theory?&#8221;</em> One strategy is actively seeking contradictory evidence. If you think a new feature improved retention, look at any cohort where retention didn&#8217;t improve &#8211; what&#8217;s different there? By <strong>welcoming negative data</strong>, you ensure you&#8217;re not kidding yourself. This is essentially a quality assurance mindset but applied to thinking: test the robustness of your ideas like you test your code. 
Additionally, frameworks and checklists can help maintain rigor. Some teams, for example, use a <strong>premortem</strong> exercise (imagining a future where the project failed and writing down reasons why) to surface potential issues and assumptions that weren&#8217;t initially considered. Such techniques enforce a more critical evaluation of a plan <em>before</em> it&#8217;s executed.</p></li><li><p><strong>How to communicate solutions and reasoning:</strong> A brilliant solution isn&#8217;t worth much if it can&#8217;t be communicated and implemented by the team. Critical thinking shines in how solutions are explained. Good engineers organize their explanation logically: start with the problem definition (the <em>what</em> and <em>why</em>), state the proposed solution (the <em>how</em>), and provide the evidence or reasoning backing it. They make their assumptions explicit and describe the trade-offs considered. This kind of communication not only helps others understand the proposal, it also serves as a final self-check for the thinker: if you can&#8217;t articulate it clearly, maybe your thinking isn&#8217;t clear yet. Importantly, critical thinkers use <strong>facts and data</strong> to bolster their communication, rather than hyperbole or opinion. As one engineering leadership article notes, <em>humble engineers prefer facts instead of opinions</em> &#8211; they will <a href="https://dangoslen.me/blog/a-case-for-being-a-humble-engineer/#:~:text=Lastly%2C%20humble%20engineers%20use%20facts,constant%2C%20and%20lead%20to%20solutions">cite the data</a> (&#8220;this change improved load time by 25% as measured on the dashboard&#8221;) rather than make boastful claims. This approach builds credibility. It shows you&#8217;re guided by evidence, which makes it easier for colleagues and stakeholders to trust the solution. Furthermore, clear communication involves listening and inviting feedback. 
A critical thinker doesn&#8217;t deliver a monologue; they encourage others to poke holes and ask questions, because that scrutiny will either validate the idea or help improve it. In meetings, this might look like: <em>&#8220;Here&#8217;s what I propose and why. Does anyone see any gap in this reasoning or have concerns?&#8221;</em> By fostering an open dialogue, they ensure the solution is robust and agreed upon, not just the loudest voice winning.</p></li></ul><p>Finally, <strong>&#8220;How do we ensure we&#8217;re continuously improving our critical thinking?&#8221;</strong> This meta-question is worth asking. The answer is practice and reflection. Just as we do retrospectives for projects, doing mini-retrospectives on decisions can sharpen thinking skills. For instance, if a rushed decision led to a bug, a team can analyze: how did we miss it, and how can we catch such things next time? </p><p>Over time, engineers build a mental library of lessons learned (e.g. &#8220;Remember to check X, because last time we assumed and it burned us&#8221;). Many top engineers also cultivate habits like reading post-mortems from other companies&#8217; failures or studying cognitive biases to become familiar with traps they might fall into. Critical thinking isn&#8217;t a one-and-done checkbox; it&#8217;s a continuous, career-long discipline of staying curious, humble, and evidence-driven.</p><h2>Conclusion</h2><p>As AI becomes ever more widely used, critical thinking is <strong>not optional, but essential</strong>. </p><p>We should ask <em>Who</em> should be involved, <em>What</em> is the real problem, <em>Where</em> is the context, <em>When</em> to dig deeper, <em>Why</em> something is done, and <em>How</em> to do it properly. By using this classic framework pragmatically, technical teams can navigate complexity with clarity. 
</p><p>It means a culture where independent thinking is valued: team members feel safe to question a proposed solution (<em>&#8220;How do we know this is truly the fix and not a band-aid?&#8221;</em>), to challenge assumptions (<em>&#8220;Why are we sure the users want this feature?&#8221;</em>), and to demand evidence (<em>&#8220;Does the data actually show an improvement, or are we seeing what we want to see?&#8221;</em>). Embracing humble curiosity &#8211; the idea that no matter how experienced we are, we could be missing something &#8211; keeps engineers from falling prey to confirmation bias or overconfidence.</p><p>Critical thinking also protects against the allure of quick fixes. It&#8217;s understandably tempting to patch a problem and move on, especially under pressure. But as we&#8217;ve seen, failing to think critically about a quick fix can mean the same problem resurfaces later or, worse, that we fix the wrong thing entirely. By asking the tough questions upfront and validating before acting, we actually <strong>save time and trouble in the long run</strong>. We avoid downstream issues by catching them upstream &#8211; whether it&#8217;s discovering a design flaw before code is written or realizing an AI&#8217;s output is flawed before it reaches customers.</p><p>In conclusion, while AI and automation will continue to evolve and handle more routine work, <strong>critical thinking remains a uniquely human advantage</strong>. 
It&#8217;s how we ensure that we&#8217;re solving the right problems, in the right way, for the right reasons.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!54oK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e90581d-19f6-4a27-bec6-26e9ce06b3c5_5246x3496.jpeg" width="1456" height="970" alt="" loading="lazy"></figure></div>]]></content:encoded></item><item><title><![CDATA[Conductors to Orchestrators: The Future of Agentic Coding]]></title><description><![CDATA[From micro-manager to macro-manager: coding's asynchronous future]]></description><link>https://addyo.substack.com/p/conductors-to-orchestrators-the-future</link><guid isPermaLink="false">https://addyo.substack.com/p/conductors-to-orchestrators-the-future</guid><dc:creator><![CDATA[Addy Osmani]]></dc:creator><pubDate>Sat, 01 Nov 2025 14:30:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!knBl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdac571f-5afb-495e-ab15-794c18d7702c_5246x3496.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>AI coding assistants</strong> have quickly moved from 
novelty to necessity, with up to 90% of software engineers now using some form of AI for coding. But a new paradigm is emerging in software development - one where engineers leverage <strong>fleets of autonomous coding agents</strong>. In this agentic future, the role of the software engineer is evolving from <strong>implementer</strong> to <strong>manager</strong>, or in other words, from <em>coder</em> to <strong>conductor</strong> and ultimately <strong><a href="https://www.youtube.com/watch?v=sQFIiB6xtIs">orchestrator</a></strong>.</p><p>Over time, developers will increasingly <strong>guide AI agents to build the right code</strong> and coordinate multiple agents working in concert. This write-up explores the distinction between <strong>Conductors</strong> and <strong>Orchestrators</strong> in AI-assisted coding, defines these roles, and examines how today&#8217;s cutting-edge tools embody each approach. Senior engineers may start to see the writing on the wall: our jobs are shifting from <em>&#8220;How do I code this?&#8221;</em> to <em>&#8220;How do I get the right code built?&#8221;</em> - a subtle but profound change.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!xumY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faeb45e5d-d4fd-4f87-b6cc-8e49d07b7830_1678x936.png" width="1456" height="812" alt="" fetchpriority="high"></figure></div><p>What&#8217;s the tl;dr of an orchestrator tool? It supports multi-agent workflows where you can run many agents in parallel without them interfering with one another. 
But let&#8217;s talk terminology more first.</p><h2><strong>The Conductor: Guiding a single AI agent</strong></h2><p>In the context of AI coding, acting as a <strong>Conductor</strong> means working closely with a single AI agent on a specific task, much like a conductor guiding a soloist through a performance.</p><p>The engineer remains in the loop at each step, dynamically steering the agent&#8217;s behavior, tweaking prompts, intervening when needed, and iterating in real-time. This is the logical extension of the &#8220;AI pair programmer&#8221; model many developers are already familiar with. With conductor-style workflows, <strong>coding happens in a synchronous, interactive session between human and AI</strong>, typically in your IDE or CLI.</p><p><strong>Key characteristics:</strong> A conductor keeps a tight feedback loop with one agent, verifying or modifying each suggestion, much as a driver navigates with a GPS. The AI helps write code, but the developer still performs many manual steps - creating branches, running tests, writing commit messages, etc., and ultimately decides which suggestions to accept.</p><p>Crucially, <strong>most of this interaction is ephemeral</strong>: once code is written and the session ends, the AI&#8217;s role is done and any context or decisions not captured in code may be lost. This mode is powerful for focused tasks and allows fine-grained control, but it doesn&#8217;t fully exploit what multiple AIs could do in parallel.</p><p><strong>Modern tools as Conductors:</strong> Several current AI coding tools exemplify the conductor pattern:</p><ul><li><p><strong>Claude Code (Anthropic):</strong> Anthropic&#8217;s Claude model offers a coding assistant mode (accessible via a CLI tool or editor integration) where the developer converses with Claude to generate or modify code. 
For example, with the <strong>Claude Code CLI</strong>, you navigate your project in a shell, ask Claude to implement a function or refactor code, and it prints diffs or file updates for you to approve. You remain the conductor: you trigger each action and review the output immediately. While Claude Code has features to handle long-running tasks and tools, in the basic usage it&#8217;s essentially a smart co-developer working step-by-step under human direction.</p></li><li><p><strong>Gemini CLI (Google):</strong> A command-line assistant powered by Google&#8217;s Gemini model, used for planning and coding with a very large context window. An engineer can prompt Gemini CLI to analyze a codebase or draft a solution plan, then iterate on results interactively. The human directs each step and Gemini responds within the CLI session. It&#8217;s a one-at-a-time collaborator, not running off to make code changes on its own (at least in this conductor mode).</p></li><li><p><strong>Cursor (Editor AI Assistant):</strong> The Cursor editor (a specialized AI-augmented IDE) can operate in an inline or chat mode where you ask it questions or to write a snippet, and it immediately performs those edits or gives answers within your coding session. Again, you guide it one request at a time. Cursor&#8217;s strength as a conductor is its deep context integration - it indexes your whole codebase so the AI can answer questions about any part of it. But the hallmark is that <strong>you, the developer, initiate and oversee each change</strong> in real time.</p></li><li><p><strong>VSCode, Cline, Roo Code (in-IDE chat):</strong> Similar to above, other coding agents also fall into this category. They suggest code or even multi-step fixes, but always under continuous human guidance.</p></li></ul><p>This conductor-style AI assistance has already boosted productivity significantly. It feels like having a junior engineer or pair programmer always by your side. 
However, it&#8217;s inherently <strong>one-agent-at-a-time and synchronous</strong>. To truly leverage AI at scale, we need to go beyond being a single-agent conductor. This is where the <strong>Orchestrator</strong> role comes in.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!51RX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdb352c4-7273-45d7-810e-737b4133508d_1670x934.png" width="1456" height="814" alt="" loading="lazy"></figure></div><h2><strong>The Orchestrator: Managing a fleet of agents</strong></h2><p>If a conductor works with one AI &#8220;musician,&#8221; an <strong>Orchestrator</strong> oversees the entire symphony of multiple AI agents working in parallel on different parts of a project. The orchestrator sets high-level goals, defines tasks, and lets a team of autonomous coding agents independently carry out the implementation details. </p><p>Instead of micromanaging every function or bug fix, the human focuses on <strong>coordination, quality control, and integration</strong> of the agents&#8217; outputs. In practical terms, this often means an engineer can <strong>assign tasks to AI agents (e.g. via issues or prompts) and have those agents asynchronously produce code changes - often as ready-to-review pull requests</strong>. 
The engineer&#8217;s job becomes reviewing, giving feedback, and merging the results, rather than writing all the code personally.</p><p>This asynchronous, parallel workflow is a fundamental shift. It moves AI assistance from the foreground to the background. <strong>While you attend to higher-level design or other work, your &#8220;AI team&#8221; is coding in the background.</strong> When they&#8217;re done, they hand you completed work (with tests, docs, etc.) for review. It&#8217;s akin to being a project tech lead delegating tasks to multiple devs and later reviewing their pull requests, except the &#8220;devs&#8221; are AI agents.</p><p><strong>Key characteristics:</strong> An orchestrator deals with <strong>autonomous agents</strong> that can plan and execute multi-step coding tasks with minimal intervention. These agents have more agency: they can clone your repo, create new git branches, edit multiple files, compile/run tests, and iteratively refine their solution before presenting it.</p><p>The orchestrator doesn&#8217;t see every intermediate step (unless they choose to peek in); they mainly ensure the final outcome aligns with requirements. Importantly, all this happens in a <strong>tracked, persistent workflow</strong> (often leveraging version control and CI pipelines) rather than ephemeral suggestions. For example, GitHub&#8217;s coding agent operates entirely via pull requests on GitHub, so every change is logged and reviewable. 
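</p><p>In pseudocode terms, the shape of this workflow is roughly the following. This is an illustrative Python sketch, not any vendor's actual API - every name in it (run_agent, review, orchestrate) is hypothetical:</p>

```python
# Illustrative sketch of the orchestrator workflow (hypothetical names,
# not a real vendor API): the human defines tasks, agents work
# autonomously, and every result comes back as a reviewable artifact.
from dataclasses import dataclass


@dataclass
class PullRequest:
    task: str
    branch: str
    approved: bool = False


def run_agent(task: str) -> PullRequest:
    # Stand-in for an autonomous agent: it would plan, edit files,
    # run tests, and finally open a PR on a fresh branch.
    branch = "agent/" + task.lower().replace(" ", "-")
    return PullRequest(task=task, branch=branch)


def review(pr: PullRequest) -> bool:
    # Placeholder for the human quality gate: read the diff, request
    # changes, or approve. Here we simply approve everything.
    return True


def orchestrate(tasks: list[str]) -> list[PullRequest]:
    # The orchestrator never sees intermediate steps, only finished PRs.
    prs = [run_agent(t) for t in tasks]
    for pr in prs:
        pr.approved = review(pr)
    return [pr for pr in prs if pr.approved]


merged = orchestrate(["Fix login bug", "Add dark mode"])
print([pr.branch for pr in merged])
# ['agent/fix-login-bug', 'agent/add-dark-mode']
```

<p>The point of the sketch is the shape of the loop: the agent's intermediate steps stay out of view, and the human only ever touches finished, reviewable artifacts.</p><p>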
Another hallmark is concurrency: an orchestrator can spin up multiple agents to tackle different tasks simultaneously, dramatically parallelizing development.</p><p><strong>Modern tools as Orchestrators:</strong> Over just the past year, several tools have emerged that embody this orchestrator paradigm:</p><ul><li><p><strong>GitHub Copilot </strong><em><strong>Coding Agent</strong></em> (Microsoft): This upgrade to Copilot transforms it from an in-editor assistant into an <strong>autonomous background developer</strong> (I cover it in <a href="https://www.youtube.com/watch?v=sQFIiB6xtIs">this video</a>). You can assign a GitHub issue to Copilot&#8217;s agent or invoke it via the VS Code agents panel, telling it (for example) &#8220;Implement feature X&#8221; or &#8220;Fix bug Y&#8221;. Copilot then <strong>spins up an ephemeral dev environment via GitHub Actions, checks out your repo, creates a new branch, and begins coding</strong>. It can run tests, linters, even spin up the app if needed, all without human babysitting. When finished, it opens a pull request with the changes, complete with a description and meaningful commit messages. It then asks for your review. You, the human orchestrator, review the PR (perhaps using Copilot&#8217;s AI-assisted <strong>code review</strong> to get an initial analysis). If changes are needed, you can leave comments like @copilot please update the unit tests for edge case Z, and the agent will iterate on the PR. <strong>This is asynchronous, autonomous code generation in action.</strong> Notably, Copilot automates the tedious bookkeeping: branch creation, committing, opening PRs, etc., which used to cost developers time. All the grunt work around writing code (aside from the design itself) is handled, allowing developers to focus on reviewing and guiding at a high level. 
GitHub&#8217;s agent effectively lets one engineer supervise many &#8220;AI juniors&#8221; working in parallel across different issues (and you can even create multiple specialized agents for different task types).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rmBg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ced1b20-bbb4-4450-aa2d-6b108c0b0f2a_3010x1564.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rmBg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ced1b20-bbb4-4450-aa2d-6b108c0b0f2a_3010x1564.png 424w, https://substackcdn.com/image/fetch/$s_!rmBg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ced1b20-bbb4-4450-aa2d-6b108c0b0f2a_3010x1564.png 848w, https://substackcdn.com/image/fetch/$s_!rmBg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ced1b20-bbb4-4450-aa2d-6b108c0b0f2a_3010x1564.png 1272w, https://substackcdn.com/image/fetch/$s_!rmBg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ced1b20-bbb4-4450-aa2d-6b108c0b0f2a_3010x1564.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rmBg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ced1b20-bbb4-4450-aa2d-6b108c0b0f2a_3010x1564.png" width="1456" height="757" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8ced1b20-bbb4-4450-aa2d-6b108c0b0f2a_3010x1564.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:757,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:540032,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/177541153?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ced1b20-bbb4-4450-aa2d-6b108c0b0f2a_3010x1564.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rmBg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ced1b20-bbb4-4450-aa2d-6b108c0b0f2a_3010x1564.png 424w, https://substackcdn.com/image/fetch/$s_!rmBg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ced1b20-bbb4-4450-aa2d-6b108c0b0f2a_3010x1564.png 848w, https://substackcdn.com/image/fetch/$s_!rmBg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ced1b20-bbb4-4450-aa2d-6b108c0b0f2a_3010x1564.png 1272w, https://substackcdn.com/image/fetch/$s_!rmBg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ced1b20-bbb4-4450-aa2d-6b108c0b0f2a_3010x1564.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p></li><li><p><strong>Jules, Google&#8217;s Coding Agent:</strong> <strong>Jules</strong> is an autonomous coding agent. Jules is <strong>&#8220;not a co-pilot, not a code-completion sidekick, but an autonomous agent that reads your code, understands your intent, and gets to work.&#8221;</strong> Integrated with Google Cloud and GitHub, Jules lets you connect a repository and then ask it to perform tasks much as you would a developer on your team. Under the hood, Jules <strong>clones your entire codebase into a secure cloud VM</strong> and analyzes it with a powerful model. 
You might tell Jules: &#8220;Add user authentication to our app&#8221; or &#8220;Upgrade this project to the latest Node.js and fix any compatibility issues.&#8221; It will formulate a plan, present it to you for approval, and once you approve, execute the changes asynchronously. It makes commits on a new branch and can even open a pull request for you to merge. Jules handles writing new code, updating tests, bumping dependencies, etc., all while you could be doing something else. Crucially, Jules provides <strong>transparency and control</strong>: it shows you its proposed plan and reasoning before making changes, and allows you to intervene or modify instructions at any point (a feature Google calls &#8220;user steerability&#8221;). This is akin to giving an AI intern the spec and watching over their shoulder less frequently - you trust them to get it mostly right, but you still verify the final diff. Jules also boasts unique touches like <strong>audio changelogs</strong> (it generates spoken summaries of code changes) and the ability to run multiple tasks concurrently in the cloud. 
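</p><p>To make the "plan first, then execute" loop concrete, here is a minimal Python sketch of an approval gate. The function names are hypothetical, not the Jules API:</p>

```python
# Minimal sketch of a plan-then-approve gate (hypothetical names, not
# the Jules API): nothing executes until the human approves the plan.
def propose_plan(task: str) -> list[str]:
    # A real agent derives this from analyzing the codebase.
    return [
        f"Analyze code relevant to: {task}",
        "Draft changes on a new branch",
        "Run tests and fix failures",
        "Open a pull request",
    ]


def execute(plan: list[str]) -> list[str]:
    # Each step would drive real edits and tool calls; we just log it.
    return [f"done: {step}" for step in plan]


def run_with_approval(task: str, approve) -> list[str]:
    plan = propose_plan(task)
    if not approve(plan):  # user steerability: reject or edit the plan
        return []
    return execute(plan)


log = run_with_approval("Add user authentication", approve=lambda plan: True)
print(len(log))  # 4
```

<p>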
In short, Google&#8217;s Jules demonstrates the orchestrator model: you define the task, Jules does the heavy lifting asynchronously, and you oversee the result.<br></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DjpK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea1ebab2-c5d9-4097-ba85-68707df9df17_1400x1048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DjpK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea1ebab2-c5d9-4097-ba85-68707df9df17_1400x1048.png 424w, https://substackcdn.com/image/fetch/$s_!DjpK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea1ebab2-c5d9-4097-ba85-68707df9df17_1400x1048.png 848w, https://substackcdn.com/image/fetch/$s_!DjpK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea1ebab2-c5d9-4097-ba85-68707df9df17_1400x1048.png 1272w, https://substackcdn.com/image/fetch/$s_!DjpK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea1ebab2-c5d9-4097-ba85-68707df9df17_1400x1048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DjpK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea1ebab2-c5d9-4097-ba85-68707df9df17_1400x1048.png" width="1400" height="1048" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ea1ebab2-c5d9-4097-ba85-68707df9df17_1400x1048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1048,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&#128025; Google Jules: The AI Coding Agent That Actually Works Autonomously | by  Elio Verhoef | Medium&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="&#128025; Google Jules: The AI Coding Agent That Actually Works Autonomously | by  Elio Verhoef | Medium" title="&#128025; Google Jules: The AI Coding Agent That Actually Works Autonomously | by  Elio Verhoef | Medium" srcset="https://substackcdn.com/image/fetch/$s_!DjpK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea1ebab2-c5d9-4097-ba85-68707df9df17_1400x1048.png 424w, https://substackcdn.com/image/fetch/$s_!DjpK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea1ebab2-c5d9-4097-ba85-68707df9df17_1400x1048.png 848w, https://substackcdn.com/image/fetch/$s_!DjpK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea1ebab2-c5d9-4097-ba85-68707df9df17_1400x1048.png 1272w, https://substackcdn.com/image/fetch/$s_!DjpK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea1ebab2-c5d9-4097-ba85-68707df9df17_1400x1048.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 
pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p></li><li><p><strong>OpenAI Codex (Cloud Agent):</strong> OpenAI introduced a new cloud-based <strong>Codex agent</strong> to complement ChatGPT. This evolved Codex (different from the 2021 Codex model) is described as <strong>&#8220;a cloud-based software engineering agent that can work on many tasks in parallel&#8221;</strong>. It&#8217;s available as part of ChatGPT Plus/Pro under the name <em>OpenAI Codex</em> and via an npm CLI (npm i -g @openai/codex). With the Codex CLI or its VS Code/Cursor extensions, you can delegate tasks to OpenAI&#8217;s agent much as you can with Copilot or Jules. 
For instance, from your terminal you might say: <em>&#8220;Hey Codex, implement dark mode for the settings page&#8221;</em>. Codex then launches into your repository, edits the necessary files, perhaps runs your test suite, and when done, presents the diff for you to merge. It operates in an isolated sandbox for safety, running each task in a container with your repo and environment. Like others, OpenAI&#8217;s Codex agent integrates with developer workflows: you can even kick off tasks from a <strong>ChatGPT mobile app</strong> on your phone and get notified when the agent is done. OpenAI emphasizes seamless switching <strong>&#8220;between real-time collaboration and async delegation&#8221;</strong> with Codex. In practice, this means you have the flexibility to use it in conductor mode (pair-programming in your IDE) or orchestrator mode (hand off a background task to the cloud agent). Codex can also be invited into your Slack channels - teammates can assign tasks to @Codex in Slack and it will pull context from the conversation and your repo to execute them. It&#8217;s a vision of ubiquitous AI assistance, where coding tasks can be delegated from anywhere. Early users report that Codex can autonomously identify and fix bugs, or generate significant features, given a well-scoped prompt. 
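</p><p>The "many tasks in parallel, each in its own sandbox" model can be sketched in a few lines of Python. Here a temp directory stands in for the per-task container, and delegate is a hypothetical stand-in for the real agent:</p>

```python
# Sketch of parallel, sandboxed delegation (illustrative names): each
# scoped task gets its own isolated workspace, and results come back
# as finished diffs. A temp directory stands in for the per-task
# container a real agent would run in.
import tempfile
from concurrent.futures import ThreadPoolExecutor


def delegate(task: str) -> str:
    with tempfile.TemporaryDirectory(prefix="sandbox-") as workdir:
        # A real agent would clone the repo into workdir, edit files,
        # and run the test suite there before producing a diff.
        return f"diff ready for review: {task}"


tasks = ["implement dark mode", "fix flaky checkout test", "bump Node.js"]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(delegate, tasks))
print(len(results))  # 3
```

<p>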
All of this again aligns with the orchestrator workflow: the human defines the goal, the AI agent autonomously delivers a solution.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pW8l!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884081a1-6333-408f-b74b-fe797d666bb2_1274x847.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pW8l!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884081a1-6333-408f-b74b-fe797d666bb2_1274x847.png 424w, https://substackcdn.com/image/fetch/$s_!pW8l!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884081a1-6333-408f-b74b-fe797d666bb2_1274x847.png 848w, https://substackcdn.com/image/fetch/$s_!pW8l!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884081a1-6333-408f-b74b-fe797d666bb2_1274x847.png 1272w, https://substackcdn.com/image/fetch/$s_!pW8l!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884081a1-6333-408f-b74b-fe797d666bb2_1274x847.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pW8l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884081a1-6333-408f-b74b-fe797d666bb2_1274x847.png" width="1274" height="847" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/884081a1-6333-408f-b74b-fe797d666bb2_1274x847.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:847,&quot;width&quot;:1274,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;OpenAI Codex: The Autonomous AI Coding Agent | by Komal Raut | AI  Simplified in Plain English | Medium&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="OpenAI Codex: The Autonomous AI Coding Agent | by Komal Raut | AI  Simplified in Plain English | Medium" title="OpenAI Codex: The Autonomous AI Coding Agent | by Komal Raut | AI  Simplified in Plain English | Medium" srcset="https://substackcdn.com/image/fetch/$s_!pW8l!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884081a1-6333-408f-b74b-fe797d666bb2_1274x847.png 424w, https://substackcdn.com/image/fetch/$s_!pW8l!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884081a1-6333-408f-b74b-fe797d666bb2_1274x847.png 848w, https://substackcdn.com/image/fetch/$s_!pW8l!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884081a1-6333-408f-b74b-fe797d666bb2_1274x847.png 1272w, https://substackcdn.com/image/fetch/$s_!pW8l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884081a1-6333-408f-b74b-fe797d666bb2_1274x847.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 
pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong>Anthropic Claude Code (for Web):</strong> Anthropic has offered Claude as an AI chatbot for a while, and their Claude Code CLI has been a favorite for interactive coding. Anthropic took the next step by launching <strong>Claude Code for Web</strong>, effectively a hosted version of their coding agent. Using Claude Code for Web, you point it at your GitHub repo (with configurable sandbox permissions) and give it a task. The agent then runs in Anthropic&#8217;s managed container, just like the CLI version, but now you can trigger it from a web interface or even a mobile app. 
It queues up multiple prompts and steps, executes them, and when done, pushes a branch to your repo (and can open a PR). Essentially, Anthropic took their single-agent Claude Code and made it an orchestratable service in the cloud. They even provided a &#8220;teleport&#8221; feature to transfer the session to your local environment if you want to take over manually. The rationale for this web version aligns with orchestrator benefits: convenience and scale. You don&#8217;t need to run long jobs on your machine; Anthropic&#8217;s cloud handles the heavy lifting, with <strong>filesystem and network isolation</strong> for safety. Claude Code for Web acknowledges that <em>autonomy with safety</em> is key - by sandboxing the agent, they reduce the need for constant permission prompts, letting the agent operate more freely (less babysitting by the user). In effect, Anthropic has made it easier to use Claude as an autonomous coding worker you launch on demand.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6VKc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa78c062a-d656-478d-99ea-19a7c3619790_3550x1990.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6VKc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa78c062a-d656-478d-99ea-19a7c3619790_3550x1990.png 424w, https://substackcdn.com/image/fetch/$s_!6VKc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa78c062a-d656-478d-99ea-19a7c3619790_3550x1990.png 848w, 
https://substackcdn.com/image/fetch/$s_!6VKc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa78c062a-d656-478d-99ea-19a7c3619790_3550x1990.png 1272w, https://substackcdn.com/image/fetch/$s_!6VKc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa78c062a-d656-478d-99ea-19a7c3619790_3550x1990.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6VKc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa78c062a-d656-478d-99ea-19a7c3619790_3550x1990.png" width="1456" height="816" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a78c062a-d656-478d-99ea-19a7c3619790_3550x1990.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:816,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2423312,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/177541153?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa78c062a-d656-478d-99ea-19a7c3619790_3550x1990.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6VKc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa78c062a-d656-478d-99ea-19a7c3619790_3550x1990.png 424w, 
https://substackcdn.com/image/fetch/$s_!6VKc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa78c062a-d656-478d-99ea-19a7c3619790_3550x1990.png 848w, https://substackcdn.com/image/fetch/$s_!6VKc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa78c062a-d656-478d-99ea-19a7c3619790_3550x1990.png 1272w, https://substackcdn.com/image/fetch/$s_!6VKc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa78c062a-d656-478d-99ea-19a7c3619790_3550x1990.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p></li><li><p><strong>Cursor Background Agents:</strong> tl;dr - Cursor 2.0 introduces a <a href="https://cursor.com/blog/2-0#the-multi-agent-interface">multi-agent interface</a> organized around agents rather than files. Cursor 2 expands its <a href="https://cursor.com/docs/cloud-agent">Background Agents</a> feature into a full-fledged orchestration layer for developers. Beyond serving as an interactive assistant, Cursor 2 lets you spawn autonomous background agents that operate asynchronously in a managed cloud workspace. When you delegate a task, Cursor 2&#8217;s agents now clone your GitHub repository, spin up an ephemeral environment, and check out an isolated branch where they execute work end-to-end. These agents can handle the entire development loop - from editing and running code, to installing dependencies, executing tests, running builds, and even searching the web or referencing documentation to resolve issues. Once complete, they push commits and open a detailed pull request summarizing their work. Cursor 2 introduces multi-agent orchestration, allowing several background agents to run concurrently across different tasks - for instance, one refining UI components while another optimizes backend performance or fixes tests. Each agent&#8217;s activity is visible through a real-time dashboard that can be accessed from desktop or mobile, enabling you to monitor progress, issue follow-ups, or intervene manually if needed. This new system effectively treats each agent as part of an on-demand AI workforce, coordinated through the developer&#8217;s high-level intent. 
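</p><p>The monitoring side reduces to a simple status map. A hedged Python sketch, with hypothetical Agent and dashboard names:</p>

```python
# Sketch of the fleet dashboard (hypothetical names): each background
# agent reports a status, and the human watches one aggregated view,
# intervening only where an agent needs input.
from dataclasses import dataclass


@dataclass
class Agent:
    task: str
    status: str = "queued"  # queued -> running -> done / needs-input


def dashboard(agents: list[Agent]) -> dict[str, list[str]]:
    view: dict[str, list[str]] = {}
    for a in agents:
        view.setdefault(a.status, []).append(a.task)
    return view


fleet = [
    Agent("refine UI components", "running"),
    Agent("optimize backend queries", "running"),
    Agent("fix failing tests", "needs-input"),
]
print(dashboard(fleet))
# {'running': ['refine UI components', 'optimize backend queries'],
#  'needs-input': ['fix failing tests']}
```

<p>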
Cursor 2&#8217;s focus on parallel, asynchronous execution dramatically amplifies a single engineer&#8217;s throughput - fully realizing the orchestrator model where humans oversee a fleet of cooperative AI developers rather than a single assistant.<br></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RsSA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd10a8fb0-9331-4162-bdf4-3b27c4acb9db_3010x1694.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RsSA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd10a8fb0-9331-4162-bdf4-3b27c4acb9db_3010x1694.png 424w, https://substackcdn.com/image/fetch/$s_!RsSA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd10a8fb0-9331-4162-bdf4-3b27c4acb9db_3010x1694.png 848w, https://substackcdn.com/image/fetch/$s_!RsSA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd10a8fb0-9331-4162-bdf4-3b27c4acb9db_3010x1694.png 1272w, https://substackcdn.com/image/fetch/$s_!RsSA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd10a8fb0-9331-4162-bdf4-3b27c4acb9db_3010x1694.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RsSA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd10a8fb0-9331-4162-bdf4-3b27c4acb9db_3010x1694.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d10a8fb0-9331-4162-bdf4-3b27c4acb9db_3010x1694.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1529052,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/177541153?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd10a8fb0-9331-4162-bdf4-3b27c4acb9db_3010x1694.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RsSA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd10a8fb0-9331-4162-bdf4-3b27c4acb9db_3010x1694.png 424w, https://substackcdn.com/image/fetch/$s_!RsSA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd10a8fb0-9331-4162-bdf4-3b27c4acb9db_3010x1694.png 848w, https://substackcdn.com/image/fetch/$s_!RsSA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd10a8fb0-9331-4162-bdf4-3b27c4acb9db_3010x1694.png 1272w, https://substackcdn.com/image/fetch/$s_!RsSA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd10a8fb0-9331-4162-bdf4-3b27c4acb9db_3010x1694.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p></li><li><p><strong>Agent Orchestration Platforms:</strong> Beyond individual product offerings, there are also emerging <strong>platforms and open-source projects</strong> aimed at orchestrating multiple agents. For instance, <strong><a href="https://conductor.build/">Conductor</a></strong> by Melty Labs (despite its name!) is actually an orchestration tool that lets you deploy and manage multiple Claude Code agents on your own machine in parallel. With Conductor, each agent gets its own isolated Git worktree to avoid conflicts, and you can see a dashboard of all agents (&#8220;who&#8217;s working on what&#8221;) and review their code as they progress. The idea is to make running a small swarm of coding agents as easy as running one. 
Similarly, <strong><a href="https://smtg-ai.github.io/claude-squad/">Claude Squad</a></strong> is a popular open-source terminal app that essentially multiplexes Anthropic&#8217;s Claude - it can spawn several Claude Code instances working concurrently in separate tmux panes, allowing you to give each a different task and thus code &#8220;10x faster&#8221; by parallelizing. These orchestration tools underscore the trend: developers want to coordinate <em>multiple</em> AI coding agents and have them collaborate or divide work. Even Microsoft&#8217;s Azure AI services are enabling this - at Build 2025 they announced tools for developers to <strong>&#8220;orchestrate multiple specialized agents to handle complex tasks&#8221;</strong>, with SDKs supporting Agent-to-Agent communication so your fleet of agents can talk to each other and share context. All of this infrastructure is being built to support the <strong>orchestrator engineer</strong>, who might eventually oversee dozens of AI processes tackling different parts of the software development lifecycle.<br></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AFu_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1977a6f3-15ec-4112-b30d-95510af7df13_3256x2200.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AFu_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1977a6f3-15ec-4112-b30d-95510af7df13_3256x2200.webp 424w, https://substackcdn.com/image/fetch/$s_!AFu_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1977a6f3-15ec-4112-b30d-95510af7df13_3256x2200.webp 848w, 
https://substackcdn.com/image/fetch/$s_!AFu_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1977a6f3-15ec-4112-b30d-95510af7df13_3256x2200.webp 1272w, https://substackcdn.com/image/fetch/$s_!AFu_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1977a6f3-15ec-4112-b30d-95510af7df13_3256x2200.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AFu_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1977a6f3-15ec-4112-b30d-95510af7df13_3256x2200.webp" width="1456" height="984" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1977a6f3-15ec-4112-b30d-95510af7df13_3256x2200.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:984,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:586276,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/177541153?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1977a6f3-15ec-4112-b30d-95510af7df13_3256x2200.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AFu_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1977a6f3-15ec-4112-b30d-95510af7df13_3256x2200.webp 424w, 
https://substackcdn.com/image/fetch/$s_!AFu_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1977a6f3-15ec-4112-b30d-95510af7df13_3256x2200.webp 848w, https://substackcdn.com/image/fetch/$s_!AFu_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1977a6f3-15ec-4112-b30d-95510af7df13_3256x2200.webp 1272w, https://substackcdn.com/image/fetch/$s_!AFu_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1977a6f3-15ec-4112-b30d-95510af7df13_3256x2200.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li></ul><blockquote><p>&#8220;I found <a href="https://www.linkedin.com/redir/suspicious-page?url=https%3A%2F%2Fconductor%2ebuild%2F&amp;lipi=urn%3Ali%3Apage%3Ad_flagship3_detail_base%3B4m9RhAtxR6ebkFVf%2FmOdlg%3D%3D">Conductor</a> to make the most sense to me. It was a perfect balance of talking to an agent and seeing my changes in a pane next to it. Its Github integration feels seamless; e.g. after merging PR, it immediately showed a task as &#8220;Merged&#8221; and provided an &#8220;Archive&#8221; button.&#8221; - <a href="https://www.linkedin.com/in/juriyzaytsev?miniProfileUrn=urn%3Ali%3Afsd_profile%3AACoAAACPjPoB242NjG3ty49SjbsQdnWjb4xr0Tg&amp;lipi=urn%3Ali%3Apage%3Ad_flagship3_detail_base%3B4m9RhAtxR6ebkFVf%2FmOdlg%3D%3D">Juriy Zaytsev</a>, Staff SWE, LinkedIn</p><p>He also tried <a href="https://www.magnet.run/">Magnet</a>: &#8220;The idea of tying tasks to a Kanban board is interesting and makes sense. As such, Magnet feels very product-centric.&#8221;</p></blockquote><h2><strong>Conductor vs Orchestrator - Differences</strong></h2><p><strong>Many engineers will continue to engage in conductor-style workflows (single-agent, interactive) even as orchestrator patterns mature. The two modes will co-exist.</strong></p><p>It&#8217;s clear that &#8220;conductor&#8221; and &#8220;orchestrator&#8221; aren&#8217;t just fancy terms - they describe a genuine shift in how we work with AI:</p><ul><li><p><strong>Scope of control:</strong> A conductor operates at the micro level, guiding one agent through a single task or a narrow problem. An orchestrator operates at the macro level, defining broader tasks and objectives for multiple agents or for a powerful single agent that can handle multi-step projects. 
The conductor asks, &#8220;How do I solve this function or bug with the AI&#8217;s help?&#8221; The orchestrator asks, &#8220;What set of tasks can I delegate to AI agents today to move this project forward?&#8221;</p></li><li><p><strong>Degree of autonomy:</strong> In conductor mode, the AI&#8217;s autonomy is low - it waits for user prompts each step of the way. In orchestrator mode, we give the AI high autonomy - it might plan and execute dozens of steps internally (writing code, running tests, adjusting its approach) before needing human feedback. A GitHub Copilot agent or Jules will try to complete a feature from start to finish once assigned, whereas Copilot&#8217;s IDE suggestions only go line-by-line as you type.</p></li><li><p><strong>Synchronous vs Asynchronous:</strong> Conductor interactions are typically synchronous - you prompt, AI responds within seconds, you immediately integrate or iterate. It&#8217;s a real-time loop. Orchestrator interactions are asynchronous - you might dispatch an agent and check back minutes or hours later when it&#8217;s done (somewhat like kicking off a long CI job). This means orchestrators must handle waiting, context-switching, and possibly managing multiple things concurrently, which is a different workflow rhythm for developers.</p></li><li><p><strong>Artifacts and traceability:</strong> A subtle but important difference: orchestrator workflows produce persistent artifacts like branches, commits, and pull requests that are preserved in version control. The agent&#8217;s work is fully recorded (and often linked to an issue/ticket), which improves traceability and collaboration. With conductor-style workflows (IDE chat, etc.), unless the developer manually commits intermediate changes, a lot of the AI&#8217;s involvement isn&#8217;t explicitly documented. In essence, orchestrators leave a paper trail (or rather a git trail) that others on the team can see or even trigger themselves. 
This can help bring AI into team processes more naturally.</p></li><li><p><strong>Human Effort Profile:</strong> For a conductor, the human is actively engaged nearly 100% of the time the AI is working - reviewing each output, refining prompts, etc. It&#8217;s interactive work. For an orchestrator, the human&#8217;s effort is front-loaded (writing a good task description or spec for the agent, setting up the right context) and back-loaded (reviewing the final code and testing it), but not much is needed in the middle. This means one orchestrator can manage more total work in parallel than would ever be possible by working with one AI at a time. Essentially, orchestrators leverage <strong>automation at scale</strong>, trading off fine-grained control for breadth of throughput.<br></p></li></ul><p>To illustrate, consider a common scenario: adding a new feature that touches both the frontend and backend and requires new tests. As a conductor, you might open your AI chat and implement the backend logic with the AI&#8217;s help, then separately implement the frontend, then ask it to generate some tests - doing each step sequentially with you in the loop throughout. As an orchestrator, you could assign the backend implementation to one agent (Agent A), the frontend UI changes to another (Agent B), and test creation to a third (Agent C). You give each a prompt or an issue description, then step back and let them work concurrently. </p><p>After a short time, you get perhaps three PRs: one for backend, one for frontend, one for tests. Your job then is to review and integrate them (and maybe have Agent C adjust tests if Agents A/B&#8217;s code changed during integration). In effect, you managed a mini &#8220;AI team&#8221; to deliver the feature. 
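</p><p>For a flavor of how that Agent A/B/C split works at the repository level, here is a minimal sketch of workspace isolation using git worktrees - one checkout and branch per agent, so their edits cannot collide. The agent and branch names are hypothetical, and it assumes a machine with git on the PATH:</p>

```python
import pathlib, subprocess, tempfile

def git(*args, cwd=None):
    # Thin wrapper so the sketch reads like the underlying git commands.
    subprocess.run(["git", *args], cwd=cwd, check=True, capture_output=True)

root = pathlib.Path(tempfile.mkdtemp())
repo = root / "project"
git("init", "-q", str(repo))
git("-c", "user.email=demo@example.com", "-c", "user.name=demo",
    "commit", "-q", "--allow-empty", "-m", "baseline", cwd=repo)

# One isolated worktree + branch per agent: parallel edits never collide,
# and each agent's output lands as an ordinary reviewable branch.
assignments = {"agent-a": "agent/backend",
               "agent-b": "agent/frontend",
               "agent-c": "agent/tests"}
for workdir, branch in assignments.items():
    git("worktree", "add", str(root / workdir), "-b", branch, cwd=repo)

print(sorted(p.name for p in root.iterdir() if p.is_dir()))
# ['agent-a', 'agent-b', 'agent-c', 'project']
```

<p>This is essentially what tools like Conductor automate for you: each agent works in its own checkout, and integration happens through normal branch review.</p><p>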
This example highlights how orchestrators think in terms of <strong>task distribution and integration</strong>, whereas conductors focus on <strong>step-by-step implementation</strong>.</p><p>It&#8217;s worth noting that <strong>these roles are fluid, not rigid categories</strong>. A single developer might act as a conductor in one moment and an orchestrator the next. For example, you might kick off an asynchronous agent to handle one task (orchestrator mode) while you personally work with another AI on a tricky algorithm in the meantime (conductor mode). Tools are also blurring lines: as OpenAI&#8217;s Codex marketing suggests, you can seamlessly switch between collaborating in real-time and delegating async tasks. So, think of &#8220;conductor&#8221; vs &#8220;orchestrator&#8221; as two ends of a spectrum of AI-assisted development, with many hybrid workflows in between.</p><h2><strong>Why Orchestrators matter</strong></h2><p>Experts are suggesting that this shift to orchestration could be one of the biggest leaps in programming productivity we&#8217;ve ever seen. Consider the historical trends: we went from writing assembly to using high-level languages, then to using frameworks and libraries, and recently to leveraging AI for autocompletion. Each step abstracted away more low-level work. <strong>Autonomous coding agents are the next abstraction layer</strong> - instead of manually coding every piece, you describe what you need at a higher level and let multiple agents build it.</p><p>As orchestrator-style agents ramp up, we could imagine even larger percentages of code being drafted by AIs. What does a software team look like when AI agents generate, say, 80% or 90% of the code, and humans provide the 10% critical guidance and oversight? Many believe it doesn&#8217;t mean replacing developers - it means <strong>augmenting developers to build better software</strong>. 
We may witness an explosion of productivity where a small team of engineers, effectively managing dozens of agent processes, can accomplish what once took an army of programmers months. (Note: I continue to believe the code review loop - the place where our human skills will stay focused - needs real work if all this code is not to become slop.)</p><p>One intriguing possibility is that <strong>every engineer becomes, to some degree, a </strong><em><strong>manager</strong></em><strong> of AI developers</strong>. It&#8217;s a bit like everyone having a personal team of interns or junior engineers. Your effectiveness will depend on how well you can break down tasks, communicate requirements to AI, and verify the results. Human judgment will remain vital: deciding what to build, ensuring correctness, handling ambiguity, and injecting creativity or domain knowledge where AI might fall short. In other words, the skillset of an orchestrator - good planning, prompt engineering, validation, and oversight - is going to be in high demand. Far from making engineers obsolete, these agents could <strong>elevate engineers into more strategic, supervisory roles</strong> on projects.</p><h2><strong>Toward an &#8220;AI Team&#8221; of specialists</strong></h2><p>Today&#8217;s coding agents mostly tackle implementation: write code, fix code, write tests, etc. But the vision doesn&#8217;t stop there. Imagine a full software development pipeline where <strong>multiple specialized AI agents handle different phases of the lifecycle, coordinated by a human orchestrator</strong>. This is already on the horizon. 
Researchers and companies have floated architectures where, for example, you have:</p><ul><li><p>a <strong>Planning Agent</strong> that analyzes feature requests or bug reports and breaks them into specific tasks</p></li><li><p>a <strong>Coding Agent</strong> (or several) that implement the tasks in code</p></li><li><p>a <strong>Testing Agent</strong> that generates and runs tests to verify the changes</p></li><li><p>a <strong>Code Review Agent</strong> that checks the pull requests for quality and standards compliance</p></li><li><p>a <strong>Documentation Agent</strong> that updates README or docs to reflect the changes</p></li><li><p>possibly a <strong>Deployment/Monitoring Agent</strong> that can roll out the change and watch for issues in production.</p></li></ul><p>In this scenario, the human engineer&#8217;s role becomes one of <strong>oversight and orchestration across the whole flow</strong>: you might initiate the process with a high-level goal (e.g., &#8220;Add support for payment via cryptocurrency in our app&#8221;), the planning agent turns that into sub-tasks, coding agents implement each sub-task asynchronously, the testing agent and review agent catch problems or polish the code, and finally everything gets merged and deployed under watch of monitoring agents. </p><p>The human would step in to approve plans, resolve any conflicts or questions the agents raise, and give final approval to deploy. This is essentially an <strong>&#8220;AI swarm&#8221;</strong> tackling software development end-to-end, with the engineer as the conductor of the orchestra.</p><p>While this might sound futuristic, we see early signs. Microsoft&#8217;s Azure AI Foundry now provides building blocks for multi-agent workflows and agent orchestration in enterprise settings, implicitly supporting the idea that multiple agents will collaborate on complex, multi-step tasks. 
Internal experiments at tech companies have agents creating pull requests that other agent reviewers automatically critique, forming an AI-to-AI interaction with a human in the loop at the end. In open-source communities, people have chained tools like Claude Squad (parallel coders) with additional scripts that integrate their outputs. And the conversation has started about standards like <strong>Model Context Protocol (MCP)</strong> for agents sharing state and communicating results to each other.</p><p>I&#8217;ve noted before that &#8220;<em>specialized agents for Design, Implementation, Test, and Monitoring could work together to develop, launch, and land features in complex environments</em>&#8221; - with developers onboarding these AI agents to their team and guiding/overseeing their execution. In such a setup, agents would <em>&#8220;coordinate with other agents autonomously, request human feedback, reviews and approvals&#8221;</em> at key points, and otherwise handle the busywork amongst themselves. The goal is a <strong>central platform where we can deploy specialized agents across the workflow, without humans micromanaging each individual step</strong> - instead, the human oversees the entire operation with full context. </p><p>This could transform how software projects are managed: more like running an automated assembly line where engineers ensure quality and direction, rather than hand-crafting each component on the line.</p><h2><strong>Challenges and Human Role in orchestration</strong></h2><p>Does this mean programming becomes a push-button activity where you sit back and let the AI factory run? Not quite - and likely never entirely. There are significant challenges and open questions with the orchestrator model:</p><ul><li><p><strong>Quality control &amp; trust:</strong> Orchestrating multiple agents means you&#8217;re not eyeballing every single change as it&#8217;s made. Bugs or design flaws might slip through if you solely rely on AI. 
Human oversight remains <strong>critical</strong> as the final failsafe. Indeed, current tools explicitly require the human to review the AI&#8217;s pull requests before merging. The relationship is often compared to managing a team of junior developers: they can get a lot done, but you wouldn&#8217;t ship their code without review. The orchestrator engineer must be vigilant about checking the AI&#8217;s work, writing good test cases, and having monitoring in place. AI agents can make mistakes or produce logically correct but undesirable solutions (for instance, implementing a feature in a convoluted way). Part of the orchestration skillset is knowing <strong>when to intervene</strong> versus when to trust the agent&#8217;s plan. As the CTO of Stack Overflow wrote, <em>&#8220;developers maintain expertise to evaluate AI outputs&#8221;</em> and will need new <strong>&#8220;trust models&#8221;</strong> for this collaboration.</p></li><li><p><strong>Coordination &amp; conflict:</strong> When multiple agents work on a shared codebase, coordination issues arise - much like multiple developers can conflict if they touch the same files. We need strategies to prevent merge conflicts or duplicated work. Current solutions use <em>workspace isolation</em> (each agent works on its own git branch or separate environment) and clear task separation - for example, one agent per task, with tasks designed to minimize overlap. Some orchestrator tools can even automatically merge changes or rebase agent branches, but usually it falls to the human to integrate. Ensuring agents don&#8217;t step on each other&#8217;s toes is an active area of development. 
It&#8217;s conceivable that in the future agents might negotiate with each other (via something like agent-to-agent communication protocols) to avoid conflicts, but today the orchestrator sets the boundaries.</p></li><li><p><strong>Context, shared state and hand-offs: </strong>Coding workflows are rich in state: repository structure, dependencies, build systems, test suites, style guidelines, team practices, legacy code, branching strategies, etc. Multi-agent orchestration demands shared context, memory, and smooth transitions. In enterprise settings, context sharing across agents is non-trivial: without a unified &#8220;workflow orchestration layer&#8221;, each agent becomes a silo, working well in its own domain but failing to mesh with the others. In an engineering team this may translate into: one agent creates a feature branch, another runs unit tests, a third merges to the mainline - and if the first agent doesn&#8217;t tag the metadata the second expects, you get breakdowns.</p></li><li><p><strong>Prompting and specifications:</strong> Ironically, as the AI handles more coding, <strong>the human&#8217;s &#8220;coding&#8221; moves up a level to writing specifications and prompts</strong>. The quality of an agent&#8217;s output is highly dependent on how well you specify the task. Vague instructions lead to subpar results or agents going astray. Best practices that have emerged include writing mini design docs or acceptance criteria for the agents - essentially treating them like contractors who need a clear definition of done. This is why we&#8217;re seeing ideas like <em>spec-driven development</em> for AI: you feed the agent a detailed spec of what to build, so it can execute predictably. Engineers will need to hone their ability to describe problems and desired solutions unambiguously. Paradoxically, it&#8217;s a very old-school skill (writing good specs and tests) made newly important in the AI era. 
As agents improve, prompts might get simpler (&#8220;write me a mobile app for X and Y with these features&#8221;) and yet yield more complex results, but we&#8217;re not quite at the point of the AI intuiting everything unsaid. For now, orchestrators must be excellent communicators to their digital workforce.</p></li><li><p><strong>Tooling and debugging:</strong> With a human developer, if something goes wrong, they can debug in real time. With autonomous agents, if something goes wrong (say the agent gets stuck on a problem or produces a failing PR), the orchestrator has to debug the situation: Was it a bad prompt? Did the agent misinterpret the spec? Do we roll back and try again or step in and fix it manually? New tools are being added to help here: for instance, <strong>checkpointing and rollback</strong> commands let you undo an agent&#8217;s changes if it went down a wrong path. Monitoring dashboards can show if an agent is taking too long or has errors. But effectively, orchestrators might at times have to drop down to conductor mode to fix an issue, then go back to orchestration. This interplay will improve as agents get more robust, but it highlights that orchestrating isn&#8217;t just &#8220;fire and forget&#8221; - it requires active monitoring. AI observability tools (tracking cost, performance, accuracy of agents) are likely to become part of the developer&#8217;s toolkit.</p></li><li><p><strong>Ethics and responsibility:</strong> Another angle - if an AI agent writes most of the code, who is responsible for license compliance, security vulnerabilities, or bias in that code? Ultimately the human orchestrator (or their organization) carries responsibility. This means orchestrators should incorporate practices like security scanning of AI-generated code and verifying dependencies. 
Interestingly, some agents like Copilot and Jules include built-in safeguards (they won&#8217;t introduce known vulnerable versions of libraries, for instance, and can be directed to run security audits). But at the end of the day, <em>&#8220;trust, but verify&#8221;</em> is the mantra. The human remains accountable for what ships, so orchestrators will need to ensure AI contributions meet the team&#8217;s quality and ethical standards.<br></p></li></ul><p>In summary, the rise of orchestrator-style development doesn&#8217;t remove the human from the loop - it <strong>changes the human&#8217;s position in the loop</strong>. We move from being the one turning the wrench to the one designing and supervising the machine that turns the wrench. It&#8217;s a higher-leverage position, but also one that demands broader awareness. </p><p>Developers who adapt to being effective conductors and orchestrators of AI will likely be <strong>even more valuable</strong> in this new landscape.</p><h2><strong>Conclusion: Every engineer a maestro?</strong></h2><p>Will every engineer become an orchestrator of multiple coding agents? It&#8217;s a provocative question, but trends suggest we&#8217;re headed that way for a large class of programming tasks. The day-to-day reality of a software engineer in the late 2020s could involve less heads-down coding and more high-level supervision of code that&#8217;s mostly written by AIs. </p><p>Today we&#8217;re already seeing early adopters treating AI agents as teammates - for example, some developers report delegating 10+ pull requests per day to AI, effectively <strong>treating the agent as an independent teammate rather than a smart autocomplete</strong>. Those developers free themselves to focus on system design, tricky algorithms, or simply coordinating even more work.</p><p>That said, the transition won&#8217;t happen overnight for everyone. 
Junior developers might start as &#8220;AI conductors,&#8221; getting comfortable working with a single agent, before they take on orchestrating many. Seasoned engineers are more likely to adopt orchestrator workflows early, since they have the experience to architect tasks and evaluate outcomes. In many ways, it mirrors career growth: junior engineers implement (now with AI help), senior engineers design and integrate (soon with AI agent teams). </p><p>The tools we discussed - from GitHub&#8217;s coding agent to Google&#8217;s Jules to OpenAI&#8217;s Codex - are rapidly lowering the barrier to try this approach, so expect it to go mainstream quickly. Hyperbole aside, it&#8217;s true that these capabilities can dramatically amplify what an individual developer can do.</p><p>So, will we all be orchestrators? Probably to some extent - yes. We&#8217;ll still write code, especially for novel or complex pieces that defy simple specification. But much of the boilerplate, routine patterns, and even a lot of sophisticated glue code could be offloaded to AI. The role of &#8220;software engineer&#8221; may evolve to emphasize product thinking, architecture, and validation, with the actual coding being a largely automated act. In this envisioned future, asking an engineer to crank out thousands of lines of mundane code by hand would feel as inefficient as asking a modern accountant to calculate ledgers with pencil and paper. Instead, the engineer would delegate that to their AI agents and focus on the creative and critical-thinking aspects around it.</p><p>Btw, yes, there&#8217;s plenty to be cautious about. We need to ensure these agents don&#8217;t introduce more problems than they solve. And the developer experience of orchestrating multiple agents is still maturing - it can be clunky at times. But the trajectory is clear. 
Just as continuous integration and automated testing became standard practice, <strong>continuous delegation to AI</strong> could become a normal part of the development process. The engineers who master both modes - knowing when to be a precise conductor and when to scale up as an orchestrator - will be in the best position to leverage this &#8220;agentic&#8221; world.</p><p>One thing is certain: the way we build software in the next 5-10 years will look quite different from the last 10. <strong>I want to stress that not all or most code will be agent-driven within a year or two, but that&#8217;s a direction we&#8217;re heading in.</strong> The keyboard isn&#8217;t going away, but alongside our keystrokes we&#8217;ll be issuing high-level instructions to swarms of intelligent helpers. In the end, the human element remains irreplaceable: it&#8217;s our judgment, creativity, and understanding of real-world needs that guides these AI agents toward meaningful outcomes. </p><p><strong>The future of coding isn&#8217;t AI or human, it&#8217;s AI </strong><em><strong>and</strong></em><strong> human - with humans at the helm as conductors and orchestrators, directing a powerful ensemble to achieve our software ambitions.</strong></p><p><em>I&#8217;m excited to share I&#8217;ve written a new <a href="https://beyond.addy.ie">AI-assisted engineering book</a> with O&#8217;Reilly. 
If you&#8217;ve enjoyed my writing here you may be interested in checking it out.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!knBl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdac571f-5afb-495e-ab15-794c18d7702c_5246x3496.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!knBl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdac571f-5afb-495e-ab15-794c18d7702c_5246x3496.png 424w, https://substackcdn.com/image/fetch/$s_!knBl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdac571f-5afb-495e-ab15-794c18d7702c_5246x3496.png 848w, https://substackcdn.com/image/fetch/$s_!knBl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdac571f-5afb-495e-ab15-794c18d7702c_5246x3496.png 1272w, https://substackcdn.com/image/fetch/$s_!knBl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdac571f-5afb-495e-ab15-794c18d7702c_5246x3496.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!knBl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdac571f-5afb-495e-ab15-794c18d7702c_5246x3496.png" width="1456" height="970" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bdac571f-5afb-495e-ab15-794c18d7702c_5246x3496.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:970,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7188581,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/177541153?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdac571f-5afb-495e-ab15-794c18d7702c_5246x3496.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!knBl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdac571f-5afb-495e-ab15-794c18d7702c_5246x3496.png 424w, https://substackcdn.com/image/fetch/$s_!knBl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdac571f-5afb-495e-ab15-794c18d7702c_5246x3496.png 848w, https://substackcdn.com/image/fetch/$s_!knBl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdac571f-5afb-495e-ab15-794c18d7702c_5246x3496.png 1272w, https://substackcdn.com/image/fetch/$s_!knBl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdac571f-5afb-495e-ab15-794c18d7702c_5246x3496.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://addyo.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Elevate is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Gemini CLI Tips & Tricks]]></title><description><![CDATA[~30 pro-tips for effectively using Gemini CLI for agentic coding]]></description><link>https://addyo.substack.com/p/gemini-cli-tips-and-tricks</link><guid isPermaLink="false">https://addyo.substack.com/p/gemini-cli-tips-and-tricks</guid><dc:creator><![CDATA[Addy Osmani]]></dc:creator><pubDate>Tue, 21 Oct 2025 16:04:40 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!3ImM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe58f305f-a049-48f5-a77a-489c731faa7f_1736x1068.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This guide is also available to star/follow on our <a href="https://github.com/addyosmani/gemini-cli-tips">GitHub repository</a>.</em></p><p><strong><a href="https://github.com/google-gemini/gemini-cli">Gemini CLI</a></strong> is an open-source AI assistant that brings the power of Google&#8217;s Gemini model directly into your <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=The%20Gemini%20CLI%20is%20an,via%20a%20Gemini%20API%20key">terminal</a>. 
It functions as a conversational, &#8220;agentic&#8221; command-line tool - meaning it can reason about your requests, choose tools (like running shell commands or editing files), and execute multi-step plans to help with your development <a href="https://cloud.google.com/blog/topics/developers-practitioners/agent-factory-recap-deep-dive-into-gemini-cli-with-taylor-mullen#:~:text=The%20Gemini%20CLI%20%20is,understanding%20of%20the%20developer%20workflow">workflow</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3ImM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe58f305f-a049-48f5-a77a-489c731faa7f_1736x1068.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3ImM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe58f305f-a049-48f5-a77a-489c731faa7f_1736x1068.png 424w, https://substackcdn.com/image/fetch/$s_!3ImM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe58f305f-a049-48f5-a77a-489c731faa7f_1736x1068.png 848w, https://substackcdn.com/image/fetch/$s_!3ImM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe58f305f-a049-48f5-a77a-489c731faa7f_1736x1068.png 1272w, https://substackcdn.com/image/fetch/$s_!3ImM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe58f305f-a049-48f5-a77a-489c731faa7f_1736x1068.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!3ImM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe58f305f-a049-48f5-a77a-489c731faa7f_1736x1068.png" width="1456" height="896" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e58f305f-a049-48f5-a77a-489c731faa7f_1736x1068.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:896,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:521169,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/176589430?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe58f305f-a049-48f5-a77a-489c731faa7f_1736x1068.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3ImM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe58f305f-a049-48f5-a77a-489c731faa7f_1736x1068.png 424w, https://substackcdn.com/image/fetch/$s_!3ImM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe58f305f-a049-48f5-a77a-489c731faa7f_1736x1068.png 848w, https://substackcdn.com/image/fetch/$s_!3ImM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe58f305f-a049-48f5-a77a-489c731faa7f_1736x1068.png 1272w, https://substackcdn.com/image/fetch/$s_!3ImM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe58f305f-a049-48f5-a77a-489c731faa7f_1736x1068.png 1456w" 
sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In practical terms, Gemini CLI acts like a supercharged pair programmer and command-line assistant. It excels at coding tasks, debugging, content generation, and even system automation, all through natural language prompts. Before diving into pro tips, let&#8217;s quickly recap how to set up Gemini CLI and get it running.</p><h2><strong>Getting Started</strong></h2><p><strong>Installation:</strong> You can install Gemini CLI via npm. 
For a global install, use:</p><pre><code>npm install -g @google/gemini-cli</code></pre><p>Or run it without installing using <code>npx</code>:</p><pre><code>npx @google/gemini-cli</code></pre><p>Gemini CLI is available on all major platforms (it&#8217;s built with Node.js/TypeScript). Once installed, simply run the <code>gemini</code> command in your terminal to launch the interactive <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Interactive%20Mode%20,conversational%20session">CLI</a>.</p><p><strong>Authentication:</strong> On first use, you&#8217;ll need to authenticate with the Gemini service. You have two options: (1) <strong>Google Account Login (free tier)</strong> - this lets you use Gemini 2.5 Pro for free with generous usage limits (about 60 requests/minute and 1,000 requests per <a href="https://blog.google/technology/developers/introducing-gemini-cli-open-source-ai-agent/#:~:text=Unmatched%20usage%20limits%20for%20individual,developers">day</a>). On launch, Gemini CLI will prompt you to sign in with a Google account (no billing <a href="https://genmind.ch/posts/Howto-Supercharge-Your-Terminal-with-Gemini-CLI/#:~:text=%2A%20Google,Google%20AI%20Studio%2C%20then%20run">required</a>). 
(2) <strong>API Key (paid or higher-tier access)</strong> - you can get an API key from Google AI <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=1,key%20from%20Google%20AI%20Studio">Studio</a> and set the environment variable <code>GEMINI_API_KEY</code> to use <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Method%201%3A%20Shell%20Environment%20Variable,zshrc">it</a>.</p><p>API key usage can offer higher quotas and enterprise data&#8209;use protections; prompts aren&#8217;t used for training on paid/billed usage, though logs may be retained for <a href="https://genmind.ch/posts/Howto-Supercharge-Your-Terminal-with-Gemini-CLI/#:~:text=responses%20may%20be%20logged%20for,Google%20AI%20Studio%2C%20then%20run">safety</a>.</p><p>For example, add to your shell profile:</p><pre><code>export GEMINI_API_KEY="YOUR_KEY_HERE"</code></pre><p><strong>Basic Usage:</strong> To start an interactive session, just run <code>gemini</code> with no arguments. You&#8217;ll get a <code>gemini&gt;</code> prompt where you can type requests or commands. For instance:</p><pre><code>$ gemini
gemini&gt; Create a React recipe management app using SQLite</code></pre><p>You can then watch as Gemini CLI creates files, installs dependencies, runs tests, etc., to fulfill your request. If you prefer a one-shot invocation (non-interactive), use the <code>-p</code> flag with a prompt, for example:</p><pre><code>gemini -p "Summarize the main points of the attached file. @./report.txt"</code></pre><p>This will output a single response and <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=gemini">exit</a>. You can also pipe input into Gemini CLI: for example, <code>echo "Count to 10" | gemini</code> will feed the prompt via <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=gemini%20,txt">stdin</a>.</p><p><strong>CLI Interface:</strong> Gemini CLI provides a rich REPL-like interface. It supports <strong>slash commands</strong> (special commands prefixed with <code>/</code> for controlling the session, tools, and settings) and <strong>bang commands</strong> (prefixed with <code>!</code> to execute shell commands directly). We&#8217;ll cover many of these in the pro tips below. By default, Gemini CLI operates in a safe mode where any action that modifies your system (writing files, running shell commands, etc.) will ask for confirmation. When a tool action is proposed, you&#8217;ll see a diff or command and be prompted (<code>Y/n</code>) to approve or reject it. This ensures the AI doesn&#8217;t make unwanted changes without your consent.</p><p>With the basics out of the way, let&#8217;s explore a series of pro tips and hidden features to help you get the most out of Gemini CLI. Each tip is presented with a simple example first, followed by deeper details and nuances. These tips incorporate advice and insights from the tool&#8217;s creators (e.g. 
Taylor Mullen) and the Google Developer Relations team, as well as the broader community, to serve as a <strong>canonical guide for power users</strong> of Gemini CLI.</p><h2><strong>Tip 1: Use </strong><code>GEMINI.md</code><strong> for Persistent Context</strong></h2><p><strong>Quick use-case:</strong> Stop repeating yourself in prompts. Provide project-specific context or instructions by creating a <code>GEMINI.md</code> file, so the AI always has important background knowledge without being told every <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Context%20Files%20%28">time</a>.</p><p>When working on a project, you often have certain overarching details - e.g. coding style guidelines, project architecture, or important facts - that you want the AI to keep in mind. Gemini CLI allows you to encode these in one or more <code>GEMINI.md</code> files. Simply create a <code>.gemini</code> folder (if not already present) in your project, and add a Markdown file named <code>GEMINI.md</code> with whatever notes or instructions you want the AI to persist. For example:</p><pre><code><strong># Project Phoenix - AI Assistant</strong>

- All Python code must follow PEP 8 style.  
- Use 4 spaces for indentation.  
- The user is building a data pipeline; prefer functional programming paradigms.</code></pre><p>Place this file in your project root (or in subdirectories for more granular context). Now, whenever you run <code>gemini</code> in that project, it will automatically load these instructions into <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Context%20Files%20%28">context</a>. This means the model will <em>always</em> be primed with them, avoiding the need to prepend the same guidance to every prompt.</p><p><strong>How it works:</strong> Gemini CLI uses a hierarchical context loading <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Hierarchical%20Loading%3A%20The%20CLI%20combines,The%20loading%20order%20is">system</a>. It will combine <strong>global context</strong> (from <code>~/.gemini/GEMINI.md</code>, which you can use for cross-project defaults) with your <strong>project-specific </strong><code>GEMINI.md</code>, and even context files in subfolders. More specific files override more general ones. You can inspect what context was loaded at any time by using the command:</p><pre><code>/memory show</code></pre><p>This will display the full combined context the AI <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=,current%20conversation%20with%20a%20tag">sees</a>. If you make changes to your <code>GEMINI.md</code>, use <code>/memory refresh</code> to reload the context without restarting the <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=,current%20conversation%20with%20a%20tag">session</a>.</p><p><strong>Pro Tip:</strong> Use the <code>/init</code> slash command to quickly generate a starter <code>GEMINI.md</code>. Running <code>/init</code> in a new project creates a template context file with information like the tech stack detected, a summary of the project, <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=,directory%20workspace%20%28e.g.%2C%20%60add">etc</a>. 
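To make the hierarchy concrete, here is a minimal sketch. The file names follow the convention above, but the directory layout and contents are illustrative, and a local demo directory stands in for the real locations:

```shell
# Illustrative sketch of hierarchical context: a project-level GEMINI.md
# plus a more specific one in a subfolder. (The global file would live at
# ~/.gemini/GEMINI.md; a local demo-project/ directory stands in here.)
mkdir -p demo-project/packages/api

cat > demo-project/GEMINI.md <<'EOF'
# Project context (applies repo-wide)
- All Python code must follow PEP 8 style.
EOF

cat > demo-project/packages/api/GEMINI.md <<'EOF'
# Package context (more specific; overrides the project-level file)
- This package is a REST API; prefer small, pure handler functions.
EOF

# Running `gemini` inside demo-project would combine these, most specific last.
ls demo-project/GEMINI.md demo-project/packages/api/GEMINI.md
```
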
You can then edit and expand that file. For large projects, consider breaking the context into multiple files and <strong>importing</strong> them into <code>GEMINI.md</code> with the <code>@</code> import syntax. For example, your main <code>GEMINI.md</code> could have lines like <code>@./docs/prompt-guidelines.md</code> to pull in additional context <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Modularizing%20Context%20with%20Imports%3A%20You,files">files</a>. This keeps your instructions organized.</p><p>With a well-crafted <code>GEMINI.md</code>, you essentially give Gemini CLI a &#8220;memory&#8221; of the project&#8217;s requirements and conventions. This <strong>persistent context</strong> leads to more relevant responses and less back-and-forth prompt engineering.</p><h2><strong>Tip 2: Create Custom Slash Commands</strong></h2><p><strong>Quick use-case:</strong> Speed up repetitive tasks by defining <a href="https://cloud.google.com/blog/topics/developers-practitioners/gemini-cli-custom-slash-commands">your own slash commands</a>. For example, you could make a command <code>/test:gen</code> that generates unit tests from a description, or <code>/db:reset</code> that drops and recreates a test database. This extends Gemini CLI&#8217;s functionality with one-liners tailored to your workflow.</p><p>Gemini CLI supports <strong>custom slash commands</strong> that you can define in simple configuration files. </p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;174128e6-9681-41f6-8cae-af7180cbf82c&quot;,&quot;duration&quot;:null}"></div><p>Under the hood, these are essentially pre-defined prompt templates. 
To create one, make a directory <code>commands/</code> under either <code>~/.gemini/</code> for global commands or in your project&#8217;s <code>.gemini/</code> folder for project-specific <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Custom%20Commands">commands</a>. Inside <code>commands/</code>, create a TOML file for each new command. The file name format determines the command name: e.g. a file <code>test/gen.toml</code> defines a command <code>/test:gen</code>.</p><p>Let&#8217;s walk through an example. Say you want a command to generate a unit test from a requirement description. You could create <code>~/.gemini/commands/test/gen.toml</code> with the following content:</p><pre><code><strong># Invoked as: /test:gen "Description of the test"  </strong>
description = "Generates a unit test based on a requirement."
prompt = """
You are an expert test engineer. Based on the following requirement, please write a comprehensive unit test using the Jest framework.

Requirement: {{args}}  
"""</code></pre><p>Now, after reloading or restarting Gemini CLI, you can simply type:</p><pre><code>/test:gen "Ensure the login button redirects to the dashboard upon success"</code></pre><p>Gemini CLI will recognize <code>/test:gen</code> and substitute the <code>{{args}}</code> in your prompt template with the provided argument (in this case, the requirement). The AI will then proceed to generate a Jest unit test <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Example%3A%20%60">accordingly</a>. The <code>description</code> field is optional but is used when you run <code>/help</code> or <code>/tools</code> to list available commands.</p><p>This mechanism is extremely powerful - effectively, you can script the AI with natural language. The community has created numerous useful custom commands. For instance, Google&#8217;s DevRel team shared a set of <em>10 practical workflow commands</em> (via an open-source repo) demonstrating how you can script common flows like creating API docs, cleaning data, or setting up boilerplate <a href="https://cloud.google.com/blog/topics/developers-practitioners/agent-factory-recap-deep-dive-into-gemini-cli-with-taylor-mullen#:~:text=,to%20generate%20a%20better%20output">code</a>. By defining a custom command, you package a complex prompt (or series of prompts) into a reusable shortcut.</p><p><strong>Pro Tip:</strong> Custom commands can also be used to enforce formatting or apply a &#8220;persona&#8221; to the AI for certain tasks. For example, you might have a <code>/review:security</code> command that always prefaces the prompt with &#8220;You are a security auditor...&#8221; to review code for vulnerabilities. This approach ensures consistency in how the AI responds to specific categories of tasks.</p><p>To share commands with your team, you can commit the TOML files in your project&#8217;s repo (under <code>.gemini/commands</code> directory). 
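For example (the layout follows the conventions above, and the command body repeats the earlier <code>/test:gen</code> example), creating a project-scoped command to check in might look like this:

```shell
# Create a project-scoped custom command; Gemini CLI surfaces this file
# as /test:gen for anyone working in the repo.
mkdir -p .gemini/commands/test

cat > .gemini/commands/test/gen.toml <<'EOF'
description = "Generates a unit test based on a requirement."
prompt = """
You are an expert test engineer. Based on the following requirement,
please write a comprehensive unit test using the Jest framework.

Requirement: {{args}}
"""
EOF

# Then commit it so teammates pick the command up automatically, e.g.:
# git add .gemini/commands/test/gen.toml && git commit -m "Add /test:gen"
```
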
Team members who have Gemini CLI will automatically pick up those commands when working in the project. This is a great way to <strong>standardize AI-assisted workflows</strong> across a team.</p><h2><strong>Tip 3: Extend Gemini with Your Own </strong><code>MCP</code><strong> Servers</strong></h2><p><strong>Quick use-case:</strong> Suppose you want Gemini to interface with an external system or a custom tool that isn&#8217;t built-in - for example, query a proprietary database, or integrate with Figma designs. You can do this by running a custom <strong>Model Context Protocol (MCP) server</strong> and plugging it into Gemini <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Extend%20the%20CLI%20with%20your,add%7Clist%7Cremove%3E%60%20commands">CLI</a>. MCP servers let you add new tools and abilities to Gemini, effectively <strong>extending the agent</strong>.</p><p>Gemini CLI comes with several MCP servers out-of-the-box (for instance, ones enabling Google Search, code execution sandboxes, etc.), and you can add your own. An MCP server is essentially an external process (it could be a local script, a microservice, or even a cloud endpoint) that speaks a simple protocol to handle tasks for Gemini. This architecture is what makes Gemini CLI so <a href="https://blog.google/technology/developers/introducing-gemini-cli-open-source-ai-agent/#:~:text=,interactively%20within%20your%20scripts">extensible</a>.</p><p><strong>Examples of MCP servers:</strong> Some community and Google-provided MCP integrations include a <strong>Figma MCP</strong> (to fetch design details from Figma), a <strong>Clipboard MCP</strong> (to read/write from your system clipboard), and others. 
In fact, in an internal demo, the Gemini CLI team showcased a &#8220;Google Docs MCP&#8221; server that allowed saving content directly to Google <a href="https://cloud.google.com/blog/topics/developers-practitioners/agent-factory-recap-deep-dive-into-gemini-cli-with-taylor-mullen#:~:text=%2A%20Utilize%20the%20google,summary%20directly%20to%20Google%20Docs">Docs</a>. The idea is that whenever Gemini needs to perform an action that the built-in tools can&#8217;t handle, it can delegate to your MCP server.</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;f6f5b0b7-9579-41fc-9ddd-b45906f672e2&quot;,&quot;duration&quot;:null}"></div><p><em>Above is a <a href="https://medium.com/google-cloud/gemini-cli-figma-mcp-server-turn-design-into-code-in-minutes-88ba219615c6">demo</a> of Gemini CLI being used with the Figma MCP for design =&gt; code</em></p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;9a7a2858-814d-4e87-aadd-b7005d0c54ee&quot;,&quot;duration&quot;:null}"></div><p><em>The <a href="https://developer.chrome.com/blog/chrome-devtools-mcp">Chrome DevTools MCP</a> is also of course a favorite :)</em></p><p><strong>How to add one:</strong> You can configure MCP servers via your <code>settings.json</code> or using the CLI. For a quick setup, try the CLI command:</p><pre><code>gemini mcp add myserver --command "python3 my_mcp_server.py" --port 8080</code></pre><p>This would register a server named &#8220;myserver&#8221; that Gemini CLI will launch by running the given command (here a Python module) on port 8080. In <code>~/.gemini/settings.json</code>, it would add an entry under <code>mcpServers</code>. For example:</p><pre><code>"mcpServers": {
  "myserver": {
    "command": "python3",
    "args": ["-m", "my_mcp_server", "--port", "8080"],
    "cwd": "./mcp_tools/python",
    "timeout": 15000
  }
}</code></pre><p>This configuration (based on the official docs) tells Gemini how to start the MCP server and <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Example%20">where</a>. Once running, the tools provided by that server become available to Gemini CLI. You can list all MCP servers and their tools with the slash command:</p><pre><code>/mcp</code></pre><p>This will show any registered servers and what tool names they <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Command%20Description%20,List%20active%20extensions">expose</a>.</p><p><strong>Power of MCP:</strong> MCP servers can provide <strong>rich, multi-modal results</strong>. For instance, a tool served via MCP could return an image or a formatted table as part of the response to Gemini <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Capabilities%3A">CLI</a>. They also support OAuth 2.0, so you can securely connect to APIs (like Google&#8217;s APIs, GitHub, etc.) via an MCP tool without exposing <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Extend%20the%20CLI%20with%20your,add%7Clist%7Cremove%3E%60%20commands">credentials</a>. Essentially, if you can code it, you can wrap it as an MCP tool - turning Gemini CLI into a hub that orchestrates many services.</p><p><strong>Default vs. custom:</strong> By default, Gemini CLI&#8217;s built-in tools cover a lot (reading files, web search, executing shell commands, etc.), but MCP lets you go beyond. Some advanced users have created MCP servers to interface with internal systems or to perform specialized data processing. For example, you could have a <code>database-mcp</code> that provides a <code>/query_db</code> tool for running SQL queries on a company database, or a <code>jira-mcp</code> to create tickets via natural language.</p><p>When creating your own, be mindful of security: by default, custom MCP tools require confirmation unless you mark them as trusted. 
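As a sketch of what that looks like (a local file stands in for <code>~/.gemini/settings.json</code>, and the entry extends the <code>mcpServers</code> example shown earlier; treat the exact shape as an assumption):

```shell
# Sketch: a per-server "trust" flag, so the server's tool calls are
# auto-approved. Writing to a local settings.json for illustration;
# the real file is ~/.gemini/settings.json.
cat > settings.json <<'EOF'
{
  "mcpServers": {
    "myserver": {
      "command": "python3",
      "args": ["-m", "my_mcp_server", "--port", "8080"],
      "trust": true
    }
  }
}
EOF
grep '"trust"' settings.json
```
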
You can control safety with settings like <code>trust: true</code> for a server (which auto-approves its tool actions) or by whitelisting specific safe tools and blacklisting dangerous <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=,takes%20precedence">ones</a>.</p><p>In short, <strong>MCP servers unlock limitless integration</strong>. They&#8217;re a pro feature that lets Gemini CLI become the glue between your AI assistant and whatever system you need it to work with. If you&#8217;re interested in building one, check out the official <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Transport%20">MCP guide</a> and community examples.</p><h2><strong>Tip 4: Leverage Memory Addition &amp; Recall</strong></h2><p><strong>Quick use-case:</strong> Keep important facts at your AI&#8217;s fingertips by adding them to its long-term memory. For example, after figuring out a database port or an API token, you can do:</p><pre><code>/memory add "Our staging RabbitMQ is on port 5673"</code></pre><p>This will store that fact so you (or the AI) don&#8217;t forget it <a href="https://binaryverseai.com/gemini-cli-open-source-ai-tool/#:~:text=Gemini%20CLI%20Ultimate%20Agent%3A%2060,a%20branch%20of%20conversation">later</a>. You can then recall everything in memory with <code>/memory show</code> at any time.</p><p>The <code>/memory</code> commands provide a simple but powerful mechanism for <em>persistent memory</em>. When you use <code>/memory add &lt;text&gt;</code>, the given text is appended to your project&#8217;s global context (technically, it&#8217;s saved into the global <code>~/.gemini/GEMINI.md</code> file or the project&#8217;s <code>GEMINI.md</code>). 
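Conceptually, that append is simple - this is a rough sketch rather than the CLI's actual implementation, with a local file standing in for <code>~/.gemini/GEMINI.md</code>:

```shell
# Rough sketch of what /memory add does: append the fact as a bullet
# to a persistent context file the CLI reloads on each session.
MEMFILE=GEMINI.md        # stands in for ~/.gemini/GEMINI.md
touch "$MEMFILE"
echo '- Our staging RabbitMQ is on port 5673' >> "$MEMFILE"
grep 'RabbitMQ' "$MEMFILE"
```
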
</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!pgoR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F058154ba-d894-488a-8d44-6590fa2d8a6e_2048x1047.jpeg" width="1456" height="744" alt="" loading="lazy"></figure></div><p>It&#8217;s a bit like taking a note and pinning it to the AI&#8217;s virtual bulletin board. Once added, the AI will always see that note in the prompt context for future interactions, across sessions.</p><p>Consider an example: you&#8217;re debugging an issue and discover a non-obvious insight (&#8220;The config flag <code>X_ENABLE</code> must be set to <code>true</code> or the service fails to start&#8221;). If you add this to memory, then later, when you or the AI revisit a related problem, it won&#8217;t overlook this critical detail - it&#8217;s in the context.</p><p><strong>Using </strong><code>/memory</code><strong>:</strong></p><ul><li><p><code>/memory add "&lt;text&gt;"</code> - Add a fact or note to memory (persistent context). 
This updates the <code>GEMINI.md</code> immediately with the new entry.</p></li><li><p><code>/memory show</code> - Display the full content of the memory (i.e. the combined context file that&#8217;s currently loaded).</p></li><li><p><code>/memory refresh</code> - Reload the context from disk (useful if you manually edited the <code>GEMINI.md</code> file outside of Gemini CLI, or if multiple people are collaborating on it).</p></li></ul><p>Because the memory is stored in Markdown, you can also manually edit the <code>GEMINI.md</code> file to curate or organize the info. The <code>/memory</code> commands are there for convenience during conversation, so you don&#8217;t have to open an editor.</p><p><strong>Pro Tip:</strong> This feature is great for &#8220;decision logs.&#8221; If you decide on an approach or rule during a chat (e.g., a certain library to use, or an agreed code style), add it to memory. The AI will then recall that decision and avoid contradicting it later. It&#8217;s especially useful in long sessions that might span hours or days - by saving key points, you mitigate the model&#8217;s tendency to forget earlier context when the conversation gets long.</p><p>Another use is personal notes. Because <code>~/.gemini/GEMINI.md</code> (global memory) is loaded for all sessions, you could put general preferences or information there. For example, &#8220;The user&#8217;s name is Alice. Speak politely and avoid slang.&#8221; It&#8217;s like configuring the AI&#8217;s persona or global knowledge. Just be aware that global memory applies to <em>all</em> projects, so don&#8217;t clutter it with project-specific info.</p><p>In summary, <strong>Memory Addition &amp; Recall</strong> helps Gemini CLI maintain state. Think of it as a knowledge base that grows with your project. 
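</p><p>To make this concrete, after the <code>/memory add</code> calls above your <code>GEMINI.md</code> might contain a section along these lines (the exact formatting varies; the entries echo the examples from this tip):</p><pre><code># Added memories

- Our staging RabbitMQ is on port 5673
- The config flag X_ENABLE must be set to true or the service fails to start</code></pre><p>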
Use it to avoid repeating yourself or to remind the AI of facts it would otherwise have to rediscover from scratch.</p><h2><strong>Tip 5: Use Checkpointing and </strong><code>/restore</code><strong> as an Undo Button</strong></h2><p><strong>Quick use-case:</strong> If Gemini CLI makes a series of changes to your files that you&#8217;re not happy with, you can <em>instantly roll back</em> to a prior state. Enable checkpointing when you start Gemini (or in settings), and use the <code>/restore</code> command to undo changes like a lightweight Git <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=,Exit%20the%20Gemini%20CLI">revert</a>. <code>/restore</code> rolls back your workspace to the saved checkpoint; conversation state may be affected depending on how the checkpoint was captured.</p><p>Gemini CLI&#8217;s <strong>checkpointing</strong> feature acts as a safety net. When enabled, the CLI takes a snapshot of your project&#8217;s files <em>before</em> each tool execution that modifies <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=When%20,snapshot%20before%20tools%20modify%20files">files</a>. If something goes wrong, you can revert to the last known good state. It&#8217;s essentially version control for the AI&#8217;s actions, without you needing to manually commit to Git each time.</p><p><strong>How to use it:</strong> You can turn on checkpointing by launching the CLI with the <code>--checkpointing</code> flag:</p><pre><code>gemini --checkpointing</code></pre><p>Alternatively, you can make it the default by adding to your config (<code>"checkpointing": { "enabled": true }</code> in <code>settings.json</code>). 
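</p><p>Put differently, a minimal <code>settings.json</code> that turns checkpointing on by default would look like this (any other settings you already have stay alongside it):</p><pre><code>{
  "checkpointing": {
    "enabled": true
  }
}</code></pre><p>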
Once active, you&#8217;ll notice that each time Gemini is about to write to a file, it says something like &#8220;Checkpoint saved.&#8221;</p><p>If you then realize an AI-made edit is problematic, you have two options:</p><ul><li><p>Run <code>/restore list</code> (or just <code>/restore</code> with no arguments) to see a list of recent checkpoints with timestamps and descriptions.</p></li><li><p>Run <code>/restore &lt;id&gt;</code> to roll back to a specific checkpoint. If you omit the id and there&#8217;s only one pending checkpoint, it will restore that by <a href="https://medium.com/@ferreradaniel/gemini-cli-free-ai-tool-upgrade-5-new-features-you-need-right-now-04cfefac5e93#:~:text=Step">default</a>.</p></li></ul><p>For example:</p><pre><code>/restore</code></pre><p>Gemini CLI might output:</p><pre><code>0: [2025-09-22 10:30:15] Before running 'apply_patch'
1: [2025-09-22 10:45:02] Before running 'write_file'</code></pre><p>You can then do <code>/restore 0</code> to revert all file changes (and possibly the conversation context) back to how things were at that checkpoint. In this way, you can &#8220;undo&#8221; a mistaken code refactor or any other changes Gemini <a href="https://medium.com/@ferreradaniel/gemini-cli-free-ai-tool-upgrade-5-new-features-you-need-right-now-04cfefac5e93#:~:text=1,point%20and%20roll%20back%20instantly">made</a>.</p><p><strong>What gets restored:</strong> The checkpoint captures the state of your working directory (all files that Gemini CLI is allowed to modify); conversation state may also be rolled back, depending on how the checkpoint was captured. When you restore, files are overwritten with their checkpointed versions, effectively rewinding the workspace to that snapshot. It&#8217;s like time-traveling the AI agent back to before it made the wrong turn. 
Note that it won&#8217;t undo external side effects (for example, if the AI ran a database migration, it can&#8217;t undo that), but anything in the file system and chat context is fair game.</p><p><strong>Best practices:</strong> It&#8217;s a good idea to keep checkpointing on for non-trivial tasks. The overhead is small, and it provides peace of mind. If you find you don&#8217;t need a checkpoint (everything went well), you can always clear it or just let the next one overwrite it. The development team recommends using checkpointing especially before multi-step code <a href="https://medium.com/@ferreradaniel/gemini-cli-free-ai-tool-upgrade-5-new-features-you-need-right-now-04cfefac5e93#:~:text=Tips%20to%20avoid%20messy%20rollbacks">edits</a>. For mission-critical projects, though, you should still use proper version control (<code>git</code>) as your primary safety <a href="https://medium.com/@ferreradaniel/gemini-cli-free-ai-tool-upgrade-5-new-features-you-need-right-now-04cfefac5e93#:~:text=No,VS%20Code%20is%20already%20free">net</a> - consider checkpoints as a convenience for quick undo rather than a full VCS.</p><p>In essence, <code>/restore</code> lets you use Gemini CLI with confidence. You can let the AI attempt bold changes, knowing you have an <em>&#8220;OH NO&#8221; button</em> to rewind if needed.</p><h2><strong>Tip 6: Read Google Docs, Sheets, and More</strong></h2><p><strong>Quick use-case:</strong> Imagine you have a Google Doc or Sheet with some specs or data that you want the AI to use. Instead of copy-pasting the content, you can provide the link, and with a configured Workspace MCP server Gemini CLI can fetch and read it.</p><p>For example:</p><pre><code>Summarize the requirements from this design doc: https://docs.google.com/document/d/&lt;id&gt;</code></pre><p>Gemini can pull in the content of that Doc and incorporate it into its response. 
Similarly, it can read Google Sheets or Drive files by link.</p><p><strong>How this works:</strong> These capabilities are typically enabled via <strong>MCP integrations</strong>. Google&#8217;s Gemini CLI team  is working on connectors for Google Workspace. One approach is running a small MCP server that uses Google&#8217;s APIs (Docs API, Sheets API, etc.) to retrieve document content when given a URL or <a href="https://github.com/google-gemini/gemini-cli/issues/7175">ID</a>. When configured, you might have slash commands or tools like <code>/read_google_doc</code> or simply an auto-detection that sees a Google Docs link and invokes the appropriate tool to fetch it.</p><p>For example, in an Agent Factory podcast demo, the team used a <strong>Google Docs MCP</strong> to save a summary directly to a <a href="https://cloud.google.com/blog/topics/developers-practitioners/agent-factory-recap-deep-dive-into-gemini-cli-with-taylor-mullen#:~:text=%2A%20Utilize%20the%20google,summary%20directly%20to%20Google%20Docs">doc</a> - which implies they could also read the doc&#8217;s content in the first place. In practice, you might do something like:</p><pre><code>@https://docs.google.com/document/d/XYZ12345</code></pre><p>Including a URL with <code>@</code> (the context reference syntax) signals Gemini CLI to fetch that resource. With a Google Doc integration in place, the content of that document would be pulled in as if it were a local file. From there, the AI can summarize it, answer questions about it, or otherwise use it in the conversation.</p><p>Similarly, if you paste a Google Drive <strong>file link</strong>, a properly configured Drive tool could download or open that file (assuming permissions and API access are set up). 
<strong>Google Sheets</strong> could be made available via an MCP that runs queries or reads cell ranges, enabling you to ask things like &#8220;What&#8217;s the sum of the budget column in this Sheet [link]?&#8221; and have the AI calculate it.</p><p><strong>Setting it up:</strong> As of this writing, the Google Workspace integrations may require some tinkering (obtaining API credentials, running an MCP server such as the one described by <a href="https://medium.com/google-cloud/managing-google-docs-sheets-and-slides-by-natural-language-with-gemini-cli-and-mcp-62f4dfbef2d5#:~:text=To%20implement%20this%20approach%2C%20I,methods%20for%20each%20respective%20API">Kanshi Tanaike</a>, etc.). Keep an eye on the official Gemini CLI repository and community forums for ready-to-use extensions - for example, an official Google Docs MCP might become available as a plugin/extension. If you&#8217;re eager, you can write one following guides on how to use Google APIs within an MCP <a href="https://github.com/google-gemini/gemini-cli/issues/7175#:~:text=">server</a>. It typically involves handling OAuth (which Gemini CLI supports for MCP servers) and then exposing tools like <code>read_google_doc</code>.</p><p><strong>Usage tip:</strong> When you have these tools, using them can be as simple as providing the link in your prompt (the AI might automatically invoke the tool to fetch it) or using a slash command like <code>/doc open &lt;URL&gt;</code>. Check <code>/tools</code> to see what commands are available - Gemini CLI lists all tools and custom commands <a href="https://dev.to/therealmrmumba/7-insane-gemini-cli-tips-that-will-make-you-a-superhuman-developer-2d7h#:~:text=Gemini%20CLI%20includes%20dozens%20of,can%20supercharge%20your%20dev%20process">there</a>.</p><p>In summary, <strong>Gemini CLI can reach out beyond your local filesystem</strong>. Whether it&#8217;s Google Docs, Sheets, Drive, or other external content, you can pull data in by reference. 
This pro tip saves you from manual copy-paste and keeps the context flow natural - just refer to the document or dataset you need, and let the AI grab what&#8217;s needed. It makes Gemini CLI a true <strong>knowledge assistant</strong> for all the information you have access to, not just the files on your disk.</p><p><em>(Note: Accessing private documents of course requires the CLI to have the appropriate permissions. Always ensure any integration respects security and privacy. In corporate settings, setting up such integrations might involve additional auth steps.)</em></p><h2><strong>Tip 7: Reference Files and Images with </strong><code>@</code><strong> for Explicit Context</strong></h2><p><strong>Quick use-case:</strong> Instead of describing a file&#8217;s content or an image verbally, just point Gemini CLI directly to it. Using the <code>@</code> syntax, you can attach files, directories, or images into your prompt. This guarantees the AI sees exactly what&#8217;s in those files as <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Reference%20files%20or%20directories%20in,PDFs%2C%20audio%2C%20and%20video%20files">context</a>. For example:</p><pre><code>Explain this code to me: @./src/main.js</code></pre><p>This will include the contents of <code>src/main.js</code> in the prompt (up to Gemini&#8217;s context size limits), so the AI can read it and explain <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Include%20a%20single%20file%3A">it</a>.</p><p>This <code>@</code> <em>file reference</em> is one of Gemini CLI&#8217;s most powerful features for developers. It eliminates ambiguity - you&#8217;re not asking the model to rely on memory or guesswork about the file, you&#8217;re literally handing it the file to read. You can use this for source code, text documents, logs, etc. 
Similarly, you can reference <strong>entire directories</strong>:</p><pre><code>Refactor the code in @./utils/ to use async/await.</code></pre><p>By appending a path that ends in a slash, Gemini CLI will recursively include files from that <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Include%20a%20whole%20directory%20">directory</a> (within reason, respecting ignore files and size limits). This is great for multi-file refactors or analyses, as the AI can consider all relevant modules together.</p><p>Even more impressively, you can reference <strong>binary files like images</strong> in prompts. Gemini CLI (using the Gemini model&#8217;s multimodal capabilities) can understand images. For example:</p><pre><code>Describe what you see in this screenshot: @./design/mockup.png</code></pre><p>The image will be fed into the model, and the AI might respond with something like &#8220;This is a login page with a blue sign-in button and a header image,&#8221; <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Include%20an%20image%3A">etc</a>. You can imagine the uses: reviewing UI mockups, organizing photos (as we&#8217;ll see in a later tip), or extracting text from images (Gemini can do OCR as well).</p><p>A few notes on using <code>@</code> references effectively:</p><ul><li><p><strong>File limits:</strong> Gemini 2.5 Pro has a huge context window (up to 1 million <a href="https://blog.google/technology/developers/introducing-gemini-cli-open-source-ai-agent/#:~:text=To%20use%20Gemini%20CLI%20free,per%20day%20at%20no%20charge">tokens</a>), so you can include quite large files or many files. However, extremely large files might be truncated. If a file is enormous (say, hundreds of thousands of lines), consider summarizing it or breaking it into parts. 
Gemini CLI will warn you if a reference is too large or if it skipped something due to size.</p></li><li><p><strong>Automatic ignoring:</strong> By default, Gemini CLI respects your <code>.gitignore</code> and <code>.geminiignore</code> files when pulling in directory <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Reference%20files%20or%20directories%20in,PDFs%2C%20audio%2C%20and%20video%20files">context</a>. So if you <code>@./</code> a project root, it will not dump huge ignored folders (like <code>node_modules</code>) into the prompt. You can customize ignore patterns with <code>.geminiignore</code> similarly to how <code>.gitignore</code> works.</p></li><li><p><strong>Explicit vs implicit context:</strong> Taylor Mullen (the creator of Gemini CLI) emphasizes using <code>@</code> for <em>explicit context injection</em> rather than relying on the model&#8217;s memory or summarizing things yourself. It&#8217;s more precise and ensures the AI isn&#8217;t hallucinating content. Whenever possible, point the AI to the source of truth (code, config files, documentation) with <code>@</code> references. This practice can significantly improve accuracy.</p></li><li><p><strong>Chaining references:</strong> You can include multiple files in one prompt, like:</p></li></ul><pre><code>Compare @./foo.py and @./bar.py and tell me differences.</code></pre><p>The CLI will include both files. Just be mindful of token limits; multiple large files might consume a lot of the context window.</p><p>Using <code>@</code> is essentially how you <strong>feed knowledge into Gemini CLI on the fly</strong>. It turns the CLI into a multi-modal reader that can handle text and images. As a pro user, get into the habit of leveraging this - it&#8217;s often faster and more reliable than asking the AI something like &#8220;Open the file X and do Y&#8221; (which it may or may not do on its own). 
Instead, you explicitly give it X to work with.</p><h2><strong>Tip 8: On-the-Fly Tool Creation (Have Gemini Build Helpers)</strong></h2><p><strong>Quick use-case:</strong> If a task at hand would benefit from a small script or utility, you can ask Gemini CLI to create that tool for you - right within your session. For example, you might say, &#8220;Write a Python script to parse all JSON files in this folder and extract the error fields.&#8221; Gemini can generate the script, which you can then execute via the CLI. In essence, you can <strong>dynamically extend the toolset</strong> as you go.</p><p>Gemini CLI is not limited to its pre-existing tools; it can use its coding abilities to fabricate new ones when needed. This often happens implicitly: if you ask for something complex, the AI might propose writing a temporary file (with code) and then running it. As a user, you can also guide this process explicitly:</p><ul><li><p><strong>Creating scripts:</strong> You can prompt Gemini to create a script or program in the language of your choice. It will likely use the <code>write_file</code> tool to create the file. For instance:</p></li></ul><pre><code>Generate a Node.js script that reads all &#8216;.log&#8217; files in the current directory and reports the number of lines in each.</code></pre><p>Gemini CLI will draft the code, and with your approval, write it to a file (e.g. <code>script.js</code>). You can then run it by either using the <code>!</code> shell command (e.g. <code>!node script.js</code>) or by asking Gemini CLI to execute it (the AI might automatically use <code>run_shell_command</code> to execute the script it just wrote, if it deems it part of the plan).</p><ul><li><p><strong>Temporary tools via MCP:</strong> In advanced scenarios, the AI might even suggest launching an MCP server for some specialized tasks. 
For example, if your prompt involves some heavy text processing that might be better done in Python, Gemini could generate a simple MCP server in Python and run it. While this is more rare, it demonstrates that the AI can set up a new &#8220;agent&#8221; on the fly. (One of the slides from the Gemini CLI team humorously referred to &#8220;MCP servers for everything, even one called LROwn&#8221; - suggesting you can have Gemini run an instance of itself or another model, though that&#8217;s more of a trick than a practical use!).</p></li></ul><p>The key benefit here is <strong>automation</strong>. Instead of you manually stopping to write a helper script, you can let the AI do it as part of the flow. It&#8217;s like having an assistant who can create tools on-demand. This is especially useful for data transformation tasks, batch operations, or one-off computations that the built-in tools don&#8217;t directly provide.</p><p><strong>Nuances and safety:</strong> When Gemini CLI writes code for a new tool, you should still review it before running. The <code>/diff</code> view (Gemini will show you the file diff before you approve writing it) is your chance to inspect the <a href="https://medium.com/@ferreradaniel/gemini-cli-free-ai-tool-upgrade-5-new-features-you-need-right-now-04cfefac5e93#:~:text=Nobody%20enjoys%20switching%20between%20windows,track%20changes%20line%20by%20line">code</a>. Ensure it does what you expect and nothing malicious or destructive (the AI shouldn&#8217;t produce something harmful unless your prompt explicitly asks, but just like any code from an AI, double-check logic, especially for scripts that delete or modify lots of data).</p><p><strong>Example scenario:</strong> Let&#8217;s say you have a CSV file and you want to filter it in a complex way. You ask Gemini CLI to do it, and it might say: &#8220;I will write a Python script to parse the CSV and apply the filter.&#8221; It then creates <code>filter_data.py</code>. 
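</p><p>Such a <code>filter_data.py</code> might be little more than a dozen lines. Here&#8217;s a sketch of the kind of script it could produce (the column name and threshold are invented for illustration):</p><pre><code>import csv

def filter_rows(src_path, dst_path, column="amount", threshold=1000.0):
    # Copy rows from src_path to dst_path, keeping only those whose
    # numeric `column` value exceeds `threshold`.
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if float(row[column]) > threshold:
                writer.writerow(row)</code></pre><p>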
After you approve and it runs, you get your result, and you might never need that script again. This ephemeral creation of tools is a pro move - it shows the AI effectively extending its capabilities autonomously.</p><p><strong>Pro Tip:</strong> If you find the script useful beyond the immediate context, you can promote it into a permanent tool or command. For instance, if the AI generated a great log-processing script, you might later turn it into a custom slash command (Tip #2) for easy reuse. The combination of Gemini&#8217;s generative power and the extension hooks means your toolkit can continuously evolve as you use the CLI.</p><p>In summary, <strong>don&#8217;t restrict Gemini to what it comes with</strong>. Treat it as a junior developer who can whip up new programs or even mini-servers to help solve the problem. This approach embodies the agentic philosophy of Gemini CLI - it will figure out what tools it needs, even if it has to code them on the spot.</p><h2><strong>Tip 9: Use Gemini CLI for System Troubleshooting &amp; Configuration</strong></h2><p><strong>Quick use-case:</strong> You can run Gemini CLI outside of a code project to help with general system tasks - think of it as an intelligent assistant for your OS. For example, if your shell is misbehaving, you could open Gemini in your home directory and ask: &#8220;Fix my <code>.bashrc</code> file, it has an error.&#8221; Gemini can then open and edit your config file for you.</p><p>This tip highlights that <strong>Gemini CLI isn&#8217;t just for coding projects - it&#8217;s your AI helper for your whole development environment</strong>. Many users have used Gemini to customize their dev setup or fix issues on their machine:</p><ul><li><p><strong>Editing dotfiles:</strong> You can load your shell configuration (<code>.bashrc</code> or <code>.zshrc</code>) by referencing it (<code>@~/.bashrc</code>) and then ask Gemini CLI to optimize or troubleshoot it. 
For instance, &#8220;My <code>PATH</code> isn&#8217;t picking up Go binaries, can you edit my <code>.bashrc</code> to fix that?&#8221; The AI can insert the correct <code>export</code> line. It will show you the diff for confirmation before saving changes.</p></li><li><p><strong>Diagnosing errors:</strong> If you encounter a cryptic error in your terminal or an application log, you can copy it and feed it to Gemini CLI. It will analyze the error message and often suggest steps to resolve it. This is similar to how one might use StackOverflow or Google, but with the AI directly examining your scenario. For example: &#8220;When I run <code>npm install</code>, I get an <code>EACCES</code> permission error - how do I fix this?&#8221; Gemini might detect it&#8217;s a permissions issue in <code>node_modules</code> and guide you to change directory ownership or use a proper node version manager.</p></li><li><p><strong>Running outside a project:</strong> By default, if you run <code>gemini</code> in a directory without a <code>.gemini</code> context, it just means no project-specific context is loaded - but you can still use the CLI fully. This is great for ad-hoc tasks like system troubleshooting. You might not have any code files for it to consider, but you can still run shell commands through it or let it fetch web info. Essentially, you&#8217;re treating Gemini CLI as an AI-powered terminal that can <em>do</em> things for you, not just chat.</p></li><li><p><strong>Workstation customization:</strong> Want to change a setting or install a new tool? You can ask Gemini CLI, &#8220;Install Docker on my system&#8221; or &#8220;Configure my Git to sign commits with GPG.&#8221; The CLI will attempt to execute the steps. It might fetch instructions from the web (using the search tool) and then run the appropriate shell commands. Of course, always watch what it&#8217;s doing and approve the commands - but it can save time by automating multi-step setup processes. 
One real example: a user asked Gemini CLI to &#8220;set my macOS Dock preferences to auto-hide and remove the delay,&#8221; and the AI was able to execute the necessary <code>defaults write</code> commands.</p></li></ul><p>Think of this mode as using Gemini CLI as a <strong>smart shell</strong>. In fact, you can combine this with Tip 16 (shell passthrough mode) - sometimes you might drop into <code>!</code> shell mode to verify something, then go back to AI mode to have it analyze output.</p><p><strong>Caveat:</strong> When doing system-level tasks, be cautious with commands that have widespread impact (like <code>rm -rf</code> or system config changes). Gemini CLI will usually ask for confirmation, and it doesn&#8217;t run anything without you seeing it. But as a power user, you should have a sense of what changes are being made. If unsure, ask Gemini to explain a command before running (e.g., &#8220;Explain what <code>defaults write com.apple.dock autohide-delay -float 0</code> does&#8221; - it will gladly explain rather than just execute if you prompt it in that way).</p><p><strong>Troubleshooting bonus:</strong> Another neat use is using Gemini CLI to parse logs or config files looking for issues. For instance, &#8220;Scan this Apache config for mistakes&#8221; (with <code>@httpd.conf</code>), or &#8220;Look through syslog for errors around 2 PM yesterday&#8221; (with an <code>@/var/log/syslog</code> if accessible). It&#8217;s like having a co-administrator. It can even suggest likely causes for crashes or propose fixes for common error patterns.</p><p>In summary, <strong>don&#8217;t hesitate to fire up Gemini CLI as your assistant for environment issues</strong>. It&#8217;s there to accelerate all your workflows - not just writing code, but maintaining the system that you write code on. 
Many users report that customizing their dev environment with Gemini&#8217;s help feels like having a tech buddy always on call to handle the tedious or complex setup steps.</p><h2><strong>Tip 10: YOLO Mode - Auto-Approve Tool Actions (Use with Caution)</strong></h2><p><strong>Quick use-case:</strong> If you&#8217;re feeling confident (or adventurous), you can let Gemini CLI run tool actions without asking for your confirmation each time. This is <strong>YOLO mode</strong> (You Only Live Once). It&#8217;s enabled by the <code>--yolo</code> flag or by pressing <code>Ctrl+Y</code> during a <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=,prompt%20in%20an%20external%20editor">session</a>. In YOLO mode, as soon as the AI decides on a tool (like running a shell command or writing to a file), it executes it immediately, without that &#8220;Approve? (y/n)&#8221; prompt.</p><p><strong>Why use YOLO mode?</strong> Primarily for speed and convenience <strong>when you trust the AI&#8217;s actions</strong>. Experienced users might toggle YOLO on if they&#8217;re doing a lot of repetitive safe operations. For example, if you ask Gemini to generate 10 different files one after another, approving each can slow down the flow; YOLO mode would just let them all be written automatically. Another scenario is using Gemini CLI in a completely automated script or CI pipeline - you might run it headless with <code>--yolo</code> so it doesn&#8217;t pause for confirmation.</p><p>To start in YOLO mode from the get-go, launch the CLI with:</p><pre><code>gemini --yolo</code></pre><p>Or the short form <code>gemini -y</code>. You&#8217;ll see some indication in the CLI (like a different prompt or a notice) that auto-approve is <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=initial%20prompt.%20%2A%20%60,to%20revert%20changes">on</a>. 
During an interactive session, you can toggle it by pressing <strong>Ctrl+Y</strong> at any <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=,prompt%20in%20an%20external%20editor">time</a> - the CLI will usually display a message like &#8220;YOLO mode enabled (all actions auto-approved)&#8221; in the footer.</p><p><strong>Big warning:</strong> YOLO mode is powerful but <strong>risky</strong>. The Gemini team itself labels it for &#8220;daring users&#8221; - meaning you should be aware that the AI could potentially execute a dangerous command without asking. In normal mode, if the AI decided to run <code>rm -rf /</code> (worst-case scenario), you&#8217;d obviously decline. In YOLO mode, that command would run immediately (and likely ruin your day). While such extreme mistakes are unlikely (the AI&#8217;s system prompt includes safety guidelines), the whole point of confirmations is to catch any unwanted action. YOLO removes that safety net.</p><p><strong>Best practices for YOLO:</strong> If you want some of the convenience without full risk, consider <em>allow-listing</em> specific commands. For example, you can configure in settings that certain tools or command patterns don&#8217;t require confirmation (like allowing all <code>git</code> commands, or read-only actions). In fact, Gemini CLI supports a config for skipping confirmation on specific commands: e.g., you can set something like <code>"tools.shell.autoApprove": ["git ", "npm test"]</code> to always run <a href="https://google-gemini.github.io/gemini-cli/docs/cli/configuration.html#:~:text=match%20at%20L247%20%60%5B,Default%3A%20%60undefined">those</a>. This way, you might not need YOLO mode globally - you selectively YOLO only safe commands. 
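</p><p>As a concrete sketch, assuming the <code>settings.json</code> key spelled above (the exact key name and nesting can differ between CLI releases, so verify against the configuration docs for your version), the allow-list might look like:</p><pre><code>{
  "tools.shell.autoApprove": ["git ", "npm test"]
}</code></pre><p>With something like that in place, <code>git status</code> or <code>npm test</code> would run without a prompt, while anything outside the list still asks for approval.</p><p>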
Another approach: run Gemini in a sandbox or container when using YOLO, so even if it does something wild, your system is insulated (Gemini has a <code>--sandbox</code> flag to run tools in a Docker <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=echo%20,gemini">container</a>).</p><p>Many advanced users toggle YOLO on and off frequently - turning it on when doing a string of minor file edits or queries, and off when about to do something critical. You can do the same, using the keyboard shortcut as a quick toggle.</p><p>In summary, <strong>YOLO mode eliminates friction at the cost of oversight</strong>. It&#8217;s a pro feature to use sparingly and wisely. It truly demonstrates trust in the AI (or recklessness!). If you&#8217;re new to Gemini CLI, you should probably avoid YOLO until you clearly understand the patterns of what it tends to do. If you do use it, double down on having version control or backups - just in case.</p><p><em>(If it&#8217;s any consolation, you&#8217;re not alone - many in the community joke about &#8220;I YOLO&#8217;ed and Gemini did something crazy.&#8221; So use it, but... well, you only live once.)</em></p><h2><strong>Tip 11: Headless &amp; Scripting Mode (Run Gemini CLI in the Background)</strong></h2><p><strong>Quick use-case:</strong> You can use Gemini CLI in scripts or automation by running it in <strong>headless mode</strong>. This means you provide a prompt (or even a full conversation) via command-line arguments or environment variables, and Gemini CLI produces an output and exits. It&#8217;s great for integrating with other tools or triggering AI tasks on a schedule.</p><p>For instance, to get a one-off answer without opening the REPL, you&#8217;ve seen you can use <code>gemini -p &#8220;...prompt...&#8221;</code>. 
This is already headless usage: it prints the model&#8217;s response and returns to the <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Non,and%20get%20a%20single%20response">shell</a>. But there&#8217;s more you can do:</p><ul><li><p><strong>System prompt override:</strong> If you want to run Gemini CLI with a custom system persona or instruction set (different from the default), you can use the environment variable <code>GEMINI_SYSTEM_MD</code>. By setting this, you tell Gemini CLI to ignore its built-in system prompt and use your provided file <a href="https://medium.com/google-cloud/practical-gemini-cli-bring-your-own-system-instruction-19ea7f07faa2#:~:text=The%20,rather%20than%20its%20hardcoded%20defaults">instead</a>. For example:</p></li></ul><pre><code>export GEMINI_SYSTEM_MD="/path/to/custom_system.md"
gemini -p "Perform task X with high caution"</code></pre><p>This would load your <code>custom_system.md</code> as the system prompt (the &#8220;role&#8221; and rules the AI follows) before executing the <a href="https://medium.com/google-cloud/practical-gemini-cli-bring-your-own-system-instruction-19ea7f07faa2#:~:text=The%20feature%20is%20enabled%20by,specific%20configurations">prompt</a>. Alternatively, if you set <code>GEMINI_SYSTEM_MD=true</code>, the CLI will look for a file named <code>system.md</code> in the current project&#8217;s <code>.gemini</code> <a href="https://medium.com/google-cloud/practical-gemini-cli-bring-your-own-system-instruction-19ea7f07faa2#:~:text=The%20feature%20is%20enabled%20by,specific%20configurations">directory</a>. This feature is very advanced - it essentially allows you to <em>replace the built-in brain</em> of the CLI with your own instructions, which some users do for specialized workflows (like simulating a specific persona or enforcing ultra-strict policies). Use it carefully, as replacing the core prompt can affect tool usage (the core prompt contains important directions for how the AI selects and uses <a href="https://medium.com/google-cloud/practical-gemini-cli-bring-your-own-system-instruction-19ea7f07faa2#:~:text=If%20you%20read%20my%20previous,proper%20functioning%20of%20Gemini%20CLI">tools</a>).</p><ul><li><p><strong>Direct prompt via CLI:</strong> Aside from <code>-p</code>, there&#8217;s also <code>-i</code> (interactive prompt) which starts a session with an initial prompt, and then keeps it open. For example: <code>gemini -i "Hello, let's debug something"</code> will open the REPL and already have said hello to the model. This is useful if you want the first question to be asked immediately when starting.</p></li><li><p><strong>Scripting with shell pipes:</strong> You can pipe not just text but also files or command outputs into Gemini. 
For example: <code>gemini -p "Summarize this log:" &lt; big_log.txt</code> will feed the content of <code>big_log.txt</code> into the prompt (after the phrase &#8220;Summarize this log:&#8221;). Or you might do <code>some_command | gemini -p "Given the above output, what went wrong?"</code>. This technique allows you to compose Unix tools with AI analysis. It&#8217;s headless in the sense that it&#8217;s a single-pass operation.</p></li><li><p><strong>Running in CI/CD:</strong> You could incorporate Gemini CLI into build processes. For instance, a CI pipeline might run a test and then use Gemini CLI to automatically analyze failing test output and post a comment. Using the <code>-p</code> flag and environment auth, this can be scripted. (Of course, ensure the environment has the API key or auth needed.)</p></li></ul><p>One more headless trick: <strong>the </strong><code>--format=json</code><strong> flag</strong> (or config setting). Gemini CLI can output responses in JSON format instead of the human-readable text if you configure <a href="https://google-gemini.github.io/gemini-cli/docs/cli/configuration.html#:~:text=">it</a>. This is useful for programmatic consumption - your script can parse the JSON to get the answer or any tool action details.</p><p><strong>Why headless mode matters:</strong> It transforms Gemini CLI from an interactive assistant into a <strong>backend service</strong> or utility that other programs can call. You could schedule a cronjob that runs a Gemini CLI prompt nightly (imagine generating a report or cleaning up something with AI logic). You could wire up a button in an IDE that triggers a headless Gemini run for a specific task.</p><p><strong>Example:</strong> Let&#8217;s say you want a daily summary of a news website. 
You could have a script:</p><pre><code>gemini -p "Web-fetch \"https://news.site/top-stories\" and extract the headlines, then write them to headlines.txt"</code></pre><p>With <code>--yolo</code> perhaps, so it won&#8217;t ask for confirmation before writing the file. This would use the web fetch tool to get the page and the file write tool to save the headlines. All automatically, no human in the loop. The possibilities are endless once you treat Gemini CLI as a scriptable component.</p><p>In summary, <strong>Headless Mode</strong> enables automation. It&#8217;s the bridge between Gemini CLI and other systems. Mastering it means you can scale up your AI usage - not just when you&#8217;re typing in the terminal, but even when you aren&#8217;t around, your AI agent can do work for you.</p><p><em>(Tip: For truly long-running non-interactive tasks, you might also look into Gemini CLI&#8217;s &#8220;Plan&#8221; mode or how it can generate multi-step plans without intervention. However, those are advanced topics beyond this scope. In most cases, a well-crafted single prompt via headless mode can achieve a lot.)</em></p><h2><strong>Tip 12: Save and Resume Chat Sessions</strong></h2><p><strong>Quick use-case:</strong> If you&#8217;ve been debugging an issue with Gemini CLI for an hour and need to stop, you don&#8217;t have to lose the conversation context. Use <code>/chat save &lt;name&gt;</code> to save the session. Later (even after restarting the CLI), you can use <code>/chat resume &lt;name&gt;</code> to pick up where you left <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=,help%20information%20and%20available%20commands">off</a>. This way, long-running conversations can be paused and continued seamlessly.</p><p>Gemini CLI essentially has a built-in chat session manager. 
The commands to know are:</p><ul><li><p><code>/chat save &lt;tag&gt;</code> - Saves the current conversation state under a tag/name you <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=,help%20information%20and%20available%20commands">provide</a>. The tag is like a filename or key for that session. Save as often as you like; saving to an existing tag overwrites it. (Using a descriptive name is helpful - e.g., <code>/chat save fix-docker-issue</code>.)</p></li><li><p><code>/chat list</code> - Lists all your saved sessions (the tags you&#8217;ve <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=,help%20information%20and%20available%20commands">used</a>). This helps you remember what you named previous saves.</p></li><li><p><code>/chat resume &lt;tag&gt;</code> - Resumes the session with that tag, restoring the entire conversation context and history to how it was when <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=,help%20information%20and%20available%20commands">saved</a>. It&#8217;s like you never left. You can then continue chatting from that point.</p></li><li><p><code>/chat share</code> - Saves the conversation to a file, which is useful because you can hand the entire chat to someone else who can continue the session. Almost collaboration-like.</p></li></ul><p>Under the hood, these sessions are likely stored in <code>~/.gemini/chats/</code> or a similar location. They include the conversation messages and any relevant state. This feature is super useful for cases such as:</p><ul><li><p><strong>Long debugging sessions:</strong> Sometimes debugging with an AI can be a long back-and-forth. If you can&#8217;t solve it in one go, save it and come back later (maybe with a fresh mind). 
The AI will still &#8220;remember&#8221; everything from before, because the whole context is reloaded.</p></li><li><p><strong>Multi-day tasks:</strong> If you&#8217;re using Gemini CLI as an assistant for a project, you might have one chat session for &#8220;Refactor module X&#8221; that spans multiple days. You can resume that specific chat each day so the context doesn&#8217;t reset daily. Meanwhile, you might have another session for &#8220;Write documentation&#8221; saved separately. Switching contexts is just a matter of saving one and resuming the other.</p></li><li><p><strong>Team hand-off:</strong> This is more experimental, but in theory, you could share the content of a saved chat with a colleague (the saved files are likely portable). If they put it in their <code>.gemini</code> directory and resume, they could see the same context. The <strong>practical simpler approach</strong> for collaboration is just copying the relevant Q&amp;A from the log and using a shared <code>GEMINI.md</code> or prompt, but it&#8217;s interesting to note that the session data is yours to keep.</p></li></ul><p><strong>Usage example:</strong></p><pre><code>/chat save api-upgrade</code></pre><p><em>(Session saved as &#8220;api-upgrade&#8221;)</em></p><pre><code>/quit</code></pre><p><em>(Later, reopen CLI)</em></p><pre><code>$ gemini
gemini&gt; /chat list</code></pre><p><em>(Shows: api-upgrade)</em></p><pre><code>gemini&gt; /chat resume api-upgrade</code></pre><p>Now the model greets you with the last exchange&#8217;s state ready. You can confirm by scrolling up that all your previous messages are present.</p><p><strong>Pro Tip:</strong> Use meaningful tags when saving <a href="https://medium.com/@ferreradaniel/gemini-cli-free-ai-tool-upgrade-5-new-features-you-need-right-now-04cfefac5e93#:~:text=Naming%20conventions%20to%20keep%20projects,organized">chats</a>. Instead of <code>/chat save session1</code>, give it a name related to the topic (e.g. <code>/chat save memory-leak-bug</code>). This will help you find the right one later via <code>/chat list</code>. There is no strict limit announced on how many sessions you can save, but cleaning up old ones occasionally might be wise just for organization.</p><p>This feature turns Gemini CLI into a persistent advisor. You don&#8217;t lose knowledge gained in a conversation; you can always pause and resume. It&#8217;s a differentiator compared to some other AI interfaces that forget context when closed. For power users, it means <strong>you can maintain parallel threads of work</strong> with the AI. Just like you&#8217;d have multiple terminal tabs for different tasks, you can have multiple chat sessions saved and resume the one you need at any given time.</p><h2><strong>Tip 13: Multi-Directory Workspace - One Gemini, Many Folders</strong></h2><p><strong>Quick use-case:</strong> Do you have a project split across multiple repositories or directories? You can launch Gemini CLI with access to <em>all of them</em> at once, so it sees a unified workspace. 
For example, if your frontend and backend are separate folders, you can include both so that Gemini can edit or reference files in both.</p><p>There are two ways to use <strong>multi-directory mode</strong>:</p><ul><li><p><strong>Launch flag:</strong> Use the <code>--include-directories</code> (or <code>-I</code>) flag when starting Gemini CLI. For example:</p></li></ul><pre><code>gemini --include-directories "../backend:../frontend"</code></pre><p>This assumes you run the command from, say, a <code>scripts</code> directory and want to include two sibling folders. You provide a colon-separated list of paths. Gemini CLI will then treat all those directories as part of one big workspace.</p><ul><li><p><strong>Persistent setting:</strong> In your <code>settings.json</code>, you can define <code>"includeDirectories": ["path1", "path2", ...]</code>. This is useful if you always want certain common directories loaded (e.g., a shared library folder that multiple projects use). The paths can be relative or absolute; environment variables and home-relative paths (like <code>~/common-utils</code>) are <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=,61AFEF%22%2C%20%22AccentPurple">allowed</a>.</p></li></ul><p>When multi-dir mode is active, the CLI&#8217;s context and tools consider files across all included locations. The <code>&gt; /directory show</code> command will list which directories are in the current <a href="https://medium.com/@ferreradaniel/gemini-cli-free-ai-tool-upgrade-5-new-features-you-need-right-now-04cfefac5e93#:~:text=How%20to%20add%20multiple%20directories,step">workspace</a>. 
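</p><p>Putting the persistent form together, a <code>settings.json</code> sketch (the folder names here are illustrative) might look like:</p><pre><code>{
  "includeDirectories": ["../backend", "../frontend", "~/common-utils"]
}</code></pre><p>Every listed root then joins the workspace on each launch, with no flag needed.</p><p>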
You can also dynamically add directories during a session with <code>/directory add &lt;path&gt;</code> - it will then load that on the fly (potentially scanning it for context like it does on startup).</p><p><strong>Why use multi-directory mode?</strong> In microservice architectures or modular codebases, it&#8217;s common that one piece of code lives in one repo and another piece in a different repo. If you only ran Gemini in one, it wouldn&#8217;t &#8220;see&#8221; the others. By combining them, you enable cross-project reasoning. For example, you could ask, &#8220;Update the API client in the frontend to match the backend&#8217;s new API endpoints&#8221; - Gemini can open the backend folder to see the API definitions and simultaneously open the frontend code to modify it accordingly. Without multi-dir, you&#8217;d have to do one side at a time and manually carry info over.</p><p><strong>Example:</strong> Let&#8217;s say you have <code>client/</code> and <code>server/</code>. You start:</p><pre><code>cd client
gemini --include-directories "../server"</code></pre><p>Now at the <code>gemini&gt;</code> prompt, if you do <code>&gt; !ls</code>, you&#8217;ll see it can list files in both <code>client</code> and <code>server</code> (it might show them as separate paths). You could do:</p><pre><code>Open server/routes/api.py and client/src/api.js side by side to compare function names.</code></pre><p>The AI will have access to both files. Or you might say:</p><pre><code>The API changed: the endpoint "/users/create" is now "/users/register". Update both backend and frontend accordingly.</code></pre><p>It can simultaneously create a patch in the backend route and adjust the frontend fetch call.</p><p>Under the hood, Gemini merges the file index of those directories. There might be some performance considerations if each directory is huge, but generally it handles multiple small-medium projects fine. The cheat sheet notes that this effectively creates one workspace with multiple <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=%22includeDirectories%22%3A%20%5B%22..%2Fshared,98C379%22%2C%20%22AccentYellow">roots</a>.</p><p><strong>Tip within a tip:</strong> Even if you don&#8217;t use multi-dir all the time, know that you can still reference files across the filesystem by absolute path in prompts (<code>@/path/to/file</code>). However, without multi-dir, Gemini might not have permission to edit those or know to load context from them proactively. Multi-dir formally includes them in scope so it&#8217;s aware of all files for tasks like search or code generation across the whole set.</p><p><strong>Remove directories:</strong> If needed, <code>/directory remove &lt;path&gt;</code> (or a similar command) can drop a directory from the workspace. This is less common, but maybe if you included something accidentally, you can remove it.</p><p>In summary, <strong>multi-directory mode unifies your context</strong>. 
It&#8217;s a must-have for polyrepo projects or any situation where code is split up. It makes Gemini CLI act more like an IDE that has your entire solution open. As a pro user, this means no part of your project is out of the AI&#8217;s reach.</p><h2><strong>Tip 14: Organize and Clean Up Your Files with AI Assistance</strong></h2><p><strong>Quick use-case:</strong> Tired of a messy <code>Downloads</code> folder or disorganized project assets? You can enlist Gemini CLI to act as a smart organizer. By providing it an overview of a directory, it can classify files and even move them into subfolders (with your approval). For instance, &#8220;Clean up my <code>Downloads</code>: move images to an <code>Images</code> folder, PDFs to <code>Documents</code>, and delete temporary files.&#8221;</p><p>Because Gemini CLI can read file names, sizes, and even peek into file contents, it can make informed decisions about file <a href="https://github.com/google-gemini/gemini-cli/discussions/7890#:~:text=We%20built%20a%20CLI%20tool,trash%20folder%20for%20manual%20deletion">organization</a>. One community-created tool dubbed <strong>&#8220;Janitor AI&#8221;</strong> showcases this: it runs via Gemini CLI to categorize files as important vs junk, and groups them <a href="https://github.com/google-gemini/gemini-cli/discussions/7890#:~:text=We%20built%20a%20CLI%20tool,trash%20folder%20for%20manual%20deletion">accordingly</a>. The process involved scanning the directory, using Gemini&#8217;s reasoning on filenames and metadata (and content if needed), then moving files into categories. 
Notably, it didn&#8217;t automatically delete junk - rather, it moved them to a <code>Trash</code> folder for <a href="https://github.com/google-gemini/gemini-cli/discussions/7890#:~:text=organize%20files,trash%20folder%20for%20manual%20deletion">review</a>.</p><p>Here&#8217;s how you might replicate such a workflow with Gemini CLI manually:</p><ol><li><p><strong>Survey the directory:</strong> Use a prompt to have Gemini list and categorize. For example:</p></li></ol><pre><code>List all files in the current directory and categorize them as &#8220;images&#8221;, &#8220;videos&#8221;, &#8220;documents&#8221;, &#8220;archives&#8221;, or &#8220;others&#8221;.</code></pre><p>Gemini might use <code>!ls</code> or similar to get the file list, then analyze the names/extensions to produce categories.</p><ol><li><p><strong>Plan the organization:</strong> Ask Gemini how it would like to reorganize. For example:</p></li></ol><pre><code>Propose a new folder structure for these files. I want to separate by type (Images, Videos, Documents, etc.). Also identify any files that seem like duplicates or unnecessary.</code></pre><p>The AI might respond with a plan: e.g., <em>&#8220;Create folders: </em><code>Images/</code><em>, </em><code>Videos/</code><em>, </em><code>Documents/</code><em>, </em><code>Archives/</code><em>. Move </em><code>X.png</code><em>, </em><code>Y.jpg</code><em> to </em><code>Images/</code><em>; move </em><code>A.mp4</code><em> to </em><code>Videos/</code><em>; etc. The file </em><code>temp.txt</code><em> looks unnecessary (maybe a temp file).&#8221;</em></p><ol><li><p><strong>Execute moves with confirmation:</strong> You can then instruct it to carry out the plan. It may use shell commands like <code>mv</code> for each file. Since this modifies your filesystem, you&#8217;ll get confirmation prompts for each (unless you YOLO it). Carefully approve the moves. 
After completion, your directory will be neatly organized as suggested.</p></li></ol><p>Throughout, Gemini&#8217;s natural language understanding is key. It can reason, for instance, that <code>IMG_001.png</code> is an image or that <code>presentation.pdf</code> is a document, even if not explicitly stated. It can even open an image (using its vision capability) to see what&#8217;s in it - e.g., differentiating between a screenshot vs a photo vs an icon - and name or sort it <a href="https://dev.to/therealmrmumba/7-insane-gemini-cli-tips-that-will-make-you-a-superhuman-developer-2d7h#:~:text=If%20your%20project%20folder%20is,using%20relevant%20and%20descriptive%20terms">accordingly</a>.</p><p><strong>Renaming files by content:</strong> A particularly magical use is having Gemini rename files to be more descriptive. The Dev Community article &#8220;7 Insane Gemini CLI Tips&#8221; describes how Gemini can <strong>scan images and automatically rename them</strong> based on their <a href="https://dev.to/therealmrmumba/7-insane-gemini-cli-tips-that-will-make-you-a-superhuman-developer-2d7h#:~:text=If%20your%20project%20folder%20is,using%20relevant%20and%20descriptive%20terms">content</a>. For example, a file named <code>IMG_1234.jpg</code> might be renamed to <code>login_screen.jpg</code> if the AI sees it&#8217;s a screenshot of a login <a href="https://dev.to/therealmrmumba/7-insane-gemini-cli-tips-that-will-make-you-a-superhuman-developer-2d7h#:~:text=If%20your%20project%20folder%20is,using%20relevant%20and%20descriptive%20terms">screen</a>. 
To do this, you could prompt:</p><pre><code>For each .png image here, look at its content and rename it to something descriptive.</code></pre><p>Gemini will open each image (via vision tool), get a description, then propose a <code>mv IMG_1234.png login_screen.png</code> <a href="https://dev.to/therealmrmumba/7-insane-gemini-cli-tips-that-will-make-you-a-superhuman-developer-2d7h#:~:text=If%20your%20project%20folder%20is,using%20relevant%20and%20descriptive%20terms">action</a>. This can dramatically improve the organization of assets, especially in design or photo folders.</p><p><strong>Two-pass approach:</strong> The Janitor AI discussion noted a two-step process: first broad categorization (important vs junk vs other), then refining <a href="https://github.com/google-gemini/gemini-cli/discussions/7890#:~:text=organize%20files,trash%20folder%20for%20manual%20deletion">groups</a>. You can emulate this: first separate files that likely can be deleted (maybe large installer <code>.dmg</code> files or duplicates) from those to keep. Then focus on organizing the keepers. Always double-check what the AI flags as junk; its guess might not always be right, so manual oversight is needed.</p><p><strong>Safety tip:</strong> When letting the AI loose on file moves or deletions, have backups or at least be ready to undo (with <code>/restore</code> or your own backup). It&#8217;s wise to do a dry-run: ask Gemini to print the commands it <em>would</em> run to organize, without executing them, so you can review. For instance: &#8220;List the <code>mv</code> and <code>mkdir</code> commands needed for this plan, but don&#8217;t execute them yet.&#8221; Once you review the list, you can either copy-paste execute them, or instruct Gemini to proceed.</p><p>This is a prime example of using Gemini CLI for &#8220;non-obvious&#8221; tasks - it&#8217;s not just writing code, it&#8217;s doing <strong>system housekeeping with AI smarts</strong>. 
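</p><p>The dry-run idea above is worth internalizing even outside Gemini. As a plain-shell sketch (no AI involved; the file names and folder mapping are invented for illustration), this is the kind of reviewable plan you&#8217;d want the AI to print before anything actually moves:</p><pre><code># Create a scratch directory with some sample files
mkdir -p /tmp/organize-demo
cd /tmp/organize-demo
touch photo.png clip.mp4 report.pdf notes.txt

# Print the commands a reorganization WOULD run, without executing them
for f in *; do
  case "$f" in
    *.png|*.jpg) dest=Images ;;
    *.mp4|*.mov) dest=Videos ;;
    *.pdf|*.txt) dest=Documents ;;
    *)           dest=Others ;;
  esac
  echo "mkdir -p $dest; mv $f $dest/"
done</code></pre><p>Nothing moves until you (or the AI, once you approve) run the printed commands - the same review-then-execute loop described above.</p><p>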
It can save time and bring a bit of order to chaos. After all, as developers we accumulate clutter (logs, old scripts, downloads), and an AI janitor can be quite handy.</p><h2><strong>Tip 15: Compress Long Conversations to Stay Within Context</strong></h2><p><strong>Quick use-case:</strong> If you&#8217;ve been chatting with Gemini CLI for a long time, you might hit the model&#8217;s context length limit or just find the session getting unwieldy. Use the <code>/compress</code> command to summarize the conversation so far, replacing the full history with a concise <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Command%20Description%20,files">summary</a>. This frees up space for more discussion without starting from scratch.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lF0Q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27408abe-16e2-4b3a-95ae-3b0b3ac4f8eb_866x418.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lF0Q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27408abe-16e2-4b3a-95ae-3b0b3ac4f8eb_866x418.png 424w, https://substackcdn.com/image/fetch/$s_!lF0Q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27408abe-16e2-4b3a-95ae-3b0b3ac4f8eb_866x418.png 848w, https://substackcdn.com/image/fetch/$s_!lF0Q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27408abe-16e2-4b3a-95ae-3b0b3ac4f8eb_866x418.png 1272w, 
https://substackcdn.com/image/fetch/$s_!lF0Q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27408abe-16e2-4b3a-95ae-3b0b3ac4f8eb_866x418.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lF0Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27408abe-16e2-4b3a-95ae-3b0b3ac4f8eb_866x418.png" width="866" height="418" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/27408abe-16e2-4b3a-95ae-3b0b3ac4f8eb_866x418.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:418,&quot;width&quot;:866,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!lF0Q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27408abe-16e2-4b3a-95ae-3b0b3ac4f8eb_866x418.png 424w, https://substackcdn.com/image/fetch/$s_!lF0Q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27408abe-16e2-4b3a-95ae-3b0b3ac4f8eb_866x418.png 848w, https://substackcdn.com/image/fetch/$s_!lF0Q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27408abe-16e2-4b3a-95ae-3b0b3ac4f8eb_866x418.png 1272w, 
https://substackcdn.com/image/fetch/$s_!lF0Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27408abe-16e2-4b3a-95ae-3b0b3ac4f8eb_866x418.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Large language models have a fixed context window (Gemini 2.5 Pro&#8217;s is very large, but not infinite). If you exceed it, the model may start forgetting earlier messages or lose coherence. 
The <code>/compress</code> feature is essentially an <strong>AI-generated tl;dr</strong> of your session that keeps important points.</p><p><strong>How it works:</strong> When you type <code>/compress</code>, Gemini CLI will take the entire conversation (except system context) and produce a summary. It then replaces the chat history with that summary as a single system or assistant message, preserving essential details but dropping minute-by-minute dialogue. It will indicate that compression happened. For example, after <code>/compress</code>, you might see something like:</p><p>--- Conversation compressed ---<br>Summary of discussion: The user and assistant have been debugging a memory leak in an application. Key points: The issue is likely in <code>DataProcessor.js</code>, where objects aren&#8217;t being freed. The assistant suggested adding logging and identified a possible infinite loop. The user is about to test a fix.<br>--- End of summary ---</p><p>From that point on, the model only has that summary (plus new messages) as context for what happened before. This usually is enough if the summary captured the salient info.</p><p><strong>When to compress:</strong> Ideally before you <em>hit</em> the limit. If you notice the session is getting lengthy (several hundred turns or a lot of code in context), compress proactively. The cheat sheet mentions an automatic compression setting (e.g., compress when context exceeds 60% of <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=%22includeDirectories%22%3A%20%5B%22..%2Fshared,98C379%22%2C%20%22AccentYellow">max</a>). If you enable that, Gemini might auto-compress and let you know. Otherwise, manual <code>/compress</code> is in your toolkit.</p><p><strong>After compressing:</strong> You can continue the conversation normally. If needed, you can compress multiple times in a very long session. 
Each time, you lose some granularity, so don&#8217;t compress too frequently for no reason - you might end up with an overly brief remembrance of a complex discussion. But generally the model&#8217;s own summarization is pretty good at keeping the key facts (and you can always restate anything critical yourself).</p><p><strong>Context window example:</strong> Let&#8217;s illustrate. Suppose you fed in a large codebase by referencing many files and had a 1M token context (the max). If you then want to shift to a different part of the project, rather than starting a new session (losing all that understanding), you could compress. The summary will condense the knowledge gleaned from the code (like &#8220;We loaded modules A, B, C. A has these functions... B interacts with C in these ways...&#8221;). Now you can proceed to ask about new things with that knowledge retained abstractly.</p><p><strong>Memory vs Compression:</strong> Note that compression doesn&#8217;t save to long-term memory, it&#8217;s local to the conversation. If you have facts you <em>never</em> want lost, consider Tip 4 (adding to <code>/memory</code>) - because memory entries will survive compression (they&#8217;ll just be reinserted anyway since they are in <code>GEMINI.md</code> context). Compression is more about ephemeral chat content.</p><p><strong>A minor caution:</strong> after compression, the AI&#8217;s style might slightly change because it&#8217;s effectively seeing a &#8220;fresh&#8221; conversation with a summary. It might reintroduce itself or change tone. You can instruct it like &#8220;Continue from here... (we compressed)&#8221; to smooth it out. In practice, it often continues fine.</p><p>To summarize (pun intended), <strong>use </strong><code>/compress</code><strong> as your session grows long</strong> to maintain performance and relevance. It helps Gemini CLI focus on the bigger picture instead of every detail of the conversation&#8217;s history. 
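</p><p>If you&#8217;d rather not remember to do this by hand, the automatic compression threshold mentioned above can be set in <code>settings.json</code>. A sketch - the key names here follow recent CLI versions and may change, so check the docs for your release:</p><pre><code>{
  "chatCompression": {
    "contextPercentageThreshold": 0.6
  }
}</code></pre><p>With this in place, the CLI compresses the conversation on its own once context usage crosses 60% of the model&#8217;s window.</p><p>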
This way, you can have marathon debugging sessions or extensive design discussions without running out of the &#8220;mental paper&#8221; the AI is writing on.</p><h2><strong>Tip 16: Passthrough Shell Commands with </strong><code>!</code><strong> (Talk to Your Terminal)</strong></h2><p><strong>Quick use-case:</strong> At any point in a Gemini CLI session, you can run actual shell commands by prefixing them with <code>!</code>. For example, if you want to check the git status, just type <code>!git status</code> and it will execute in your <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Run%20a%20single%20command%3A">terminal</a>. This saves you from switching windows or context - you&#8217;re still in the Gemini CLI, but you&#8217;re essentially telling it &#8220;let me run this command real quick.&#8221;</p><p>This tip is about <strong>Shell Mode</strong> in Gemini CLI. There are two ways to use it:</p><ul><li><p><strong>Single command:</strong> Just put <code>!</code> at the start of your prompt, followed by any command and arguments. This will execute that command in the current working directory and display the output <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Run%20shell%20commands%20directly%20in,the%20CLI">in-line</a>. For example:</p></li></ul><pre><code>!ls -lh src/</code></pre><p>will list the files in the <code>src</code> directory, outputting something like you&#8217;d see in a normal terminal. After the output, the Gemini prompt returns so you can continue chatting or issue more commands.</p><ul><li><p><strong>Persistent shell mode:</strong> If you enter <code>!</code> alone and hit Enter, Gemini CLI switches into a sub-mode where you get a shell prompt (often it looks like <code>shell&gt;</code> or <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=">similar</a>). Now you can type multiple shell commands interactively. It&#8217;s basically a mini-shell within the CLI. 
You exit this mode by typing <code>!</code> on an empty line again (or <code>exit</code>). For instance:</p></li></ul><pre><code>!
shell&gt; pwd
/home/alice/project
shell&gt; python --version
Python 3.x.x
shell&gt; !</code></pre><p>After the final <code>!</code>, you&#8217;re back to the normal Gemini prompt.</p><p><strong>Why is this useful?</strong> Because development is a mix of actions and inquiries. You might be discussing something with the AI and realize you need to compile the code or run tests to see something. Instead of leaving the conversation, you can quickly do it and feed the result back into the chat. In fact, Gemini CLI often does this for you as part of its tool usage (it might automatically run <code>!pytest</code> when you ask to fix tests, for <a href="https://genmind.ch/posts/Howto-Supercharge-Your-Terminal-with-Gemini-CLI/#:~:text=">example</a>). But as the user, you have full control to do it manually too.</p><p><strong>Examples:</strong></p><ul><li><p>After Gemini suggests a fix in code, you can do <code>!npm run build</code> to see if it compiles, then copy any errors and ask Gemini to help with those.</p></li><li><p>If you want to open a file in <code>vim</code> or <code>nano</code>, you could even launch it via <code>!nano filename</code> (though note that since Gemini CLI has its own interface, using an interactive editor inside it might be a bit awkward - better to use the built-in editor integration or copy to your editor).</p></li><li><p>You can use shell commands to gather info for the AI: e.g., <code>!grep TODO -R .</code> to find all TODOs in the project, then you might ask Gemini to help address those TODOs.</p></li><li><p>Or simply use it for environment tasks: <code>!pip install some-package</code> if needed, etc., without leaving the CLI.</p></li></ul><p><strong>Seamless interplay:</strong> One cool aspect is how the conversation can refer to outputs. 
For example, you could do <code>!curl http://example.com</code> to fetch some data, see the output, then immediately say to Gemini, &#8220;Format the above output as JSON&#8221; - since the output was printed in the chat, the AI has it in context to work with (provided it&#8217;s not too large).</p><p><strong>Terminal as a default shell:</strong> If you find yourself always prefacing commands with <code>!</code>, you can actually make the shell mode persistent by default. One way is launching Gemini CLI with a specific tool mode (there&#8217;s a concept of default tool). But easier: just drop into shell mode (<code>!</code> with nothing) at session start if you plan to run a lot of manual commands and only occasionally talk to AI. Then you can exit shell mode whenever you want to ask a question. It&#8217;s almost like turning Gemini CLI into your normal terminal that happens to have an AI readily available.</p><p><strong>Integration with AI planning:</strong> Sometimes Gemini CLI itself will propose to run a shell command. If you approve, it effectively does the same as <code>!command</code>. Understanding that, you know you can always intervene. If Gemini is stuck or you want to try something, you don&#8217;t have to wait for it to suggest - you can just do it and then continue.</p><p>In summary, the <code>!</code> <strong>passthrough</strong> means <em>you don&#8217;t have to leave Gemini CLI for shell tasks</em>. It collapses the boundary between chatting with the AI and executing commands on your system. As a pro user, this is fantastic for efficiency - your AI and your terminal become one continuous environment.</p><h2><strong>Tip 17: Treat Every CLI Tool as a Potential Gemini Tool</strong></h2><p><strong>Quick use-case:</strong> Realize that Gemini CLI can leverage <strong>any</strong> command-line tool installed on your system as part of its problem-solving. 
The AI has access to the shell, so if you have <code>cURL</code>, <code>ImageMagick</code>, <code>git</code>, <code>Docker</code>, or any other tool, Gemini can invoke it when appropriate. In other words, <em>your entire </em><code>$PATH</code><em> is the AI&#8217;s toolkit</em>. This greatly expands what it can do - far beyond its built-in tools.</p><p>For example, say you ask: &#8220;Convert all PNG images in this folder to WebP format.&#8221; If you have ImageMagick&#8217;s <code>convert</code> utility installed, Gemini CLI might plan something like: use a shell loop with <code>convert</code> command for each <a href="https://genmind.ch/posts/Howto-Supercharge-Your-Terminal-with-Gemini-CLI/#:~:text=%3E%20%21for%20f%20in%20,png%7D.webp%22%3B%20done">file</a>. Indeed, one of the earlier examples from a blog showed exactly this, where the user prompted to batch-convert images, and Gemini executed a shell one-liner with the <code>convert</code> <a href="https://genmind.ch/posts/Howto-Supercharge-Your-Terminal-with-Gemini-CLI/#:~:text=">tool</a>.</p><p>Another scenario: &#8220;Deploy my app to Docker.&#8221; If <code>Docker CLI</code> is present, the AI could call <code>docker build</code> and <code>docker run</code> steps as needed. Or &#8220;Use FFmpeg to extract audio from <code>video.mp4</code>&#8220; - it can construct the <code>ffmpeg</code> command.</p><p>This tip is about mindset: <strong>Gemini isn&#8217;t limited to what&#8217;s coded into it</strong> (which is already extensive). It can figure out how to use other programs available to achieve a <a href="https://medium.com/google-cloud/gemini-cli-tutorial-series-part-4-built-in-tools-c591befa59ba#:~:text=In%20this%20part%2C%20we%20looked,In%20the%20next%20part%2C%20we">goal</a>. It knows common syntax and can read help texts if needed (it could call <code>--help</code> on a tool). The only limitation is safety: by default, it will ask confirmation for any <code>run_shell_command</code> it comes up with. 
But as you become comfortable, you might allow certain benign commands automatically (see YOLO or allowed-tools config).</p><p><strong>Be mindful of the environment:</strong> &#8220;With great power comes great responsibility.&#8221; Since every shell tool is fair game, you should ensure that your <code>$PATH</code> doesn&#8217;t include anything you wouldn&#8217;t want the AI to run inadvertently. This is where Tip 19 (custom PATH) comes in - some users create a restricted <code>$PATH</code> for Gemini, so it can&#8217;t, say, directly call system destructive commands or maybe not call <code>gemini</code> recursively (to avoid loops). The point is, by default if <code>gcc</code> or <code>terraform</code> or anything is in <code>$PATH</code>, Gemini could invoke it. It doesn&#8217;t mean it will randomly do so - only if the task calls for it - but it&#8217;s possible.</p><p><strong>Train of thought example:</strong> Imagine you ask Gemini CLI: &#8220;Set up a basic HTTP server that serves the current directory.&#8221; The AI might think: &#8220;I can use Python&#8217;s built-in server for this.&#8221; It then issues <code>!python3 -m http.server 8000</code>. Now it just used a system tool (Python) to launch a server. That&#8217;s an innocuous example. Another: &#8220;Check the memory usage on this Linux system.&#8221; The AI might use the <code>free -h</code> command or read from <code>/proc/meminfo</code>. It&#8217;s effectively doing what a sysadmin would do, by using available commands.</p><p><strong>All tools are extensions of the AI:</strong> This is somewhat futuristic, but consider that any command-line program can be seen as a &#8220;function&#8221; the AI can call to extend its capability. Need to solve a math problem? It could call <code>bc</code> (calculator). Need to manipulate an image? It could call an image processing tool. Need to query a database? If the CLI client is installed and credentials are there, it can use it. 
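</p><p>To make one of those concrete: the batch image conversion from earlier in this tip typically comes down to a single shell loop. This is a sketch assuming ImageMagick&#8217;s <code>convert</code> is installed; review the command before approving it:</p><pre><code>for f in *.png; do
  convert "$f" "${f%.png}.webp"   # ${f%.png} strips the extension, keeping the base name
done</code></pre><p>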
The possibilities are expansive. In other AI agent frameworks, this is known as tool use, and Gemini CLI is designed with a lot of trust in its agent to decide the right <a href="https://cloud.google.com/blog/topics/developers-practitioners/agent-factory-recap-deep-dive-into-gemini-cli-with-taylor-mullen#:~:text=The%20Gemini%20CLI%20%20is,understanding%20of%20the%20developer%20workflow">tool</a>.</p><p><strong>When it goes wrong:</strong> The flip side is if the AI misunderstands a tool or has a hallucination about one. It might try to call a command that doesn&#8217;t exist, or use wrong flags, resulting in errors. This isn&#8217;t a big deal - you&#8217;ll see the error and can correct or clarify. In fact, the system prompt of Gemini CLI likely guides it to first do a dry-run (just propose the command) rather than executing blindly. So you often get a chance to catch these. Over time, the developers are improving the tool selection logic to reduce these missteps.</p><p>The main takeaway is to <strong>think of Gemini CLI as having a very large Swiss Army knife</strong> - not just the built-in blades, but every tool in your OS. You don&#8217;t have to instruct it on how to use them if it&#8217;s something standard; usually it knows or can find out. This significantly amplifies what you can accomplish. It&#8217;s like having a junior dev or devops engineer who knows how to run pretty much any program you have installed.</p><p>As a pro user, you can even install additional CLI tools specifically to give Gemini more powers. For example, if you install a CLI for a cloud service (AWS CLI, GCloud CLI, etc.), in theory Gemini can utilize it to manage cloud resources if prompted to. Always ensure you understand and trust the commands run, especially with powerful tools (you wouldn&#8217;t want it spinning up huge cloud instances accidentally). 
But used wisely, this concept - <strong>everything is a Gemini tool</strong> - is what makes it <em>exponentially</em> more capable as you integrate it into your environment.</p><h2><strong>Tip 18: Utilize Multimodal AI - Let Gemini See Images and More</strong></h2><p><strong>Quick use-case:</strong> Gemini CLI isn&#8217;t limited to text - it&#8217;s multimodal. This means it can analyze images, diagrams, or even PDFs if given. Use this to your advantage. For instance, you could say &#8220;Here&#8217;s a screenshot of an error dialog, <code>@./error.png</code> - help me troubleshoot this.&#8221; The AI will &#8220;see&#8221; the image and respond accordingly.</p><p>One of the standout features of Google&#8217;s Gemini model (and its precursor PaLM2 in Codey form) is image understanding. In Gemini CLI, if you reference an image with <code>@</code>, the model receives the image data. It can output descriptions, classifications, or reason about the image&#8217;s content. We already discussed renaming images by content (Tip 14) and describing screenshots (Tip 7). But let&#8217;s consider other creative uses:</p><ul><li><p><strong>UI/UX feedback:</strong> If you&#8217;re a developer working with designers, you can drop a UI image and ask Gemini for feedback or to generate code. &#8220;Look at this UI mockup <code>@mockup.png</code> and produce a React component structure for it.&#8221; It could identify elements in the image (header, buttons, etc.) and outline code.</p></li><li><p><strong>Organizing images:</strong> Beyond renaming, you might have a folder of mixed images and want to sort by content. 
&#8220;Sort the images in <code>./photos/</code> into subfolders by theme (e.g., sunsets, mountains, people).&#8221; The AI can look at each photo and categorize it (this is similar to what some photo apps do with AI - now you can do it with your own script via Gemini).</p></li><li><p><strong>OCR and data extraction:</strong> If you have a screenshot of error text or a photo of a document, Gemini can often read the text from it. For example, &#8220;Extract the text from <code>invoice.png</code> and put it into a structured format.&#8221; As shown in a Google Cloud blog example, Gemini CLI can process a set of invoice images and output a table of their <a href="https://medium.com/google-cloud/gemini-cli-tutorial-series-part-4-built-in-tools-c591befa59ba#:~:text=Press%20enter%20or%20click%20to,view%20image%20in%20full%20size">info</a>. It basically did OCR + understanding to get invoice numbers, dates, amounts from pictures of invoices. That&#8217;s an advanced use-case but entirely possible with the multimodal model under the hood.</p></li><li><p><strong>Understanding graphs or charts:</strong> If you have a graph screenshot, you could ask &#8220;Explain this chart&#8217;s key insights <code>@chart.png</code>.&#8221; It might interpret the axes and trends. Accuracy can vary, but it&#8217;s a nifty try.</p></li></ul><p>To make this practical: when you <code>@image.png</code>, ensure the image isn&#8217;t too huge (though the model can handle reasonably large images). The CLI will likely encode it and send it to the model. The response might include descriptions or further actions. You can mix text and image references in one prompt too.</p><p><strong>Non-image modalities:</strong> The CLI and model potentially can handle PDFs and audio too, by converting them via tools. For example, if you <code>@report.pdf</code>, Gemini CLI might use a PDF-to-text tool under the hood to extract text and then summarize. 
If you <code>@audio.mp3</code> and ask for a transcript, it might use an audio-to-text tool (like a speech recognition function). The cheat sheet suggests that referencing PDFs, audio, and video files is <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Reference%20files%20or%20directories%20in,PDFs%2C%20audio%2C%20and%20video%20files">supported</a>, presumably by invoking appropriate internal tools or APIs. So, &#8220;transcribe this interview audio: <code>@interview.wav</code>&#8221; could actually work (if not now, likely soon, since underlying Google APIs for speech-to-text could be plugged in).</p><p><strong>Rich outputs:</strong> Multimodal also means the AI can return images in responses if integrated (though in CLI it usually won&#8217;t <em>display</em> them directly, but it could save an image file or output ASCII art, etc.). The MCP capability mentioned that tools can return <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Capabilities%3A">images</a>. For instance, an AI drawing tool could generate an image and Gemini CLI could present it (maybe by opening it or giving a link).</p><p><strong>Important:</strong> The CLI itself is text-based, so you won&#8217;t <em>see</em> the image in the terminal (unless it&#8217;s capable of ASCII previews). You&#8217;ll just get the analysis. So this is mostly about reading images, not displaying them. If you&#8217;re in VS Code integration, it might show images in the chat view.</p><p>In summary, <strong>don&#8217;t let the &#8220;CLI&#8221; in the name fool you</strong> - Gemini can handle the visual just as well as the textual in many cases. This opens up workflows like visual debugging, design help, data extraction from screenshots, etc., all under the same tool. It&#8217;s a differentiator that some other CLI tools may not have yet. 
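</p><p>The same file references work when you script the CLI. For instance, a headless run over one of the invoice images mentioned above might look like this (a sketch - it assumes <code>@</code> references are resolved in non-interactive prompts in your CLI version):</p><pre><code>gemini -p "Extract the text from @invoice.png and output it as CSV"</code></pre><p>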
And as models improve, this multimodal support will only get more powerful, so it&#8217;s a future-proof skill to exploit.</p><h2><strong>Tip 19: Customize the </strong><code>$PATH</code><strong> (and Tool Availability) for Stability</strong></h2><p><strong>Quick use-case:</strong> If you ever find Gemini CLI getting confused or invoking the wrong programs, consider running it with a tailored <code>$PATH</code>. By limiting or ordering the available executables, you can prevent the AI from, say, calling a similarly named script that you didn&#8217;t intend. Essentially, you sandbox its tool access to known-good tools.</p><p>For most users, this isn&#8217;t an issue, but for pro users with lots of custom scripts or multiple versions of tools, it can be helpful. One reason mentioned by the developers is avoiding infinite loops or weird <a href="https://github.com/google-gemini/gemini-cli/discussions/7890#:~:text=We%20built%20a%20CLI%20tool,trash%20folder%20for%20manual%20deletion">behavior</a>. For example, if <code>gemini</code> itself is in <code>$PATH</code>, an AI gone awry might recursively call <code>gemini</code> from within Gemini (a strange scenario, but theoretically possible). Or perhaps you have a command named <code>test</code> that conflicts with something - the AI might call the wrong one.</p><p><strong>How to set PATH for Gemini:</strong> Easiest is inline on launch:</p><pre><code>PATH=/usr/bin:/usr/local/bin gemini</code></pre><p>This runs Gemini CLI with a restricted <code>$PATH</code> of just those directories. You might exclude directories where experimental or dangerous scripts lie. Alternatively, create a small shell script wrapper that purges or adjusts <code>$PATH</code> then exec&#8217;s <code>gemini</code>.</p><p>Another approach is using environment or config to explicitly disable certain tools. 
For instance, if you absolutely never want the AI to use <code>rm</code> or some destructive tool, you could technically create an alias or dummy <code>rm</code> in a safe <code>$PATH</code> that does nothing (though this could interfere with normal operations, so maybe not that one). A better method is the <strong>exclude list</strong> in settings. In an extension or <code>settings.json</code>, you can exclude tool <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=">names</a>. E.g.,</p><pre><code>"excludeTools": ["run_shell_command"]</code></pre><p>This extreme example would stop <em>all</em> shell commands from running (making Gemini effectively read-only). For more granular control, you might configure something like:</p><pre><code>"tools": {
  "exclude": ["apt-get", "shutdown"]
}</code></pre><p><em>(This syntax is illustrative; consult docs for exact usage.)</em></p><p>The principle is, by controlling the environment, you reduce risk of the AI doing something dumb with a tool it shouldn&#8217;t. It&#8217;s akin to child-proofing the house.</p><p><strong>Prevent infinite loops:</strong> One user scenario was a loop where Gemini kept reading its own output or re-reading files <a href="https://support.google.com/gemini/thread/337650803/infinite-loops-with-tool-code-in-answers?hl=en#:~:text=Community%20support,screen%20with%20weird%20scrolling">repeatedly</a>. Custom <code>$PATH</code> can&#8217;t directly fix logic loops, but one cause could be if the AI calls a command that triggers itself. Ensuring it can&#8217;t accidentally spawn another AI instance (like calling <code>bard</code> or <code>gemini</code> command, if it thought to do so) is good. Removing those from <code>$PATH</code> (or renaming them for that session) helps.</p><p><strong>Isolation via sandbox:</strong> Another alternative to messing with <code>$PATH</code> is using <code>--sandbox</code> mode (which uses Docker or Podman to run tools in an isolated <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=echo%20,gemini">environment</a>). In that case, the AI&#8217;s actions are contained and have only the tools that sandbox image provides. You could supply a Docker image with a curated set of tools. This is heavy-handed but very safe.</p><p><strong>Custom PATH for specific tasks:</strong> You might have different <code>$PATH</code> setups for different projects. For example, in one project you want it to use a specific version of Node or a local toolchain. Launching <code>gemini</code> with the <code>$PATH</code> that points to those versions will ensure the AI uses the right one. Essentially, treat Gemini CLI like any user - it uses whatever environment you give it. 
So if you need it to pick <code>gcc-10</code> vs <code>gcc-12</code>, adjust <code>$PATH</code> or <code>CC</code> env var accordingly.</p><p><strong>In summary:</strong> <em>Guard rails.</em> As a power user, you have the ability to fine-tune the operating conditions of the AI. If you ever find a pattern of undesirable behavior tied to tool usage, tweaking <code>$PATH</code> is a quick remedy. For everyday use, you likely won&#8217;t need this, but it&#8217;s a pro tip to keep in mind if you integrate Gemini CLI into automation or CI: give it a controlled environment. That way, you know exactly what it can and cannot do, which increases reliability.</p><div><hr></div><h2><strong>Tip 20: Track and Reduce Token Spend with Token Caching and Stats</strong></h2><p>If you run long chats or repeatedly attach the same big files, you can cut cost and latency by turning on token caching and monitoring usage. With an API key or Vertex AI auth, Gemini CLI automatically reuses previously sent system instructions and context, so follow&#8209;up requests are cheaper. You can see the savings live in the CLI.</p><p><strong>How to use it</strong></p><p><strong>Use an auth mode that enables caching.</strong> Token caching is available when you authenticate with a Gemini API key or Vertex AI. It is not available with OAuth login today (see the <a href="https://google-gemini.github.io/gemini-cli/docs/cli/token-caching.html">token caching docs</a>).</p><p><strong>Inspect your usage and cache hits.</strong> Run the <code>/stats</code> command during a session. It shows total tokens and a <code>cached</code> field when caching is active.</p><pre><code>/stats</code></pre><p>The command&#8217;s description and cached reporting behavior are documented in the <a href="https://google-gemini.github.io/gemini-cli/docs/cli/commands.html">commands reference</a> and FAQ.</p><p><strong>Capture metrics in scripts.</strong> 
When running headless, output JSON and parse the <code>stats</code> block, which includes <code>tokens.cached</code> for each model:</p><pre><code>gemini -p "Summarize README" --output-format json</code></pre><p>The <a href="https://google-gemini.github.io/gemini-cli/docs/cli/headless.html">headless guide</a> documents the JSON schema with cached token counts.</p><p><strong>Save a session summary to a file.</strong> For CI or budget tracking, write a JSON session summary to disk.</p><pre><code>gemini -p "Analyze logs" --session-summary usage.json</code></pre><p>This flag is listed in the <a href="https://google-gemini.github.io/gemini-cli/docs/changelogs/">changelog</a>.</p><p>With API key or Vertex auth, the CLI automatically reuses previously sent context so later turns send fewer tokens. Keeping <code>GEMINI.md</code> and large file references stable across turns increases cache hits; you&#8217;ll see that reflected in stats as cached tokens.</p><h2><strong>Tip 21: Use </strong><code>/copy</code><strong> for Quick Clipboard Copy</strong></h2><p><strong>Quick use-case:</strong> Instantly copy the latest answer or code snippet from Gemini CLI to your system clipboard, without any extraneous formatting or line <a href="https://google-gemini.github.io/gemini-cli/docs/cli/commands.html#:~:text=,for%20easy%20sharing%20or%20reuse">numbers</a>. This is perfect for quickly pasting AI-generated code into your editor or sharing a result with a teammate.</p><p>When Gemini CLI provides an answer (especially a multi-line code block), you often want to reuse it elsewhere. The <code>/copy</code> slash command makes this effortless by copying <em>the last output produced by the CLI</em> directly to your <a href="https://google-gemini.github.io/gemini-cli/docs/cli/commands.html#:~:text=,for%20easy%20sharing%20or%20reuse">clipboard</a>. 
Unlike manual selection (which can grab line numbers or prompt text), <code>/copy</code> grabs only the raw response content. For example, if Gemini just generated a 50-line Python script, simply typing <code>/copy</code> will put that entire script into your clipboard, ready to paste - no need to scroll and select text. Under the hood, Gemini CLI uses the appropriate clipboard utility for your platform (e.g. <code>pbcopy</code> on macOS, <code>clip</code> on <a href="https://google-gemini.github.io/gemini-cli/docs/cli/commands.html#:~:text=,clip">Windows</a>). Once you run the command, you&#8217;ll typically see a confirmation message, and then you can paste the copied text wherever you need it.</p><p><strong>How it works:</strong> The <code>/copy</code> command requires that your system has a clipboard tool <a href="https://google-gemini.github.io/gemini-cli/docs/cli/commands.html#:~:text=,clip">available</a>. On macOS and Windows, the required tools (<code>pbcopy</code> and <code>clip</code> respectively) are usually pre-installed. On Linux, you may need to install <code>xclip</code> or <code>xsel</code> for <code>/copy</code> to <a href="https://google-gemini.github.io/gemini-cli/docs/cli/commands.html#:~:text=,clip">function</a>. After ensuring that, you can use <code>/copy</code> anytime after Gemini CLI prints an answer. It will capture the <em>entire</em> last response (even if it&#8217;s long) and omit any internal numbering or formatting the CLI may show on-screen. This saves you from dealing with unwanted artifacts when transferring the content. It&#8217;s a small feature, but a huge time-saver when you&#8217;re iterating on code or compiling a report generated by the AI.</p><p><strong>Pro Tip:</strong> If you find the <code>/copy</code> command isn&#8217;t working, double-check that your clipboard utilities are installed and accessible. 
For instance, Ubuntu users should run <code>sudo apt install xclip</code> to enable clipboard <a href="https://google-gemini.github.io/gemini-cli/docs/cli/commands.html#:~:text=,clip">copying</a>. Once set up, <code>/copy</code> lets you share Gemini&#8217;s outputs with zero friction - copy, paste, and you&#8217;re done.</p><h2><strong>Tip 22: Master </strong><code>Ctrl+C</code><strong> for Shell Mode and Exiting</strong></h2><p><strong>Quick use-case:</strong> Cleanly interrupt Gemini CLI or exit shell mode with a single keypress - and quit the CLI entirely with a quick double-tap - thanks to the versatile <strong>Ctrl+C</strong> <a href="https://www.howtouselinux.com/post/the-complete-google-gemini-cli-cheat-sheet-and-guide#:~:text=Shortcut%20Description%20,Press%20twice%20to%20confirm">shortcut</a>. This gives you immediate control when you need to stop or exit.</p><p>Gemini CLI operates like a REPL, and knowing how to break out of operations is essential. Pressing <strong>Ctrl+C</strong> once will cancel the current action or clear any input you&#8217;ve started typing, essentially acting as an &#8220;abort&#8221; <a href="https://www.howtouselinux.com/post/the-complete-google-gemini-cli-cheat-sheet-and-guide#:~:text=Shortcut%20Description%20,Press%20twice%20to%20confirm">command</a>. For example, if the AI is generating a lengthy answer and you&#8217;ve seen enough, hit <code>Ctrl+C</code> - the generation stops immediately. If you had started typing a prompt but want to discard it, <code>Ctrl+C</code> will wipe the input line so you can start <a href="https://www.howtouselinux.com/post/the-complete-google-gemini-cli-cheat-sheet-and-guide#:~:text=Shortcut%20Description%20,Press%20twice%20to%20confirm">fresh</a>. 
Additionally, if you are in <strong>shell mode</strong> (activated by typing <code>!</code> to run shell commands), a single <code>Ctrl+C</code> will exit shell mode and return you to the normal Gemini prompt (it sends an interrupt to the shell process <a href="https://milvus.io/ai-quick-reference/how-do-i-use-gemini-cli-for-shell-command-generation#:~:text=The%20shell%20integration%20also%20includes,where%20you%20can%20generate%20commands">running</a>). This is extremely handy if a shell command is hanging or you simply want to get back to AI mode.</p><p>Pressing <strong>Ctrl+C twice</strong> in a row is the shortcut to exit Gemini CLI <a href="https://www.howtouselinux.com/post/the-complete-google-gemini-cli-cheat-sheet-and-guide#:~:text=Shortcut%20Description%20,Press%20twice%20to%20confirm">entirely</a>. Think of it as &#8220;<code>Ctrl+C</code> to cancel, and <code>Ctrl+C</code> again to quit.&#8221; This double-tap signals the CLI to terminate the session (you&#8217;ll see a goodbye message or the program will close). It&#8217;s a faster alternative to typing <code>/quit</code> or closing the terminal window, allowing you to gracefully shut down the CLI from the keyboard. Do note that a single <code>Ctrl+C</code> will not quit if there&#8217;s input to clear or an operation to interrupt - it requires that second press (when the prompt is idle) to fully <a href="https://www.howtouselinux.com/post/the-complete-google-gemini-cli-cheat-sheet-and-guide#:~:text=Shortcut%20Description%20,Press%20twice%20to%20confirm">exit</a>.
This design prevents accidentally closing the session when you only meant to stop the current output.</p><p><strong>Pro Tip:</strong> In shell mode, you can also press the <strong>Esc</strong> key to leave shell mode and return to Gemini&#8217;s chat mode without terminating the <a href="https://milvus.io/ai-quick-reference/how-do-i-use-gemini-cli-for-shell-command-generation#:~:text=The%20shell%20integration%20also%20includes,where%20you%20can%20generate%20commands">CLI</a>. And if you prefer a more formal exit, the <code>/quit</code> command is always available to cleanly end the session. Lastly, Unix users can use <strong>Ctrl+D</strong> (EOF) at an empty prompt to exit as well - Gemini CLI will prompt for confirmation if <a href="https://www.howtouselinux.com/post/the-complete-google-gemini-cli-cheat-sheet-and-guide#:~:text=Shortcut%20Description%20,Press%20twice%20to%20confirm">needed</a>. But for most cases, mastering the single- and double-tap of <code>Ctrl+C</code> is the quickest way to stay in control.</p><h2><strong>Tip 23: Customize Gemini CLI with </strong><code>settings.json</code></h2><p><strong>Quick use-case:</strong> Adapt the CLI&#8217;s behavior and appearance to your preferences or project conventions by editing the <code>settings.json</code> config file, instead of sticking with one-size-fits-all <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=%2A%20%60autoAccept%60%3A%20Auto,to%20disable%20usage%20statistics">defaults</a>. This lets you enforce things like theme, tool usage rules, or editor mode across all your sessions.</p><p>Gemini CLI is highly configurable. In your home directory (<code>~/.gemini/</code>) or project folder (<code>.gemini/</code> within your repo), you can create a <code>settings.json</code> file to override default <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Customize%20the%20CLI%20by%20creating,applied%20with%20the%20following%20precedence">settings</a>. 
Nearly every aspect of the CLI can be tuned here - from visual theme to tool permissions. The CLI merges settings from multiple levels: system-wide defaults, your user settings, and project-specific settings (project settings override user <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Customize%20the%20CLI%20by%20creating,applied%20with%20the%20following%20precedence">settings</a>). For example, you might have a global preference for a dark theme, but a particular project might require stricter tool sandboxing; you can handle this via different <code>settings.json</code> files at each level.</p><p>Inside <code>settings.json</code>, options are specified as JSON key-value pairs. Here&#8217;s a snippet illustrating some useful customizations:</p><pre><code>{
  "theme": "GitHub",
  "autoAccept": false,
  "vimMode": true,
  "sandbox": "docker",
  "includeDirectories": ["../shared-library", "~/common-utils"],
  "usageStatisticsEnabled": true
}</code></pre><p>In this example, we set the theme to &#8220;GitHub&#8221; (a popular color scheme), disable <code>autoAccept</code> (so the CLI will always ask before running potentially altering tools), enable Vim keybindings for the input editor, and enforce using Docker for tool sandboxing. We also added some directories to the workspace context (<code>includeDirectories</code>) so Gemini can see code in shared paths by <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=%7B%20,utils">default</a>. Finally, we kept <code>usageStatisticsEnabled</code> true to collect basic usage stats (which feeds into telemetry, if <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=%2A%20%60autoAccept%60%3A%20Auto,to%20disable%20usage%20statistics">enabled</a>). There are many more settings available - like defining custom color themes, adjusting token limits, or whitelisting/blacklisting specific tools - all documented in the configuration <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=%2A%20%60autoAccept%60%3A%20Auto,to%20disable%20usage%20statistics">guide</a>. By tailoring these, you ensure Gemini CLI behaves optimally for <em>your</em> workflow (for instance, some developers always want <code>vimMode</code> on for efficiency, while others might prefer the default editor).</p><p>One convenient way to edit settings is via the built-in settings UI. Run the command <code>/settings</code> in Gemini CLI, and it will open an interactive editor for your <a href="https://google-gemini.github.io/gemini-cli/docs/cli/commands.html#:~:text=,their%20current%20values%2C%20and%20modify">configuration</a>. This interface lets you browse and search settings with descriptions, and prevents JSON syntax errors by validating inputs.
You can tweak colors, toggle features like <code>yolo</code> (auto-approval), adjust checkpointing (file save/restore behavior), and more through a friendly <a href="https://google-gemini.github.io/gemini-cli/docs/cli/commands.html#:~:text=,their%20current%20values%2C%20and%20modify">menu</a>. Changes are saved to your <code>settings.json</code>, and some take effect immediately (others might require restarting the CLI).</p><p><strong>Pro Tip:</strong> Maintain separate project-specific <code>settings.json</code> files for different needs. For example, on a team project you might set <code>"sandbox": "docker"</code> and <code>"excludeTools": ["run_shell_command"]</code> to lock down dangerous operations, while your personal projects might allow direct shell commands. Gemini CLI will automatically pick up the nearest <code>.gemini/settings.json</code> in your project directory tree and merge it with your global <code>~/.gemini/settings.json</code>. Also, don&#8217;t forget you can quickly adjust visual preferences: try <code>/theme</code> to interactively switch themes without editing the file, which is great for finding a comfortable <a href="https://www.howtouselinux.com/post/the-complete-google-gemini-cli-cheat-sheet-and-guide#:~:text=Command%20Description%20,tag%3E%60Save%20the%20current%20conversation">look</a>.
Once you find one, put it in <code>settings.json</code> to make it permanent.</p><h2><strong>Tip 24: Leverage IDE Integration (VS Code) for Context &amp; Diffs</strong></h2><p><strong>Quick use-case:</strong> Supercharge Gemini CLI by hooking it into VS Code - the CLI will automatically know which files you&#8217;re working on and even open AI-proposed code changes in VS Code&#8217;s diff editor for <a href="https://developers.googleblog.com/en/gemini-cli-vs-code-native-diffing-context-aware-workflows/?source=post_page-----26afd3422028---------------------------------------#:~:text=,working%20on%20at%20the%20moment">you</a>. This creates a seamless loop between AI assistant and your coding workspace.</p><p>One of Gemini CLI&#8217;s powerful features is its <strong>IDE integration</strong> with Visual Studio Code. By installing the official <em>Gemini CLI Companion</em> extension in VS Code and connecting it, you allow Gemini CLI to become &#8220;context-aware&#8221; of your <a href="https://developers.googleblog.com/en/gemini-cli-vs-code-native-diffing-context-aware-workflows/?source=post_page-----26afd3422028---------------------------------------#:~:text=,working%20on%20at%20the%20moment">editor</a>. What does this mean in practice? When connected, Gemini knows about the files you have open, your current cursor location, and any text you&#8217;ve selected in VS <a href="https://developers.googleblog.com/en/gemini-cli-vs-code-native-diffing-context-aware-workflows/?source=post_page-----26afd3422028---------------------------------------#:~:text=,working%20on%20at%20the%20moment">Code</a>. All that information is fed into the AI&#8217;s context. So if you ask, &#8220;Explain this function,&#8221; Gemini CLI can see the exact function you&#8217;ve highlighted and give a relevant answer, without you needing to copy-paste code into the prompt. 
The integration shares up to your 10 most recently opened files, plus selection and cursor info, giving the model a rich understanding of your <a href="https://gemini-cli.xyz/docs/en/ide-integration#:~:text=,reject%20the%20suggested%20changes%20seamlessly">workspace</a>.</p><p>Another huge benefit is <strong>native diffing</strong> of code changes. When Gemini CLI suggests modifications to your code (for example, &#8220;refactor this function&#8221; and it produces a patch), it can open those changes in VS Code&#8217;s diff viewer <a href="https://developers.googleblog.com/en/gemini-cli-vs-code-native-diffing-context-aware-workflows/?source=post_page-----26afd3422028---------------------------------------#:~:text=%2A%20Native%20in,the%20code%20right%20within%20this">automatically</a>. You&#8217;ll see a side-by-side diff in VS Code showing the proposed edits. You can then use VS Code&#8217;s familiar interface to review the changes, make any manual tweaks, and even accept the patch with a click. The CLI and editor stay in sync - if you accept the diff in VS Code, Gemini CLI knows and continues the session with those changes applied. This tight loop means you no longer have to copy code from the terminal to your editor; the AI&#8217;s suggestions flow straight into your development environment.</p><p><strong>How to set it up:</strong> If you start Gemini CLI inside VS Code&#8217;s integrated terminal, it will detect VS Code and usually prompt you to install/connect the extension <a href="https://medium.com/google-cloud/gemini-cli-tutorial-series-part-10-gemini-cli-vs-code-integration-26afd3422028#:~:text=Press%20enter%20or%20click%20to,view%20image%20in%20full%20size">automatically</a>. You can agree and it will run the necessary <code>/ide install</code> step. If you don&#8217;t see a prompt (or you&#8217;re enabling it later), simply open Gemini CLI and run the command: <code>/ide install</code>. 
This will fetch and install the &#8220;Gemini CLI Companion&#8221; extension into VS Code for <a href="https://developers.googleblog.com/en/gemini-cli-vs-code-native-diffing-context-aware-workflows/?source=post_page-----26afd3422028---------------------------------------#:~:text=2%3A%20One,install%20the%20necessary%20companion%20extension">you</a>. Next, run <code>/ide enable</code> to establish the <a href="https://developers.googleblog.com/en/gemini-cli-vs-code-native-diffing-context-aware-workflows/?source=post_page-----26afd3422028---------------------------------------#:~:text=3%3A%20Toggle%20integration%3A%20After%20the,can%20easily%20manage%20the%20integration">connection</a> - the CLI will then indicate it&#8217;s linked to VS Code. You can verify at any time with <code>/ide status</code>, which will show if it&#8217;s connected and list which editor and files are being <a href="https://gemini-cli.xyz/docs/en/ide-integration#:~:text=Checking%20the%20Status">tracked</a>. From then on, Gemini CLI will automatically receive context from VS Code (open files, selections) and will open diffs in VS Code when needed. It essentially turns Gemini CLI into an AI pair programmer that lives in your terminal but operates with full awareness of your IDE.</p><p>Currently, VS Code is the primary supported editor for this <a href="https://gemini-cli.xyz/docs/en/ide-integration#:~:text=better%20and%20enables%20powerful%20features,editor%20diffing">integration</a>. (Other editors that support VS Code extensions, like VSCodium or some JetBrains via a plugin, may work via the same extension, but officially it&#8217;s VS Code for now.) The design is open though - there&#8217;s an IDE Companion Spec for developing similar integrations with other <a href="https://gemini-cli.xyz/docs/en/ide-integration#:~:text=better%20and%20enables%20powerful%20features,editor%20diffing">editors</a>. 
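As a quick recap, the whole setup described above is three commands at the Gemini CLI prompt:</p><pre><code>/ide install   # fetches the Gemini CLI Companion extension into VS Code
/ide enable    # establishes the connection to the editor
/ide status    # verifies the link and shows which files are tracked</code></pre><p>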
So down the road we might see first-class support for IDEs like IntelliJ or Vim via community extensions.</p><p><strong>Pro Tip:</strong> Once connected, you can use VS Code&#8217;s Command Palette to control Gemini CLI without leaving the <a href="https://gemini-cli.xyz/docs/en/ide-integration#:~:text=,Ctrl%2BShift%2BP">editor</a>. For example, press <strong>Ctrl+Shift+P</strong> (Cmd+Shift+P on Mac) and try commands like <strong>&#8220;Gemini CLI: Run&#8221;</strong> (to launch a new CLI session in the terminal), <strong>&#8220;Gemini CLI: Accept Diff&#8221;</strong> (to approve and apply an open diff), or <strong>&#8220;Gemini CLI: Close Diff Editor&#8221;</strong> (to reject <a href="https://gemini-cli.xyz/docs/en/ide-integration#:~:text=,Ctrl%2BShift%2BP">changes</a>). These shortcuts can streamline your workflow even further. And remember, you don&#8217;t always have to start the CLI manually - if you enable the integration, Gemini CLI essentially becomes an AI co-developer inside VS Code, watching context and ready to help as you work on code.</p><h2><strong>Tip 25: Automate Repo Tasks with </strong><code>Gemini CLI GitHub Action</code></h2><p><strong>Quick use-case:</strong> Put Gemini to work on GitHub - use the <strong>Gemini CLI GitHub Action</strong> to autonomously triage new issues and review pull requests in your repository, acting as an AI teammate that handles routine dev <a href="https://blog.google/technology/developers/introducing-gemini-cli-github-actions/#:~:text=1,write%20tests%20for%20this">tasks</a>.</p><p>Gemini CLI isn&#8217;t just for interactive terminal sessions; it can also run in CI/CD pipelines via GitHub Actions. Google has provided a ready-made <strong>Gemini CLI GitHub Action</strong> (currently in beta) that integrates into your repo&#8217;s <a href="https://blog.google/technology/developers/introducing-gemini-cli-github-actions/#:~:text=It%E2%80%99s%20now%20in%20beta%2C%20available,cli">workflows</a>.
</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;723a644c-821d-4567-a544-298f360ab582&quot;,&quot;duration&quot;:null}"></div><p>This effectively deploys an AI agent into your project on GitHub. It runs in the background, triggered by repository <a href="https://blog.google/technology/developers/introducing-gemini-cli-github-actions/#:~:text=Triggered%20by%20events%20like%20new,do%2C%20and%20gets%20it%20done">events</a>. For example, when someone opens a <strong>new issue</strong>, the Gemini Action can automatically analyze the issue description, apply relevant labels, and even prioritize it or suggest duplicates (this is the &#8220;intelligent issue triage&#8221; <a href="https://blog.google/technology/developers/introducing-gemini-cli-github-actions/#:~:text=1,attention%20on%20what%20matters%20most">workflow</a>). When a <strong>pull request</strong> is opened, the Action kicks in to provide an <strong>AI code review</strong> - it will comment on the PR with insights about code quality, potential bugs, or stylistic <a href="https://blog.google/technology/developers/introducing-gemini-cli-github-actions/#:~:text=attention%20on%20what%20matters%20most,more%20complex%20tasks%20and%20decisions">improvements</a>. This gives maintainers immediate feedback on the PR before any human even looks at it. Perhaps the coolest feature is <strong>on-demand collaboration</strong>: team members can mention <code>@gemini-cli</code> in an issue or PR comment and give it an instruction, like &#8220;<code>@gemini-cli</code> please write unit tests for this&#8221;. The Action will pick that up and Gemini CLI will attempt to fulfill the request (adding a commit with new tests, for <a href="https://blog.google/technology/developers/introducing-gemini-cli-github-actions/#:~:text=freeing%20up%20reviewers%20to%20focus,write%20tests%20for%20this">instance</a>).
It&#8217;s like having an AI assistant living in your repo, ready to do chores when asked.</p><p>Setting up the Gemini CLI GitHub Action is straightforward. First, ensure you have Gemini CLI version <strong>0.1.18 or later</strong> installed locally (this ensures compatibility with the <a href="https://blog.google/technology/developers/introducing-gemini-cli-github-actions/#:~:text=Gemini%20CLI%20GitHub%20Actions%20is,for%20individual%20users%20available%20soon">Action</a>). Then, in Gemini CLI run the special command: <code>/setup-github</code>. This command generates the necessary workflow files in your repository (it will guide you through authentication if needed). Specifically, it adds YAML workflow files (for issue triage, PR review, etc.) under <code>.github/workflows/</code>. You will need to add your Gemini API key to the repo&#8217;s secrets (as <code>GEMINI_API_KEY</code>) so the Action can use the Gemini <a href="https://github.com/google-github-actions/run-gemini-cli#:~:text=Store%20your%20API%20key%20as,in%20your%20repository">API</a>. Once that&#8217;s done and the workflows are committed, the GitHub Action springs to life - from that point on, Gemini CLI will autonomously respond to new issues and PRs according to those workflows.</p><p>Because this Action is essentially running Gemini CLI in an automated way, you can customize it just like you would your CLI. The default setup comes with three workflows (issue triage, PR review, and a general mention-triggered assistant) which are <strong>fully open-source and <a href="https://blog.google/technology/developers/introducing-gemini-cli-github-actions/#:~:text=Think%20of%20these%20initial%20workflows,into%20Gemini%20CLI%20GitHub%20Actions">editable</a></strong>. You can tweak the YAML to adjust what the AI does, or even add new workflows.
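To give a flavor, the trigger portion of such a workflow might look roughly like this (an illustrative sketch only - the files generated by <code>/setup-github</code> and the action&#8217;s own README are the authoritative reference for inputs and version pinning):</p><pre><code># .github/workflows/gemini-triage.yml (illustrative sketch)
name: Gemini issue triage
on:
  issues:
    types: [opened]
jobs:
  triage:
    runs-on: ubuntu-latest
    steps:
      - uses: google-github-actions/run-gemini-cli@main  # pin a released version in practice
        # supply the GEMINI_API_KEY repository secret per the action README</code></pre><p>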
For instance, you might create a nightly workflow that uses Gemini CLI to scan your repository for outdated dependencies or to update a README based on recent code changes - the possibilities are endless. The key benefit here is offloading mundane or time-consuming tasks to an AI agent so that human developers can focus on harder problems. And since it runs on GitHub&#8217;s infrastructure, it doesn&#8217;t require your intervention - it&#8217;s truly a &#8220;set and forget&#8221; AI helper.</p><p><strong>Pro Tip:</strong> Keep an eye on the Action&#8217;s output in the GitHub Actions logs for transparency. The Gemini CLI Action logs will show what prompts it ran and what changes it made or suggested. This can both build trust and help you refine its behavior. Also, the team has built enterprise-grade safeguards into the Action - e.g., you can require that all shell commands the AI tries to run in a workflow are allow-listed by <a href="https://blog.google/technology/developers/introducing-gemini-cli-github-actions/#:~:text=in%20your%20environment%2C%20drastically%20reducing,your%20preferred%20observability%20platform%2C%20like">you</a>. So don&#8217;t hesitate to use it even on serious projects. 
And if you come up with a cool custom workflow using Gemini CLI, consider contributing it back to the community - the project welcomes new ideas in their repo!</p><h2><strong>Tip 26: Enable Telemetry for Insights and Observability</strong></h2><p><strong>Quick use-case:</strong> Gain deeper insight into how Gemini CLI is being used and performing by turning on its built-in <strong>OpenTelemetry</strong> instrumentation - monitor metrics, logs, and traces of your AI sessions to analyze usage patterns or troubleshoot <a href="https://google-gemini.github.io/gemini-cli/docs/cli/telemetry.html#:~:text=,across%20teams%2C%20track%20costs%2C%20ensure">issues</a>.</p><p>For developers who like to measure and optimize, Gemini CLI offers an observability feature that exposes what&#8217;s happening under the hood. By leveraging <strong>OpenTelemetry (OTEL)</strong>, Gemini CLI can emit structured telemetry data about your <a href="https://google-gemini.github.io/gemini-cli/docs/cli/telemetry.html#:~:text=Built%20on%20OpenTelemetry%20%E2%80%94%20the,Gemini%20CLI%E2%80%99s%20observability%20system%20provides">sessions</a>. This includes things like metrics (e.g. how many tokens used, response latency), logs of actions taken, and even traces of tool calls. With telemetry enabled, you can answer questions like: <em>Which custom command do I use most often? How many times did the AI edit files in this project this week? What&#8217;s the average response time when I ask the CLI to run tests?</em> Such data is invaluable for understanding usage patterns and <a href="https://google-gemini.github.io/gemini-cli/docs/cli/telemetry.html#:~:text=,across%20teams%2C%20track%20costs%2C%20ensure">performance</a>. Teams can use it to see how developers are interacting with the AI assistant and where bottlenecks might be.</p><p>By default, telemetry is <strong>off</strong> (Gemini respects privacy and performance). 
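The opt-in itself is a small <code>settings.json</code> addition; one common shape nests the fields under a <code>telemetry</code> block (check the telemetry docs for the exact schema your CLI version expects):</p><pre><code>{
  "telemetry": {
    "enabled": true,
    "target": "local"
  }
}</code></pre><p>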
You can opt in by setting <code>"telemetry.enabled": true</code> in your <code>settings.json</code> or by starting Gemini CLI with the flag <code>--telemetry</code>. Additionally, you choose the <strong>target</strong> for the telemetry data: it can be logged <strong>locally</strong> or sent to a backend like Google Cloud. For a quick start, you might set <code>"telemetry.target": "local"</code> - with this, Gemini will simply write telemetry data to a local file (by default) or to a custom path you specify via the <code>"outfile"</code> setting. The local telemetry includes JSON logs you can parse or feed into tools. For more robust monitoring, set <code>"target": "gcp"</code> (Google Cloud) or even integrate with other OpenTelemetry-compatible systems like Jaeger or <a href="https://google-gemini.github.io/gemini-cli/docs/cli/telemetry.html#:~:text=,between%20backends%20without%20changing%20your">Datadog</a>. In fact, Gemini CLI&#8217;s OTEL support is vendor-neutral - you can export data to just about any observability stack you prefer (Google Cloud Operations, Prometheus, <a href="https://google-gemini.github.io/gemini-cli/docs/cli/telemetry.html#:~:text=,between%20backends%20without%20changing%20your">etc.</a>). Google provides a streamlined path for Cloud: if you point to GCP, the CLI can send data directly to Cloud Logging and Cloud Monitoring in your project, where you can use the usual dashboards and alerting <a href="https://google-gemini.github.io/gemini-cli/docs/cli/telemetry.html#:~:text=2,explorer%20%2A%20Traces%3A%20https%3A%2F%2Fconsole.cloud.google.com%2Ftraces%2Flist">tools</a>.</p><p>What kind of insights can you get? The telemetry captures events like tool executions, errors, and important milestones.
It also records metrics such as prompt processing time and token counts per <a href="https://medium.com/google-cloud/gemini-cli-tutorial-series-part-13-gemini-cli-observability-c410806bc112#:~:text=,integrate%20with%20existing%20monitoring%20infrastructure">prompt</a>. For usage analytics, you might aggregate how many times each slash command is used across your team, or how often code generation is invoked. For performance monitoring, you could track if responses have gotten slower, which might indicate hitting API rate limits or model changes. And for debugging, you can see errors or exceptions thrown by tools (e.g., a <code>run_shell_command</code> failure) logged with context. All this data can be visualized if you send it to a platform like Google Cloud&#8217;s Monitoring - for example, you can create a dashboard of &#8220;tokens used per day&#8221; or &#8220;error rate of tool X&#8221;. It essentially gives you a window into the AI&#8217;s &#8220;brain&#8221; and your usage, which is especially helpful in enterprise settings to ensure everything runs <a href="https://medium.com/google-cloud/gemini-cli-tutorial-series-part-13-gemini-cli-observability-c410806bc112#:~:text=resource%20utilization%20%2A%20%20Real,integrate%20with%20existing%20monitoring%20infrastructure">smoothly</a>.</p><p>Enabling telemetry does introduce some overhead (extra data processing), so you might not keep it on 100% of the time for personal use. However, it&#8217;s fantastic for debugging sessions or for intermittent health checks. One approach is to enable it on a CI server or in your team&#8217;s shared environment to collect stats, while leaving it off locally unless needed. Remember, you can always toggle it on the fly: update settings and use <code>/memory refresh</code> if needed to reload, or restart Gemini CLI with the <code>--telemetry</code> flag.
Also, all telemetry is under your control - it respects your environment variables for endpoint and credentials, so data goes only where you intend it to. This feature turns Gemini CLI from a black box into an observatory, shining light on how the AI agent interacts with your world, so you can continuously improve that interaction.</p><p><strong>Pro Tip:</strong> If you just want a quick view of your current session&#8217;s stats (without full telemetry), use the <code>/stats</code> command. It will output metrics like token usage and session length right in the <a href="https://www.howtouselinux.com/post/the-complete-google-gemini-cli-cheat-sheet-and-guide#:~:text=Command%20Description%20,tag%3E%60Save%20the%20current%20conversation">CLI</a>. This is a lightweight way to see immediate numbers. But for long-term or multi-session analysis, telemetry is the way to go. And if you&#8217;re sending telemetry to a cloud project, consider setting up dashboards or alerts (e.g., alert if error rate spikes or token usage hits a threshold) - this can proactively catch issues in how Gemini CLI is being used in your team.</p><h2><strong>Tip 27: Keep an Eye on the Roadmap (Background Agents &amp; More)</strong></h2><p><strong>Quick use-case:</strong> Stay informed about upcoming Gemini CLI features - by following the public <strong>Gemini CLI roadmap</strong>, you&#8217;ll know about major planned enhancements (like <em>background agents for long-running tasks</em>) before they <a href="https://google-gemini.github.io/gemini-cli/ROADMAP.html#:~:text=quality.%20,related%20to%20security%20and%20privacy">arrive</a>, allowing you to plan and give feedback.</p><p>Gemini CLI is evolving rapidly, with new releases coming out frequently, so it&#8217;s wise to track what&#8217;s on the horizon. 
Google maintains a <strong>public roadmap</strong> for Gemini CLI on GitHub, detailing the key focus areas and features targeted for the near <a href="https://google-gemini.github.io/gemini-cli/ROADMAP.html#:~:text=This%20document%20outlines%20our%20approach,live%20in%20our%20GitHub%20Issues">future</a>. This is essentially a living document (and set of issues) where you can see what the developers are working on and what&#8217;s in the pipeline. </p><p>For instance, one exciting item on the roadmap is support for <strong>background agents</strong> - the ability to spawn autonomous agents that run in the background to handle tasks continuously or <a href="https://google-gemini.github.io/gemini-cli/ROADMAP.html#:~:text=quality.%20,related%20to%20security%20and%20privacy">asynchronously</a>. According to the roadmap discussion, these background agents would let you delegate long-running processes to Gemini CLI without tying up your interactive session. You could, say, start a background agent that monitors your project for certain events or periodically executes tasks, either on your local machine or even by deploying to a service like Cloud <a href="https://github.com/google-gemini/gemini-cli/issues/4168#:~:text=How%20will%20it%20work%3F">Run</a>. This feature aims to &#8220;enable long-running, autonomous tasks and proactive assistance&#8221; right from the <a href="https://google-gemini.github.io/gemini-cli/ROADMAP.html#:~:text=quality.%20,related%20to%20security%20and%20privacy">CLI</a>, essentially extending Gemini CLI&#8217;s usefulness beyond just on-demand queries.</p><p>By keeping tabs on the roadmap, you&#8217;ll also learn about other planned features. These could include new tool integrations, support for additional Gemini model versions, UI/UX improvements, and more. The roadmap is usually organized by &#8220;areas&#8221; (for example, <em>Extensibility</em>, <em>Model</em>, <em>Background</em>, etc.) 
and often tagged with milestones (like a target quarter for <a href="https://google-gemini.github.io/gemini-cli/ROADMAP.html#:~:text=Our%20roadmap%20is%20managed%20directly,more%20detailed%20list%20of%20tasks">delivery</a>). It&#8217;s not a guarantee of when something will land, but it gives a good idea of the team&#8217;s priorities. Since the project is open-source, you can even dive into the linked GitHub issues for each roadmap item to see design proposals and progress. For developers who rely on Gemini CLI, this transparency means you can anticipate changes - maybe an API is adding a feature you need, or a breaking change might be coming that you want to prepare for.</p><p>Following the roadmap can be as simple as bookmarking the GitHub project board or issue labeled &#8220;Roadmap&#8221; and checking periodically. Some major updates (like the introduction of Extensions or the IDE integration) were hinted at in the roadmap before they were officially announced, so you get a sneak peek. Additionally, the Gemini CLI team often encourages community feedback on those future features. If you have ideas or use cases for something like background agents, you can usually comment on the issue or discussion thread to influence its development.</p><p><strong>Pro Tip:</strong> Since Gemini CLI is open source (Apache 2.0 licensed), you can do more than just watch the roadmap - you can participate! The maintainers welcome contributions, especially for items aligned with the <a href="https://google-gemini.github.io/gemini-cli/ROADMAP.html#:~:text=As%20an%20Apache%202,opening%20an%20issue%20for%20discussion">roadmap</a>. If there&#8217;s a feature you really care about, consider contributing code or testing once it&#8217;s in preview. At the very least, you can open a feature request if something you need isn&#8217;t on the roadmap <a href="https://google-gemini.github.io/gemini-cli/ROADMAP.html#:~:text=As%20an%20Apache%202,opening%20an%20issue%20for%20discussion">yet</a>.
The roadmap page itself provides guidance on how to propose changes. Engaging with the project not only keeps you in the loop but also lets you shape the tool that you use. After all, Gemini CLI is built with community involvement in mind, and many recent features (like certain extensions and tools) started as community suggestions.</p><h2><strong>Tip 28: Extend Gemini CLI with </strong><code>Extensions</code></h2><p><strong>Quick use-case:</strong> Add new capabilities to Gemini CLI by installing plug-and-play <strong>extensions</strong> - for example, integrate with your favorite database or cloud service - expanding the AI&#8217;s toolset without any heavy lifting on your <a href="https://blog.google/technology/developers/gemini-cli-extensions/#:~:text=Gemini%20CLI%20is%20an%20open,design%20platforms%20to%20payment%20services">part</a>. It&#8217;s like installing apps for your CLI to teach it new tricks.</p><p>Extensions are a game-changer introduced in late 2025: they allow you to <strong>customize and expand</strong> Gemini CLI&#8217;s functionality in a modular <a href="https://blog.google/technology/developers/gemini-cli-extensions/#:~:text=Gemini%20CLI%20is%20an%20open,design%20platforms%20to%20payment%20services">way</a>. An extension is essentially a bundle of configurations (and optionally code) that connects Gemini CLI to an external tool or service. One of my <a href="https://x.com/rseroter/status/1973809454564134970">favorite examples</a> was the Nano Banana extension, as highlighted by Richard Seroter:</p><blockquote><p>Fire up the Gemini CLI and install the nano-banana extension. I just did. 
Generate or edit images, create icons, even produce technical diagrams or mockups.</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0Cuq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a01a315-17e0-45b3-89a2-19c8d569717f_1820x1512.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0Cuq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a01a315-17e0-45b3-89a2-19c8d569717f_1820x1512.png 424w, https://substackcdn.com/image/fetch/$s_!0Cuq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a01a315-17e0-45b3-89a2-19c8d569717f_1820x1512.png 848w, https://substackcdn.com/image/fetch/$s_!0Cuq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a01a315-17e0-45b3-89a2-19c8d569717f_1820x1512.png 1272w, https://substackcdn.com/image/fetch/$s_!0Cuq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a01a315-17e0-45b3-89a2-19c8d569717f_1820x1512.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0Cuq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a01a315-17e0-45b3-89a2-19c8d569717f_1820x1512.png" width="1456" height="1210" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4a01a315-17e0-45b3-89a2-19c8d569717f_1820x1512.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1210,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!0Cuq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a01a315-17e0-45b3-89a2-19c8d569717f_1820x1512.png 424w, https://substackcdn.com/image/fetch/$s_!0Cuq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a01a315-17e0-45b3-89a2-19c8d569717f_1820x1512.png 848w, https://substackcdn.com/image/fetch/$s_!0Cuq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a01a315-17e0-45b3-89a2-19c8d569717f_1820x1512.png 1272w, https://substackcdn.com/image/fetch/$s_!0Cuq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a01a315-17e0-45b3-89a2-19c8d569717f_1820x1512.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Google also released a suite of extensions for Google Cloud - there&#8217;s one that helps deploy apps to Cloud Run, one for managing BigQuery, one for analyzing application security, and <a href="https://blog.google/technology/developers/gemini-cli-extensions/#:~:text=In%20just%20three%20months%20since,source%20community">more</a>. Partners and community developers have built extensions for all sorts of things: Dynatrace (monitoring), Elastic (search analytics), Figma (design assets), Shopify, Snyk (security scans), Stripe (payments), and the list is <a href="https://blog.google/technology/developers/gemini-cli-extensions/#:~:text=In%20just%20three%20months%20since,source%20community">growing</a>. By installing an appropriate extension, you instantly grant Gemini CLI the ability to use new domain-specific tools. 
The beauty is that these extensions come with a pre-defined <strong>&#8220;playbook&#8221;</strong> that teaches the AI how to use the new tools <a href="https://blog.google/technology/developers/gemini-cli-extensions/#:~:text=Gemini%20CLI%20is%20an%20open,design%20platforms%20to%20payment%20services">effectively</a>. That means once installed, you can ask Gemini CLI to perform tasks with those services and it will know the proper APIs or commands to invoke, as if it had that knowledge built-in.</p><p>Using extensions is very straightforward. The CLI has a command to manage them: <code>gemini extensions install &lt;URL&gt;</code>. Typically, you provide the URL of the extension&#8217;s GitHub repo or a local path, and the CLI will fetch and install <a href="https://blog.google/technology/developers/gemini-cli-extensions/#:~:text=It%E2%80%99s%20easy%20to%20install%20an,%E2%80%9D%20from%20your%20command%20line">it</a>. For example, to install an official extension, you might run: <code>gemini extensions install https://github.com/google-gemini/gemini-cli-extension-cloud-run</code>. Within seconds, the extension is added to your environment (stored under <code>~/.gemini/extensions/</code> or your project&#8217;s <code>.gemini/extensions/</code> folder). You can then see it by running <code>/extensions</code> in the CLI, which lists active <a href="https://google-gemini.github.io/gemini-cli/docs/cli/commands.html#:~:text=,See%20Gemini%20CLI%20Extensions">extensions</a>. From that point on, the AI has new tools at its disposal. If it&#8217;s a Cloud Run extension, you could say &#8220;Deploy my app to Cloud Run,&#8221; and Gemini CLI will actually be able to execute that (by calling the underlying <code>gcloud</code> commands through the extension&#8217;s tools). Essentially, extensions function as first-class expansions of Gemini CLI&#8217;s capabilities, but you opt-in to the ones you need.</p><p>There&#8217;s an <strong>open ecosystem</strong> around extensions. 
Google has an official Extensions page listing available <a href="https://blog.google/technology/developers/gemini-cli-extensions/#:~:text=Access%20an%20open%2C%20growing%20ecosystem,of%20partners%20and%20builders">extensions</a>, and because the framework is open, anyone can create and share their own. If you have a particular internal API or workflow, you can build an extension for it so that Gemini CLI can assist with it. Writing an extension is easier than it sounds: you typically create a directory (say, <code>my-extension/</code>) with a file <code>gemini-extension.json</code> describing what tools or context to <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Extensions">add</a>. You might define new slash commands or specify remote APIs the AI can call. No need to modify Gemini CLI&#8217;s core - just drop in your extension. The CLI is designed to load these at runtime. Many extensions consist of adding custom <em>MCP tools</em> (Model Context Protocol servers or functions) that the AI can use. For example, an extension could add a <code>/translate</code> command by hooking into an external translation API; once installed, the AI knows how to use <code>/translate</code>. The key benefit is <strong>modularity</strong>: you install only the extensions you want, keeping the CLI lightweight, but you have the option to integrate virtually anything.</p><p>To manage extensions, besides the <code>install</code> command, you can update or remove them via similar CLI commands (<code>gemini extensions update</code> or just by removing the folder). It&#8217;s wise to occasionally check for updates on extensions you use, as they may receive improvements. The CLI might introduce an &#8220;extensions marketplace&#8221; style interface in the future, but for now, exploring the GitHub repositories and official catalog is the way to discover new ones. 
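</p>
<p><strong>Example:</strong> as a rough sketch, the manifest for a hypothetical private extension might look like this. The field names follow the published extension format, but treat the exact schema as something to verify against the official Extensions Guide, and note that <code>translator</code> and <code>translate-server.js</code> are made-up names:</p>

```json
{
  "name": "my-extension",
  "version": "1.0.0",
  "mcpServers": {
    "translator": {
      "command": "node",
      "args": ["./tools/translate-server.js"]
    }
  },
  "contextFileName": "GEMINI.md"
}
```

<p>The manifest simply tells the CLI to launch that MCP server and load the bundled context file whenever the extension is active; dropping a directory containing this file into <code>~/.gemini/extensions/</code> (or a project&#8217;s <code>.gemini/extensions/</code>) is enough for the CLI to pick it up.</p>
<p>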
Some popular ones at launch include the GenAI <strong>Genkit</strong> extension (for building generative AI apps), and a variety of Google Cloud extensions that cover CI/CD, database admin, and more.</p><p><strong>Pro Tip:</strong> If you&#8217;re building your own extension, start by looking at existing ones for examples. The official documentation provides an <strong>Extensions Guide</strong> with the schema and <a href="https://www.philschmid.de/gemini-cli-cheatsheet#:~:text=Extensions">capabilities</a>. A simple way to create a private extension is to use the <code>@include</code> functionality in <code>GEMINI.md</code> to inject scripts or context, but a full extension gives you more power (like packaging tools). Also, since extensions can include context files, you can use them to preload domain knowledge. Imagine an extension for your company&#8217;s internal API that includes a summary of the API and a tool to call it - the AI would then know how to handle requests related to that API. In short, extensions open up a new world where Gemini CLI can interface with anything. Keep an eye on the extensions marketplace for new additions, and don&#8217;t hesitate to share any useful extension you create with the community - you might just help thousands of other <a href="https://blog.google/technology/developers/gemini-cli-extensions/#:~:text=Gemini%20CLI%20extensions%20are%20here,and%20build%20your%20own%20extension">developers</a>.</p><h2><strong>Additional Fun: Corgi Mode Easter Egg &#128021;</strong></h2><p>Lastly, not a productivity tip but a delightful easter egg - try the command <code>/corgi</code> in Gemini CLI. This toggles <strong>&#8220;corgi mode&#8221;</strong>, which makes a cute corgi animation run across your <a href="https://medium.com/@ferreradaniel/gemini-cli-free-ai-tool-upgrade-5-new-features-you-need-right-now-04cfefac5e93#:~:text=Easter%20Egg%3A%20Corgi%20Mode%20in,Gemini%20CLI">terminal</a>! 
It doesn&#8217;t help you code any better, but it can certainly lighten the mood during a long coding session. You&#8217;ll see an ASCII art corgi dashing in the CLI interface. To turn it off, just run <code>/corgi</code> again.</p><p>This is a purely for-fun feature the team added (and yes, there&#8217;s even a tongue-in-cheek <a href="https://github.com/google-gemini/gemini-cli/issues/5674#:~:text=How%20about%20you%20NOT%20implement,this%20needed%3F%20Because%20people">debate</a> about spending dev time on corgi mode). It shows that the creators hide some whimsy in the tool. So when you need a quick break or a smile, give <code>/corgi</code> a try. &#128021;&#127881;</p><p><em>(Rumor has it there might be other easter eggs or modes - who knows? Perhaps a &#8220;/partyparrot&#8221; or similar. The cheat sheet or help command lists </em><code>/corgi</code><em>, so it&#8217;s not a secret, just underused. Now you&#8217;re in on the joke!)</em></p><div><hr></div><h1><strong>Conclusion</strong></h1><p>We&#8217;ve covered a comprehensive list of pro tips and features for Gemini CLI. From setting up persistent context with <code>GEMINI.md</code>, to writing custom commands and using advanced tools like MCP servers, to leveraging multi-modal inputs and automating workflows, there&#8217;s a lot this AI command-line assistant can do. As an external developer, you can integrate Gemini CLI into your daily routine - it&#8217;s like a powerful ally in your terminal that can handle tedious tasks, provide insights, and even troubleshoot your environment.</p><p>Gemini CLI is evolving rapidly (being open-source with community contributions), so new features and improvements are constantly on the horizon. By mastering the pro tips in this guide, you&#8217;ll be well-positioned to harness the full potential of this tool. 
It&#8217;s not just about using an AI model - it&#8217;s about integrating AI deeply into how you develop and manage software.</p><p>Happy coding with Gemini CLI, and have fun exploring just how far your &#8220;AI agent in the terminal&#8221; can take you.</p><p><strong>You now have a Swiss-army knife of AI at your fingertips - use it wisely, and it will make you a more productive (and perhaps happier) developer</strong>!</p><p><em>I&#8217;m excited to share I&#8217;ve released a new <a href="https://beyond.addy.ie/">AI-assisted engineering book</a> with O&#8217;Reilly. There are a number of free tips on the book site in case interested.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B4nC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0e235e-62c4-4acb-bc97-395c406eda51_5246x3496.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B4nC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0e235e-62c4-4acb-bc97-395c406eda51_5246x3496.png 424w, https://substackcdn.com/image/fetch/$s_!B4nC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0e235e-62c4-4acb-bc97-395c406eda51_5246x3496.png 848w, https://substackcdn.com/image/fetch/$s_!B4nC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0e235e-62c4-4acb-bc97-395c406eda51_5246x3496.png 1272w, https://substackcdn.com/image/fetch/$s_!B4nC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0e235e-62c4-4acb-bc97-395c406eda51_5246x3496.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B4nC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0e235e-62c4-4acb-bc97-395c406eda51_5246x3496.png" width="1456" height="970" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cb0e235e-62c4-4acb-bc97-395c406eda51_5246x3496.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:970,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:414461,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/176589430?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0e235e-62c4-4acb-bc97-395c406eda51_5246x3496.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!B4nC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0e235e-62c4-4acb-bc97-395c406eda51_5246x3496.png 424w, https://substackcdn.com/image/fetch/$s_!B4nC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0e235e-62c4-4acb-bc97-395c406eda51_5246x3496.png 848w, https://substackcdn.com/image/fetch/$s_!B4nC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0e235e-62c4-4acb-bc97-395c406eda51_5246x3496.png 1272w, 
https://substackcdn.com/image/fetch/$s_!B4nC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0e235e-62c4-4acb-bc97-395c406eda51_5246x3496.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p>]]></content:encoded></item><item><title><![CDATA[How modern browsers work]]></title><description><![CDATA[A web developers guide to browser internals]]></description><link>https://addyo.substack.com/p/how-modern-browsers-work</link><guid isPermaLink="false">https://addyo.substack.com/p/how-modern-browsers-work</guid><dc:creator><![CDATA[Addy 
Osmani]]></dc:creator><pubDate>Sat, 13 Sep 2025 14:30:31 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!YPnw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a42e9cf-a2e1-4d77-b7bb-f9501c423f29_5246x3496.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em><strong>Note: </strong>For those eager to dive deep into how browsers work, an <strong>excellent</strong> resource is <strong>Browser Engineering</strong> by Pavel Panchekha and Chris Harrelson (available at <a href="https://browser.engineering/">browser.engineering</a><strong>). </strong>Please do check it out. This article is an overview of how browsers work.</em></p><p>Web developers often treat the browser as a <strong>black box</strong> that magically transforms HTML, CSS, and JavaScript into interactive web applications. In truth, a modern web browser like Chrome (<a href="https://www.chromium.org/chromium-projects/">Chromium</a>), Firefox (<a href="https://firefox-source-docs.mozilla.org/overview/gecko.html">Gecko</a>) or Safari (<a href="https://webkit.org/">WebKit</a>) is a complex piece of software. It orchestrates networking, parses and executes code, renders graphics with GPU acceleration, and isolates content in sandboxed processes for security.</p><p>This article dives into <strong>how modern browsers work</strong> - focusing on <strong>Chromium</strong>'s architecture and internals, while noting where other engines differ. We'll explore everything from the networking stack and parsing pipeline to the rendering process via <a href="https://www.chromium.org/blink/">Blink</a>, JavaScript engine via <a href="http://v8.dev">V8</a>, module loading, multi-process architecture, security sandboxing, and developer tooling. 
The goal is a developer-friendly explanation that demystifies what happens behind the scenes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zbep!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd75ce677-87a8-497a-ab9f-495c406c056c_2650x1502.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zbep!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd75ce677-87a8-497a-ab9f-495c406c056c_2650x1502.jpeg 424w, https://substackcdn.com/image/fetch/$s_!zbep!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd75ce677-87a8-497a-ab9f-495c406c056c_2650x1502.jpeg 848w, https://substackcdn.com/image/fetch/$s_!zbep!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd75ce677-87a8-497a-ab9f-495c406c056c_2650x1502.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!zbep!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd75ce677-87a8-497a-ab9f-495c406c056c_2650x1502.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zbep!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd75ce677-87a8-497a-ab9f-495c406c056c_2650x1502.jpeg" width="1456" height="825" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d75ce677-87a8-497a-ab9f-495c406c056c_2650x1502.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:825,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:624454,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/173324218?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd75ce677-87a8-497a-ab9f-495c406c056c_2650x1502.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zbep!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd75ce677-87a8-497a-ab9f-495c406c056c_2650x1502.jpeg 424w, https://substackcdn.com/image/fetch/$s_!zbep!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd75ce677-87a8-497a-ab9f-495c406c056c_2650x1502.jpeg 848w, https://substackcdn.com/image/fetch/$s_!zbep!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd75ce677-87a8-497a-ab9f-495c406c056c_2650x1502.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!zbep!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd75ce677-87a8-497a-ab9f-495c406c056c_2650x1502.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Let's begin our journey through the browser's internals.</p><h2><strong>Networking and Resource Loading</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ixg8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2eb2ffcb-56f8-4d93-8dd4-a64e0a2c44cc_1600x742.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ixg8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2eb2ffcb-56f8-4d93-8dd4-a64e0a2c44cc_1600x742.png 424w, 
https://substackcdn.com/image/fetch/$s_!ixg8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2eb2ffcb-56f8-4d93-8dd4-a64e0a2c44cc_1600x742.png 848w, https://substackcdn.com/image/fetch/$s_!ixg8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2eb2ffcb-56f8-4d93-8dd4-a64e0a2c44cc_1600x742.png 1272w, https://substackcdn.com/image/fetch/$s_!ixg8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2eb2ffcb-56f8-4d93-8dd4-a64e0a2c44cc_1600x742.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ixg8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2eb2ffcb-56f8-4d93-8dd4-a64e0a2c44cc_1600x742.png" width="1456" height="675" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2eb2ffcb-56f8-4d93-8dd4-a64e0a2c44cc_1600x742.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:675,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ixg8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2eb2ffcb-56f8-4d93-8dd4-a64e0a2c44cc_1600x742.png 424w, 
https://substackcdn.com/image/fetch/$s_!ixg8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2eb2ffcb-56f8-4d93-8dd4-a64e0a2c44cc_1600x742.png 848w, https://substackcdn.com/image/fetch/$s_!ixg8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2eb2ffcb-56f8-4d93-8dd4-a64e0a2c44cc_1600x742.png 1272w, https://substackcdn.com/image/fetch/$s_!ixg8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2eb2ffcb-56f8-4d93-8dd4-a64e0a2c44cc_1600x742.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" 
y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Every page load begins with the browser's networking stack fetching resources from the web. When you enter a URL or click a link, the browser's UI thread (running in the "<a href="https://www.chromium.org/developers/design-documents/multi-process-architecture/">browser process</a>") kicks off a navigation request.</p><blockquote><p>The <strong>browser process</strong> is the main, controlling process that manages all other processes and the browser's user interface. Everything that happens outside of a specific web page tab is controlled by the browser process. </p></blockquote><p>The steps include:</p><p><strong>URL parsing and security checks</strong>: The browser parses the URL to determine the scheme (http, https, etc.) and target domain. It also decides if the input is a search query or URL (in Chrome's omnibox, for example). Security features like blocklists may be checked here to avoid phishing sites.</p><p><strong>DNS lookup</strong>: The network stack resolves the domain name to an IP address (unless it's cached). This may involve contacting a DNS server. Modern browsers might use OS DNS services or even DNS over HTTPS (DoH) if configured, but ultimately they obtain an IP for the host.</p><p><strong>Establishing a connection</strong>: If no open connection to the server exists, the browser opens one. For HTTPS URLs, this includes a TLS handshake to securely exchange keys and verify certificates. The browser's network thread handles protocols like TCP/TLS setup transparently.</p><p><strong>Sending the HTTP request</strong>: Once connected, an HTTP GET request (or other method) is sent for the resource. Browsers today default to HTTP/2 or HTTP/3 if the server supports it, which allows multiplexing multiple resource requests over one connection. This improves performance by avoiding the old limit of ~6 parallel connections per host (HTTP/1.1). 
For example, with HTTP/2 the HTML, CSS, JS, images can all be fetched concurrently on one TCP/TLS link, and with HTTP/3 (over QUIC UDP) setup latency is further reduced.</p><p><strong>Receiving the response</strong>: The server responds with an HTTP status and headers, followed by the response body (HTML content, JSON data, etc.). The browser reads the response stream. It may need to sniff the MIME type if the Content-Type header is missing or incorrect, to decide how to handle the content. For example, if a response looks like HTML but isn't labeled as such, the browser will still try to treat it as HTML (per permissive web standards). There are security measures here too: the network layer checks Content-Type and may block suspicious MIME mismatches or disallowed cross-origin data (Chrome's CORB - Cross-Origin Read Blocking - is one such mechanism). The browser also consults Safe Browsing or similar services to block known malicious payloads.</p><p><strong>Redirects and next steps</strong>: If the response is an HTTP redirect (e.g. 301 or 302 with a Location header), the network code will follow the redirect (after informing the UI thread) and repeat the request to the new URL. Only once a final response with actual content is obtained does the browser move on to processing that content.</p><p>All these steps happen in the network stack, which in Chromium is run in a dedicated Network Service (now typically a separate process, as part of Chrome's "<a href="https://www.chromium.org/servicification/">servicification</a>" effort). The browser process's network thread coordinates the low-level work of socket communication, using the OS networking APIs under the hood. 
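</p><p>These network phases are also observable from page script via the Resource Timing API. A minimal sketch, assuming a timing record shaped like a <code>PerformanceResourceTiming</code> entry (a plain object here, so the arithmetic is self-contained):</p>

```javascript
// Split a PerformanceResourceTiming-like entry into the network phases
// described above. In a real page you would obtain entries via
// performance.getEntriesByType("resource"); the plain object below is a
// stand-in with assumed millisecond timestamps.
function timingPhases(entry) {
  return {
    dns: entry.domainLookupEnd - entry.domainLookupStart,
    connect: entry.connectEnd - entry.connectStart,
    // The TLS handshake is the tail end of the connect phase
    // (secureConnectionStart is 0 for plain-HTTP connections).
    tls: entry.secureConnectionStart > 0
      ? entry.connectEnd - entry.secureConnectionStart
      : 0,
    request: entry.responseStart - entry.requestStart, // time to first byte
    response: entry.responseEnd - entry.responseStart, // body download
  };
}

const phases = timingPhases({
  domainLookupStart: 10, domainLookupEnd: 30,
  connectStart: 30, secureConnectionStart: 45, connectEnd: 80,
  requestStart: 80, responseStart: 180, responseEnd: 260,
});
// phases.dns === 20, phases.tls === 35, phases.request === 100
```

<p>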
Importantly, this design means the renderer (which will execute the page's code) doesn't directly access the network - it asks the browser process to fetch what it needs, a security win.</p><h3><strong>Speculative Loading and Resource Optimization</strong></h3><p>Modern browsers implement sophisticated performance optimizations in the networking stage. Chrome will proactively perform a DNS prefetch or open a TCP connection when you hover over a link or start typing a URL (using the Predictor or preconnect mechanisms) so that if you click, some latency is already shaved off. There's also HTTP caching: the network stack can satisfy requests from the browser cache if the resource is cached and fresh, avoiding a network trip.</p><p><strong>Preload scanner operation</strong>: Chromium implements a sophisticated <a href="https://web.dev/articles/preload-scanner">preload scanner</a> that tokenizes HTML markup ahead of the main parser. When the primary HTML parser is blocked by CSS or synchronous JavaScript, the preload scanner continues examining the raw markup to identify resources like images, scripts, and stylesheets that can be fetched in parallel. This mechanism is fundamental to modern browser performance and operates automatically without developer intervention. The preload scanner cannot discover resources injected via JavaScript, making such resources likely to be loaded consecutively rather than concurrently.</p><p><strong>Early Hints (HTTP 103)</strong>: <a href="https://developer.chrome.com/docs/web-platform/early-hints">Early Hints</a> allows servers to send resource hints while generating the main response, using HTTP 103 status codes. This enables preconnect and preload hints to be sent during server think-time, potentially improving Largest Contentful Paint by several hundred milliseconds. 
Early Hints are only available for navigation requests and support preconnect and preload directives, but not prefetch.</p><p><strong>Speculation Rules API</strong>: <a href="https://developer.chrome.com/docs/web-platform/implementing-speculation-rules">The Speculation Rules API</a> is a recent web standard that allows defining rules to dynamically prefetch and prerender URLs based on user interaction patterns. Unlike traditional link prefetch, this API can prerender entire pages including JavaScript execution, leading to near-instant load times. The API uses JSON syntax within script elements or HTTP headers to specify which URLs should be speculatively loaded. Chrome has limits to prevent overuse, with different capacity settings based on urgency levels.</p><p><strong>HTTP/2 and HTTP/3</strong>: Most Chromium-based browsers and Firefox support HTTP/2 fully, and <a href="https://alexandrehtrb.github.io/posts/2024/03/http2-and-http3-explained/">HTTP/3</a> (based on QUIC) is also widely supported (Chrome has it enabled by default for supporting sites). These protocols improve page load by allowing concurrent transfers and reducing handshake overhead. From a developer perspective, this means you may no longer need sprite sheets or domain sharding tricks - the browser can efficiently fetch many small files in parallel on one connection.</p><p><strong>Resource prioritization</strong>: The browser also prioritizes certain resources over others. Typically, HTML and CSS are high priority (as they block rendering), parser-blocking scripts are medium or high, async/defer scripts lower, and images are usually lower still. Chromium's network stack assigns weights and can even cancel or defer requests to prioritize what's needed for an initial render.
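</p><p>A toy model of this prioritization logic (the buckets below are illustrative, not Chromium's actual priority table):</p>

```javascript
// Illustrative priority buckets for the scheme described above.
// The real Chromium logic is more nuanced; this is a sketch.
function resourcePriority(type, attrs = {}) {
  // An explicit fetchpriority attribute overrides the defaults.
  if (attrs.fetchpriority === "high") return "high";
  if (attrs.fetchpriority === "low") return "low";
  switch (type) {
    case "document":
    case "stylesheet":
      return "high"; // these block rendering
    case "script":
      return attrs.async || attrs.defer ? "low" : "medium";
    case "image":
      return attrs.inViewport ? "medium" : "low";
    default:
      return "low";
  }
}

resourcePriority("stylesheet");                       // "high"
resourcePriority("script", { defer: true });          // "low"
resourcePriority("image", { fetchpriority: "high" }); // "high"
```

<p>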
Developers can use <a href="https://web.dev/articles/preload-critical-assets">link rel=preload</a> and <a href="https://web.dev/articles/fetch-priority">Fetch Priority</a> to influence resource prioritization.</p><p>By the end of the networking phase, the browser has the initial HTML for the page (assuming it was an HTML navigation). At this point, Chrome's browser process chooses a renderer process to handle the content. Chrome will often launch a new renderer process in parallel with the network request (speculatively) so that it's ready to go when the data arrives. This renderer process is isolated (more on multi-process architecture later) and will take over for parsing and rendering the page.</p><p>Once the response is fully received (or as it streams in), the browser process commits the navigation: it signals the renderer process to take the stream of bytes and start processing the page. At this moment, the address bar updates and the security indicator (HTTPS lock, etc.) is shown for the new site. Now the action moves to the renderer process: parsing the HTML, loading sub-resources, executing scripts, and painting the page.</p><h2><strong>Parsing HTML, CSS, and JavaScript</strong></h2><p>When the renderer process receives the HTML content, its main thread begins to parse it according to the HTML specification. The result of parsing HTML is the DOM (Document Object Model) - a tree of objects representing the page structure. 
Parsing is incremental and can interleave with network reading (browsers parse HTML in a streaming fashion, so the DOM can start being built even before the entire HTML file is downloaded).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!A1DI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a9bf2c4-1b89-434d-81be-cf841789a6b2_1600x742.png" data-component-name="Image2ToDOM"><div class="image2-inset"><img src="https://substackcdn.com/image/fetch/$s_!A1DI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a9bf2c4-1b89-434d-81be-cf841789a6b2_1600x742.png" width="1456" height="675" class="sizing-normal" alt="" loading="lazy"></div></a></figure></div><p><strong>HTML parsing and DOM construction</strong>: HTML parsing is defined by the HTML Standard as an error-tolerant process that will produce a DOM no matter how malformed the markup is. This means even if you forget a closing &lt;/p&gt; tag or nest tags incorrectly, the parser will implicitly fix or adjust the DOM tree so that it's valid. For example, &lt;p&gt;Hello &lt;div&gt;World&lt;/div&gt; will automatically end the &lt;p&gt; before the &lt;div&gt; in the DOM structure. The parser creates DOM elements and text nodes for each tag or text in the HTML. Each element is placed in a tree reflecting the nesting in the source.</p><p>One important aspect is that the HTML parser may encounter resources to fetch as it goes: for instance, encountering a &lt;link rel="stylesheet" href="..."&gt; will prompt the browser to request the CSS file (on the network thread), and encountering an &lt;img src="..."&gt; will trigger an image request. These happen in parallel to parsing.
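</p><p>This kind of discovery can be sketched as a scan over raw markup for fetchable URLs - which is also the essence of the preload scanner mentioned earlier. A real implementation tokenizes properly; a regex is enough for illustration:</p>

```javascript
// Toy resource discovery: scan raw markup for fetchable URLs, the way
// the parser (and preload scanner) kick off requests as tags are seen.
function discoverResources(html) {
  const urls = [];
  const re = /<(?:img|script)[^>]*\bsrc="([^"]+)"|<link[^>]*\bhref="([^"]+)"/g;
  let m;
  while ((m = re.exec(html)) !== null) urls.push(m[1] || m[2]);
  return urls;
}

discoverResources(
  '<link rel="stylesheet" href="app.css"><img src="hero.png"><script src="main.js"></script>'
);
// → ["app.css", "hero.png", "main.js"]
```

<p>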
The parser can keep going while those loads occur, with one big exception: scripts.</p><p><strong>Handling &lt;script&gt; tags</strong>: If the HTML parser comes across a &lt;script&gt; tag, it pauses parsing and must execute the script before continuing (by default). This is because scripts can use document.write() or other DOM manipulation that can alter the page structure or content that's still coming in. By executing immediately at that point, the browser preserves the correct order of operations relative to the HTML. The parser therefore hands off the script to the JavaScript engine for execution, and only when the script finishes (and any DOM changes it did are applied) can HTML parsing resume. This script execution blocking behavior is why including large &lt;script&gt; files in the head can slow down page rendering - the HTML parsing can't continue until the script is downloaded and run.</p><p>However, developers can modify this behavior with attributes: adding <a href="https://web.dev/articles/efficiently-load-third-party-javascript">defer or async</a> to a &lt;script&gt; tag (or using modern ES module scripts) changes how the browser handles it. With async, the script file is fetched in parallel and executed as soon as it's ready, without pausing HTML parsing (the parse doesn't wait, and the script doesn't guarantee execution in original order relative to other async scripts). With defer, the script is fetched in parallel but execution is deferred until the HTML parsing is done (and will execute in the original order at that later time). In both cases, the parser isn't blocked waiting on the script, which is generally better for performance. ES6 modules (using &lt;script type="module"&gt;) are automatically deferred as well (and they can also use import statements - we'll cover module loading separately). 
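</p><p>Those scheduling rules can be sketched as a toy simulation (not a real parser; async scripts are omitted because their run time depends on when their fetch completes):</p>

```javascript
// Toy simulation of the scheduling rules above: blocking scripts run
// the moment the parser reaches them; defer scripts run after parsing
// finishes, in document order.
function simulateParse(tokens) {
  const log = [];
  const deferred = [];
  for (const t of tokens) {
    if (t.tag === "script" && t.defer) deferred.push(t); // fetch in parallel, run later
    else if (t.tag === "script") log.push(`run ${t.src}`); // parser blocks here
    else log.push(`dom ${t.tag}`); // ordinary element: append to the DOM
  }
  for (const t of deferred) log.push(`run ${t.src}`); // after parsing, in order
  return log;
}

const order = simulateParse([
  { tag: "h1" },
  { tag: "script", src: "a.js", defer: true },
  { tag: "script", src: "b.js" },
  { tag: "p" },
]);
// order → ["dom h1", "run b.js", "dom p", "run a.js"]
```

<p>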
By using these techniques, the browser can continue building the DOM without long pauses, making pages load faster.</p><p><strong>CSS Parsing and the CSSOM</strong>: Alongside HTML, CSS text must be parsed into a structure the browser can work with - often called the CSSOM (CSS Object Model). The <a href="https://web.dev/articles/critical-rendering-path/constructing-the-object-model">CSSOM</a> is essentially a representation of all the styles (rules, selectors, properties) that apply to the document. The browser's CSS parser reads CSS files (or &lt;style&gt; blocks) and turns them into a list of CSS rules (along with internal acceleration structures, such as bloom filters, that speed up selector matching). Then, as the DOM is being constructed (or once both DOM and CSSOM are ready), the browser will compute the style for each DOM node. This step is usually called style resolution or style calculation. The browser combines the DOM and CSSOM to determine, for each element, which CSS rules apply and what the final computed styles are (after applying the cascade, inheritance, and default styles). The output is often conceptualized as an association of each DOM node with a computed style (the resolved, final CSS properties for that element, e.g. an element's color, font, size, etc.).</p><p>It's worth noting that even without any author CSS, every element has default browser styles (the user-agent stylesheet). For example, an &lt;h1&gt; has a default font-size and margin in practically all browsers. The browser's built-in style rules are applied with the lowest priority, and they ensure some reasonable default presentation. Developers can view computed styles in DevTools to see exactly what CSS properties an element ends up with.
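</p><p>That origin precedence can be sketched as a last-wins merge (deliberately ignoring specificity and <code>!important</code>, which the real cascade also considers):</p>

```javascript
// Toy cascade: later origins win (user-agent < user < author),
// mirroring the precedence described above. Specificity and
// importance are omitted from this sketch.
function computedStyle(...origins) {
  return Object.assign({}, ...origins);
}

const uaStyles = { display: "block", fontSize: "32px", marginTop: "21.44px" }; // e.g. <h1> defaults
const authorStyles = { fontSize: "24px", color: "navy" };

const style = computedStyle(uaStyles, authorStyles);
// style.fontSize === "24px" (author wins); style.display === "block" (UA default survives)
```

<p>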
The style calculation step uses all applicable styles (user agent, user styles, author styles) to finalize each element's styling.</p><p><strong>Render-blocking behavior</strong>: While HTML parsing can proceed without CSS fully loaded, there is a <a href="https://web.dev/learn/performance/understanding-the-critical-path">render-blocking relationship</a>: browsers typically wait to perform the first render until CSS is loaded (for CSS in the &lt;head&gt;). This is because applying an incomplete stylesheet could flash unstyled content. In practice, if a &lt;script&gt; that is not marked async/defer appears after a CSS &lt;link&gt; in the HTML, it will additionally wait for that CSS to load before executing (since scripts may query style information via DOM APIs). As a rule of thumb, put stylesheet links in the head (they block rendering but are needed early) and put non-critical or large scripts with defer/async or at the bottom so they don't delay DOM parsing.</p><p>Now the browser has (1) the DOM constructed from HTML, (2) the CSS rules parsed (CSSOM), and (3) the computed styles for each DOM node. These together form the basis for the next stage: layout. But before moving on, we should consider the JavaScript side in more detail - specifically how the JS engine (V8 in Chrome's case) executes code. We touched on script blocking, but what happens when the JS runs? We'll dedicate a later section to the internals of V8 and JS execution. For now, assume that as scripts run, they might modify the DOM or CSSOM (for example, calling document.createElement or setting element styles). The browser may have to respond to those changes by recalculating styles or layout as needed (which can incur performance costs if done repeatedly). The initial run of scripts during parsing often includes things like setting up event handlers, or maybe manipulating the DOM (e.g. templating).
After that, the page is usually fully parsed and we move into layout and rendering.</p><h2><strong>Styling and Layout</strong></h2><p>At this stage, the browser's renderer process knows the structure of the DOM and each element's computed style. The next question is: where do all these elements go on the screen? How big are they? This is the job of layout (also known as "reflow" or "layout calculation"). In this phase, the browser calculates the geometry of each element - their size and position - according to the CSS rules (flow, box model, flexbox or grid, etc.) and the DOM hierarchy.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SDsn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85e725ec-3cf8-4ee9-935a-2d71a87b22bb_2272x1030.png" data-component-name="Image2ToDOM"><div class="image2-inset"><img src="https://substackcdn.com/image/fetch/$s_!SDsn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85e725ec-3cf8-4ee9-935a-2d71a87b22bb_2272x1030.png" width="1456" height="660" class="sizing-normal" alt="" loading="lazy"></div></a></figure></div><p><strong>Layout tree construction</strong>: The browser walks the DOM tree and generates a layout tree (sometimes called the render tree or frame tree). The layout tree is similar to the DOM tree in structure, but it omits non-visual elements (e.g. script or meta tags don't produce boxes) and may split some elements into multiple boxes if needed (for example, a single HTML element that is flowed across multiple lines might correspond to multiple layout boxes).
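</p><p>The filtering step can be sketched over a plain-object "DOM" (a toy: display:none subtrees produce no box, while visibility:hidden nodes keep one):</p>

```javascript
// Toy layout-tree construction over a plain-object "DOM".
// display:none subtrees are dropped entirely; visibility:hidden
// elements still get a box (they occupy space, they just aren't painted).
function buildLayoutTree(node) {
  if (node.style && node.style.display === "none") return null;
  const box = { tag: node.tag, children: [] };
  for (const child of node.children || []) {
    const childBox = buildLayoutTree(child);
    if (childBox) box.children.push(childBox);
  }
  return box;
}

const layoutTree = buildLayoutTree({
  tag: "body",
  children: [
    { tag: "p" },
    { tag: "div", style: { display: "none" }, children: [{ tag: "span" }] },
    { tag: "em", style: { visibility: "hidden" } },
  ],
});
// layoutTree.children holds boxes for "p" and "em" only
```

<p>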
Each node in the layout tree holds the computed style for that element and has information like the node's content (text or image) and computed properties that affect layout (like width, height, padding, etc.).</p><p>During layout, the browser computes the exact position (x, y coordinates) and size (width, height) for each element's box. This involves algorithms defined by CSS specifications: for example, in a normal document flow, block-level elements stack top-to-bottom, each taking full width by default, whereas inline elements flow within lines and cause line breaks as needed. Modern layout modes like <a href="https://web.dev/learn/css/flexbox">flexbox</a> or <a href="https://web.dev/learn/css/grid">grid</a> have their own algorithms. The engine has to consider font metrics to break lines (so text layout involves measuring text runs), and it must handle margins, padding, border, etc. There are many edge cases (e.g. margin collapsing rules, floats, absolutely positioned elements that are removed from flow, etc.), making layout a surprisingly complex process. Even a "simple" top-to-bottom layout has to figure out line breaks in text which depend on available width and font sizes. Browser engines have dedicated teams and many years of development to handle layout accurately and efficiently.</p><p>Some details about the layout tree:</p><ul><li><p>Elements with display:none are omitted entirely from the layout tree (they don't produce any box). In contrast, elements that are simply not visible (e.g. visibility:hidden) do get a layout box (taking up space), just not painted later.</p></li><li><p>Pseudo-elements like ::before or ::after that generate content are included in the layout tree (since they do have visual boxes).</p></li><li><p>The layout tree nodes know their geometry. 
For example, a &lt;p&gt; element's layout node will know its position relative to the viewport and its dimensions, and have children for each line or inline box inside it.</p></li></ul><p><strong>Layout calculation</strong>: Layout is typically a recursive process. Starting from the root (the &lt;html&gt; element), the browser computes the size of the viewport (for &lt;html&gt;/&lt;body&gt;) and then lays out child elements within it, and so on. Many elements' sizes depend on their children or parent (e.g. a container might expand to fit children, or a child might be 50% of its parent's width). The layout algorithm often has to do multiple passes for things like floats or for certain complex interactions, but generally it proceeds in one direction (top-down) with possible backtracking if needed.</p><p>By the end of this stage, each element's position and size on the page is known. We can now conceptually think of the page as a bunch of boxes (with text or images inside). But we still haven't actually drawn anything on the screen yet - that's the next step, painting.</p><p>However, one key concept: layout can be an expensive operation, especially if done repeatedly. If JavaScript later changes the size of an element or adds content, it can force a relayout of some or all of the page. Developers often hear advice about avoiding layout thrashing (like reading layout info in JS right after modifying DOM, which can force synchronous recalculation). The browser tries to optimize by noting what parts of the layout tree are "dirty" and only recomputing those. But worst-case, changes high up in the DOM could require recalculating the entire layout for large pages. 
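</p><p>The read/write batching advice can be sketched with plain objects standing in for elements (in a real page, <code>offsetHeight</code> reads and <code>style</code> writes are what force and dirty layout):</p>

```javascript
// Thrashing version: each iteration reads layout (offsetHeight) right
// after the previous iteration's write dirtied it, forcing a reflow
// per element in a real browser.
function resizeAllThrashing(boxes) {
  for (const box of boxes) {
    const h = box.parentNode.offsetHeight; // read → forces layout
    box.style.height = h / 2 + "px";       // write → dirties layout
  }
}

// Batched version: all reads first, then all writes, so the browser
// can satisfy every read from a single layout pass.
function resizeAllBatched(boxes) {
  const heights = boxes.map((b) => b.parentNode.offsetHeight); // reads
  boxes.forEach((b, i) => { b.style.height = heights[i] / 2 + "px"; }); // writes
}
```

<p>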
This is why costly style/layout operations should be minimized for better performance.</p><p><strong>Style and layout recap</strong>: To summarize, from HTML and CSS the browser builds:</p><ul><li><p>DOM tree - structure and content</p></li><li><p>CSSOM - parsed CSS rules</p></li><li><p>Computed Styles - the result of matching CSS rules to each DOM node</p></li><li><p>Layout tree - DOM tree filtered to visual elements, with geometry for each node</p></li></ul><p>Each stage builds on the last. If any stage changes (e.g. if a script alters the DOM or modifies a CSS property), the subsequent stages may need to update. For example, if you change a CSS class on an element, the browser may recalc style for that element (and children if inheritance changes), then might have to redo layout if that style change affects geometry (say display or size), then would have to repaint. This chain means layout and paint are dependent on up-to-date style, and so on. We'll discuss performance implications of this in the DevTools section (as the browser provides tools to see when these steps occur and how long they take).</p><p>With layout done, we move to the next major phase: painting.</p><h2><strong>Painting, Compositing, and GPU Rendering</strong></h2><p>Painting is the process of taking the structured layout information and actually producing pixels on the screen. In traditional terms, the browser would traverse the layout tree and issue drawing commands for each node ("draw background, draw text, draw image at these coordinates"). 
Modern browsers still conceptually do this, but they often split the work into multiple stages and leverage the GPU for efficiency.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!A40I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F871da49b-23af-4a7c-9671-e9228635348c_1600x1116.png"><img src="https://substackcdn.com/image/fetch/$s_!A40I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F871da49b-23af-4a7c-9671-e9228635348c_1600x1116.png" width="1456" height="1016" alt="" loading="lazy"></a></figure></div><p><strong>Painting / Rasterization</strong>: On the renderer's main thread, after layout, Chrome generates paint records (or a display list) by walking the layout tree. This is basically a list of drawing operations with their coordinates, much like an artist planning how to paint the scene: e.g. "draw rect at (x,y) with width W and height H and fill color blue, then draw text 'Hello' at (x2,y2) with font XYZ, then draw an image at &#8230;" and so on. This list is in the correct z-index order (so that overlapping elements paint correctly). For example, if an element has a higher z-index, its paint commands will come later (on top of) lower z-index content. The browser must consider stacking contexts, transparency, etc. to get the right ordering.</p><p>In the past, browsers might have simply drawn each element directly to the screen in order. But that approach can be inefficient if parts of the page change (you'd have to repaint everything). 
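<p>A paint-record list can be pictured as ordinary data. The sketch below is illustrative (these are not Chrome's real display-item types): it records drawing commands and orders them by stacking order, the way the paint step puts higher z-index content on top:</p>

```javascript
// A "display list" is just an ordered list of drawing commands.
// Each op notes what to draw, where, and its stacking (z-index) order.
const displayList = [
  { op: "drawRect", x: 0, y: 0, w: 300, h: 100, fill: "blue", z: 0 },
  { op: "drawText", x: 10, y: 40, text: "Hello", font: "16px sans-serif", z: 1 },
  { op: "drawImage", x: 200, y: 20, src: "logo.png", z: 2 },
];

// Paint order: lower z first, so higher-z content ends up on top.
// A stable sort preserves document order for equal z values.
function paintOrder(list) {
  return [...list].sort((a, b) => a.z - b.z).map((item) => item.op);
}
```

Replaying these recorded ops (against a canvas, or into tiles) is exactly what the later raster stage of the pipeline does.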
Modern browsers instead often record these drawing commands and then use a compositing step to assemble the final image, especially when using GPU acceleration.</p><p><strong>Layering and compositing</strong>: Compositing is an optimization where the page is split into several layers that can be handled independently. For example, a positioned element with a CSS transform or an animation might get its own layer. Layers are like separate "scratch canvases" - the browser can rasterize (draw) each layer separately, and then the compositor can blend them together on the screen, often using the GPU. </p><p>In Chromium's pipeline, after paint records are generated, there's a step to build the layer tree (this corresponds to which elements are on which layer). Some layers are created automatically (e.g. a video element, or a canvas, or elements with certain CSS will be promoted to layers), and developers can hint with will-change or CSS properties like transform to get a layer. The reason layers are helpful is that movement or opacity changes on a layer can be composited (i.e. just that layer re-rendered or moved) without re-painting the whole page. Too many layers, however, can be memory-heavy and add overhead, so browsers choose carefully.</p><p>After determining layers, Chrome's main thread hands off to the compositor thread. The compositor thread runs in the renderer process but separate from the main thread (so it can keep working even if the main JS thread is busy, which is great for smooth scrolling and animations). The compositor thread's job is to take the layers, rasterize them (convert the drawings into actual pixel bitmaps), and compose them into frames.</p><p><strong>Rasterization with GPU assistance</strong>: Raster work can also be distributed. In Chrome, the compositor thread breaks layers into smaller tiles (think 256x256 or 512x512 pixel chunks; tiles are often larger when GPU rasterization is enabled, which is the common case today). 
It then dispatches these to several raster worker threads (which may even run across multiple CPU cores) for concurrent rasterization. Each raster worker takes a tile - essentially a list of drawing commands for that region of a layer - and produces a bitmap (pixel data). Importantly, Skia (Chrome's graphics library) can rasterize on either the CPU or the GPU; historically Chrome's raster threads rendered pixels on the CPU and uploaded the results to GPU memory, but GPU rasterization is now the default on most hardware. Firefox's newer WebRender takes a different approach we'll mention later. The rasterized tiles are stored in GPU memory as textures. Once all needed tiles are drawn, the compositor thread has essentially a set of textured layers ready.</p><p>The compositor then assembles a compositor frame - basically a message to the browser process that includes all the quads (tiles of layers) that make up the screen, their positions, etc. This compositor frame is submitted via IPC back to the browser process, where ultimately the browser's GPU process (a separate process in Chrome for accessing GPU) will take these and display them. The browser process's own UI (like the tab bar) is also drawn via compositor frames, and they all get mixed in the final step. The GPU process receives the frames, and uses the GPU (via OpenGL/DirectX/Metal etc.) to composite them - basically drawing each texture at the right place on screen, applying transforms, etc., very quickly. The result is the final image you see displayed.</p><p>The advantage of this pipeline is apparent when you scroll or animate. For example, scrolling a page mostly just changes the viewport on a larger page texture. The compositor can just shift the layer positions and ask the GPU to redraw the new portion coming into view, without the main thread having to repaint everything. 
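<p>That scrolling fast path can be sketched as a toy compositor (the names here are hypothetical; a real compositor also tracks damage regions, tile visibility, and GPU fences). The point is that scrolling only changes a layer's offset, so no paint records are replayed:</p>

```javascript
// Toy compositor layer: owns a rastered texture plus a scroll offset.
class Layer {
  constructor(name) {
    this.name = name;
    this.offsetY = 0;
    this.rasterCount = 0; // how many times the pixels were (re)painted
  }
  raster() {
    this.rasterCount += 1; // the expensive step: replaying paint records
  }
}

// Scrolling is compositor-only: shift the layer, don't re-raster it.
function scrollBy(layer, dy) {
  layer.offsetY += dy;
}

const content = new Layer("page content");
content.raster();        // painted once at load
scrollBy(content, 500);  // scroll: offset changes, rasterCount stays 1
```

This is the same reason transform/opacity animations stay smooth while the main thread is busy: they touch only the layer's compositing parameters, never the paint records.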
If an animation is just a transform (say moving an element that is its own layer), the compositor thread can update that element's position each frame and produce new frames without involving the main thread or re-running style and layout. This is why animations that are "compositing-only" (changing transform or opacity, which don't trigger layout) are recommended for better performance - they can run at 60 FPS smoothly even if the main thread is busy. In contrast, animating something like height or background-color might force re-layout or re-paint each frame, which janks if the main thread can't keep up.</p><p>To put it succinctly, Chrome's rendering pipeline is: DOM &#8594; style &#8594; layout &#8594; paint (record display items) &#8594; layerize &#8594; raster (tiles) &#8594; composite (GPU). Firefox's pipeline is conceptually similar up to the display list stage, but with WebRender it skips explicit layer construction and instead sends a display list to the GPU process, which then handles almost all drawing using GPU shaders (more on this in the comparison section). WebKit (Safari) also uses a multi-threaded compositor and GPU rendering via "CALayers" on macOS. All modern engines thus take advantage of GPUs for rendering, especially for compositing and rasterizing graphics-intensive parts, to achieve high frame rates and offload work from the CPU.</p><p>Before moving on, let's discuss the GPU's role in more detail. In Chromium, the GPU process is a separate process whose job is to interface with the graphics hardware. It receives drawing commands (mostly high-level, like "draw these textures at these coords") from all renderer compositors and also the browser UI. It then translates that into actual GPU API calls. By isolating it in a process, a buggy GPU driver that crashes won't take down the whole browser - only the GPU process, which can be restarted. 
Also, it provides a sandbox boundary: GPU drivers process potentially untrusted content (canvas drawing, WebGL, etc.) and have had security bugs, so running them out-of-process mitigates the risk.</p><p>The result of the compositing is finally sent to the display (the OS window or context the browser is running in). For each animation frame (target 60fps or 16.7ms per frame for smooth results), the compositor aims to produce a frame. If the main thread is busy (say JavaScript took a long time), the compositor may skip frames or be unable to update, leading to visible jank. Developer tools can show dropped frames in the performance timeline. Techniques like requestAnimationFrame align JS updates to frame boundaries to help with smooth rendering.</p><p>In summary, the browser's rendering engine carefully breaks down the page content and styles into a set of geometry (layout) and drawing instructions, then uses layers and GPU compositing to efficiently turn that into the pixels you see. This complex pipeline is what enables the rich graphics and animations on the web to run at interactive frame rates. Next, we will peek into the JavaScript engine to understand how the browser executes scripts (which we've so far treated as a black box).</p><h2><strong>Inside the JavaScript Engine (V8)</strong></h2><p>JavaScript drives the interactive behavior of web pages. In Chromium browsers, the V8 engine executes JavaScript (and WebAssembly). Understanding V8's workings can help developers write performant JS. While an exhaustive deep-dive would be book-length, we'll focus on the key stages of the JS execution pipeline: parsing/compiling the code, executing it, and managing memory (garbage collection). 
We'll also note how V8 handles modern features like Just-In-Time (JIT) compilation tiers and ES modules.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!x62F!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37778d69-97f7-432e-8028-39675d784a6f_1600x1116.png"><img src="https://substackcdn.com/image/fetch/$s_!x62F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37778d69-97f7-432e-8028-39675d784a6f_1600x1116.png" width="1456" height="1016" alt="" loading="lazy"></a></figure></div><h3><strong>Modern V8 Parsing and Compilation Pipeline</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YJWi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8306080-5174-480f-8e35-fb316aa97806_2400x830.jpeg"><img src="https://substackcdn.com/image/fetch/$s_!YJWi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8306080-5174-480f-8e35-fb316aa97806_2400x830.jpeg" width="1456" height="504" alt="" loading="lazy"></a></figure></div><p><strong>Background compilation</strong>: Starting with Chrome 66, V8 compiles JavaScript source code on a background thread, reducing the amount of time spent compiling on the main thread by 5% to 20% on typical websites. Since version 41, Chrome has supported parsing of JavaScript source files on a background thread via V8's StreamedSource API. V8 can start parsing JavaScript source code as soon as the first chunk is downloaded from the network and continue parsing in parallel while streaming the file. Almost all script compilation occurs on background threads, with only short AST internalization and bytecode finalization steps happening on the main thread just before script execution. Currently, top-level script code and immediately invoked function expressions are compiled on background threads, while inner functions are still compiled lazily on the main thread when first executed.</p><p><strong>Parsing and bytecode</strong>: When a &lt;script&gt; is encountered (either during HTML parse or loaded later), V8 first parses the JavaScript source code. This produces an Abstract Syntax Tree (AST) representation of the code. The preparser is a copy of the parser that does the bare minimum needed to skip over functions. It verifies that functions are syntactically valid and produces all information needed for outer functions to be compiled correctly. When a preparsed function is later called, it is fully parsed and compiled on-demand.</p><p>Rather than interpreting directly from the AST, V8 uses a bytecode interpreter called Ignition (introduced in 2016). Ignition compiles the JavaScript into a compact bytecode format, which is essentially a sequence of instructions for a virtual machine. This initial compilation is quite fast and the bytecode is fairly low-level (Ignition is a register-based VM). 
The goal is to start executing the code quickly with minimal upfront cost (important for page load times).</p><p><strong>AST internalization process</strong>: AST internalization involves allocating literal objects (strings, numbers, object-literal boilerplate) on the V8 heap for use by generated bytecode. To enable background compilation, this process was moved later in the compilation pipeline, after bytecode compilation, requiring modifications to access raw literal values embedded in the AST instead of internalized on-heap values.</p><p><strong>Explicit Compile Hints</strong>: V8 has introduced a new feature called "<a href="https://v8.dev/blog/explicit-compile-hints">Explicit Compile Hints</a>" which allows developers to instruct V8 to parse and compile code immediately on load through eager compilation. Files with this hint are compiled on background threads, whereas deferred compilation happens on the main thread. Experiments with popular web pages showed performance improvements in 17 out of 20 cases, with an average 630ms reduction in foreground parse and compile times. Developers can add explicit compile hints to JavaScript files using special comments to enable eager compilation on background threads for critical code paths.</p><p><strong>Scanner and parser optimizations</strong>: V8's scanner has been significantly optimized, resulting in improvements across the board: single token scanning improved by roughly 1.4&#215;, string scanning by 1.3&#215;, multiline comment scanning by 2.1&#215;, and identifier scanning by 1.2-1.5&#215; depending on identifier length.</p><p>When the script runs, Ignition interprets the bytecode, executing the program. Interpretation is generally slower than optimized machine code, but it allows the engine to start running and also collect profiling information about the code's behavior. As the code runs, V8 gathers data on how it's being used: types of variables, which functions are called frequently, etc. 
This information will be used to make the code run faster in subsequent steps.</p><h3><strong>JIT Compilation Tiers</strong></h3><p>V8 doesn't stop at interpretation. It employs multiple tiers of Just-In-Time compilers to accelerate hot code. The idea is to spend more compilation effort on code that runs a lot, to make it faster, while not wasting time optimizing code that runs only once.</p><ol><li><p><strong>Ignition</strong> (interpreting the bytecode).</p></li><li><p><strong>Sparkplug</strong>: Sparkplug is V8's baseline JIT (launched around 2021). It takes the bytecode and compiles it to machine code quickly, without heavy optimizations. This yields native code that is faster than interpretation, but Sparkplug doesn't do deep analysis - it's meant to be almost as quick to start as the interpreter, while producing code that runs a bit faster.</p></li><li><p><strong>Maglev</strong>: In 2023, V8 introduced Maglev, a mid-tier optimizing compiler that is now actively deployed. Maglev compiles code nearly 20 times slower than Sparkplug does, but 10 to 100 times faster than TurboFan, effectively bridging the gap for functions that are moderately hot but not hot enough to justify TurboFan's compilation cost. As of Chrome M117, Maglev can handle many cases, resulting in faster startup for web apps that spend time in "warm" code (not cold, not super hot).</p></li><li><p><strong>TurboFan</strong>: As functions or loops get executed many times, V8 will engage its most powerful optimizing compiler. TurboFan takes the code and uses the collected type feedback to generate highly optimized machine code, applying advanced optimizations (inlining functions, eliding bounds checks, etc.). 
This optimized code can run much faster if the assumptions hold.</p></li></ol><p>So V8 now effectively has four execution tiers: Ignition interpreter, Sparkplug baseline JIT, Maglev optimizing JIT, and TurboFan optimizing JIT. This is analogous to how Java's HotSpot VM has multiple JIT levels (C1 and C2). The engine can dynamically decide which functions to optimize and when, based on execution profiles. If a function suddenly is called a million times, it will likely end up TurboFan-optimized for maximum speed.</p><p>Intel has also developed <a href="https://community.intel.com/t5/Blogs/Tech-Innovation/Client/Profile-Guided-Tiering-in-the-V8-JavaScript-Engine/post/1679340">Profile-Guided Tiering</a> that enhances V8's efficiency, leading to approximately 5% improvement on Speedometer 3 benchmarks. Recent V8 updates include static roots optimization, which allows accurate prediction of memory addresses at compile time for commonly used objects, significantly improving access speed.</p><p>One challenge with JIT optimization is that JavaScript is dynamically typed. V8 might optimize code under certain assumptions (e.g. this variable is always an integer). If a later call violates those assumptions (say the variable becomes a string), the optimized code is invalid. V8 then performs a deoptimization: it falls back to a less optimized version (or re-generates code with new assumptions). This mechanism relies on "inline caches" and type feedback to quickly adapt. The existence of deopt means sometimes peak performance isn't sustained if your code has unpredictable types, but generally V8 tries to handle typical patterns (like a function consistently being passed the same type of object).</p><h3><strong>Bytecode Flushing and Memory Management</strong></h3><p>V8 implements bytecode flushing where if a function remains unused after multiple garbage collections, its bytecode will be reclaimed. 
When executed again, the parser uses previously stored results to regenerate the bytecode more quickly. This mechanism is crucial for memory management but can lead to parsing inconsistencies in edge cases.</p><p><strong>Memory Management (Garbage Collection)</strong>: V8 automatically manages memory for JS objects using a garbage collector. Over the years, V8's GC has evolved into what's known as the Orinoco GC, which is a generational, incremental, and concurrent garbage collector. Key points:</p><ul><li><p><strong>Generational</strong>: V8 segregates objects by age. New objects are allocated in the young generation (or "nursery"). These are collected frequently with a very fast scavenging algorithm (copying live objects to a new space and reclaiming the rest). Objects that survive enough cycles get promoted to the old generation.</p></li><li><p><strong>Mark-and-sweep/compact</strong>: For the old generation, V8 uses a mark-and-sweep collector with compaction. This means it will occasionally stop the world (stop JS execution briefly), mark all reachable objects (tracing from roots like the global object), then sweep to reclaim memory from unreferenced objects. It may also compact memory (moving objects to reduce fragmentation). However, Orinoco has made much of the marking concurrent - it can do a lot of the marking on a background thread while JS is still running, to minimize pause times.</p></li><li><p><strong>Incremental GC</strong>: V8 performs garbage collection in small slices rather than one big pause when possible. This incremental approach spreads work out to avoid jank. 
For example, it can interleave a bit of marking work between script executions, using idle time.</p></li><li><p><strong>Parallel GC</strong>: On multi-core machines, V8 can perform parts of GC (like marking or sweeping) in parallel threads as well.</p></li></ul><p>The net effect is that the V8 team has managed to drastically reduce GC pause times over the years, making garbage collection mostly unnoticeable even in large applications. Minor GCs (new object scavenge) usually happen very fast. Major GCs (old gen) are rarer and mostly concurrent now. If you open Chrome's Task Manager or DevTools Memory panel, you might see V8's heap broken into "Young space" and "Old space" reflecting this generational design.</p><p>For developers, this means manual memory management isn't needed, but you should still be mindful: e.g. avoid creating tons of short-lived objects in tight loops (though V8 is quite good at handling short-lived objects) and be aware that holding onto large data structures will keep them around in memory. Tools like DevTools can force a garbage collection or record memory profiles to see what is using memory.</p><p><strong>V8 and Web APIs</strong>: It's worth mentioning that V8 covers the core JavaScript language and runtime (execution, standard JS objects, etc.), but many "browser APIs" (like DOM methods, alert(), network XHR/fetch, etc.) are not part of V8 itself. Those are provided by the browser and are exposed to JS via bindings. For instance, when you call document.querySelector, under the hood it enters the engine's binding to the C++ DOM implementation. V8 handles calling into C++ and getting results back, and there is a lot of machinery to make this boundary fast (Chrome uses an IDL to generate efficient bindings).</p><p>Having covered how the browser fetches resources, parses HTML/CSS, computes layout, paints with the GPU, and runs JS, we now have a picture of the entire process of loading and rendering a page. 
But there's more to explore: how ES modules are handled (since modules involve their own loading mechanism), how the browser's multi-process architecture is organized, and how security features like sandboxing and site isolation work.</p><h2><strong>Module Loading and Import Maps</strong></h2><p><a href="https://v8.dev/features/modules">JavaScript modules</a> (ES6 modules) introduce a different loading and execution model compared to classic &lt;script&gt; tags. Instead of a big script file that might create globals, modules are files that explicitly import/export values. Let's see how browsers (and specifically V8 in Chrome) load modules and how features like dynamic import() and import maps come into play.</p><p><strong>Static module imports</strong>: When the browser encounters a &lt;script type="module" src="main.js"&gt;, it treats main.js as a module entry point. The loading process works as follows: the browser will fetch main.js, then parse it as an ES module. During parsing, it will find any import statements (e.g. import { foo } from './utils.js';). Rather than executing code immediately, the browser constructs a module dependency graph. It will initiate fetching of any imported modules (utils.js in this case), and recursively, each of those modules is parsed for their imports, fetched, and so on. This happens asynchronously. Only once the entire graph of modules is fetched and parsed can the browser evaluate the modules. Module scripts are deferred by nature - the browser doesn't execute the module code until all dependencies are ready. 
Then it executes them in dependency order (ensuring that if module A imports B, B runs first).</p><p>This static import process is why ES modules can't be loaded from file:// in some cases unless allowed, and why they require CORS by default for cross-origin scripts - the browser is actively linking and loading multiple files, not just dropping a &lt;script&gt; into the page.</p><p><strong>Dynamic import()</strong>: In addition to static import statements, ES2020 introduced import(moduleSpecifier) as an expression. This allows code to load a module on the fly (returning a promise that resolves to the module exports). For example, you might do const module = await import('./analytics.js') in response to a user action, thereby code-splitting your application. Under the hood, import() triggers the browser to fetch the requested module (and its dependencies, if not already loaded), then instantiate and execute it, and resolve the promise with the module namespace object. V8 and the browser coordinate here: the browser's module loader handles fetching and parsing, V8 handles the compilation and execution once ready. Dynamic import is powerful because it can be used in non-module scripts too (e.g. an inline script can dynamically import a module). It essentially gives the developer control to load JS on demand. The difference from a static import is that static imports are resolved ahead of time (before any module code runs, the entire graph is loaded), whereas dynamic import behaves more like loading a new script at runtime (except with module semantics and promises).</p><p><strong>Import maps</strong>: One challenge with ES modules in the browser was module specifiers. In Node or with bundlers, you often import by package name (e.g. import { useState } from 'react'). On the web, without a bundler, 'react' is not a valid URL - the browser would treat it as a relative path (which would fail). This is where Import Maps come in. 
An import map is a JSON configuration that tells the browser how to resolve module specifiers to real URLs. It's provided via a &lt;script type="importmap"&gt; tag in HTML. For example, an import map might say that the specifier "react" maps to "https://cdn.example.com/react@19.0.0/index.js" (some full URL to the actual script). Then, when any module does import 'react', the browser uses the map to find the URL and loads that. Essentially, import maps allow "bare" specifiers (like package names) to work on the web by mapping them to CDN URLs or local paths.</p><p>Import maps have been a game-changer for unbundled development. Since 2023, import maps are supported in all major browsers (Chrome 89+, Firefox 108+, Safari 16.4+ - all three engines). They are especially useful for local development or simple apps where you want to use modules without a build step. For production, large apps often still bundle for performance (to reduce the number of requests), but as browsers and HTTP/2/3 improve, serving many small modules becomes more viable.</p><p>The module loader in the browser thus consists of: a module map (tracking what's been loaded), possibly an import map (for custom resolution), and the fetching/parsing logic. Once fetched and compiled, module code executes in strict mode and with its own top-level scope (no leaking to window unless explicitly attached). The exports are cached so if another module imports the same module later, it doesn't re-run it (it reuses the already evaluated module record).</p><p>One more aspect to mention is that ES modules, unlike scripts, defer execution and also execute in order for a given graph. If main.js imports util.js and util.js imports dep.js, the evaluation order will be: dep.js first, then util.js, then main.js (depth-first, post-order). 
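</p><p>Both dynamic import() and this post-order evaluation can be observed directly with inline data: URL modules - hypothetical stand-ins for dep.js and util.js that work in modern browsers and recent Node:</p>

```javascript
// Turn module source text into an importable data: URL (a stand-in for a file).
const toModuleUrl = (code) => 'data:text/javascript,' + encodeURIComponent(code);

const depUrl = toModuleUrl('console.log("dep evaluated"); export const d = 1;');
const utilUrl = toModuleUrl(
  `import { d } from "${depUrl}"; console.log("util evaluated"); export const u = d + 1;`
);

// import() fetches util's graph, evaluates dep first (post-order), then util,
// and resolves with util's module namespace object.
const util = await import(utilUrl);
console.log(util.u); // → 2
```

<p>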
This deterministic order can avoid the need for things like DOMContentLoaded in some cases, since by the time your main module runs, all its imports are loaded and executed.</p><p>From V8's perspective, modules are handled by the same compilation pipeline, but they create separate ModuleRecords. The engine ensures that a module's top-level code only runs once all dependencies are ready. V8 also has to deal with cyclic module imports (which are allowed and can lead to partially initialized exports). The details are per spec - but essentially, the engine will create all module instances, then resolve cycles by giving them placeholders, and then execute in an order that respects dependencies (the spec algorithm is the "DAG" topological sort of the module graph).</p><p>In summary, module loading in browsers is a coordinated dance between the network (fetching module files), the module resolver (using import maps or standard URL resolution), and the JS engine (compiling and evaluating modules in the correct order). It's more involved than old &lt;script&gt; loading, but results in a more modular and maintainable code structure. For developers, the key takeaways are: use modules to organize code, use import maps if you want bare imports, and know that you can dynamically load modules when needed via import(). The browser will handle the heavy lifting of making sure everything executes in the right sequence.</p><p>Now that we've covered how a single page's internals work, let's zoom out and examine the browser architecture that allows multiple pages, tabs, and web apps to all run simultaneously without interfering with each other. This brings us to the multi-process model.</p><h2><strong>Browser Multi-Process Architecture</strong></h2><p>Modern browsers (Chrome, Firefox, Safari, Edge, etc.) all use a multi-process architecture for stability, security, and performance isolation. 
Instead of running the entire browser as one giant process (which was how early browsers worked), different aspects of the browser run in different processes. Chrome was a pioneer of this approach in 2008, and others followed suit in various forms. Let's focus on Chromium's architecture and note differences in Firefox and Safari.</p><p>In Chromium (Chrome, Edge, Brave, etc.), there is one <strong>Browser Process</strong> that is central. This browser process is responsible for the UI (the address bar, bookmarks, menus - all the browser chrome) and for coordinating high-level tasks like resource loading and navigation. When you open Chrome and see one entry in your OS task manager, that's the browser process. It's also the parent that spawns other processes.</p><p>Then, for each tab (and sometimes for each site in a tab), Chrome creates a <strong>Renderer Process</strong>. A renderer process runs the Blink rendering engine and V8 JS engine for the content of that tab. In general, each tab gets at least one renderer process. 
</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!-21l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46fb7fe0-b266-4767-bd4d-de835fac5126_1536x1024.jpeg" width="1456" height="971" alt="" loading="lazy"></figure></div><p>If you have multiple unrelated sites open, they'll be in separate processes (Site A in one, Site B in another, etc.). Chrome even isolates cross-origin iframes into separate processes (more on that in site isolation). The renderer process is sandboxed and cannot directly access your file system or network arbitrarily - it has to go through the browser process for those privileged operations.</p><p>Other key processes in Chrome include:</p><ul><li><p><strong>GPU Process</strong>: a process dedicated to communicating with the GPU (as described earlier). All rendering and compositing requests from renderers go to the GPU process, which actually issues graphics API calls. 
This process is sandboxed and separate so that a GPU crash doesn't take down renderers.</p></li><li><p><strong>Network Process</strong>: (In older Chrome versions the network was a thread in browser process, but now it's often a separate process through "servicification"). This process handles network requests, DNS, etc. and can be sandboxed separately.</p></li><li><p><strong>Utility Processes</strong>: these are for various services (like audio playback, image decoding, etc.) that Chrome may offload.</p></li><li><p><strong>Plugin Process</strong>: in the era of Flash and NPAPI plugins, plugins ran in their own process. Flash is deprecated now, so this is less relevant, but the architecture remains ready for plugins to not run in the main browser process.</p></li><li><p><strong>Extension Processes</strong>: Chrome extensions (which are essentially scripts that can act on web pages or the browser) run in separate processes as well, isolated from websites for security.</p></li></ul><p>A simplified view is: one Browser process coordinates multiple Renderer processes (one per tab or per site instance), plus one GPU process and a few others for services. Chrome's task manager (Shift+Esc on Windows or via More Tools &gt; Task Manager) will actually list each process type and its memory usage.</p><p><strong>Benefits of Multi-Process</strong>: The primary benefits are:</p><ul><li><p><strong>Stability</strong>: If a web page (renderer process) crashes or leaks memory, it doesn't crash the whole browser - you can close that tab and the rest stays alive. In one-process browsers, a single bad script could tear down everything. Chrome can show the "Aw, Snap" error for a single tab when its process dies, and you can reload it independently.</p></li><li><p><strong>Security (Sandboxing)</strong>: By running web content in a restricted process, the browser can limit what that code can do on your system. 
Even if an attacker finds a vulnerability in the rendering engine, they are trapped in the sandbox - the renderer process typically cannot read your files or arbitrarily open network connections or launch programs. It must request the browser process for things like file access, which can be validated or denied. This sandbox is enforced at the OS level (using job objects, seccomp filters, etc. depending on platform).</p></li><li><p><strong>Performance Isolation</strong>: Intensive work in one tab (a heavy webapp or an infinite loop) is mostly confined to that tab's renderer process. Other tabs (different process) can remain responsive because their processes aren't blocked. Also, the OS can schedule processes on different CPU cores - so two heavy pages can run in parallel on a multi-core system better than if they were threads of one process.</p></li><li><p><strong>Memory segmentation</strong>: Each process has its own address space, so memory is not shared. This prevents one site from snooping on data of another and also means when a tab is closed, the OS can reclaim all memory from that process efficiently. The downside is some overhead due to duplicated resources and processes (each renderer loads its own copy of the JS engine, etc.).</p></li></ul><p><strong>Site Isolation</strong>: Initially, Chrome's model was one process per tab. Over time they evolved it to one process per site (especially after Spectre - see next section on security). As of 2024, site isolation is enabled by default for 99% of Chrome users across desktop platforms, with Android support continuing to be refined. This means if you have two tabs both open to example.com, Chrome might decide to use one process for both (to save memory, because they're the same site and thus less risky to put together). But a tab with example.com and an iframe of evil.com would by default put evil.com's iframe in a separate process from the parent page (to protect the example.com data). 
This enforcement is what Chrome calls "Strict Site Isolation" (launched around Chrome 67 as a default). Site isolation causes Chrome to use 10-13% more system resources due to increased process creation, but provides crucial security benefits.</p><p>Firefox's architecture, called <a href="https://blog.mozilla.org/addons/2016/04/11/the-why-of-electrolysis/">Electrolysis</a> (e10s), was historically one content process for all tabs (for many years Firefox was single-process and only enabled a few content processes around 2017). As of 2021, Firefox uses multiple content processes (by default 8 for web content). With <a href="https://blog.mozilla.org/security/2021/05/18/introducing-site-isolation-in-firefox/">Project Fission</a> (site isolation), Firefox is moving toward isolating sites similarly - it can spin up new processes for cross-site iframes, and in Firefox 108+ they enabled site isolation by default, increasing the number of processes to potentially one per site like Chrome. Firefox also has a GPU process (for WebRender and compositing) and a separate networking process, similar to Chrome's split. So in practice, Firefox now has a very Chrome-like model: a parent process, a GPU process, a network process, a few content (renderer) processes, and some utility processes (for extensions, media decoding, etc. - e.g. a media plugin can run isolated).</p><p>Safari (WebKit) likewise moved to a multi-process model (WebKit2) where each tab's content is in a separate WebContent process and a central UI process controls them. Safari's WebContent processes are also sandboxed and cannot directly access devices or files without going through the UI process. Safari also has a networking process that is shared (and possibly other helpers). So while implementations differ, the concept is consistent: isolate each webpage's code in its own sandboxed environment.</p><p>One important point is inter-process communication (IPC): How do these processes talk to each other? 
Browsers use IPC mechanisms (on Windows, often named pipes or other OS IPC; on Linux, maybe Unix domain sockets or shared memory; Chrome has its own IPC library, Mojo). For example, when a network response arrives in the Network process, it needs to be delivered to the correct Renderer process (via the Browser process coordinating). Similarly, when a page calls fetch(), the JS engine will call into a network API which sends a request to the Network process and so on. IPC adds complexity, but browsers optimize heavily (e.g. using shared memory for transferring large data like images efficiently, and posting asynchronous messages to avoid blocking).</p><p><strong>Process Allocation Strategies</strong>: Chrome doesn't always create a brand new process for every single tab - there are limits (particularly on devices with low memory, it may reuse processes for same-site tabs). Chrome will reuse an existing renderer if you open another tab to the same site, to conserve memory (this is why two tabs of the same site sometimes share a process). It also has a limit on total processes (which can scale based on RAM). When the limit is hit, it might start putting multiple unrelated sites in one process, though it tries hard to avoid mixing sites if site isolation is enabled. On Android, Chrome uses fewer processes because of the memory constraints (often a max of 5-6 processes for content).</p><p>One more concept in Chromium is <strong>servicification</strong>: splitting browser components into services that could run in separate processes. For example, the Network Service was made a separate module that can run out-of-process. The idea is modularity - powerful systems can run each service in its own process, whereas constrained devices might consolidate some services back into one process to save overhead. Chrome can decide at runtime or build time how to deploy these services. On high-end hardware it might split everything (UI, net, GPU, etc. 
all separate), and on low-end (Android) it might combine browser &amp; network in one process to cut down overhead.</p><p>The takeaway: Chromium's architecture is designed to run the browser UI and each site in different sandboxes, using processes as the isolation boundary. Firefox and Safari have converged on similar designs. This architecture greatly improves security and reliability at the cost of more memory usage. The web content processes are treated as untrusted, and that's where site isolation (next section) comes into play to even isolate different origins from each other within separate processes.</p><h2><strong>Site Isolation and Sandboxing</strong></h2><p>Site isolation and sandboxing are security features that build on the multi-process foundation. They aim to ensure that even if malicious code runs in the browser, it cannot easily steal data from other sites or access your system.</p><p><strong>Site Isolation</strong>: We've touched on this - it means that different websites (strictly, different "sites": scheme plus registrable domain) run in different renderer processes. Chrome's site isolation was boosted after the <a href="https://developer.chrome.com/blog/meltdown-spectre">Spectre vulnerability</a> came to light in 2018. Spectre showed that malicious JavaScript could potentially read memory it shouldn't (by exploiting CPU speculative execution). If two sites were in the same process, a malicious site could use Spectre to snoop on the memory of a sensitive site (like your banking site). The only robust solution is to not let them share a process at all. So Chrome made site isolation a default: every site gets its own process, including cross-origin iframes. Firefox has followed with Project Fission (enabled by default in recent versions), which aims for the same - isolating every site in its own process for security. 
This is a significant change from the past where if you had a parent page and multiple iframes from various domains, they might all live in one process (especially if they were in one tab). Now, those iframes would be split so that e.g. an &lt;iframe src="https://evil.com"&gt; on a good site page is forced into a different process, preventing even low-level attacks from leaking info between them.</p><p>From a developer point of view, site isolation is mostly transparent. One implication is that communications between an embedded iframe and its parent might cross process boundaries now, so things like postMessage between them are implemented via IPC under the hood. But the browser makes this seamless; you as a dev just use the APIs as normal.</p><p><strong>Sandboxing</strong>: Each renderer process (and other auxiliary processes) run in a sandbox with restricted privileges. For example, on Windows, Chrome uses a job object and drops privileges so the renderer can't call most Win32 APIs that access the system. On Linux, it uses namespaces and seccomp filters to limit syscalls. The renderer basically can compute and render content but if it tries to open a file or camera or microphone, it will be blocked (unless going through proper channels that ask user permission via the browser process). WebKit's documentation explicitly notes that WebContent processes have no direct access to filesystem, clipboard, devices, etc. - they must request via the UI process which mediates. This is why, for example, when a site tries to use your microphone, the permission prompt is shown by the browser UI (browser process) and if allowed, the actual recording is done in a controlled process. The sandbox is a crucial line of defense. Even if an attacker finds a bug to run native code in the renderer, they then face the sandbox barrier - they'd need a separate exploit (an "escape") to break out to the system. 
This layered approach (called site isolation + sandbox) is state-of-the-art for browser security.</p><p>Firefox's sandboxing is also quite strict now (it was weaker in early e10s days but they ramped it up). Firefox content processes can't directly access much either; and Firefox also sandboxes the GPU process to handle graphics driver issues.</p><p><strong>Out-of-Process iframes (OOPIF)</strong>: In Chrome's implementation of site isolation, they invented the term <a href="https://www.chromium.org/developers/design-documents/oop-iframes/">OOPIF</a> for out-of-process iframe. From a user's perspective, nothing changes, but in Chrome's internal architecture, each frame of a page can potentially be backed by a different renderer process. The top-level frame and same-site frames share one process; cross-site frames use different processes. All those processes "cooperate" to render a single tab's content, coordinated by the browser process. This is pretty complex, but Chrome has a frame tree that can span processes. It means your one tab might be running N processes (one for the main document, others for each cross-site subdocument). They communicate via IPC for things like DOM events crossing the boundary or certain JavaScript calls that involve cross-context. The web platform (through specs like <a href="https://web.dev/articles/coop-coep">COOP/COEP</a>, <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/SharedArrayBuffer">SharedArrayBuffer,</a> etc.) is evolving with these constraints in mind after Spectre.</p><p><strong>Memory and Performance Costs</strong>: Site isolation does increase memory usage because more processes are used. Chrome devs noted it could be a 10-20% memory <a href="https://www.thurrott.com/mobile/chrome-os/162980/spectre-mitigation-increases-chrome-memory-usage-google-says">overhead</a> in some cases. 
They mitigated some of this with "best-effort process consolidation" for same-site pages, and by limiting how many processes can be spawned (as mentioned earlier). Firefox initially didn't isolate every site due to memory concerns, but after Spectre they found ways to do it more efficiently with a cap on content processes (eight by default) and on-demand process creation. Safari historically has a strong process model but I'm not sure if it isolates cross-site iframes; WebKit2 certainly isolates top-level pages. Apple's focus is often also on privacy (Intelligent Tracking Prevention will partition cookies, etc.), but that's a different layer.</p><p>Cross-site prefetches are limited for privacy reasons and will currently only work if the user has no cookies set for the destination site, preventing sites from tracking user activity via prefetched pages that may never be visited.</p><p>All in all, site isolation ensures that the principle of least privilege is applied: code from origin A cannot access data from origin B unless via web APIs with explicit consent (like postMessage or storage that's partitioned). And the sandbox ensures that even if code is rogue, it can't touch your system directly. These measures make browser exploits much harder - an attacker typically needs a chain of exploits now (one to compromise the renderer, one to escape the sandbox) to do serious damage, which raises the bar significantly.</p><p>As a web developer, you might not directly feel site isolation, but you benefit from it through a safer web. One thing to be aware of is that cross-origin interactions might have slightly more overhead (because of IPC) and that some optimizations like in-process script sharing aren't possible across origins. 
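</p><p>For example, parent-to-iframe messaging uses the same postMessage API it always has, even though with site isolation the message may now cross a process boundary via IPC (the origins and frame here are hypothetical):</p>

```javascript
// Parent page at https://example.com, embedding
// <iframe id="widget" src="https://widgets.example.net/chat.html">:
const frame = document.getElementById('widget');
frame.contentWindow.postMessage(
  { type: 'init', theme: 'dark' },
  'https://widgets.example.net' // target origin: only deliver to this origin
);

// Inside the iframe (a different renderer process under site isolation):
window.addEventListener('message', (event) => {
  if (event.origin !== 'https://example.com') return; // always validate sender
  console.log('received', event.data.type);
});
```

<p>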
But browsers are continuously optimizing the messaging between processes to minimize any performance hit.</p><p>Now, after covering security, let's turn to tools and performance instrumentation - essentially, how we developers can peek into this pipeline and measure or debug it.</p><h2><strong>Comparing Chromium, Gecko, and WebKit</strong></h2><p>We've mainly described Chrome/Chromium's behavior (Blink engine for HTML/CSS, V8 for JS, multi-process via Aura/Chromium infrastructure). Other major engines - Mozilla's Gecko (used in Firefox) and Apple's WebKit (used in Safari) - share the same fundamental goals and a broadly similar pipeline, but there are noteworthy differences and historical divergences.</p><p><strong>Shared Concepts</strong>: All engines parse HTML into a DOM, parse CSS into style data, compute layout, and paint/composite. All have JS engines with JITs and garbage collection. And all modern ones are multi-process (or at least multi-threaded) for parallelism and security.</p><h3><strong>Differences in CSS/Style System</strong></h3><p>One interesting difference is how CSS style computation is implemented by rendering engine:</p><ul><li><p><strong>Blink (Chromium)</strong>: Uses a single-threaded style engine in C++ (historically based on WebKit's). It computes style sequentially for the DOM tree. It has had incremental style invalidation optimizations, but by and large it's one thread doing the work (apart from some minor parallelization in animation).</p></li><li><p><strong>Gecko (Firefox)</strong>: In the Quantum project (2017), Firefox integrated Stylo, a new CSS engine written in Rust, which is multi-threaded. Firefox can calculate style for different DOM subtrees in parallel using all CPU cores. This was a major performance improvement for CSS in Gecko. So, style recalculation in Firefox might use 4 cores to do what Blink does on 1. 
This is one advantage of Gecko's approach (at the cost of complexity).</p></li><li><p><strong>WebKit (Safari)</strong>: WebKit's style engine is single-threaded like Blink (since Blink forked from WebKit in 2013, they shared architecture up to that point). WebKit has done interesting things like a bytecode JIT for CSS selectors matching. It may transform CSS selectors into bytecode and JIT compile a matcher for speed. Blink did not adopt that (it uses iterative matching).</p></li></ul><p>So, in CSS, Gecko stands out with parallel style computation via Rust. Blink and WebKit rely on optimized C++ and maybe some JIT tricks (in WebKit's case).</p><h3><strong>Layout and Graphics</strong></h3><p>All three engines implement the CSS box model and layout algorithms. Specific features might land in one before others (e.g. at one time WebKit was ahead in CSS Grid support, then Blink caught up - often they share code through standards bodies).</p><p>Firefox (Gecko) made a huge change by introducing <strong>WebRender</strong> as its compositor/rasterizer. WebRender is now the default rendering engine in Firefox and has contributed to significant performance improvements, particularly for graphics-intensive web content. WebRender (also Rust) basically takes the display list and renders it on the GPU directly, handling things like tessellating shapes, text, etc. with the GPU. It's like moving more painting work to the GPU. In Chrome's pipeline, rasterization is still done on CPU (for most content) then sent to GPU as bitmaps. WebRender tries to avoid making bitmaps for whole layers and instead draw vectors on GPU (except for text glyphs which it caches as atlas textures). This means Firefox can potentially animate more content at high performance because it doesn't need to re-rasterize everything if only small portions change - it can redraw via GPU very quickly. It's akin to how a game engine redraws a scene every frame using GPU calls. 
The downside is it's complex to implement and tune, and can stress the GPU more. But as GPU power grows, this approach is forward-looking. Chrome's team considered a similar approach (the "Skia GPU" path) but has not done a full WebRender-style overhaul.</p><p>Safari (WebKit) uses an approach more similar to older Chrome: it composites with layers (CALayers, since on Mac and iOS it uses Core Animation). Safari was early to move to GPU compositing (iPhone OS and Safari 4 in 2009 had hardware-accelerated compositing for certain CSS like transforms). Safari and Chrome diverged but conceptually both do tiling and compositing. Safari also offloads a lot to the GPU (and uses tiling, especially on iOS where tile drawing was fundamental for smooth scrolling).</p><p><strong>Mobile optimizations</strong>: Each engine has special cases for mobile. For example, WebKit has the concept of tile coverage for scrolling (used in iOS's UIWebView historically). Chrome on Android uses "tiling" and tries to keep raster tasks minimal to hit frame rates. Firefox's WebRender came from the mobile-first Servo project.</p><h3><strong>JavaScript Engines</strong></h3><ul><li><p><strong>V8 (Chromium)</strong>: as described earlier - Ignition, Sparkplug, TurboFan, and Maglev as of 2023.</p></li><li><p><strong>SpiderMonkey (Firefox)</strong>: It historically had an interpreter, then a Baseline JIT and an optimizing JIT (IonMonkey). Recent work (Warp) changed how the JIT tiers work, simplifying Ion and making it more like TurboFan's approach of using cached bytecode and type info. SpiderMonkey also has a different GC (also generational, incremental since 2012, and now mostly incremental/concurrent).</p></li><li><p><strong>JavaScriptCore (Safari)</strong>: As noted, it has 4 tiers (LLInt, Baseline, DFG, FTL). It uses a different GC (a generational mark-sweep collector, known in its modern, mostly-concurrent form as Riptide). 
JSC's FTL tier originally used LLVM as its backend and now uses WebKit's own B3 compiler (V8 and SpiderMonkey have their own top-tier compilers as well). The FTL can yield very fast code, but its compilation is heavy. JSC tends to prioritize peak performance on certain benchmarks (it often shines on some, but V8 tends to catch up; they leapfrog).</p></li></ul><p>In terms of ES features, all three engines are pretty much up-to-date with the latest standards, thanks to test262 and each other's competition.</p><h3><strong>Multi-Process Model Differences</strong></h3><ul><li><p><strong>Chrome</strong>: each tab typically separate, site isolation at the site level, lots of processes (can be dozens).</p></li><li><p><strong>Firefox</strong>: fewer processes by default (8 content processes handling all tabs, plus more if needed for cross-site iframes with Fission). So, it's not necessarily one process per tab; tabs share content processes in a pool. This means Firefox might have lower memory usage under many-tab scenarios, but it also means one content process crash can take out multiple tabs (though it tries to group by site, so maybe all Facebook tabs in one process, etc.).</p></li><li><p><strong>Safari</strong>: likely one process per tab (or per a few tabs) - on iOS, WKWebView definitely isolates each webview. Safari desktop historically did each tab separate as well. Not sure if they isolate cross-origin iframes yet - Apple hasn't talked about Spectre mitigations much, but Safari does have a process per domain for top-level pages at least.</p></li></ul><p><strong>Interprocess Coordination</strong>: All engines have to solve similar problems, like how to implement alert() (which blocks JS) in a multi-process environment - typically the browser process shows the alert UI and pauses that script context. Or how to handle prompt/confirm, modal dialogs, etc. There are subtle differences (e.g. Chrome doesn't truly block the thread for alert - it spins a nested runloop in the renderer, etc. 
whereas Firefox may still freeze that tab's process).</p><p><strong>Crash handling</strong>: Chrome and Firefox both have crash reporters that can restart a crashed content process and show an error in the tab. When Safari's WebContent process crashes, it typically displays a simpler error message in the content area.</p><h3><strong>Feature Implementation Divergence</strong></h3><p>Some web platform features land engine-first: e.g. Chrome was the first to ship the View Transitions API (document.startViewTransition) for seamless DOM transitions, leaning on Blink's architecture. Firefox may implement a feature differently or later, but standards eventually converge.</p><p><strong>Developer tools</strong>: Chrome's DevTools is very advanced. Firefox's DevTools are also very good (with some unique features, like early CSS Grid highlighters and a shape editor). Safari's Web Inspector is fine but not as full-featured in some areas. These differences can matter to devs debugging in each browser.</p><h3><strong>Performance Trade-offs</strong></h3><p>Historically, Chrome was lauded for faster JS and overall performance thanks to its multi-process design and V8. Firefox with Quantum closed a lot of gaps, sometimes surpassing Chrome in graphics (WebRender can be very fast for complex pages). Safari often excels in graphics and low power usage on Apple hardware (it optimizes heavily for power).</p><p><strong>Memory</strong>: Chrome has a reputation for high memory usage (all those processes). Firefox tries to be a bit more conservative. Safari is very memory efficient on iOS out of necessity (limited RAM), and a lot of memory optimization happens in WebKit.</p><p><strong>External Contributors</strong>: Interestingly, many improvements in these engines come from external teams like Igalia (e.g. implementing CSS Grid in both WebKit and Blink).
So sometimes features land roughly simultaneously.</p><p>From a web developer's perspective, the differences often manifest as:</p><ul><li><p>Needing to test on all engines, because there may be slight differences or bugs in one engine's implementation of a CSS feature or an API.</p></li><li><p>Performance differences (for example, a particular JS workload might be faster in one engine than another due to JIT heuristics).</p></li><li><p>Some APIs being unavailable in one engine (Safari is often the last to ship new APIs, like WebRTC features or newer IndexedDB versions, though it usually gets there eventually).</p></li></ul><p>But the core concepts we discussed (network -&gt; parse -&gt; layout -&gt; paint -&gt; composite -&gt; JS execution) apply to all, just with varying internal approaches or names:</p><ul><li><p>In Gecko: parse -&gt; frame tree -&gt; display list -&gt; WebRender scene (or layer tree, if WebRender is disabled) -&gt; composite.</p></li><li><p>In WebKit: parse -&gt; render tree -&gt; graphics layers -&gt; composite (via Core Animation).</p></li></ul><p>And all have analogous subsystems (DOM, styling, layout, graphics, JS engine, networking, processes/threads).</p><p>Knowing these helps in debugging: e.g. if something is janky in Safari but not Chrome, it could be that WebKit paints differently. Or if CSS is slow in Firefox, maybe it's hitting a path that isn't parallelized by Stylo (though that's rare).</p><p>To sum up, while Chromium, Gecko, and WebKit have different implementations and even some different innovations (parallel CSS in Gecko, WebRender's GPU rendering, etc.), they increasingly implement the same web standards and even collaborate on many. The choice of engine matters more for platform vendors and open-web diversity; as a developer you mostly care that your site runs everywhere.
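</p><p>When a page is janky in one engine but not another, one engine-agnostic first step is to time suspected code paths yourself against the roughly 50 ms "long task" threshold that profiling tools flag. A minimal sketch (the helper name and reporting are illustrative, not any browser API):</p>

```javascript
// Hypothetical wrapper: report any call that holds the main thread
// longer than the ~50 ms threshold commonly flagged as a "long task".
const LONG_TASK_MS = 50;

function flagLongTasks(name, fn, report = console.warn) {
  return function (...args) {
    const start = performance.now();
    const result = fn.apply(this, args); // run the wrapped work
    const elapsed = performance.now() - start;
    if (elapsed > LONG_TASK_MS) {
      report(`${name} blocked for ${elapsed.toFixed(1)} ms`);
    }
    return result;
  };
}

// Wrap a hot path and compare what each browser reports.
const renderList = flagLongTasks("renderList", function (n) {
  let total = 0;
  for (let i = 0; i < n; i++) total += i; // stand-in for real rendering work
  return total;
});
```

<p>Browsers also expose native long-task reporting (e.g. via <code>PerformanceObserver</code>), but support varies by engine, which is itself an instance of the divergence above.</p><p>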
Under the hood, each engine's unique architecture can lead to different performance profiles or bugs, which is why testing and using the performance diagnostics in each (like Firefox's performance tool vs Chrome's) can be insightful. It's beyond our scope to list all the differences, but hopefully this gives an idea of the landscape: the engines are convergent in high-level design (multi-process, similar pipelines) yet divergent in specific technical solutions.</p><h2><strong>Conclusion and Further Reading</strong></h2><p>We've journeyed through the life of a web page inside a modern browser - from the moment a URL is entered, through networking and navigation, HTML parsing, styling, layout, painting, and JavaScript execution, all the way to the GPU putting pixels on the screen. We've seen that browsers are essentially mini operating systems: managing processes, threads, memory, and a slew of complex subsystems to ensure web content loads fast and runs securely. For web developers, understanding these internals can demystify why certain best practices (like minimizing reflows or using async scripts) matter for performance, or why some security policies (like not mixing origins in iframes) exist.</p><p>A few key takeaways for developers:</p><p><strong>Optimize network usage</strong>: Fewer round trips and smaller files mean a faster first render. The browser does a lot on its own (HTTP/2, caching, speculative loading), but you should still leverage techniques like resource hints and efficient caching. The networking stack is high-performance, but latency is always a killer.</p><p><strong>Structure your HTML/CSS for efficiency</strong>: A well-structured DOM and lean CSS (avoid very deep trees and overly complex selectors) help the parsing and style systems. The DOM and CSS together produce computed styles, then layout computes geometry - heavy DOM manipulation or style changes can trigger these recalculations repeatedly.</p><p><strong>Batch DOM updates</strong> to avoid repeated style/layout thrashing: group your reads and writes rather than interleaving them.
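</p><p>As a minimal sketch of that idea (plain functions over a hypothetical list of elements; the names are illustrative), interleaving reads and writes can force a fresh layout on every iteration, while grouping all reads before all writes lets the engine lay out once:</p>

```javascript
// Thrashing: each offsetHeight read after a style write can force
// the engine to redo layout synchronously, once per item.
function padHeightsThrashing(items) {
  for (const item of items) {
    const h = item.offsetHeight;          // read (may force layout)
    item.style.height = (h + 10) + "px";  // write (invalidates layout)
  }
}

// Batched: all reads first, then all writes, so the reads trigger
// at most one layout pass before the write phase begins.
function padHeightsBatched(items) {
  const heights = items.map((item) => item.offsetHeight); // read phase
  items.forEach((item, i) => {
    item.style.height = (heights[i] + 10) + "px";         // write phase
  });
}
```

<p>In real code, scheduling the write phase in a <code>requestAnimationFrame</code> callback is a common way to enforce this separation.</p><p>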
Use the DevTools Performance panel to catch when your script is causing many layouts or paints.</p><p><strong>Use compositing-friendly CSS for animations</strong>: Animations of transform or opacity can stay off the main thread, running on the compositor for smooth results. Avoid animating layout-bound properties if possible.</p><p><strong>Mind your JS execution</strong>: Though JS engines are extremely fast, long tasks still block the main thread. Break up long operations (so the page stays responsive) and, in some cases, consider Web Workers for background tasks. Also remember that heavy JS can cause GC pauses (rarely long nowadays, but possible if memory balloons).</p><p><strong>Security features</strong>: Embrace them - e.g. use iframe sandbox or rel=noopener when appropriate. You now know the browser will isolate those contexts anyway; cooperating with it is good.</p><p><strong>DevTools is your friend</strong>: The Performance and Network panels in particular are gold mines for seeing exactly what the browser is doing. If something is slow or janky, the tools often point to the cause (a long layout, a slow paint, etc.).</p><p><strong>For those eager to dive even deeper, an excellent resource is Browser Engineering by Pavel Panchekha and Chris Harrelson (available at <a href="http://browser.engineering">browser.engineering</a>).</strong> </p><p>It's essentially a free online book that guides you through building a simple web browser, covering networking, HTML/CSS parsing, layout, and more in an accessible way. It can serve as a more in-depth companion to everything we discussed, solidifying the knowledge by example. Additionally, the Chrome team's multi-part series "<a href="https://developer.chrome.com/blog/inside-browser-part1">Inside look at modern web browser</a>" provides a readable overview with diagrams. The V8 blog (<a href="http://v8.dev">v8.dev</a>) and <a href="https://hacks.mozilla.org/">Mozilla's Hacks blog</a> are great for learning about engine advances (e.g.
new JIT compiler tiers or WebRender internals).</p><p>In conclusion, modern browsers are marvels of software engineering. They successfully abstract away all this complexity so that as developers we mostly just write HTML/CSS/JS and trust the browser to handle it. Yet, by peering under the hood, we gain insights that help us write more performant, robust applications. We appreciate why certain techniques improve user experience (e.g. avoiding blocking the main thread, or reducing unnecessary DOM complexity) because we see how the browser has to work under the covers. The next time you debug a webpage or wonder why Chrome or Firefox behaves a certain way, you'll have a mental model of the browser's internals to guide you. </p><p>Happy building, and remember that the web platform's depth rewards those who explore it - there's always more to learn, and tools to help you learn it.</p><h3><strong>Further Reading</strong></h3><ul><li><p><strong><a href="https://browser.engineering/">Web Browser Engineering</a></strong> - How browsers work deep-dive book</p></li><li><p><strong><a href="https://www.youtube.com/playlist?list=PL9ioqAuyl6ULp1f36EEjIN1vSBEfsb-0a">Chromium University</a> - </strong>Free series of deep-dive videos into how Chromium works, including the excellent <a href="https://www.youtube.com/watch?v=K2QHdgAKP-s&amp;list=PL9ioqAuyl6ULp1f36EEjIN1vSBEfsb-0a&amp;index=3&amp;pp=iAQB">Life of a Pixel talk</a></p></li><li><p><strong><a href="https://developer.chrome.com/blog/inside-browser-part1">Inside the Browser (Chrome Developers Blog series)</a></strong> - parts 1-4 cover architecture, navigation flow, rendering pipeline, and input/controller threads.</p></li><li><p><strong><a href="https://addyosmani.com/blog/chrome-17th/">Google Chrome at 17 - A history of our browser</a></strong></p></li></ul><p><em>Illustrations in this piece were commissioned from Susie Lu. 
</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YPnw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a42e9cf-a2e1-4d77-b7bb-f9501c423f29_5246x3496.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YPnw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a42e9cf-a2e1-4d77-b7bb-f9501c423f29_5246x3496.png 424w, https://substackcdn.com/image/fetch/$s_!YPnw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a42e9cf-a2e1-4d77-b7bb-f9501c423f29_5246x3496.png 848w, https://substackcdn.com/image/fetch/$s_!YPnw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a42e9cf-a2e1-4d77-b7bb-f9501c423f29_5246x3496.png 1272w, https://substackcdn.com/image/fetch/$s_!YPnw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a42e9cf-a2e1-4d77-b7bb-f9501c423f29_5246x3496.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YPnw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a42e9cf-a2e1-4d77-b7bb-f9501c423f29_5246x3496.png" width="1456" height="970" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4a42e9cf-a2e1-4d77-b7bb-f9501c423f29_5246x3496.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:970,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:741093,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/173324218?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a42e9cf-a2e1-4d77-b7bb-f9501c423f29_5246x3496.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YPnw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a42e9cf-a2e1-4d77-b7bb-f9501c423f29_5246x3496.png 424w, https://substackcdn.com/image/fetch/$s_!YPnw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a42e9cf-a2e1-4d77-b7bb-f9501c423f29_5246x3496.png 848w, https://substackcdn.com/image/fetch/$s_!YPnw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a42e9cf-a2e1-4d77-b7bb-f9501c423f29_5246x3496.png 1272w, https://substackcdn.com/image/fetch/$s_!YPnw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a42e9cf-a2e1-4d77-b7bb-f9501c423f29_5246x3496.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p>]]></content:encoded></item><item><title><![CDATA[Vibe coding is not the same as AI-Assisted engineering.]]></title><description><![CDATA[Can you really 'vibe' your way to production-ready software?]]></description><link>https://addyo.substack.com/p/vibe-coding-is-not-the-same-as-ai</link><guid isPermaLink="false">https://addyo.substack.com/p/vibe-coding-is-not-the-same-as-ai</guid><dc:creator><![CDATA[Addy Osmani]]></dc:creator><pubDate>Sat, 30 Aug 2025 14:57:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!wQ9T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86161744-11ac-4187-a945-0dc5cc7a6920_5246x3496.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Vibe coding is not the same as AI-Assisted engineering.</strong> A 
recent <a href="https://www.reddit.com/r/vibecoding/comments/1myakhd/how_we_vibe_code_at_a_faang/#:~:text=Software%20development.,Go%20to%20comments%201%20Share">Reddit post</a> described how a FAANG team uses AI and it sparked an important conversation about semantics: "vibe coding" and professional "AI-assisted engineering". </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iddJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6122dc48-ab21-46f4-a005-f709ecb3550d_1688x1182.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iddJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6122dc48-ab21-46f4-a005-f709ecb3550d_1688x1182.jpeg 424w, https://substackcdn.com/image/fetch/$s_!iddJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6122dc48-ab21-46f4-a005-f709ecb3550d_1688x1182.jpeg 848w, https://substackcdn.com/image/fetch/$s_!iddJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6122dc48-ab21-46f4-a005-f709ecb3550d_1688x1182.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!iddJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6122dc48-ab21-46f4-a005-f709ecb3550d_1688x1182.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iddJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6122dc48-ab21-46f4-a005-f709ecb3550d_1688x1182.jpeg" width="1456" height="1020" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6122dc48-ab21-46f4-a005-f709ecb3550d_1688x1182.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1020,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;how we vibe code at faang&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="how we vibe code at faang" title="how we vibe code at faang" srcset="https://substackcdn.com/image/fetch/$s_!iddJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6122dc48-ab21-46f4-a005-f709ecb3550d_1688x1182.jpeg 424w, https://substackcdn.com/image/fetch/$s_!iddJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6122dc48-ab21-46f4-a005-f709ecb3550d_1688x1182.jpeg 848w, https://substackcdn.com/image/fetch/$s_!iddJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6122dc48-ab21-46f4-a005-f709ecb3550d_1688x1182.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!iddJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6122dc48-ab21-46f4-a005-f709ecb3550d_1688x1182.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>While the post was framed as an example of the former, the process it detailed - complete with technical design documents, stringent code reviews, and test-driven development - is a clear example of the latter imo. This distinction is critical because conflating the two risks both devaluing the discipline of engineering and giving newcomers a dangerously incomplete picture of what it takes to build robust, production-ready software. </p><p><strong>Vibe coding is great for momentum, but without structure it collapses under production demands.</strong></p><p>As a reminder: "vibe coding" is about fully giving in to the creative flow with an AI (high-level prompting), essentially forgetting the code exists. 
It involves accepting AI suggestions without deep review and focusing on rapid, iterative experimentation, making it ideal for prototypes, MVPs, learning, and what Karpathy calls "throwaway weekend projects." This approach is a powerful way for developers to build intuition and for beginners to flatten the steep learning curve of programming. It prioritizes speed and exploration over the correctness and maintainability required for professional applications. </p><p>There is a spectrum here: from pure vibe coding, to vibe coding with a bit more planning (spec-driven development, supplying enough context, and so on), to full AI-assisted engineering across the software development lifecycle. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cYxB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bb76118-dc78-43e6-8bdb-8b9c6703753f_1536x1024.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cYxB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bb76118-dc78-43e6-8bdb-8b9c6703753f_1536x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!cYxB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bb76118-dc78-43e6-8bdb-8b9c6703753f_1536x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!cYxB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bb76118-dc78-43e6-8bdb-8b9c6703753f_1536x1024.webp 1272w,
https://substackcdn.com/image/fetch/$s_!cYxB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bb76118-dc78-43e6-8bdb-8b9c6703753f_1536x1024.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cYxB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bb76118-dc78-43e6-8bdb-8b9c6703753f_1536x1024.webp" width="514" height="342.78434065934067" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4bb76118-dc78-43e6-8bdb-8b9c6703753f_1536x1024.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:514,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;AI-Assisted Development Spectrum&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="AI-Assisted Development Spectrum" title="AI-Assisted Development Spectrum" srcset="https://substackcdn.com/image/fetch/$s_!cYxB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bb76118-dc78-43e6-8bdb-8b9c6703753f_1536x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!cYxB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bb76118-dc78-43e6-8bdb-8b9c6703753f_1536x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!cYxB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bb76118-dc78-43e6-8bdb-8b9c6703753f_1536x1024.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!cYxB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bb76118-dc78-43e6-8bdb-8b9c6703753f_1536x1024.webp 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In contrast, the process described in the Reddit post is a methodical integration of AI into a mature software development lifecycle. This is "AI-assisted engineering," where AI acts as a powerful collaborator, not a replacement for engineering principles. 
In this model, developers use AI as a "force multiplier" to handle tasks like generating boilerplate code or writing initial test cases, but always within a structured framework. <strong>Crucially, the big difference is that the human engineer remains firmly in control: responsible for the architecture, reviewing and understanding every line of AI-generated code, and ensuring the final product is secure, scalable, and maintainable.</strong> The 30% increase in development speed mentioned in the post is a result of augmenting a solid process, not abandoning it. </p><p>For engineers, labeling disciplined, AI-augmented workflows as "vibe coding" misrepresents the skill and rigor involved. For those new to the field, it creates the false and risky impression that one can simply prompt one's way to a viable product without understanding the underlying code or engineering fundamentals. If you're looking to do this right, start with a solid design, subject everything to rigorous human review, and treat AI as an incredibly powerful tool in your engineering toolkit - not as a magic wand that replaces the craft itself.</p><h2>Vibe Coders, Rodeo Cowboys, and Prisoners</h2><p>The community is split: optimists see a revolution, skeptics see old cowboy coding in new clothes. Optimists call vibe coding the next abstraction layer - like moving from assembly to Python - and believe outsiders will push it forward until it works. </p><p>Realists use it for spikes but enforce discipline afterward. <em>Use AI like a junior dev: helpful, but never unsupervised. </em>Skeptics dismiss it as marketing spin - <em>If I can tell it's vibe-coded, it's bad.
Good software is good software.</em> The consensus middle-ground is pragmatic: vibe coding is a sandbox for creativity, but scaling demands engineering.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CX6m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa52f32d-d2fe-4c08-86f2-f99da6e28d88_1536x1536.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CX6m!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa52f32d-d2fe-4c08-86f2-f99da6e28d88_1536x1536.jpeg 424w, https://substackcdn.com/image/fetch/$s_!CX6m!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa52f32d-d2fe-4c08-86f2-f99da6e28d88_1536x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!CX6m!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa52f32d-d2fe-4c08-86f2-f99da6e28d88_1536x1536.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!CX6m!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa52f32d-d2fe-4c08-86f2-f99da6e28d88_1536x1536.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CX6m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa52f32d-d2fe-4c08-86f2-f99da6e28d88_1536x1536.jpeg" width="1456" height="1456" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fa52f32d-d2fe-4c08-86f2-f99da6e28d88_1536x1536.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;diagram, venn diagram&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="diagram, venn diagram" title="diagram, venn diagram" srcset="https://substackcdn.com/image/fetch/$s_!CX6m!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa52f32d-d2fe-4c08-86f2-f99da6e28d88_1536x1536.jpeg 424w, https://substackcdn.com/image/fetch/$s_!CX6m!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa52f32d-d2fe-4c08-86f2-f99da6e28d88_1536x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!CX6m!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa52f32d-d2fe-4c08-86f2-f99da6e28d88_1536x1536.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!CX6m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa52f32d-d2fe-4c08-86f2-f99da6e28d88_1536x1536.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" 
stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>A fun Venn diagram by <a href="https://forrestbrazeal.com/">Forrest Brazeal</a> contrasts three engineering personas in the age of AI assistance &#8211; "Vibe Coders," "Rodeo Cowboys," and "Prisoners" &#8211; through the metaphor of</em> <em>rope</em> <em>(freedom vs. constraint). Each archetype highlights extreme approaches that are</em> <em>unlikely to deliver production&#8209;grade software.</em></p><h2>Rope, Risk, and Developer Personas in AI Coding</h2><p>In the context of software development, "rope" represents the level of freedom and risk a developer is allowed &#8211; or allows themselves &#8211; when building.</p><p><strong>"Rodeo Cowboys" thrive with </strong><em><strong>too much</strong></em><strong> rope</strong>, embracing a wild-west style of coding with high risk tolerance and minimal oversight. 
They'll happily lasso together new features or fixes on the fly, riding on pure adrenaline. <strong>"Prisoners" operate with </strong><em><strong>too little</strong></em><strong> rope</strong>, bound by rigid constraints, heavy governance, or fear of mistakes &#8211; they move slowly and cautiously, if at all. And in between lies the modern <em>AI-powered</em> coder: <strong>"Vibe Coders" are given </strong><em><strong>just enough</strong></em><strong> rope to hang themselves</strong>.</p><p>They rapidly generate code by prompting AI with natural-language "vibes" and trust the output, often without fully understanding or testing it.</p><p>Each persona has a distinct relationship with AI assistance and risk:</p><ul><li><p><strong>Vibe Coders</strong> &#8211; These developers collaborate with large language models (LLMs) in a <em>free-flowing, conversational</em> manner, describing what they want and letting the AI fill in the implementation. It's an exhilarating level of freedom &#8211; <em>"just tell the AI to add a login page or fix this bug"</em>. The <strong>upside</strong> is speed and creativity; vibe coders act more as <em>orchestrators</em> than manual coders, focusing on ideas over syntax. But the <strong>downside</strong> is a lack of control. They often run with no safety harness: minimal code review, sparse tests, and blind trust in AI outputs. In other words, a <em>lot of rope with little guidance</em>, which can result in codebases that are brittle and opaque. 
One engineer noted that vibe coding without review is like <em><a href="https://www.reddit.com/r/programming/comments/1l4x5tu/the_illusion_of_vibe_coding_there_are_no/">"an electrician just threw a bunch of cables through your walls and hoped it all worked out, instead of running them with intention"</a></em> &#8211; things might function initially, but hidden flaws lurk behind the walls.</p></li><li><p><strong>Rodeo Cowboys</strong> &#8211; The classic "cowboy coder" isn't new to software engineering, and in the AI era this persona still exists (sometimes <em>augmented</em> by AI, sometimes not). Rodeo cowboys are those developers who push code to production with daring speed and little process &#8211; they'll prototype in production, hotfix live systems at 2 AM, and generally <strong>embrace risk in exchange for velocity</strong>. They have <strong>high risk tolerance</strong> (shared with vibe coders) but may rely more on their own intuition and experience than on AI. If vibe coders are guided by AI "voices," rodeo cowboys follow their gut. They do have <em>some</em> rope constraints (even rodeos occur in fenced arenas), but often just enough rope to nearly hang themselves. The overlap of these styles is obvious: a vibe coder can <em>become</em> a rodeo cowboy when they start shipping AI-generated code directly to production in a blaze of glory. The results can be spectacular&#8230; or disastrous.</p></li><li><p><strong>Prisoners</strong> &#8211; On the opposite extreme, we have engineers who are so constrained by process, bureaucracy, or self-imposed caution that they can barely move. These "prisoners" might work in heavily regulated industries or legacy systems where every line of code is a battle. They have <strong>almost no rope</strong> &#8211; tight guardrails, mandatory approvals, perhaps skepticism or outright bans on AI assistance. While this mindset ensures safety and predictability, it also stifles innovation. 
Prisoner-type developers may watch the AI revolution from the sidelines, unable to partake due to organizational rules or fear of the unknown. They won't hang themselves with rope because they're never given any slack, but they also might not deliver anything new and exciting. Interestingly, prisoners and vibe coders share one trait: <strong>being "ordered around by disembodied voices."</strong> In the prisoner's case, the voices are process checklists, ticketing systems, or bureaucratic policies dictating every move &#8211; whereas for vibe coders it's the AI's suggestions. Neither is truly in control.</p></li></ul><p>In reality, engineers aren't binary labels &#8211; a single person might embody elements of all three personas depending on the project and pressures. The Venn diagram's punchline is that all three extremes fail in the long run: too much freedom <em>or</em> too many constraints both hinder sustainable engineering. </p><p><strong>The key is finding balance &#8211; giving developers enough rope to innovate, but not so much that they (or the codebase) end up strangled by bugs and technical debt.</strong> The rest of this report explores why unchecked "vibe coding" has drawn sharp criticism from industry leaders and online communities, and how teams can harness AI-assisted development more responsibly.</p><p><strong>Bottom line from the community:</strong> vibe coding accelerates exploration but introduces hidden liabilities that blow up in production. Recent discussions highlight recurring problems: </p><ul><li><p><a href="https://www.vktr.com/ai-technology/vibe-coding-explained-use-cases-risks-and-developer-guidance/">Security flaws</a>: API keys left in code, no input sanitization, or naive auth logic</p></li><li><p><a href="https://www.reddit.com/r/programming/comments/1l4x5tu/the_illusion_of_vibe_coding_there_are_no/">Fragile debugging</a>: non-engineers hit walls when even minor changes cause cascading failures. 
</p></li></ul><h2>Will you vibe code your way to production?</h2><p>Across the software industry, seasoned engineering leaders are issuing a clear warning: <strong>AI-assisted "vibe coding" may rapidly create more problems than it solves in a production codebase.</strong> <a href="https://www.vktr.com/ai-technology/vibe-coding-explained-use-cases-risks-and-developer-guidance/">Canva's CTO Brendan Humphreys</a> captured this sentiment bluntly: </p><blockquote><p><strong>"No, you won't be vibe coding your way to production &#8211; not if you prioritize quality, safety, security and long-term maintainability at scale."</strong> </p></blockquote><p>These tools require <em>careful supervision by skilled engineers</em>, especially for mission-critical code. In other words, AI can accelerate development, but it cannot replace the hard disciplines of software engineering. When those disciplines are skipped, the result is often a fragile system and a mountain of hidden issues, as <a href="https://www.reddit.com/r/vibecoding/comments/1mge229/my_vibe_coding_journey/">visualized</a> by /r/vibecoding:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GNI0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fded23f71-746e-44b2-ad4f-901483ab6f9e_640x601.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GNI0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fded23f71-746e-44b2-ad4f-901483ab6f9e_640x601.jpeg 424w, https://substackcdn.com/image/fetch/$s_!GNI0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fded23f71-746e-44b2-ad4f-901483ab6f9e_640x601.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!GNI0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fded23f71-746e-44b2-ad4f-901483ab6f9e_640x601.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!GNI0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fded23f71-746e-44b2-ad4f-901483ab6f9e_640x601.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GNI0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fded23f71-746e-44b2-ad4f-901483ab6f9e_640x601.jpeg" width="384" height="360.6" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ded23f71-746e-44b2-ad4f-901483ab6f9e_640x601.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:601,&quot;width&quot;:640,&quot;resizeWidth&quot;:384,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;r/vibecoding - Vibe- Coding Vibe- Debugging&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="r/vibecoding - Vibe- Coding Vibe- Debugging" title="r/vibecoding - Vibe- Coding Vibe- Debugging" srcset="https://substackcdn.com/image/fetch/$s_!GNI0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fded23f71-746e-44b2-ad4f-901483ab6f9e_640x601.jpeg 424w, https://substackcdn.com/image/fetch/$s_!GNI0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fded23f71-746e-44b2-ad4f-901483ab6f9e_640x601.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!GNI0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fded23f71-746e-44b2-ad4f-901483ab6f9e_640x601.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!GNI0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fded23f71-746e-44b2-ad4f-901483ab6f9e_640x601.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Recent findings back up these warnings. 
In an August 2025 survey by <a href="https://www.finalroundai.com/blog/what-ctos-think-about-vibe-coding">Final Round AI</a>, <strong>18 CTOs were asked about vibe coding and 16 reported experiencing production disasters directly caused by AI-generated code</strong>. These are tech leaders with no incentive to hype a trend &#8211; their perspective comes from hard lessons in the field. As one summary <a href="https://www.finalroundai.com/blog/what-ctos-think-about-vibe-coding">put it</a>:</p><blockquote><p>"AI promised to make us all 10x developers, but instead it's making juniors into prompt engineers and seniors into code janitors cleaning up AI's mess." </p></blockquote><p>The ability to ship features faster means little when those features are riddled with flaws that wake someone up at 2 AM.</p><p><strong>What kinds of failures are we talking about?</strong> The CTOs gave examples spanning performance meltdowns, security breaches, and maintainability nightmares:</p><ul><li><p>A <strong><a href="https://www.finalroundai.com/blog/what-ctos-think-about-vibe-coding">performance disaster</a></strong> was recounted by a CTO who watched an AI-generated database query <em>work perfectly in testing</em> but then bring their system to its knees in production. The query was syntactically correct (no obvious errors), so the developer assumed it was fine. But it was <em>woefully inefficient</em> at scale, something an experienced engineer or a proper code review could have caught. <em>"It worked for a small dataset, but as soon as real-world traffic hit, the system slowed to a crawl,"</em> the CTO said. The team wasted a week debugging why the app was hanging &#8211; a week they would not have lost had the code been thoughtfully designed. This highlights a key danger: <strong>AI doesn't understand your system's architecture or non-functional requirements</strong> unless you explicitly guide it. 
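</p><p>To make that concrete, here is a hedged sketch of one common shape of this failure, the classic "N+1" query pattern: correct output on a small test dataset, but one database round-trip per row once real traffic arrives. The schema, table, and function names below are invented for illustration, not taken from the survey.</p>

```python
import sqlite3

# Hypothetical illustration of the "works in testing, crawls in production"
# query pattern described above. Both functions return the same answer; the
# first issues one query per user (N+1), the second a single batched query.

def total_spend_n_plus_1(db, user_ids):
    total = 0
    for uid in user_ids:  # one round-trip per user: fine for 10, fatal for 10 million
        (s,) = db.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE user_id = ?",
            (uid,),
        ).fetchone()
        total += s
    return total

def total_spend_batched(db, user_ids):
    placeholders = ",".join("?" * len(user_ids))  # one round-trip total
    (s,) = db.execute(
        f"SELECT COALESCE(SUM(amount), 0) FROM orders WHERE user_id IN ({placeholders})",
        list(user_ids),
    ).fetchone()
    return s

# Tiny in-memory fixture: 3 users, 2 orders of 10 each.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (user_id INTEGER, amount INTEGER)")
db.executemany("INSERT INTO orders VALUES (?, ?)",
               [(u, 10) for u in range(3) for _ in range(2)])
assert total_spend_n_plus_1(db, [0, 1, 2]) == total_spend_batched(db, [0, 1, 2]) == 60
```

<p>Both versions pass a unit test on three users; only at production row counts does the difference surface, which is why review has to look at query shape, not just correctness.</p><p>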
It can produce code that <em>looks</em> good and passes basic tests, yet falls apart under real workloads. As another leader put it, vibe coding creates an illusion of success until <em>"the system begins to wobble under workloads"</em> &#8211; then it <strong>catastrophically fails without warning</strong>.</p></li><li><p>A <strong>security lapse</strong> was described by an architect who caught a devastating bug in an AI-written authentication module. A junior developer had "vibed" their way through building a user permissions system by copy-pasting AI suggestions and Stack Overflow snippets. It passed the initial tests and even QA. But two weeks after launch, they discovered a critical flaw: <strong>users with deactivated accounts still had access to certain admin tools</strong>. The AI had inverted a truthy check (e.g. using a negation incorrectly), a subtle bug that slipped through. Because no one deeply understood that autogenerated code, the issue went unnoticed. <em>"It seemed to work at the time,"</em> the developer had said. This is a classic AI-generated mistake &#8211; inverted logic that a human might catch if they wrote it themselves, but the AI's code was treated as a black box. The result was a serious security breach. A senior engineer spent two days untangling that one-line bug in a sea of AI code. 
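</p><p>This bug class is worth seeing in miniature. Below is a hypothetical reconstruction of an inverted truthy check in authorization code (the field names are invented, not from the incident), together with the one regression test that would have caught it:</p>

```python
# Hypothetical sketch of the inverted-check bug class described above.
# "deactivated" and "is_admin" are invented field names for illustration.

def can_access_admin(user: dict) -> bool:
    # The flavor the AI emitted: the negation on "deactivated" is missing,
    # so exactly the wrong set of accounts passes the check.
    return user["deactivated"] and user["is_admin"]

def can_access_admin_fixed(user: dict) -> bool:
    return (not user["deactivated"]) and user["is_admin"]

# The negative-case regression test that would have caught the flaw pre-launch:
deactivated_admin = {"deactivated": True, "is_admin": True}
assert can_access_admin(deactivated_admin) is True         # the bug: access granted
assert can_access_admin_fixed(deactivated_admin) is False  # correct: access denied
```

<p>The point is not this specific check but the review posture: a one-line boolean in generated auth code deserves an explicit negative-case test, because it reads as plausibly correct either way.</p><p>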
The architect dubbed this <strong><a href="https://www.finalroundai.com/blog/what-ctos-think-about-vibe-coding">"trust debt"</a></strong> &#8211; <em>"it puts pressure on your senior engineers to be permanent code detectives, reverse-engineering vibe-driven logic just to ship a stable update."</em> <strong>In other words, every time you trust AI output without verification, you incur a debt that must be paid by someone combing through that code later to actually understand and fix it.</strong></p></li><li><p>A <strong>maintainability and complexity nightmare</strong> came from a story of an AI-generated feature that technically worked fine&#8230; until requirements changed. One team allowed a developer to vibe-code an entire user authentication flow with AI, stitching together random npm packages and Firebase rules in minutes. Initially, <em>"on the surface, things shipped &#8211; clients were happy, everyone's high-fiving,"</em> said one engineering manager. But when the team later needed to extend the auth system for new roles and region-specific privacy rules, <em>"it collapsed. No one could trace what was connected to what. Middleware was scattered across six files. There was no mental model, just vibes."</em> In the end, they had to <strong>rewrite the whole thing from scratch</strong>, because debugging the AI's spaghetti code was <em><a href="https://www.finalroundai.com/blog/what-ctos-think-about-vibe-coding">"like archaeology."</a></em> This highlights how <strong>lack of structure and consistency in AI outputs can lead to unmaintainable code.</strong> </p></li><li><p>A <strong>false sense of security</strong> is another insidious danger. AI-generated code often <em>appears perfectly neat and even idiomatic</em>. It might pass unit tests you wrote. So developers let their guard down. 
One CTO observed that vibe coding's most dangerous characteristic is code that <a href="https://www.finalroundai.com/blog/what-ctos-think-about-vibe-coding">"appears to work perfectly until it </a><strong><a href="https://www.finalroundai.com/blog/what-ctos-think-about-vibe-coding">catastrophically fails</a></strong><a href="https://www.finalroundai.com/blog/what-ctos-think-about-vibe-coding">."</a> It lulls teams into production with a smile, then bites hard. Even code review can fail here: reviewing a 1000-line AI-written PR is not much easier than writing it from scratch, especially if the reviewer assumes the code is mostly correct. And if an AI is used to assist code review as well (yes, that's a thing), then we get the blind leading the blind &#8211; <a href="https://shiftmag.dev/the-illusion-of-vibe-coding-5297/">"trusting machines to verify machines"</a>, as one commentary put it. Maintainability suffers because no one truly owns the code's logic. </p></li></ul><p><strong>The consensus from leaders is that vibe coding puts critical software qualities at risk: security, clarity, maintainability, and team knowledge.</strong> </p><p>In summary, industry leaders aren't luddites resisting a new technology &#8211; they're the ones responsible for keeping systems running reliably. Their message is a <strong>critical but constructive</strong> one: <em>use AI to assist, not to abdicate.</em> Code still needs human judgment, especially if it's destined for production. 
As one veteran put it, <strong><a href="https://www.finalroundai.com/blog/what-ctos-think-about-vibe-coding">"AI tools are copilots, not autopilots."</a></strong> They can help fly the plane, but a human pilot must chart the course and be ready to grab the controls when turbulence hits.</p><h2>"This isn't engineering, it's hoping."</h2><p>It's not just CTOs and thought leaders sounding the alarm &#8211; the rank-and-file developer community (the ones in the trenches with AI tools daily) have been vigorously debating vibe coding throughout 2025. On <a href="https://www.reddit.com/r/programming/comments/1l4x5tu/the_illusion_of_vibe_coding_there_are_no/">Reddit</a> and <a href="https://news.ycombinator.com/item?id=44959069">Hacker News</a>, threads about vibe coding have garnered hundreds of upvotes, with seasoned developers sharing war stories and sharp critiques, as well as a few counterexamples and success stories. The overall mood: <strong>high skepticism of using un-reviewed AI code in serious projects, mixed with some optimism for limited use cases.</strong></p><p>On the critical side, developers recount how vibe coding has negatively impacted their workflows and team dynamics. A top comment on one Reddit thread <a href="https://www.reddit.com/r/programming/comments/1l4x5tu/comment/mwd6n9j/?utm_source=share&amp;utm_medium=web3x&amp;utm_name=web3xcss&amp;utm_term=1&amp;utm_content=share_button">lamented</a>: </p><blockquote><p>"I just wish people would stop pinging me on PRs they obviously haven't even read themselves, expecting me to review 1000 lines of completely new vibe-coded feature that isn't even passing CI." </p></blockquote><p>The frustration here is palpable &#8211; <em>code review</em> becomes a farce if the author themselves doesn't understand the AI-generated code or bother to run tests. It shifts the burden onto unwitting teammates. 
Another commenter <a href="https://www.reddit.com/r/programming/comments/1l4x5tu/comment/mwdhsz8/?utm_source=share&amp;utm_medium=web3x&amp;utm_name=web3xcss&amp;utm_term=1&amp;utm_content=share_button">responded</a> that this behavior <em>"feels so far below the minimum bar of professionalism"</em>, likening it to a tradesperson doing a shoddy job that others have to fix. <strong>Peer review and team trust break down</strong> when one developer dumps AI output on others without due diligence.</p><p>The cultural backlash is evident &#8211; nobody wants to work with a so-called engineer whose contribution is copy-pasting from ChatGPT and shrugging when things break.</p><p>The phrase <em><a href="https://shiftmag.dev/the-illusion-of-vibe-coding-5297/">"this isn't engineering, it's hoping"</a></em> has been used in these discussions (a paraphrase of a line from the ShiftMag article). It captures the sentiment that vibe coding without proper review/testing is akin to hoping the software works by magic. Many developers point out that <strong>coding is supposed to be an engineering discipline</strong>, not wish fulfillment. </p><p>However, amid the criticism, there are also <strong>counterpoints and nuanced views</strong> emerging from the community. Not everyone is writing off vibe coding entirely; some are experimenting and finding <em>niche scenarios where it excels</em>. A highly upvoted <a href="https://news.ycombinator.com/item?id=44959069">Hacker News comment</a> framed it through the classic <a href="https://news.ycombinator.com/item?id=44959069">Innovator's Dilemma</a>: today's experts dismiss vibe coding as a toy, but tomorrow it could evolve and render old methods obsolete. 
While many responses disagreed with the inevitability of that outcome, the discussion opened up a more <em>optimistic</em> perspective: maybe we're just in the early clunky phase of AI coding, and improvements in AI or methodologies will address current flaws.</p><p>A practical example was <a href="https://news.ycombinator.com/item?id=44960238">given</a> by one HN user who successfully <strong>vibe-coded a bespoke internal tool</strong> in record time. </p><blockquote><p>"Last week I converted a bunch of Docker Compose configs to run on Terraform (Opentofu) &#8211; took me maybe an hour or two with Claude, while watching TV. Would've been a week easy if I did it the artisanal way of reading docs and Stack Overflow." </p></blockquote><p>This developer wasn't building a customer-facing product, but rather automating a tedious infrastructure task. For that use-case, vibe coding was a huge win: it saved time and the code was "good enough" for an internal tool that he controlled. Many others chimed in to agree that <strong>for small-scale or one-off scripts and glue code, vibe coding can be a massive productivity booster</strong>. It's the classic 80/20 trade-off &#8211; if you need something quick and you're the only stakeholder, the AI can crank out a solution in minutes instead of days.</p><p>Crucially, the developer in this story <em>knew the limits</em>: he treated the AI as an assistant to speed up his own work (even multitasking entertainment while coding), and presumably he validated the output. This aligns with a refrain seen in several discussions: <em>"Good time saver,</em> <em>if you know what you're doing."</em> The implication is that an experienced dev can use vibe coding like a power tool &#8211; accelerating the grunt work &#8211; but they still architect, guide, and double-check the result. 
In inexperienced hands, the same power tool can wreak havoc.</p><p>There's also an interesting analogy drawn in these debates: <strong><a href="https://news.ycombinator.com/item?id=44959069">"Coding with agentic LLMs is just project management."</a></strong> Instead of writing code, your job becomes breaking down tasks for the AI, verifying each chunk, and integrating the results &#8211; essentially acting as a project manager for a very junior (but very fast) developer. Some developers say this <em>feels</em> easier or lazier &#8211; hence "vibe" &#8211; while others find it still requires solid development chops, just applied differently. The ones who succeed treat it systematically: break problems into small, verifiable prompts (tasks that fit in context windows), run tests at each step, and iterate. The ones who fail just throw a big vague prompt at GPT and paste whatever comes out.</p><p><strong>Bottom line from the community:</strong> <em>Vibe coding is not a silver bullet.</em> Experienced devs mostly view it as a fun or useful technique for prototyping and automating trivial tasks, <strong>not</strong> a replacement for disciplined development on <a href="https://news.ycombinator.com/item?id=44959069">complex systems</a>. The hype that "LLMs will write all our software" is being met with healthy skepticism on the ground. At the same time, there's recognition that AI coding assistants are here to stay and <em>can</em> provide huge productivity boosts when <a href="https://www.vktr.com/ai-technology/vibe-coding-explained-use-cases-risks-and-developer-guidance/">used judiciously</a>. The focus is shifting toward <strong>how to integrate AI into workflows without losing the rigor of software engineering</strong> &#8211; which we'll explore next.</p><h2>Where <em>can</em> Vibe Coding help?</h2><p>If pure vibe coding is risky for production, is it <em>good for anything</em>? 
The answer from both industry leaders and practitioners is <em>yes</em>: <strong>vibe coding shines in certain scenarios &#8211; especially in rapid prototyping, exploratory projects, and as a creative aid &#8211; but one must know when to stop vibing and start engineering</strong>.</p><p>Several CTOs in the <a href="https://www.finalroundai.com/blog/what-ctos-think-about-vibe-coding">FinalRound survey</a> acknowledged that they don't <a href="https://www.finalroundai.com/blog/what-ctos-think-about-vibe-coding">"condemn vibe coding entirely"</a>. Instead, they <strong>compartmentalize its use</strong> to get the best of both worlds. <a href="https://www.finalroundai.com/blog/what-ctos-think-about-vibe-coding">Matt Cumming</a>, founder at LittleHelp, shared that he can <em>"create and deploy a functional micro-SaaS web app in a day with no issues, which is insane."</em> Over a weekend, he took an idea from thought to a live product by leveraging AI for the heavy lifting. This kind of speed is unprecedented &#8211; essentially compressing what might be a 2&#8211;3 week MVP build into 48 hours. <strong>For hackathons, demos, internal tools, and validating product-market fit quickly, vibe coding can be a game-changer.</strong> It allows small teams (or even solo devs) to punch far above their weight in terms of feature output.</p><p>However &#8211; and this is a big caveat &#8211; every leader who touted such successes added <em>major warnings and limits</em>. Matt Cumming himself had learned the hard way that unleashing AI without guardrails can backfire. He described how an early project of his was <em><a href="https://www.finalroundai.com/blog/what-ctos-think-about-vibe-coding">"completely destroyed by AI in a few minutes"</a></em> after a month of work, due to some AI-generated corruption or error. 
Chastened by that experience, he established a disciplined approach: </p><blockquote><p>"We start any AI-assisted coding by collaborating with the AI to write the functional spec and steps required in a project document. The other crucial thing I learned is to get the agent to do a security check on any new functionality you add." </p></blockquote><p>In other words, <strong>use the AI to help plan and review, not just to spit out code</strong>. By having the AI outline the design first, he ensures he's thought through the architecture. By having it perform security analysis on the output, he adds an automated sanity check for vulnerabilities. His team also confines vibe coding to "throwaway projects and prototypes, not production systems that need to scale". The vibe-coded app serves as a proof of concept, which they might later <em>rewrite or harden</em> for real-world use.</p><p>Another leader, <a href="https://www.finalroundai.com/blog/what-ctos-think-about-vibe-coding">Brett Farmiloe</a> (CEO at Featured), echoed this: </p><blockquote><p>"Vibe coding is great when you're starting from scratch... we built a site using v0 (an AI tool) and deployed quickly. But with our established production codebase, we only use vibe-coded components as a starting point &#8211; then technical team members take over to finish." </p></blockquote><p>Both he and Cumming treat AI-generated code as <strong>scaffolding</strong>. It's fantastic for erecting a structure rapidly to see if it holds the shape you want, but you wouldn't leave the rickety scaffolding in place for the final building. You replace or reinforce it with solid materials. <strong>In software terms: the AI prototype must be refactored, tested, and owned by human engineers before it becomes the permanent solution.</strong></p><p>Another approved use case is <strong><a href="https://www.vktr.com/ai-technology/vibe-coding-explained-use-cases-risks-and-developer-guidance/">legacy code refactoring</a></strong>. 
As noted in one analysis, some companies let AI rewrite portions of old code into newer frameworks or languages as a "head start". Since that code is going to be reviewed and tested thoroughly anyway, using AI to do the brute-force translation or rote work can save time. Similarly, AI can help squash known bug patterns: e.g. <em>"solve these 5 specific defects in our code"</em> &#8211; a targeted application rather than carte blanche generation. In these cases, the <strong>scope is limited and the outputs are verified</strong>, aligning more with <strong>AI-assisted programming</strong> than blind vibe coding.</p><p>We can summarize where vibe coding <em>adds the most value</em> in 2025:</p><ul><li><p><strong>Rapid Prototyping &amp; hackathons:</strong> Need to demo an idea by tomorrow? AI code generation can materialize a working prototype incredibly fast. It's okay if the code is messy, as long as it showcases the concept. <strong>Speed and iteration matter more than robustness</strong> at this stage. Vibe coding lets you try three different approaches in a day &#8211; something traditional coding would never allow. </p></li><li><p><strong>One-off scripts and internal tools:</strong> If the code doesn't need long-term maintenance and is only used by the author, the risk is relatively low. Writing a quick data analysis script, converting file formats, automating server configs &#8211; those kinds of tasks can often be safely vibe-coded because if it breaks, the person who wrote it (with AI) will fix it, and there aren't many external consequences. Basically, <strong>personal projects or automation</strong> are fertile ground for vibe coding, saving engineers from drudge work.</p></li><li><p><strong>Greenfield development (with caution):</strong> Starting a new project from scratch is easier to vibe code than integrating AI into a sprawling legacy system. When nothing exists yet, there are fewer constraints and no established style to adhere to. 
A team might vibe code the first version of a new microservice or frontend app to hit an aggressive deadline, then polish it later. </p></li></ul><p>In all these scenarios, though, it's assumed that <strong>experienced developers are in the loop</strong>. The people achieving success with vibe coding still apply their judgment to prompt the AI effectively and to verify the outputs. They treat it as <strong>pair programming with a tireless but error-prone junior dev</strong>. Contrast this with novices who might think the AI is a magic shortcut to skip learning &#8211; that approach often ends in frustration. Vibe coding, if overused by beginners, can short-circuit the learning process and produce grads who have impressive projects on their resume but lack fundamental skills &#8211; another long-term risk noted by educators. Ultimately, when paired with human experience, it can meaningfully reduce time on menial work, but you likely can&#8217;t <a href="https://www.reddit.com/r/vibecoding/comments/1mu6t8z/whats_the_point_of_vibe_coding_if_i_still_have_to/">skip</a> human review and refinement entirely:</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!F8pz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F339d50d3-4daa-43c4-8392-97da564101db_1866x708.png" width="1456" height="552" alt="" loading="lazy"></figure></div><p><strong>The key insight is that vibe coding can be extremely valuable as an ideation and acceleration tool, but it should almost never be the </strong><em><strong>last step</strong></em><strong> for anything that lives on.</strong> Use it to get from zero to demo, or to churn out boilerplate, or explore multiple options. Then comes the crucial phase: <em>the hand-off from vibes to rigorous engineering</em>. How to do that effectively is our next focus.</p><h2>Spec-Driven: An antidote to prompt-chaos?</h2><p>One promising development in response to the pitfalls of vibe coding is the rise of <strong>spec-driven and <a href="https://medium.com/@takafumi.endo/software-3-0-blueprint-from-vibe-coding-to-verified-intelligence-swarms-23b4537f12fa">"agentic" AI coding approaches</a></strong>. These methods aim to retain the productivity benefits of AI generation while introducing more structure, planning, and verification &#8211; essentially adding <em>rails</em> to the free-form vibe coding process.</p><p><strong>Spec-driven AI development</strong> means starting with a clear specification or design, often created in collaboration with the AI, <em>before</em> any code is written. Instead of immediately prompting "Hey AI, build me a feature," an engineer might prompt the AI to "Help me outline the requirements and modules for this feature." By having an explicit spec (be it a written paragraph, a formal design doc, or a list of steps and functions), the developer ensures both they and the AI have a shared understanding of the goal. It's akin to writing pseudo-code or user stories first. This addresses one major issue of vibe coding: lack of direction. 
Writing a functional spec with the AI can keep a team on track and prevent the AI from wandering off into irrelevant complexity.</p><p>In practice, a spec-driven workflow can involve <strong>prompting the AI to generate high-level plans, interface definitions, and even test cases up front</strong>, and iterating on those until the human is satisfied that the plan makes sense. Only then do they ask the AI to implement the pieces of that plan. This is similar to how good engineers work with juniors &#8211; you wouldn't let a junior developer code an entire subsystem solo on day one; you'd first agree on a design together. With AI, we're learning to do the same: treat it like a junior that needs a blueprint. Early evidence suggests this yields better results than ad-hoc prompting. </p><p><strong><a href="https://medium.com/@takafumi.endo/software-3-0-blueprint-from-vibe-coding-to-verified-intelligence-swarms-23b4537f12fa">Agentic AI approaches</a></strong> take this further by enabling AI to not only follow a spec but also <strong>perform some self-directed actions like running code, testing, and refining</strong>. The term "agentic" here refers to AI agents that can take higher-level goals and then act in an autonomous, iterative way to achieve them (within bounds). For example, some tools allow the AI to do things like execute the code it wrote, observe the results, and fix errors &#8211; all without the user explicitly asking at each step. </p><p>Spec-driven and agentic approaches contrast with naive vibe coding in a few key ways:</p><ul><li><p><strong>Upfront intent vs. after-the-fact fixes:</strong> Instead of writing code and then trying to retrofit understanding (or just hoping it works), the spec-first approach encodes intent clearly from the start. The AI's output is judged against a known target. 
This reduces the "surprise" factor of AI code that technically does something you asked but not in the way you wanted (a common complaint when prompts are not specific enough).</p></li><li><p><strong>Small iterations vs. big bang:</strong> Agentic workflows encourage small, testable increments. Rather than asking for a thousand-line program in one go, you ask for one function, see it pass tests, then proceed. Essentially, it mimics <strong>test-driven development</strong> but with AI as the one writing the implementation from your test descriptions. If vibe coding is like typing a novel in one prompt, spec-driven is like pair-writing it chapter by chapter with continuous editorial review.</p></li><li><p><strong>AI in the loop vs. human alone:</strong> Interestingly, agentic approaches can also take <em>more</em> burden off humans in some respects by letting the AI handle tedious verification steps. For instance, if every AI-written PR must come with an explanation of <em>why</em> the changes were made, an AI agent can be tasked with generating an initial draft of that explanation, which the developer then edits for accuracy. This ensures no code is merged without context. In effect, <strong>these approaches try to weave AI into the fabric of software development best practices &#8211; not replace them</strong>. Instead of ignoring testing and review (as vibe coding often does), they automate parts of testing and review.</p></li></ul><p>In practice, teams exploring these approaches use tactics like:</p><ul><li><p><strong>Requiring design docs</strong> for any major AI-generated component (even if it's one page, written with AI assistance). 
This ensures thought was given to how the component fits the system.</p></li><li><p><strong>Using AI to generate unit tests</strong> or property-based tests for its own code, catching obvious errors immediately.</p></li><li><p><strong>Locking down dependencies and focusing on security</strong>: For example, instructing the AI to <em>only</em> use approved libraries and run a security scan. As noted, one team actually uses vibe coding in a <em><a href="https://www.vktr.com/ai-technology/vibe-coding-explained-use-cases-risks-and-developer-guidance/">controlled way to intentionally generate insecure code</a>, so they can study it and improve their security scanners</em> &#8211; turning the AI into a pen-testing tool rather than a production coder.</p></li><li><p><strong>Preferring integrated AI tools (in-IDE like Cursor or VS Code Copilot)</strong> over copy-paste from ChatGPT. Integration means the AI suggestions are applied in a context where the developer can see the entire diff and run the code immediately, reducing the chance of inadvertently introducing something you don't notice.</p></li><li><p><strong>Keeping humans in the decision loop</strong>: e.g. an AI agent might propose a code change, but it can't merge it &#8211; a human must approve. This is analogous to how continuous integration (CI) systems run tests and linters, but ultimately a dev checks the PR. The AI is a CI assistant here, not an autopilot deploying to prod on its own.</p></li></ul><p>All these measures are attempts to <strong>mitigate the "vibes" with actual engineering rigor</strong>. They acknowledge that large language models are incredibly useful &#8211; they really can understand intent and produce working code for a huge variety of tasks &#8211; but they function best when <strong>given clear direction and boundaries</strong>. 
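The small-iterations, test-first pattern described above can be sketched as a toy loop. This is purely illustrative: `ask_model` is a hypothetical stand-in for any real LLM call (stubbed here with canned attempts so the sketch runs), and the human-written spec is expressed as two executable checks.

```python
# Sketch of a spec-first, test-first AI loop. `ask_model` is a
# hypothetical stand-in for an LLM call, stubbed with canned drafts.

def spec_tests(impl):
    """The human-written spec, expressed as executable checks."""
    failures = []
    if impl("hello") != "HELLO":
        failures.append("must upper-case its input")
    if impl("") != "":
        failures.append("must handle the empty string")
    return failures

def ask_model(feedback):
    # Stubbed "LLM": the first draft is buggy; once feedback mentions
    # the empty-string case, the next draft fixes it.
    if any("empty" in f for f in feedback):
        return lambda s: s.upper()
    return lambda s: s.upper() if s else None  # buggy first attempt

def tdd_loop(max_rounds=3):
    feedback = []
    for _ in range(max_rounds):
        impl = ask_model(feedback)      # small increment from the AI
        feedback = spec_tests(impl)     # verify against the spec
        if not feedback:
            return impl                 # passes; still needs human review
    raise RuntimeError(f"spec not met: {feedback}")

impl = tdd_loop()
```

Even in this toy form, the loop only terminates when the spec's checks pass, and the returned implementation is still a candidate for human review, not an auto-merge.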
Left unguided, they'll happily drift into the weeds or produce a solution that passes superficial muster but fails in edge cases.</p><p>In essence, spec-driven and agentic methods are about <strong>marrying the best of AI and human strengths</strong>: humans excel at defining problems, understanding context, and making judgment calls; AIs excel at traversing solution spaces quickly, writing boilerplate, and even coordinating tasks when set up to do so. The future of AI-assisted engineering likely lies in this middle ground &#8211; not in pure prompt-and-pray vibe coding, but in <em>augmented workflows</em> where AI amplifies human design and humans rein in AI's excesses.</p><p>The way forward is hybrid: a sandbox phase where you vibe freely, test ideas, and build prototypes, followed by a production phase where you apply engineering discipline: testing, refactoring, design, and security. </p><h2>Conclusion</h2><p><strong>In summary, developers and teams can evolve vibe-coded prototypes into robust systems by re-injecting all the traditional software engineering practices that might have been bypassed in the rush of AI generation: design it, test it, review it, own it.</strong> </p><p>Speeding through the first draft is fine &#8211; even commendable &#8211; as long as everyone understands that <em>you then switch out of "vibe mode" and into "engineering mode."</em> High-performing teams will likely develop an intuition for when to employ the AI fast lane and when to merge back onto the steady highway of tested, reviewed code. The end goal is the same as it's always been: deliver software that works, is secure, and can be maintained by the team. </p><p>The tools and methods are evolving, but <strong>accountability, craftsmanship, and collaboration remain paramount in the age of AI-assisted engineering</strong>.</p><p><em>I&#8217;m excited to share that I&#8217;m writing a new <a href="https://beyond.addy.ie">AI-assisted engineering book</a> with O&#8217;Reilly. 
If you&#8217;ve enjoyed my writing here you may be interested in checking it out. I&#8217;ve included a number of free tips on the book site.</em></p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!wQ9T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86161744-11ac-4187-a945-0dc5cc7a6920_5246x3496.png" width="1456" height="970" alt="" loading="lazy"></figure></div>]]></content:encoded></item><item><title><![CDATA[The reality of AI-Assisted software engineering productivity]]></title><description><![CDATA[What the data really shows about AI coding tools in 2025]]></description><link>https://addyo.substack.com/p/the-reality-of-ai-assisted-software</link><guid isPermaLink="false">https://addyo.substack.com/p/the-reality-of-ai-assisted-software</guid><dc:creator><![CDATA[Addy Osmani]]></dc:creator><pubDate>Sat, 16 Aug 2025 14:30:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Sp4c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c497c2-42b5-4e28-a5cf-a3c89db18e41_5246x5246.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>tl;dr: AI functions as a situational force 
multiplier - providing modest, uneven boosts that augment rather than transform engineering productivity. Individual developers and those working on &#8220;new&#8221; projects see speed boosts with AI tools, but these gains aren't (yet) translating to overall team productivity:</strong></p><ul><li><p>AI excels at <em>greenfield projects</em> but struggles with complex legacy codebases</p></li><li><p>84% of devs use AI tools; only 60% view them favorably, down from 70% in 2023</p></li><li><p>Studies show 20-30% productivity improvements, far from &#8220;10x&#8221; claims</p></li><li><p>Most use <em>basic autocomplete</em> features, not full autonomous coding agents</p></li><li><p>66% cite AI's "almost correct" solutions as their biggest time sink due to debugging </p></li></ul><h2>Adoption soars, trust plummets: the 2025 developer sentiment</h2><p><strong>AI coding assistants have rapidly become part of the developer toolkit &#8211; but confidence in their output has declined.</strong></p><p>According to <a href="https://survey.stackoverflow.co/2025/">Stack Overflow&#8217;s 2025 Developer Survey</a> (49,000+ devs globally), <strong>84% of respondents are using or planning to use AI tools in their development process</strong>, up from 76% a year prior. Over half of professional developers now use AI coding tools <em>daily</em>. This represents a remarkable adoption curve &#8211; AI pair programmers went from novelty to normalcy in under two years. 
</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!W11L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab9f1223-c45e-4e70-ad3e-16b122854d31_2400x1110.png" width="1456" height="673" alt="" fetchpriority="high"></figure></div><p>Developers are primarily leveraging these tools for help with coding problems and tedious tasks: the survey found the top uses of AI were searching for answers (54% of respondents), generating code or synthetic data (36%), learning new concepts (33%), and even writing documentation (30%). In short, AI is touching many parts of the dev workflow.</p><p>Paradoxically, <strong>as usage has increased, positive sentiment has fallen</strong>. Stack Overflow reports that favorable views of AI tools dropped from over 70% in 2023 to just ~60% in 2025. In practice, 46% of developers say they don&#8217;t trust the accuracy of AI output &#8211; a sharp rise in skepticism from 31% last year. The data suggests many developers have encountered the limitations and flaws of these tools firsthand. 
</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!Ghtg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0141f5a-8fe5-441c-9ad0-03e11f3ad384_2400x1200.png" width="1456" height="728" alt=""></figure></div><p>The <strong>number-one frustration</strong>, cited by 66% of devs, is AI solutions that are <em>&#8220;almost right, but not quite,&#8221;</em> which often leads to time-consuming debugging. Another 45% specifically complained that <strong>debugging AI-generated code is more work</strong> than it&#8217;s worth. This sentiment came through loud and clear in the survey and echoes across developer forums: AI helpers often <em>accelerate typing</em> but can inject subtle bugs or nonsense that soak up time.</p><blockquote><p><strong>&#8220;AI solutions that are </strong><em><strong>almost</strong></em><strong> right, but not quite, are now my biggest time sink. 
The code </strong><em><strong>looks</strong></em><strong> plausible but I end up spending more time fixing those &#8216;helpful&#8217; suggestions.&#8221;</strong> &#8211; <em><a href="https://stackoverflow.blog/2025/07/29/developers-remain-willing-but-reluctant-to-use-ai-the-2025-developer-survey-results-are-here/#:~:text=%2A%20The%20number,last%20year%20learning%20new%20coding">Survey respondent, cited by Stack Overflow</a></em></p></blockquote><p><strong>Crucially, most developers are not (yet) using AI to fully automate programming or &#8220;agentically&#8221; build entire applications.</strong></p><p>The Stack Overflow survey asked about &#8220;vibe coding&#8221; &#8211; meaning letting an AI generate whole programs from high-level prompts &#8211; and found that <strong><a href="https://stackoverflow.blog/2025/07/29/developers-remain-willing-but-reluctant-to-use-ai-the-2025-developer-survey-results-are-here/#:~:text=developers%20say%20agents%20have%20affected,a%20threat%20to%20their%20job">nearly 72% said vibe coding is </a></strong><em><strong><a href="https://stackoverflow.blog/2025/07/29/developers-remain-willing-but-reluctant-to-use-ai-the-2025-developer-survey-results-are-here/#:~:text=developers%20say%20agents%20have%20affected,a%20threat%20to%20their%20job">not</a></strong></em><strong><a href="https://stackoverflow.blog/2025/07/29/developers-remain-willing-but-reluctant-to-use-ai-the-2025-developer-survey-results-are-here/#:~:text=developers%20say%20agents%20have%20affected,a%20threat%20to%20their%20job"> part of their professional work, with an additional 5% &#8220;emphatically&#8221; avoiding it</a></strong>.</p><p>In other words, roughly <strong>77% of developers do </strong><em><strong>no</strong></em><strong> whole-app generation</strong> on the job. 
Most are using AI in a more incremental, assistive capacity (like code completion, example generation, or Q&amp;A), not as autonomous project-builders.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TSCw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01954c7-fdea-494c-82fc-72035cd87e4d_2400x1380.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TSCw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01954c7-fdea-494c-82fc-72035cd87e4d_2400x1380.png 424w, https://substackcdn.com/image/fetch/$s_!TSCw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01954c7-fdea-494c-82fc-72035cd87e4d_2400x1380.png 848w, https://substackcdn.com/image/fetch/$s_!TSCw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01954c7-fdea-494c-82fc-72035cd87e4d_2400x1380.png 1272w, https://substackcdn.com/image/fetch/$s_!TSCw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01954c7-fdea-494c-82fc-72035cd87e4d_2400x1380.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TSCw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01954c7-fdea-494c-82fc-72035cd87e4d_2400x1380.png" width="1456" height="837" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c01954c7-fdea-494c-82fc-72035cd87e4d_2400x1380.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:837,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:131366,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/170851153?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01954c7-fdea-494c-82fc-72035cd87e4d_2400x1380.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TSCw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01954c7-fdea-494c-82fc-72035cd87e4d_2400x1380.png 424w, https://substackcdn.com/image/fetch/$s_!TSCw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01954c7-fdea-494c-82fc-72035cd87e4d_2400x1380.png 848w, https://substackcdn.com/image/fetch/$s_!TSCw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01954c7-fdea-494c-82fc-72035cd87e4d_2400x1380.png 1272w, https://substackcdn.com/image/fetch/$s_!TSCw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01954c7-fdea-494c-82fc-72035cd87e4d_2400x1380.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p> This aligns with the finding that while <strong>52% say AI &#8220;agents&#8221; have affected how they work, the primary benefit cited is <a href="https://stackoverflow.blog/2025/07/29/developers-remain-willing-but-reluctant-to-use-ai-the-2025-developer-survey-results-are-here/#:~:text=The%20adoption%20of%20AI%20agents,not%20a%20threat%20to%20their">personal productivity boosts</a></strong> (69% saw an increase in their own throughput) &#8211; not fundamental changes to how software is delivered. 
And despite all the &#8220;AI will replace programmers&#8221; media chatter, <strong>64% of developers do not see AI as a threat to their jobs</strong> (though that&#8217;s down slightly from 68% last year, indicating a bit more unease).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!enyj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F518a6c80-0963-47ab-ab01-2131481d6918_2400x1200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!enyj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F518a6c80-0963-47ab-ab01-2131481d6918_2400x1200.png 424w, https://substackcdn.com/image/fetch/$s_!enyj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F518a6c80-0963-47ab-ab01-2131481d6918_2400x1200.png 848w, https://substackcdn.com/image/fetch/$s_!enyj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F518a6c80-0963-47ab-ab01-2131481d6918_2400x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!enyj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F518a6c80-0963-47ab-ab01-2131481d6918_2400x1200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!enyj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F518a6c80-0963-47ab-ab01-2131481d6918_2400x1200.png" width="1456" height="728" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/518a6c80-0963-47ab-ab01-2131481d6918_2400x1200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:133014,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/170851153?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F518a6c80-0963-47ab-ab01-2131481d6918_2400x1200.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!enyj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F518a6c80-0963-47ab-ab01-2131481d6918_2400x1200.png 424w, https://substackcdn.com/image/fetch/$s_!enyj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F518a6c80-0963-47ab-ab01-2131481d6918_2400x1200.png 848w, https://substackcdn.com/image/fetch/$s_!enyj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F518a6c80-0963-47ab-ab01-2131481d6918_2400x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!enyj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F518a6c80-0963-47ab-ab01-2131481d6918_2400x1200.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In summary, right now <strong>AI-assisted coding is mainstream, but wariness is high</strong>. Developers appreciate the time-savers but have learned to &#8220;<strong>trust, but verify</strong>&#8221; every output. As Stack Overflow&#8217;s report put it, <em>more developers are using AI tools, but their trust in those tools is falling</em>. 
These cracks in the foundation set the stage: <em>why</em> aren&#8217;t these tools living up to the wild productivity promises?</p><h2>Hype vs reality: Why &#8220;10&#215; Engineers&#8221; remain unicorns</h2><p><strong>Amid the exuberance, many experienced engineers have pushed back on the notion that AI is making devs &#8220;10&#215; more productive&#8221; overnight.</strong></p><p>A notable example is Colton Voege&#8217;s essay, <em>&#8220;<a href="https://colton.dev/blog/curing-your-ai-10x-engineer-imposter-syndrome/">No, AI is Not Making Engineers 10&#215; as Productive &#8211; Curing Your AI &#8216;10&#215; Engineer&#8217; Imposter Syndrome.</a>&#8221;</em> Voege addresses the anxiety some developers feel seeing social media posts claiming that <em>&#8220;real engineers&#8221;</em> are now using LLMs to churn out 10&#8211;100&#215; more output by spinning up numerous agent instances in parallel. He admits even he momentarily wondered if he was being left behind.</p><p>But after deep experimentation with various AI coding approaches, his conclusion was that the <strong>10&#215; claims don&#8217;t withstand scrutiny</strong>:</p><blockquote><p><strong>&#8220;I wouldn&#8217;t be surprised to learn AI helps many engineers do certain tasks 20&#8211;50% faster, but the nature of software bottlenecks means this </strong><em><strong>doesn&#8217;t translate</strong></em><strong> to a 20% productivity increase &#8211; and certainly not a 10&#215; increase.&#8221;</strong></p></blockquote><p>In other words, <strong>AI can speed up coding</strong> tasks, but overall engineering <strong>outcomes</strong> (features delivered, systems deployed) are constrained by many other factors. 
<strong>Writing code is often not the slowest part of software development</strong>; tasks like designing architecture, clarifying requirements, code reviewing, testing, fixing bugs, and coordinating with teammates don&#8217;t magically compress just because you can generate a function faster.</p><p>Voege walks through a simple reality check: <em>&#8220;10&#215; productivity means what you used to ship in a quarter you now ship in a week and a half&#8221;</em>. That would require every step &#8211; product planning, code reviews, QA, deployments &#8211; to happen 10&#215; faster, which is implausible in any real-world team. As he dryly notes, <em>&#8220;You can&#8217;t compress the back-and-forth of 3 months of code review into 1.5 weeks&#8230; This simply cannot be done.&#8221;</em></p><p>The human processes around coding have not accelerated at anywhere near the rate that AI can spit out code. Pull requests still need careful review (often more so if AI wrote the code), test suites still must run, and users still have evolving needs. A senior engineer on Hacker News echoed this, saying <strong><a href="https://simonwillison.net/2025/Aug/6/not-10x/#:~:text=translate%20to%20a%2020,certainly%20not%20a%2010x%20increase">&#8220;all the other stuff involved in building software makes the 10&#215; thing unrealistic in most cases.&#8221;</a></strong></p><p>Importantly, others point out that the loudest &#8220;AI makes us 10&#215; faster&#8221; claims tend to come from <em><a href="https://colton.dev/blog/curing-your-ai-10x-engineer-imposter-syndrome/#:~:text=If%20you%20are%20running%20an,and%20your%20boss%20asks%20you">biased sources</a></em> &#8211; <strong>tech CEOs, investors, or consultants</strong> &#8211; rather than rank-and-file developers in the trenches. 
There are <a href="https://colton.dev/blog/curing-your-ai-10x-engineer-imposter-syndrome/#:~:text=If%20you%20are%20running%20an,and%20your%20boss%20asks%20you">strong </a><strong><a href="https://colton.dev/blog/curing-your-ai-10x-engineer-imposter-syndrome/#:~:text=If%20you%20are%20running%20an,and%20your%20boss%20asks%20you">incentives</a></strong> for startup founders to over-hype productivity (to attract funding) and for bosses to suggest huge gains (to pressure employees or justify AI investments). Thus a kind of echo chamber can form, detached from ground truth. Meanwhile, <strong>front-line engineers&#8217; actual experiences are more &#8220;varied and much more muted in their praise&#8221;</strong> &#8211; they see AI as a useful autocomplete and sometimes a &#8220;magic&#8221; assistant, but also one that <strong>often needs you to take the wheel back</strong> when it veers off course.</p><p>To be clear, <strong>developers are seeing meaningful boosts from AI &#8211; just not an order of magnitude.</strong> Voege concedes that &#8220;AI helps with boilerplate&#8221; and routine coding, estimating perhaps a 20&#8211;50% speed-up on certain sub-tasks for many engineers. Likewise, Simon Willison, a well-known developer and AI blogger, says he is &#8220;a huge proponent of AI-assisted development&#8221; and finds that <a href="https://simonwillison.net/2025/Aug/6/not-10x/#:~:text=I%27m%20a%20pretty%20huge%20proponent,do%20as%20a%20software%20engineer">LLMs make him 2&#8211;5&#215; more productive for the coding portions</a> of his work. </p><p>But he immediately qualifies that <strong>coding is only a fraction of his job</strong>, so the <strong><a href="https://simonwillison.net/2025/Aug/6/not-10x/#:~:text=those%2010x%20claims%20convincing,do%20as%20a%20software%20engineer">overall productivity gain is much smaller</a></strong>. 
This sentiment is common: using an AI code editor or vibe-coding tool can often help crank out a unit test file or convert some data format in seconds, which is awesome, but it might shave only an hour off a week-long project that is bottlenecked by design discussions, integration testing, and production debugging.</p><p>So, <strong>where does this leave us?</strong> The hype has been tempered by reality: <strong>AI coding tools are best viewed as </strong><em><strong>assistants that save you keystrokes and sometimes ideas, not silver bullets that remove engineering toil altogether</strong></em><strong>.</strong> </p><p>The <em>true</em> value may lie in <a href="https://colton.dev/blog/curing-your-ai-10x-engineer-imposter-syndrome/#:~:text=Let%27s%20start%20by%20looking%20at,seen%20have%2010x%20productivity%20gains">preventing wasted effort</a> (by quickly retrieving solutions or generating scaffolding) rather than in simply cranking out more questionable code faster.</p><h2>What the data says: Mixed results from studies and surveys</h2><p><strong>Beyond anecdotes and surveys, 2024 and 2025 produced some rigorous studies on AI&#8217;s impact on developer productivity</strong>. These range from controlled experiments to large-scale data analyses. However, keep in mind that results depend heavily on <strong>model quality</strong> and on <strong>when</strong> each study was conducted. Let&#8217;s break down the key findings:</p><h3>Controlled trials: modest speed-ups in enterprise settings</h3><ul><li><p><strong>Google&#8217;s Internal RCT (2024)</strong> &#8211; Google conducted a randomized controlled trial on ~100 of its own software engineers to measure AI&#8217;s impact using <em>multiple</em> in-house AI coding tools (code completion, smart paste, and a natural language-to-code assistant). 
The task was a realistic, <a href="https://linearb.io/blog/gen-AI-research-software-development-productivity-at-google#:~:text=option%20to%20have%20the%20AI,assistant%20make%20recommendations">&#8220;enterprise-grade&#8221; coding assignment</a> integrating with Google&#8217;s build and test systems (adding a new logging feature across 10 files, ~474 LOC). The result: developers <strong>using AI completed the task ~21% faster</strong> on average than those without AI. The AI group finished in ~96 minutes vs 114 minutes for the control group. So, about a one-fifth time savings. Notably, this was less dramatic than some earlier studies in simpler scenarios &#8211; a point we&#8217;ll revisit. Google&#8217;s study also found that, somewhat surprisingly, <strong>the senior developers saw </strong><em><strong>slightly larger</strong></em><strong> gains than junior devs</strong> in this experiment. They speculate that <a href="https://linearb.io/blog/gen-AI-research-software-development-productivity-at-google#:~:text=One%20surprising%20result%20from%20Google%E2%80%99s,for%20their%20lack%20of%20experience">seniors leveraged the AI more effectively</a> on complex codebase tasks, whereas juniors might have been overwhelmed or not known how to best use it. However, the sample of seniors was small, so that could be noise. The key takeaway is that even in a high-context enterprise environment, AI tools provided a <strong><a href="https://linearb.io/blog/gen-AI-research-software-development-productivity-at-google#:~:text=Key%20Findings%3A%2021,But%20Context%20Matters">measurable but moderate productivity boost (~20%)</a></strong>, not an earth-shattering one. 
The researchers also emphasized that code quality was not evaluated &#8211; so faster doesn&#8217;t necessarily mean <em>better</em> code, just that tests passed more quickly.</p></li></ul><ul><li><p><strong>Multi-Company Industry RCT (2024)</strong> &#8211; Another large study (published via SSRN) <a href="https://medium.com/@sahin.samia/can-ai-really-boost-developer-productivity-new-study-reveals-a-26-increase-1f34e70b5341#:~:text=,into%2010%20hours%20of%20output">spanned three organizations</a> &#8211; Microsoft, Accenture, and a Fortune 100 enterprise &#8211; and nearly 5,000 developers, measuring the effect of GitHub Copilot in real work settings. It found an average <strong><a href="https://medium.com/@sahin.samia/can-ai-really-boost-developer-productivity-new-study-reveals-a-26-increase-1f34e70b5341#:~:text=,into%2010%20hours%20of%20output">26% increase in productivity</a></strong> for developers with Copilot access. In practical terms, the authors frame it as <em><a href="https://medium.com/@sahin.samia/can-ai-really-boost-developer-productivity-new-study-reveals-a-26-increase-1f34e70b5341#:~:text=three%20major%20companies%20%E2%80%94%20Microsoft%2C,into%2010%20hours%20of%20output">&#8220;turning an 8-hour workday into 10 hours of output&#8221;</a></em>. This was determined by metrics like tasks completed (pull requests merged), code written, and build success rates. Importantly, they reported <strong>no drop in code quality or increase in errors</strong> &#8211; Copilot users actually had slightly <em>higher</em> successful build rates, implying the AI suggestions often prevented certain mistakes. 
However, the benefits were <strong>not evenly distributed</strong>: <em>&#8220;newer, less experienced developers reaped the most benefits,&#8221;</em> seeing as high as a <strong><a href="https://medium.com/@sahin.samia/can-ai-really-boost-developer-productivity-new-study-reveals-a-26-increase-1f34e70b5341#:~:text=The%20study%20also%20highlighted%20a,enhanced%20productivity%20across%20the%20board">35&#8211;39% speed-up</a></strong>, whereas seasoned developers saw smaller (8&#8211;16%) improvements. Essentially, Copilot acted like an &#8220;always-available mentor&#8221; for juniors &#8211; helping them write code they might otherwise struggle with &#8211; while senior devs used it more selectively for boilerplate and got modest gains. This contrast with Google&#8217;s finding about seniors could be due to different tasks or simply that in the wild, junior devs lean on AI more heavily. Regardless, the <strong>26% average boost</strong> from this study is often cited as evidence that AI can significantly accelerate coding <em>when integrated well into team workflows</em>. (Caveat: The study was careful to control for many factors over months of usage, but one can imagine enthusiastic participants might also work differently knowing they&#8217;re in a trial.)</p></li></ul><ul><li><p><strong>Upwork Freelancer experiment (2023)</strong> &#8211; A well-known earlier experiment by Peng et al. (2023) hired 95 freelance programmers on Upwork to build a web server, and found the group with access to Copilot completed the task <strong><a href="https://www.researchgate.net/publication/368473822_The_Impact_of_AI_on_Developer_Productivity_Evidence_from_GitHub_Copilot#:~:text=,">55% faster</a></strong> than the <a href="https://www.researchgate.net/publication/368473822_The_Impact_of_AI_on_Developer_Productivity_Evidence_from_GitHub_Copilot#:~:text=significant%20productivity%20effects%20of%20generative,">control group</a>. 
Another referenced study even reported <a href="https://www.researchgate.net/publication/368473822_The_Impact_of_AI_on_Developer_Productivity_Evidence_from_GitHub_Copilot#:~:text=,desarrolladores%20juniors%20muestran%20los%20mayores">&#8220;2&#215; faster&#8221; completion</a> with Copilot for certain tasks. These results, while impressive, were on relatively contained tasks and often with less experienced coders. They represent something like a best-case scenario (single focused task, no legacy code, motivated participants). In real teams, you might not see such big jumps &#8211; which is exactly what the Google and multi-company trials, with more context and longer duration, confirmed (around 20&#8211;30% gains, not 50%).</p></li></ul><p>On the whole, <strong>controlled experiments suggest AI</strong> can <strong>provide a notable productivity uplift (roughly 20&#8211;30% faster coding) in both enterprise and broad industry settings</strong> &#8211; <em>when properly used</em>. This is consistent with what I&#8217;ve observed at Google.</p><p>That&#8217;s nothing to scoff at: a quarter more output is significant at scale. But it&#8217;s a far cry from 10&#215;, and the effect depends on context. The multi-company study&#8217;s authors explicitly note that <strong>newcomers benefit more</strong> (which makes intuitive sense &#8211; AI can help fill knowledge gaps), whereas <strong>seasoned devs adopt it more slowly and use it for narrower cases</strong>. </p><p>Perhaps the <strong>most striking controlled study</strong> came from a different angle &#8211; not showing a boost, but a <em>slowdown</em>:</p><h3>A reality check: When AI <strong>slows</strong> experienced devs</h3><p>In July 2025, a group of researchers (Becker et al. 
via METR) published results of a <strong><a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/#:~:text=We%20conduct%20a%20randomized%20controlled,from%20AI%20R%26D%20automation%201">randomized controlled trial on 16 experienced open-source developers</a></strong> working on <a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/#:~:text=To%20directly%20measure%20the%20real,Developers%20complete">their own large OSS projects</a>. These were devs with years of experience on repos &gt;1M lines, solving real issues from their bug trackers. </p><p>The twist: tasks (~2 hours each) were randomly assigned to &#8220;AI-allowed&#8221; or &#8220;AI-disallowed&#8221; conditions for each developer, who would either use <a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/#:~:text=To%20directly%20measure%20the%20real,Developers%20complete">state-of-the-art AI tools</a> (they mostly used Cursor Pro with Claude 3.5/3.7) or <a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/#:~:text=allowed%2C%20developers%20can%20use%20any,their%20participation%20in%20the%20study">work completely solo on each issue</a>. The outcome was surprising: <strong>when using AI, these seasoned devs took</strong> <a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/#:~:text=We%20conduct%20a%20randomized%20controlled,from%20AI%20R%26D%20automation%201">19% longer</a> <strong>on average to complete the tasks</strong>. In other words, <a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/#:~:text=Core%20Result">AI made them </a><em><a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/#:~:text=Core%20Result">slower</a></em>. 
The authors dubbed it a snapshot of &#8220;early-2025 AI capabilities&#8221; in a realistic setting &#8211; and it wasn&#8217;t very flattering.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!q01Q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0172195b-38ef-400f-8446-3ac197270e08_2562x1540.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!q01Q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0172195b-38ef-400f-8446-3ac197270e08_2562x1540.png 424w, https://substackcdn.com/image/fetch/$s_!q01Q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0172195b-38ef-400f-8446-3ac197270e08_2562x1540.png 848w, https://substackcdn.com/image/fetch/$s_!q01Q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0172195b-38ef-400f-8446-3ac197270e08_2562x1540.png 1272w, https://substackcdn.com/image/fetch/$s_!q01Q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0172195b-38ef-400f-8446-3ac197270e08_2562x1540.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!q01Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0172195b-38ef-400f-8446-3ac197270e08_2562x1540.png" width="1456" height="875" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0172195b-38ef-400f-8446-3ac197270e08_2562x1540.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:875,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:276148,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/170851153?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0172195b-38ef-400f-8446-3ac197270e08_2562x1540.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!q01Q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0172195b-38ef-400f-8446-3ac197270e08_2562x1540.png 424w, https://substackcdn.com/image/fetch/$s_!q01Q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0172195b-38ef-400f-8446-3ac197270e08_2562x1540.png 848w, https://substackcdn.com/image/fetch/$s_!q01Q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0172195b-38ef-400f-8446-3ac197270e08_2562x1540.png 1272w, https://substackcdn.com/image/fetch/$s_!q01Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0172195b-38ef-400f-8446-3ac197270e08_2562x1540.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Even more intriguing was the <strong>perception gap</strong>: the developers <em>expected</em> AI would speed them up by ~24%, and even after the experiment, they <em>believed</em> they had been faster by ~20% when using AI. In reality, the screen recordings and time logs told a different story &#8211; a significant slowdown. This finding generated a lot of discussion. How could skilled devs be slower <em>with</em> an advanced code assistant?</p><p>The paper offers some explanations. They analyzed 20 factors and <a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/#:~:text=Factor%20Analysis">identified a few likely causes for the slowdown</a>.</p><ul><li><p><strong>Overhead of integrating AI suggestions:</strong> Developers spent extra time verifying, debugging, and adjusting the AI&#8217;s output. 
The AI might produce tangential or incorrect code that then needed human correction. Essentially, <em>&#8220;hallucinations&#8221; and missteps introduced extra cycles</em>.</p></li></ul><ul><li><p><strong>Familiarity and inefficiency in use:</strong> These devs were relatively new to the specific AI tools. Only one participant had &gt;50 hours experience with Cursor; notably, that <a href="https://news.ycombinator.com/item?id=44739060#:~:text=EnnEmmEss%20%20%2025%20,next%20%5B%E2%80%93">one experienced user </a><em><a href="https://news.ycombinator.com/item?id=44739060#:~:text=EnnEmmEss%20%20%2025%20,next%20%5B%E2%80%93">did</a></em><a href="https://news.ycombinator.com/item?id=44739060#:~:text=EnnEmmEss%20%20%2025%20,next%20%5B%E2%80%93"> see a positive speedup</a>, suggesting a <a href="https://news.ycombinator.com/item?id=44739060#:~:text=mentioned%20in%20the%20research%20paper,is">learning curve effect</a>. Others may have used the AI sub-optimally or gotten stuck following it down wrong paths.</p></li></ul><ul><li><p><strong>Task complexity and context:</strong> The issues required deep understanding of a large codebase. AI can struggle with such context unless carefully guided. The devs possibly had to rewrite or heavily edit AI&#8217;s code to fit project conventions, eating up time.</p></li></ul><ul><li><p><strong>Cognitive interruptions:</strong> Switching between one&#8217;s own thought process and the AI&#8217;s suggestions can incur &#8220;context switching&#8221; overhead. If the AI outputs something that&#8217;s partially useful but needs fixes, the dev must reconcile it, which can be slower than writing a correct solution directly (especially for experts who <em>know</em> the codebase).</p></li></ul><ul><li><p><strong>False sense of security:</strong> Developers might accept AI output too readily and then debug for longer when it&#8217;s wrong, rather than writing a simpler correct solution. 
The study noted that even after experiencing the slowdown, devs still <em><a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/#:~:text=Core%20Result">felt</a></em><a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/#:~:text=Core%20Result"> like AI helped</a> &#8211; a kind of cognitive bias because the AI made things <em>feel</em> easier even if it wasn&#8217;t actually faster.</p></li></ul><p>The METR authors are careful to say this doesn&#8217;t prove &#8220;AI never speeds up devs&#8221; &#8211; just that <strong><a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/#:~:text=Given%20both%20the%20importance%20of,evidence%20for%20in%20Table%202">in this particular realistic scenario, current tools didn&#8217;t help</a></strong>. They acknowledge AI is evolving fast and that <a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/#:~:text=We%20do%20not%20provide%20evidence,the%20past%20five%20years%203">better prompting or more experienced users might achieve positive results</a>. But the study is a valuable counterpoint to optimistic lab results. It underscores that <strong>the effectiveness of AI assistance varies wildly with context</strong>. Give it a newbie doing a well-defined task and it shines; give it a veteran in a messy codebase and it might slow them down.</p><p>On Hacker News, this report sparked debate. Some engineers remarked that it <em><a href="https://news.ycombinator.com/item?id=44739060#:~:text=,has%20some%20connection%20to%20reality">matched their intuition</a></em>: <em>&#8220;LLM-based coding tools seem to actually hurt programmers&#8217; productivity [in complex scenarios]. &#8216;Hallucinations&#8217; aren&#8217;t going away&#8230;they just sometimes happen to generate something usable&#8221;</em>. 
Others pushed back, sharing their personal wins with Copilot or agents for certain languages or simpler tasks.</p><blockquote><p>One commenter <a href="https://news.ycombinator.com/item?id=44739060#:~:text=,Pro%20user%20since%202021%2C%20still">wrote</a>: <em>&#8220;Whether productivity is tanking or not, I will find it incredibly hard to stop using LLMs&#8230; I must note though, it might be too soon to put a mark on productivity &#8211; it&#8217;s a function of how well new technologies are integrated into processes, which happens over years, not months.&#8221;</em></p></blockquote><p>Another pointed out that <strong>familiarity matters</strong>: <em><a href="https://news.ycombinator.com/item?id=44739060#:~:text=EnnEmmEss%20%20%2025%20,next%20%5B%E2%80%93">&#8220;Our one dev with &gt;50h of Cursor experience saw a speedup &#8211; so maybe there&#8217;s a high skill ceiling to using these tools effectively&#8221;</a></em>. In essence, early adopters believe things will improve as we learn to co-work with AI, but at least in early 2025, <strong>the &#8220;AI Productivity Boom&#8221; hasn&#8217;t universally materialized</strong>.</p><h3>The AI productivity paradox: more code &#8800; more productivity</h3><p>Perhaps the most comprehensive look at AI&#8217;s impact on engineering came from the <strong>2025 DORA/Faros &#8220;AI Productivity Paradox&#8221; report</strong>. This research analyzed telemetry from over <strong><a href="https://www.faros.ai/blog/ai-software-engineering#:~:text=Drawing%20on%20telemetry%20from%20over,recent%20landmark%20research%20report%20confirms">10,000 developers across 1,255 teams</a></strong> (using data from source control, task trackers, CI pipelines, etc.) to see how heavy AI adoption correlates with team and organizational performance.
The findings reveal a fundamental mismatch between individual output and organizational outcomes:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9XdI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ae39f7e-afa5-44b8-ae18-7c4e2fcdb83b_999x651.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9XdI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ae39f7e-afa5-44b8-ae18-7c4e2fcdb83b_999x651.png 424w, https://substackcdn.com/image/fetch/$s_!9XdI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ae39f7e-afa5-44b8-ae18-7c4e2fcdb83b_999x651.png 848w, https://substackcdn.com/image/fetch/$s_!9XdI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ae39f7e-afa5-44b8-ae18-7c4e2fcdb83b_999x651.png 1272w, https://substackcdn.com/image/fetch/$s_!9XdI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ae39f7e-afa5-44b8-ae18-7c4e2fcdb83b_999x651.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9XdI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ae39f7e-afa5-44b8-ae18-7c4e2fcdb83b_999x651.png" width="999" height="651" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9ae39f7e-afa5-44b8-ae18-7c4e2fcdb83b_999x651.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:651,&quot;width&quot;:999,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:48708,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/170851153?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ae39f7e-afa5-44b8-ae18-7c4e2fcdb83b_999x651.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9XdI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ae39f7e-afa5-44b8-ae18-7c4e2fcdb83b_999x651.png 424w, https://substackcdn.com/image/fetch/$s_!9XdI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ae39f7e-afa5-44b8-ae18-7c4e2fcdb83b_999x651.png 848w, https://substackcdn.com/image/fetch/$s_!9XdI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ae39f7e-afa5-44b8-ae18-7c4e2fcdb83b_999x651.png 1272w, https://substackcdn.com/image/fetch/$s_!9XdI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ae39f7e-afa5-44b8-ae18-7c4e2fcdb83b_999x651.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ul><li><p>Teams with heavy AI tool use <strong><a href="https://www.faros.ai/blog/ai-software-engineering#:~:text=,queues%20balloon">completed 21% more tasks</a></strong><a href="https://www.faros.ai/blog/ai-software-engineering#:~:text=,queues%20balloon"> and </a><strong><a href="https://www.faros.ai/blog/ai-software-engineering#:~:text=,queues%20balloon">merged 98% more pull requests</a></strong> &#8211; confirming that AI users tend to crank out more code and work items. However, their <strong><a href="https://www.faros.ai/blog/ai-software-engineering#:~:text=Developers%20on%20teams%20with%20high,a%20critical%20bottleneck%3A%20human%20approval">PR review times ballooned by 91%</a></strong>, creating a new bottleneck at the human approval stage.
Essentially, <em>AI let devs throw code over the wall faster, but the work then piled up higher on the other side of that wall (code review, QA)</em>. The report invokes <a href="https://www.faros.ai/blog/ai-software-engineering#:~:text=bottleneck%3A%20human%20approval">Amdahl&#8217;s Law</a>: overall speed is dictated by the stages you don&#8217;t accelerate. Without speeding up code review and deployment processes, <strong>the extra code just queues up waiting for humans</strong>.</p></li></ul><ul><li><p>AI-enabled developers indeed <strong>parallelize more</strong>: high-AI teams saw devs touching <a href="https://www.faros.ai/blog/ai-software-engineering#:~:text=,day">9% more distinct tasks per day</a> and <a href="https://www.faros.ai/blog/ai-software-engineering#:~:text=Developers%20on%20teams%20with%20high,more%20pull%20requests%20per%20day">47% more PRs per day</a>. This indicates more context-switching and multi-threaded work. The report suggests a <a href="https://www.faros.ai/blog/ai-software-engineering#:~:text=AI%20is%20shifting%20that%20benchmark%2C,generated%20contributions%20across%20multiple%20workstreams">new &#8220;operating model&#8221; is emerging</a> where devs orchestrate multiple AI-assisted threads of work rather than focusing on one at a time. This isn&#8217;t entirely negative &#8211; it might mean devs can juggle more things (review one AI-suggested PR while another runs tests, etc.) &#8211; but it challenges the conventional wisdom that context-switching is always bad. In an AI world, <em>some</em> increased context-switching might be normal as devs oversee multiple semi-autonomous efforts. Still, it can also be mentally taxing.</p></li></ul><ul><li><p><strong>Code quantity vs. quality:</strong> The Faros data found code <em>structure</em> and hygiene might improve with AI (they observed slightly fewer code smells and higher test coverage in some cases), <strong>but bug rates actually increased</strong>.
Specifically, AI adoption was associated with a <strong><a href="https://www.faros.ai/blog/ai-software-engineering#:~:text=,quality%20worsens">9% increase in bugs per developer</a></strong> and, astonishingly, a <strong><a href="https://www.faros.ai/blog/ai-software-engineering#:~:text=While%20we%20observe%20a%20modest,increase%20in%20average%20PR%20size">+154% increase in average PR size</a></strong>! So PRs got <em>much larger</em> when AI was involved, likely because AI can generate big chunks quickly. Larger PRs are harder to review and more bug-prone. Indeed, more bugs slipped through. The tooling may encourage a spray of code that isn&#8217;t fully digested by the author, putting more burden on downstream QA.</p></li></ul><ul><li><p><strong>No overall acceleration at the org level:</strong> When looking at big-picture metrics (like <em>DORA&#8217;s four key DevOps metrics</em> of deployment frequency, lead time, change fail rate, and MTTR, as well as overall throughput), the analysis found <strong><a href="https://www.faros.ai/blog/ai-software-engineering#:~:text=,from%20AI">no significant correlation between AI adoption and better outcomes at the company level</a></strong>. In other words, companies with lots of AI usage didn&#8217;t ship faster or more reliably than those without, once you aggregate the data. The individual team boosts were getting absorbed by cross-team dependencies and bottlenecks.</p></li></ul><p>This disconnect is what Faros calls the <strong><a href="https://www.faros.ai/blog/ai-software-engineering#:~:text=AI%20Is%20Everywhere">&#8220;AI Productivity Paradox&#8221;</a></strong>: <em>AI is everywhere, yet impact isn&#8217;t</em>. By 2025, <strong>75% of engineers use AI tools, yet most orgs see no measurable performance gains</strong> in delivery.
The report offers insightful reasons why these gains haven&#8217;t materialized, crystallized into <strong>four patterns</strong>:</p><ol><li><p><strong>Adoption is very recent:</strong> Widespread usage (60%+ of devs using weekly) <a href="https://www.faros.ai/blog/ai-software-engineering#:~:text=Even%20with%20rising%20usage%2C%20we,often%20fail%20to%20scale%2C%20namely">only took off in the last 2&#8211;3 quarters</a> in most companies. The tooling and practices are immature; teams are basically &#8220;beta-testing&#8221; AI in real time. There hasn&#8217;t been enough time to re-engineer processes around it.</p></li></ol><ol start="2"><li><p><strong>Usage is uneven across teams:</strong> Even if overall company adoption is high, it <a href="https://www.faros.ai/blog/ai-software-engineering#:~:text=1,Usage%20is%20highest%20among">varies team by team</a>. Some teams may be &#8220;AI super-users&#8221; cranking out code, while others are more traditional. Since software delivery is often cross-team, one fast team won&#8217;t dramatically speed up a whole project if the adjacent teams or downstream reviewers aren&#8217;t equally augmented. It&#8217;s the <a href="https://www.faros.ai/blog/ai-software-engineering#:~:text=developing,In">&#8220;weakest link&#8221; effect</a> again.</p></li></ol><ol start="3"><li><p><strong>Adoption skews toward newer engineers:</strong> The data showed <strong><a href="https://www.faros.ai/blog/ai-software-engineering#:~:text=organizational%20level,the%20dataset%2C%20most%20developers%20use">new hires and less-tenured engineers use AI the most</a></strong>, whereas many <a href="https://www.faros.ai/blog/ai-software-engineering#:~:text=engineers%20who%20are%20newer%20to,system%20knowledge%20and%20organizational%20context">senior engineers and veterans are using it less or not at all</a>. 
(Not to confuse <em>tenure</em> with age or skill &#8211; this specifically means engineers new to the company, who often lean on AI to navigate unfamiliar codebases). Seniors may be more skeptical of AI&#8217;s help on complex tasks, or simply creatures of habit. The implication is that <strong>the people designing systems and making big architectural decisions (often senior staff) are using AI least</strong>, while juniors cranking out code use it most. So the <em>type</em> of work AI is doing is likely more on the periphery (small feature PRs, minor fixes) than at the core architectural level. This limits the impact on major outcomes, at least for now.</p></li></ol><ol start="4"><li><p><strong>Usage remains shallow (autocomplete-overdrive):</strong> Most developers are using only the <strong>most basic AI capabilities &#8211; primarily <a href="https://www.faros.ai/blog/ai-software-engineering#:~:text=ability%20to%20support%20more%20complex,task%20execution%20remain%20largely%20untapped">code autocomplete</a></strong> in the IDE. Advanced uses like integrated AI chat for troubleshooting, AI-assisted code review, or autonomous agents opening merge requests are <em>rare</em>. The report explicitly notes <em><a href="https://www.faros.ai/blog/ai-software-engineering#:~:text=ability%20to%20support%20more%20complex,task%20execution%20remain%20largely%20untapped">&#8220;advanced capabilities&#8230; remain largely untapped&#8221;</a></em>. So, despite all the talk of &#8220;agentic AI&#8221; that can file pull requests or automatically fix bugs, the reality is that here in 2025 the typical dev just has a smarter autocomplete that occasionally writes a function for them. That&#8217;s useful, but it&#8217;s incremental.
The full transformative potential of AI (if it exists) isn&#8217;t being realized because the tooling and adoption of those capabilities are nascent.</p></li></ol><p>The Faros report suggests that <strong>to get real value, organizations need to deliberately adapt</strong>: invest in training developers on effective AI use, update code review practices (maybe even use AI to help review the AI-generated code), improve test automation to catch the extra bugs, and foster knowledge sharing of successful AI workflows. </p><p>A handful of <em>&#8220;rare companies&#8221;</em> were seeing tangible performance gains, and they were the ones that treated AI not as a plug-and-play gadget, but as a strategic initiative with <strong>&#8220;five enablers &#8211; workflow design, governance, infrastructure, training, and cross-functional alignment&#8221;</strong> backing it. In plain terms: If you don&#8217;t change your development process and upskill people, throwing AI in the mix might just create faster chaos, not faster delivery.</p><h2>Where AI helps the most (and least)</h2><p>The utility of AI coding assistance can vary dramatically depending on the scenario. Let&#8217;s break down <strong>use-cases where developers are seeing clear gains</strong> versus areas where AI still struggles or even hinders:</p><ul><li><p><strong>&#9989; Greenfield projects &amp; prototyping:</strong> When starting a new app or feature from scratch (with little existing code or legacy constraints), AI can be a turbocharger. Developers often report that <em>&#8220;vibe coding&#8221;</em> &#8211; letting the AI generate a substantial initial codebase or component &#8211; works best in greenfield situations or throwaway prototypes. The AI is less likely to conflict with established patterns because there aren&#8217;t any yet, and the cost of mistakes is lower. 
One engineer described the first time using AI on a new project as <em>&#8220;sipping rocket fuel&#8221;</em> &#8211; you get a burst of speed early on. Boilerplate for common frameworks (spinning up a React frontend or a basic Express server) is done in seconds. In hackathons or early-stage startup projects, some developers essentially use AI as an extra pair of hands to churn out a minimum viable product quickly. <strong>The benefit here</strong> is psychological as much as practical: AI-generated code can help you <em>iterate faster</em>, since you can quickly scaffold something, run it, and then refine. As Colton Voege noted, it&#8217;s <a href="https://colton.dev/blog/curing-your-ai-10x-engineer-imposter-syndrome/#:~:text=And%20it%20was,leading%20to%20significant%20security%20vulnerabilities">good at the generic stuff</a> (especially in well-trodden domains like JavaScript/React). So you feel super-productive initially. <em>However</em>, even in greenfield work, AI might not architect the solution optimally &#8211; it&#8217;s great for stubs and examples, but a human still needs to guide the overall design.</p></li></ul><ul><li><p><strong>&#9989; Boilerplate and repetitive code:</strong> Perhaps the most agreed-upon strength: AI excels at writing the boring bits. This includes unit tests that follow a pattern, boilerplate CRUD methods, converting one data structure to another, writing serialization/deserialization code, glue code between APIs, etc. If you have examples to imitate, AI will mimic them.
David Cramer (engineering leader at Sentry) gave a practical tip &#8211; if you have to write new tests similar to existing ones, <strong><a href="https://cra.mr/built-with-borrowed-hands/#:~:text=Ignore%20the%20claims%20about%20vibe,feed%20it%20the%20Issue%20URL">generate them with AI</a></strong> to save time, then <em><a href="https://cra.mr/built-with-borrowed-hands/#:~:text=to%20write%20code,feed%20it%20the%20Issue%20URL">&#8220;dive in and change what you need to&#8221;</a></em>. This speeds up the rote parts of coding. Similarly, routine functions (parsers, format converters, simple algorithms) can be knocked out quickly by prompting the AI. The key is that <em>you</em> as the developer know exactly what needs to be done and roughly how &#8211; you just let the AI fill in the syntax and edge cases. Many devs report significant time saved not having to search Stack Overflow for that one-off regex or not having to manually write dull boilerplate. It&#8217;s like having an encyclopedia and snippet library on tap.</p></li></ul><ul><li><p><strong>&#9989; Documentation and learning:</strong> An underrated use of AI in development is as a learning and explanation tool. Over <a href="https://stackoverflow.blog/2025/07/29/developers-remain-willing-but-reluctant-to-use-ai-the-2025-developer-survey-results-are-here/#:~:text=still%20ask%20another%20person%20for,invest%20time%20in%20AI%20programming">44% of devs learning a new language or tech in the past year used AI help to do so</a>. Tools like Cursor can explain code, translate one language to another, or answer &#8220;how do I do X in framework Y&#8221; much faster than combing documentation. When encountering a new API, developers often paste an error message or function signature into the AI chat and get a quick explanation or sample usage. This accelerates the &#8220;research&#8221; phase of coding. 
Rather than scouring Google and skimming docs for 30 minutes, an AI might give you the gist in 30 seconds (sometimes even with a runnable example). This isn&#8217;t direct &#8220;code productivity&#8221;, but it reduces time spent stuck or reading manuals. The Stack Overflow survey indicates <a href="https://www.admin-magazine.com/News/Stack-Overflow-Survey-66-of-Developers-Frustrated-by-AI-Inaccuracy#:~:text=Overall%2C%20developers%20say%20they%E2%80%99re%20using,AI%20for%20tasks%20such%20as">searching for answers</a> is the #1 use of AI tools by devs &#8211; essentially AI as a smart assistant for Q&amp;A. That said, <em>distrust</em> of AI answers is high (for good reason &#8211; they can sound confident but be wrong), so developers are double-checking anything important. But as a supplement to official docs, AI chat can be a great tutor or rubber duck.</p></li></ul><ul><li><p><strong>&#9989; Onboarding to codebases:</strong> New hires or contributors unfamiliar with a large codebase have found AI assistants helpful in navigating and understanding the code. Because AI models can ingest a lot of context, you can ask things like &#8220;what does this module do?&#8221;, &#8220;summarize how data flows from class A to B&#8221;, or &#8220;where in the code is the logic for X handled?&#8221; and get pointed in the right direction. The Faros report noted that <strong><a href="https://www.faros.ai/blog/ai-software-engineering#:~:text=organizational%20level,the%20dataset%2C%20most%20developers%20use">newer engineers lean on AI to navigate unfamiliar code and accelerate early contributions</a></strong>. This suggests a good use-case: easing the steep learning curve of complex systems. Instead of constantly pestering senior team members with questions, a junior dev can ask the AI and often get useful answers (again, with caution about accuracy).
Even something like &#8220;generate a quick example usage of internal library Z from our repo&#8221; can give a template to work from. AI won&#8217;t have the true architectural understanding a senior dev has, but it can index the codebase and surface relevant bits quickly. In essence, it&#8217;s like an interactive documentation/search tool for your own code. This can save time and help newer team members become productive faster &#8211; a legitimate productivity gain at the team level if it shortens onboarding time.</p></li></ul><ul><li><p><strong>&#9989; &#8220;Hands-on&#8221; debugging aids:</strong> We are seeing developers use AI as a debugging assistant in creative ways. For instance, some will paste a stack trace or error log and ask the AI for likely causes or solutions. Others have started to integrate AI into their monitoring &#8211; e.g., feeding an issue or bug description to an internal AI agent (like <a href="https://cra.mr/built-with-borrowed-hands/#:~:text=favorite%20IDE,feed%20it%20the%20Issue%20URL">Sentry&#8217;s MCP tool that David Cramer mentioned</a>) to get analysis on what might be wrong. One HN user described a workflow: <em><a href="https://news.ycombinator.com/item?id=44739060#:~:text=Any%20research%20will%20be%20limited,what%20the%20researchers%20control%20for">&#8220;when I get a well-written bug report or detailed logs, my instinct is to feed it to an agent and let it figure it out in the background while I work on other things&#8221;</a></em>. They claimed this parallel approach often surfaces insights or even fixes. This hints that AI&#8217;s value isn&#8217;t only in writing new code &#8211; it can also help understand existing broken code. These uses can trim down debugging time, which is historically a huge part of development.
However, this area is still emerging; AI can misdiagnose issues too, so it&#8217;s another tool in the toolbox rather than a magic debugger.</p></li></ul><p>On the flip side, <strong>situations where AI assistance struggles or backfires:</strong></p><ul><li><p><strong>&#10060; Large, complex legacy codebases (brownfield):</strong> In mature enterprise codebases with lots of domain-specific context, custom patterns, and interdependent components, AI often flounders. Developers note that an AI might write code that doesn&#8217;t fit the existing architecture or misses subtle requirements, causing integration headaches. Colton Voege pointed out that <strong><a href="https://colton.dev/blog/curing-your-ai-10x-engineer-imposter-syndrome/#:~:text=And%20it%20was,leading%20to%20significant%20security%20vulnerabilities">AI &#8220;is not good at keeping up with the standards and utilities of </a></strong><em><strong><a href="https://colton.dev/blog/curing-your-ai-10x-engineer-imposter-syndrome/#:~:text=And%20it%20was,leading%20to%20significant%20security%20vulnerabilities">your</a></strong></em><strong><a href="https://colton.dev/blog/curing-your-ai-10x-engineer-imposter-syndrome/#:~:text=And%20it%20was,leading%20to%20significant%20security%20vulnerabilities"> codebase&#8221;</a></strong> and tends to <a href="https://colton.dev/blog/curing-your-ai-10x-engineer-imposter-syndrome/#:~:text=standards%20and%20utilities%20of%20your,leading%20to%20significant%20security%20vulnerabilities">fail if you use non-mainstream libraries</a>. It might call APIs that <em>almost</em> do what&#8217;s needed but not quite, or use outdated approaches. In such environments, integrating an AI-generated piece can take as long as writing it manually because you must rework it to match the codebase&#8217;s idioms. 
David Cramer&#8217;s experiment at Sentry, where he tried to rely <strong>100% on agents to build a real service</strong>, ended up confirming this: <em><a href="https://cra.mr/built-with-borrowed-hands/#:~:text=The%20example%20I%20used%20above,importantly%2C%20they%20don%E2%80%99t%20replace%20engineering">&#8220;you cannot use these agents to build software today&#8230; they don&#8217;t replace hands-on-keyboards. Most importantly, they don&#8217;t replace engineering.&#8221;</a></em> He found that for non-trivial new features in a complex system, the agent kept producing <strong><a href="https://cra.mr/built-with-borrowed-hands/#:~:text=The%20problem%20you%20see%20next,is%20an%20existing%20code%20base">&#8220;absolutely unmaintainable&#8221;</a></strong><a href="https://cra.mr/built-with-borrowed-hands/#:~:text=The%20problem%20you%20see%20next,is%20an%20existing%20code%20base"> code</a> or got stuck, and he eventually had to <a href="https://cra.mr/built-with-borrowed-hands/#:~:text=The%20example%20I%20used%20above,importantly%2C%20they%20don%E2%80%99t%20replace%20engineering">&#8220;hit eject&#8221; and do it the traditional way</a>. The AI could generate lots of code, but it wasn&#8217;t the <em>right</em> code. <strong><a href="https://cra.mr/built-with-borrowed-hands/#:~:text=The%20problem%20you%20see%20next,is%20an%20existing%20code%20base">Duplicate code, unused code, and incorrect abstractions</a></strong> were common when the agent tried to <a href="https://cra.mr/built-with-borrowed-hands/#:~:text=shot%20it%20,is%20an%20existing%20code%20base">extend a complex project</a>. This highlights that in brownfield development, <strong>deep understanding of the existing system is critical, and AI doesn&#8217;t truly understand &#8211; it guesses based on patterns</strong>. 
Until we have AI that can deeply ingest and reason about millions of lines of bespoke code (and maybe have the product context), human engineers will still be needed to ensure coherence and correctness in large systems. Thus, the productivity boost in brownfield scenarios is much smaller &#8211; some engineers estimate only a 10&#8211;30% speed-up at best in these cases, and sometimes a slowdown if the AI suggestions lead you astray.</p></li></ul><ul><li><p><strong>&#10060; &#8220;Agentic&#8221; autonomous coding:</strong> 2025 saw a lot of hype around coding agents &#8211; AI that can iterate on its own, e.g. writing code, running tests, reading the results, and refining. In theory, you could tell an agent &#8220;build me X feature&#8221; and it will write code, compile, test, fix bugs, and so on with minimal intervention. In practice, as Armin Ronacher documented in <em>&#8220;Agentic Coding Things That Didn&#8217;t Work,&#8221;</em> these workflows are <strong><a href="https://lucumr.pocoo.org/2025/7/30/things-that-didnt-work/#:~:text=But%20oddly%20enough%2C%20very%20little,from%20these%20failures%20for%20others">fragile and often more trouble than they&#8217;re worth</a></strong>. Armin enthusiastically tried features like <a href="https://lucumr.pocoo.org/2025/7/30/things-that-didnt-work/#:~:text=Using%20Claude%20Code%20and%20other,exploring%20everything%20on%20my%20plate">slash-commands to automate tasks</a>, background hooks, and <a href="https://lucumr.pocoo.org/2025/7/30/things-that-didnt-work/#:~:text=So%20I%20end%20up%20doing,be%20good%20use%20of%20copy%2Fpaste">&#8220;YOLO&#8221; modes that let the AI run wild on his codebase</a>. He ultimately abandoned most of these complex setups. Why? They didn&#8217;t consistently yield good results and added complexity to his workflow. 
<strong><a href="https://lucumr.pocoo.org/2025/7/30/things-that-didnt-work/#:~:text=But%20oddly%20enough%2C%20very%20little,from%20these%20failures%20for%20others">&#8220;Most of my attempts didn&#8217;t last&#8230; I ended up doing the simplest thing: just talk to the machine more, give it more context&#8230; That is 95% of my workflow.&#8221;</a></strong> In other words, all the fancy autonomous behaviors were less useful than a straightforward interactive chat with the AI. He would <a href="https://lucumr.pocoo.org/2025/7/30/things-that-didnt-work/#:~:text=commands%20cluttering%20your%20workspace%20and,confusing%20others">dictate or write what he wanted in detail</a> (often via speech-to-text to be more verbose) and <a href="https://lucumr.pocoo.org/2025/7/30/things-that-didnt-work/#:~:text=So%20I%20end%20up%20doing,be%20good%20use%20of%20copy%2Fpaste">guide the AI step by step</a>. The agent features either failed or he forgot to use them. This sentiment is echoed by many: current coding agents are cool demos, but in day-to-day use they can go off the rails and require constant babysitting. They might run the wrong command, misunderstand a test failure, or get stuck in loops. So, <strong>fully hands-off coding is not reliable in 2025</strong>. It&#8217;s still a <strong>human-in-the-loop game</strong>, where the human provides direction and judgment. The most effective &#8220;automation&#8221; remains partial &#8211; e.g., using AI to autofix simple lint errors or generate a PR draft, but not expecting it to deliver a shippable feature without human oversight. 
As Cramer concluded after his two-month agent experiment, <em><a href="https://cra.mr/built-with-borrowed-hands/#:~:text=went%20through%20the%20embedded%20agent,importantly%2C%20they%20don%E2%80%99t%20replace%20engineering">&#8220;I wasted three days trying to get the agent to design a feature I could&#8217;ve done in an afternoon&#8230; what matters is simply: you cannot use these agents to build software today.&#8221;</a></em> He emphasizes that <strong>AI in its current form will not replace the keyboard or the need for engineering skill</strong>. Instead, its value is in <em>augmenting</em> engineers, not substituting them.</p></li></ul><ul><li><p><strong>&#10060; Unvalidated &#8220;almost-right&#8221; code:</strong> This ties to the trust issue. When an AI produces code that looks plausible, there&#8217;s a temptation to accept and run with it. But as <a href="https://www.admin-magazine.com/News/Stack-Overflow-Survey-66-of-Developers-Frustrated-by-AI-Inaccuracy#:~:text=%2A%2046,last%20year">66% of devs noted, </a><em>almost-right can be worse than wrong</em>. An obviously wrong answer you&#8217;ll discard immediately, but an almost-correct snippet might slip through and later cause a subtle bug. This leads to scenarios where developers unknowingly introduce issues or technical debt by over-relying on AI output. For example, one might use an AI-generated algorithm that works on typical cases but fails on edge cases &#8211; and if the dev doesn&#8217;t thoroughly test it (trusting the AI got it right), that bug goes to production. Or AI might use a deprecated function that mostly works but breaks in a future update. Without vigilance, AI can actually <strong>decrease code quality</strong>. The Faros study&#8217;s finding of <a href="https://www.faros.ai/blog/ai-software-engineering#:~:text=,quality%20worsens">increased bug density</a> on AI-heavy teams underscores this. 
So any productivity gain from writing code faster could be wiped out by time spent fixing the resulting bugs. Many teams have learned to treat AI suggestions with the same scrutiny as code from a junior developer: useful, but must be reviewed line by line. This of course eats into the productivity gains. As one senior dev on HN noted, <em><a href="https://news.ycombinator.com/item?id=44739060#:~:text=indoordin0saur%20%20%2030%20,%E2%80%93">&#8220;I wouldn&#8217;t use it to write anything lengthy&#8230; overall it has improved my productivity, though I could see how it might hurt junior engineers [who trust it too much].&#8221;</a></em> The key is that <strong>experience is needed to validate AI output</strong>. Inexperienced devs who blindly trust AI may produce code faster, but potentially <em>worse</em> code &#8211; leading to a net negative productivity once you account for QA and maintenance. This is why some in the community worry that AI assistance could become a crutch that impedes learning (junior devs might cargo-cult AI code without understanding it). The optimists argue it&#8217;s akin to Stack Overflow &#8211; you still have to know enough to integrate the answer. Regardless, <strong>blind trust in AI is a recipe for trouble</strong>; successful use requires a healthy dose of skepticism and old-fashioned testing.</p></li></ul><ul><li><p><strong>&#10060; Tasks requiring creative insight or novel solutions:</strong> Language models are fundamentally pattern mimickers. When faced with a truly novel problem that doesn&#8217;t map well to known examples, AI tools often flail or produce very generic suggestions. For instance, designing a new algorithm or inventing an architecture for a brand-new paradigm &#8211; these high-level creative engineering tasks are not something current AIs excel at. They can help by brainstorming or enumerating options (which might inspire the human), but they are unlikely to produce an innovative solution outright. 
Thus, for the most intellectually challenging parts of engineering &#8211; deciding <em>what</em> to build, <em>why</em> and <em>how</em> at a conceptual level &#8211; human engineers are still very much in the driver&#8217;s seat. The code assistants come into play more in the later stage of <em>how to implement this logic</em> in syntax. So one could argue AI hasn&#8217;t changed the nature of software design; it&#8217;s just sped up the mechanical aspects of coding. Real productivity leaps would require AI to contribute at the design/problem-solving level, which we aren&#8217;t seeing yet except in trivial ways.</p></li></ul><p>In summary, <strong>AI currently shines for well-defined, repetitive, or insulated tasks</strong> &#8211; writing boilerplate, tests, simple functions, and answering how-to questions. It falters in situations requiring holistic understanding of large systems, creative problem-solving, or strict correctness and maintainability. This delineation suggests why startups and individual projects might feel a bigger benefit (they can afford to move fast and break things with AI-generated code), whereas big mature products can&#8217;t tolerate mistakes as easily and thus can&#8217;t unleash AI without caution.</p><h2>Managing developer attention with AI suggestions</h2><p>A recurring design challenge is <strong>managing developer attention</strong>. When assistants surface <strong>too many suggestions</strong>, utility saturates and developers start to ignore them. Chen et al. observe that higher&#8209;frequency suggestion modes led participants to copy suggestions into code less often despite productivity gains (<a href="https://arxiv.org/abs/2410.04596">Chen et al., 2025</a>). 
In program&#8209;design workflows, developers also <strong>struggled to keep up with LLM&#8209;originated changes</strong> and experienced <strong>information overload</strong>, underscoring the need to carefully gate what is shown and when (<a href="https://arxiv.org/abs/2503.06911">Zamfirescu&#8209;Pereira et al., 2025</a>; <a href="https://dl.acm.org/doi/10.1145/3706598.3714154">ACM DOI</a>).</p><p>Actionably, future agents should <strong>time and target interventions based on what the developer is focused on</strong>. That includes monitoring context such as current activity in the IDE and recent interactions to offer <strong>well&#8209;timed, context&#8209;aware suggestions</strong> rather than a constant stream. Chen et al., 2025 recommend showing contextually relevant information and timing suggestions based on user workflow. Empirically, <a href="https://arxiv.org/abs/2503.06911">Pu et al.</a> show that <strong>subtask&#8209;boundary heuristics</strong> (program execution, code&#8209;block completion, and user comments) were effective triggers, while signals like idleness and code selection created false positives and disruptions.</p><p>The bottom line: <strong>proactivity pays off when it reduces intent&#8209;expression and interpretation effort</strong>, especially in debugging and refactoring, but it must be <strong>attention&#8209;aware</strong>. Favor interventions at subtask boundaries, allow users to tune frequency, and defer to the engineer&#8217;s flow during implementation.</p><h2>Adapting workflows: How developers are integrating AI</h2><p>To harness AI effectively, developers are evolving their workflows and tools. Some notable trends and best practices emerging in 2025:</p><ul><li><p><strong>&#8220;AI Pair-programming&#8221; via chat:</strong> Rather than relying solely on inline code completions, many developers keep an AI chat window open as they work. 
They treat it like a colleague they can rapidly iterate with. For example, they might paste a function and say &#8220;hey, can you refactor this to use approach X&#8221; or &#8220;find the bug in this code&#8221; or &#8220;write unit tests for this function.&#8221; This interactive approach often yields better results than expecting the AI to do everything autonomously. As Armin Ronacher noted, <strong>the simplest and most effective use of these tools is just to </strong><em><strong><a href="https://lucumr.pocoo.org/2025/7/30/things-that-didnt-work/#:~:text=commands%20cluttering%20your%20workspace%20and,confusing%20others">talk to them more</a></strong></em>. He even uses <strong><a href="https://lucumr.pocoo.org/2025/7/30/things-that-didnt-work/#:~:text=commands%20cluttering%20your%20workspace%20and,confusing%20others">voice input</a></strong> to stream-of-consciousness describe what he wants, because <a href="https://lucumr.pocoo.org/2025/7/30/things-that-didnt-work/#:~:text=So%20I%20end%20up%20doing,be%20good%20use%20of%20copy%2Fpaste">speaking can be faster than typing and encourages providing more context</a>. The AI then responds with code or answers. This <em>conversational</em> coding style is becoming more common, especially with the advent of voice-enabled coding assistants and editor plugins. It&#8217;s like having a rubber duck that talks back with suggestions. The benefit is you can guide the AI step by step, rather than letting it guess the whole solution. This mitigates misunderstanding and lets you course-correct in real time. Tools like Cursor or VS Code Copilot Chat allow highlighting code and asking questions or for modifications, which fits naturally into a developer&#8217;s flow. The takeaway: <strong>treat the AI as a collaborator, not an autonomous coder</strong>. 
Continuous back-and-forth yields better outcomes than one-shot prompts.</p></li></ul><ul><li><p><strong>Personal prompt libraries &amp; reusable recipes:</strong> Developers who regularly use AI are building up a set of &#8220;prompt patterns&#8221; or semi-structured commands that work well for their needs. For instance, a prompt to &#8220;explain this code&#8221;, a prompt to &#8220;optimize this function without changing its API&#8221;, or a prompt to &#8220;generate a SQL query for X based on these tables.&#8221; Some IDE plugins let you save these as shortcuts (Armin experimented with <strong><a href="https://lucumr.pocoo.org/2025/7/30/things-that-didnt-work/#:~:text=Slash%20Commands">slash commands</a></strong> for common tasks). While he found many of them ended up unused, the idea of having <strong>custom AI commands</strong> is still compelling for some &#8211; e.g., a /doc command that when you highlight a function, it generates documentation comments for it. Or /tests to scaffold test cases for a given code snippet. Armin discovered <a href="https://lucumr.pocoo.org/2025/7/30/things-that-didnt-work/#:~:text=session,I%20ended%20up%20never%20using">limitations in implementation</a> (lack of parameterization, etc., which frustrated him), but the concept may improve as tooling matures. Even without formal slash commands, some devs keep a text snippet of their favorite prompts to copy-paste when needed (for example, the precise phrasing to get an AI to produce output in a desired format). This is analogous to having shell scripts or editor macros &#8211; you learn how to &#8220;code&#8221; the AI with prompts and reuse what works.</p></li></ul><ul><li><p><strong>AI-aware code reviews:</strong> One interesting development is engineers beginning to use AI <em>during code review</em>. For instance, if they receive a pull request (possibly AI-generated or not), they might ask an AI to summarize the changes or identify potential issues. 
GitHub has started previewing an &#8220;AI-assisted code review&#8221; that will highlight risky code or suggest improvements. While still early, this could help deal with the onslaught of <a href="https://www.faros.ai/blog/ai-software-engineering#:~:text=,quality%20worsens">larger PRs from AI-generated code</a>. It&#8217;s almost a necessity: if AI doubles the code written, to avoid review becoming the bottleneck, perhaps AI needs to aid in review too. Some developers already manually use AI chat: paste a diff and say &#8220;review this diff for any bugs or style issues.&#8221; The AI might catch things or at least provide a second opinion. However, caution is required &#8211; an AI code reviewer might miss context or enforce pedantic rules. But this area will likely grow, as it directly addresses the slowest link (human review) that Faros identified. If AI can pre-filter changes and auto-approve trivial ones (some companies already auto-merge PRs below X lines or with low risk), that could free humans to focus on complex changes, boosting throughput.</p></li></ul><ul><li><p><strong>Smaller, incremental changes (batch size adjustments):</strong> Some teams are adjusting their practices to better accommodate AI. One tactic is encouraging <strong>smaller, more incremental commits/PRs</strong> when using AI. Since AI can spew a lot of code quickly, there&#8217;s a temptation to do a huge change in one go. But that leads to the 150% larger PRs and long reviews. Instead, savvy devs are learning to break tasks into smaller sub-tasks, get AI to help with each, and commit in pieces. This ties into an old best practice (small batches) that becomes even more important with AI output. Smaller AI contributions are easier to verify and less likely to introduce big bugs. It&#8217;s the principle of <strong>keeping the human in control</strong> by not letting the AI run away. 
For example, instead of &#8220;Implement the payment system&#8221; in one shot, do &#8220;Implement the payment API client&#8221; (review it), then &#8220;Implement the payment processing function&#8221;, etc. This also helps psychologically; the dev stays engaged and doesn&#8217;t lose track of what the AI is doing.</p></li></ul><ul><li><p><strong>Investing in tests &amp; tooling:</strong> A pattern emerging in teams using AI is doubling down on automated testing and CI quality gates. Since AI code may have unknown flaws, having a robust test suite is your safety net. Some companies require that any AI-generated code must come with tests (often AI-written tests!) to ensure it&#8217;s exercised. Others run new static analysis or AI-driven analysis on the code to catch issues. Essentially, if AI increases the volume and velocity of code, the verification and validation steps must catch up too. So, teams are adding more linting, more types (in typed languages), and more checks to avoid regressions. This isn&#8217;t glamorous productivity stuff, but it&#8217;s necessary to actually realize net gains. If AI lets you write code 30% faster, but you have no tests and spend an extra 40% of time debugging in production, you lost the game. Smart teams realize this and shore up their quality pipelines. In a way, AI is forcing better engineering discipline: you can&#8217;t rely on intuition that the code is right if you didn&#8217;t write it fully yourself &#8211; you write tests to be sure.</p></li></ul><ul><li><p><strong>Knowledge sharing and training:</strong> Developers are learning tips and tricks from each other on how to coax the best out of AI. Internal brown-bags or Slack channels dedicated to AI tools are common now in companies. People share prompt techniques (&#8220;Ask it like <em>this</em> and it will include import statements correctly&#8221; etc.), or warn about pitfalls (&#8220;Don&#8217;t use it for X, it always messes up thread safety&#8221;). 
Treating AI proficiency as a skill that can be taught and learned is important. Google&#8217;s study and others noted how <em><a href="https://news.ycombinator.com/item?id=44739060#:~:text=The%20report%20is%20definitely%20worth,1%5D%20is">experience with the tool</a></em><a href="https://news.ycombinator.com/item?id=44739060#:~:text=The%20report%20is%20definitely%20worth,1%5D%20is"> mattered</a>: one person with a lot of AI usage under their belt performed better. So ramping everyone up on AI &#8220;literacy&#8221; can improve overall team productivity. We&#8217;re seeing new roles or informal leads for this &#8211; e.g., an &#8220;AI champion&#8221; on a team who stays updated on the latest features (like VS Code adding a new AI refactoring command) and helps teammates use them.</p></li></ul><p>In essence, teams that get value from AI are those that <strong>treat it as an evolving capability to be managed</strong>, not a magic box. They iterate on how they integrate AI into their dev process, much like adopting any new tool. We&#8217;re at an interesting juncture where even seasoned engineers are somewhat <strong>&#8220;junior&#8221; at using AI tools</strong> &#8211; there&#8217;s a learning curve, and those who climb it reap more benefits.</p><p>One overarching theme: <strong>keeping the</strong> human in charge. All these workflow adaptations &#8211; from interactive prompting to extra testing &#8211; are about channeling AI&#8217;s strengths while mitigating its weaknesses, under human guidance. As David Cramer aptly advised fellow engineers:</p><blockquote><p>&#8220;<a href="https://cra.mr/built-with-borrowed-hands/">Ignore the claims</a> about <em>vibe coding</em> and claims that <em>you don&#8217;t need to know how to write code</em>. Instead look for ways to augment what you do. Those tests you need to write for that new API route? 
They look awfully similar to those other tests: so generate them.&#8221;</p></blockquote><p>He emphasizes that <strong>AI won&#8217;t replace the craft of engineering</strong>, and that&#8217;s okay. You still need to understand code &#8211; AI just helps you generate and verify it faster. And if using AI makes you miserable (some devs find &#8220;vibe coding&#8221; dull because it takes the fun out of writing code), it&#8217;s okay not to push it to the max. <em><a href="https://colton.dev/blog/curing-your-ai-10x-engineer-imposter-syndrome/#:~:text=Even%20after%20I%20got%20over,higher%20output%20path%20is%20available">&#8220;It&#8217;s okay to sacrifice some productivity to make work enjoyable,&#8221;</a></em> Colton Voege reminds, noting that <a href="https://colton.dev/blog/curing-your-ai-10x-engineer-imposter-syndrome/#:~:text=No,codebase%20will%20benefit%20from%20it">forcing yourself to code in a way you hate</a> (whether that&#8217;s writing everything by hand <em>or</em> wrangling an AI for every line) can lead to burnout and worse outcomes. In other words, <strong>productivity isn&#8217;t everything &#8211; developer happiness and creativity matter too, and there&#8217;s a balance to strike</strong>.</p><h2>Proactive AI Agents for debugging and refactoring</h2><p>Recent studies suggest that <strong>proactive AI agents are particularly useful during debugging and refactoring</strong>. In a CHI 2025 study of a proactive coding assistant, participants engaged with the AI most often during implementation (38.2 percent of all AI interactions) and debugging (26.4 percent), with lower rates for the analyze, design, organize, and refactor stages. This suggests engagement with the assistant concentrates in the implementation and debugging phases. 
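</p><p><em>A minimal, hypothetical sketch of the attention&#8209;aware gating these findings point to &#8211; every name below is illustrative, not an API from any of the cited systems:</em></p>

```python
# Hypothetical sketch: gate proactive suggestions by inferred work stage.
# Pattern: proactivity is welcomed in debugging/refactoring but disruptive
# during heads-down implementation, where the assistant should interject
# only at subtask boundaries, and users should be able to tune frequency.
from dataclasses import dataclass

WELCOME_STAGES = {"debugging", "refactoring"}
SUBTASK_BOUNDARIES = {"tests_ran", "block_completed", "comment_written"}

@dataclass
class EditorContext:
    stage: str       # e.g. "implementation", "debugging", "refactoring"
    last_event: str  # most recent editor event observed
    frequency: str   # user setting: "high", "low", or "off"

def should_surface_suggestion(ctx: EditorContext) -> bool:
    """Decide whether a proactive suggestion may interrupt the developer."""
    if ctx.frequency == "off":
        return False
    if ctx.stage in WELCOME_STAGES:
        return True  # fix/cleanup phases: proactivity tends to pay off
    # Implementation: stay quiet except at subtask boundaries, and only
    # for users who opted into frequent suggestions.
    return ctx.last_event in SUBTASK_BOUNDARIES and ctx.frequency == "high"
```

<p>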
These interaction-stage figures are from Pu et al., 2025 (<a href="https://arxiv.org/abs/2502.18658">paper</a>; <a href="https://dl.acm.org/doi/10.1145/3706598.3713357">ACM DOI</a>).</p><p>Participants also reported that <strong>increased AI proactivity led to higher efficiency</strong>, while prompt&#8209;only tools demanded more effort to use. One participant contrasted the proactive modes with a prompt&#8209;only baseline, saying they &#8220;had to keep on prompting and asking&#8221; (Pu et al., 2025 <a href="https://arxiv.org/abs/2502.18658">paper</a>). Independent work from Chen et al. found similar pain points with purely reactive chat assistants, with participants noting &#8220;I really wasn&#8217;t sure what to ask for with the non&#8209;proactive chat&#8221; (Chen et al., 2025 <a href="https://arxiv.org/abs/2410.04596">paper</a>; <a href="https://www.microsoft.com/en-us/research/publication/need-help-designing-proactive-ai-assistants-for-programming/">Microsoft Research summary</a>; <a href="https://dl.acm.org/doi/10.1145/3706598.3714002">ACM DOI</a>).</p><p>Critically, <strong>proactive suggestions were perceived as least disruptive during debugging and refactoring, and most disruptive during implementation</strong>. In Pu et al., disruptions clustered in implementation (32.7 percent of all disruptions) versus far fewer in debugging (7.27 percent) and refactoring (1.82 percent), reinforcing that proactivity should be applied thoughtfully: welcome during &#8220;fix&#8221; or cleanup phases, restrained during heads&#8209;down feature work. 
Pu et al., 2025 <a href="https://arxiv.org/abs/2502.18658">paper</a>.</p><h2>What&#8217;s the engineering leadership perspective?</h2><p>The <a href="https://leaddev.com/the-ai-impact-report-2025">LeadDev AI Impact Report 2025</a> surveyed 883 engineering leaders across the US, UK, Europe, and beyond to capture how AI tooling is changing teams and process.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oIGn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a917f49-6a74-4b59-b673-b7112944912d_2382x1298.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oIGn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a917f49-6a74-4b59-b673-b7112944912d_2382x1298.png 424w, https://substackcdn.com/image/fetch/$s_!oIGn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a917f49-6a74-4b59-b673-b7112944912d_2382x1298.png 848w, https://substackcdn.com/image/fetch/$s_!oIGn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a917f49-6a74-4b59-b673-b7112944912d_2382x1298.png 1272w, https://substackcdn.com/image/fetch/$s_!oIGn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a917f49-6a74-4b59-b673-b7112944912d_2382x1298.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oIGn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a917f49-6a74-4b59-b673-b7112944912d_2382x1298.png" width="1456" height="793" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8a917f49-6a74-4b59-b673-b7112944912d_2382x1298.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:793,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:353132,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/170851153?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a917f49-6a74-4b59-b673-b7112944912d_2382x1298.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oIGn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a917f49-6a74-4b59-b673-b7112944912d_2382x1298.png 424w, https://substackcdn.com/image/fetch/$s_!oIGn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a917f49-6a74-4b59-b673-b7112944912d_2382x1298.png 848w, https://substackcdn.com/image/fetch/$s_!oIGn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a917f49-6a74-4b59-b673-b7112944912d_2382x1298.png 1272w, https://substackcdn.com/image/fetch/$s_!oIGn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a917f49-6a74-4b59-b673-b7112944912d_2382x1298.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>While 59% of leaders report feeling more productive with AI tools, the reality is more nuanced. Automated code generation dominates usage at 48%, followed by summarization (39%) and documentation (36%), while critical areas like code review (17%) and testing (7%) lag significantly. The strongest perceived gains appear in very small teams (fewer than 5 engineers), where 59% cite improvements exceeding 10%. 
However, 60% of organizations cite the lack of clear metrics as their biggest AI-related challenge, with only 18% currently measuring impact systematically.</p><h2>Conclusion: A clear-eyed, data-driven perspective</h2><p>By now, the consensus among many software engineers &#8211; especially the characteristically skeptical Hacker News crowd &#8211; is that <strong>AI coding tools provide </strong><em><strong>useful boosts</strong></em><strong> but not miracles</strong>. The data backs this up:</p><ul><li><p><strong>Adoption is high</strong> and growing, because these tools <em>do</em> help developers work faster or reduce tedious work.</p></li></ul><ul><li><p><strong><a href="https://medium.com/@sahin.samia/can-ai-really-boost-developer-productivity-new-study-reveals-a-26-increase-1f34e70b5341#:~:text=,into%2010%20hours%20of%20output">Productivity gains in the 20-30% range</a></strong> are being observed in controlled settings and some real teams. That&#8217;s significant, but it&#8217;s a linear gain, not exponential.</p></li></ul><ul><li><p><strong>Individual experiences vary</strong>: <a href="https://medium.com/@sahin.samia/can-ai-really-boost-developer-productivity-new-study-reveals-a-26-increase-1f34e70b5341#:~:text=The%20study%20also%20highlighted%20a,enhanced%20productivity%20across%20the%20board">novices might feel supercharged</a> and reach near-senior output on certain tasks, while some <a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/#:~:text=We%20conduct%20a%20randomized%20controlled,from%20AI%20R%26D%20automation%201">veterans find the AI more distraction than help</a> and <a href="https://news.ycombinator.com/item?id=44739060#:~:text=indoordin0saur%20%20%2030%20,%E2%80%93">proceed cautiously</a>.</p></li></ul><ul><li><p><strong><a href="https://www.admin-magazine.com/News/Stack-Overflow-Survey-66-of-Developers-Frustrated-by-AI-Inaccuracy#:~:text=URL%3A%20https%3A%2F%2Fwww.admin">Trust is a major issue</a></strong>, 
with nearly half of developers not trusting AI&#8217;s output. This lack of trust is warranted given the tendency of models to err &#8211; yet ironically, developers can be <a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/#:~:text=Core%20Result">overconfident after using AI</a>, as the METR study showed. Navigating between over-reliance and under-utilization is the new skill to master.</p></li></ul><ul><li><p><strong><a href="https://www.faros.ai/blog/ai-software-engineering#:~:text=,from%20AI">Output &#8800; Outcome</a></strong>: Without adapting processes, more code faster can just mean more code waiting in review or more bugs to fix. Organizations are learning that they must invest in how AI is rolled out (training people, updating workflows) to see a true productivity payoff.</p></li></ul><p>So, <strong>is AI making engineers more productive?</strong> Yes, <strong>but modestly and unevenly</strong>. Forget the splashy &#8220;10x&#8221; headlines &#8211; the reality is a story of incremental improvements and second-order effects. It&#8217;s telling that the biggest benefits cited are often <em>qualitative</em>: e.g., developers feeling <strong>less mental load</strong> on grunt work, being able to focus on higher-level problems while the AI handles boilerplate, or learning new tech quicker with AI assistance. These are real improvements to the developer experience, even if they don&#8217;t neatly show up as 1000% more output. </p><p>From a manager or CTO perspective, the message is to be <strong>realistic</strong>. If your engineers report a 30% productivity gain with AI, that&#8217;s actually in line with the best studies &#8211; be happy with that, and skeptical of any claims far above it without extraordinary proof. Also, look at where that 30% is coming from: it might be 30% more <em>code</em> written, which as we&#8217;ve discussed doesn&#8217;t automatically equal 30% more <em>value</em> delivered. 
Monitor your bug rates, review times, and developer satisfaction alongside raw output.</p><p>For engineers themselves, the advice is <strong>pragmatic</strong>: <em>experiment with these tools, keep what works, discard what doesn&#8217;t.</em> There&#8217;s a lot of noise and &#8220;grift&#8221; in the AI tooling space, so focus on concrete improvements in your workflow. If using an AI assistant helps you write your code more efficiently or enjoyably, great &#8211; use it. If it sometimes slows you down, figure out why (are you trusting it too much? Using it on the wrong problems? Spending too long crafting the perfect prompt?) and adjust accordingly. It&#8217;s a learning process for everyone.</p><p>Crucially, <strong>continue honing core software engineering skills</strong>. AI might change the nature of coding over time, but in 2025 it&#8217;s clear that understanding how to design a system, how to debug, how to test, and how to maintain code are still vital. In fact, those skills become <em>more</em> important when an AI is doing the easy stuff, because the human must handle the hard stuff. As Cramer wrote, <em><a href="https://cra.mr/built-with-borrowed-hands/#:~:text=it%20in%20an%20afternoon,importantly%2C%20they%20don%E2%80%99t%20replace%20engineering">&#8220;they don&#8217;t replace engineering&#8221;</a></em> &#8211; AI won&#8217;t turn a bad programmer into a great one, but it can make a good programmer faster. Think of it like power tools: a nail gun lets a skilled carpenter frame a house faster, but an unskilled person with a nail gun can also just make a big dangerous mess faster. 
The skill and judgment remain paramount.</p><p>So to the question likely on every engineer&#8217;s mind: <strong>Will AI take my job or make my role obsolete?</strong> The data so far suggests <strong>no &#8211; but it might change your job somewhat.</strong> Developers are still needed to conceive ideas, break down problems, review AI&#8217;s work, and ensure the final product meets real-world requirements. AI is not replacing those creative and analytical parts; it&#8217;s just shaving some of the manual labor off the edges. A majority of developers (<a href="https://stackoverflow.blog/2025/07/29/developers-remain-willing-but-reluctant-to-use-ai-the-2025-developer-survey-results-are-here/#:~:text=about%20,a%20threat%20to%20their%20job">64%</a>) still feel secure that AI isn&#8217;t a threat to their jobs, though that confidence wavered slightly this year. The best approach is probably to embrace the tools and make them <em>complement</em> your skills. In other words, <strong>be the engineer who&#8217;s 1.3&#215; as productive with AI, rather than the one who refuses to use it and falls behind</strong>. Even a pessimist can appreciate a 30% speed-up on the dull parts of coding.</p><p>Finally, a note on <strong>mindset</strong>: The initial magic of AI coding wears off, and what&#8217;s left is figuring out how to incorporate this capability sustainably. <strong>We&#8217;re past the honeymoon phase now</strong> &#8211; on the ground, assessing what these tools can <em>really</em> do for us. And the picture is not one of a revolution rendering programmers irrelevant, but of a gradual evolution in how programmers do their work. 
</p><p>The <strong>canonical, comprehensive take</strong> at this point is: <strong>AI coding tools are helpful assistants that, when used wisely, can make engineers moderately more productive and happier by automating some drudgery &#8211; but they are not a substitute for human insight, and they introduce new challenges (verification, coordination) that must be managed.</strong></p><p><strong>In short: </strong><em><strong>The future of coding is likely</strong></em><strong> </strong><em><strong>human+AI, not AI-alone</strong>.</em> Embrace the helper, but keep your hands on the wheel and your engineering fundamentals sharp. </p><p>That&#8217;s how you&#8217;ll truly reap the productivity gains without getting lost in the hype.</p><p><em>I&#8217;m excited to share I&#8217;m writing a new <a href="https://beyond.addy.ie">AI-assisted engineering book</a> with O&#8217;Reilly. If you&#8217;ve enjoyed my writing here you may be interested in checking it out.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Sp4c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c497c2-42b5-4e28-a5cf-a3c89db18e41_5246x5246.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Sp4c!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c497c2-42b5-4e28-a5cf-a3c89db18e41_5246x5246.png 424w, https://substackcdn.com/image/fetch/$s_!Sp4c!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c497c2-42b5-4e28-a5cf-a3c89db18e41_5246x5246.png 848w, 
https://substackcdn.com/image/fetch/$s_!Sp4c!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c497c2-42b5-4e28-a5cf-a3c89db18e41_5246x5246.png 1272w, https://substackcdn.com/image/fetch/$s_!Sp4c!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c497c2-42b5-4e28-a5cf-a3c89db18e41_5246x5246.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Sp4c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c497c2-42b5-4e28-a5cf-a3c89db18e41_5246x5246.png" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e9c497c2-42b5-4e28-a5cf-a3c89db18e41_5246x5246.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:578496,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/170851153?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c497c2-42b5-4e28-a5cf-a3c89db18e41_5246x5246.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Sp4c!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c497c2-42b5-4e28-a5cf-a3c89db18e41_5246x5246.png 424w, 
https://substackcdn.com/image/fetch/$s_!Sp4c!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c497c2-42b5-4e28-a5cf-a3c89db18e41_5246x5246.png 848w, https://substackcdn.com/image/fetch/$s_!Sp4c!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c497c2-42b5-4e28-a5cf-a3c89db18e41_5246x5246.png 1272w, https://substackcdn.com/image/fetch/$s_!Sp4c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c497c2-42b5-4e28-a5cf-a3c89db18e41_5246x5246.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[Coding for the Future Agentic World]]></title><description><![CDATA[The promise and reality of autonomous coding agents in 2025]]></description><link>https://addyo.substack.com/p/coding-for-the-future-agentic-world</link><guid isPermaLink="false">https://addyo.substack.com/p/coding-for-the-future-agentic-world</guid><dc:creator><![CDATA[Addy Osmani]]></dc:creator><pubDate>Tue, 29 Jul 2025 18:30:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!pxOC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa723053-031c-4c09-ac2d-283213dda75f_2912x2096.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Looking forward, we're entering an <strong>agentic</strong> coding era &#8211; one where autonomous AI agents could handle more significant parts of the software development lifecycle (SDLC) on our behalf, with humans remaining in the loop. In this post I'll explore how programming is evolving in this new world of <strong>CLI</strong> <strong>coding agents</strong>, <strong>orchestrators</strong>, and <strong>background</strong> <strong>AI-driven automation</strong>. I&#8217;ll touch on everything from Claude Code to AmpCode and beyond. Let's dive in.</p><h2><strong>The rise of coding agents</strong></h2><p>It helps to frame where we are coming from. Over the past couple of years, we've seen AI assistance in coding go from simple autocomplete (e.g. early Copilot) to more interactive pair-programming. 
These tools claim to dramatically boost individual productivity &#8211; Microsoft recently reported that <em>over 30% of new code at the company is now AI-generated</em>, with <a href="https://simple.ai/p/coding-agents-are-having-their-chatgpt-moment#:~:text=But%20now%2C%20the%20transformation%20is,faster%20than%20most%20people%20realize">similar numbers</a> at Meta and <a href="https://9to5google.com/2025/06/30/google-engineers-ai-code/#:~:text=Google%20issues%20company%2Dwide%20AI%20coding%20guidance%20to%20software%20engineers&amp;text=Back%20in%20April%2C%20Sundar%20Pichai,for%20coding%20and%20their%20work.">Google</a>. This isn&#8217;t just happening in sandbox projects; it&#8217;s happening with production code running in systems used by billions. It&#8217;s clear we&#8217;ve hit an inflection point.</p><blockquote><p><strong>Tip:</strong> To improve AI coding, don&#8217;t just jump straight into tasks without <strong>planning</strong>. Planning first makes a big difference, as does <a href="https://addyo.substack.com/p/context-engineering-bringing-engineering">including sufficient context</a>. You can even <a href="https://www.youtube.com/watch?v=fD4ktSkNCw4">set up custom rules</a> for exactly how your mini-PRDs should be written.</p></blockquote><p>The next leap is <strong>autonomous coding agents</strong>. Instead of just offering suggestions or one-file edits, these agents can execute high-level tasks: e.g. <em>&#8220;Add a dark mode to my app&#8221;</em> or <em>&#8220;Fix all the failing tests and update outdated dependencies&#8221;</em>. They will plan the work, modify multiple files, run tests or commands, and even produce pull requests &#8211; all with minimal human intervention beyond approval. 
In other words, we&#8217;re shifting from <em>&#8220;AI as a coding assistant&#8221;</em> to <em>&#8220;AI as an autonomous coder&#8221;</em>.</p><p>In just a few months, <strong>every major AI lab and tech company rolled out their take on an autonomous coding agent</strong>. Anthropic launched <strong>Claude Code</strong>, Google unveiled <strong>Gemini CLI</strong> and an async agent called <strong>Jules</strong>, OpenAI introduced a research preview of a new <strong>Codex agent</strong>, and Microsoft/GitHub announced a <strong>Copilot &#8220;autonomous agent&#8221;</strong> mode. Social media has been abuzz with talk of rapid development in AI coding platforms. It feels reminiscent of ChatGPT&#8217;s breakout moment &#8211; suddenly <em>coding agents</em> are everywhere, and developers are taking notice.</p><p>So where does this leave developers? The answer many are converging on is that <strong>developers will evolve from "coders" to "conductors"</strong>. We&#8217;ll spend less time grinding out boilerplate or hunting bugs, and more time orchestrating AI agents, providing high-level guidance, and verifying the results. In other words, the next level of abstraction in software engineering is here. Just as we moved from assembly to high-level languages, and from on-prem servers to cloud, we&#8217;re now moving to a world where you might <strong>&#8220;orchestrate fleets of agents, letting AI handle the implementation&#8221;</strong>. The best developers will be those who can <em>communicate goals effectively to AIs, review and improve AI-generated code, and know when human insight must override AI output</em>.</p><blockquote><p><strong>Tip:</strong> Having AI write tests first (e.g. TDD-style) and then the least amount of code needed to make them pass can minimize things going off the rails.</p></blockquote><p>So, what does this agentic future look like in practice? 
Let&#8217;s explore some key categories of tools and patterns that are emerging:</p><ul><li><p><strong>AI Coding Agents in the terminal (CLI agents)</strong> &#8211; interactive command-line tools (Claude Code, Google&#8217;s Gemini CLI, OpenCode, etc.) that act as AI engineers living in your terminal. They can understand your entire codebase and perform multi-step coding tasks via a conversational interface.</p></li><li><p><strong>Orchestrating multiple agents in parallel</strong> &#8211; tools that let you run multiple AI agents simultaneously on different tasks or parts of a project. Think of this as having an <em>AI pair programming team</em> at your disposal, managed via a single interface (Claude Squad, Conductor, Agent Farm, Magnet, etc.).</p></li><li><p><strong>Asynchronous background coders</strong> &#8211; agents that run in the cloud or in the background (like Google&#8217;s Jules or OpenAI&#8217;s Codex agent), which you can assign tasks to. They work autonomously and come back with code changes or pull requests, while you continue other work.</p></li><li><p><strong>AI-Assisted testing and CI</strong> &#8211; applying agentic AI to the adjacent stages of development: writing unit tests, detecting and fixing build failures, and automating code reviews. This includes &#8220;self-healing&#8221; CI pipelines that automatically correct failing tests, and bots that can propose fixes via PRs.</p></li><li><p><strong>Integrated AI-First dev environments</strong> &#8211; new IDEs and platforms built around AI from the ground up (e.g. Cursor, Windsurf, Replit Ghostwriter, etc.), which blur the line between writing code and prompting an agent. These often combine aspects of the above categories into a unified experience.</p></li><li><p><strong>Project orchestration &amp; management</strong> &#8211; AI agents hooking into our project management tools (issues, tickets, docs) to close the loop. 
For example, starting from a GitHub or Linear issue and letting an AI agent implement the required code changes and update the ticket.</p></li></ul><p>Along the way, I&#8217;ll also touch on challenges and open questions &#8211; like how we maintain code quality, handle AI mistakes, control costs, and ensure developers remain in the loop. Let&#8217;s start with the CLI agent revolution, since that&#8217;s been one of the most talked-about developments.</p><h2><strong>Coding with CLI agents: AI in your terminal</strong></h2><blockquote><p><strong>The terminal is the new IDE &#8211; CLI agents turn your shell into an action&#8209;oriented interface where prompts translate into multi&#8209;file commits and tests, collapsing traditional editor boundaries.</strong></p></blockquote><p>One striking trend is the resurgence of the command-line interface as a place for powerful developer tools &#8211; now supercharged with AI. Historically, many devs have preferred the GUI of an IDE for serious coding, but the new generation of AI coding agents is making the terminal <em>shockingly productive</em>. I&#8217;ve personally found myself living in the terminal more, thanks to these tools.</p><p><strong>Anthropic&#8217;s <a href="https://docs.anthropic.com/en/docs/claude-code/overview#:~:text=Claude%20Code%20overview">Claude Code</a></strong> is a prime example. It&#8217;s an <em>&#8220;agentic coding tool that lives in your terminal and helps turn ideas into code faster&#8221;</em>. Under the hood, Claude Code is essentially Anthropic&#8217;s Claude AI (a large language model like ChatGPT) augmented with the ability to act on your filesystem and execute commands. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1AVK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51ab44e-b868-4bdf-b32c-16f217ed4587_1400x785.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1AVK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51ab44e-b868-4bdf-b32c-16f217ed4587_1400x785.png 424w, https://substackcdn.com/image/fetch/$s_!1AVK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51ab44e-b868-4bdf-b32c-16f217ed4587_1400x785.png 848w, https://substackcdn.com/image/fetch/$s_!1AVK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51ab44e-b868-4bdf-b32c-16f217ed4587_1400x785.png 1272w, https://substackcdn.com/image/fetch/$s_!1AVK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51ab44e-b868-4bdf-b32c-16f217ed4587_1400x785.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1AVK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51ab44e-b868-4bdf-b32c-16f217ed4587_1400x785.png" width="1400" height="785" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a51ab44e-b868-4bdf-b32c-16f217ed4587_1400x785.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:785,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Claude Code: From Zero to 
Hero. What is Claude Code? | by Daniel Avila |  Jun, 2025 | Medium&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Claude Code: From Zero to Hero. What is Claude Code? | by Daniel Avila |  Jun, 2025 | Medium" title="Claude Code: From Zero to Hero. What is Claude Code? | by Daniel Avila |  Jun, 2025 | Medium" srcset="https://substackcdn.com/image/fetch/$s_!1AVK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51ab44e-b868-4bdf-b32c-16f217ed4587_1400x785.png 424w, https://substackcdn.com/image/fetch/$s_!1AVK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51ab44e-b868-4bdf-b32c-16f217ed4587_1400x785.png 848w, https://substackcdn.com/image/fetch/$s_!1AVK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51ab44e-b868-4bdf-b32c-16f217ed4587_1400x785.png 1272w, https://substackcdn.com/image/fetch/$s_!1AVK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51ab44e-b868-4bdf-b32c-16f217ed4587_1400x785.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 
17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>You install it via npm (npm install -g @anthropic-ai/claude-code), navigate to a project directory, and just run claude. This drops you into an interactive CLI where you can have a conversation with Claude about your codebase. 
But unlike a chat in the browser, this agent can <strong>directly edit files, run tests, git commit changes, and more</strong>.</p><div id="youtube2-U_vwfQBhVSY" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;U_vwfQBhVSY&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/U_vwfQBhVSY?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p><em>There&#8217;s an excellent beginner&#8217;s guide to Claude Code by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;ian nuttall&quot;,&quot;id&quot;:307634712,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e13e0bec-b862-4cbe-9b5e-e4331947b8a7_2335x2335.jpeg&quot;,&quot;uuid&quot;:&quot;76e1fcc6-23d3-4451-9436-a8a55f54c0fc&quot;}" data-component-name="MentionToDOM">ian nuttall</span> above.</em></p><p>What can Claude Code do for you? Quite a lot:</p><ul><li><p><strong>Build features from descriptions</strong> &#8211; You can literally tell Claude Code what you want to build in plain English. It will break the task into steps, write or modify code across multiple files, and ensure it runs. Essentially, it handles implementation from a spec.</p></li><li><p><strong>Debug and fix issues</strong> &#8211; If you describe a bug or paste an error, Claude Code will analyze the codebase to locate the problem and then <em>implement a fix</em> automatically. 
It&#8217;s not just suggesting a fix; it can apply it.</p></li><li><p><strong>Navigate and answer questions about the codebase</strong> &#8211; Because it indexes your entire project (and even can fetch external info via Anthropic&#8217;s <strong>Model Context Protocol (<a href="https://addyo.substack.com/p/mcp-what-it-is-and-why-it-matters">MCP</a>)</strong> hooks), you can ask questions like <em>&#8220;Where is the user authentication logic defined?&#8221;</em> or <em>&#8220;What does this error mean in our context?&#8221;</em>, and get answers that reference your actual code.</p></li><li><p><strong>Automate tedious tasks</strong> &#8211; Need to do a boring refactor or cleanup? For example, <em>&#8220;remove all deprecated API calls and update them to the new interface&#8221;</em> &#8211; the agent can handle that. Fix lint issues, resolve merge conflicts, generate release notes, you name it.</p></li></ul><blockquote><p>Here are some great <a href="https://x.com/jasonzhou1993/status/1948334295120314793">Claude Code pro-tips</a> from <strong><a href="https://x.com/jasonzhou1993">Jason Zhou</a>:</strong></p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;5d2c5a2e-4bb2-4cd5-b91b-6918d45c84e6&quot;,&quot;duration&quot;:null}"></div></blockquote><p>In short, Claude Code tries to be a <strong>hands-on AI software engineer</strong> working alongside you. It not only chats, but <em>takes action</em>. It was important to Anthropic that this agent can follow the Unix philosophy &#8211; meaning you can script it and compose it with other tools. A fun example from their docs: you could pipe logs to it and have it monitor them, e.g. tail -f app.log | claude -p "Slack me if you see any anomalies". 
Another: run it in CI to automatically raise a PR if certain conditions are met (like new strings needing translation). This shows how deeply an AI agent can integrate into developer workflows beyond just writing code &#8211; it can observe and act on events. </p><blockquote><p><strong>Tip: </strong><a href="https://frontendatscale.com/issues/49">Spec-driven development</a> is becoming increasingly common when using a CLI coding agent. This is for when you need a more structured approach with a detailed task breakdown vs. just going with the vibes via a simple prompt.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DyUF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F017bdb96-6128-41f1-b53a-4e7793d86caa_2094x1186.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DyUF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F017bdb96-6128-41f1-b53a-4e7793d86caa_2094x1186.png 424w, https://substackcdn.com/image/fetch/$s_!DyUF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F017bdb96-6128-41f1-b53a-4e7793d86caa_2094x1186.png 848w, https://substackcdn.com/image/fetch/$s_!DyUF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F017bdb96-6128-41f1-b53a-4e7793d86caa_2094x1186.png 1272w, https://substackcdn.com/image/fetch/$s_!DyUF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F017bdb96-6128-41f1-b53a-4e7793d86caa_2094x1186.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!DyUF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F017bdb96-6128-41f1-b53a-4e7793d86caa_2094x1186.png" width="1456" height="825" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/017bdb96-6128-41f1-b53a-4e7793d86caa_2094x1186.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:825,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:55074,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/169029578?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F017bdb96-6128-41f1-b53a-4e7793d86caa_2094x1186.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DyUF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F017bdb96-6128-41f1-b53a-4e7793d86caa_2094x1186.png 424w, https://substackcdn.com/image/fetch/$s_!DyUF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F017bdb96-6128-41f1-b53a-4e7793d86caa_2094x1186.png 848w, https://substackcdn.com/image/fetch/$s_!DyUF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F017bdb96-6128-41f1-b53a-4e7793d86caa_2094x1186.png 1272w, https://substackcdn.com/image/fetch/$s_!DyUF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F017bdb96-6128-41f1-b53a-4e7793d86caa_2094x1186.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></blockquote><p>CLI agents have been evolving further lately, and Claude Code just got a new feature: <a href="https://docs.anthropic.com/en/docs/claude-code/sub-agents">custom subagents</a>. Subagents let you create &#8220;teams&#8221; of custom agents, each designed to handle specialized tasks. These include architects, reviewers and testers. Each agent has its own context and conversation history. 
Type `/agents` to get started:</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;f438aa72-a7c2-4a80-88af-74bf0600a5c9&quot;,&quot;duration&quot;:null}"></div><p>Anthropic&#8217;s approach has resonated especially with more advanced developers who love the terminal. There&#8217;s no new editor to learn &#8211; <em>the agent meets you where you already work</em>. If you&#8217;re a vim + CLI person, Claude Code doesn&#8217;t force a GUI on you. And if you <em>do</em> prefer an IDE, they&#8217;ve made it easy to connect Claude Code to popular editors (via a local server that the IDE can talk to). </p><p>Personally, I enjoy the focus of the CLI &#8211; when I run claude in a project, I&#8217;m in a single-pane environment where the AI agent is front and center, not tucked in a small sidebar. A great description from one review: <em>&#8220;that single terminal pane felt like a better interface&#8230; now that the agent was doing so much, I found myself saying: do I really need the file editor to be the primary focus?&#8221;</em>. I can relate to that sentiment; when an agent is effectively handling multi-file changes, the traditional editor UI starts to feel like overhead.</p><p>Anthropic isn&#8217;t alone here. <strong><a href="https://blog.google/technology/developers/introducing-gemini-cli-open-source-ai-agent/#:~:text=does%20the%20demand%20for%20integrated,AI%20assistance">Google&#8217;s Gemini CLI</a></strong> is another major entrant. Announced in June 2025, Gemini CLI brings Google&#8217;s premier LLM (Gemini 2.5 Pro) right into your terminal. It&#8217;s free and open-source &#8211; you just log in with a Google account to get generous access (Google touts <em>&#8220;unmatched free usage limits&#8221;</em> for individuals, including a 1 million token context window for Gemini 2.5 Pro). 
</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-FbD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1956c1f5-87ca-4557-b8bf-4398634b8f66_1280x720.jpeg"><img src="https://substackcdn.com/image/fetch/$s_!-FbD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1956c1f5-87ca-4557-b8bf-4398634b8f66_1280x720.jpeg" width="1280" height="720" alt="Google introduces Gemini CLI, a light open-source AI agent that brings Gemini directly into the terminal" title="Google introduces Gemini CLI, a light open-source AI agent that brings Gemini directly into the terminal" loading="lazy"></a></figure></div><p>That context size is massive &#8211; it means the agent can theoretically take into account your entire codebase or large chunks of documentation when helping you. My teams and I have been enjoying using Gemini CLI for both work and personal projects.</p><div id="youtube2-eyYmFAFxiJ4" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;eyYmFAFxiJ4&quot;,&quot;startTime&quot;:&quot;2s&quot;,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/eyYmFAFxiJ4?start=2s&amp;rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>Like Claude Code, Gemini CLI can do a lot more than code completion: it handles <em>&#8220;coding, content generation, problem-solving, and task management&#8221;</em> in the terminal. 
In practice, it overlaps many of Claude Code&#8217;s capabilities &#8211; editing code, answering questions, running commands, etc. Google integrated it tightly with their AI ecosystem: if you use <strong>Gemini Code Assist</strong> in VS Code, you can seamlessly move to the CLI agent and vice versa. It&#8217;s clear Google sees CLI agents as a cornerstone, not just a novelty.</p><blockquote><p>Some Gemini CLI safety pro-tips from creator <a href="https://x.com/ntaylormullen">N. Taylor Mullen</a>: </p><ul><li><p>gemini --sandbox is your safe playground (macOS Seatbelt, Docker and Podman support) </p></li><li><p>gemini --checkpointing = instant "undo" button with /restore </p></li></ul><p>There&#8217;s a whole <a href="https://www.philschmid.de/gemini-cli-cheatsheet">Gemini CLI cheat-sheet</a> also available from <a href="https://x.com/_philschmid/status/1950206886071992667">Philipp Schmid</a>, who has shared some great tips including using / commands for <strong>creating a plan</strong> based on your request and existing codebase:</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;3a24a025-66b9-4aad-b3fa-0e1ecaefd1ee&quot;,&quot;duration&quot;:null}"></div></blockquote><p>There are other <strong>open-source CLI agents</strong> making waves. One <a href="https://www.youtube.com/watch?v=hJm_iVhQD6Y#:~:text=ClaudeCode%21%20www,CLI%2C%20AI%20coding%20agent">noteworthy</a> project is <strong><a href="https://github.com/opencode-ai/opencode">OpenCode CLI</a></strong> (by the team at SST) &#8211; a &#8220;powerful AI coding agent built for the terminal&#8221;. OpenCode is model-agnostic: you can plug in different providers (OpenAI, Anthropic, local models via Ollama, etc.) 
and it will use them to drive the agent.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SMin!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5d9fd4a-757a-4f52-b1c2-38a5d3e0767e_1200x737.png"><img src="https://substackcdn.com/image/fetch/$s_!SMin!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5d9fd4a-757a-4f52-b1c2-38a5d3e0767e_1200x737.png" width="1200" height="737" alt="OpenCode: Open Source Claude Code Alternative is Here" title="OpenCode: Open Source Claude Code Alternative is Here" loading="lazy"></a></figure></div><p>OpenCode&#8217;s philosophy is <a href="https://dev.opencode.ai/docs/cli/#:~:text=Running%20the%20opencode%20CLI%20starts,it%20for%20the%20current%20directory">similar</a>: you navigate to your project and launch opencode to start an AI session in that context. It even supports a non-interactive mode (opencode run "your prompt") for quick answers or scripting usage. Being open source, developers can inspect the code and extend it &#8211; which builds trust. And because it&#8217;s not tied to a single model, you&#8217;re free to use whichever AI backend is best (or most cost-effective) for you.</p><p><strong>What&#8217;s it like to use a CLI coding agent?</strong> In my experience, it feels like pair programming with a supercharged assistant who never gets tired. For example, I can say: <em>&#8220;Add a new API endpoint for uploading a profile picture. 
Use Express and ensure the image is stored in Cloud Storage&#8221;</em>. The agent will typically respond with a plan (e.g. &#8220;I will create a new route, a middleware for handling uploads, update the user model, and write a test for it&#8221;), then proceed to implement it step by step, asking for confirmation before running potentially destructive commands or making big changes. Claude Code, for instance, asks yes/no before applying diffs or running a test suite. You remain the human in charge &#8211; but you&#8217;re delegating the heavy lifting.</p><p>Using these CLI agents effectively also requires <strong>prompting skill</strong> and a willingness to trust automation. The first few times I watched an agent refactor code, I was nervous &#8211; it was changing code faster than I could fully parse. It&#8217;s important to review diffs carefully (the tools make this easier by showing colorized diffs in the terminal and summarizing changes). Over time, I&#8217;ve grown more comfortable, especially as I&#8217;ve seen that I can always roll back a commit if needed. The CLI context actually encourages a commit-per-change workflow, which is nice (Claude Code will often commit changes with descriptive messages as it goes, so you have a history of what the AI did).</p><blockquote><p><strong>Tip: </strong>For <strong>cost savings</strong>, you can ask Claude Code or another CLI agent to <a href="https://x.com/iannuttall/status/1938695506013601802">use Gemini CLI to build a plan for Claude</a> to act on. Gemini CLI&#8217;s 1M-token context window and free tier (used in non-interactive mode) make it well suited to researching your codebase before the pricier agent starts work.</p></blockquote><p>Finally, cost is a consideration. Some CLI agents are free or let you use your own API keys, but with heavy use, token usage can add up. I expect we&#8217;ll see more pricing options and competition driving costs down. 
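</p><p>To make the plan-then-implement tip concrete, here&#8217;s a rough sketch. The prompts are invented; <code>-p</code> is the non-interactive &#8220;print&#8221; flag both CLIs expose for scripting:</p><pre><code># Let Gemini CLI (free tier, 1M-token context) survey the repo and draft a plan
gemini -p "Read this codebase and write a step-by-step refactoring plan" > plan.md

# Then have Claude Code execute against that plan
claude -p "Implement step 1 of plan.md, committing as you go"
</code></pre><p>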
For now, I treat the heavy agents like I&#8217;d treat running a cloud dev VM &#8211; immensely powerful, but remember to shut it down when not needed or consider using more cost-effective models.</p><p>The bottom line: <strong>CLI-based AI coding agents are here and surprisingly effective.</strong> They turn the humble terminal into a smart, action-oriented IDE. For many tasks, I&#8217;ve found myself preferring the CLI agent over clicking around an editor. It&#8217;s a different way of working &#8211; more conversational and higher-level. And as these tools improve (with bigger contexts, integration of web search, etc.), the gap between &#8220;describe what you want&#8221; and &#8220;get working code&#8221; is closing fast.</p><h2>Toad</h2><p>The technical implementation of terminal-based agents has room for improvement, as highlighted by <a href="https://willmcgugan.github.io/announcing-toad/">Will McGugan's </a><strong><a href="https://willmcgugan.github.io/announcing-toad/">Toad</a></strong><a href="https://willmcgugan.github.io/announcing-toad/"> project</a>. McGugan, former CEO of Textualize and creator of the Textual framework, identified fundamental user experience issues in existing CLI agents including visual flickering, poor text selection, and interface degradation during terminal resizing. These problems stem from how current agents update terminal content by removing and rewriting entire sections rather than performing targeted updates.</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;8f75072c-9c29-45c3-b36b-882542588e7e&quot;,&quot;duration&quot;:null}"></div><p>Toad demonstrates an alternative architectural approach that separates the user interface layer from the AI interaction logic through a JSON-based protocol over stdin/stdout. 
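</p><p>I won&#8217;t guess at Toad&#8217;s exact schema, but the general shape of such a protocol is easy to picture: the frontend and the AI backend exchange small JSON messages over stdin/stdout, and an update targets an existing element by id instead of repainting the screen. The field names here are purely illustrative:</p><pre><code>{"type": "append", "id": 7, "role": "agent", "markdown": "Running tests..."}
{"type": "update", "id": 7, "markdown": "Running tests... 42 passed."}
</code></pre><p>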
This separation enables flicker-free updates, proper text selection capabilities, and smooth scrolling while maintaining the lightweight, keyboard-driven experience that developers expect from terminal applications. The architecture also allows for language-agnostic backend implementations, meaning the AI processing can be written in any language while the interface remains consistent.</p><p>While still in early development and <a href="https://willmcgugan.github.io/announcing-toad/">available to GitHub sponsors</a>, Toad highlights an important consideration for the future of terminal-based development tools: the quality of the user interface implementation significantly impacts developer productivity and adoption, even when the underlying AI capabilities are sophisticated.</p><p>Before I get carried away letting a single agent handle everything, it&#8217;s worth noting another development: why use one agent when you can use <em>many</em>? That leads us to orchestrators.</p><h2><strong>From solo agent to AI team: Orchestrating multiple coding agents</strong></h2><blockquote><p><strong>Parallel agent orchestration transforms a single helper into an AI &#8220;team&#8221;. You assign boundaries, isolate work via branching, and supervise a swarm &#8211; scaling development horizontally with human oversight.</strong></p></blockquote><p>If one AI coding assistant can make you twice as productive, what could <em>five or ten</em> working in parallel do? This question has led to the rise of <strong>agent orchestrators</strong> &#8211; tools that let you spin up multiple AI coding agents to work concurrently on a project. It&#8217;s like having an army of AI developers, each handling a piece of the work, with you as the manager overseeing them. 
This pattern is quickly becoming feasible on everyday developer machines.</p><p>One of the pioneers here is an open-source project called <strong><a href="https://github.com/smtg-ai/claude-squad#:~:text=GitHub%20github,in%20separate%20workspaces">Claude Squad</a></strong> (by @smtg-ai). Claude Squad is a terminal app that manages multiple Claude Code (and even other agents like OpenAI Codex or Aider) sessions in parallel. The idea is simple: each agent gets its own isolated Git workspace (e.g., using git worktree or branches) so they don&#8217;t step on each other&#8217;s toes, and you can assign different tasks to each. The tool provides a unified TUI (text UI) where you can monitor all agents at <a href="https://smtg-ai.github.io/claude-squad/#:~:text=Claude%20Squad%20,%C2%B7%20Review%20work%20before%20shipping">once</a>. Why do this? Because certain large tasks can be broken down. </p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vQDT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e667220-41ae-4010-b7c5-f32e8db52c6d_2700x1974.png"><img src="https://substackcdn.com/image/fetch/$s_!vQDT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e667220-41ae-4010-b7c5-f32e8db52c6d_2700x1974.png" width="1456" height="1064" alt="" loading="lazy"></a></figure></div><p>For example, if you have a big legacy codebase to modernize, you might run one agent to upgrade the frontend framework, another to refactor the database layer, and another to add tests &#8211; all in parallel. Claude Squad lets you supervise and coordinate this within one interface, and importantly, it <em>isolates the changes</em> until you choose to merge them, preventing conflicts. Early users have reported massive productivity boosts, essentially <em>multiplying</em> their output by N agents.</p><p>A similar commercial tool is <strong><a href="https://conductor.build/#:~:text=URL%3A%20https%3A%2F%2Fconductor,of%20Claude%20Codes%20in%20parallel">Conductor</a></strong> (currently Mac-only). Conductor&#8217;s tagline is literally <em>&#8220;Run a bunch of Claude Codes in parallel&#8221;</em>. It provides a desktop app where you connect a repo and deploy multiple Claude Code agents simultaneously &#8211; each agent gets its own workspace (Conductor uses git worktrees under the hood, as confirmed in their <a href="https://conductor.build/#:~:text=Does%20Conductor%20use%20worktrees%3F">FAQ</a>). </p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;380b4262-3559-452c-addc-fe245d8f388a&quot;,&quot;duration&quot;:null}"></div><p>The UI shows a list of agents and their status: you can see who&#8217;s working on what, who might be waiting for input, and importantly, what files have changed in each workspace. You act as the conductor (hence the name): assign tasks, monitor progress, and review code changes. Once satisfied, you can merge the changes back. As of now, Conductor supports Claude Code as the AI backend, but they hint at adding others soon. 
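</p><p>The isolation trick both tools lean on is <code>git worktree</code>, which checks out additional branches of one repository into separate directories. A minimal self-contained sketch (the throwaway repo and branch names are mine):</p><pre><code>set -e
git init -q demo
git -C demo -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"
# One directory and branch per agent, so their edits never collide
git -C demo worktree add ../agent-frontend -b agent/frontend-upgrade
git -C demo worktree add ../agent-db -b agent/db-refactor
git -C demo worktree list
</code></pre><p>When an agent finishes, you review its branch and merge it back &#8211; the &#8220;isolate, then merge&#8221; flow described above.</p><p>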
The fact that this is a user-friendly app (with a nice GUI) indicates that orchestrating multiple agents isn&#8217;t just a hacker&#8217;s experiment &#8211; it&#8217;s headed for mainstream developer workflows.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8cfk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ea50a86-86a1-4319-a817-47581bf340b3_4096x2768.webp"><img src="https://substackcdn.com/image/fetch/$s_!8cfk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ea50a86-86a1-4319-a817-47581bf340b3_4096x2768.webp" width="1456" height="984" alt="" loading="lazy"></a></figure></div><p><em>The Conductor app orchestrating multiple Claude Code agents on a project (each agent runs in an isolated git branch). The UI shows agent statuses and allows reviewing their code changes before merging.</em></p><p>Another exciting open-source project is <strong><a href="https://github.com/Dicklesworthstone/claude_code_agent_farm#:~:text=Claude%20Code%20Agent%20Farm%20is,scale%20code%20improvements">Claude Code Agent Farm</a></strong> by Jeffrey Emanuel (@doodlestein). With a name evoking &#8220;farming&#8221; a whole crop of Claude agents, it delivers on that imagery. Agent Farm is described as an orchestration framework to run <em>&#8220;multiple Claude Code sessions in parallel to systematically improve your codebase&#8221;</em>. It supports running <em>20 or more agents simultaneously (configurable up to 50!)</em>. 
</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;e6b12532-4e38-444e-bf1d-fecaf5152ccd&quot;,&quot;duration&quot;:null}"></div><p>This blows my mind a bit &#8211; 50 AI instances attacking a codebase in concert. Of course, to manage that, the framework includes some sophisticated coordination features: a <em>lock-based system to prevent conflicts</em>, so agents won&#8217;t overwrite each other&#8217;s changes on the same file. </p><p>It&#8217;s also highly configurable, supporting <em>34 different tech stacks</em> out of the box with custom workflow scripts for each. This means you can, say, unleash a swarm of agents to apply a set of best practices across a polyglot monorepo &#8211; one agent might handle updating React components, another fixes Python lint issues, another updates CI configs, etc., all guided by the workflows the framework provides. Agent Farm also provides a real-time dashboard with heartbeats and context usage warnings, automatic recovery if an agent crashes, and neat features like generating an HTML report of everything the agents did. Essentially, it&#8217;s turning the idea of a &#8220;code mod&#8221; (mass refactoring) into an AI-driven, parallelizable process.</p><p>While running dozens of agents locally might stress some machines (and certainly can rack up API usage costs), the concept is compelling. It reminds me of how build systems went from single-threaded to massively parallel &#8211; why not do the same for AI coding tasks? Many small changes that are independent (or can be partitioned by area) are perfect for parallelization. Of course, humans can&#8217;t effectively write or review 10 changes at once, but an orchestrator + agents can. The human (you) just needs to oversee the results.</p><blockquote><p><strong>Tip:</strong> Open-source models like Qwen3 have become capable enough that developers are seriously evaluating them for these use-cases. 
Do keep in mind cost for self-hosted usage <a href="https://x.com/StochasticGhost/status/1947945075230539881">can add up fast</a>, especially if running multiple agents async.</p></blockquote><p>Besides Claude-focused tools, we have <strong><a href="http://magnet.run">Magnet</a></strong> (by Nicolae Rusan and team), which brands itself as <em>&#8220;the AI workspace for agentic coding&#8221;</em>. Magnet is a bit like an IDE + orchestrator + project management hybrid. It allows async AI workflows triggered by issue trackers. </p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;f942378a-6a7f-4c17-a353-72997ea88500&quot;,&quot;duration&quot;:null}"></div><p>For <a href="https://www.linkedin.com/posts/nicolaerusan_magnet-magnetrun-now-offers-linear-github-activity-7095193885928214528-XQMe?trk=public_profile_like_view#:~:text=Magnet%20%28magnet,the%20codebase%20but%20also%20asks">example</a>, you can start an AI task directly from a Linear or GitHub issue inside Magnet. Magnet will automatically pull in relevant context (e.g. files related to that issue, based on the issue description), and even ask clarifying questions before proceeding. It&#8217;s designed to guide you end-to-end on a feature or bug fix. </p><p>One thing I love is that Magnet doesn&#8217;t just spew code &#8211; it often <em>suggests considerations or alternative approaches</em> to you as it works, acting like a thoughtful junior engineer might. Nicolae shared an <a href="https://www.linkedin.com/posts/nicolaerusan_magnet-magnetrun-now-offers-linear-github-activity-7095193885928214528-XQMe?trk=public_profile_like_view#:~:text=caught%20myself%2C%20leading%20me%20to,to%20implement%20a%20feature%20in">example</a> of shipping a new feature where Magnet not only correctly implemented it, but <em>introduced him to a package he didn&#8217;t know about that offered a better solution</em>. 
That kind of AI augmentation &#8211; not just doing the task but improving the solution &#8211; is powerful. Magnet also supports grouping tasks across multiple repositories (&#8220;Projects&#8221; feature). If a feature requires coordinated changes in backend, frontend, and database, the AI can handle each in the appropriate repo and tie them together. This hints at a future where AI agents aren&#8217;t limited to one codebase at a time, but can work across an entire software stack cohesively.</p><p>If running multiple agents sounds complex, it can be at first. But tools like the above are rapidly smoothing the experience. Even simple approaches work: I know folks who just open several terminal windows (using tmux, <a href="https://iterm2.com/">iTerm</a> splits, or the new <a href="https://ghostty.org/">Ghostty</a> terminal emulator) and run separate agent sessions manually in each. Ghostty, for instance, makes it <a href="https://www.bitdoze.com/ghostty-terminal/#:~:text=Ghostty%20Terminal%3A%20A%20Complete%20Setup,This%20feature">easy</a> to tile many terminals in one window, so you can visually monitor multiple agent sessions side by side. </p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;7b79b903-499e-417f-ab3f-741c54e3a388&quot;,&quot;duration&quot;:null}"></div><p>I&#8217;ve tried this &#8211; running e.g. Gemini CLI in one pane doing a frontend change, and Claude Code CLI in another doing a backend change. It actually worked out fine, though I had to manage integrating the changes myself. Purpose-built orchestrators like Claude Squad or Magnet handle a lot of that housekeeping (branching, merging, preventing conflicts) automatically, which is definitely nicer.</p><p>We are still in early days of figuring out <em>best practices for multi-agent dev</em>. 
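The lock-based coordination Agent Farm uses to keep agents off each other's files boils down to a small primitive: an agent atomically claims a file before editing it, and backs off if another agent got there first. A minimal sketch in Python (the lock directory and function names are my own illustration, not Agent Farm's actual implementation):

```python
import os
from pathlib import Path

# Hypothetical lock directory: one lock file per claimed source file.
LOCK_DIR = Path(".agent_locks")
LOCK_DIR.mkdir(exist_ok=True)

def _lock_path(target: str) -> Path:
    return LOCK_DIR / (target.replace("/", "__") + ".lock")

def try_lock(target: str, agent_id: str) -> bool:
    """Atomically claim `target` for one agent; False if another agent holds it."""
    try:
        # O_CREAT | O_EXCL makes creation atomic: exactly one agent can win.
        fd = os.open(_lock_path(target), os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.write(fd, agent_id.encode())
    os.close(fd)
    return True

def release(target: str) -> None:
    """Free the claim so other agents can edit the file."""
    _lock_path(target).unlink(missing_ok=True)
```

With this in place, two agents racing for the same file resolve deterministically: the first `try_lock` wins, the second skips the file and moves on to other work.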
Some challenges to consider: dividing tasks well (so that agents don&#8217;t depend on each other&#8217;s yet-to-be-done work), ensuring consistency (if two agents need to agree on an interface contract, how do we coordinate that?), and not overwhelming the human overseer. There&#8217;s research happening on having the agents themselves coordinate &#8211; e.g. one agent acts as a &#8220;planner&#8221; and assigns subtasks to other agent instances. That starts to sound like AI project managers and AI developers working as a team. I wouldn&#8217;t be surprised if in a year or two, we have an &#8220;AI scrum master&#8221; that can spin up and manage a swarm of coding agents dynamically.</p><p>For now, the onus is on the developer to partition work and review outputs. But even in this form, parallel AI development is a force multiplier. I&#8217;ve felt a bit like a tech lead directing multiple junior devs &#8211; except these juniors work at superhuman speed and never get tired of dull tasks. If this pattern becomes common, our development workflows might shift towards queuing up tasks for AI agents to tackle overnight or while we handle higher-level design. Imagine coming into work to find <strong>overnight AI PRs</strong> for all the refactoring tasks you queued up &#8211; ready for your review.</p><p>That segues nicely into the next topic: agents that work asynchronously in the background, producing code while you do other things.</p><h2><strong>Async background coders: agents that code while you aren't watching</strong></h2><blockquote><p><strong>Background agents turn coding into delegated background work: submit a task, let it run in the cloud, review a completed PR later - coding as queued, asynchronous workflow.</strong></p></blockquote><p>One of the promises of agentic coding is the ability to <em>offload coding tasks to an AI and let it work autonomously</em>, notifying you when it&#8217;s done or if it needs input. 
This frees you to do other work (or get some sleep!). Several efforts in 2024-2025 have focused on these <strong>asynchronous coding agents</strong>. They differ from the interactive CLI agents in that you don&#8217;t babysit them in a live session; instead, you trigger them and they come back with results later (minutes or hours later, depending on the task).</p><p>The poster child here is <strong><a href="https://jules.google/">Google&#8217;s Jules</a></strong>. Jules was first introduced in late 2024 as an experiment in Google Labs, and by May 2025 Google put it into <a href="https://blog.google/technology/google-labs/jules/#:~:text=Jules%20is%20an%20asynchronous%2C%20agentic,and%20performs%20tasks%20such%20as">public beta</a> for everyone. They explicitly describe Jules not as a &#8220;co-pilot&#8221; or autocomplete assistant, but as <em>&#8220;an autonomous agent that reads your code, understands your intent, and gets to work&#8221;</em>. </p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;8f6da879-8613-4ed4-a6cf-1444c1da814a&quot;,&quot;duration&quot;:null}"></div><p>Here&#8217;s how Jules works, in a nutshell:</p><ul><li><p>You integrate Jules with your GitHub account/repositories. (Jules runs as a cloud service &#8211; specifically, it spins up a secure Google Cloud VM for each task).</p></li><li><p>You give Jules a high-level task. This can be done through a web UI or VS Code plugin (and soon by simply labeling a GitHub issue with &#8220;assign-to-jules&#8221;!). 
For example: <em>&#8220;Write unit tests for all functions in the utils/ folder&#8221;</em> or <em>&#8220;Upgrade the project to Next.js v15 and refactor to the new app directory structure&#8221;</em>.</p></li><li><p>Jules clones your repo into the VM, <em>understands the full context</em> of the codebase (thanks to using Gemini 2.5 which can handle huge context), and then <strong>generates a plan</strong> of action for the task. It actually shows you this plan for approval. For instance, it might say: <em>&#8220;Plan: Update 22 files to migrate to Next.js 15 conventions&#8221;</em>.</p></li><li><p>Once you approve, Jules executes the plan autonomously. It makes the code changes, runs tests if relevant, and so on. This might involve multiple internal steps, but you don't need to supervise them.</p></li><li><p>When done, Jules presents you with the <em>diffs of all the changes</em> it made for review. You can browse through the code changes (in the UI or as a PR on GitHub).</p></li><li><p>If you&#8217;re happy, you can tell Jules to create a pull request with those changes. Jules will then open a PR on your repository with the commits. (At this point, it&#8217;s just regular code &#8211; you or teammates can review, run CI, and merge as usual).</p></li><li><p>As a bonus, Jules can generate an <strong>audio summary</strong> of the changes &#8211; a spoken changelog you can listen to, highlighting what was done. 
This is a neat touch; I&#8217;ve tried it and it feels like a quick way to catch up on an agent&#8217;s work while I&#8217;m, say, commuting or doing something away from the screen.</p></li></ul><div id="youtube2-Fm6MQpzwhwA" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;Fm6MQpzwhwA&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/Fm6MQpzwhwA?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>All of this happens asynchronously. Jules is doing the heavy lifting in the cloud VM, possibly utilizing parallelism under the hood (Google mentioned it can handle <em>concurrent tasks with speed and precision</em> thanks to the cloud setup). Meanwhile, you could be focusing on another part of the project, or multiple Jules tasks could even run in parallel. It really is like having a background worker.</p><blockquote><p>Some <a href="https://x.com/julesagent/status/1927791418942132659">Jules pro-tips</a> are available:</p><ul><li><p>For cleaner results with Jules, give each distinct job its own task. E.g., 'write documentation' and 'fix tests' should be separate tasks in Jules.</p></li><li><p>Help Jules write better code: When prompting, ask Jules to 'compile the project and fix any linter or compile errors' after coding.</p></li><li><p>Do you have an <em>instructions.md</em> or other prompt related markdown files? Explicitly tell Jules to review that file and use the contents as context for the rest of the task</p></li><li><p>Jules can surf the web! Give Jules a URL and it can do web lookups for info, docs, or examples</p></li></ul></blockquote><p>What kind of tasks is Jules good at? 
These include: <em>writing tests, building new features, providing audio changelogs, fixing bugs, bumping dependency versions</em>. The common theme is tasks that have a clear goal and can be relatively well-defined in a prompt. Jules shines at maintenance chores like dependency upgrades or adding tests &#8211; things we often procrastinate on as developers. A developer in the Jules beta shared on HN that they could just tell Jules <em>&#8220;achieve 100% test coverage&#8221;</em> and it went off to write a comprehensive test suite. Another user mentioned Jules automatically fixed a bug and made a PR, saying <em>&#8220;Jules just made her first contribution to a project I&#8217;m working on&#8221;</em> &#8211; a sentence that felt sci-fi only a year ago!</p><p>OpenAI, not to be outdone, <em>surprised the industry</em> by releasing a research preview of their own coding agent, referred to as <strong><a href="https://openai.com/index/introducing-codex/">Codex</a></strong>. This new Codex agent reportedly can <em>&#8220;write, fix bugs and answer codebase questions in a separate sandbox&#8221;</em>, much like Jules.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UG2H!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980b9b2d-b350-433f-a894-09e3a9c2d534_1290x1074.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UG2H!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980b9b2d-b350-433f-a894-09e3a9c2d534_1290x1074.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!UG2H!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980b9b2d-b350-433f-a894-09e3a9c2d534_1290x1074.jpeg 848w, https://substackcdn.com/image/fetch/$s_!UG2H!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980b9b2d-b350-433f-a894-09e3a9c2d534_1290x1074.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!UG2H!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980b9b2d-b350-433f-a894-09e3a9c2d534_1290x1074.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UG2H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980b9b2d-b350-433f-a894-09e3a9c2d534_1290x1074.jpeg" width="480" height="399.6279069767442" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/980b9b2d-b350-433f-a894-09e3a9c2d534_1290x1074.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1074,&quot;width&quot;:1290,&quot;resizeWidth&quot;:480,&quot;bytes&quot;:75325,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/169029578?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980b9b2d-b350-433f-a894-09e3a9c2d534_1290x1074.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!UG2H!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980b9b2d-b350-433f-a894-09e3a9c2d534_1290x1074.jpeg 424w, https://substackcdn.com/image/fetch/$s_!UG2H!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980b9b2d-b350-433f-a894-09e3a9c2d534_1290x1074.jpeg 848w, https://substackcdn.com/image/fetch/$s_!UG2H!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980b9b2d-b350-433f-a894-09e3a9c2d534_1290x1074.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!UG2H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980b9b2d-b350-433f-a894-09e3a9c2d534_1290x1074.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>GitHub (and Microsoft) also announced <strong><a href="https://github.blog/news-insights/product-news/github-copilot-meet-the-new-coding-agent/">GitHub Copilot &#8220;agent&#8221;</a></strong> features around May 2025. At Build 2025, they demoed a <strong>Copilot that can handle entire coding workflows asynchronously</strong>. This includes monitoring your repo for to-do items, self-assigning tasks like generating a PR to fix an issue, etc. Essentially, GitHub is extending Copilot from inline suggestions to an autonomous mode that can act on your repository &#8211; in many ways, this sounds like their answer to Jules and Codex. Given GitHub&#8217;s deep integration in developer workflows, Copilot Agent could directly live in the GitHub UI (imagine a &#8220;Copilot, fix this issue&#8221; button on a Pull Request that triggers the agent).</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;6f2f5f96-8823-4078-8f01-34c73442aeda&quot;,&quot;duration&quot;:null}"></div><p>One challenge with background agents is <strong>trust</strong>. If an AI is working for an hour on your repo unsupervised, you want to be sure it doesn&#8217;t go off the rails. The current implementations mitigate risk by <em>operating in sandboxes or branches</em>, and requiring human review before changes get to main. Jules, for instance, works on a separate branch and requires you to approve its diff and PR. OpenAI&#8217;s Codex sandbox implies it doesn&#8217;t directly touch your actual repo until you merge its output. 
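That sandbox-plus-branch gate can be sketched with plain git: the agent only ever commits to its own branch, and nothing lands on main until a human has read the diff. The branch and file names below are hypothetical, and the sketch assumes git 2.28+ for `init -b`:

```python
import os
import subprocess
import tempfile

def git(*args: str, cwd: str) -> str:
    """Run a git command in `cwd` and return its stdout."""
    return subprocess.run(["git", *args], cwd=cwd, check=True,
                          capture_output=True, text=True).stdout

repo = tempfile.mkdtemp()
git("init", "-b", "main", cwd=repo)
git("config", "user.email", "agent@example.com", cwd=repo)
git("config", "user.name", "agent", cwd=repo)

# Baseline commit on main.
with open(os.path.join(repo, "app.py"), "w") as f:
    f.write("print('v1')\n")
git("add", ".", cwd=repo)
git("commit", "-m", "baseline", cwd=repo)

# The agent works only on its own branch, never on main.
git("checkout", "-b", "agent/task-1", cwd=repo)
with open(os.path.join(repo, "app.py"), "w") as f:
    f.write("print('v2')\n")
git("commit", "-am", "agent change", cwd=repo)

# Human checkpoint: the full diff is reviewable before any merge happens.
diff = git("diff", "main...agent/task-1", cwd=repo)
```

Until someone explicitly merges agent/task-1, main still serves the original code, which is exactly the safety property these tools rely on.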
This is good &#8211; it keeps the human in the loop at key checkpoints.</p><p>Another challenge is <strong>scoping the task well</strong>. If you give an overly broad instruction, an agent might attempt a huge refactor that becomes hard to validate. The best use cases are somewhat bounded tasks. I've found that even if you ultimately want something big, it helps to break it down. For example, instead of "migrate my whole app from Django to FastAPI" (which is enormous), you might start with "generate a plan for migrating from Django to FastAPI" and then feed sub-tasks. Future agents might be smart enough to do this decomposition themselves (some already attempt it &#8211; Jules internally creates a plan). But guiding them with manageable tasks makes for better outcomes today.</p><p>One very cool aspect of Jules: <em>it can handle multiple requests simultaneously in parallel</em>. That means if you have several tasks (write tests, update deps, etc.), Jules will utilize multiple agents or threads to do them concurrently in the cloud VM. In the demo at Google I/O, they showed Jules fixing several bugs in parallel. This parallelism in async mode is like combining the ideas of the previous section (multiple agents) with the cloud scale &#8211; you as a developer might just fire off a batch of tasks and come back later to find a set of PRs ready. It&#8217;s a glimpse of a near-future developer experience where a lot of the "busy work" in coding truly happens without constant human attention.</p><p>The developer community&#8217;s reaction to these async agents has been a mix of awe and healthy skepticism. Awe, because when it works, it feels magical (folks sharing on Twitter how Jules fixed something while they worked on something else). Skepticism, because sometimes the agents get things wrong or produce suboptimal code, and a human has to clean it up. 
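The scoping advice above (ask for a plan first, then feed each bounded step back as its own task) can be sketched as a two-phase loop; the plan contents and both stubs here are hypothetical:

```python
def plan(goal: str) -> list[str]:
    """Stand-in for the planning pass a real agent would generate."""
    return [
        f"{goal}: inventory the affected modules",
        f"{goal}: migrate one module and run its tests",
        f"{goal}: repeat per module, then delete legacy code",
    ]

def run_task(step: str) -> str:
    """Stand-in for one bounded, reviewable agent run."""
    return f"done: {step}"

# Broad goal in, sequence of small, verifiable tasks out.
goal = "migrate from Django to FastAPI"
results = [run_task(step) for step in plan(goal)]
```

Each sub-task produces something small enough to validate on its own, which is what keeps the overall migration reviewable.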
From my perspective, even if these agents are <a href="https://addyo.substack.com/p/the-70-problem-hard-truths-about">only 70% good</a> right now, they&#8217;re improving rapidly. And even 70% is a huge help if they can knock out the boring stuff.</p><p>I personally tried Jules on a few tasks in a side project &#8211; one being &#8220;upgrade this project to the latest version of Node.js and fix any compatibility issues&#8221;. It actually <em>did it</em>: it bumped the Node version in config, updated a couple of libraries, found some deprecated API usage and changed it, and all tests passed. It left a few things untouched that maybe could be improved, but it saved me a chunk of time. Another time, I asked it to generate an audio summary of recent changes in a repo, and listening to that felt like a mini podcast of my codebase&#8217;s progress. These experiences felt like early glimpses of offloading the &#8220;maintenance engineer&#8221; duties to an AI.</p><p>In summary, <strong>async coding agents</strong> like Jules and Codex are turning coding into a background activity for certain tasks. You assign a job and check back on results. It&#8217;s a different workflow &#8211; almost like how you delegate to a build server or a CI pipeline. In fact, it&#8217;s natural to integrate these agents with CI, which leads us to the next topic: what about using AI agents to <em>maintain</em> code quality and fix issues continuously?</p><h2>When multiple capabilities converge</h2><div><hr></div><blockquote><p><strong>Enterprise-grade platforms are emerging that combine CLI, IDE, orchestration, and async capabilities into unified development experiences, challenging the single-purpose tool approach.</strong></p></blockquote><p>While specialized tools excel in specific categories - CLI agents, orchestrators, or async coders - a new class of integrated platforms is emerging that combines multiple agentic capabilities into comprehensive development ecosystems. 
These platforms represent the maturation of AI coding tools from experimental single-purpose agents to production-ready development environments that can handle end-to-end workflows.</p><p>In recent months, <a href="https://ampcode.com/">Sourcegraph&#8217;s </a><strong><a href="https://ampcode.com/">Amp</a></strong> (or AmpCode) has gained significant attention, especially among enterprise users seeking robust AI coding agents. Positioned in a research preview phase, it already demonstrates unique strengths that distinguish it from other agents. Installation is seamless: users can enable Amp via a<a href="https://marketplace.visualstudio.com/items?itemName=sourcegraph.amp"> VS Code extension</a> (including Cursor, VSCodium, Insiders forks), or choose to run it directly as a <a href="https://www.npmjs.com/package/@sourcegraph/amp">command-line tool</a>. This flexibility reflects Amp&#8217;s design philosophy - deeply integrated yet adaptable to diverse developer workflows. Here&#8217;s a demo from Amp&#8217;s <a href="https://x.com/daniel_mac8/status/1945227816926167321">Daniel Mac</a> showing how to add a new model provider to an AI chat app, using 'Oracle', Amp&#8217;s o3-based sub-agent, to research and create a plan:</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;ab8601f1-7dd6-46d2-b2f8-f0462be8aa19&quot;,&quot;duration&quot;:null}"></div><p>One of Amp&#8217;s most discussed technical advantages is its fixed <strong>200,000-token context window</strong>. The design aggressively leverages this to retain as much relevant codebase context as possible. 
When conversations grow large, developers can use features like compact thread summarization or spawn new threads initialized with summaries - preserving continuity without exceeding token constraints (<a href="https://www.reddit.com/r/cursor/comments/1kpin6e/tried_amp_sourcegraphs_new_ai_coding_agent_heres">Reddit discussion</a>).</p><p>In terms of extensibility, Amp incorporates <a href="https://addyo.substack.com/p/mcp-what-it-is-and-why-it-matters">MCP</a> in a straightforward interface accessible from its chat UI. Pre-bundled MCP servers support dynamic tasks - such as generating mermaid graphs inside conversation threads - and developers can define command allowlists stored directly in their repository. These features form a thoughtful foundation for secure and auditable agent execution in enterprise environments (<a href="https://www.reddit.com/r/cursor/comments/1kpin6e/tried_amp_sourcegraphs_new_ai_coding_agent_heres">Reddit overview</a>). Here&#8217;s a demo from <a href="https://x.com/jdorfman/status/1940272202487914735">Justin Dorfman</a> using Playwright MCP, one of the pre-bundled MCPs, to identify slow-loading pages on localhost:</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;24209b2c-977e-44d7-a2e1-39e1b0204c03&quot;,&quot;duration&quot;:null}"></div><p>One user I spoke to appreciated Amp&#8217;s ability to act like a junior engineer - compiling, running tests, making coordinated changes across classes or modules - while another cautioned about token costs and the importance of review processes. As one thread noted, <em>&#8220;Amp does walk you through the changes it&#8217;s making&#8230; I actually prefer this over Cursor&#8217;s one-by-one change&#8230;&#8221;</em></p><p>However, Amp&#8217;s design choices are not without tradeoffs. 
Current iterations rely <a href="https://ampcode.com/manual#:~:text=Using%20Amp-,How%20to%20Prompt,-Amp%20currently%20uses">heavily</a> on <strong>Claude Sonnet</strong>, with little support for other models, open API keys, or private deployments. Sourcegraph frames this as a deliberate decision to deeply optimize around a single model rather than spreading across many, but it limits flexibility for organizations requiring model governance or internal hosting.</p><blockquote><p>&#8220;No model selector, always the best models. You don&#8217;t pick models, we do. Instead of offering selectors and checkboxes and building for the lowest common denominator, Amp is built to use the full capabilities of the best models.&#8221; is an intentional principle in <a href="https://ampcode.com/manual">their docs</a>.</p></blockquote><p>Several aspects of Amp&#8217;s state also raised caution among reviewers. Threads are stored on Sourcegraph&#8217;s servers by default, which may conflict with stringent data policies; edit operations can be auto-applied (though Git review remains available as a safety net); and features like leaderboards or shared prompts may feel misaligned with professional developer norms - even though they can be disabled upon user request. Context leakage across large monorepos was an initial concern, later clarified by insiders: Amp does not pull in entire repositories inadvertently - unlike its predecessor Cody - though prompt token usage remains nontrivial (<a href="https://www.reddit.com/r/cursor/comments/1kpin6e/tried_amp_sourcegraphs_new_ai_coding_agent_heres">Reddit thread</a>).</p><p>Across from Amp, <strong><a href="http://warp.dev">Warp&#8217;s Agentic Development Environment</a></strong> offers a very different vision of platform integration. 
Rethinking the terminal from the ground up,<a href="https://www.fastcompany.com/91356732/warps-new-agentic-development-environment-helps-developers-work-with-ai-coding-agents"> Warp 2.0</a> delivers a Rust-based, GPU-accelerated interface where traditional shell commands coexist with natural-language agent interactions. Warp allows users to <strong>manage multiple agent sessions in parallel</strong>, share commands via Warp Drive, and selectively approve AI-generated diffs before they are applied.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!K9-e!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7057bd2-ebdf-4cb5-86ef-23d612beea07_1440x823.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!K9-e!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7057bd2-ebdf-4cb5-86ef-23d612beea07_1440x823.jpeg 424w, https://substackcdn.com/image/fetch/$s_!K9-e!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7057bd2-ebdf-4cb5-86ef-23d612beea07_1440x823.jpeg 848w, https://substackcdn.com/image/fetch/$s_!K9-e!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7057bd2-ebdf-4cb5-86ef-23d612beea07_1440x823.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!K9-e!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7057bd2-ebdf-4cb5-86ef-23d612beea07_1440x823.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!K9-e!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7057bd2-ebdf-4cb5-86ef-23d612beea07_1440x823.jpeg" width="1440" height="823" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b7057bd2-ebdf-4cb5-86ef-23d612beea07_1440x823.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:823,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Warp Agents panel: your central hub for managing agents&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Warp Agents panel: your central hub for managing agents" title="Warp Agents panel: your central hub for managing agents" srcset="https://substackcdn.com/image/fetch/$s_!K9-e!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7057bd2-ebdf-4cb5-86ef-23d612beea07_1440x823.jpeg 424w, https://substackcdn.com/image/fetch/$s_!K9-e!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7057bd2-ebdf-4cb5-86ef-23d612beea07_1440x823.jpeg 848w, https://substackcdn.com/image/fetch/$s_!K9-e!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7057bd2-ebdf-4cb5-86ef-23d612beea07_1440x823.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!K9-e!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7057bd2-ebdf-4cb5-86ef-23d612beea07_1440x823.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><blockquote><p>Warp agents notify you when they need human help: approving a code diff, running a kubectl command, approving a commit message. 
Plus, you can keep track of all your agents in one centralized panel.</p></blockquote><p>Warp&#8217;s approach is to create a workspace that is neither IDE nor plain terminal: it is built to let developers coordinate and supervise AI agents across tasks like deployment, debugging, and log analysis. Here&#8217;s a demo I liked by <a href="https://x.com/zachlloydtweets/status/1917292654708035658">Zach Lloyd</a> showing how he uses Warp&#8217;s Agent Mode to update PR descriptions as he codes:</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;3fbf66d0-c26f-44f5-8d19-6bc4bd2da3d5&quot;,&quot;duration&quot;:null}"></div><p>Warp&#8217;s controls restrict risky actions (for example, deleting files) until explicit human consent is given. This agentic model mirrors classic command-line workflows but adds integrated AI oversight.</p><p>Warp's performance credentials are impressive - scoring #1 on<a href="https://www.tbench.ai/"> Terminal-Bench</a> and achieving 71% on SWE-Bench Verified demonstrates that its integrated approach doesn't sacrifice capability for convenience. 
According to<a href="https://www.producthunt.com/products/warp/reviews?review=1285338"> user feedback</a>, users value Warp&#8217;s speed, command-sharing features, secret redaction and seamless customization of terminal blocks - all positioning it as a terminal that truly harnesses AI while preserving developer control.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hu9L!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d665dc-98d4-48ff-a68a-e0cafd4cb194_2564x1290.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hu9L!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d665dc-98d4-48ff-a68a-e0cafd4cb194_2564x1290.png 424w, https://substackcdn.com/image/fetch/$s_!hu9L!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d665dc-98d4-48ff-a68a-e0cafd4cb194_2564x1290.png 848w, https://substackcdn.com/image/fetch/$s_!hu9L!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d665dc-98d4-48ff-a68a-e0cafd4cb194_2564x1290.png 1272w, https://substackcdn.com/image/fetch/$s_!hu9L!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d665dc-98d4-48ff-a68a-e0cafd4cb194_2564x1290.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hu9L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d665dc-98d4-48ff-a68a-e0cafd4cb194_2564x1290.png" width="1456" height="733" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/44d665dc-98d4-48ff-a68a-e0cafd4cb194_2564x1290.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:733,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:158066,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/169029578?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d665dc-98d4-48ff-a68a-e0cafd4cb194_2564x1290.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hu9L!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d665dc-98d4-48ff-a68a-e0cafd4cb194_2564x1290.png 424w, https://substackcdn.com/image/fetch/$s_!hu9L!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d665dc-98d4-48ff-a68a-e0cafd4cb194_2564x1290.png 848w, https://substackcdn.com/image/fetch/$s_!hu9L!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d665dc-98d4-48ff-a68a-e0cafd4cb194_2564x1290.png 1272w, https://substackcdn.com/image/fetch/$s_!hu9L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d665dc-98d4-48ff-a68a-e0cafd4cb194_2564x1290.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>That said,<a href="https://www.reddit.com/r/WarpTerminal/comments/1ljoj7k/introducing_warp_20_the_agentic_development"> Reddit commentary</a> flags areas for improvement in large-scale use: some users report indexing challenges with large repositories, and concerns around high request volume - prompt cost multiplied across heavy agent activity - are increasingly common. On the other hand, you can toggle Warp&#8217;s AI functionality off entirely; with AI disabled, the terminal reverts to its previous lightweight state.</p><p>Taken together, <strong>Amp and Warp</strong> exemplify two complementary trajectories in agentic developer tooling. Amp is designed for deep code-intelligence workflows and enterprise collaboration, excelling where coordinated refactoring across extensive codebases matters. 
Warp instead focuses on synergies between command-line power and agent orchestration, delivering a modern terminal that supports supervised AI activity across development tasks.</p><p>Both platforms signal a broader industry shift from narrow, single-purpose assistants toward holistic, outcome-oriented AI environments. Developers can now work with agents capable of multitasking, context-driven planning, and end-to-end execution - minimizing manual intervention while ensuring procedural oversight and enterprise readiness. As individual tools, they embody the new generation of AI-augmented development platforms rather than incremental, model-bound utilities.</p><h2><strong>Self-healing codebases? AI in testing, debugging and CI/CD</strong></h2><blockquote><p><strong>CI becomes self&#8209;healing when agents not only detect failures but propose, validate, and apply fixes - making builds proactively resilient and developer flow unbroken.</strong></p></blockquote><p>Coding doesn&#8217;t stop at writing features. A huge part of the SDLC is testing, debugging, code review, and continuous integration (CI) &#8211; ensuring that the software works and continues to work with each change. It&#8217;s in these &#8220;adjacent&#8221; stages that AI is also making a big impact, often in ways that complement the coding agents we discussed. After all, what good is an AI-generated PR if it breaks your build or introduces subtle bugs? Thankfully, we are seeing patterns for <strong>AI-assisted testing and CI</strong> that could make our codebases more resilient (perhaps <em>more</em> resilient than purely human-driven ones).</p><p>One exciting development is <strong>AI-powered self-healing CI pipelines</strong>. The idea: when your CI tests fail, instead of just notifying you and waiting for you to fix it, an AI agent can automatically diagnose and even fix the issue. 
The Nx team (creators of the Nx build system for monorepos) recently announced <strong><a href="https://nx.dev/blog/nx-self-healing-ci#:~:text=Every%20developer%20knows%20this%20workflow%3A">Nx Cloud Self-Healing CI</a></strong>, which does exactly this. </p><div id="youtube2-JW5Ki3PkRWA" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;JW5Ki3PkRWA&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/JW5Ki3PkRWA?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>They described a scenario: you push some code, and CI fails due to a silly mistake (like a missing import or a small test assertion fail). Normally, you might not notice for 30 minutes, then have to context-switch to fix it, push again, and waste time. With Self-Healing CI, what happens instead is &#8220;magical&#8221;:</p><ul><li><p><strong>Failure detected</strong>: Upon a test failure, automatically start an AI agent to investigate.</p></li><li><p><strong>AI diagnosis</strong>: The agent looks at the error logs and, importantly, has knowledge of your codebase structure. So it knows where to look.</p></li><li><p><strong>Proposes a fix</strong>: The agent comes up with a fix &#8211; e.g., add the missing import, adjust the test expectation, etc. &#8211; and presents it to you, either in your IDE or as a comment on the PR.</p></li><li><p><strong>Validates the fix</strong>: In parallel, the agent runs the tests again (or the subset that failed) with the proposed change to ensure it actually resolves the issue.</p></li><li><p><strong>Human approval</strong>: You get a notification (e.g., in VS Code) that says &#8220;AI fix available: added missing import X&#8221;. You can review the one-line change. 
If it looks good and the validation passed, you hit approve.</p></li><li><p><strong>Apply and re-run</strong>: Once approved, the fix commit is applied to your PR automatically, and the full CI runs again (which now likely passes).</p></li></ul><p>All of this can happen in a short time window. </p><p>Crucially, they emphasize that the developer <strong>stays in control</strong> &#8211; the AI doesn&#8217;t just commit changes on its own without your OK. This &#8220;human in the loop&#8221; approach is wise; it builds trust. Over time, if these AI fixes prove consistently correct, maybe teams will auto-approve minor ones, but it&#8217;s good that initial designs assume oversight.</p><p>Self-healing CI addresses a real pain: so much time is lost in the dev cycle due to trivial breakages and the latency of feedback. If an AI can shave that down, it keeps developers in flow. And it&#8217;s not just Nx; we&#8217;re seeing similar ideas elsewhere. There&#8217;s a tool called <strong><a href="https://gitauto.ai/docs/triggers/test-failure#:~:text=GitAuto%20Test%20Failure%20Trigger%20,cause%2C%20and%20creates%20fix%20commits">GitAuto</a></strong> that can analyze a failing CI run and automatically create a fix PR with the necessary changes. CircleCI published a guide on using AI (with their platform&#8217;s APIs) to resolve test failures with zero guesswork. GitLab is integrating AI to <a href="https://about.gitlab.com/blog/quickly-resolve-broken-ci-cd-pipelines-with-ai/#:~:text=Quickly%20resolve%20broken%20CI%2FCD%20pipelines,within%20the%20DevSecOps%20platform">suggest fixes</a> right within Merge Request UI when something fails.</p><p>Beyond CI, consider <strong>runtime debugging</strong>. 
There was a rather bold experiment shared on Reddit: a developer let an AI agent <a href="https://www.reddit.com/r/ChatGPTCoding/comments/1jibmtc/i_made_ai_fix_my_bugs_in_production_for_27_days/#:~:text=I%20made%20AI%20fix%20my,wanted%20to%20share%20the%20results">attempt to fix</a> production bugs <em>for 27 days straight</em>, automatically generating PRs for any exceptions caught in production. According to their post, the AI managed to resolve a bunch of issues on its own, and the dev team only intervened occasionally. That&#8217;s a bit bleeding-edge and risky for many, but it shows where things could head: an AI ops agent that monitors logs/errors and continuously improves the code.</p><p><strong>AI code review</strong> is another adjacent area. I anticipate that having an &#8220;AI reviewer&#8221; will become common. GitHub is already previewing features where Copilot will automatically suggest improvements in a pull request, highlight insecure code, or summarize a PR&#8217;s changes for the human reviewers. Open source projects like <strong><a href="https://github.com/qodo-ai/pr-agent">pr-agent</a></strong> hook models into your PR workflow to do things like explain the code changes or point out potential issues. While not fully trusted to approve/deny PRs, these AI reviews are like having a diligent junior reviewer who never gets tired. They can enforce style guides, catch obvious bugs, ensure test coverage, etc. I&#8217;ve used one such tool to get a second opinion on my PRs &#8211; sometimes it&#8217;s surface-level, but occasionally it points out something I missed, like <em>&#8220;This function isn&#8217;t handling X case; is that intentional?&#8221;</em>. It&#8217;s easy to see every team having an AI reviewer integrated into GitHub or GitLab, raising the baseline quality of code reviews.</p><p>Let&#8217;s talk about <strong>flaky tests and maintenance</strong> as well. 
Companies like <a href="http://trunk.io">Trunk.io</a> advertise an <em>&#8220;AI DevOps agent&#8221;</em> that, among other things, can detect flaky tests, quarantine them, or suggest fixes. Maintenance tasks such as updating dependencies, cleaning up warnings, etc., can be handled by periodic AI agent runs. Some teams have scheduled jobs where an AI opens PRs weekly for dependency bumps (tools like Dependabot do minor version bumps, but an AI could handle major upgrades that need code changes). This merges into the idea of <em>continuous improvement bots</em>.</p><p>In essence, the <em>agentic future</em> isn&#8217;t just writing new features &#8211; it&#8217;s also maintaining and improving code continuously. A lot of engineering effort (some say ~30-50%) in a mature codebase goes into maintenance, refactoring, keeping tests green, etc. If AI can take a chunk of that, it frees humans to focus on more complex design and new functionality.</p><p>To illustrate how far things have come, consider this: Just two years ago, &#8220;AI in coding&#8221; mostly meant code completion or maybe a bot that could create a simple PR from a template. Now we&#8217;re talking about AI autonomously <strong>monitoring, coding, and fixing</strong> software in a loop. And it&#8217;s actually happening in real projects.</p><p>Of course, developers still have to make judgment calls. Not every failing test should be &#8220;fixed&#8221; by changing the code &#8211; sometimes the test caught a real bug in the logic or the fix might have side effects. So a human needs to ensure the AI&#8217;s solution is correct for the broader requirements. This again highlights the evolving role: we become the reviewers and approvers, ensuring the AI&#8217;s outputs align with what&#8217;s truly needed.</p><p><strong>In summary</strong>, the future of coding involves AI not just in writing code but in <em>verifying and polishing</em> it. 
</p><p>We&#8217;ll have agents that act as testers, reviewers, and ops engineers. The codebase could become more of a living thing that partially maintains itself. This doesn&#8217;t eliminate the need for human oversight &#8211; rather, it raises the baseline so that humans deal with the more complex issues. </p><p>Now, let&#8217;s step back and consider how all these pieces come together in our day-to-day tools and workflows. The line between &#8220;coding&#8221; and &#8220;prompting&#8221; is blurring. IDEs are changing, and new ones are emerging, built from the ground up for this agentic paradigm.</p><p><strong>Multi-modal and &#8220;full stack&#8221; agents</strong> &#8211; As AI models gain capabilities like vision, we&#8217;ll see development agents that can handle not just code, but also UI design, graphic assets, etc. OpenAI&#8217;s <em>Operator</em> agent for web browsing shows that an AI can <a href="https://openai.com/index/introducing-operator/#:~:text=Operator%20is%20powered%20by%20a,people%20see%20on%20a%20screen">operate</a> a browser UI like a human. Translate that to dev tools: an AI could use a GUI builder or drag-drop interface. Perhaps the agent of the future could open Figma designs and generate corresponding code, or vice versa. In fact, Anthropic&#8217;s Claude Code via MCP can pull in design docs or Figma assets to understand what needs to be built. That&#8217;s an early step toward multi-modal development assistance.</p><p><strong>Collaboration with human developers</strong> is also being rethought. For instance, suppose a team of 5 devs is working on a project. In the future, maybe each dev has their own AI agent that learns their coding style and assists them, and the agents also communicate with each other (with permission) to ensure their code changes align. 
It sounds wild, but I imagine something like a &#8220;team of 5 devs + 5 AI sidekicks&#8221; where the AIs exchange notes (like one agent says &#8220;hey, Alice&#8217;s agent, I&#8217;m updating the API interface, you might want to update Alice&#8217;s UI code usage of it&#8221;). This would essentially mirror how a well-coordinated team works, but at light speed. Early hints of this appear in orchestrator tools that share context between agents. Magnet&#8217;s approach of clarifying questions and context sharing between tasks is a manual form of that. Perhaps soon the agents will negotiate task splits themselves.</p><p><strong>The norm likely to emerge:</strong> A few years down the line, I suspect it will be completely normal for a software engineer to have a development environment where:</p><ul><li><p>You describe what you want at a high level, and an AI agent generates an initial implementation.</p></li><li><p>You then engage in a back-and-forth with the agent to refine it (maybe using both CLI and IDE interactions).</p></li><li><p>If it&#8217;s a big task, you delegate subtasks to multiple agents (possibly with specialized skills).</p></li><li><p>When you hit &#8220;run tests&#8221; or &#8220;deploy&#8221;, AI agents automatically fix small issues and ensure the pipeline is green.</p></li><li><p>Code reviews are partially automated &#8211; AI gives feedback, humans focus on higher-level concerns.</p></li><li><p>Documentation and ancillary artifacts are updated by AI.</p></li><li><p>The developer&#8217;s main job is defining goals, constraints, reviewing important changes, and making architectural decisions. Essentially the human provides vision and oversight, the AI handles execution details.</p></li></ul><p>We&#8217;re already living a prototype of that workflow in 2025; it&#8217;s just not evenly distributed. Some bleeding-edge teams are close to this, while others are just now trying a Copilot suggestion for the first time. 
But the trajectory is clear enough that I feel confident this will be widespread.</p><h2><strong>Challenges, Limitations, and the human touch</strong></h2><blockquote><p><strong>As humans shift from coder to conductor, the core skill becomes oversight - writing good briefs, reviewing agent output, and catching edge&#8209;case failures. AI amplifies mistakes if unchecked.</strong></p></blockquote><p>Before concluding, it&#8217;s important to temper the excitement with some reality checks. As amazing as these tools are becoming, they introduce challenges &#8211; technical, ethical, and professional.</p><p><strong>Quality and correctness</strong>: AI agents can produce wrong or inefficient code. They lack true understanding and may not foresee edge cases. We&#8217;ve all seen LLMs hallucinate or make up facts; in code, that might mean introducing a subtle bug or using an outdated API. Testing mitigates this, but not everything is easily testable (e.g. did the AI inadvertently create a security vulnerability or a performance bottleneck?). So we still need skilled engineers to audit and refine AI output. In the near future, one of the most valuable skills will be <em>AI-assisted debugging</em>: understanding where the AI&#8217;s reasoning might have gone astray and correcting it. It&#8217;s the &#8220;<a href="https://addyo.substack.com/p/the-trust-but-verify-pattern-for">verify and validate</a>&#8221; loop where humans excel.</p><p><strong>Prompt engineering and task specification</strong>: Getting the best out of an agent requires clear communication. If you underspecify a task, the AI might do something unexpected. If you overspecify, you might as well code it yourself. We&#8217;ll need to learn how to write effective &#8220;AI specs&#8221; &#8211; akin to writing a good spec for a junior developer, but even more explicit at times. Interestingly, this might force us to think more clearly about what we want before diving in, which could be a good thing. 
I&#8217;ve found myself writing out the desired behavior in detail for the AI, and realizing in the process that I had gaps in my own plan.</p><p><strong>Cost considerations</strong>: While many examples we discussed have free tiers or personal limits (Gemini CLI free access, Jules free beta), ultimately someone pays for those cycles. Running dozens of agents in parallel might burn through tokens (and $$) quickly if not managed. Organizations will have to factor AI compute into their dev costs. It&#8217;s possible that AI-augmented development, despite saving time, could increase cloud costs (at least until model inference gets cheaper). Open-source models running locally offer an alternative for some cases, but the best capabilities still reside in large proprietary models as of 2025. However, given the competition, we may see more cost-effective options or even hardware optimized for these tasks (imagine your dev machine having a built-in AI accelerator to run 20 local model instances cheaply).</p><p>Despite these challenges, I&#8217;m optimistic. Every new abstraction in programming (from assembly to high-level, from manual memory to garbage collection, etc.) faced skepticism, yet ultimately made us more productive and enabled building more complex systems. Agentic AI for coding feels like a similar leap. We just have to apply the same rigor we always have in software engineering: code review, testing, monitoring in production &#8211; those practices don&#8217;t go away; they adapt to include AI in the loop.</p><h2><strong>Conclusion</strong></h2><p>The way we write software in the post-2025 world is transforming into something both exhilarating and a bit surreal. 
Programming is becoming less about typing out every line and more about <strong>guiding, supervising, and collaborating</strong> with these agentic tools.</p><p>To recap the key patterns shaping this future:</p><ul><li><p><strong>AI coding agents (CLI and IDE)</strong> are enabling us to work at a higher level of abstraction &#8211; focusing on <em>what</em> we want to build, while the AI figures out <em>how</em> in code. Tools like Claude Code, Gemini CLI, OpenCode, Cursor, and Windsurf exemplify this, each bringing its own twist but all moving in the same direction.</p></li><li><p><strong>Parallel AI development</strong> allows scaling our efforts horizontally &#8211; it&#8217;s now feasible for a single developer to supervise multiple AI &#8220;developers&#8221; working simultaneously. Early orchestrators (Claude Squad, Conductor, Agent Farm, Magnet) show that this can drastically speed up complex or large-scale code modifications.</p></li><li><p><strong>Asynchronous agents</strong> like Jules and Codex let coding happen in the background, turning software development into a more continuous, autonomous process. You can go to lunch and come back to find a feature implemented or a bug fixed (with a PR and diff ready to review).</p></li><li><p><strong>AI in testing/CI</strong> closes the loop, catching and fixing issues so that the code an AI writes is also verified by AI. Self-healing CI, automated test generation, and AI code reviews collectively push us toward a world of <em>self-maintaining codebases</em>.</p></li><li><p><strong>AI-first workflows and environments</strong> are emerging, from fully AI-driven IDEs to deeper integration in platforms like GitHub. 
They point to a norm where having multiple AI &#8220;assistants&#8221; as part of your dev team is just how things are done.</p></li><li><p><strong>The developer&#8217;s role is shifting</strong> to higher-level decision making, providing oversight, and handling the creative and complex aspects that AI still struggles with. We&#8217;ll increasingly act as architects and conductors of software, not just bricklayers of code.</p></li><li><p><strong>New tools and projects are popping up constantly</strong>, especially in open source. The community on Reddit, Hacker News, and Twitter is actively discussing and iterating on these ideas. For every major product like Copilot or Jules, there&#8217;s an open-source equivalent or experiment that often sparks the next innovation.</p></li></ul><p>So, how do we prepare and adapt? My approach has been to <strong>embrace these tools early</strong> and experiment with integrating them into real workflows. If you&#8217;re a software engineer reading this, I encourage you to try some of the mentioned tools on a personal project. Experience the feeling of hitting a button and watching an AI write your tests or refactor your code. It&#8217;s eye-opening. Simultaneously, practice the skill of critically reviewing AI output. Treat it like you would a human colleague&#8217;s work &#8211; with respect but also scrutiny.</p><p>As we head into this agentic world, I keep an eye on one guiding principle: <strong>developer experience (DX)</strong>. The best tools will be the ones that <em>feel natural</em> in our workflow and amplify our abilities without getting in the way. It&#8217;s easy to be seduced by autonomy for its own sake, but ultimately these agents must serve the developer&#8217;s intent. </p><p>I&#8217;m encouraged that many of the projects I discussed are being built by developers for developers, with active feedback loops. 
</p><p><em>I&#8217;m excited to share I&#8217;m writing a new <a href="https://www.oreilly.com/library/view/vibe-coding-the/9798341634749/">AI-assisted engineering book</a> with O&#8217;Reilly. If you&#8217;ve enjoyed my writing here you may be interested in checking it out.</em></p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!pxOC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa723053-031c-4c09-ac2d-283213dda75f_2912x2096.png" width="1456" height="1048" alt="" loading="lazy"></figure></div>]]></content:encoded></item><item><title><![CDATA[Context Engineering: Bringing Engineering Discipline to Prompts]]></title><description><![CDATA[A practical guide to information architecture of AI prompts]]></description><link>https://addyo.substack.com/p/context-engineering-bringing-engineering</link><guid isPermaLink="false">https://addyo.substack.com/p/context-engineering-bringing-engineering</guid><dc:creator><![CDATA[Addy Osmani]]></dc:creator><pubDate>Sun, 13 Jul 2025 19:34:01 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!xA9A!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17b73676-5487-4cf2-8240-813e707703b7_7869x7869.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>TL;DR:</strong> <em>&#8220;Context engineering&#8221; 
means providing an AI (like an LLM) with all the information and tools it needs to successfully complete a task &#8211; not just a cleverly worded prompt. It&#8217;s the evolution of <a href="https://addyo.substack.com/p/the-prompt-engineering-playbook-for">prompt engineering</a>, reflecting a broader, more system-level approach. </em></p><h2><strong>Context engineering tips:</strong></h2><p><strong>To get the best results from an AI, you need to provide clear and specific context. The quality of the AI's output directly depends on the quality of your input.</strong></p><p><strong>How to improve your AI prompts</strong></p><ul><li><p><strong>Be precise:</strong> Vague requests lead to vague answers. The more specific you are, the better your results will be.</p></li><li><p><strong>Provide relevant code: </strong>Share the specific files, folders, or code snippets that are central to your request.</p></li><li><p><strong>Include design documents: </strong>Paste or attach sections from relevant design docs to give the AI the bigger picture.</p></li><li><p><strong>Share full error logs: </strong>For debugging, always provide the complete error message and any relevant logs or stack traces.</p></li><li><p><strong>Show database schemas: </strong>When working with databases, a screenshot of the schema helps the AI generate accurate code for data interaction.</p></li><li><p><strong>Use PR feedback: </strong>Comments from a pull request make for context-rich prompts.</p></li><li><p><strong>Give examples: </strong>Show an example of what you want the final output to look like.</p></li><li><p><strong>State your constraints: </strong>Clearly list any requirements, such as libraries to use, patterns to follow, or things to avoid.</p></li></ul><h2><strong>From &#8220;Prompt Engineering&#8221; to &#8220;Context Engineering&#8221;</strong></h2><p><strong>Prompt engineering was about cleverly phrasing a question; context engineering is about constructing an entire information 
environment so the AI can solve the problem reliably.</strong></p><p>&#8220;Prompt engineering&#8221; became a buzzword essentially meaning the skill of phrasing inputs to get better outputs. It taught us to &#8220;program in prose&#8221; with clever one-liners. But outside the AI community, many took prompt engineering to mean just typing fancy requests into a chatbot. The term never fully conveyed the real sophistication involved in using LLMs effectively.</p><p>As applications grew more complex, the limitations of focusing only on a single prompt became obvious. One analysis quipped: <em>Prompt engineering walked so context engineering could run.</em> In other words, a witty one-off prompt might have wowed us in demos, but building <strong>reliable, industrial-strength LLM systems</strong> demanded something more comprehensive.</p><p>This realization is why our field is coalescing around <strong>&#8220;context engineering&#8221;</strong> as a better descriptor for the craft of getting great results from AI. Context engineering means constructing the entire <strong>context window</strong> an LLM sees &#8211; not just a short instruction, but all the relevant background info, examples, and guidance needed for the task.</p><p>The phrase was popularized by developers like Shopify&#8217;s CEO Tobi L&#252;tke and AI leader Andrej Karpathy in mid-2025. </p><blockquote><p><em>&#8220;I really like the term &#8216;context engineering&#8217; over prompt engineering,&#8221;</em> wrote Tobi. <em>&#8220;It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM.&#8221;</em> Karpathy emphatically agreed, noting that <em>people associate prompts with short instructions, whereas in every serious LLM application, <strong>context engineering</strong> is the delicate art and science of filling the context window with just the right information for each step</em>. 
</p></blockquote><p>In other words, real-world LLM apps don&#8217;t succeed by luck or one-shot prompts &#8211; they succeed by carefully assembling context around the model&#8217;s queries.</p><p>The change in terminology reflects an evolution in approach. If prompt engineering was about coming up with a magical sentence, context engineering is <a href="https://analyticsindiamag.com/ai-features/context-engineering-is-the-new-vibe-coding/#:~:text=If%20prompt%20engineering%20was%20about,in%20favour%20of%20context%20engineering">about</a> <strong>writing the full screenplay</strong> for the AI. It&#8217;s a structural shift: prompt engineering ends once you craft a good prompt, whereas context engineering begins with designing whole systems that bring in memory, knowledge, tools, and data in an organized way. </p><p>As Karpathy explained, doing this well involves everything from <strong>clear task instructions</strong> and explanations, to providing few-shot examples, retrieved facts (RAG), possibly multimodal data, relevant tools, state history, and careful compacting of all that into a limited window. <strong>Too little context (or the wrong kind) and the model will lack the information to perform optimally; too much irrelevant context and you waste tokens or even degrade performance.</strong> <strong>The sweet spot is non-trivial to find.</strong> No wonder Karpathy calls it both a science and an art.</p><p>The term <strong>context engineering</strong> is catching on because it intuitively captures what we actually do when building LLM solutions. &#8220;Prompt&#8221; sounds like a single short query; &#8220;context&#8221; implies a richer information state we prepare for the AI. </p><p>Semantics aside, why does this shift matter? Because it marks a maturing of our mindset for AI development. We&#8217;ve learned that <strong>generative AI in production is less like casting a single magic spell and more like engineering an entire environment</strong> for the AI. 
A one-off prompt might get a cool demo, but for robust solutions you need to control what the model &#8220;knows&#8221; and &#8220;sees&#8221; at each step. It often means retrieving relevant documents, summarizing history, injecting structured data, or providing tools &#8211; whatever it takes so the model isn&#8217;t guessing in the dark. The result is we no longer think of prompts as one-off instructions we hope the AI can interpret. We think in terms of <strong>context pipelines</strong>: all the pieces of information and interaction that set the AI up for success.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!wHL3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb844018-f295-4063-93ea-6aa8cf72e322_1024x1024.png" width="548" height="548" alt="" loading="lazy"></figure></div><p>To illustrate, consider the difference in perspective. Prompt engineering was often an exercise in clever wording (&#8220;Maybe if I phrase it this way, the LLM will do what I want&#8221;). Context engineering, by contrast, feels more like traditional engineering: <em>What inputs (data, examples, state) does this system need? How do I get those and feed them in? In what format? 
At what time?</em> We&#8217;ve essentially gone from squeezing performance out of a single prompt to designing <em>LLM-powered systems</em>.</p><h2><strong>What Exactly </strong><em><strong>Is</strong></em><strong> Context Engineering?</strong></h2><p><strong>Context engineering means dynamically giving an AI everything it needs to succeed &#8211; the instructions, data, examples, tools, and history &#8211; all packaged into the model&#8217;s input context at runtime.</strong></p><p>A useful <a href="https://blog.langchain.com/context-engineering-for-agents/#:~:text=As%20Andrej%20Karpathy%20puts%20it%2C,Karpathy%20summarizes%20this%20well">mental model</a> (suggested by Andrej Karpathy and others) is to think of an LLM like a CPU, and its context window (the text input it sees at once) as the RAM or working memory. As an engineer, your job is akin to an operating system: <strong>load that working memory with just the right code and data for the task</strong>. In practice, this context can come from many sources: the user&#8217;s query, system instructions, retrieved knowledge from databases or documentation, outputs from other tools, and summaries of prior interactions. Context engineering is about orchestrating all these pieces into the prompt that the model ultimately sees. 
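</p><p>In sketch form, that &#8220;operating system&#8221; job is a function that assembles the context window from several sources at request time. Here <code>retrieve_docs</code> and <code>summarize_history</code> are hypothetical stand-ins for whatever retrieval and memory layers a real system would use:</p>

```python
def retrieve_docs(query: str, top_k: int = 3) -> list[str]:
    # Stand-in for a real vector search over your docs (illustrative only).
    corpus = {
        "auth": "Tokens expire after 24h; refresh them via /auth/refresh.",
        "db": "Always use connection pooling for database queries.",
    }
    return [text for key, text in corpus.items() if key in query.lower()][:top_k]

def summarize_history(history: list[str]) -> str:
    # Stand-in for an LLM-generated summary: keep only the last few turns.
    return " / ".join(history[-3:])

def build_context(query: str, state: dict) -> str:
    """Assemble the context window for one model call from several sources."""
    sections = ["You are an expert coding assistant."]   # instructional context
    for doc in retrieve_docs(query):                     # knowledge context (retrieval)
        sections.append(f'Relevant documentation:\n"{doc}"')
    if state.get("history"):                             # state / memory
        sections.append("Conversation summary:\n" + summarize_history(state["history"]))
    sections.append(f"User question: {query}")           # the live query, last
    return "\n\n".join(sections)

context = build_context("Why does auth fail?", {"history": ["We discussed the login flow."]})
```

<p>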
It&#8217;s not a static prompt but a dynamic assembly of information at runtime.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!nNCu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21bfaea3-9430-4fb7-97c5-1c984ec1ae87_1024x1024.png" width="550" height="550" alt="" loading="lazy"></figure></div><p><em>Illustration: multiple sources of information are composed into an LLM&#8217;s context window (its &#8220;working memory&#8221;). The context engineer&#8217;s goal is to fill that window with the right information, in the right format, so the model can accomplish the task effectively.</em></p><p>Let&#8217;s break down what this involves:</p><ul><li><p><strong>It&#8217;s a system, not a one-off prompt.</strong> In a well-engineered setup, the final prompt the LLM sees might include several components: e.g. a role instruction written by the developer, plus the latest user query, plus relevant data fetched on the fly, plus perhaps a few examples of desired output format. All of that is woven together programmatically. 
For example, imagine a coding assistant AI that gets the query &#8220;How do I fix this authentication bug?&#8221; The system behind it might automatically search your codebase for related code, retrieve the relevant file snippets, and then construct a prompt like: <em>&#8220;You are an expert coding assistant. The user is facing an authentication bug. Here are relevant code snippets: [code]. The user&#8217;s error message: [log]. Provide a fix.&#8221;</em> Notice how that final prompt is built from multiple pieces. <strong>Context engineering is the logic that decides which pieces to pull in and how to join them.</strong> It&#8217;s akin to writing a function that prepares arguments for another function call &#8211; except here the &#8220;arguments&#8221; are bits of context and the function is the LLM invocation.</p></li><li><p><strong>It&#8217;s dynamic and situation-specific.</strong> Unlike a single hard-coded prompt, context assembly happens <em>per request</em>. The system might include different info depending on the query or the conversation state. If it&#8217;s a multi-turn conversation, you might include a summary of the conversation so far, rather than the full transcript, to save space (and sanity). If the user&#8217;s question references some document (&#8220;What does the design spec say about X?&#8221;), the system might fetch that spec from a wiki and include the relevant excerpt. In short, context engineering logic <em>responds</em> to the current state &#8211; much like how a program&#8217;s behavior depends on input. This dynamic nature is crucial. You wouldn&#8217;t feed a translation model the exact same prompt for every sentence you translate; you&#8217;d feed it the new sentence each time. 
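</p><p>A minimal sketch of that per-request compaction: keep the latest turns verbatim and fold older ones into a digest. The character budget here is a simplification; a production system would count tokens and generate the digest with an LLM call:</p>

```python
def compact_history(turns: list[str], keep_recent: int = 4, budget_chars: int = 800) -> str:
    """Keep the latest turns verbatim; fold older ones into a one-line digest."""
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    parts = []
    if older:
        # A production system would ask an LLM for this digest instead.
        parts.append("Summary of earlier discussion: " + "; ".join(t[:60] for t in older))
    parts.extend(recent)
    # Hard character cap as a last resort; real systems budget in tokens.
    return "\n".join(parts)[-budget_chars:]

turns = [f"turn {i}" for i in range(10)]
compacted = compact_history(turns)
```

<p>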
Similarly, in an AI agent, you&#8217;re constantly updating what context you give as the state evolves.</p></li><li><p><strong>It blends multiple types of content.</strong> LangChain <a href="https://blog.langchain.com/context-engineering-for-agents/#:~:text=What%20are%20the%20types%20of,a%20few%20different%20context%20types">describes</a> context engineering as an umbrella that covers at least three facets of context: (1) <strong>Instructional context</strong> &#8211; the prompts or guidance we provide (including system role instructions and few-shot examples), (2) <strong>Knowledge context</strong> &#8211; domain information or facts we supply, often via retrieval from external sources, and (3) <strong>Tools context</strong> &#8211; information coming from the model&#8217;s environment via tools or API calls (e.g. results from a web search, database query, or code execution). A robust LLM application often needs all three: clear instructions about the task, relevant knowledge plugged in, and possibly the ability for the model to use tools and then incorporate the tool results back into its thinking. Context engineering is the discipline of managing all these streams of information and merging them coherently.</p></li><li><p><strong>Format and clarity matter.</strong> It&#8217;s not just <em>what</em> you include in the context, but <em>how</em> you present it. Communicating with an AI model has surprising parallels to communicating with a human: if you dump a huge blob of unstructured text, the model might get confused or miss the point, whereas a well-organized input will guide it. Part of context engineering is figuring out how to compress and structure information so the model grasps what&#8217;s important. This could mean summarizing long texts, using bullet points or headings to highlight key facts, or even formatting data as JSON or pseudo-code if that helps the model parse it. 
For instance, if you retrieved a document snippet, you might preface it with something like &#8220;Relevant documentation:&#8221; and put it in quotes, so the model knows it&#8217;s reference material. If you have an error log, you might show only the last 5 lines rather than 100 lines of stack trace. Effective context engineering often involves creative <strong>information design</strong> &#8211; making the input as digestible as possible for the LLM.</p></li></ul><p>Above all, context engineering is about <strong>setting the AI up for success</strong>. </p><p>Remember, an LLM is powerful but not psychic &#8211; it can only base its answers on what&#8217;s in its input plus what it learned during training. If it fails or hallucinates, often the root cause is that we didn&#8217;t give it the right context, or we gave it poorly structured context. When an LLM &#8220;agent&#8221; misbehaves, usually <em>&#8220;the appropriate context, instructions and tools have not been communicated to the model.&#8221;</em> Garbage in, garbage out. Conversely, if you <em>do</em> supply all the relevant info and clear guidance, the model&#8217;s performance improves dramatically.</p><p><strong>Feeding high-quality context: practical tips</strong></p><p>Now, concretely, how do we ensure we&#8217;re giving the AI everything it needs? Here are some pragmatic tips that I&#8217;ve found useful when building AI coding assistants and other LLM apps:</p><ul><li><p><strong>Include relevant source code and data.</strong> If you&#8217;re asking an AI to work on code, provide the relevant code files or snippets. Don&#8217;t assume the model will recall a function from memory &#8211; show it the actual code. Similarly, for Q&amp;A tasks include the pertinent facts or documents (via retrieval). <em>Low context guarantees low-quality output.</em> The model can&#8217;t answer what it hasn&#8217;t been given.</p></li><li><p><strong>Be precise in instructions.</strong> Clearly state what you want. 
If you need the answer in a certain format (JSON, specific style, etc.), mention that. If the AI is writing code, specify constraints like which libraries or patterns to use (or avoid). Ambiguity in your request can lead to meandering answers.</p></li><li><p><strong>Provide examples of the desired output.</strong> Few-shot examples are powerful. If you want a function documented in a certain style, show one or two examples of properly documented functions in the prompt. Modeling the output helps the LLM understand exactly what you&#8217;re looking for.</p></li><li><p><strong>Leverage external knowledge.</strong> If the task needs domain knowledge beyond the model&#8217;s training (e.g. company-specific details, API specs), retrieve that info and put it in the context. For instance, attach the relevant section of a design doc or a snippet of the API documentation. LLMs are far more accurate when they can cite facts from provided text rather than recalling from memory.</p></li><li><p><strong>Include error messages and logs when debugging.</strong> If asking the AI to fix a bug, show it the full error trace or log snippet. These often contain the critical clue needed. Similarly, include any test outputs if asking why a test failed.</p></li><li><p><strong>Maintain conversation history (smartly).</strong> In a chat scenario, feed back important bits of the conversation so far. Often you don&#8217;t need the entire history &#8211; a concise summary of key points or decisions can suffice and saves token space. This gives the model context of what&#8217;s already been discussed.</p></li><li><p><strong>Don&#8217;t shy away from metadata and structure.</strong> Sometimes telling the model <em>why</em> you&#8217;re giving a piece of context can help. For example: <em>&#8220;Here is the user&#8217;s query.&#8221;</em> or <em>&#8220;Here are relevant database schemas:&#8221;</em> as prefacing labels. 
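</p><p>Put together, labeling sections and trimming noisy inputs might look like this sketch (the function and field names are illustrative, not from any particular library):</p>

```python
def format_debug_prompt(question: str, error_log: str, schema: str) -> str:
    """Label each piece of context and trim noisy inputs before sending."""
    log_tail = "\n".join(error_log.splitlines()[-5:])  # last lines usually hold the clue
    return (
        "Relevant database schema:\n" + schema.strip() + "\n\n"
        "Error log (last 5 lines):\n" + log_tail + "\n\n"
        "User question: " + question
    )

log = "\n".join(f"line {i}" for i in range(20))
prompt = format_debug_prompt("Why does login return a 500?", log, "users(id INT, email TEXT)")
```

<p>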
Simple section headers like &#8220;User Input: &#8230; / Assistant Response: &#8230;&#8221; help the model parse multi-part prompts. Use formatting (markdown, bullet lists, numbered steps) to make the prompt logically clear.</p></li></ul><p>Remember the golden rule: <strong>LLMs are powerful but they aren&#8217;t mind-readers.</strong> The quality of output is directly proportional to the quality and relevance of the context you provide. Too little context (or missing pieces) and the AI will fill gaps with guesses (often incorrect). Irrelevant or noisy context can be just as bad, leading the model down the wrong path. So our job as context engineers is to feed the model exactly what it needs and nothing it doesn&#8217;t.</p><h2>Addressing the skeptics</h2><p>Let&#8217;s be direct about the criticisms. Many experienced developers see &#8220;context engineering&#8221; as either rebranded prompt engineering or, worse, pseudoscientific buzzword creation. These concerns aren&#8217;t unfounded, but the distinction is real: traditional prompt engineering focuses on the instructions you give an LLM, while context engineering encompasses the entire information ecosystem &#8211; dynamic data retrieval, memory management, tool orchestration, and state maintenance across multi-turn interactions. The skeptics also have a point about rigor: much of current AI work lacks the discipline we expect from engineering fields. There&#8217;s too much trial-and-error, not enough measurement, and insufficient systematic methodology. And let&#8217;s be honest: even with perfect context engineering, LLMs still hallucinate, make logical errors, and fail at complex reasoning. Context engineering isn&#8217;t a silver bullet &#8211; it&#8217;s damage control and optimization within current constraints. 
</p><h2><strong>The Art and Science of Effective Context</strong></h2><p><strong>Great context engineering strikes a balance &#8211; include everything the model truly needs, but avoid irrelevant or excessive detail that could distract it (and drive up cost).</strong></p><p>As Karpathy described, context engineering is a delicate mix of <em>science</em> and <em>art</em>. </p><p>The &#8220;science&#8221; part involves following certain principles and techniques to systematically improve performance. For example: if you&#8217;re doing code generation, it&#8217;s practically a given that you should include relevant code and error messages; if you&#8217;re doing question-answering, it&#8217;s logical to retrieve supporting documents and provide them to the model. There are established methods like few-shot prompting, retrieval-augmented generation (RAG), and chain-of-thought prompting that we know (from research and trial) can boost results. There&#8217;s also a science to respecting the model&#8217;s constraints &#8211; every model has a context length limit, and overstuffing that window can not only increase latency/cost but potentially <em>degrade</em> the quality if the important pieces get lost in the noise.</p><blockquote><p>Karpathy summed it up well: <em>&#8220;Too little or of the wrong form and the LLM doesn&#8217;t have the right context for optimal performance. Too much or too irrelevant and the LLM costs might go up and performance might come down.&#8221;</em></p></blockquote><p>So the science is in techniques for selecting, pruning, and formatting context optimally. For instance, using embeddings to find the most relevant docs to include (so you&#8217;re not inserting unrelated text), or compressing long histories into summaries. 
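</p><p>The selection-and-pruning idea can be sketched with a toy relevance score; a real system would swap in embedding similarity and a proper token counter:</p>

```python
def select_snippets(query: str, snippets: list[str], budget_words: int = 120) -> list[str]:
    """Greedily pick the most query-relevant snippets that fit a word budget.

    Relevance here is toy word overlap; swap in embedding similarity for real use."""
    query_words = set(query.lower().split())
    ranked = sorted(snippets,
                    key=lambda s: len(query_words & set(s.lower().split())),
                    reverse=True)
    chosen, used = [], 0
    for snippet in ranked:
        cost = len(snippet.split())
        if used + cost <= budget_words:
            chosen.append(snippet)
            used += cost
    return chosen

snippets = [
    "auth token refresh flow",
    "unrelated billing notes " * 30,   # long and irrelevant: should be dropped
    "auth error causes and fixes",
]
picked = select_snippets("auth error", snippets, budget_words=20)
```

<p>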
Researchers have even catalogued failure modes of <em>long</em> contexts &#8211; things like <strong>context poisoning</strong> (where an earlier hallucination in the context leads to further errors) or <strong>context distraction</strong> (where too much extraneous detail causes the model to lose focus). Knowing these pitfalls, a good engineer will curate the context carefully.</p><p>Then there&#8217;s the &#8220;art&#8221; side &#8211; the intuition and creativity born of experience. </p><p>This is about understanding LLM quirks and subtle behaviors. Think of it like a seasoned programmer who &#8220;just knows&#8221; how to structure code for readability: an experienced context engineer develops a feel for how to structure a prompt for a given model. For example, you might sense that one model tends to do better if you first outline a solution approach before diving into specifics, so you include an initial step like &#8220;Let&#8217;s think step by step&#8230;&#8221; in the prompt. Or you notice that the model often misunderstands a particular term in your domain, so you preemptively clarify it in the context. These aren&#8217;t in a manual &#8211; you learn them by observing model outputs and iterating. <strong>This is where prompt-crafting (in the old sense) still matters</strong>, but now it&#8217;s in service of the larger context. It&#8217;s similar to software design patterns: there&#8217;s science in understanding common solutions, but art in knowing when and how to apply them.</p><p>Let&#8217;s explore a few common strategies and patterns context engineers use to craft effective contexts:</p><ul><li><p><strong>Retrieval of relevant knowledge:</strong> One of the most powerful techniques is Retrieval-Augmented Generation. If the model needs facts or domain-specific data that isn&#8217;t guaranteed to be in its training memory, have your system fetch that info and include it. 
For example, if building a documentation assistant, you might vector-search your documentation and insert the top matching passages into the prompt before asking the question. This way, the model&#8217;s answer will be grounded in real data you provided, rather than its sometimes outdated internal knowledge. Key skills here include designing good search queries or embedding spaces to get the right snippet, and formatting the inserted text clearly (with citations or quotes) so the model knows to use it. When LLMs &#8220;hallucinate&#8221; facts, it&#8217;s often because we failed to provide the actual fact &#8211; retrieval is the antidote to that.</p></li><li><p><strong>Few-shot examples and role instructions:</strong> This harkens back to classic prompt engineering. If you want the model to output something in a particular style or format, show it examples. For instance, to get structured JSON output, you might include a couple of example inputs and outputs in JSON in the prompt, then ask for a new one. Few-shot context effectively teaches the model by example. Likewise, setting a <strong>system role</strong> or persona can guide tone and behavior (&#8220;You are an expert Python developer helping a user&#8230;&#8221;). These techniques are staples because they work: they bias the model towards the patterns you want. In the context-engineering mindset, prompt wording and examples are just one part of the context, but they remain crucial. In fact, you could say prompt engineering (crafting instructions and examples) is now a <em>subset</em> of context engineering &#8211; it&#8217;s one tool in the toolkit. We still care a lot about phrasing and demonstrative examples, but we&#8217;re also doing all these other things around them.</p></li><li><p><strong>Managing state and memory:</strong> Many applications involve multiple turns of interaction or long-running sessions. 
The context window isn&#8217;t infinite, so a major part of context engineering is deciding how to handle conversation history or intermediate results. A common technique is <strong>summary compression</strong> &#8211; after each few interactions, summarize them and use the summary going forward instead of the full text. For example, Anthropic&#8217;s Claude assistant automatically does this when conversations get lengthy, to avoid context overflow (you&#8217;ll see it produce a &#8220;[Summary of previous discussion]&#8221; that condenses earlier turns). Another tactic is to explicitly write important facts to an external store (a file, database, etc.) and then later retrieve them when needed rather than carrying them in every prompt. This is like an external memory. Some advanced agent frameworks even let the LLM generate &#8220;notes to self&#8221; that get stored and can be recalled in future steps. The art here is figuring out <strong>what</strong> to keep, <strong>when</strong> to summarize, and <strong>how</strong> to resurface past info at the right moment. Done well, it lets an AI maintain coherence over very long tasks &#8211; something that pure prompting would struggle with.</p></li><li><p><strong>Tool use and environmental context:</strong> Modern AI agents can use tools (e.g. calling APIs, running code, web browsing) as part of their operation. When they do, each tool&#8217;s output becomes new context for the next model call. Context engineering in this scenario means instructing the model <em>when and how</em> to use tools and then feeding the results back in. 
For example, an agent might have a rule: &#8220;If the user asks a math question, call the calculator tool.&#8221; After using it, the result (say 42) is inserted into the prompt: <em>&#8220;Tool output: 42.&#8221;</em> This requires formatting the tool output clearly and maybe adding a follow-up instruction like <em>&#8220;Given this result, now answer the user&#8217;s question.&#8221;</em> A lot of work in agent frameworks (LangChain, etc.) is essentially context engineering around tool use &#8211; giving the model a list of available tools, syntactic guidelines for invoking them, and templating how to incorporate results. The key is that you, the engineer, <strong>orchestrate</strong> this dialogue between the model and the external world.</p></li><li><p><strong>Information formatting and packaging:</strong> We&#8217;ve touched on this, but it deserves emphasis. Often you have more info than fits or is useful to include fully. So you compress or format it. If your model is writing code and you have a large codebase, you might include just function signatures or docstrings rather than entire files, to give it context. If the user query is verbose, you might highlight the main question at the end to focus the model. Use headings, code blocks, tables &#8211; whatever structure best communicates the data. For example, rather than: &#8220;User data: [massive JSON]&#8230; Now answer question.&#8221; you might extract the few fields needed and present: &#8220;User&#8217;s Name: X, Account Created: Y, Last Login: Z.&#8221; This is both easier for the model to parse and uses fewer tokens. In short, think like a UX designer, but your &#8220;user&#8221; is the LLM &#8211; design the prompt for <strong>its</strong> consumption.</p></li></ul><p>The impact of these techniques is huge. When you see an impressive LLM demo solving a complex task (say, debugging code or planning a multi-step process), you can bet it wasn&#8217;t just a single clever prompt behind the scenes. 
There was a pipeline of context assembly enabling it. </p><p>For instance, an AI pair programmer might implement a workflow like: </p><ol><li><p>Search the codebase for relevant code</p></li><li><p>Include those code snippets in the prompt with the user&#8217;s request</p></li><li><p>If the model proposes a fix, run tests in the background</p></li><li><p>If tests fail, feed the failure output back into the prompt for the model to refine its solution</p></li><li><p>Loop until tests pass. </p></li></ol><p>Each step has carefully engineered context: the search results, the test outputs, etc., are each fed into the model in a controlled way. It&#8217;s a far cry from &#8220;just prompt an LLM to fix my bug&#8221; and hoping for the best.</p><h2>The challenge of context rot</h2><p>As we get better at assembling rich context, we run into a new problem: context can actually poison itself over time. This phenomenon, aptly termed "context rot" by developer <a href="https://news.ycombinator.com/item?id=44308711#44310054">Workaccount2</a> on Hacker News, describes how <strong>context quality degrades as conversations grow longer and accumulate distractions, dead ends, and low-quality information.</strong></p><p>The pattern is frustratingly common: you start a session with a well-crafted context and clear instructions. The AI performs beautifully at first. But as the conversation continues - especially if there are false starts, debugging attempts, or exploratory rabbit holes - the context window fills with increasingly noisy information. The model's responses gradually become less accurate, more confused, or start hallucinating. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RZQa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6aefa3f-7cf3-4aba-b2c3-1ed4aa5ce3d6_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RZQa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6aefa3f-7cf3-4aba-b2c3-1ed4aa5ce3d6_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!RZQa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6aefa3f-7cf3-4aba-b2c3-1ed4aa5ce3d6_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!RZQa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6aefa3f-7cf3-4aba-b2c3-1ed4aa5ce3d6_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!RZQa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6aefa3f-7cf3-4aba-b2c3-1ed4aa5ce3d6_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RZQa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6aefa3f-7cf3-4aba-b2c3-1ed4aa5ce3d6_1024x1024.png" width="550" height="550" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c6aefa3f-7cf3-4aba-b2c3-1ed4aa5ce3d6_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:550,&quot;bytes&quot;:1179851,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/168177841?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6aefa3f-7cf3-4aba-b2c3-1ed4aa5ce3d6_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RZQa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6aefa3f-7cf3-4aba-b2c3-1ed4aa5ce3d6_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!RZQa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6aefa3f-7cf3-4aba-b2c3-1ed4aa5ce3d6_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!RZQa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6aefa3f-7cf3-4aba-b2c3-1ed4aa5ce3d6_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!RZQa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6aefa3f-7cf3-4aba-b2c3-1ed4aa5ce3d6_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Why does this happen? Context windows aren't just storage - they're the model's working memory. When that memory gets cluttered with failed attempts, contradictory information, or tangential discussions, it's like trying to work at a desk covered in old drafts and unrelated papers. The model struggles to identify what's currently relevant versus what's historical noise. Earlier mistakes in the conversation can compound, creating a feedback loop where the model references its own poor outputs and spirals further off track.</p><p>This is especially problematic in iterative workflows - exactly the kind of complex tasks where context engineering shines. Debugging sessions, code refactoring, document editing, or research projects naturally involve false starts and course corrections. 
But each failed attempt leaves traces in the context that can interfere with subsequent reasoning.</p><p>Practical strategies for managing context rot include:</p><ul><li><p><strong>Context pruning and refresh:</strong> Workaccount2's solution is "I work around it by regularly making summaries of instances, and then spinning up a new instance with fresh context and feed in the summary of the previous instance." This approach preserves the essential state while discarding the noise. You're essentially doing garbage collection for your context.</p></li><li><p><strong>Structured context boundaries:</strong> Use clear markers to separate different phases of work. For example, explicitly mark sections as "Previous attempts (for reference only)" versus "Current working context." This helps the model understand what to prioritize.</p></li><li><p><strong>Progressive context refinement:</strong> After significant progress, consciously rebuild the context from scratch. Extract the key decisions, successful approaches, and current state, then start fresh. It's like refactoring code&#8212;occasionally you need to clean up the accumulated cruft.</p></li><li><p><strong>Checkpoint summaries:</strong> At regular intervals, have the model summarize what's been accomplished and what the current state is. Use these summaries as seeds for fresh context when starting new sessions.</p></li><li><p><strong>Context windowing:</strong> For very long tasks, break them into phases with natural boundaries where you can reset context. Each phase gets a clean start with only the essential carry-over from the previous phase.</p></li></ul><p>This challenge also highlights why "just dump everything into the context" isn't a viable long-term strategy. 
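</p><p>Several of these strategies boil down to the same loop: measure how full the context is, and when it crosses a budget, checkpoint and restart. A sketch of that garbage-collection pass, where <code>call_model</code> is a stand-in for whatever LLM client you use and the token estimate is a deliberately crude heuristic (real systems would use a proper tokenizer):</p><pre><code class="language-python">TOKEN_BUDGET = 4000  # assumed per-session budget

def estimate_tokens(messages):
    # Crude heuristic: roughly four characters per token.
    return sum(len(m) // 4 for m in messages)

def maybe_refresh(messages, call_model):
    # When history nears the budget, replace it with a checkpoint summary
    # and carry on with a fresh, compact context.
    if estimate_tokens(messages) >= TOKEN_BUDGET:
        summary = call_model(
            "Summarize the key decisions, current state, and open questions:\n"
            + "\n".join(messages)
        )
        return ["[Checkpoint summary] " + summary]
    return messages
</code></pre><p>Run between turns, this keeps the essential state while discarding the dead ends that cause rot.</p><p>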
Like good software architecture, <strong>good context engineering requires intentional information management</strong> - deciding not just what to include, but when to exclude, summarize, or refresh.</p><h2><strong>Context engineering in the Big Picture of LLM applications</strong></h2><p><strong>Context engineering is crucial, but it&#8217;s just one component of a larger stack needed to build full-fledged LLM applications &#8211; alongside things like control flow, model orchestration, tool integration, and guardrails.</strong></p><p>In Karpathy&#8217;s words, context engineering is <em>&#8220;one small piece of an emerging thick layer of non-trivial software&#8221;</em> that powers real LLM apps. So while we&#8217;ve focused on how to craft good context, it&#8217;s important to see where that fits in the overall architecture.</p><p>A production-grade LLM system typically has to handle many concerns beyond just prompting, for example:</p><ul><li><p><strong>Problem decomposition and control flow:</strong> Instead of treating a user query as one monolithic prompt, robust systems often break the problem down into sub-tasks or multi-step workflows. For instance, an AI agent might first be prompted to outline a plan, then in subsequent steps be prompted to execute each step. Designing this flow (which prompts to call in what order, how to decide branching or looping) is a classic programming task &#8211; except the &#8220;functions&#8221; are LLM calls with context. Context engineering fits here by making sure each step&#8217;s prompt has the info it needs, but the decision <em>to have steps at all</em> is a higher-level design. This is why you see frameworks where you essentially write a script that coordinates multiple LLM calls and tool uses.</p></li><li><p><strong>Model selection and routing:</strong> You might use different AI models for different jobs. 
Perhaps a lightweight model for simple tasks or preliminary answers, and a heavyweight model for final solutions. Or a code-specialized model for coding tasks versus a general model for conversational tasks. The system needs logic to route requests to the appropriate model. Each model might have different context length limits or formatting requirements, which the context engineering must account for (e.g. truncating context more aggressively for a smaller model). This aspect is more engineering than prompting: think of it as matching the tool to the job.</p></li><li><p><strong>Tool integrations and external actions:</strong> If your AI can perform actions (like calling an API, database queries, opening a web page, running code), your software needs to manage those capabilities. That includes providing the AI with a list of available tools and instructions on usage, as well as actually executing those tool calls and capturing the results. As we discussed, the results then become new context for further model calls. Architecturally, this means your app often has a loop: prompt model -&gt; if model output indicates a tool to use -&gt; execute tool -&gt; incorporate result -&gt; prompt model again. Designing that loop reliably is a challenge.</p></li><li><p><strong>User interaction and UX flows:</strong> Many LLM applications involve the user in the loop. For example, a coding assistant might propose changes and then ask the user to confirm applying them. Or a writing assistant might offer a few draft options for the user to pick from. These UX decisions affect context, too. If the user says &#8220;Option 2 looks good but shorten it,&#8221; you need to carry that feedback into the next prompt (e.g. <em>&#8220;The user chose draft 2 and asks to shorten it.&#8221;</em>). Designing a smooth human-AI interaction flow is part of the app, though not directly about prompts. 
Still, context engineering supports it by ensuring each turn&#8217;s prompt accurately reflects the state of the interaction (like remembering which option was chosen, or what the user edited manually).</p></li><li><p><strong>Guardrails and safety:</strong> In production, you have to consider misuse and errors. This might include content filters (to prevent toxic or sensitive outputs), authentication and permission checks for tools (so the AI doesn&#8217;t, say, delete a database because it was in the instructions), and validation of outputs. Some setups use a second model or rules to double-check the first model&#8217;s output. For example, after the main model generates an answer, you might run another check: <em>&#8220;Does this answer contain any sensitive info? If so, redact it.&#8221;</em> Those checks themselves can be implemented as prompts or as code. In either case, they often add additional instructions into the context (like a system message: <em>&#8220;If the user asks for disallowed content, refuse.&#8221;</em> is part of many deployed prompts). So the context might always include some safety boilerplate. Balancing that (ensuring the model follows policy without compromising helpfulness) is yet another piece of the puzzle.</p></li><li><p><strong>Evaluation and monitoring:</strong> Suffice to say, you need to constantly monitor how the AI is performing. Logging every request and response (with user consent and privacy in mind) allows you to analyze failures and outliers. You might incorporate real-time evals &#8211; e.g., scoring the model&#8217;s answers on certain criteria and if the score is low, automatically having the model try again or route to a human fallback. While evaluation isn&#8217;t part of generating a single prompt&#8217;s content, it feeds back into improving prompts and context strategies over time. 
Essentially, you treat the prompt and context assembly as something that can be <em>debugged</em> and optimized using data from production.</p></li></ul><p>We&#8217;re really talking about <strong>a new kind of application architecture</strong>. It&#8217;s one where the core logic involves managing information (context) and adapting it through a series of AI interactions, rather than just running deterministic functions. Karpathy listed elements like control flows, model dispatch, memory management, tool use, verification steps, etc., on top of context filling. All together, they form what he jokingly calls &#8220;an emerging thick layer&#8221; for AI apps &#8211; thick because it&#8217;s doing a lot! When we build these systems, we&#8217;re essentially writing meta-programs: programs that choreograph another &#8220;program&#8221; (the AI&#8217;s output) to solve a task.</p><p>For us software engineers, this is both exciting and challenging. It&#8217;s exciting because it opens capabilities we didn&#8217;t have &#8211; e.g., building an assistant that can handle natural language, code, and external actions seamlessly. It&#8217;s challenging because many of the techniques are new and still in flux. We have to think about things like prompt versioning, AI reliability, and ethical output filtering, which weren&#8217;t standard parts of app development before. In this context, <strong>context engineering lies at the heart</strong> of the system: if you can&#8217;t get the right information into the model at the right time, nothing else will save your app. But as we see, even perfect context alone isn&#8217;t enough; you need all the supporting structure around it.</p><p>The takeaway is that <strong>we&#8217;re moving from prompt design to system design</strong>. Context engineering is a core part of that system design, but it lives alongside many other components. 
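</p><p>The prompt-model, execute-tool, incorporate-result loop described above can be sketched in a few lines. The <code>TOOL:</code> convention and the function names here are illustrative assumptions, not any particular framework's API:</p><pre><code class="language-python">def run_agent(user_goal, call_model, tools, max_steps=5):
    # Orchestrate the model-tool dialogue: each tool result is fed
    # back in as fresh context for the next model call.
    context = [f"Goal: {user_goal}"]
    for _ in range(max_steps):
        reply = call_model("\n".join(context))
        if reply.startswith("TOOL:"):
            name, _, arg = reply[5:].partition(" ")
            result = tools[name](arg)
            context.append(f"Tool output ({name}): {result}")
        else:
            return reply  # final answer, no tool requested
    return "Stopped: step limit reached"
</code></pre><p>Production frameworks add schemas, retries, and guardrails around this skeleton, but the core choreography is exactly this.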
</p><h2><strong>Conclusion</strong></h2><p><strong>Key takeaway:</strong> <em>By mastering the assembly of complete context (and coupling it with solid testing), we can increase the chances of getting the best output from AI models.</em></p><p>For experienced engineers, much of this paradigm is familiar at its core &#8211; it&#8217;s about good software practices &#8211; but applied in a new domain. Think about it:</p><ul><li><p>We always knew <strong>garbage in, garbage out</strong>. Now that principle manifests as &#8220;bad context in, bad answer out.&#8221; So we put more work into ensuring quality input (context) rather than hoping the model will figure it out.</p></li><li><p>We value <strong>modularity and abstraction</strong> in code. Now we&#8217;re effectively abstracting tasks to a high level (describe the task, give examples, let AI implement) and building modular pipelines of AI + tools. We&#8217;re orchestrating components (some deterministic, some AI) rather than writing all logic ourselves.</p></li><li><p>We practice <strong>testing and iteration</strong> in traditional dev. Now we&#8217;re applying the same rigor to AI behaviors, writing evals and refining prompts as one would refine code after profiling.</p></li></ul><p>In embracing context engineering, you&#8217;re essentially saying: <em>I, the developer, am responsible for what the AI does.</em> It&#8217;s not a mysterious oracle; it&#8217;s a component I need to configure and drive with the right data and rules. </p><p>This mindset shift is empowering. It means we don&#8217;t have to treat the AI as unpredictable magic &#8211; we can tame it with solid engineering techniques (plus a bit of creative prompt artistry).</p><p>Practically, how can you adopt this context-centric approach in your work?</p><ul><li><p><strong>Invest in data and knowledge pipelines.</strong> A big part of context engineering is having the data to inject. 
So, build that vector search index of your documentation, or set up that database query that your agent can use. Treat knowledge sources as first-class citizens in development. For example, if your AI assistant is for coding, make sure it can pull in code from the repo or reference the style guide. A lot of the value you&#8217;ll get from an AI comes from the <em>external knowledge</em> you supply to it.</p></li><li><p><strong>Develop prompt templates and libraries.</strong> Rather than ad-hoc prompts, start creating structured templates for your needs. You might have a template for &#8220;answer with citation&#8221; or &#8220;generate code diff given error&#8221;. These become like functions you reuse. Keep them in version control. Document their expected behavior. This is how you build up a toolkit of proven context setups. Over time, your team can share and iterate on these, just as they would on shared code libraries.</p></li><li><p><strong>Use tools and frameworks that give you control.</strong> Avoid black-box &#8220;just give us a prompt, we do the rest&#8221; solutions if you need reliability. Opt for frameworks that let you peek under the hood and tweak things &#8211; whether that&#8217;s a lower-level library like LangChain or a custom orchestration you build. The more visibility and control you have over context assembly, the easier to debug when something goes wrong.</p></li><li><p><strong>Monitor and instrument everything.</strong> In production, log the inputs and outputs (within privacy limits) so you can later analyze them. Use observability tools (like LangSmith, etc.) to trace how context was built for each request. When an output is bad, trace back and see what the model saw &#8211; was something missing? Was something formatted poorly? This will guide your fixes. 
Essentially, treat your AI system as a somewhat unpredictable service that you need to monitor like any other &#8211; dashboards for prompt usage, success rates, etc.</p></li><li><p><strong>Keep the user in the loop.</strong> Context engineering isn&#8217;t just about machine-machine info; it&#8217;s ultimately about solving a user&#8217;s problem. Often, the user can provide context if asked the right way. Think about UX designs where the AI asks clarifying questions or where the user can provide extra details to refine the context (like attaching a file, or selecting which codebase section is relevant). The term &#8220;AI-assisted&#8221; goes both ways &#8211; AI assists user, but user can assist AI by supplying context. A well-designed system facilitates that. For example, if an AI answer is wrong, let the user correct it and feed that correction back into context for next time.</p></li><li><p><strong>Train your team (and yourself).</strong> Make context engineering a shared discipline. In code reviews, start reviewing prompts and context logic too (&#8220;Is this retrieval grabbing the right docs? Is this prompt section clear and unambiguous?&#8221;). If you&#8217;re a tech lead, encourage team members to surface issues with AI outputs and brainstorm how tweaking context might fix it. Knowledge sharing is key because the field is new &#8211; a clever prompt trick or formatting insight one person discovers can likely benefit others. I&#8217;ve personally learned a ton just reading others&#8217; prompt examples and post-mortems of AI failures.</p></li></ul><p>As we move forward, I expect <strong>context engineering to become second nature</strong> &#8211; much like writing an API call or a SQL query is today. It will be part of the standard repertoire of software development. Already, many of us don&#8217;t think twice about doing a quick vector similarity search to grab context for a question; it&#8217;s just part of the flow. 
In a few years, &#8220;Have you set up the context properly?&#8221; will be as common a code review question as &#8220;Have you handled that API response properly?&#8221;</p><p>In embracing this new paradigm, we don&#8217;t abandon the old engineering principles &#8211; we reapply them in new ways. If you&#8217;ve spent years honing your software craft, that experience is incredibly valuable now: it&#8217;s what allows you to design sensible flows, to spot edge cases, to ensure correctness. AI hasn&#8217;t made those skills obsolete; it&#8217;s amplified their importance in guiding AI. The role of the software engineer is not diminishing &#8211; it&#8217;s evolving. We&#8217;re becoming <strong>directors</strong> and <strong>editors</strong> of AI, not just writers of code. And context engineering is the technique by which we direct the AI effectively.</p><p><strong>Start thinking in terms of what information you provide to the model, not just what question you ask.</strong> Experiment with it, iterate on it, and share your findings. By doing so, you&#8217;ll not only get better results from today&#8217;s AI, but you&#8217;ll also be preparing yourself for the even more powerful AI systems on the horizon. Those who understand how to feed the AI will always have the advantage.</p><p>Happy context-coding!</p><p><em>I&#8217;m excited to share that I&#8217;m writing a new <a href="https://www.oreilly.com/library/view/vibe-coding-the/9798341634749/">AI-assisted engineering book</a> with O&#8217;Reilly. 
If you&#8217;ve enjoyed my writing here you may be interested in checking it out.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xA9A!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17b73676-5487-4cf2-8240-813e707703b7_7869x7869.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xA9A!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17b73676-5487-4cf2-8240-813e707703b7_7869x7869.png 424w, https://substackcdn.com/image/fetch/$s_!xA9A!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17b73676-5487-4cf2-8240-813e707703b7_7869x7869.png 848w, https://substackcdn.com/image/fetch/$s_!xA9A!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17b73676-5487-4cf2-8240-813e707703b7_7869x7869.png 1272w, https://substackcdn.com/image/fetch/$s_!xA9A!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17b73676-5487-4cf2-8240-813e707703b7_7869x7869.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xA9A!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17b73676-5487-4cf2-8240-813e707703b7_7869x7869.png" width="1456" height="1456" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/17b73676-5487-4cf2-8240-813e707703b7_7869x7869.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4339581,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/168177841?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17b73676-5487-4cf2-8240-813e707703b7_7869x7869.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xA9A!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17b73676-5487-4cf2-8240-813e707703b7_7869x7869.png 424w, https://substackcdn.com/image/fetch/$s_!xA9A!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17b73676-5487-4cf2-8240-813e707703b7_7869x7869.png 848w, https://substackcdn.com/image/fetch/$s_!xA9A!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17b73676-5487-4cf2-8240-813e707703b7_7869x7869.png 1272w, https://substackcdn.com/image/fetch/$s_!xA9A!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17b73676-5487-4cf2-8240-813e707703b7_7869x7869.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p></p>]]></content:encoded></item><item><title><![CDATA[The AI-Native Software Engineer]]></title><description><![CDATA[A practical playbook for integrating AI into your daily engineering workflow]]></description><link>https://addyo.substack.com/p/the-ai-native-software-engineer</link><guid isPermaLink="false">https://addyo.substack.com/p/the-ai-native-software-engineer</guid><dc:creator><![CDATA[Addy Osmani]]></dc:creator><pubDate>Tue, 01 Jul 2025 18:17:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!WFGE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faba8cf11-c1d1-4cb8-8400-1fa7b7b91d83_5246x5246.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>An </strong><em><strong>AI-native software 
engineer</strong></em><strong> is one who deeply integrates AI into their daily workflow, treating it as a partner to amplify their abilities.</strong></p><p>This requires a fundamental <strong>mindset shift</strong>. Instead of thinking &#8220;AI might replace me,&#8221; an AI-native engineer asks of every task: <em>&#8220;Could AI help me do this faster, better, or differently?&#8221;</em></p><p>The mindset is optimistic and proactive - you see AI as a multiplier of your productivity and creativity, not a threat. With the right approach, <strong>AI could 2x, 5x, or perhaps 10x your output as an engineer</strong>. Experienced developers especially find that their expertise lets them prompt AI in ways that yield high-level results; a senior engineer can get answers akin to what a peer might deliver by asking AI the right questions with appropriate <a href="https://x.com/karpathy/status/1937902205765607626">context-engineering</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!t2_Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ae2c5e0-27c6-4959-b37d-67f2c40b2e09_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!t2_Y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ae2c5e0-27c6-4959-b37d-67f2c40b2e09_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!t2_Y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ae2c5e0-27c6-4959-b37d-67f2c40b2e09_1024x1024.png 848w,
https://substackcdn.com/image/fetch/$s_!t2_Y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ae2c5e0-27c6-4959-b37d-67f2c40b2e09_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!t2_Y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ae2c5e0-27c6-4959-b37d-67f2c40b2e09_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!t2_Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ae2c5e0-27c6-4959-b37d-67f2c40b2e09_1024x1024.png" width="450" height="450" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2ae2c5e0-27c6-4959-b37d-67f2c40b2e09_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:450,&quot;bytes&quot;:1939902,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/165160941?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ae2c5e0-27c6-4959-b37d-67f2c40b2e09_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!t2_Y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ae2c5e0-27c6-4959-b37d-67f2c40b2e09_1024x1024.png 424w, 
https://substackcdn.com/image/fetch/$s_!t2_Y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ae2c5e0-27c6-4959-b37d-67f2c40b2e09_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!t2_Y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ae2c5e0-27c6-4959-b37d-67f2c40b2e09_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!t2_Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ae2c5e0-27c6-4959-b37d-67f2c40b2e09_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Being AI-native means embracing <strong>continuous learning and adaptation</strong> - engineers build software with AI-based assistance and automation baked in from the beginning. This mindset leads to excitement about the possibilities rather than fear.</p><p>Yes, there may be uncertainty and a learning curve - many of us have ridden the emotional rollercoaster of excitement, fear, and back again - but ultimately the goal is to land on excitement and opportunity. The AI-native engineer views AI as a way to delegate the repetitive or time-consuming parts of development (like boilerplate coding, documentation drafting, or test generation) and free themselves to focus on higher-level problem solving and innovation.</p><p><strong>Key principle - AI as collaborator, not replacement:</strong> An AI-native engineer treats AI like a knowledgeable, if junior, pair-programmer who is available 24/7.</p><p>You still drive the development process, but you constantly leverage the AI for ideas, solutions, and even warnings. For example, you might use an AI assistant to brainstorm architectural approaches, then refine those ideas with your own expertise. This collaboration can dramatically speed up development while also enhancing quality - <em>if</em> you maintain oversight. </p><p>Importantly, you don&#8217;t abdicate responsibility to the AI. Think of it as working with a junior developer who has read every StackOverflow post and API doc: they have a ton of information and can produce code quickly, but <strong>you are responsible for guiding them and verifying the output</strong>. 
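</p><p>In practice, that verification can be made partly mechanical. Below is a minimal sketch (the <code>verify_suggestion</code> helper and the sample <code>slugify</code> suggestions are invented for illustration) of gating an AI-suggested function behind known test cases before accepting it:</p>

```python
def verify_suggestion(code: str, fn_name: str, cases) -> bool:
    """Accept AI-suggested code only if it defines fn_name and passes every known case."""
    namespace = {}
    try:
        exec(code, namespace)  # load the suggestion into an isolated namespace
    except Exception:
        return False  # the suggestion does not even parse or run
    fn = namespace.get(fn_name)
    if not callable(fn):
        return False
    try:
        return all(fn(*args) == expected for args, expected in cases)
    except Exception:
        return False  # the suggestion crashes on a known input

# One plausible suggestion and one subtly wrong one (both invented):
good = "def slugify(s):\n    return s.strip().lower().replace(' ', '-')"
bad = "def slugify(s):\n    return s.lower().replace(' ', '_')"
cases = [(("Hello World",), "hello-world"), ((" Trim Me ",), "trim-me")]
```

<p>However the check is implemented, the principle is the same: the assistant proposes, but your tests and your review decide what gets accepted.</p><p>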
This &#8220;<a href="https://addyo.substack.com/p/the-trust-but-verify-pattern-for">trust, but verify</a>&#8221; mindset is crucial and we&#8217;ll revisit it later.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qzj1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ff8ce3d-bcdc-45ed-933b-c0a1038c63ea_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qzj1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ff8ce3d-bcdc-45ed-933b-c0a1038c63ea_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!qzj1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ff8ce3d-bcdc-45ed-933b-c0a1038c63ea_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!qzj1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ff8ce3d-bcdc-45ed-933b-c0a1038c63ea_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!qzj1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ff8ce3d-bcdc-45ed-933b-c0a1038c63ea_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qzj1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ff8ce3d-bcdc-45ed-933b-c0a1038c63ea_1024x1024.png" width="488" height="488" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4ff8ce3d-bcdc-45ed-933b-c0a1038c63ea_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:488,&quot;bytes&quot;:2016817,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/165160941?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ff8ce3d-bcdc-45ed-933b-c0a1038c63ea_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!qzj1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ff8ce3d-bcdc-45ed-933b-c0a1038c63ea_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!qzj1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ff8ce3d-bcdc-45ed-933b-c0a1038c63ea_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!qzj1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ff8ce3d-bcdc-45ed-933b-c0a1038c63ea_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!qzj1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ff8ce3d-bcdc-45ed-933b-c0a1038c63ea_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Let's be blunt: AI-generated slop is real and is not an excuse for <a href="https://addyo.substack.com/p/vibe-coding-is-not-an-excuse-for">low-quality work</a>. A persistent risk in using these tools is a combination of rubber-stamped suggestions, subtle hallucinations, and simple laziness that falls far below professional engineering standards. This is why the "verify" part of the mantra is non-negotiable. As the engineer, you are not just a user of the tool; you are the ultimate guarantor. 
You remain fully and directly responsible for the quality, readability, security, and correctness of every line of code you commit.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Qzt7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61fdf82a-2bcc-4a81-853e-779af54c24a2_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Qzt7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61fdf82a-2bcc-4a81-853e-779af54c24a2_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!Qzt7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61fdf82a-2bcc-4a81-853e-779af54c24a2_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!Qzt7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61fdf82a-2bcc-4a81-853e-779af54c24a2_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!Qzt7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61fdf82a-2bcc-4a81-853e-779af54c24a2_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Qzt7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61fdf82a-2bcc-4a81-853e-779af54c24a2_1024x1024.png" width="387" height="387" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/61fdf82a-2bcc-4a81-853e-779af54c24a2_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:387,&quot;bytes&quot;:2240492,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/165160941?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61fdf82a-2bcc-4a81-853e-779af54c24a2_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Qzt7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61fdf82a-2bcc-4a81-853e-779af54c24a2_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!Qzt7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61fdf82a-2bcc-4a81-853e-779af54c24a2_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!Qzt7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61fdf82a-2bcc-4a81-853e-779af54c24a2_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!Qzt7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61fdf82a-2bcc-4a81-853e-779af54c24a2_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Key principle - Every engineer is a manager now:</strong> The role of the engineer is fundamentally changing. With AI agents, you orchestrate the work rather than executing all of it yourself.</p><p>You remain responsible for every commit into main, but you focus more on defining and &#8220;assigning&#8221; the work to get there. In the not-too-distant future we may increasingly say &#8220;<a href="https://x.com/levie/status/1938647740554092586">Every engineer is a manager now</a>.&#8221; Legitimate work can be directed to background agents like Jules or Codex, or you can task Claude Code, Gemini CLI, or OpenCode with chewing through an analysis or code migration project. The engineer needs to intentionally shape the codebase so that it&#8217;s easier for the AI to work with, using rule files (e.g.
GEMINI.md), good READMEs, and well-structured code. This puts the engineer into the role of <a href="https://www.infoworld.com/article/3994519/the-tough-task-of-making-ai-code-production-ready.html">supervisor, mentor, and validator</a>. AI-first teams are smaller, able to accomplish more, and capable of <a href="https://www.forrester.com/blogs/appgen-is-here-say-goodbye-to-software-development-as-you-know-it/">compressing steps of the SDLC</a> to deliver better quality, <a href="https://newsletter.getdx.com/p/how-much-does-ai-impact-development-speed">faster</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wUui!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f7cfea4-0f99-4c8d-81bd-d4c81070eee4_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wUui!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f7cfea4-0f99-4c8d-81bd-d4c81070eee4_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!wUui!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f7cfea4-0f99-4c8d-81bd-d4c81070eee4_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!wUui!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f7cfea4-0f99-4c8d-81bd-d4c81070eee4_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!wUui!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f7cfea4-0f99-4c8d-81bd-d4c81070eee4_1024x1024.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!wUui!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f7cfea4-0f99-4c8d-81bd-d4c81070eee4_1024x1024.png" width="483" height="483" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0f7cfea4-0f99-4c8d-81bd-d4c81070eee4_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:483,&quot;bytes&quot;:1989439,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/165160941?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f7cfea4-0f99-4c8d-81bd-d4c81070eee4_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wUui!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f7cfea4-0f99-4c8d-81bd-d4c81070eee4_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!wUui!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f7cfea4-0f99-4c8d-81bd-d4c81070eee4_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!wUui!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f7cfea4-0f99-4c8d-81bd-d4c81070eee4_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!wUui!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f7cfea4-0f99-4c8d-81bd-d4c81070eee4_1024x1024.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>High-level benefits:</strong> By fully embracing AI in your workflow, you can achieve some serious productivity leaps, potentially shipping more features faster without sacrificing quality (this of course has nuance such as keeping task complexity in mind).</p><p>Routine tasks (from formatting code to writing unit tests) can be handled in seconds. Perhaps more importantly, AI can augment your understanding: it&#8217;s like having an expert on call to explain code or propose solutions in areas outside your normal expertise. 
The result is that an AI-native engineer can take on more ambitious projects or handle the same workload with a smaller team. In essence, <strong>AI extends what you&#8217;re capable of</strong>, allowing you to work at a higher level of abstraction. The caveat is that it requires skill to use effectively - that&#8217;s where the right mindset and practices come in.</p><p><strong>Example - Mindset in action:</strong> Imagine you&#8217;re debugging a tricky issue or evaluating a new tech stack. A traditional approach might involve lots of Googling or reading documentation. An AI-native approach is to engage an AI assistant that supports Search grounding or deep research: describe the bug or ask for pros/cons of the tech stack, and let the AI provide insights or even code examples.</p><p>You remain in charge of interpretation and implementation, but the AI accelerates gathering information and possible solutions. This collaborative problem-solving becomes second nature once you get used to it. Make it a habit to ask, <em>&#8220;How can AI help with this task?&#8221;</em> until it&#8217;s reflex. Over time you&#8217;ll develop instincts for what AI is good at and how to prompt it effectively.</p><p>In summary, <strong>being AI-native means internalizing AI as a core part of how you think about solving problems and building software</strong>. It&#8217;s a mindset of partnership with machines: using their strengths (speed, knowledge, pattern recognition) to complement your own (creativity, judgment, context). With this foundation in mind, we can move on to practical steps for integrating AI into your daily work.</p><h2><strong>Getting Started - Integrating AI into your daily workflow</strong></h2><p>Adopting an AI-native workflow can feel daunting if you&#8217;re completely new to it. The key is to <strong>start small and build up</strong> your AI fluency over time. 
In this section, we&#8217;ll provide concrete guidance to go from zero to productive with AI in your day-to-day engineering tasks. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OGxs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ef50d3c-e6f1-4b5e-a850-2177e561bbc1_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OGxs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ef50d3c-e6f1-4b5e-a850-2177e561bbc1_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!OGxs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ef50d3c-e6f1-4b5e-a850-2177e561bbc1_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!OGxs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ef50d3c-e6f1-4b5e-a850-2177e561bbc1_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!OGxs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ef50d3c-e6f1-4b5e-a850-2177e561bbc1_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OGxs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ef50d3c-e6f1-4b5e-a850-2177e561bbc1_1536x1024.png" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3ef50d3c-e6f1-4b5e-a850-2177e561bbc1_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2100732,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/165160941?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ef50d3c-e6f1-4b5e-a850-2177e561bbc1_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OGxs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ef50d3c-e6f1-4b5e-a850-2177e561bbc1_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!OGxs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ef50d3c-e6f1-4b5e-a850-2177e561bbc1_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!OGxs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ef50d3c-e6f1-4b5e-a850-2177e561bbc1_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!OGxs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ef50d3c-e6f1-4b5e-a850-2177e561bbc1_1536x1024.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><em>The above is a speculative look at where we may end up with AI in the software lifecycle. I continue to strongly believe human-in-the-loop (engineering, design, product, UX etc) will be needed to ensure that quality doesn&#8217;t suffer.</em></p><p><strong>Step 1: The first change? You often start with AI.</strong> </p><p>An AI-native workflow isn&#8217;t about occasionally looking for tasks AI can help with; it's often about giving the task to an AI model first to see how it performs. <a href="https://www.ignorance.ai/p/ai-at-pulley">One team noted</a>: </p><blockquote><p>The typical workflow involves giving the task to an AI model first (via Cursor or a CLI program)... with the understanding that plenty of tasks are still hit or miss. </p></blockquote><p>Are you studying a domain or a competitor?
Start with Gemini Deep Research. Find yourself stuck in an endless debate over some aspect of design? While your team argued, you could have built three prototypes with AI to prove out the idea. Googlers are already <a href="https://x.com/rmedranollamas/status/1938305816185966898">using AI to build slides, debug production incidents, and much more</a>.</p><p>When you hear &#8220;But LLMs hallucinate and chatbots give lousy answers&#8221;, it's time to update your toolchain. Anybody <a href="https://fly.io/blog/youre-all-nuts/">seriously coding with AI today is using agents</a>. Hallucinations can be significantly mitigated and managed with proper <a href="https://blog.langchain.com/the-rise-of-context-engineering/">context engineering</a> and agentic feedback loops. The mindset shift is foundational: all of us should be AI-first right now.</p><p><strong>Step 2: Get the right AI tools in place.</strong> </p><p>To integrate AI smoothly, you&#8217;ll want to set up at least one coding assistant in your environment. Many engineers start with <strong>GitHub Copilot</strong> in VS Code, which offers code autocomplete and code generation. If you use an IDE like VS Code, consider installing an AI extension (for example, <strong>Cursor</strong> is a dedicated AI-enhanced code editor, and <strong><a href="https://addyo.substack.com/p/why-i-use-cline-for-ai-engineering">Cline</a></strong> is a VS Code plugin for an AI agent - more on these later). These tools are great for beginners because they work in the background, suggesting code in real-time for whatever file you&#8217;re editing. Outside your editor, you might also explore <strong>ChatGPT, Gemini, or Claude</strong> in a separate window for question-answer style assistance. Starting with tooling is important because it lowers the friction to use AI.
Once installed, the AI is only a keystroke away whenever you think &#8220;maybe the AI can help with this.&#8221;</p><p><strong>Step 3: Learn prompt basics - be specific and provide context.</strong> </p><p>Using AI effectively is a skill, and the core of that skill is <strong><a href="https://addyo.substack.com/p/the-prompt-engineering-playbook-for">prompt engineering</a></strong>. A common mistake new users make is giving the AI an overly vague instruction and then being disappointed with the result. Remember, the AI isn&#8217;t a mind reader; it reacts to the prompt you give. A little extra context or clarity goes a long way. For instance, if you have a piece of code and you want an explanation or unit tests for it, don&#8217;t just say <em>&#8220;Write tests for this.&#8221;</em> Instead, <strong>describe the code&#8217;s intended behavior and requirements in your prompt</strong>. Compare these two prompts for writing tests for a React login form component:</p><ul><li><p><strong>Poor prompt:</strong> &#8220;Can you write tests for my React component?&#8221;</p></li><li><p><strong>Better prompt:</strong> &#8220;I have a LoginForm React component with an email field, password field, and submit button. It displays a success message on successful submit and an error message on failure, via an onSubmit callback. <strong>Please write a Jest test file</strong> that: (1) renders the form, (2) fills in valid and invalid inputs, (3) submits the form, (4) asserts that onSubmit is called with the right data, and (5) checks that success and error states render appropriately.&#8221;</p></li></ul><p>The second prompt is longer, but it gives the AI exactly what we need. The result will be <strong>far more accurate and useful</strong> because the AI isn&#8217;t guessing at our intentions - we spelled them out. In practice, spending an extra minute to clarify your prompt can save you <strong>hours</strong> of fixing AI-generated code later. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NG0k!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d693ee6-61be-4250-ad42-b43ace337365_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NG0k!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d693ee6-61be-4250-ad42-b43ace337365_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!NG0k!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d693ee6-61be-4250-ad42-b43ace337365_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!NG0k!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d693ee6-61be-4250-ad42-b43ace337365_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!NG0k!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d693ee6-61be-4250-ad42-b43ace337365_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NG0k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d693ee6-61be-4250-ad42-b43ace337365_1024x1024.png" width="411" height="411" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4d693ee6-61be-4250-ad42-b43ace337365_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:411,&quot;bytes&quot;:1816939,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/165160941?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d693ee6-61be-4250-ad42-b43ace337365_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NG0k!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d693ee6-61be-4250-ad42-b43ace337365_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!NG0k!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d693ee6-61be-4250-ad42-b43ace337365_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!NG0k!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d693ee6-61be-4250-ad42-b43ace337365_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!NG0k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d693ee6-61be-4250-ad42-b43ace337365_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Effective prompting is such an important skill that Google has published entire guides on it (see <strong><a href="https://workspace.google.com/learning/content/gemini-prompt-guide">Google&#8217;s Prompting Guide 101</a></strong> for a great starting point). As you practice, you&#8217;ll get a feel for how to phrase requests. A couple of quick tips: be clear about the format you want (e.g., &#8220;return the output as JSON&#8221;), break complex tasks into ordered steps or bullet points in your prompt, and provide examples when possible. These techniques help the AI understand your request better.</p><p><strong>Step 4: Use AI for code generation and completion.</strong> </p><p>With tools set up and a grasp of how to prompt, start applying AI to actual coding tasks. A good first use-case is generating boilerplate or repetitive code.
For instance, if you need a function to parse a date string in multiple formats, ask the AI to draft it. You might say: <em>&#8220;Write a Python function that takes a date string which could be in formats X, Y, or Z, and returns a datetime object. Include error handling for invalid formats.&#8221;</em></p><p>The AI will produce an initial implementation. <strong>Don&#8217;t accept it blindly</strong> - read through it and run tests. This hands-on practice builds your intuition for when the AI is reliable. Many developers are pleasantly surprised at how the AI produces a decent solution in seconds, which they can then tweak. Over time, you can move to more significant code generation tasks, like scaffolding entire classes or modules. As an example, <strong>Cursor</strong> even offers features to generate entire files or refactor code based on a description. Early on, lean on the AI for <em>helper code</em> - things you understand but would take time to write - rather than core algorithmic logic that&#8217;s critical. This way, you build confidence in the AI&#8217;s capabilities on low-risk tasks.</p><p><strong>Step 5: Integrate AI into non-coding tasks.</strong> </p><p>Being AI-native isn&#8217;t just about writing code faster; it&#8217;s about improving all facets of your work. A great way to start is using AI for writing or analysis tasks that surround coding. For example, try using AI to <strong>write a commit message or a Pull Request description</strong> after you make code changes. You can paste a git diff and ask, &#8220;Summarize these changes in a professional PR description.&#8221; The AI will draft something that you can refine.</p><p>This is a key differentiator between casual users and true AI-native engineers.
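</p><p>To make Step 4 concrete, here is the kind of draft the date-parsing prompt above might come back with. The three formats below are hypothetical stand-ins for the &#8220;X, Y, or Z&#8221; in the prompt:</p>

```python
from datetime import datetime

# Hypothetical stand-ins for the "formats X, Y, or Z" in the prompt.
KNOWN_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y"]

def parse_date(date_string: str) -> datetime:
    """Try each known format in turn; raise ValueError if none match."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(date_string, fmt)
        except ValueError:
            continue  # wrong format, try the next one
    raise ValueError(f"Unrecognized date format: {date_string!r}")
```

<p>Treat output like this as a draft, not a verdict: run it against real samples (does &#8220;01/02/2026&#8221; mean January 2nd in your data?) before merging.</p><p>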
<strong>The best engineers have always known that their primary value isn't just typing code, but in the thinking, planning, research, and communication that surrounds it.</strong> Applying AI to these areas - to accelerate research, clarify documentation, or structure a project plan - is a massive force multiplier. Seeing AI as an assistant for the entire engineering process, not just the coding part, is critical to unlocking its full potential for velocity and innovation.</p><p>Along these lines, use AI to <strong>document code</strong>: have it generate docstrings or even entire sections of technical documentation based on your codebase. Another idea is to use AI for <strong>planning</strong> - if you&#8217;re not sure how to implement a feature, describe the requirement and ask the AI to outline a possible approach. This can give you a starting blueprint which you then adjust. Don&#8217;t forget about everyday communications: many engineers use AI to draft emails or Slack messages, especially when communicating complex ideas.</p><p>For instance, if you need to explain to a product manager why a certain bug is tricky, you can ask the AI to help articulate the explanation clearly. This might sound trivial, but it&#8217;s a real productivity boost and helps ensure you communicate effectively. Remember, <em>&#8220;it&#8217;s not always all about the code&#8221;</em> - AI can assist in meetings, brainstorming, and articulating ideas too. An AI-native engineer leverages these opportunities.</p><p><strong>Step 6: Iterate and refine through feedback.</strong> </p><p>As you begin using AI day-to-day, treat it as a learning process for yourself. Pay attention to where the AI&#8217;s output needed fixing and try to deduce why. Was the prompt incomplete? Did the AI assume the wrong context? Use that feedback to craft better prompts next time. 
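</p><p>One way to act on that feedback is to promote one-off prompts into small reusable helpers you refine over time. As a sketch, Step 5&#8217;s diff-to-PR-description flow might look like this (the prompt wording and helper names are illustrative, not a standard):</p>

```python
import subprocess

# Illustrative wording - tune it to match your team's PR template.
PROMPT_TEMPLATE = (
    "Summarize these changes in a professional PR description. "
    "Cover what changed, why, and anything risky to review closely.\n\n"
    "DIFF:\n{diff}"
)

def build_pr_prompt(diff: str) -> str:
    """Wrap a git diff in the reusable prompt above."""
    return PROMPT_TEMPLATE.format(diff=diff.strip())

def staged_diff() -> str:
    """Return whatever is currently staged in git."""
    result = subprocess.run(
        ["git", "diff", "--staged"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# Typical use: paste build_pr_prompt(staged_diff()) into your assistant.
```

<p>Each time a draft disappoints, edit the template once and every future PR benefits.</p><p>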
Most AI coding assistants allow an iterative process: you can say &#8220;Oops, that function is not handling empty inputs correctly, please fix that&#8221; and the AI will refine its answer. Take advantage of this interactivity - it&#8217;s often faster to correct an AI&#8217;s draft by telling it what to change than writing from scratch.</p><p>Over time, you&#8217;ll develop a library of prompt patterns that work well. For example, you might discover that <em>&#8220;Explain X like I&#8217;m a new team member&#8221;</em> yields a very good high-level explanation of a piece of code for documentation purposes. Or that providing a short example input and output in your prompt dramatically improves an AI&#8217;s answer for data transformation tasks. Build these discoveries into your workflow.</p><p><strong>Step 7: Always verify and test AI outputs.</strong> </p><p>This cannot be stressed enough: <strong>never assume the AI is 100% correct</strong>. Even if the code compiles or the answer looks reasonable, do your due diligence. Run the code, write additional tests, or sanity-check the reasoning. Many AI-generated solutions work on the surface but fail on edge cases or have subtle bugs.</p><p>You are the engineer; the AI is an assistant. Use all your normal best practices (code reviews, testing, static analysis) on AI-written code just as you would on human-written code. In practice, this means budgeting some time to go through what the AI produced. The good news is that reading and understanding code is usually faster than writing it from scratch, so even with verification, you come out ahead productivity-wise.</p><p>As you gain experience, you&#8217;ll also learn which kinds of tasks the AI is <strong>weak</strong> at - for example, many LLMs struggle with precise arithmetic or highly domain-specific logic - and you&#8217;ll know to double-check those parts extra carefully or perhaps avoid using AI for those. 
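</p><p>That double-checking can be as light as a few edge-case assertions. Suppose the AI drafted this (hypothetical) moving-average helper; probing the edges takes seconds and surfaces decisions it made silently:</p>

```python
def moving_average(values, window):
    """An imagined AI-drafted helper: mean of each sliding window."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# The happy path looks fine...
assert moving_average([1, 2, 3, 4], 2) == [1.5, 2.5, 3.5]

# ...but probe the edges before trusting it:
assert moving_average([], 1) == []        # empty input: returns [], no crash
assert moving_average([1, 2], 5) == []    # window > data: silently returns []
# Is [] the right answer for an oversized window, or should it raise?
# That's a decision the AI made for you - notice it and decide.
```

<p>If an edge case fails, feed it straight back into the next prompt.</p><p>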
Building this intuition ensures that by the time you trust an AI-generated change enough to commit or deploy, you&#8217;ve mitigated risks. A useful mental model is to <strong>treat AI like a highly efficient but not infallible teammate</strong>: you value its contributions but always perform the final review yourself.</p><p><strong>Step 8: Expand to more complex uses gradually.</strong> </p><p>Once you&#8217;re comfortable with AI handling small tasks, you can explore more advanced integrations. For example, move from using AI in a reactive way (asking for help when you think of it) to a proactive way: let the AI monitor as you code. Tools like <strong>Cursor</strong> or <strong>Windsurf</strong> can run in agent mode where they watch for errors or TODO comments and suggest fixes automatically. Or you might try an <strong>autonomous agent</strong> mode like what <strong>Cline</strong> offers, where the AI can plan out a multi-step task (create a file, write code in it, run tests, etc.) with your approval at each step.</p><p>These advanced uses can unlock even greater productivity, but they also require more vigilance (imagine giving a junior dev more autonomy - you&#8217;d still check in regularly).</p><p>A powerful intermediate step is to use AI for <strong>end-to-end prototyping</strong>. For instance, challenge yourself on a weekend to build a simple app using mostly AI assistance: describe the app you want and see how far a tool like Replit&#8217;s AI or Bolt can get you, then use your skills to fill the gaps. This kind of exercise is fantastic for understanding the current limits of AI and learning how to direct it better. And it&#8217;s fun - you&#8217;ll feel like you have a superpower when, in a couple of hours, you have a working prototype that might have taken days or weeks to code by hand.</p><p>By following these steps and ramping up gradually, you&#8217;ll go from an AI novice to someone who instinctively weaves AI into their development workflow. 
The next section will dive deeper into the landscape of <strong>tools and platforms</strong> available - knowing what tool to use for which job is an important part of being productive with AI.</p><h2><strong>AI Tools and Platforms - from prototyping to production</strong></h2><p>One of the reasons it&#8217;s an exciting time to be an engineer is the sheer variety of AI-powered tools now available. As an AI-native software engineer, part of your skillset is knowing <strong>which tools to leverage for which tasks</strong>. In this section, we&#8217;ll survey the landscape of AI coding tools and platforms, and offer guidance on choosing and using them effectively. We&#8217;ll broadly categorize them into two groups - <strong>AI coding assistants</strong> (which integrate into your development environment to help with code you write) and <strong>AI-driven prototyping tools</strong> (which can generate entire project scaffolds or applications from a prompt). Both are valuable, but they serve different needs.</p><p><em>Before diving into specific tools, it's crucial for any professional to adopt a "data privacy firewall" as a core part of their mindset. Always ask yourself: "Would I be comfortable with this prompt and its context being logged on a third-party server?" This discipline is fundamental to using these tools responsibly. An AI-native engineer learns to distinguish between tasks safe for a public cloud AI and tasks that demand an enterprise-grade, privacy-focused, or even a self-hosted, local model.</em></p><h3><strong>AI Coding Assistants in the IDE</strong></h3><p>These tools act like an &#8220;AI pair programmer&#8221; integrated with your editor or IDE. They are invaluable when you&#8217;re working on an existing codebase or building a project in a traditional way (writing code, file by file). 
Here are some notable examples and their nuances:</p><ul><li><p><strong>GitHub&#8239;Copilot </strong>has transformed from an autocomplete tool into a true coding agent: once you assign it an issue or task, it can autonomously analyze your codebase, spin up environments (like via GitHub Actions), propose multi&#8209;file edits, run commands/tests, fix errors, and submit draft pull requests complete with its reasoning in the logs. Built on state&#8209;of&#8209;the&#8209;art models, it supports multi&#8209;model selection and leverages Model Context Protocol (MCP) to integrate external tools and workspace context, enabling it to navigate complex repo structures including monorepos, CI pipelines, image assets, API dependencies, and more. Despite these advances, it&#8217;s optimized for low&#8209; to medium&#8209;complexity tasks and still requires human oversight - especially for security, deep architectural work, and multi&#8209;agent coordination</p></li><li><p><strong>Cursor - AI-native code editor:</strong> <strong>Cursor</strong> is a modified VS Code editor with AI deeply integrated. Unlike Copilot, which is an add-on, Cursor is built around AI from the ground up. It can do things like AI-aware navigation (ask it to find where a function is used, etc.) and smart refactorings. Notably, Cursor has features to generate tests, explain code, and even an &#8220;Agent&#8221; mode where it will attempt larger tasks on command. Cursor&#8217;s philosophy is to &#8220;supercharge&#8221; a developer, especially in <strong>large codebases</strong>. If you&#8217;re working in a monorepo or enterprise-scale project, Cursor&#8217;s ability to understand project-wide context (and even customize it with project-specific rules using something like a .cursorrules file) can be a game changer. Many developers use Cursor in &#8220;Ask&#8221; mode to begin with - you ask for what you want, get confirmation, then let it apply changes - which helps ensure it does the right thing.
The trade-off with Cursor is that it&#8217;s a standalone editor (though familiar to VS Code users) and currently it&#8217;s a paid product. It&#8217;s very popular, with millions of developers using it, including in enterprises, which speaks to its effectiveness.</p></li><li><p><strong>Windsurf - AI agent for coding with large context:</strong> <strong>Windsurf</strong> is another AI-augmented development environment. Windsurf emphasizes enterprise needs: it has strong data privacy (no data retention, self-hosting options) and even compliance certifications like HIPAA and FedRAMP, making it attractive for companies concerned about code security. Functionally, Windsurf can do many of the same assistive tasks (code completion, suggesting changes, etc.), but anecdotally it&#8217;s especially useful in scenarios where you might feed entire files or lots of documentation to the AI. If you are working on a codebase with tens of thousands of lines and need the AI to be aware of most of it (for instance, a sweeping refactor across many files), a tool like Windsurf is worth considering.</p></li><li><p><strong>Cline - autonomous AI coding agent for VS Code:</strong> <strong>Cline</strong> takes a unique approach by acting as an <em>autonomous agent</em> within your editor. It&#8217;s an open-source VS Code extension that not only suggests code, but can create files, execute commands, and perform multi-step tasks with your permission. Cline operates in dual modes: <em>Plan</em> (where it outlines what it intends to do) and <em>Act</em> (where it executes those steps) under human supervision. The idea is to let the AI handle more complex chores, like setting up a whole feature: it could plan &#8220;Add a new API endpoint, including route, controller, and database migration&#8221; and then implement each part, asking for confirmation. 
This aligns AI assistance with professional engineering workflows by giving the developer <strong>control and visibility into each step</strong>. I&#8217;ve noted that Cline &#8220;treats AI not just as a code generator but as a systems-level engineering tool&#8221; meaning it can reason about the project structure and coordinate multiple changes coherently. The downsides: because it can run code or modify many files, you have to be careful and review its plans. There&#8217;s also cost if you connect it to powerful models (some users note it can use a lot of tokens, hence $$, when running very autonomously). But for serious use - say you want to quickly prototype a new module in your app with tests and docs - Cline can be incredibly powerful. It&#8217;s like having an eager junior engineer that asks &#8220;Should I proceed with doing X?&#8221; at each step. Many developers appreciate this more collaborative style (Cline &#8220;asks more questions&#8221; by design) because it reduces the chance of the AI going off-track.<br></p></li></ul><p>Use <strong>AI coding assistants</strong> when you&#8217;re iteratively building or maintaining a codebase - these tools fit naturally into your cycle of edit&#8209;compile&#8209;test. They&#8217;re ideal for tasks like writing new functions (just type a signature and they&#8217;ll often co&#8209;complete the body), refactoring (&#8220;refactor this function to be more readable&#8221;), or understanding unfamiliar code (&#8220;explain this code&#8221; - and you get a concise summary). They&#8217;re not meant to build an entire app in one pass; instead, they augment your day&#8209;to&#8209;day workflow. 
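</p><p>The Plan/Act pattern described above is, at its core, a simple control loop: nothing executes without an explicit yes. A conceptual sketch (not Cline&#8217;s actual implementation):</p>

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Step:
    description: str        # e.g. "Add the /api/jobs route"
    run: Callable[[], str]  # the action that applies this change

def plan_act_loop(steps: List[Step],
                  approve: Callable[[str], bool]) -> List[str]:
    """Run each planned step only after explicit human approval."""
    log = []
    for step in steps:
        if approve(f"Should I proceed with: {step.description}?"):
            log.append(f"done: {step.run()}")
        else:
            log.append(f"skipped: {step.description}")
    return log
```

<p>Real agents plan steps dynamically and revise after failures, but the control structure - plan, ask, act - is the part that keeps you in charge.</p><p>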
For seasoned engineers, invoking an AI assistant becomes second nature - like an on&#8209;demand search engine - used dozens of times daily for quick help or insights.</p><p>Under the hood, modern <strong>asynchronous coding agents</strong> like <a href="https://openai.com/codex/">OpenAI&#8239;Codex</a> and <a href="https://jules.google/">Google&#8217;s Jules</a> go a step further. Codex operates as an autonomous cloud agent - handling parallel tasks in isolated sandboxes: writing features, fixing bugs, running tests, generating full PRs - then presents logs and diffs for review.</p><p>Google&#8217;s Jules, powered by Gemini&#8239;2.5&#8239;Pro, brings asynchronous autonomy to your GitHub workflow: you assign an issue (such as upgrading Next.js), it clones your repo in a VM, plans its multi&#8209;file edits, executes them, summarizes the changes (including an audio recap), and issues a pull request - all while you continue working. These agents differ from inline autocomplete: they&#8217;re autonomous collaborators that tackle defined tasks in the background and return completed work for your review, letting you stay focused on higher&#8209;level challenges.</p><h3><strong>AI-Driven prototyping and MVP builders</strong></h3><p>Separate from the in-IDE assistants, a new class of tools can generate entire working applications or substantial chunks of them from high-level prompts. These are great when you want to <strong>bootstrap a new project or feature quickly</strong> - essentially to get from zero to a first version (the &#8220;v0&#8221;) with minimal manual coding.
They won&#8217;t usually produce final production-quality code without further iteration, but they create a remarkable starting point.</p><ul><li><p><strong><a href="http://bolt.new">Bolt (bolt.new)</a></strong> - <em>one-prompt full-stack app generator:</em> <strong>Bolt</strong> is built on the premise that you can type a natural language description of an app and get a deployable full-stack MVP in minutes. For example, you might say &#8220;A job board with user login and an admin dashboard&#8221; and Bolt will generate a React frontend (using Tailwind CSS for styling) and a Node.js/Prisma backend with a database, complete with the basic models for jobs and users. In testing, Bolt has proven to be <strong>extremely fast</strong> - often assembling a project in 15 seconds or so. The output code is generally clean and follows modern practices (React components, REST/GraphQL API, etc.), so you can open it in your IDE and continue development. Bolt excels at <strong>rapid iteration</strong>: you can tweak your prompt and regenerate, or use its UI to adjust what it built. It even has an &#8220;export to GitHub&#8221; feature for convenience. This makes it ideal for <strong>founders, hackathon participants, or any developer who wants to shortcut the initial setup of an app</strong>. The trade-off is that Bolt&#8217;s creativity is bounded by its training - it might use certain styling by default and might not handle very unique requirements without guidance. But as a starting point, it&#8217;s often impressive. In comparisons, users noted Bolt produces great-looking UIs very consistently and was a top pick for quickly getting a prototype UI that &#8220;wows&#8221; users or stakeholders.</p></li><li><p><strong><a href="http://v0.dev">v0 (v0.dev by Vercel)</a></strong> - <em>text to Next.js app generator:</em> <strong>v0</strong> is a tool from Vercel that similarly generates apps, especially focusing on Next.js (since Vercel is behind Next.js). 
You give it a prompt for what you want, and it creates a project. One thing to note about v0: it has a distinct design aesthetic. Testers observed that <strong>v0 tends to style everything in the popular ShadCN UI style</strong> - basically a trendy minimalist component library - whether you asked for it or not. This can be good if you like that style out of the box, but it means if you wanted a very custom design, v0 might not match it precisely. In one comparison, v0 was found to &#8220;re-theme designs&#8221; towards its default look instead of faithfully matching a given spec. So, v0 might be best if your goal is a quick <em>functional</em> prototype and you&#8217;re flexible on appearance. The code output is usually Next.js React code with whatever backend you specify (it might set up a simple API or use Vercel&#8217;s Edge Functions, etc.). As part of Vercel&#8217;s ecosystem, it&#8217;s also oriented toward <em>deployability</em> - the idea is you could take what it gives you and deploy on Vercel immediately. If you&#8217;re a fan of Next.js or building a web product that you plan to host on Vercel, v0 is a natural choice. Just keep in mind you might need to do some re-theming if you have your own design, since v0 has &#8220;opinions&#8221; about how things should look.</p></li><li><p><strong><a href="https://lovable.dev/">Lovable</a></strong> - <em>prompt-to-UI mockups (with some code):</em> <strong>Lovable</strong> is aimed more at beginners or non-engineers who want to build apps through a simpler interface. It lets you describe an app and provides a visual editor as well. Users have noted that Lovable&#8217;s <strong>strength is ease of use</strong> - it&#8217;s quite guided and has a nice UI for assembling your app - but its weakness is when you need to dive into code, it can be cumbersome. 
It tends to hide complexity (which is good if you want no-code), but if you are an engineer who wants to tweak what it built, you might find the experience frustrating. In terms of output, Lovable can create both UI and some logic, but perhaps not as completely as Bolt or v0. In one test, Lovable interestingly did better when given a screenshot to imitate than when given a Figma design - a bit inconsistent. It&#8217;s targeted at quick prototyping and maybe building simple apps with minimal coding. If you&#8217;re a tech lead working with a designer or PM who can&#8217;t code, Lovable might be something to let them play with to visualize ideas, which you then refine in code. However, for a seasoned engineer, Lovable might feel a bit limiting.</p></li><li><p><strong><a href="http://replit.com">Replit</a>:</strong> Replit&#8217;s online IDE has an AI mode where you can type a prompt like &#8220;Create a 2D Zelda-like game&#8221; or &#8220;Build a habit tracker app&#8221; and it will generate a project in their cloud environment. Replit&#8217;s strength is that it can run and host the result immediately, and it often takes care of both frontend and backend seamlessly since it&#8217;s all in one environment. A standout example: when asked to make a simple game, Replit&#8217;s AI agent not only wrote the code, but <em>ran it and iteratively improved it</em> by checking its own work with screenshots. In comparisons, Replit sometimes produced the most <strong>functionally complete</strong> result (for instance, a working game with enemies and collision when others barely produced a moving character). However, it might take longer to run and use more computational resources in doing so. Replit is great if you want a one-shot outcome that is actually runnable and possibly closer to production. It&#8217;s like having an AI that not only writes code, but also <em>tests it live</em> and fixes it. 
For full-stack apps, Replit likewise can wire up client and server and even set up a database if asked. The output might not be the cleanest or most idiomatic code in every case, but it&#8217;s often a very workable starting point. One consideration: because Replit&#8217;s agent runs in the cloud and can execute code, you might hit some limits for very big apps (and you need to be careful if you prompt it to do something that could run malicious code - though it&#8217;s sandboxed). Overall, if your goal is <em>&#8220;I want an app that I can run immediately and play with, and I don&#8217;t mind if the code needs refactoring later&#8221;</em>, Replit is a top choice.</p></li><li><p><strong><a href="http://firebase.studio">Firebase Studio</a></strong> is Google&#8217;s cloud-based, agentic IDE powered by Gemini, which lets you rapidly prototype and ship full&#8209;stack, AI&#8209;infused apps entirely in your browser. You can import existing codebases - or start from scratch using natural&#8209;language, image, or sketch prompts via the App Prototyping agent - to generate a working Next.js prototype (frontend, backend, Firestore, Auth, hosting, etc.) and immediately preview it live. From there, you can seamlessly switch into full&#8209;coding mode in a Code&#8209;OSS (VS&#8239;Code) workspace powered by Nix, with integrated Firebase emulators. Gemini in Firebase offers inline code suggestions, debugging, test generation, documentation, and migrations, and can even run terminal commands and interpret their output. You can prompt &#8220;Build a photo&#8209;gallery app with uploads and authentication&#8221;, see the app spun up end to end, tweak it, deploy it to Hosting or Cloud Run, and monitor usage - all without switching tools.</p></li></ul><p><strong>When to use prototyping tools:</strong> These shine when you are <strong>starting a new project or feature</strong> and want to eliminate the grunt work of initial setup. 
For instance, if you&#8217;re a tech lead needing a quick proof-of-concept to show stakeholders, using Bolt or v0 to spin up the base and then deploying it can save days of effort. They are also useful for exploring ideas - you can generate multiple variations of an app to see different approaches. However, expect to iterate. Think of what these tools produce as a <strong>first draft</strong>.</p><p>After generating, you&#8217;ll likely bring the code into your own IDE (perhaps with an AI assistant there to help) and refine it. In many cases, the best workflow is <strong>hybrid</strong>: <em>prototype with a generation tool, then refine with an in-IDE assistant</em>. For example, you might use Bolt to create the MVP of an app, then open that project in Cursor to continue development with AI pair-programming on the finer details. These approaches aren&#8217;t mutually exclusive at all - they complement each other. Use the right tool for each phase: prototypers for initial scaffolding and high-level layout, assistants for deep code work and integration.</p><p>Another consideration is <strong>limitations and learning</strong>: by examining what these prototyping tools generate, you can learn common patterns. It&#8217;s almost like reading the output of a dozen framework tutorials in one go. But also note what they <em>don&#8217;t</em> do - often they won&#8217;t get the last <a href="https://addyo.substack.com/p/beyond-the-70-maximizing-the-human">20-30%</a> of an app done (things like polish, performance tuning, handling edge-case business logic), which will fall to you.</p><p>This is akin to the &#8220;<a href="https://addyo.substack.com/p/the-70-problem-hard-truths-about">70% problem</a>&#8221; observed in AI-assisted coding: AI gets you a big chunk of the way, but the final mile requires human insight. Knowing this, you can budget time accordingly. 
The good news is that the initial 70% (spinning up UI components, setting up routes, hooking up basic CRUD) is usually the boring part - and if AI does that, you can focus your energy on the interesting parts (custom logic, UX finesse, etc.). Just don&#8217;t be lulled into a false sense of security; always review the generated code for things like security (e.g., did it hardcode an API key?) or correctness.</p><p><strong>Summary of tools vs use-cases:</strong> It&#8217;s helpful to recap and simplify how these tools differ. In a nutshell: <em>Use an IDE assistant when you&#8217;re evolving or maintaining a codebase; use a generative prototype tool when you need a new codebase or module quickly.</em> If you already have a large project, something like Cursor or <a href="http://cline.bot">Cline</a> plugged into VS Code will be your day-to-day ally, helping you write and modify code intelligently.</p><p>If you&#8217;re starting a project from scratch, tools like Bolt or v0 can do the heavy lifting of setup so you aren&#8217;t spending a day configuring build tools or creating boilerplate files. And if your work involves both (which is common: starting new services and maintaining old ones), you might very well use both types regularly. Many teams report success in <strong>combining</strong> them: for instance, generate a prototype to kickstart development, then manage and grow that code with an AI-augmented IDE.</p><p>Lastly, be aware of the <strong>&#8220;not invented here&#8221; stigma some might have toward AI-generated code.</strong> It&#8217;s important to communicate within your team about using these tools. Some traditionalists may be skeptical of code they didn&#8217;t write themselves. The best way to overcome that is by demonstrating the benefits (speed, and code quality that holds up once you&#8217;ve reviewed it) and making AI use collaborative. 
For example, share the prompt and output in a PR description (&#8220;This controller was generated using v0.dev based on the following description...&#8221;). This demystifies the AI&#8217;s contribution and can invite constructive review just like human-generated code.</p><p>Now that we&#8217;ve looked at tools, in the next section we&#8217;ll zoom out and walk through <strong>how to apply AI across the entire software development lifecycle</strong>, from design to deployment. AI&#8217;s role isn&#8217;t limited to coding; it can assist in requirements, testing, and more.</p><h2><strong>AI across the Software Development Lifecycle</strong></h2><p>An AI-native software engineer doesn&#8217;t only use AI for writing code - they leverage it at <strong>every stage of the <a href="https://www.geeksforgeeks.org/software-engineering/software-development-life-cycle-sdlc/">software development lifecycle</a> (SDLC)</strong>. This section explores how AI can be applied pragmatically in each phase of engineering work, making the whole process more efficient and innovative. We&#8217;ll keep things domain-agnostic, with a slight bias to common web development scenarios for examples, but these ideas apply to many domains of software (from cloud services to mobile apps).</p><h3><strong>1. Requirements &amp; ideation</strong></h3><p>The first step in any project is figuring out <em>what</em> to build. AI can act as a brainstorming partner and a requirements analyst. </p><p>For example, if you have a high-level product idea (&#8220;We need an app for X&#8221;), you can ask an AI to help <strong>brainstorm features or user stories</strong>. A prompt like: <em>&#8220;I need to design a mobile app for a personal finance tracker. 
What features should it have for a great user experience?&#8221;</em> can yield a list of features (e.g., budgeting, expense categorization, charts, reminders) that you might not have initially considered.</p><p>The AI can aggregate ideas from countless apps and articles it has ingested. Similarly, you can task the AI with writing preliminary <strong>user stories or use cases</strong>: <em>&#8220;List five user stories for a ride-sharing service&#8217;s MVP.&#8221;</em> This can jumpstart your planning with well-structured stories that you can refine. AI can also help clarify requirements: if a requirement is vague, you can ask <em>&#8220;What questions should I ask about this requirement to clarify it?&#8221;</em> - and the AI will propose the key points that need definition (e.g., for &#8220;add security to login&#8221;, AI might suggest asking about 2FA, password complexity, etc.). This ensures you don&#8217;t overlook things early on.</p><p>Another ideation use: <strong>competitive analysis</strong>. You could prompt: <em>&#8220;What are the common features and pitfalls of task management web apps? Provide a summary.&#8221;</em> The AI will list what such apps usually do and common complaints or challenges (e.g., data sync, offline support). This information can shape your requirements to either include best-in-class features or avoid known issues. Essentially, AI can serve as a <strong>research assistant</strong>, scanning the collective knowledge base so you don&#8217;t have to read 10 blog posts manually.</p><p>Of course, all AI output needs critical evaluation - use your judgment to filter which suggestions make sense in context. But at the early stage, quantity of ideas can be more useful than quality, because it gives you options to discuss with your team or stakeholders. Engineers with an AI-native mindset often walk into planning meetings with an AI-generated list of ideas, which they then augment with their own insights. 
This accelerates the discussion and shows initiative.</p><p>AI can also help <strong>non-technical stakeholders</strong> at this stage. If you&#8217;re a tech lead working with, say, a business analyst, you might generate a draft product requirements document (PRD) with AI&#8217;s help and then share it for review. It&#8217;s faster to edit a draft than to write from scratch. Google&#8217;s prompt guide even suggests role-specific prompts for such cases - e.g., <em>&#8220;Act as a business analyst and outline the requirements for a payroll system upgrade&#8221;</em>. The result gives everyone something concrete to react to. In sum, in requirements and ideation, AI is about casting a wide net of possibilities and organizing thoughts, which provides a strong starting foundation.</p><h3><strong>2. System design &amp; architecture</strong></h3><p>Once requirements are in place, designing the system is next. Here, AI can function as a <strong>sounding board for architecture</strong>. For instance, you might describe the high-level architecture you&#8217;re considering - &#8220;We plan to use a microservice for the user service, an API gateway, and a React frontend&#8221; - and ask the AI for its opinion: <em>&#8220;What are the pros and cons of this approach? Any potential scalability issues?&#8221;</em> An AI well-versed in tech will enumerate points similar to what an experienced colleague might raise (e.g., microservices allow independent deployment but add complexity in devops, etc.). This is useful to validate your thinking or uncover angles you missed.</p><p>AI can also help with <strong>specific design questions</strong>: &#8220;Should we choose SQL or NoSQL for this feature store?&#8221; or &#8220;What&#8217;s a robust architecture for real-time notifications in a chat app?&#8221; It will provide a rationale for different choices. 
While you shouldn&#8217;t take its answer as gospel, it can surface considerations (latency, consistency, cost) that guide your decision. Sometimes hearing the reasoning spelled out helps you make a case to others or solidify your own understanding. Think of it as rubber-ducking your architecture to an AI - except the duck talks back with fairly reasonable points!</p><p>Another use is generating <strong>diagrams or mappings</strong> via text. There are tools where if you describe an architecture, the AI can output a pseudo-diagram (in Mermaid markdown, for example) that you can visualize. For example: <em>&#8220;Draw a component diagram: clients -&gt; load balancer -&gt; 3 backend services -&gt; database.&#8221;</em> The AI could produce a Mermaid code block that renders to a diagram. This is a quick way to go from concept to documentation. Or you can ask for <strong>API design suggestions</strong>: <em>&#8220;Design a REST API for a library system with endpoints for books, authors, and loans.&#8221;</em> The AI might list endpoints (GET /books, POST /loans, etc.) along with example payloads, which can be a helpful starting point that you then adjust.</p><p>A particularly powerful use of AI at this stage is <strong>validating assumptions</strong> by asking it to think of failure cases. For example: <em>&#8220;We plan to use an in-memory cache for session data in one data center. What could go wrong?&#8221;</em> The AI might remind you of scenarios like cache crashes, data center outage, or scaling issues. It&#8217;s a bit like a <strong>risk checklist</strong> generator. This doesn&#8217;t replace doing a proper design review, but it&#8217;s a nice supplement to catch obvious pitfalls early.</p><p>On the flip side, if you encounter pushback on a design and need to articulate your reasoning, AI can help you <strong>frame arguments clearly</strong>. You can feed the context to AI and have it help articulate the concerns and explore alternatives. 
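<p>To make the endpoint-design example concrete, a first-pass AI answer to the library-system prompt might resemble the sketch below, captured as plain data so it can be reviewed before any code is written. The resource names and payload fields are illustrative assumptions, not a definitive design:</p>

```python
# Hypothetical first-draft REST API surface for a library system,
# as an AI assistant might propose it: endpoint -> what it accepts/returns.
library_api = {
    "GET /books": {"returns": "list of all books"},
    "POST /books": {"body": {"title": "Dune", "author_id": 7}, "returns": "created book"},
    "GET /authors/{id}": {"returns": "author details plus their books"},
    "POST /loans": {"body": {"book_id": 1, "member_id": 42}, "returns": "loan with due date"},
    "GET /loans": {"query": {"member_id": 42}, "returns": "active loans for a member"},
}

# Print a quick summary - the kind of thing you might paste into a design doc.
for endpoint, spec in library_api.items():
    print(f"{endpoint} -> {spec['returns']}")
```

<p>Treating the proposal as data like this also makes it easy to diff revisions of the design as you iterate with the AI.</p>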
The AI will enumerate issues and you can use that to formulate a respectful, well-structured response. In essence, AI can bolster your <strong>communication</strong> around design, which is as important as the design itself in team settings.</p><p>A more profound shift is that <strong>we&#8217;re moving to spec-driven development</strong>. It&#8217;s not about code-first; in fact, we&#8217;re practically <a href="https://x.com/danshipper/status/1937888424800719283">hiding the code</a>! Modern software engineers are creating (or asking AI for) <a href="https://x.com/_philschmid/status/1937887668710355265">implementation plans first</a>. Some start projects by asking the tool to create a technical design (saved to a markdown file) and an implementation plan (similarly saved locally and fed in later).</p><p><a href="https://www.ignorance.ai/p/ai-at-pulley">Some</a> note that they find themselves &#8220;thinking less about writing code and more about writing specifications - translating the ideas in my head into clear, repeatable instructions for the AI.&#8221; These design specs have <a href="https://writing.nikunjk.com/p/the-work-behind-the-work-is-dead">massive follow-on value</a>; they can be used to generate the PRD, the first round of product documentation, deployment manifests, marketing messages, and even training decks for the sales field. Today&#8217;s best engineers are great at documenting intent that in turn spawns the technical solution.</p><p>This strategic application of AI has profound implications for what defines a senior engineer today. It marks a shift from being a superior problem-solver to becoming a forward-thinking solution-shaper. A senior AI-native engineer doesn't just use AI to write code faster; they use it to see around corners - to model future states, analyze industry trends, and shape technical roadmaps that anticipate the next wave of innovation. 
Leveraging AI for this kind of architectural foresight is no longer just a nice-to-have; it's rapidly becoming a core competency for technical leadership.</p><h3><strong>3. Implementation (Coding)</strong></h3><p>This is the phase most people immediately think of for AI assistance, and indeed it&#8217;s one of the most transformative. We covered in earlier sections how to use coding assistants in your IDE, so here let&#8217;s structure it around typical coding sub-tasks:</p><ul><li><p><strong>Scaffolding and setup:</strong> Setting up new modules, libraries, or configuration files can be tedious. AI can generate boilerplate configs (Dockerfiles, CI pipelines, ESLint configs, etc.) based on descriptions. For example, <em>&#8220;Provide a minimal Vite and TypeScript config for a React app&#8221;</em> may yield decent config files that you might only need to tweak slightly. Similarly, if you need to use a new library (say authentication or logging), you can ask AI, <em>&#8220;Show an example of integrating Library X into an Express.js server.&#8221;</em> It often can produce a minimal working example, saving you from combing through docs for the basics.</p></li><li><p><strong>Feature implementation:</strong> When coding a feature, use AI as a partner. You might start writing a function and hit a moment of doubt - you can simply ask, <em>&#8220;What&#8217;s the best way to implement X?&#8221;</em> Perhaps you need to parse a complex data format - the AI might even recall the specific API you need to use. It&#8217;s like having Stack Overflow threads summarized for you on the fly. Many AI-native devs actually use a rhythm: they outline a function in comments (steps it should take), then prompt the AI to fill it in code. This often yields a nearly complete function which you then adjust. 
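<p>That outline-then-fill rhythm might look like the following sketch, where only the docstring and step comments were written by hand and the body is the kind of fill-in an assistant typically produces (the dedupe task here is a hypothetical example):</p>

```python
def dedupe_by_id(items):
    """Remove duplicates from a list of dicts, treating items with the same 'id' as equal."""
    # Step 1: track which ids we have already seen.
    # Step 2: walk the list in order, keeping only the first occurrence of each id.
    # Step 3: return the filtered list.
    seen = set()
    result = []
    for item in items:
        if item["id"] not in seen:
            seen.add(item["id"])
            result.append(item)
    return result

print(dedupe_by_id([{"id": 1}, {"id": 2}, {"id": 1}]))  # -> [{'id': 1}, {'id': 2}]
```

<p>You stay in control of the intent (the comments); the assistant handles the mechanical translation into code, which you then verify.</p>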
It&#8217;s a different way of coding: you focus on logic and intent, the AI fleshes out syntax and repetitive parts.</p></li><li><p><strong>Code reuse and referencing:</strong> Another everyday scenario - you vaguely remember writing similar code before or know there&#8217;s an algorithm for this. You can describe it and ask the AI. For instance, <em>&#8220;I need to remove duplicates from a list of objects in Python, treating objects with same id as duplicates. How to do that efficiently?&#8221;</em> And if the first answer isn&#8217;t what you need, you can refine or just say &#8220;that&#8217;s not quite it, I need to consider X&#8221; and it will try again. This interactive Q&amp;A for coding is a huge quality-of-life improvement.</p></li><li><p><strong>Maintaining consistency and patterns:</strong> In a large project, you often follow patterns (say a certain way to handle errors or logging). AI can be taught these if you provide context (some tools let you add a style guide or have it read parts of your repo). Even without explicit training, if you point the AI to an existing file as an example, you can prompt <em>&#8220;Create a new module similar to this one but for [some new entity]&#8221;</em>. It will mimic the style and structure, which means the new code fits in naturally. It&#8217;s like having an assistant who read your entire codebase and documentation and always writes code following those conventions (one day, AI might truly do this seamlessly with features like the Model Context Protocol to plug into different environments).</p></li><li><p><strong>Generating tests alongside code:</strong> A highly effective habit is to have AI generate unit tests immediately after writing a piece of code. Many tools (Cursor, Copilot, etc.) can suggest tests either on demand or even automatically. 
For example, after writing a function, you could prompt: <em>&#8220;Generate a unit test for the above function, covering edge cases.&#8221;</em> The AI will create a test method or test case code. This serves two purposes: it gives you quick tests, and it also serves as a quasi-review of your code (if the AI&#8217;s expected behavior in tests differs from your code, maybe your code has an issue or the requirements were misunderstood). It&#8217;s like doing TDD where the AI writes the test and you verify it matches intent. Even if you prefer writing tests yourself, AI can suggest additional cases you might miss (like large input, weird characters, etc.), acting as a safety net.</p></li><li><p><strong>Debugging assistance:</strong> When you hit a bug or an error message, AI can help diagnose it. For instance, you can copy an error stack trace or exception and ask, <em>&#8220;What might be causing this error?&#8221;</em> Often, it will explain in plain terms what the error means and common causes. If it&#8217;s a runtime bug without obvious errors, you can describe the behavior: <em>&#8220;My function returns null for input X when it shouldn&#8217;t. Here&#8217;s the code snippet&#8230; Any idea why?&#8221;</em> The AI might spot a logic flaw. It&#8217;s not guaranteed, but even just explaining your code in writing (to the AI) sometimes makes the solution apparent to you - and the AI&#8217;s suggestions can confirm it. Some AI tools integrated into runtime (like tools in Replit) can even execute code and check intermediate values, acting like an interactive debugger. You could say, &#8220;Run the above code with X input and show me variable Y at each step&#8221; and it will simulate that. This is still early, but it&#8217;s another dimension of debugging that will grow.</p></li><li><p><strong>Performance tuning &amp; refactoring:</strong> If you suspect a piece of code is slow or could be cleaner, you can ask the AI to refactor it for performance or readability. 
For instance: <em>&#8220;Refactor this function to reduce its time complexity&#8221;</em> or <em>&#8220;This code is doing a triple nested loop, can you make it more efficient?&#8221;</em> The AI might recognize a chance to use a dictionary lookup or a better algorithm (e.g., going from O(n^2) to O(n log n)). Or for readability: <em>&#8220;Refactor this 50-line function into smaller functions and add comments.&#8221;</em> It will attempt to do so. Always double-check the changes (especially for subtle bugs), but it&#8217;s a great way to see alternative implementations quickly. It&#8217;s like having a second pair of eyes that isn&#8217;t tired and can rewrite code in seconds for comparison.</p></li></ul><p>In all these coding scenarios, the theme is <strong>AI accelerates the mechanical parts of coding and provides just-in-time knowledge</strong>, while you remain the decision-maker and quality control. It&#8217;s important to interject a note on <strong>version control and code reviews</strong>: treat AI contributions like you would a junior developer&#8217;s pull request. Use git diligently, diff the changes the AI made, run your test suite after major edits, and do code reviews (even if you&#8217;re reviewing code the AI wrote for you!). This ensures robustness in your implementation phase.</p><h3><strong>4. Testing &amp; quality assurance</strong></h3><p>Testing is an area where AI can shine by reducing the toil. We already touched on unit test generation, but let&#8217;s dive deeper:</p><ul><li><p><strong>Unit tests generation:</strong> You can systematically use AI to generate unit tests for existing code. One approach: take each public function or class in your module, and prompt AI with a short description of what it should do (if there isn&#8217;t clear documentation, you might have to infer or write a one-liner spec) and ask for a test. For example, <em>&#8220;Function normalizeName(name) should trim whitespace and capitalize the first letter. 
Write a few PyTest cases for it.&#8221;</em> The AI will output tests including typical and edge cases like empty string, all caps input, etc. This is extremely helpful for legacy code where tests are missing - it&#8217;s like AI-driven test retrofitting. Keep in mind the AI doesn&#8217;t <em>know</em> your exact business logic beyond what you describe, so verify that the asserted expectations match the intended behavior. But even if they don&#8217;t, it&#8217;s informative: an AI might make an assumption about the function that&#8217;s wrong, which highlights that the function&#8217;s purpose wasn&#8217;t obvious or could be misused. You then improve either the code or clarify the test.</p></li><li><p><strong>Property-based and fuzz testing:</strong> You can use AI to suggest properties for property-based tests. For instance, <em>&#8220;What properties should hold true for a sorting function?&#8221;</em> might yield answers like &#8220;the output list is sorted, has same elements as input, idempotent if run twice&#8221; etc. You can turn those into property tests with frameworks like Hypothesis or fast-check. The AI can even help write the property test code. Similarly, for fuzzing or generating lots of input combinations, you could ask AI to generate a variety of inputs in a format. <em>&#8220;Give me 10 JSON objects representing edge-case user profiles (some missing fields, some with extra fields, etc.)&#8221;</em> - use those as test fixtures to see if your parser breaks.</p></li><li><p><strong>Integration and end-to-end tests:</strong> For more complex tests like API endpoints or UI flows, AI can assist by outlining test scenarios. <em>&#8220;List some end-to-end test scenarios for an e-commerce checkout process.&#8221;</em> It will likely enumerate scenarios: normal purchase, invalid payment, out-of-stock item, etc. You can then script those. 
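<p>For the <em>normalizeName</em> prompt above, the generated tests might look like the sketch below. The implementation shown is an assumed one (trim, then capitalize the first letter), included only so the tests can run:</p>

```python
def normalizeName(name):
    # Assumed implementation: trim surrounding whitespace, capitalize the first letter.
    trimmed = name.strip()
    return trimmed[:1].upper() + trimmed[1:] if trimmed else ""

# AI-suggested pytest-style cases, covering typical and edge inputs.
def test_trims_and_capitalizes():
    assert normalizeName("  alice ") == "Alice"

def test_whitespace_only_becomes_empty():
    assert normalizeName("   ") == ""

def test_already_normalized_is_unchanged():
    assert normalizeName("Bob") == "Bob"

test_trims_and_capitalizes()
test_whitespace_only_becomes_empty()
test_already_normalized_is_unchanged()
```

<p>If the AI&#8217;s tests assert behavior your implementation doesn&#8217;t have (say, lower-casing the rest of the name), that mismatch itself is useful review feedback.</p>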
If you&#8217;re using a test framework like Cypress for web UI, you could ask AI to write a test script given a scenario description. It might produce pseudo-code that you tweak into real code (Cypress or Selenium commands). This again saves time on boilerplate and ensures you consider various paths.</p></li><li><p><strong>Test data generation:</strong> Creating realistic test data (like a valid JSON of a complex object) is mundane. AI can generate fake data that looks real. For example, <em>&#8220;Generate an example JSON for a university with departments, professors, and students.&#8221;</em> It will fabricate names, arrays, and so on. This data can then be used in tests or to manually try out an API. It&#8217;s like having an infinite supply of realistic dummy data without writing it yourself. Just be mindful of privacy - if you prompt with real data, ensure you anonymize it first.</p></li><li><p><strong>Exploratory testing via agents:</strong> A frontier area: using AI agents to simulate users or adversarial inputs. There are experimental tools where an AI can crawl your web app like a user, testing different inputs to see if it can break something. Anthropic&#8217;s <em>Claude Code</em> best practices talk about multi-turn debugging, where the AI iteratively finds and fixes issues. You might be able to say, &#8220;Here&#8217;s my function, try different inputs to make it fail&#8221; and the AI will do a mini fuzz test mentally. This isn&#8217;t foolproof, but as a concept it points to AI helping in QA beyond static test cases - by actively trying to find bugs like a QA engineer would.</p></li><li><p><strong>Reviewing test coverage:</strong> If you have tests and want to ensure they cover the logic, you can ask AI to analyze whether certain scenarios are missing. For example, provide a function or feature description and the current tests, and ask <em>&#8220;Are there any important test cases not covered here?&#8221;</em>. 
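<p>The sorting-function properties mentioned earlier can be turned into a quick randomized check. This is a minimal stdlib sketch; a real property-based suite would use a framework like Hypothesis to generate and shrink failing inputs:</p>

```python
import random

def check_sort_properties(sort_fn, trials=200):
    """Randomized check of the classic sorting-function properties."""
    for _ in range(trials):
        data = [random.randint(-50, 50) for _ in range(random.randint(0, 20))]
        out = sort_fn(data)
        # Property 1: the output is ordered.
        assert all(a <= b for a, b in zip(out, out[1:]))
        # Property 2: the output has exactly the same elements as the input.
        assert sorted(out) == sorted(data)
        # Property 3: sorting is idempotent - sorting again changes nothing.
        assert sort_fn(out) == out

check_sort_properties(sorted)  # the built-in passes; a buggy sort would trip an assert
```

<p>Each assert encodes one of the properties the AI suggested, so a violation immediately tells you which invariant broke.</p>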
The AI might notice, e.g., &#8220;the tests didn&#8217;t cover when input is null or empty&#8221; or &#8220;no test for negative numbers&#8221;, etc. It&#8217;s like a second opinion on your test suite. It won&#8217;t know if something is truly missing unless obvious, but it can spot some gaps.</p></li></ul><p>The end goal is higher quality with less manual effort. Testing is typically something engineers <em>know</em> they should do more of, but time pressure often limits it. AI helps remove some friction by automating the creation of tests or at least the scaffolding of them. This makes it likelier you&#8217;ll have a more robust test suite, which pays off in fewer regressions and easier maintenance.</p><h3><strong>5. Debugging &amp; maintenance</strong></h3><p>Bugs and maintenance tasks consume a large portion of engineering time. AI can reduce that burden too:</p><ul><li><p><strong>Explaining legacy code:</strong> When you inherit a legacy codebase or revisit code you wrote long ago, understanding it is step one. You can use AI to <strong>summarize or document code</strong> that lacks clarity. For instance, copy a 100-line function and ask, <em>&#8220;Explain in simple terms what this function does step by step.&#8221;</em> The AI will produce a narrative of the code&#8217;s logic. This often accelerates your comprehension, especially if the code is dense or not well-commented. It might also identify what the code is supposed to do versus what it actually does (catching subtle bugs). Some tools integrate this - you can click a function and get an AI-generated docstring or summary. This is invaluable when you maintain systems with scarce documentation.</p></li><li><p><strong>Identifying the root cause:</strong> When facing a bug report like &#8220;Feature X is crashing under condition Y&#8221; you can involve AI as a rubber duck to reason through the possible causes. 
Describe the situation and the code path as you know it, and ask for theories: <em>&#8220;Given this code snippet and the error observed, what could be causing the null pointer exception?&#8221;</em> The AI might point out, &#8220;if data can be null then data.length would throw that exception, check if that can happen in condition Y.&#8221; It&#8217;s akin to having a knowledgeable colleague to bounce ideas off of; even if they can&#8217;t see your whole system, they often generalize from known patterns. This can save time compared to going down the wrong path in debugging.</p></li><li><p><strong>Fixing code with AI suggestions:</strong> If you&#8217;ve localized a bug to a piece of code, you can simply tell the AI to fix it. <em>&#8220;Fix the bug where this function fails on empty input.&#8221;</em> The AI will provide a patch (like adding a check for empty input). You still have to ensure that&#8217;s the correct fix and doesn&#8217;t break other things, but it&#8217;s quicker than writing it yourself, especially for trivial fixes. Some IDEs do this automatically: for example, if a test fails, an AI could suggest a code change to make the test pass. One must be careful here - always run tests after accepting such changes to ensure no side effects. But for maintenance tasks like upgrading a library version and fixing deprecated calls, AI can be a huge help (e.g., &#8220;We upgraded to React Router v7, update this v6 code to v7 syntax&#8221; - it will rewrite the code using the new API, a big time saver).</p></li><li><p><strong>Refactoring and improving old code:</strong> Maintenance often involves refactoring for clarity or performance. You can employ AI to do large-scale refactors semi-automatically. For instance, <em>&#8220;Our code uses a lot of callback-based async. 
Convert these examples to async/await syntax.&#8221;</em> It can show you how to update a representative snippet, which you can then apply across code (perhaps with a search/replace or with the AI&#8217;s help file by file). Or at a smaller scale, <em>&#8220;Refactor this class to use dependency injection instead of hardcoding the database connection.&#8221;</em> The AI will outline or even implement a cleaner pattern. This is how AI helps you <strong>keep the codebase modern and clean</strong> without spending excessive time on rote transformations.</p></li><li><p><strong>Documentation and knowledge management:</strong> Maintaining software also means keeping docs up to date. AI can make documenting changes easier. After implementing a feature or fix, you can ask AI to draft a short summary or update documentation. For example, <em>&#8220;Generate a changelog entry: Fixed the payment module to handle expired credit cards by adding a retry mechanism.&#8221;</em> It will produce a nicely worded entry. If you need to update an API doc, you can feed it the new function signature and ask for a description. The AI may not know your entire system&#8217;s context, but it can create a good first draft of docs which you then tweak to be perfectly accurate. This lowers the activation energy to write documentation.</p></li><li><p><strong>Communication with team/users:</strong> Maintenance involves communication - explaining to others what changed, what the impact is, etc. AI can help write <strong>release notes</strong> or <strong>migration guides</strong>. E.g., <em>&#8220;Write a short guide for developers migrating from API v1 to v2 of our service, highlighting changed endpoints.&#8221;</em> If you give it a list of changes, it can format it into a coherent guide. For user-facing notes, <em>&#8220;Summarize these bug fixes in non-technical terms for our monthly update.&#8221;</em> Once again, you&#8217;ll refine it, but the heavy lifting of prose is handled. 
This ensures important information actually gets communicated (since writing these can often fall by the wayside when engineers are busy).</p></li></ul><p>In essence, <strong>AI can be thought of as an ever-present helper throughout maintenance</strong>. It can search through code faster than you (if integrated), recall how something should work, and even keep an eye out for potential issues. For example, if you let an AI agent scan your repository, it might flag suspicious patterns (like an API call made without error handling in many places).</p><p>Anthropic&#8217;s <a href="https://www.anthropic.com/engineering/claude-code-best-practices">approach</a> with a CLAUDE.md to give the AI context about your repo is one technique to enable more of this. In time, we may see AI tools that proactively create tickets or PRs for certain classes of issues (security or style). As an AI-native engineer, you will welcome these assists - they handle the drudgery, you handle the final judgment and creative problem-solving.</p><h3><strong>6. Deployment &amp; operations</strong></h3><p>Even after code is written and tested, deploying and operating software is a big part of the lifecycle. AI can help here, too:</p><ul><li><p><strong>Infrastructure as code:</strong> Terraform configurations and Kubernetes manifests are essentially code - and AI can generate them. If you need a quick Terraform script for an AWS EC2 instance with certain settings, you can prompt, <em>&#8220;Write a Terraform configuration for an AWS EC2 instance with Ubuntu, t2.micro, in us-west-2.&#8221;</em> It&#8217;ll give a reasonable config that you can adjust. Similarly, <em>&#8220;Create a Kubernetes Deployment and Service for a Node.js app called myapp, image from ECR, 3 replicas.&#8221;</em> The YAML it produces will be a good starting point. This saves a lot of time trawling through documentation for syntax. 
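As an illustration, the starting point for the Kubernetes prompt above might look roughly like this (a sketch, not actual AI output; the ECR account ID, region, and container port are placeholder assumptions you would substitute):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          # Placeholder ECR image URL - substitute your account ID, region, and tag
          image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/myapp:latest
          ports:
            - containerPort: 3000
---
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
  ports:
    - port: 80
      targetPort: 3000
```

Even a sketch like this gives you the right shape to fill in: the label/selector wiring and port plumbing are exactly the boilerplate that is tedious to recall from docs.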
One caution: verify all credentials, security groups, etc., but the structure will be there.</p></li><li><p><strong>CI/CD pipelines:</strong> If you&#8217;re setting up a continuous integration (CI) workflow (like a GitHub Actions YAML or a Jenkins pipeline), ask AI to draft it. For example: <em>&#8220;Write a GitHub Actions workflow YAML that lints, tests, and deploys a Python Flask app to Heroku on push to main.&#8221;</em> The AI will outline the jobs and steps pretty well. It might not get every key exactly right (since these syntaxes change over time), but it&#8217;s far easier to correct a minor key name than to write the whole file yourself. As CI pipelines can be finicky, having the AI handle the boilerplate while you fix small errors is a huge time saver.</p></li><li><p><strong>Monitoring and alert queries:</strong> If you use monitoring tools (like writing a Datadog query or a Grafana alert rule), you can describe what you want and let the AI propose the config. E.g., <em>&#8220;In PromQL, how do I write an alert that fires if error_rate &gt; 5% over 5 minutes on service X?&#8221;</em> It will craft a query that you can plug in. This is particularly handy because these domain-specific languages (like PromQL, Splunk query language, etc.) can be obscure - AI has likely seen examples and can adapt them for you.</p></li><li><p><strong>Incident analysis:</strong> When something goes wrong in production, you often have logs, metrics, and traces to look at. AI can assist in analyzing those. For instance, paste a block of logs from around the time of failure and ask <em>&#8220;What stands out as a possible issue in these logs?&#8221;</em>. It might pinpoint an exception stack trace in the noise or a suspicious delay. Or describe the symptom and ask <em>&#8220;What are possible root causes of high CPU usage on the database at midnight?&#8221;</em> It could list scenarios (backup running, batch job, etc.), helping your investigation. 
OpenAI&#8217;s enterprise guide emphasizes using AI to surface insights from data and logs - this is an emerging use case, often called AIOps.</p></li><li><p><strong>ChatOps and automation:</strong> Some teams integrate AI into their ops chat. For example, a Slack bot backed by an LLM that you can ask, &#8220;Hey, what&#8217;s the status of the latest deploy? Any errors?&#8221; and it could fetch data and summarize. While this requires some setup (wiring your CI or monitoring into an AI-friendly format), it&#8217;s an interesting direction. Even without that, you can do it manually: copy some output (like test results or deployment logs) and have AI summarize it or highlight failures. It&#8217;s a bit like a personal assistant that reads long scrollbacks of text for you and says &#8220;here&#8217;s the gist: 2 tests failed, looks like a database connection issue.&#8221; You then know where to focus.</p></li><li><p><strong>Scaling and capacity planning:</strong> If you need to reason about scaling (e.g., &#8220;If each user makes X requests and we have Y users, how many instances do we need?&#8221;), AI can help do the math and even account for factors you mention. This isn&#8217;t magic - it&#8217;s just calculation and estimation, but phrasing it to AI can sometimes yield a formatted plan or table, saving you some mental load. Additionally, AI might recall known benchmarks (like &#8220;Usually a t2.micro can handle ~100 req/s for a simple app&#8221;) which can aid rough capacity planning. Always validate such numbers against official sources, but it&#8217;s a quick first estimate.</p></li><li><p><strong>Documentation &amp; runbooks:</strong> Finally, operations teams rely on runbooks - documents outlining what to do in certain scenarios. AI can assist by drafting these from incident post-mortems or instructions. If you solved a production issue, you can feed the steps to AI and ask for a well-structured procedure write-up. 
It will give a neat sequence of steps in markdown that you can put in your runbook repository. This lowers the friction to document operational knowledge, which is often a big win for teams (tribal knowledge gets documented in accessible form). Anthropic&#8217;s enterprise trust guide emphasizes process and people - having clear AI-assisted docs is one way to spread knowledge responsibly.</p></li></ul><p>By integrating AI throughout deployment and ops, you essentially have a co-pilot not just in coding but in DevOps. It reduces the lookup time (how often do we google for a particular YAML snippet or AWS CLI command?), providing directly usable answers. However, always remember to double-check anything AI suggests when it comes to infrastructure - a small mistake in a Terraform script could be costly. Validate in a safe environment when possible. Over time, as you fine-tune prompts or use certain verified AI &#8220;recipes&#8221;, you&#8217;ll gain confidence in which suggestions are solid.</p><div><hr></div><p>As we&#8217;ve seen, across the entire lifecycle from conception to maintenance, there are opportunities to inject AI assistance. </p><p>The pattern is: <strong>AI takes on the grunt work and provides knowledge, while you provide direction, oversight, and final judgment.</strong> </p><p>This elevates your role - you spend more time on creative design, critical thinking, and decision-making, and less on boilerplate and hunting for information. The result is often a faster development cycle and, if managed well, improved quality and developer happiness. 
In the next section, we&#8217;ll discuss some best practices to ensure you&#8217;re using AI effectively and responsibly, and how to continuously improve your AI-augmented workflow.</p><h2><strong>Best practices for effective and responsible AI-augmented engineering</strong></h2><p>Using AI in software development can be transformative, but to truly reap the benefits, one must follow best practices and avoid common pitfalls. In this section, we distill key principles and guidelines for being highly effective with AI in your engineering workflow. These practices ensure that AI remains a powerful ally rather than a source of errors or false confidence.</p><h3><strong>1. Craft clear, contextual prompts</strong></h3><p>We&#8217;ve said it multiple times: <strong>effective prompting is critical</strong>. Think of writing prompts as a new core skill in your toolkit - much like writing good code or good commit messages. A well-crafted prompt can mean the difference between an AI answer that is spot-on and one that is useless or misleading. As a best practice, <strong>always provide the AI with sufficient context</strong>. If you&#8217;re asking about code, include the relevant code snippet or a description of the function&#8217;s purpose. Instead of &#8220;How do I optimize this?&#8221;, say &#8220;Given this code [include snippet], how can I optimize it for speed, especially the sorting part?&#8221; This helps the AI focus on what you care about.</p><p>Be specific about the desired output format too. If you want JSON, say so; if you expect a step-by-step explanation, mention that. For example, <em>&#8220;Explain why this test is failing, step by step&#8221;</em> or <em>&#8220;Return the result as a JSON object with keys X, Y&#8221;</em>. Such instructions yield more predictable, useful results. A great technique from prompt engineering is to <strong>break the task into steps or provide an example</strong>. You might prompt: &#8220;First, analyze the input. 
Then propose a solution. Finally, give the solution code.&#8221; This structure can guide the AI through complex tasks. Google&#8217;s advanced prompt engineering guide covers methods like chain-of-thought prompting and providing examples to reduce guesswork. If you ever get a completely off-base answer, don&#8217;t just sigh - <strong>refine the prompt and try again</strong>. Sometimes iterating on the prompt (&#8220;Actually ignore the previous instruction about X and focus only on Y&#8230;&#8221;) will correct the course.</p><p>It&#8217;s also worthwhile to maintain a <strong>library of successful prompts</strong>. If you find a way of asking that consistently yields good results (say, a certain format for writing test cases or explaining code), save it. Over time, you build a personal playbook. Some engineers even have a text snippet manager for prompts. Given that companies like Google have published extensive prompt guides, you can see how valued this skill is becoming. In short: <strong>invest in learning to speak AI&#8217;s language effectively</strong>, because it pays dividends in quality of output.</p><h3><strong>2. Always review and verify AI outputs </strong></h3><p>No matter how impressive the AI&#8217;s answer is, <strong><a href="https://addyo.substack.com/p/the-trust-but-verify-pattern-for">never blindly trust it</a></strong>. This mantra cannot be overstated. Treat AI output as you would a human junior developer&#8217;s work: likely useful, but in need of review and testing. There are countless anecdotes of bugs slipping in because someone accepted AI code without understanding it. Make it a habit to inspect the changes the AI suggests. If it wrote a piece of code, walk through it mentally or with a debugger. Add tests to validate it (which AI can help write, as we discussed). If it gave you an explanation or analysis, cross-check key points. 
For instance, if AI says &#8220;This API is O(N^2) and that&#8217;s causing slowdowns&#8221;, go verify the complexity against official docs or by reasoning it out yourself.</p><p>Be particularly wary of <strong>factually precise-looking statements</strong>. AI has a tendency to hallucinate details - like function names or syntaxes that look plausible but don&#8217;t actually exist. If an AI answer cites an API or a config key, confirm it in official documentation. In an enterprise context, never trust AI with company-specific facts (like &#8220;according to our internal policy&#8230;&#8221;) unless you fed those to it and it&#8217;s just rephrasing them.</p><p>For code, a good practice is to run whatever quick checks you have: linters, type-checkers, test suites. AI code might not adhere to your style guidelines or could use deprecated methods. Running a linter/formatter not only fixes style but can catch certain errors (unused variables, for example). Some AI tools integrate this - for example, an AI might run the code in a sandbox and adjust if it sees exceptions, but that&#8217;s not foolproof. So you as the engineer must be the safety net.</p><p>In security-sensitive or critical systems, apply extra caution. Don&#8217;t use AI to generate secrets or credentials. If AI provides a code snippet that handles authentication or encryption, double-check it against known secure practices. There have been cases of AI coming up with insecure algorithms because it optimized for passing tests rather than actual security. The <strong>responsibility lies with you</strong> to ensure all outputs are safe and correct.</p><p>One helpful tip: <strong>use AI to verify AI</strong>. For example, after getting a piece of code from the AI, you can ask the same (or another) AI, &#8220;Is there any bug or security issue in this code?&#8221; It might point out something you missed (like, &#8220;It doesn&#8217;t sanitize input here&#8221; or &#8220;This could overflow if X happens&#8221;). 
While this second opinion from AI isn&#8217;t a guarantee either, it can be a quick sanity check. OpenAI&#8217;s and Anthropic&#8217;s coding guides even suggest this approach of iterative prompting and review - essentially debugging with the AI&#8217;s help.</p><p>Finally, maintain a healthy skepticism. If something in the output strikes you as odd or too good to be true, investigate further. AI is great at sounding confident. Part of becoming AI-native is learning where the AI is strong and where it tends to falter. Over time, you&#8217;ll gain an intuition (e.g., &#8220;I know LLMs tend to mess up date math, so I&#8217;ll double-check that part&#8221;). This intuition, combined with thorough review, keeps you in the driver&#8217;s seat.</p><h3><strong>3. Manage scope: use AI to amplify, not to autopilot entire projects</strong></h3><p>While the idea of clicking a button and having AI build an entire system is alluring, in practice it&#8217;s rarely that straightforward or desirable. A best practice is to <strong>use AI to amplify your productivity, not to completely automate what you don&#8217;t oversee</strong>. In other words, keep a human in the loop for any non-trivial outcome. If you use an autonomous agent to generate an app (as we saw with prototyping tools), treat the output as a prototype or draft, not a finished product. Plan to <strong>iterate</strong> on it yourself or with your team.</p><p>Break big tasks into smaller AI-assisted chunks. For instance, instead of saying &#8220;Build me a full e-commerce website&#8221;, you might break it down: use AI to generate the frontend pages first (and review them), then use AI to create a basic backend (review it), then integrate and refine. This modular approach ensures you maintain understanding and control. It also leverages AI&#8217;s strengths on focused tasks, rather than expecting it to juggle very complex interdependent tasks (which is often where it may drop something important). 
Remember that AI doesn&#8217;t truly &#8220;understand&#8221; your project&#8217;s higher objectives; that&#8217;s your job as the engineer or tech lead. You decide the architecture and constraints, and then use AI as a powerful assistant to implement parts of that vision.</p><p><strong>Resist the temptation of over-reliance.</strong> It can be tempting to just ask the AI every little thing, even stuff you know, out of convenience. While it&#8217;s fine to use it for rote tasks, make sure you&#8217;re still <strong>learning and understanding</strong>. An AI-native engineer doesn&#8217;t turn off their brain - quite the opposite, they use AI to free their brain for more important thinking. For example, if AI writes a complex algorithm for you, take the time to understand that algorithm (or at least verify its correctness) before deploying. Otherwise, you might accumulate &#8220;AI technical debt&#8221; - code that works but no one truly groks, which can bite you later.</p><p>One way to manage scope is to set <strong>clear boundaries for AI agents</strong>. If you use something like Cline or Devin (autonomous coding agents), configure them with your rules (e.g., don&#8217;t install new dependencies without asking, don&#8217;t make network calls, etc.). And use features like dry-run or plan mode. For instance, have the agent show you its plan (like Cline does) and approve it step by step. This ensures the AI doesn&#8217;t go on a tangent or take actions you wouldn&#8217;t. Essentially, you act as a project manager for the AI worker - you wouldn&#8217;t let a junior dev just commit straight to main without code review; likewise, don&#8217;t let an AI do that.</p><p>By keeping AI&#8217;s role scoped and supervised, you avoid situations where something goes off the rails unnoticed. You also maintain your own engagement with the project, which is critical for quality and for your own growth. 
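One concrete way to set such boundaries is a rules file the agent reads before acting - Cline, for example, picks up custom instructions from a .clinerules file in the project root. The specific rules below are illustrative, not a canonical set:

```markdown
# .clinerules (illustrative example)
- Do not install, upgrade, or remove dependencies without asking first.
- Do not make outbound network calls other than to our package registry.
- Never commit directly to main; create a branch and summarize the change.
- Present a plan and wait for approval before editing more than one file.
- Run the test suite after any code change and report failures verbatim.
```

A file like this acts as standing guardrails, so you aren&#8217;t re-stating the ground rules in every prompt.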
The flip side is also true: do use AI for all those <em>small</em> things that eat time but don&#8217;t need creative heavy lifting. Let it write the 10th variant of a CRUD endpoint or the boilerplate form validation code while you focus on the tricky integration logic or the performance tuning that requires human insight. This division of labor - AI for grunt work, human for oversight and creative problem solving - is a sweet spot in current AI integration.</p><h3><strong>4. Continue learning and stay updated</strong></h3><p>The field of AI and the tools available are evolving incredibly fast. Being &#8220;AI-native&#8221; today is different from what it will be a year from now. So a key principle is: <strong>never stop learning</strong>. Keep an eye on new tools, new model capabilities, and new best practices. Subscribe to newsletters or communities (there are developer newsletters dedicated to AI tools for coding). Share experiences with peers: what prompt strategies worked for them, what new agent framework they tried, etc. The community is figuring this out together, and being engaged will keep you ahead.</p><p>One practical way to learn is to integrate AI into side projects or hackathons. The stakes are lower, and you can freely explore capabilities. Try building something purely with AI assistance as an experiment - you&#8217;ll discover both its superpowers and its pain points, which you can then apply back to your day job carefully. Perhaps in doing so, you&#8217;ll figure out a neat workflow (like chaining a prompt from GPT to Copilot in the editor) that you can teach your team. In fact, <strong>mentoring others</strong> in your team on AI usage will also solidify your own knowledge. Run a brown bag session on prompt engineering, or share a success story of how AI helped solve a hairy problem. This not only helps colleagues but often they will share their own tips, leveling up everyone.</p><p>Finally, invest in your fundamental skills as well. 
AI can automate a lot, but the better your foundation in computer science, system design, and problem-solving, the better questions you&#8217;ll ask the AI and the better you&#8217;ll assess its answers. Human creativity and deep understanding of systems are not being replaced - in fact, they&#8217;re more important, because now you&#8217;re guiding a powerful tool. As one of my articles suggests, focus on <em><a href="https://news.ycombinator.com/item?id=43361801">maximizing the &#8220;human 30%&#8221;</a></em> - the portion of the work where human insight is irreplaceable. That means things like defining the problem, making judgment calls, and critical debugging. Strengthen those muscles through continuous learning, and let AI handle the rote 70%.</p><h3><strong>5. Collaborate and establish team practices</strong></h3><p>If you&#8217;re working in a team setting (most of us are), it&#8217;s important to <strong>collaborate on AI usage practices</strong>. Share what you learn with teammates and also listen to their experiences. Maybe you found that using a certain AI tool improved your commit velocity; propose it to the team to see if everyone wants to adopt it. Conversely, be open to guidelines - for example, some teams decide &#8220;We will not commit AI-generated code without at least one human review and testing&#8221; (a sensible rule). Consistency helps; if everyone follows similar approaches, the codebase stays coherent and people trust each other&#8217;s AI-augmented contributions.</p><p>You might even formalize this into team conventions. For instance, if using AI for code generation, some teams annotate the PR or code comments like // Generated with Gemini, needs review. This transparency helps code reviewers focus attention. It&#8217;s similar to how we treated code from automated tools (like &#8220;this file was scaffolded by Rails generator&#8221;). 
Knowing something was AI-generated might change how you review - perhaps more thoroughly in certain aspects.</p><p>Encourage pair programming with AI. A neat practice is <em>AI-driven code review</em>: when someone opens a pull request, they might run an AI on the diff to get an initial list of review comments, and then use that to refine the PR before a human even sees it. As a team, you could adopt this as a step (with the caveat that AI might not catch all issues or understand business context). Another collaborative angle is documentation: maybe maintain an internal FAQ of &#8220;How do I ask AI to do X for our codebase?&#8221; - e.g., how to prompt it with your specific stack. This could be part of onboarding new team members to AI usage in your project.</p><p>On the flip side, respect those who are cautious or skeptical of AI. Not everyone may be immediately comfortable or convinced. Demonstrating results in a non-threatening way works better than evangelizing abstractly. Show how it caught a bug or saved a day of work by drafting tests. Be honest about failures too (e.g., &#8220;We tried AI for generating that module, but it introduced a subtle bug we caught later. Here&#8217;s what we learned.&#8221;). This builds collective wisdom. A team that learns together will integrate AI much more effectively than individuals pulling in different directions.</p><p>From a leadership perspective (for tech leads and managers), think about how to <strong>integrate AI training and guidelines</strong>. Possibly set aside time for team members to experiment and share findings (hack days or lightning talks on AI tools). Also, decide as a team how to handle licensing or IP concerns of AI-generated code - e.g., code generation tools have different licenses or usage terms. 
Ensure compliance with those and any company policies (some companies restrict use of public AI services for proprietary code - in that case, perhaps you invest in an internal AI solution or use open-source models that you can run locally to avoid data exposure).</p><p>In short, <strong>treat AI adoption as a team sport</strong>. Everyone should be rowing in the same direction and using roughly compatible tools and approaches, so that the codebase remains maintainable and the benefits are multiplied across the team. AI-nativeness at an organization level can become a strong competitive advantage, but it requires alignment and collective learning.</p><h3><strong>6. Use AI responsibly and ethically</strong></h3><p>Last but certainly not least, always use AI responsibly. This encompasses a few things:</p><ul><li><p><strong>Privacy and security:</strong> Be mindful of what data you feed into AI services. If you&#8217;re using a hosted service like OpenAI&#8217;s API or an IDE plugin, the code or text you send might be stored or seen by the provider under certain conditions. For sensitive code (security-related, proprietary algorithms, user data, etc.), consider using self-hosted models or at least strip out sensitive bits before prompting. Many AI tools now have enterprise versions or on-prem options to alleviate this. Check your company&#8217;s policy: for example, a bank might forbid using any external AI for code. Anthropic&#8217;s enterprise guide suggests a three-pronged approach including process and tech to deploy AI safely. It&#8217;s your duty to follow those guidelines. Also, be cautious of phishing or malicious code - ironically, AI could potentially insert something if it were trained on malicious examples. So code review for security issues stays important.</p></li><li><p><strong>Bias and fairness:</strong> If AI helps generate user-facing content or decisions, be aware of biases. 
For instance, if you&#8217;re using AI to generate interview questions or analyze r&#233;sum&#233;s (just hypothetically), remember the models may carry biases from training data. In software contexts, this might be less direct, but imagine AI generating code comments or documentation that inadvertently uses non-inclusive language. You should still run such outputs through your usual processes for DEI (Diversity, Equity, Inclusion) standards. OpenAI&#8217;s guides on enterprise AI discuss ensuring fairness and checking model outputs for biased assumptions. As an engineer, if you see AI produce something problematic (even in a joke or example), don&#8217;t propagate it. We have to be the ethical filter.</p></li><li><p><strong>Transparency with AI usage:</strong> If part of your product uses AI (say, an AI-written response or a feature built by AI suggestions), consider being transparent with users where appropriate. This is more about product decisions, but it&#8217;s a growing expectation that users know when they&#8217;re reading content written by AI or interacting with a bot. From an engineering perspective, this might mean instrumenting logs to indicate AI involvement or tagging outputs. It could also mean putting guardrails: e.g., if an AI might free-form answer a user query in your app, put in checks or moderation on that output.</p></li><li><p><strong>Intellectual property (IP) concerns:</strong> The legal understanding is still evolving, but be cautious when using AI on licensed material. If you ask AI to generate code &#8220;like library X&#8221;, ensure you&#8217;re not inadvertently copying licensed code (the models sometimes regurgitate training data). Similarly, be mindful of attribution - if the AI produced a result influenced by a specific source, it won&#8217;t cite it unless prompted. For now, treating AI outputs as if they were your own work (with respect to licensing) is prudent - meaning you take responsibility as if you wrote it. 
Some companies even restrict using Copilot due to IP uncertainty for generated code. Keep an eye on updates in this area, and when in doubt, consult with legal or stick to well-known algorithms.</p></li><li><p><strong>Managing expectations and human oversight:</strong> Ethically, engineers should prevent over-reliance on AI in critical areas where mistakes could be harmful (e.g., AI in medical software or autonomous driving). Even if you personally work on a simple web app, the principle stands: ensure there&#8217;s a human fallback for important decisions. For example, if AI summarizes a client&#8217;s requirements, have a human confirm the summary with the client. Don&#8217;t let AI be the sole arbiter of truth in places where it matters. This responsible stance protects you, your users, and your organization.<br></p></li></ul><p>In sum, being an AI-native engineer also means being a <strong>responsible engineer</strong>. Our core duty to build reliable, safe, and user-respecting systems doesn&#8217;t change; we just have more powerful tools now. Use them in a way you&#8217;d be proud of if you had written it all yourself (because effectively, you are accountable for it). Many companies and groups (OpenAI, Google, Anthropic) have published guidelines and playbooks on responsible AI usage - those can be excellent further reading to deepen your understanding of this aspect (see the <strong>Further Reading</strong> section).</p><h3><strong>7. For leaders and managers: cultivate an AI-first engineering culture</strong></h3><p>If you lead an engineering team, your role is not just to permit AI usage, but to <strong>champion it</strong> strategically. This means moving from passive acceptance to active cultivation by focusing on a few key areas:</p><ul><li><p><strong>Leading by example:</strong> Demonstrate how AI can be used for strategic tasks like planning or drafting proposals, and articulate a clear vision for how it will make the team and its products better. 
Model the learning process by openly sharing both your successes and stumbles with AI. An AI-native culture starts at the top and is fostered by authenticity, not just mandates.</p></li><li><p><strong>Investing in skills:</strong> Go beyond mere permission and actively provision resources for learning. Sponsor premium tool licenses, formally sanction time for experimentation (like hack days or exploration sprints), and create forums (demos, shared wikis) for the team to build a collective library of best practices and effective prompts. This signals that skill development is a genuine priority.</p></li><li><p><strong>Fostering psychological safety: </strong>Create an environment where engineers feel safe to experiment, share failures, and ask foundational questions without judgment. Explicitly address the fear of incompetence by framing AI adoption as a collective journey, and counter the fear of replacement by emphasizing how AI augments, rather than automates, the critical thinking and judgment that define senior engineering.</p></li><li><p><strong>Revisiting roadmaps and processes:</strong> Proactively identify which parts of your product or development cycle are ripe for AI-driven acceleration. Be prepared to adjust timelines, estimation, and team workflows to reflect that the nature of engineering work is shifting from writing boilerplate to specifying, verifying, and integrating. Evolve your code review process to place a higher emphasis on the critical human validation of AI-generated outputs.</p></li></ul><div><hr></div><p>Following these best practices will help ensure that your integration of AI into engineering yields positive results - higher productivity, better code, faster learning - without the downsides of sloppy usage. It&#8217;s about combining the best of what AI can do with the best of what <strong>you</strong> can do as a skilled human. 
The next and final section will conclude our discussion, reflecting on the journey to AI-nativeness and the road ahead, along with additional resources to continue your exploration.</p><h2><strong>Conclusion: Embracing the future</strong></h2><p>We&#8217;ve traveled through what it means to be an AI-native software engineer - from mindset, to practical workflows, to tool landscapes, to lifecycle integration, and best practices. It&#8217;s clear that the role of software engineers is evolving in tandem with AI&#8217;s growing capabilities. Rather than rendering engineers obsolete, AI is proving to be a powerful augmentation to human skills. By embracing an AI-native approach, you position yourself to <strong>build faster, learn more, and tackle bigger challenges</strong> than ever before.</p><p>To summarize a few key takeaways: being AI-native starts with seeing AI as a multiplier for your skills, not a magic black box or a threat. It&#8217;s about continuously asking, &#8220;How can AI help me with this?&#8221; and then judiciously using it to accelerate routine tasks, explore creative solutions, and even catch mistakes. It involves new skills like prompt engineering and agent orchestration, but also elevates the importance of timeless skills - architecture design, critical thinking, and ethical judgment - because those guide the AI&#8217;s application. The AI-native engineer is always learning: learning how to better use AI, and leveraging AI to learn other domains faster (a virtuous circle!).</p><p>Practically, we saw that there is a rich ecosystem of tools. There&#8217;s no one-size-fits-all AI tool - you&#8217;ll likely assemble a personal toolkit (IDE assistants, prototyping generators, etc.) tailored to your work. The best engineers will know when to grab which tool, much like a craftsman with a well-stocked toolbox. And they&#8217;ll keep that toolbox up-to-date as new tools emerge. 
Importantly, AI becomes a collaborative partner across all stages of work - not just coding, but writing tests, debugging, generating documentation, and even brainstorming in the design phase. The more areas you involve AI in, the more you can focus your unique human talents where they matter most.</p><p>We also stressed caution and responsibility. The excitement of AI&#8217;s capabilities should be balanced with healthy skepticism and rigorous verification. By following best practices - clear prompts, code reviews, small iterative steps, staying aware of limitations - you can avoid pitfalls and build trust in using AI. As an experienced professional (especially if you are an IC or tech lead, as many of you are), you have the background to guide AI effectively and to mitigate its errors. In a sense, your experience is more valuable than ever: junior engineers can get a boost from AI to produce mid-level code, but it takes a senior mindset to prompt AI to solve complex problems in a robust way and to integrate it into a larger system gracefully.</p><p>Looking ahead, one can only anticipate that AI will get more powerful and more integrated into the tools we use. Future IDEs might have AI running continuously, checking our work or even optimizing code in the background. We might see specialized AIs for different domains (an AI that is an expert in frontend UX vs. one for database tuning). Being AI-native means you&#8217;ll adapt to these advancements smoothly - you&#8217;ll treat them as a natural progression of your workflow. Perhaps eventually &#8220;AI-native&#8221; will simply be <em>&#8220;software engineer&#8221;</em>, because using AI will be as ubiquitous as using Stack Overflow or Google is today. 
Until then, those who pioneer this approach (like you, reading and applying these concepts) will have an edge.</p><p>There&#8217;s also a broader impact: By accelerating development, AI can free us to focus on more ambitious projects and more creative aspects of engineering. It could usher in an era of rapid prototyping and experimentation. As I&#8217;ve mused in one of my pieces, we might even see a shift in <em>who</em> builds software - with AI lowering barriers, more people (even non-traditional coders) could bring ideas to life. As an AI-native engineer, you might play a role in enabling that, by building the tools or by mentoring others in using them. It&#8217;s an exciting prospect: engineering becomes more about imagination and design, while repetitive toil is handled by our AI assistants.</p><p>In closing, adopting AI in your daily engineering practice is not just a one-time shift, but a journey. Start where you are: try one new tool or apply AI to one part of your next task. Gradually expand that comfort zone. Celebrate the wins (like the first time an AI-generated test catches a bug you missed), and learn from the hiccups (maybe the time AI refactoring broke something - it&#8217;s a lesson to improve prompting).</p><p>Encourage your team to do the same, building an AI-friendly engineering culture. With pragmatic use and continuous learning, you&#8217;ll find that AI not only boosts your productivity but can also rekindle joy in development - letting you concentrate on creative problem-solving and seeing faster results from idea to reality.</p><p>The era of AI-assisted development is here, and those who skillfully ride this wave will define the next chapter of software engineering. By reading this and experimenting on your own, you&#8217;re already on that path. 
Keep going, stay curious, and code on - with your new AI partners at your side.</p><h2><strong>Further reading</strong></h2><p>To deepen your understanding and keep improving your AI-assisted workflow, here are some excellent free guides and resources from leading organizations. These cover everything from prompt engineering to building agents and deploying AI responsibly:</p><ul><li><p><strong><a href="https://services.google.com/fh/files/misc/gemini-for-google-workspace-prompting-guide-101.pdf">Google - Prompting Guide 101 (Second Edition)</a></strong> - A quick-start handbook for writing effective prompts, packed with tips and examples for Google&#8217;s Gemini model. Great for learning prompt fundamentals and how to phrase queries to get the best results.</p></li><li><p><strong><a href="https://www.kaggle.com/whitepaper-prompt-engineering">Google - &#8220;More Signal, Less Guesswork&#8221; prompt engineering whitepaper</a></strong> - A 68-page Google whitepaper that dives into advanced prompt techniques (for API usage, chain-of-thought prompts, using temperature/top-p settings, etc.). Excellent for engineers looking to refine their prompt engineering beyond the basics.</p></li><li><p><strong><a href="https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf">OpenAI - </a></strong><em><strong><a href="https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf">A Practical Guide to Building Agents</a></strong></em> - OpenAI&#8217;s comprehensive guide (~34 pages) on designing and implementing AI agents that work in real-world scenarios. 
It covers agent architectures (single vs multi-agent), tool integration, iteration loops, and important safety considerations when deploying autonomous agents.</p></li><li><p><strong><a href="https://www.anthropic.com/engineering/claude-code-best-practices">Anthropic - </a></strong><em><strong><a href="https://www.anthropic.com/engineering/claude-code-best-practices">Claude Code: Best Practices for Agentic Coding</a></strong></em> - A guide from Anthropic&#8217;s engineers on getting the most out of Claude (their AI) in coding scenarios. It includes tips like structuring your repo with a CLAUDE.md for context, prompt formats for debugging and feature building, and how to iteratively work with an AI coding agent. Useful for anyone using AI in an IDE or planning to integrate an AI agent with their codebase.</p></li><li><p><strong><a href="https://cdn.openai.com/business-guides-and-resources/identifying-and-scaling-ai-use-cases.pdf">OpenAI - </a></strong><em><strong><a href="https://cdn.openai.com/business-guides-and-resources/identifying-and-scaling-ai-use-cases.pdf">Identifying and Scaling AI Use Cases</a></strong></em> - This guide helps organizations (and teams) find high-leverage opportunities for AI and scale them effectively. It introduces a methodology to identify where AI can add value, how to prototype quickly, and how to roll out AI solutions across an enterprise sustainably. Great for tech leads and managers strategizing AI adoption.</p></li><li><p><strong><a href="https://assets.anthropic.com/m/66daaa23018ab0fd/original/Anthropic-enterprise-ebook-digital.pdf">Anthropic - </a></strong><em><strong><a href="https://assets.anthropic.com/m/66daaa23018ab0fd/original/Anthropic-enterprise-ebook-digital.pdf">Building Trusted AI in the Enterprise</a></strong></em><strong><a href="https://assets.anthropic.com/m/66daaa23018ab0fd/original/Anthropic-enterprise-ebook-digital.pdf"> (Trust in AI)</a></strong> - An enterprise-focused e-book on deploying AI responsibly. 
It outlines a three-dimensional approach (people, process, technology) to ensure AI systems are reliable, secure, and aligned with organizational values. It also devotes sections to AI security and governance best practices - a must-read for understanding risk management in AI projects.</p></li><li><p><strong><a href="https://cdn.openai.com/business-guides-and-resources/ai-in-the-enterprise.pdf">OpenAI - </a></strong><em><strong><a href="https://cdn.openai.com/business-guides-and-resources/ai-in-the-enterprise.pdf">AI in the Enterprise</a></strong></em><a href="https://cdn.openai.com/business-guides-and-resources/ai-in-the-enterprise.pdf"> </a>- OpenAI&#8217;s 24-page report on how top companies are using AI and lessons learned from those collaborations. It provides strategic insights and case studies, including practical steps for integrating AI into products and operations at scale. Useful for seeing the bigger picture of AI&#8217;s business impact and getting inspiration for high-level AI integration.</p></li><li><p><strong><a href="https://www.kaggle.com/whitepaper-agent-companion">Google - </a></strong><em><strong><a href="https://www.kaggle.com/whitepaper-agent-companion">Agents Companion</a></strong></em><strong><a href="https://www.kaggle.com/whitepaper-agent-companion"> Whitepaper</a></strong> - Google&#8217;s advanced &#8220;102-level&#8221; technical companion to their prompting guide, focusing on AI agents. This guide explores complex topics like agent evaluation, tool use, and orchestrating multiple agents. It&#8217;s a deep dive for developers looking to push the envelope with agent development and deployment - essentially a toolkit for advanced AI builders.</p></li></ul><p>Each of these resources can help you further develop your AI-native engineering skills, offering both theoretical frameworks and practical techniques. 
They are all freely available (no paywalls), and reading them will reinforce many of the concepts discussed in this section while introducing new insights from industry experts. </p><p>Happy learning, and happy building!</p><p><em>I&#8217;m excited to share I&#8217;m writing a new <a href="https://www.oreilly.com/library/view/vibe-coding-the/9798341634749/">AI-assisted engineering book</a> with O&#8217;Reilly. If you&#8217;ve enjoyed my writing here you may be interested in checking it out.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WFGE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faba8cf11-c1d1-4cb8-8400-1fa7b7b91d83_5246x5246.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WFGE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faba8cf11-c1d1-4cb8-8400-1fa7b7b91d83_5246x5246.png 424w, https://substackcdn.com/image/fetch/$s_!WFGE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faba8cf11-c1d1-4cb8-8400-1fa7b7b91d83_5246x5246.png 848w, https://substackcdn.com/image/fetch/$s_!WFGE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faba8cf11-c1d1-4cb8-8400-1fa7b7b91d83_5246x5246.png 1272w, https://substackcdn.com/image/fetch/$s_!WFGE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faba8cf11-c1d1-4cb8-8400-1fa7b7b91d83_5246x5246.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!WFGE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faba8cf11-c1d1-4cb8-8400-1fa7b7b91d83_5246x5246.png" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aba8cf11-c1d1-4cb8-8400-1fa7b7b91d83_5246x5246.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4499366,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://addyo.substack.com/i/165160941?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faba8cf11-c1d1-4cb8-8400-1fa7b7b91d83_5246x5246.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WFGE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faba8cf11-c1d1-4cb8-8400-1fa7b7b91d83_5246x5246.png 424w, https://substackcdn.com/image/fetch/$s_!WFGE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faba8cf11-c1d1-4cb8-8400-1fa7b7b91d83_5246x5246.png 848w, https://substackcdn.com/image/fetch/$s_!WFGE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faba8cf11-c1d1-4cb8-8400-1fa7b7b91d83_5246x5246.png 1272w, https://substackcdn.com/image/fetch/$s_!WFGE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faba8cf11-c1d1-4cb8-8400-1fa7b7b91d83_5246x5246.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p>]]></content:encoded></item></channel></rss>