The Batch Size Problem: Why Your Team’s Speed vs. Quality Debate Is Asking the Wrong Question

David Liu

8 min read

There’s a version of this conversation happening in every engineering org right now. A product manager says “we need to move faster.” An engineer says “if we move faster, we’ll break things.” Someone draws a Venn diagram. Nothing gets resolved. Everyone leaves the meeting having confirmed their priors.

The framing is wrong. Speed and quality aren’t on opposite ends of a slider. The real variable is batch size — how much you try to deliver at once before getting feedback. And almost every team that thinks they have a speed problem actually has a batch size problem.

The Waterfall Hangover

We declared waterfall dead years ago. But the instinct behind it — design everything, build everything, then ship everything — is remarkably persistent. It shows up as two-week sprints where nothing gets deployed until the last day. It shows up as “we’ll combine these three features into one release.” It shows up as AI products where the team spends four months building an evaluation harness before a single user ever touches the model.

The psychological appeal is real. Big batches feel safer. You can test more before you ship. You can coordinate the announcement. You have more to show. But every one of those benefits is an illusion that collapses on contact with actual users.

Big batches mean long feedback cycles. Long feedback cycles mean you’ve been optimizing for the wrong thing for weeks. By the time you find out, you’ve built a lot of things that need to be unbuilt.

What Incremental Delivery Actually Means

Incremental delivery is not “ship it broken and call it beta.” That’s just a different kind of recklessness dressed up in lean vocabulary.

Real incremental delivery means breaking work into the smallest unit that delivers genuine value to a real user — not the smallest unit that’s technically completable. The distinction matters enormously.

A login page with no backend is not an increment. A feature behind a flag that three beta users can actually use is an increment. A rough version of your AI copilot that handles one specific workflow badly is an increment. A polished demo that lives only in Figma is not.

The test is: could someone in the world get value from this today, even if it’s imperfect? If yes, you have an increment worth shipping. If no, you’re still building in the dark.

Why AI Products Break This More Than Most

Building AI products has created a new category of batching mistake that most teams haven’t fully reckoned with yet.

Traditional software has deterministic behavior. You build a thing, you test the thing, the thing mostly works the way it did when you tested it. AI products don’t work that way. The model’s behavior is probabilistic and context-dependent. Your evaluation suite, no matter how comprehensive, will not capture what happens when real users bring real variation to your system.

This means the feedback loop between “code shipped” and “understood behavior” is uniquely long for AI products — unless you’re getting real usage data. Teams that batch their AI releases are essentially flying blind and then wondering why their launch performance doesn’t match their pre-launch evaluations.

The fix is boring but it works: ship narrow. Ship one capability to a small cohort. Watch what happens. Learn. Ship the next thing. The teams building AI products that actually work aren’t the ones with the most sophisticated pre-launch test suites. They’re the ones with the tightest post-ship learning loops.
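To make “small cohort” concrete, here’s a minimal sketch of one way to pick a stable slice of users, assuming you have a user ID to hash. The capability name and percentage are illustrative, and in practice this logic usually lives inside whatever flag system you already use.

```python
import hashlib

def in_rollout_cohort(user_id: str, capability: str, percent: float) -> bool:
    """Deterministically assign a user to a small rollout cohort.

    Hashing (capability, user_id) gives each user a stable bucket per
    capability, so the same people see the new behavior on every request
    instead of flickering in and out of the experiment.
    """
    digest = hashlib.sha256(f"{capability}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # map the hash into [0, 1)
    return bucket < percent / 100.0

# Example: expose one narrow capability to roughly 5% of users.
if in_rollout_cohort("user-1234", "invoice-summarizer", percent=5.0):
    ...  # serve the new capability, instrumented heavily
else:
    ...  # serve the existing behavior
```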

The Hidden Cost of Large Releases

There’s a coordination tax that almost nobody accounts for when planning a big release. Every person on the team knows there’s a release coming. That knowledge shapes behavior in ways that slow everything down.

Engineers wait to raise concerns until closer to the date because “we can handle that in the next cycle.” Product managers scope out feedback that would require significant rework because the timeline can’t absorb it. Designers polish details that won’t matter if the feature turns out to be wrong. QA concentrates test effort at the end, when it’s least useful.

There’s also the political dimension. Large releases attract executive attention. Executive attention invites scope changes. Scope changes late in a cycle are expensive. You end up with bloated releases that represent months of negotiation rather than a clear point of view.

Small releases are boring in the best possible way. There’s nothing to get excited about, so there’s nothing to meddle with. The feature goes out. Users react. The team responds. Repeat.

Making Incremental Work in Practice

The practical obstacles to incremental delivery are real and worth addressing directly rather than just repeating “small batches good” at engineering all-hands.

The design completeness problem. Many designers are trained to think in flows, not increments. They want the full user journey before anything ships. This is understandable — partial experiences can feel confusing or broken. The resolution isn’t to rush design. It’s to get alignment early on what the “minimum complete experience” looks like for an increment, and to design that deliberately rather than treating it as a compromise.

The dependency problem. Sometimes features are genuinely coupled in ways that make incremental delivery feel impossible. The backend isn’t ready. The data model needs to change. The new component depends on a refactor that’s still in progress. When this happens repeatedly, it’s usually a signal that your architecture has become too tightly coupled — not that incremental delivery doesn’t apply to you. The solution is usually some combination of feature flags, abstraction layers, and a willingness to ship capabilities that aren’t yet wired to UI.

The customer expectations problem. B2B products in particular often operate under contracts and roadmap commitments that make incremental delivery feel like breaking promises. But there’s an important distinction between delivering value incrementally and communicating commitments incrementally. You can commit to delivering a capability by a date while still shipping the underlying components as you build them. Your customer benefits from earlier access to partial functionality. You benefit from early feedback. Everyone wins, but you have to have the conversation explicitly rather than assuming customers expect a big bang delivery.

Feature Flags Are Not Optional

If your team doesn’t have feature flags, everything else in this post is theoretical. Feature flags are the infrastructure that makes incremental delivery practical.

They let you merge code before it’s user-visible. They let you expose new capabilities to internal users first, then beta users, then everyone. They let you roll back immediately if something goes wrong, without a code deployment. They let you A/B test behavior at the system level rather than just the UI level — which matters enormously for AI products where the “design” of the system includes model parameters and prompt architecture.
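To make the system-level point concrete, here’s a minimal sketch of a flag-gated model-and-prompt configuration. The flag client here is a stand-in rather than any particular vendor’s API, and the names are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CopilotConfig:
    model: str
    temperature: float
    system_prompt: str

# Two complete system-level configurations, not just two UI variants.
CONTROL = CopilotConfig("baseline-model", 0.2, "You are a careful assistant...")
CANDIDATE = CopilotConfig("candidate-model", 0.2, "You are a careful assistant, revised...")

def resolve_config(flags, user_id: str) -> CopilotConfig:
    """Choose the configuration for this request at runtime.

    `flags.is_enabled` stands in for whichever flag system you use.
    Because the choice happens per request, rolling back the candidate
    means flipping the flag off, not shipping a new deployment.
    """
    if flags.is_enabled("copilot-config-v2", user_id):
        return CANDIDATE
    return CONTROL
```

The same pattern covers the internal-then-beta-then-everyone progression: the flag’s targeting rules change, the code doesn’t.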

Modern flag systems (LaunchDarkly, Statsig, or GrowthBook if you want open source) are not expensive relative to the problems they solve. If you’re treating feature flags as a nice-to-have, you’re leaving a major lever unpulled.

The Feedback Discipline

Incremental delivery without a feedback discipline is just releasing things more often and not learning any faster. The second half of the equation is what you do with the signal you’re generating.

This means having a real answer to: who looks at what data after a release, on what cadence, and with what authority to act on what they find? Many teams ship incrementally but then route the resulting feedback into a quarterly planning process that can’t respond to it quickly. The release cadence and the decision cadence need to match.

For AI products specifically, this means instrumenting not just engagement metrics but quality metrics. What percentage of responses are the user editing or ignoring? What’s the distribution of failure modes in production versus your eval suite? Where are users abandoning the AI-assisted workflow in favor of doing it manually? These signals are only valuable if someone is watching them and empowered to act.
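As a sketch of what that instrumentation can look like, here’s one way to roll logged response events into an accept/edit/ignore summary. The event shape and field names are assumptions, not a prescribed schema; adapt them to whatever your logging already captures.

```python
from collections import Counter

def response_quality_summary(events: list[dict]) -> dict[str, float]:
    """Summarize what users actually did with AI responses.

    Each event is assumed to look like
    {"response_id": "...", "outcome": "accepted" | "edited" | "ignored"}.
    """
    outcomes = Counter(event["outcome"] for event in events)
    total = sum(outcomes.values()) or 1  # avoid dividing by zero on quiet days
    return {
        "accepted_rate": outcomes["accepted"] / total,
        "edited_rate": outcomes["edited"] / total,
        "ignored_rate": outcomes["ignored"] / total,
    }
```

The harder part isn’t the computation, it’s the cadence: someone has to look at this summary on a schedule that matches the release cadence, with the authority to act on what it shows.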

Shipping Fast and Shipping Right Are the Same Skill

The teams that ship fastest over long time horizons are not the ones that cut corners on quality. They’re the ones with the smallest batch sizes, the tightest feedback loops, and the most ruthless prioritization of what actually needs to be right before anything ships.

They move fast because they’ve gotten good at deciding what small thing to build next. They maintain quality because they get feedback quickly enough to course-correct before problems compound. The two things reinforce each other rather than trading off.

If your team is in a chronic speed-vs-quality debate, the answer usually isn’t to pick a side. It’s to examine the batch size, find where the feedback loop is breaking down, and fix that. The argument tends to dissolve on its own once you’re actually learning fast enough to act on what you’re learning.

That’s what incremental delivery is actually about. Not shipping broken things quickly. Shipping the right small thing, learning from it, and doing it again — fast enough that you’re always building on real information rather than assumptions that have been drifting away from reality for months.