We Rebuilt Our Feature Flag System From Scratch. It Was Worth Every Hour.
David Liu
Six months ago, our feature flag system was a mess. Boolean toggles scattered across environment variables, a shared Google Sheet tracking which flags were “on” in production, and a Slack channel where engineers would announce—sometimes—when they flipped something.
It worked. Barely. Until it didn’t.
The Breaking Point
The incident was predictable in retrospect. A flag meant for our internal beta group got enabled for everyone. Not gradually. Not behind a percentage rollout. Just… on. For all 12,000 active users. On a Friday afternoon.
The feature wasn’t broken—it was just unfinished. Half the copy was placeholder text. The onboarding flow dead-ended at a screen that said “TODO: add confirmation step.” Users were confused. Support tickets spiked. Our NPS dipped three points in a week.
The following Monday, we decided to rebuild the entire system. Here’s what we learned.
What We Actually Needed (vs. What We Thought We Needed)
Our first instinct was to evaluate third-party tools. LaunchDarkly, Split, Unleash, Flipt—the market has no shortage of options. We spent a week running evaluations.
But the more we talked to our team, the clearer it became that our problem wasn’t technical. It was organizational. We didn’t need a better toggle mechanism. We needed a system that encoded our release philosophy.
Here’s what that meant in practice:
- Flags should have owners. Every flag has exactly one person responsible for its lifecycle. Not a team. A person.
- Flags should expire. A flag without a removal date is technical debt with no due date. We set a maximum lifetime of 90 days.
- Flags should have rollout plans. Not just on/off, but a documented sequence: internal → beta → 10% → 50% → 100%.
- Flag changes should be auditable. Who changed what, when, and why. No exceptions.
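These rules are mechanical enough to enforce in code. Here's a minimal sketch in TypeScript; the field names and error messages are illustrative, not our production validator. Only the single-person-owner rule and the 90-day cap come from the policy above.

```typescript
interface FlagRules {
  owner: string;          // exactly one responsible person, not a team
  created: string;        // ISO date
  expires: string;        // ISO date, at most 90 days after `created`
  rollout_plan: string[]; // documented stage sequence
}

const MAX_LIFETIME_DAYS = 90;

// Returns a list of policy violations; an empty list means the flag
// is allowed to exist.
function validateFlag(flag: FlagRules): string[] {
  const errors: string[] = [];
  if (!flag.owner.includes("@")) {
    errors.push("flag must be owned by a person (an email), not a team");
  }
  const lifetimeDays =
    (Date.parse(flag.expires) - Date.parse(flag.created)) / 86_400_000;
  if (lifetimeDays > MAX_LIFETIME_DAYS) {
    errors.push(`lifetime exceeds ${MAX_LIFETIME_DAYS} days`);
  }
  if (flag.rollout_plan.length === 0) {
    errors.push("flag needs a documented rollout plan");
  }
  return errors;
}
```

Running checks like these at flag-creation time means a flag can never enter the system already in violation of the policy.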
The Architecture We Landed On
We built a thin internal service—about 2,000 lines of TypeScript—that sits between our applications and a simple PostgreSQL table. Nothing fancy. The innovation, if you can call it that, is in the schema.
Each flag record contains:
{
  "key": "onboarding-v2",
  "owner": "sarah.chen@team.com",
  "created": "2026-01-15",
  "expires": "2026-04-15",
  "stage": "beta",
  "rollout_pct": 25,
  "rollout_plan": ["internal", "beta", "10pct", "50pct", "ga"],
  "current_step": 1,
  "description": "New onboarding flow with guided setup wizard",
  "jira_ticket": "PROD-1847",
  "kill_switch": true
}
The kill_switch field is important. When true, the flag can be instantly disabled from our incident response tooling without going through the normal change process. Every customer-facing flag has this enabled.
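For illustration, here's roughly how a record like this can be evaluated per request. The hashing scheme and the `killed` field (the bit the incident tooling flips) are assumptions for the sketch, not our actual implementation; only the schema fields come from the table above.

```typescript
import { createHash } from "node:crypto";

interface FlagRecord {
  key: string;
  stage: string;
  rollout_pct: number;  // 0–100
  kill_switch: boolean;
  killed?: boolean;     // set by incident response tooling (assumed field)
}

// Deterministically bucket a user into 0–99, so the same user stays
// inside the rollout as the percentage grows from 10 to 50 to 100.
function bucket(flagKey: string, userId: string): number {
  const digest = createHash("sha256")
    .update(`${flagKey}:${userId}`)
    .digest();
  return digest.readUInt32BE(0) % 100;
}

function isEnabled(flag: FlagRecord, userId: string): boolean {
  if (flag.kill_switch && flag.killed) return false; // instant off, no change process
  return bucket(flag.key, userId) < flag.rollout_pct;
}
```

Hashing on `flagKey:userId` rather than `userId` alone keeps different flags' cohorts independent, so a user in the 10% for one flag isn't automatically in the 10% for every flag.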
The Rollout Plan Pattern
The most impactful decision was making rollout plans a first-class concept. Before, “rolling out a feature” meant someone manually changing a percentage in a config file. Now, advancing a flag through its rollout plan is a single CLI command:
$ producto flags advance onboarding-v2
Current: beta (25%)
Next: 10pct (10% of all users)
Confirm? [y/N]
Each stage transition is recorded. If something goes wrong at 10%, we can see exactly when it was promoted and correlate with our error tracking. More importantly, rolling back is just as easy:
$ producto flags rollback onboarding-v2
Rolling back from 10pct → beta
Done. 10% cohort will see previous behavior on next request.
No one touches the database directly. No one edits environment variables. The CLI is the only interface, and it enforces our rules.
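Under the hood, advancing and rolling back are just index moves through the `rollout_plan` array from the schema. A simplified sketch (the clamping behavior at either end is illustrative):

```typescript
interface RolloutState {
  rollout_plan: string[]; // e.g. ["internal", "beta", "10pct", "50pct", "ga"]
  current_step: number;   // index into rollout_plan
}

// Move one stage forward, never past the final stage.
function advance(state: RolloutState): RolloutState {
  const next = Math.min(state.current_step + 1, state.rollout_plan.length - 1);
  return { ...state, current_step: next };
}

// Move one stage back, never before the first stage.
function rollback(state: RolloutState): RolloutState {
  const prev = Math.max(state.current_step - 1, 0);
  return { ...state, current_step: prev };
}

function stage(state: RolloutState): string {
  return state.rollout_plan[state.current_step];
}
```

Because each transition produces a new state rather than mutating in place, recording the audit trail is just a matter of persisting every state the flag passes through.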
Expiration Is Non-Negotiable
This was the most controversial decision. Engineers pushed back hard. “What about long-lived feature flags? What about kill switches for critical paths?”
We held firm, with one concession: flags can be renewed. But renewal requires a brief written justification and approval from a tech lead. The friction is intentional.
Here’s why this matters: before the expiration policy, we had 147 active flags in production. We audited them and found that 89 were for features that had been fully shipped months ago, and another 32 were for experiments that had concluded but never been cleaned up. The flags were still being evaluated on every request, adding latency and cognitive overhead to every debugging session.
Three months after implementing mandatory expiration, we’re down to 23 active flags. Our p99 API latency dropped 12ms. Not because the flag evaluation was slow—it wasn’t—but because removing dead flags let us simplify code paths that had accumulated years of conditional logic.
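Enforcement itself is simple once every flag carries an `expires` date. Sketched in TypeScript rather than SQL for illustration (the field names follow the schema above; the report shape is assumed):

```typescript
interface ExpiringFlag {
  key: string;
  owner: string;   // the one person who gets nagged
  expires: string; // ISO date
}

// Flags past their expiration date: each must be renewed (with a
// written justification and tech-lead approval) or removed.
function expiredFlags(flags: ExpiringFlag[], today: string): ExpiringFlag[] {
  return flags.filter((f) => Date.parse(f.expires) < Date.parse(today));
}
```

Run daily, a check like this turns cleanup from an annual audit into a steady trickle of small removals.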
What We Got Wrong
It wasn’t all smooth. A few things we’d do differently:
We underestimated the migration effort. Moving from environment variables to the new system took three sprints, not the one sprint we planned. Every service had its own way of reading flags. Some used env vars. Some read from a YAML file. One particularly creative engineer had hard-coded flag values in a switch statement “for performance.”
We initially forgot about local development. Engineers need to test flag behavior locally. Our first version required a running instance of the flag service, which meant spinning up PostgreSQL just to work on a frontend component. We added a local override file (.flags.local.json) that takes precedence in development mode.
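The precedence logic is small. A sketch of how the override file can win in development (the file shape and the environment check are simplified; only the `.flags.local.json` name comes from our setup):

```typescript
import { existsSync, readFileSync } from "node:fs";

type Overrides = Record<string, boolean>;

// Load local overrides only in development; anywhere else the file
// is ignored even if it exists.
function loadLocalOverrides(path = ".flags.local.json"): Overrides {
  if (process.env.NODE_ENV !== "development" || !existsSync(path)) return {};
  return JSON.parse(readFileSync(path, "utf8")) as Overrides;
}

// A local override beats whatever the flag service says.
function resolve(key: string, serviceValue: boolean, overrides: Overrides): boolean {
  return key in overrides ? overrides[key] : serviceValue;
}
```

The environment check matters: a developer's override file committed by accident should never change behavior in production.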
We didn’t build observability early enough. It took us a month to add proper metrics around flag evaluation—things like “how often is this flag checked?” and “what percentage of requests actually see the new behavior?” These should have been day-one features.
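Both of those questions reduce to two counters per flag. An in-memory sketch to show the shape; in practice the counts would be emitted to a metrics backend rather than held in a map:

```typescript
class FlagMetrics {
  private checks = new Map<string, number>();
  private enabled = new Map<string, number>();

  // Call once per flag evaluation, with the result of the check.
  record(key: string, wasEnabled: boolean): void {
    this.checks.set(key, (this.checks.get(key) ?? 0) + 1);
    if (wasEnabled) this.enabled.set(key, (this.enabled.get(key) ?? 0) + 1);
  }

  // "How often is this flag checked?"
  checkCount(key: string): number {
    return this.checks.get(key) ?? 0;
  }

  // "What percentage of requests actually see the new behavior?"
  enabledPct(key: string): number {
    const total = this.checkCount(key);
    return total === 0 ? 0 : (100 * (this.enabled.get(key) ?? 0)) / total;
  }
}
```

A flag whose `checkCount` is zero for weeks is a strong removal candidate, which is exactly the signal we were missing for the first month.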
The Unexpected Benefits
Some outcomes we didn’t anticipate:
Product managers started using the system directly. Because flags have descriptions, rollout plans, and clear ownership, PMs can track feature rollout status without asking engineers. Our project management tool has fewer “what’s the status of X?” comments.
Incident response got faster. When something breaks, the first question is now “did any flags change recently?” The audit log answers this in seconds. Before, we’d spend 20 minutes in a war room trying to figure out if someone had changed a config.
Engineers think about releases differently. Having a structured rollout plan forces teams to think about gradual exposure before they write code. “How will we roll this out?” is now part of our design review template. Features that can’t be incrementally released get flagged (pun intended) early.
The Numbers
Six months in, here’s where we stand:
- Active flags: 23 (down from 147)
- Average flag lifetime: 34 days (down from “forever”)
- Incidents caused by flag misconfiguration: 0 (down from ~2/quarter)
- Time to roll back a flag: under 30 seconds
- p99 latency improvement: 12ms (indirect, from code simplification)
Should You Build or Buy?
Honestly? For most teams, a managed service like LaunchDarkly is the right call. The operational overhead of running your own flag infrastructure is real, and the commercial tools are excellent.
We built our own because our problem was as much cultural as technical. We needed the build process itself to force conversations about how we release software. The tool is a byproduct of those conversations.
If your team already has good release discipline and just needs better tooling, buy. If your flag system is a symptom of deeper process issues—like ours was—building might force the conversations you’ve been avoiding.
Either way, the investment in getting feature flags right pays for itself quickly. The Friday afternoon incident that started this whole project? It cost us roughly $15,000 in support time and lost goodwill. The rebuild took about three engineer-weeks. The math works out.
The system isn’t perfect. We’re still iterating on the CLI, adding better integration with our CI/CD pipeline, and exploring flag dependencies (flag B should only be enabled if flag A is at 100%). But for the first time, our team trusts the release process. And that trust is worth more than any feature we’ve shipped this year.