The Context-First Product Development Framework: How to Ship Features That Actually Get Used
90% of features ship and die. This framework fixes the 3 decision points where most teams lose the plot.
Read time: 14 minutes. Use time: years.
Why This Exists
Most product teams have the same workflow. Someone has an idea. They write a spec. Engineering builds it. They ship it. Nobody uses it.
This is not a prioritization problem. Prioritization frameworks assume you already know what to build. The real failure happens earlier, at the point where you decide what deserves a spec in the first place.
Teams that consistently ship features people use do three things differently. They validate the problem before they validate the solution. They define success before they start building. And they build decision checkpoints into the process so they can kill bad ideas before they consume a sprint.
This framework codifies that process. It is the exact system we use at ProductOS to go from signal to shipped feature without the usual 60% waste rate. Every section is standalone. If you stop reading after the first framework, you still walk away with something usable.
How to Use This
- Read the full framework once to understand the three phases
- Print or bookmark the decision checkpoint tables. These are the artifacts your team will reference weekly
- Start with Phase 1 on your next feature idea. Do not skip ahead
- Run the retrospective template after your first shipped feature to calibrate the framework to your team
Phase 1: Signal Validation (Before You Write a Single Line of Spec)
Most teams skip this phase entirely. Someone in a meeting says "we should build X" and two weeks later there is a Jira ticket. Signal validation is the 48-hour process that determines whether an idea deserves any further investment.
The Signal Strength Matrix
Every feature idea arrives as a signal. A customer request. A competitor launch. A metric dip. An internal intuition. The problem is that teams treat all signals equally.
The Signal Strength Matrix forces you to score each signal across four dimensions before it enters your backlog:
| Dimension | Score 1 (Weak) | Score 3 (Moderate) | Score 5 (Strong) |
|---|---|---|---|
| Frequency | One customer mentioned it once | 3-5 customers mentioned it in the last quarter | 10+ customers asked for it unprompted |
| Pain Severity | Nice to have, current workaround exists | Workaround exists but costs time or money | No workaround, users are stuck or churning |
| Strategic Fit | Tangential to our core value prop | Adjacent to our core value prop | Directly strengthens our core value prop |
| Effort Clarity | We have no idea how to build this | We have a rough sense of the approach | We know exactly what to build and how long it takes |
Scoring rules:
- Minimum score to proceed to Phase 2: 12 out of 20
- Any single dimension scoring 1 is an automatic hold, regardless of total score
- When two ideas tie, favor the one scoring higher on Frequency and Pain Severity; those two dimensions are the best predictors of adoption
The 5-Question Signal Test
Before you score anything, run these five questions. They take 10 minutes and save weeks.
Question 1: Can I name 3 specific users who have this problem today?
Not "our enterprise customers" or "power users." Actual names. If you cannot name three, you do not understand the problem well enough to build for it.
Question 2: What are they doing instead right now?
The workaround tells you everything. If the workaround is "they use Excel," you are competing with the most flexible tool ever built. If the workaround is "they just do not do it," you have a greenfield opportunity.
Question 3: What would they stop using if we built this?
This question reveals whether you are adding value or adding complexity. If the answer is "nothing, they would use this AND everything else," you are building a feature that adds cognitive load without removing it.
Question 4: How would I know this feature succeeded 30 days after launch?
If you cannot define the metric, you cannot justify the investment. "Users like it" is not a metric. "DAU of the feature exceeds 40% of total DAU within 30 days" is a metric.
Question 5: What is the smallest version of this that would be useful?
This question prevents overbuilding. The smallest useful version is usually 30% of what you initially imagined. Start there.
Decision Checkpoint 1: Build or Kill
After scoring the Signal Strength Matrix and answering the five questions, you have enough information to make a go/no-go decision.
Go criteria:
- Signal score is 12 or above
- You can name at least 3 users with the problem
- You can define a 30-day success metric
- The smallest useful version takes less than 2 sprints
Kill criteria:
- Signal score is below 10
- You cannot name specific users
- The only evidence is that a competitor built it
- The smallest useful version still takes more than 4 sprints
Hold criteria (revisit in 30 days):
- Signal score is 10 or 11
- Evidence is growing but not yet sufficient
- Strategic fit is strong but frequency is low
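Checkpoint 1 is mechanical enough to encode. Here is a minimal sketch, assuming the 20-point unweighted scale, a go threshold of 12, a hold band just below it (10-11) so the bands do not overlap, and the single-dimension rule from the scoring section. The function name and dictionary shape are illustrative, not part of the framework:

```python
# Hypothetical helper for Decision Checkpoint 1 (Build / Hold / Kill).
# Thresholds: go at 12+ of 20, hold at 10-11, kill below 10; any single
# dimension scoring 1 is an automatic hold regardless of the total.

DIMENSIONS = ("frequency", "pain_severity", "strategic_fit", "effort_clarity")

def checkpoint_one(scores: dict[str, int]) -> str:
    """Return 'build', 'hold', or 'kill' for a scored signal."""
    missing = set(DIMENSIONS) - scores.keys()
    if missing:
        raise ValueError(f"unscored dimensions: {missing}")
    if any(not 1 <= s <= 5 for s in scores.values()):
        raise ValueError("each dimension scores 1-5")
    # Any single dimension at 1 is an automatic hold, regardless of total.
    if min(scores.values()) == 1:
        return "hold"
    total = sum(scores.values())
    if total >= 12:
        return "build"
    if total >= 10:
        return "hold"
    return "kill"
```

Encoding the decision this way has a side benefit: the thresholds live in one reviewable place, so recalibrating them after a retrospective is a one-line change.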
Write down the decision. Put it in a shared document. Teams that make verbal decisions revisit them endlessly. Teams that write decisions down move forward.
Phase 2: Solution Definition (Before You Open Figma or Your IDE)
You have validated that the problem is real, frequent, and painful. Now you need to define the solution without overbuilding it. This is where 80% of product waste happens. Teams go from validated problem to fully designed solution in one leap, skipping the critical step of defining what "done" looks like.
The Outcome-First Spec
Traditional specs start with a feature list and retrofit the outcomes. That is backwards. The Outcome-First Spec starts with three statements:
Statement 1: After this ships, users will be able to [specific action] that they cannot do today.
Statement 2: We will know it worked when [metric] changes by [amount] within [timeframe].
Statement 3: We are explicitly NOT building [list of related things] because [reason].
These three statements fit on an index card. If your spec cannot be summarized this way, it is not ready for engineering.
The Scope Ladder
Every feature has four possible scope levels. Most teams default to Level 3 or 4 without considering whether Level 1 or 2 would ship faster and validate the hypothesis.
Level 1 – Manual: Can you deliver this value manually for the first 10 users? If yes, do that first. You learn faster and it costs almost nothing.
Level 2 – Semi-automated: Build the core logic but keep the edges manual. Use admin tools instead of user-facing UI. Ship in days, not weeks.
Level 3 – Automated: Full user-facing feature with proper UI, error handling, and edge cases. This is what most teams build by default. It should be what you build after Level 1 or 2 validates the approach.
Level 4 – Polished: Onboarding flows, empty states, analytics dashboards, A/B test infrastructure. Only build this for features that have already proven product-market fit at Level 3.
The rule: always start one level lower than your instinct says.
The Dependency Map
Before engineering starts, map every dependency. Not just technical dependencies. Organizational ones.
Ask these questions:
- Does this require another team to change their API or data model?
- Does this require a design system component that does not exist yet?
- Does this require a data pipeline that is not built?
- Does this require legal or compliance review?
- Does this require customer communication or documentation?
Each dependency adds 1-3 weeks to your timeline. Most teams discover dependencies during the sprint, not before it. This is why sprints slip.
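The 1-3 weeks-per-dependency rule of thumb can be folded into estimates directly. A tiny sketch, using the midpoint (2 weeks) as an assumed default; the function name and default are illustrative:

```python
# Pad a base estimate using the framework's rule of thumb that each
# dependency (technical or organizational) adds 1-3 weeks. The 2.0-week
# default is the midpoint of that range, chosen here as an assumption.

def padded_estimate(base_weeks: float, dependencies: list[str],
                    weeks_per_dependency: float = 2.0) -> float:
    """Return the base estimate plus padding for each known dependency."""
    return base_weeks + weeks_per_dependency * len(dependencies)
```

For example, a 4-week build with an API change from another team and a compliance review pads out to roughly 8 weeks, which is usually closer to reality than the original estimate.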
Decision Checkpoint 2: Scope Lock
Before engineering begins, the team locks scope using this template:
We are building: [one sentence]
We are NOT building: [three things]
Success metric: [one number]
Target date: [date]
Scope level: [1-4]
Known dependencies: [list]
Kill trigger: [what metric at what threshold causes us to stop]
Every person on the team signs off on this document. If scope changes after this point, the change goes through the same checkpoint process as a new feature.
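Teams that keep the Scope Lock in a repo sometimes encode it so the rules are enforced, not just written down. A minimal sketch, assuming a Python dataclass; the field names mirror the template above, and the validation rules (mandatory kill trigger, scope level 1-4, three explicit non-goals) come from the framework:

```python
from dataclasses import dataclass, field

# Illustrative encoding of the Scope Lock template. Construction fails if
# the kill trigger is missing, the scope level is outside the Scope Ladder,
# or fewer than three non-goals are named.

@dataclass
class ScopeLock:
    building: str             # one sentence
    not_building: list[str]   # at least three things
    success_metric: str       # one number, e.g. "feature DAU >= 40% of total DAU"
    target_date: str          # ISO date
    scope_level: int          # 1-4, from the Scope Ladder
    dependencies: list[str] = field(default_factory=list)
    kill_trigger: str = ""    # e.g. "weekly usage < 5% of DAU at day 30"

    def __post_init__(self):
        if not 1 <= self.scope_level <= 4:
            raise ValueError("scope_level must be 1-4")
        if len(self.not_building) < 3:
            raise ValueError("name at least three things you are NOT building")
        if not self.kill_trigger:
            raise ValueError("define a kill trigger before engineering begins")
```

The point is not the code itself but the failure mode it prevents: a Scope Lock without a kill trigger simply cannot be created.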
Phase 3: Build and Validate (During and After Development)
The feature is scoped. Engineering is building. Most frameworks stop here. But the third phase is where the real value happens, because most features need adjustment after launch and teams rarely have a structured process for deciding what to adjust versus what to kill.
The Daily Signal Check
During development, the product manager runs a daily 5-minute check:
- Has any new information emerged that changes our assumptions? (Customer feedback, market shift, competitor launch)
- Are we on track for the target date? If not, what scope can we cut?
- Is the team blocked on a dependency we identified? Is there one we missed?
This is not a standup. It is a private check that the PM does before standup. It prevents the common failure mode where a team builds for two weeks on outdated assumptions.
The 72-Hour Post-Launch Protocol
Most teams ship and move on. The 72-hour protocol forces structured observation:
Hours 0-24: Watch raw usage data. Do not interpret. Do not react. Just observe. How many users tried the feature? Where did they drop off? What errors occurred?
Hours 24-48: Talk to 3 users who tried the feature. Not a survey. A conversation. Ask: What did you expect to happen? What actually happened? Would you use this again tomorrow?
Hours 48-72: Compare actual metrics to your success metric. Write a one-paragraph assessment: is this feature tracking toward the 30-day goal, or not?
The 30-Day Verdict
At 30 days post-launch, the feature faces one of four outcomes:
Double Down: The success metric exceeded the target. Invest in Level 4 polish. Add onboarding, documentation, and promotional effort.
Iterate: The success metric is within 50-100% of target. The feature is directionally correct but needs adjustment. Identify the single biggest friction point and fix it in one sprint.
Maintain: The feature met a minimum threshold but is not growing. Keep it live but do not invest further. Some features are utilities, not growth drivers. That is fine.
Kill: The feature is below 50% of the success metric at 30 days. Sunset it. Write a one-page retrospective documenting what you learned. This is not failure. This is the system working.
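The verdict thresholds above reduce to a small decision function. A sketch under stated assumptions: the 100% and 50% cutoffs are from the text, while the `growing` flag is an assumption used to separate Iterate (directionally correct) from Maintain (flat utility):

```python
# 30-Day Verdict thresholds from the framework: at or above target is
# Double Down, 50-100% of target is Iterate or Maintain, below 50% is Kill.
# Whether a mid-band feature is "growing" distinguishes Iterate from
# Maintain; modeling that as a boolean is an assumption of this sketch.

def thirty_day_verdict(actual: float, target: float, growing: bool) -> str:
    """Return 'double down', 'iterate', 'maintain', or 'kill'."""
    ratio = actual / target
    if ratio >= 1.0:
        return "double down"
    if ratio >= 0.5:
        # Within 50-100% of target: adjust if trending up, park if flat.
        return "iterate" if growing else "maintain"
    return "kill"
```

Usage: if the success metric was "40% of DAU" and the feature reached 28% (a 0.7 ratio) while still trending up, the verdict is "iterate"; at 12% the verdict is "kill" regardless of trend.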
Decision Checkpoint 3: Retrospective
After the 30-day verdict, run this retrospective:
- What did we believe about the user problem? Were we right?
- What scope level did we choose? Was it the right one?
- What dependencies surprised us?
- How accurate was our effort estimate?
- Would we make the same build/kill decision at Checkpoint 1 knowing what we know now?
The answers feed back into your Signal Strength Matrix calibration. Over time, your team gets better at predicting which signals deserve investment.
The Full Framework at a Glance
Phase 1 – Signal Validation (48 hours):
- Score the Signal Strength Matrix
- Run the 5-Question Signal Test
- Decision Checkpoint 1: Build, Kill, or Hold
Phase 2 – Solution Definition (3-5 days):
- Write the Outcome-First Spec (3 statements)
- Choose the Scope Level (1-4)
- Map dependencies
- Decision Checkpoint 2: Scope Lock
Phase 3 – Build and Validate (during + 30 days post):
- Daily Signal Check during development
- 72-Hour Post-Launch Protocol
- 30-Day Verdict: Double Down, Iterate, Maintain, or Kill
- Decision Checkpoint 3: Retrospective
Common Pitfalls
Skipping Phase 1 because you "already know" the problem. Intuition is a signal, not a decision. Even experienced PMs are wrong 40% of the time about what users want. The 48-hour validation exists to protect you from your own conviction.
Defaulting to Scope Level 3. Building the full feature before validating the approach is the most expensive way to learn. Start at Level 1 or 2. You can always upgrade scope. You cannot un-build what you already shipped.
Not defining the kill trigger. Without a kill trigger, zombie features live forever, consuming maintenance effort and adding complexity to your product. Define what failure looks like in numbers before you start building.
Running retrospectives as blame sessions. The retrospective is a calibration tool, not a performance review. The question is not "who made the wrong call" but "what information would have changed the call."
Treating the framework as bureaucracy. Signal Validation takes 48 hours. Solution Definition takes 3-5 days. This is not overhead. This is the work. Teams that skip it spend 6-8 weeks building the wrong thing instead of 1 week validating the right thing.
Ignoring the Hold outcome. Not every idea is a Build or Kill. Some ideas are too early. The market is not ready, the infrastructure is not there, or the signal is growing but not strong enough. Hold is a valid decision. Revisit in 30 days.
Celebrating launches instead of outcomes. Shipping is not the goal. Usage is the goal. Teams that celebrate launches incentivize output. Teams that celebrate the 30-day verdict incentivize outcomes.
Why We Built This
At ProductOS, we spent two years building features nobody used. Not because we were bad at execution, but because we were bad at deciding what to execute.
Coding is becoming cheaper. Knowing what to build is becoming more valuable. Cursor, Lovable, Bolt, v0 can all generate code. None of them help you decide whether the code should exist in the first place.
This framework is the decision layer that sits before design and development. It is the difference between shipping 10 features where 2 succeed and shipping 4 features where 3 succeed.
If any of this lands and you want to see it in action, we are at productos.dev. No pressure. The framework stands on its own.
If you would rather have humans plus AI run this for you on a real product today, that is what 1Labs AI does.
Built by Heemang Parmar, Founder and CEO of ProductOS. 10+ years in product, 150+ builds. Also runs 1Labs AI, an AI product development agency.