Failure Mode and Effects Analysis: the structured method for predicting what can break, how badly, and what to do about it. This guide covers practical scoring guidance, the AIAG-VDA Action Priority approach, and how AI agents make FMEA a continuous practice.
FMEA (Failure Mode and Effects Analysis) is a structured method for anticipating what can fail about a product, process, or service before it does. Developed by the US military in the late 1940s and formalized by NASA in the 1960s, FMEA became standard practice in automotive and aerospace through the Ford / AIAG handbook, and is now used across manufacturing, pharma, medical devices, financial services, and software.
Three variants are in common use today: Design FMEA, Process FMEA, and Service/Software FMEA. All three follow the same scoring logic: rate Severity, Occurrence, and Detection of each failure mode on a 1-10 scale, then prioritize.
Design FMEA (DFMEA)
Scope
The product, system, or service itself
Core question
What could fail about how this is designed, and what happens when it does?
When
Pre-launch: during product or service design, before tooling, sourcing, or build
Process FMEA (PFMEA)
Scope
The process used to produce, deliver, or support
Core question
What could fail in how we run this process, and what happens when it does?
When
Process design and continuous improvement: every step, every hand-off, every decision point
Service / Software FMEA
Scope
Service flows or software systems
Core question
What can fail in customer interaction, system integration, or data flow?
When
SaaS reliability, customer experience, regulatory operations, fintech transaction flows
Anchored scoring guidance: calibrate the team before scoring or you'll get noise.
Severity: How bad is the failure when it happens?
Occurrence: How often does the failure happen?
Detection: How well do current controls catch the failure before the customer does?
Risk Priority Number is the classic prioritization number. It's useful, and misleading on its own.
Risk Priority Number = Severity × Occurrence × Detection. Range 1 to 1000. Higher means more attention. Used to rank failure modes for action.
Two failure modes can score the same RPN (e.g., 8×3×3 vs 3×8×3) but require very different responses. A high-severity failure that's rare and undetectable is a different risk than a low-severity failure that's common and obvious.
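The arithmetic can be checked with a few lines of Python. This is a minimal sketch: the function name `rpn` and the two example failure modes are illustrative assumptions, and the scores are the ones from the 8×3×3 vs 3×8×3 example.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number: the product of three 1-10 ratings."""
    ratings = {"severity": severity, "occurrence": occurrence, "detection": detection}
    for name, value in ratings.items():
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {value}")
    return severity * occurrence * detection


# Hypothetical failure modes with mirrored Severity and Occurrence scores.
rare_but_severe = rpn(severity=8, occurrence=3, detection=3)
common_but_minor = rpn(severity=3, occurrence=8, detection=3)

print(rare_but_severe, common_but_minor)  # 72 72
```

Both calls return 72, so a ranking by RPN alone cannot tell the two apart.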
Modern automotive and aerospace FMEA uses Action Priority (High / Medium / Low) instead of a single RPN, with severity as the dominant factor. In the AIAG-VDA tables, Severity 9-10 rates High AP across most combinations of O and D. This forces the team to act on catastrophic failures even when they're uncommon.
In practice, use RPN to rank the middle of the distribution and Action Priority for the top of it. Severity 10 with O=1, D=1 still gets attention.
With AI agents, failure modes come from incident history, Occurrence from event data, and Detection from actual control performance.
Agents pull every past incident, ticket, SAR, defect, or escalation tied to the process and propose the failure modes that have actually happened, not just the ones a team can recall in a workshop.
Instead of guessing 'maybe 1 in 100,' agents compute the actual occurrence rate from the event log. Severity and Detection still need human judgment; Occurrence becomes a measurement.
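As a rough illustration of Occurrence as a measurement, the sketch below counts failures per process run from an event log and maps the rate to a 1-10 score. The event format, the function names, and especially the rate bands are hypothetical assumptions; a real FMEA would use the team's own calibrated Occurrence scale.

```python
from collections import Counter

# Hypothetical rate-to-score bands as (minimum rate, score), highest first.
# These thresholds are illustrative only; calibrate them to your own scale.
OCCURRENCE_BANDS = [
    (0.10, 10), (0.05, 9), (0.02, 8), (0.01, 7), (0.005, 6),
    (0.002, 5), (0.001, 4), (0.0001, 3), (0.00001, 2),
]


def occurrence_rates(events, total_runs):
    """Failures per process run, for each failure mode seen in the log."""
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    counts = Counter(e["failure_mode"] for e in events if e.get("failure_mode"))
    return {mode: count / total_runs for mode, count in counts.items()}


def occurrence_score(rate):
    """Map a measured failure rate to a 1-10 Occurrence score."""
    for minimum_rate, score in OCCURRENCE_BANDS:
        if rate >= minimum_rate:
            return score
    return 1


# Hypothetical event log: 3 and 40 failures observed across 1000 runs.
event_log = (
    [{"failure_mode": "duplicate payment"}] * 3
    + [{"failure_mode": "missing signature"}] * 40
)
rates = occurrence_rates(event_log, total_runs=1000)
for mode, rate in rates.items():
    print(mode, rate, occurrence_score(rate))
```

The score still depends on where the bands are drawn, which stays a team decision; only the rate itself comes from the log.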
Agents test current process controls against the historical event log to measure how often each control actually caught its target failure. Detection scores stop being aspirational.
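A control's measured catch rate can be computed the same way. The sketch below assumes each logged event records which control caught the failure (or `None` if it escaped to the customer). The field names, the control name, and the linear mapping from catch rate to Detection score are all hypothetical.

```python
def catch_rate(events, failure_mode, control):
    """Share of a failure mode's occurrences that a given control caught."""
    relevant = [e for e in events if e["failure_mode"] == failure_mode]
    if not relevant:
        return None  # no history, so there is nothing to measure yet
    caught = sum(1 for e in relevant if e["caught_by"] == control)
    return caught / len(relevant)


def detection_score(rate):
    """Hypothetical linear mapping: catching everything scores 1, nothing scores 10."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    return 10 - int(rate * 9 + 0.5)


# Hypothetical log: 8 of 10 "wrong account" failures were caught by dual review.
event_log = (
    [{"failure_mode": "wrong account", "caught_by": "dual review"}] * 8
    + [{"failure_mode": "wrong account", "caught_by": None}] * 2
)
rate = catch_rate(event_log, "wrong account", "dual review")
print(rate, detection_score(rate))  # 0.8 3
```

Note the direction of the scale: a lower Detection score means the control catches more, so a control that misses most failures pushes the score toward 10.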
Traditional FMEAs go stale six months after the workshop. Agent-maintained FMEAs re-score Occurrence and Detection as the process runs, flag new failure modes as they appear, and update Action Priority continuously.
FMEA stands for Failure Mode and Effects Analysis. It's a structured method for anticipating what can go wrong in a product, process, or service, evaluating how bad each failure would be, how often it might happen, and how well current controls would catch it. FMEA was developed by the US military in the late 1940s, formalized by NASA in the 1960s, and adopted into the automotive industry through the Ford / AIAG standard in the 1980s. The current canonical reference is the AIAG-VDA FMEA Handbook (2019).
Design FMEA (DFMEA) analyzes failures of the product or system itself: what could fail about the design? Process FMEA (PFMEA) analyzes failures of the process that produces or delivers it: what could fail in how we run this? You usually need both. DFMEA happens during design; PFMEA happens during process planning and ongoing improvement. Service-industry teams typically run PFMEA-style analyses on customer journeys, claims flows, or loan operations.
RPN = Severity × Occurrence × Detection, all rated 1-10. Range is 1 to 1000. There's no universal cutoff: teams typically act on the top decile of RPNs for a given FMEA, or on anything with Severity ≥ 9 regardless of RPN. The AIAG-VDA standard has moved away from RPN cutoffs to Action Priority (H/M/L) categories, which weight severity more heavily and avoid the math trap where two very different failures score the same RPN.
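The selection rule described here (the top decile by RPN, plus anything with Severity of 9 or higher) could be sketched as follows. The function name, the tuple layout, and the example failure modes are illustrative assumptions. This is not the AIAG-VDA Action Priority table, which is a lookup over Severity, Occurrence, and Detection ranges.

```python
import math


def select_for_action(modes, top_fraction=0.10, severity_floor=9):
    """Return names of failure modes in the top RPN fraction or at/above the severity floor.

    Each mode is a (name, severity, occurrence, detection) tuple.
    """
    ranked = sorted(modes, key=lambda m: m[1] * m[2] * m[3], reverse=True)
    cutoff = max(1, math.ceil(len(ranked) * top_fraction))
    return [
        m[0]
        for position, m in enumerate(ranked)
        if position < cutoff or m[1] >= severity_floor
    ]


# Hypothetical FMEA rows: (name, severity, occurrence, detection)
modes = [
    ("seal leak", 7, 6, 5),                 # RPN 210
    ("sensor drift", 4, 5, 4),              # RPN 80
    ("battery thermal runaway", 10, 1, 2),  # RPN 20
    ("label smudge", 2, 7, 3),              # RPN 42
    ("firmware hang", 6, 3, 4),             # RPN 72
    ("loose fastener", 5, 4, 3),            # RPN 60
    ("paint chip", 2, 5, 2),                # RPN 20
    ("connector corrosion", 6, 4, 5),       # RPN 120
    ("display flicker", 3, 4, 3),           # RPN 36
    ("cable chafing", 5, 3, 6),             # RPN 90
]

print(select_for_action(modes))  # ['seal leak', 'battery thermal runaway']
```

The battery failure has one of the lowest RPNs in the list and is selected anyway, which is the point of the severity override.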
FMEA appears at two points in DMAIC. In Analyze, it helps identify which failure modes the project should target. In Improve and Control, it documents how the redesigned process will be protected against the failures the team learned about. FMEA also pairs with fishbone diagrams (which enumerate causes) and 5 Whys (which drill into them): FMEA quantifies and prioritizes; the other tools generate the candidates.
There are three big changes. First, failure modes get seeded from real incident history, not workshop memory. Second, Occurrence scores become measurements from the event log rather than estimates. Third, the FMEA stops being a one-time artifact: agents re-score it continuously as the process runs and surface new failure modes the moment they appear in production. The Severity scores still require human and regulatory judgment, which is where they belong.
Fishbone enumerates causes; FMEA quantifies their risk. Use them in sequence.
Once FMEA identifies the failures to watch, SPC monitors them continuously in production.
FMEA shows up in the Analyze and Control phases of DMAIC.