
FMEA

Failure Mode and Effects Analysis β€” the structured method for predicting what can break, how badly, and what to do about it. With practical scoring guidance, the AIAG-VDA Action Priority approach, and how AI agents make FMEA a continuous practice.

What is FMEA?

FMEA β€” Failure Mode and Effects Analysis β€” is a structured method for anticipating what can fail about a product, process, or service before it does. Developed by the US military in the late 1940s and formalized by NASA in the 1960s, FMEA became standard practice in automotive and aerospace through the Ford / AIAG handbook, and is now used across manufacturing, pharma, medical devices, financial services, and software.

Three variants are in common use today: Design FMEA, Process FMEA, and Service/Software FMEA. All three follow the same scoring logic β€” rate Severity, Occurrence, and Detection of each failure mode on a 1–10 scale, then prioritize.

Three FMEA Variants

Design FMEA (DFMEA)

Scope

The product, system, or service itself

Core question

What could fail about how this is designed, and what happens when it does?

When

Pre-launch β€” during product or service design, before tooling, sourcing, or build

Process FMEA (PFMEA)

Scope

The process used to produce, deliver, or support

Core question

What could fail in how we run this process, and what happens when it does?

When

Process design and continuous improvement β€” every step, every hand-off, every decision point

Service / Software FMEA

Scope

Service flows or software systems

Core question

What can fail in customer interaction, system integration, or data flow?

When

SaaS reliability, customer experience, regulatory operations, fintech transaction flows

Severity, Occurrence, Detection

Anchored scoring guidance β€” calibrate the team before scoring or you'll get noise.

Severity (S)

How bad is the failure when it happens?

  • 10 β€” Safety, regulatory, or compliance violation
  • 7–9 β€” Major customer impact, financial loss, or significant rework
  • 4–6 β€” Moderate impact, customer-visible degradation
  • 2–3 β€” Minor inconvenience, rework within the team
  • 1 β€” No noticeable effect

Occurrence (O)

How often does the failure happen?

  • 10 β€” More than 1 in 2 cases
  • 7–9 β€” 1 in 20 to 1 in 5
  • 4–6 β€” 1 in 1,000 to 1 in 50
  • 2–3 β€” 1 in 100,000 to 1 in 10,000
  • 1 β€” Virtually never

Detection (D)

How well do current controls catch the failure before the customer does?

  • 10 β€” No controls in place; customer is the detection mechanism
  • 7–9 β€” Controls miss the failure most of the time
  • 4–6 β€” Controls catch it sometimes
  • 2–3 β€” Controls catch it almost every time
  • 1 β€” Failure is caught with certainty by design

RPN, Action Priority, and the Trap to Avoid

The Risk Priority Number (RPN) is the classic prioritization metric. It's useful — and misleading on its own.

What RPN means

Risk Priority Number = Severity Γ— Occurrence Γ— Detection. Range 1 to 1000. Higher means more attention. Used to rank failure modes for action.
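As a minimal sketch of that formula (function name and validation are mine, not from any standard):

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number: S x O x D, each rated 1-10."""
    for name, score in (("severity", severity),
                        ("occurrence", occurrence),
                        ("detection", detection)):
        if not 1 <= score <= 10:
            raise ValueError(f"{name} must be 1-10, got {score}")
    return severity * occurrence * detection

print(rpn(8, 3, 3))  # 72
print(rpn(3, 8, 3))  # 72 -- same RPN, very different risk profile
```

The two example calls foreshadow the trap below: identical RPNs from very different S/O/D combinations.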

Why RPN alone is a trap

Two failure modes can score the same RPN (e.g., 8Γ—3Γ—3 vs 3Γ—8Γ—3) but require very different responses. A high-severity failure that's rare and undetectable is a different risk than a low-severity failure that's common and obvious.

Action Priority (the AIAG-VDA update)

Modern automotive and aerospace FMEA uses Action Priority (High / Medium / Low) instead of a single RPN, with severity as the dominant factor. Severity 9–10 = always High AP, regardless of O and D. This forces the team to act on catastrophic failures even when they're rare.

Use both

RPN for ranking the middle of the distribution. Action Priority for the top of it. Severity 10 with O=1, D=1 still gets attention.
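The AIAG-VDA handbook defines Action Priority through a full S/O/D lookup table; the sketch below only captures the severity-dominant rules described above, and the numeric thresholds are illustrative assumptions, not handbook values:

```python
def action_priority(s: int, o: int, d: int) -> str:
    """Simplified Action Priority: severity dominates.

    The real AIAG-VDA handbook uses a full S/O/D lookup table;
    this sketch only encodes the rules described in the text,
    with assumed cutoffs for the middle of the distribution.
    """
    if s >= 9:
        return "High"          # catastrophic: always act, even if O=1, D=1
    if s >= 7 and o >= 4:
        return "High"          # major impact that happens often enough
    if s * o * d >= 100:       # assumption: RPN ranks the middle band
        return "Medium"
    return "Low"

print(action_priority(10, 1, 1))  # High -- rare and undetectable, still acted on
```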

AI-Augmented FMEA

Failure modes from incident history. Occurrence from event data. Detection from actual control performance.

Seed the failure modes from history

Agents pull every past incident, ticket, SAR, defect, or escalation tied to the process and propose the failure modes that have actually happened β€” not just the ones a team can recall in a workshop.

Score Occurrence from real data

Instead of guessing 'maybe 1 in 100,' agents compute the actual occurrence rate from the event log. Severity and Detection still need human judgment; Occurrence becomes a measurement.
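One way to sketch that measurement, mapping an observed failure rate onto the anchored Occurrence bands above (band boundaries and the example counts are illustrative assumptions):

```python
def occurrence_score(failures: int, opportunities: int) -> int:
    """Map a measured failure rate onto the 1-10 Occurrence scale.

    Bands follow the anchored guidance in the text; the exact
    boundary values chosen here are assumptions, not an AIAG table.
    """
    if opportunities == 0:
        raise ValueError("no opportunities observed")
    rate = failures / opportunities
    if rate > 1 / 2:
        return 10              # more than 1 in 2 cases
    if rate >= 1 / 20:
        return 8               # the 1-in-20 to 1-in-5 band
    if rate >= 1 / 1000:
        return 5               # the 1-in-1,000 to 1-in-50 band
    if rate >= 1 / 100_000:
        return 3               # the 1-in-100,000 to 1-in-10,000 band
    return 1                   # virtually never

# e.g. 37 mispriced quotes out of 12,000 processed (hypothetical numbers)
print(occurrence_score(37, 12_000))  # 5
```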

Detect Detection gaps

Agents test current process controls against the historical event log to measure how often each control actually caught its target failure. Detection scores stop being aspirational.
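A sketch of scoring Detection from a replayed event log (the event schema and band boundaries are assumptions for illustration):

```python
def detection_score(events: list[dict]) -> int:
    """Score Detection from how often a control actually caught
    its target failure in the historical event log.

    Each event is assumed to look like:
        {"failed": bool, "caught_by_control": bool}
    Band boundaries are illustrative, not from a standard.
    """
    failures = [e for e in events if e["failed"]]
    if not failures:
        return 1               # no failures observed to test the control against
    caught = sum(e["caught_by_control"] for e in failures)
    catch_rate = caught / len(failures)
    if catch_rate == 0.0:
        return 10              # customer is the detection mechanism
    if catch_rate < 0.5:
        return 8               # misses most of the time
    if catch_rate < 0.9:
        return 5               # catches it sometimes
    if catch_rate < 1.0:
        return 3               # almost every time
    return 2                   # caught every observed time; "certainty by
                               # design" (score 1) needs more than history
```

A 100% historical catch rate still scores 2 here rather than 1: observed perfection is not the same as detection guaranteed by design.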

Keep the FMEA alive

Traditional FMEAs go stale six months after the workshop. Agent-maintained FMEAs re-score Occurrence and Detection as the process runs, flag new failure modes as they appear, and re-prioritize Action Priority continuously.
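The re-scoring loop can be sketched as a per-row refresh: recompute RPN with a freshly measured Occurrence and flag rows that crossed the review threshold. Field names and the threshold of 100 are illustrative assumptions:

```python
def refresh(row: dict, measured_occurrence: int, threshold: int = 100) -> dict:
    """Re-score one FMEA row with a freshly measured Occurrence and
    flag it if its RPN newly crossed the review threshold.

    Assumed row schema: {"id", "severity", "occurrence", "detection"}.
    """
    old_rpn = row["severity"] * row["occurrence"] * row["detection"]
    new_rpn = row["severity"] * measured_occurrence * row["detection"]
    return {
        **row,
        "occurrence": measured_occurrence,
        "rpn": new_rpn,
        "needs_review": new_rpn >= threshold and old_rpn < threshold,
    }

# Hypothetical row: occurrence drifted from 2 to 5 since the workshop
row = {"id": "FM-017", "severity": 7, "occurrence": 2, "detection": 4}
print(refresh(row, measured_occurrence=5))  # rpn 140, needs_review True
```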

Frequently Asked Questions

What is FMEA and when did it originate?

FMEA stands for Failure Mode and Effects Analysis. It's a structured method for anticipating what can go wrong in a product, process, or service, evaluating how bad each failure would be, how often it might happen, and how well current controls would catch it. FMEA was developed by the US military in the late 1940s, formalized by NASA in the 1960s, and adopted into the automotive industry through the Ford / AIAG standard in the 1980s. The current canonical reference is the AIAG-VDA FMEA Handbook (2019).

Process FMEA vs Design FMEA β€” what's the difference?

Design FMEA (DFMEA) analyzes failures of the product or system itself: what could fail about the design? Process FMEA (PFMEA) analyzes failures of the process that produces or delivers it: what could fail in how we run this? You usually need both. DFMEA happens during design; PFMEA happens during process planning and ongoing improvement. Service-industry teams typically run PFMEA-style analyses on customer journeys, claims flows, or loan operations.

How is RPN calculated and what's a good threshold for action?

RPN = Severity Γ— Occurrence Γ— Detection, all rated 1–10. Range is 1 to 1000. There's no universal cutoff β€” teams typically act on the top decile of RPNs for a given FMEA, or on anything with Severity β‰₯ 9 regardless of RPN. The AIAG-VDA standard has moved away from RPN cutoffs to Action Priority (H/M/L) categories, which weight severity more heavily and avoid the math trap where two very different failures score the same RPN.

How does FMEA relate to DMAIC and the rest of Six Sigma?

FMEA appears at two points in DMAIC. In Analyze, it helps identify which failure modes the project should target. In Improve and Control, it documents how the redesigned process will be protected against the failures the team learned about. FMEA also pairs with fishbone diagrams (which enumerate causes) and 5 Whys (which drills into them) — FMEA quantifies and prioritizes; the other tools generate the candidates.

How do AI agents change FMEA in practice?

Three biggest changes. First, failure modes get seeded from real incident history, not workshop memory. Second, Occurrence scores become measurements from the event log rather than estimates. Third, the FMEA stops being a one-time artifact β€” agents re-score it continuously as the process runs and surface new failure modes the moment they appear in production. The Severity scores still require human and regulatory judgment, which is where they belong.

Make FMEA a living document

Bring a process FMEA that's gone stale, or one you've never run. We'll seed it with real incident data, score Occurrence from your event log, and stand up continuous monitoring against the top failure modes.

Β© 2025 PipeIQ β€” AI-Augmented FMEA.