Fail on Paper First: A Working Guide to FMEA
Failure Mode and Effects Analysis, or FMEA, is the Lean Six Sigma tool for failing on paper before you fail in production. Instead of waiting for a process to break and then reacting, a team works through every way it could break, judges how bad and how likely each is, and fixes the worst ones first. The discipline earned its keep in aerospace and automotive long before it reached service and supply-chain work. In early 2021, with strained suppliers and reshuffled, partly remote teams introducing fresh failure points, the habit of anticipating breakage is as relevant as ever. Here is how to run one without drowning in spreadsheet cells.
Build the table, one row per failure mode
An FMEA is a structured table. For each step of the process, you list the ways it could fail and reason through three things. Keep the team small and cross-functional; the value comes from the people who actually run the process, not from a tidy form.
Name the failure mode and its effect. What goes wrong, and what does the customer or next step feel as a result? Be specific: not "order is wrong" but "customer receives the wrong part number."
Score Severity (S). How bad is the effect if it happens, on a 1 to 10 scale? A safety or compliance breach sits near 10; a minor cosmetic issue near 1.
Score Occurrence (O). How likely is this cause to occur, 1 to 10? Base it on data or honest experience, not optimism.
Score Detection (D). How likely are your current controls to catch it before it reaches the customer, 1 to 10? Here the scale inverts: a failure you would almost certainly catch scores low, one you would miss scores high.
Multiply for the RPN. Risk Priority Number = S x O x D, which runs from 1 to 1,000. It ranks the rows so you attack the biggest risks first instead of the loudest complaint.
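As a minimal sketch of the table described above, the Python below scores a few failure modes and ranks them by RPN. The rows, their scores, and the names FailureMode and rank_by_rpn are invented for illustration and do not come from a real analysis.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    step: str
    mode: str
    severity: int    # S: 1 (minor cosmetic issue) to 10 (safety or compliance breach)
    occurrence: int  # O: 1 (rare) to 10 (frequent)
    detection: int   # D: 1 (almost certainly caught) to 10 (likely missed)

    def __post_init__(self):
        # Reject scores outside the 1-to-10 scale before they skew the ranking.
        for name in ("severity", "occurrence", "detection"):
            score = getattr(self, name)
            if not 1 <= score <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {score}")

    @property
    def rpn(self) -> int:
        # Risk Priority Number = S x O x D
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(rows):
    """Return the rows sorted from highest RPN to lowest."""
    return sorted(rows, key=lambda r: r.rpn, reverse=True)


# Hypothetical rows for an order-fulfilment process.
rows = [
    FailureMode("Picking", "Customer receives the wrong part number", 7, 4, 6),
    FailureMode("Packing", "Label is scuffed but readable", 2, 5, 3),
    FailureMode("Shipping", "Hazardous item shipped without paperwork", 10, 2, 4),
]

for row in rank_by_rpn(rows):
    print(f"{row.rpn:4d}  {row.step}: {row.mode}")
```

With these made-up scores the picking error (RPN 168) ranks above the shipping error (80) and the scuffed label (30), which is the order in which the team would take them up.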
Act on the score, then re-score
The RPN is a triage tool, not a verdict. A high number tells you where to spend attention. For the top rows, decide which factor to attack: reduce Severity by changing the design so the failure does less harm, reduce Occurrence by removing the cause, or improve Detection by adding a control. Detection fixes are often the cheapest but the least durable, since they catch the problem rather than prevent it; prefer reducing Occurrence where you can. Then re-score the row to confirm the action actually moved the risk.
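The re-scoring step can be sketched as a before-and-after comparison. The baseline row, the two candidate actions, and the scores expected after each are all hypothetical; in practice the new scores should come from data gathered after the action is in place.

```python
def rpn(severity, occurrence, detection):
    """Risk Priority Number: the product of the three 1-to-10 scores."""
    return severity * occurrence * detection


def rescore(baseline, changes):
    """Return the RPN after applying the changed scores to a copy of the baseline."""
    return rpn(**{**baseline, **changes})


# Hypothetical row: customer receives the wrong part number.
baseline = {"severity": 7, "occurrence": 4, "detection": 6}

# Hypothetical actions and the scores the team expects afterwards.
candidate_actions = {
    "Redesign bins so similar parts cannot be mixed up": {"occurrence": 2},
    "Add a final inspection before dispatch": {"detection": 3},
}

before = rpn(**baseline)
for action, changes in candidate_actions.items():
    after = rescore(baseline, changes)
    print(f"{action}: RPN {before} -> {after}")
```

In this invented case both actions halve the RPN from 168 to 84, so the number alone does not choose between them. The preference for removing the cause over catching the problem is a judgment the team brings to the score.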
Do not chase low RPNs to zero; stop when the remaining risk is acceptable and your effort is better spent elsewhere.
Watch for a high Severity hiding behind a low RPN; a catastrophic but rare, easily caught failure can still deserve action on its own.
Keep the FMEA a living document, revisited when the process or its inputs change.
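The caution about a high Severity hiding behind a low RPN can be made mechanical with a second check alongside the RPN cutoff. The thresholds below (RPN_THRESHOLD and SEVERITY_THRESHOLD) are assumptions chosen for the example, not standard values; each team sets its own.

```python
RPN_THRESHOLD = 100      # assumed cutoff for "act on this row"
SEVERITY_THRESHOLD = 9   # assumed cutoff for "review regardless of RPN"


def reason_to_act(severity, occurrence, detection):
    """Return why a row needs attention, or None if the risk looks acceptable."""
    rpn = severity * occurrence * detection
    if rpn >= RPN_THRESHOLD:
        return "high RPN"
    if severity >= SEVERITY_THRESHOLD:
        return "high Severity despite low RPN"
    return None


# Hypothetical rows: (S, O, D)
rows = {
    "Customer receives the wrong part number": (7, 4, 6),
    "Hazardous item shipped without paperwork": (10, 2, 2),
    "Label is scuffed but readable": (2, 5, 3),
}

for mode, scores in rows.items():
    reason = reason_to_act(*scores)
    print(f"{mode}: {reason or 'acceptable for now'}")
```

Here the hazardous shipment has an RPN of only 40, well under the assumed cutoff, yet it is still flagged because its Severity is 10.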
FMEA fits naturally in the Analyze and Improve phases of DMAIC, but its real strength is cultural: it gives a team permission to talk openly about how things break, before a customer or an auditor finds out for them. An hour spent failing on paper is far cheaper than the failure itself.
If you want to find and design out the failure points in a critical process before they cost you, XNM's strategic advisory can help you run the analysis and act on what it surfaces.