A typical interpretation of a Root Cause Analysis (RCA) is to identify parties responsible and apportion blame. I prefer to believe a Root Cause Analysis is a tool to discover internal and external deficiencies and put in place changes to improve them. These deficiencies can span the entire spectrum of a system of people, processes, tools and techniques, all contributing to what is ultimately a regrettable problem.
Rarely is there a singular causal event or action that snowballs into a particular problem that necessitates a Root Cause Analysis. Biases, assumptions, grudges and viewpoints are typically hidden baggage when investigating root causes. Hence it is preferable to use a somewhat analytical technique when faced with a Root Cause Analysis. An objective analytical technique assists in removing the personal biases that make many Root Cause Analysis efforts less effective than they should be.
I present below a rationale and template that I have used successfully for conducting Root Cause Analysis. The template is light enough to be used within a couple of short facilitated meetings. This contrasts with exhaustive Root Cause Analysis techniques that take days or weeks of applied effort to complete. On most occasions, the regrettable action can be avoided in the future by making changes that become evident through a collective effort of a few hours to a few days. With multiple people working on a Root Cause Analysis, this timebox allows the analysis to be completed within a day.
What goes into a Root Cause Analysis?
I’ll work through the analysis and background that went into the template, as well as the sections themselves. The research 5 Ws can be used sparingly here to guide what we need to look at to understand the root cause, namely “What”, “When” and “Why”. I close with the optional “How” of the 5 Ws, to help drive actions for prevention or avoidance in the future.
First, the metadata.
The metadata is the top-level information that frames and differentiates the issue. Typical information would include Event Date, Impacted Parties, Impact, Names, References, etc. The metadata should provide sound bites that can be strung together so that people can identify and understand the issue unambiguously.
For example, you can weave a coherent sentence with metadata such as “on <Date> that <Impact>” (e.g. “The Production Database Deployment Failure we had on the 15th that took out finance”). Giving people consistent information allows for faster recall and lets them quickly find where the RCA is filed, both electronically and mentally.
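One way to keep the metadata consistent is to hold it in a small record and always generate the summary sentence from it. The following is only a minimal sketch: the class name `IssueInfo`, its field names, the `summary` method and the example date are all illustrative assumptions, not part of the template.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class IssueInfo:
    """Top-level metadata that frames one RCA (field names are illustrative)."""
    name: str
    event_date: date
    impact: str
    impacted_parties: list = field(default_factory=list)
    references: list = field(default_factory=list)

    def summary(self) -> str:
        # String the fields together in a fixed order so every RCA reads the same way.
        return f"{self.name} on {self.event_date.isoformat()} that {self.impact}"


issue = IssueInfo(
    name="Production Database Deployment Failure",
    event_date=date(2024, 1, 15),  # hypothetical date, for illustration only
    impact="took out finance",
    impacted_parties=["Finance"],
)
print(issue.summary())
```

Generating the sentence from the fields, rather than typing it freehand each time, is what keeps the wording identical across documents, file names and conversations.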
What occurred and When?
To understand what occurred, we need to look at as many facts as possible surrounding the regrettable event. Objective discussion is critical in defining what occurred. The two strongest ways that I have found are using a Narrative or a Timeline.
A Narrative is a fact-based description of the chain of events. It is written as prose describing the visible chain of events. It should primarily describe the impact and the facts surrounding the impact. Care must be taken not to explore the root cause itself, and to focus purely on what occurred. You almost need to imagine yourself as a lawyer shouting “Objection, your Honor, conjecture!” as you are writing a narrative.
The alternate method of capturing the What and When is through a Timeline. In production environments, you may be fortunate enough to have a clear timeline that can be transcribed. Having discrete times and events often helps in remaining objective and focusing purely on the visible events and impacts around the system.
The narrative helps bring focus into the actors (systems, individuals, etc) that are at play with a root cause. To help identify the actors, consistency is again key. Same name, same role, same terms.
Using either of these techniques, the What and When can usually be kept to between half a page and a full page.
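If you capture the Timeline electronically, a sketch along the following lines may help keep it ordered and consistently worded. The `TimelineEntry` class, the `render_timeline` function and all of the example events are hypothetical and exist only to illustrate the idea of discrete times, consistent actor names and observable facts.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimelineEntry:
    when: datetime
    actor: str  # use the same name for the same system or person throughout
    event: str  # an observable fact only, no conjecture about causes


def render_timeline(entries):
    """Return the entries as chronologically ordered text lines."""
    ordered = sorted(entries, key=lambda e: e.when)
    return [f"{e.when:%Y-%m-%d %H:%M}  {e.actor}: {e.event}" for e in ordered]


# Hypothetical events, deliberately entered out of order.
entries = [
    TimelineEntry(datetime(2024, 1, 15, 9, 42), "Finance app", "Users report login errors"),
    TimelineEntry(datetime(2024, 1, 15, 9, 30), "Deploy job", "Schema migration started"),
    TimelineEntry(datetime(2024, 1, 15, 10, 5), "DBA on call", "Migration rolled back"),
]
timeline_lines = render_timeline(entries)
for line in timeline_lines:
    print(line)
```

Sorting on the timestamp means events can be added in whatever order people remember them during the meeting and still come out in sequence.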
Why did it occur?
This is the fun part of the Root Cause Analysis – well, I find it fun anyway. Any analytical, repeatable and reportable technique is fine here.
Techniques include 5 Whys, Ishikawa Diagram and many others. Think Reliability explore a lot of these techniques in their Cause Mapping system (with lots of interesting case studies). I will eventually post a blog post examining the different analytical techniques.
I personally tend to use a Modified Ishikawa Diagram in which I use People, Process, Environment, Code Design & Requirements, since I am in the software world. This is included in my template below. The common references that I see place domain-specific considerations below, with People, Process & Environment above. Modify as you see fit.
The analysis portion of the Root Cause Analysis can generally be run as a facilitated meeting that captures what is discovered. Stop when the analysis starts to yield diminishing returns.
How do we prevent it happening in the future?
As the analysis winds down, you will typically have quite a few contributing root causes. For each of these root causes, ask yourself what is needed to prevent it.
Each corrective action should have your typical task items (Priority, Owner, Description, Delivery Date). Each root cause may result in multiple corrective actions. A recent Root Cause Analysis I conducted resulted in 11 corrective actions across 8 contributing root causes.
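Because one root cause can lead to several corrective actions, it can be useful to check that no root cause has been left without any. The sketch below assumes a simple `CorrectiveAction` record carrying the task items mentioned above. The class, both helper functions, the numeric priority scale and all of the sample data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class CorrectiveAction:
    root_cause: str
    description: str
    owner: str
    priority: int  # assumption: 1 is the highest priority
    delivery_date: date


def actions_by_root_cause(actions):
    """Group actions under their root cause, most urgent first."""
    grouped = defaultdict(list)
    for action in sorted(actions, key=lambda a: (a.priority, a.delivery_date)):
        grouped[action.root_cause].append(action)
    return dict(grouped)


def uncovered_root_causes(root_causes, actions):
    """Return the root causes that have no corrective action yet."""
    covered = {a.root_cause for a in actions}
    return [rc for rc in root_causes if rc not in covered]


# Hypothetical root causes and actions, for illustration only.
root_causes = [
    "No rollback script",
    "Migration untested on production-sized data",
    "Release checklist skipped",
]
actions = [
    CorrectiveAction("No rollback script", "Add rollback step to release checklist",
                     "Release manager", 2, date(2024, 2, 15)),
    CorrectiveAction("No rollback script", "Write and rehearse a rollback script",
                     "DBA team", 1, date(2024, 2, 1)),
    CorrectiveAction("Migration untested on production-sized data",
                     "Provision a staging copy of production data",
                     "Platform team", 1, date(2024, 3, 1)),
]
grouped = actions_by_root_cause(actions)
missing = uncovered_root_causes(root_causes, actions)
print(missing)
```

Running the check before the meeting closes gives the group a chance to either assign an action or record a deliberate decision to accept that root cause.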
The Template
Thanks for exploring what goes into a good root cause analysis document. Although ridiculously simple, the following sections are all that are contained within the template I have used successfully a number of times. The section mappings to the Ws+H above should be self-evident (1 – metadata; 2 & 3 – What & When; 4 – Why; 5 – How).
1. Issue Information
2. Issue Narrative
3. Timeline
4. Modified Ishikawa Analysis
5. Corrective Actions
I have shared my reference template as a published Google Document, a read-only Google Document, a Word document and a PDF document. Choose whatever takes your fancy and feeds into your workflow. I also have a published version embedded below. The Google-based documents will always be up to date.
If you derive and republish, feel free to link back here.
Within the template I have included italicized guidance for each section. Unfortunately, I have not been able to generate a real-world root cause analysis that I am able to publish as a good example without divulging proprietary information. If you use this template and generate a good case study, I would love to link to it.
Like all good templates, after you have worked through it a number of times, you may find that your approach and thinking adapt to see the world in a way similar to the template.