Root Cause Analysis; Template and Discussion

A typical interpretation of a Root Cause Analysis (RCA) is to identify parties responsible and apportion blame.  I prefer to believe a Root Cause Analysis is a tool to discover internal and external deficiencies and put in place changes to improve them.  These deficiencies can span the entire spectrum of a system of people, processes, tools and techniques, all contributing to what is ultimately a regrettable problem.

Rarely is there a singular causal event or action that snowballs into a particular problem that necessitates a Root Cause Analysis.  Biases, assumptions, grudges, viewpoints are typically hidden baggage when investigating root causes.  Hence  it is preferable to use a somewhat analytical technique when faced with a Root Cause Analysis.  An objective analytical technique assists in removing these personal biases that make many Root Cause Analysis efforts less effective than they should .

I present below a rationale and template that I have used successfully for conducting Root Cause Analysis.   This template is light enough to be used within a couple of short facilitated meetings.  This contrasts to exhaustive Root Cause Analysis techniques that take days or weeks of applied effort to complete.  In most occasions, the regrettable action is avoidable in the future by making changes that become evident in a collective effort of a few hours to a few days.  When having multiple people working on a Root Cause Analysis, this timebox allows analysis within a day.

Continue reading “Root Cause Analysis; Template and Discussion”

Using Regression Isolation to Decimate Bug Time-to-Fix

Software defects generally fall into two primary classifications: defects that represent functional misses of a new implementation, or defects that are regressions in the behaviour of the system due to some change.  This paper discusses a major reduction in the Time-To-Fix for regressions.  In a previous role I lead a team of engineers in implementing this system.

To improve the Time-To-Fix, it was ultimately determined that we needed to resolve two problems that accounted for the majority of the time developers spent dealing with regressions:

  • Reduce effort to isolate the regression.  Early effort on bug analysis can be greatly accelerated if the guess-work on which change causes the regression is removed. Using bisection provides an absolute level of confidence of the regression trigger.   There is no guessing or supposition.  I have written on the use of bisection in preference of code diving in this blog post.
  • Ensure that the regression goes to the correct team the first time. Reduction of the bouncing around of the defect report is achieved by identifying the engineer who regressed the issue and ensuring that it is fixed by that person or team.  In particular breaking the attempt to reproduce, debug, re-queue cycle prevents days of wasted engineering time.

We implemented and deployed the system described between 2008 and 2010. Read on for more information.

Continue reading “Using Regression Isolation to Decimate Bug Time-to-Fix”

All Projects Great and Small

Projects come in all shapes and sizes. However there is a human tendency to begin to look for consistency in the way projects are run. If there is the possibility to make all the projects look the same and be approximately the same size, then we’ll force them to the same size. In my particular domain, software projects will aggregate sub projects until they get to a particular size that warrants an organization’s standard project methodology.  This has a number of possibly unintended consequences that bear consideration.
Continue reading “All Projects Great and Small”