Bringing project defect count under control using root cause analysis

Overview

Defects in software, if left unchecked, can threaten the viability of a project.  Performing root cause analysis using Toyota's '5 Whys' approach can uncover the source of the problem, and limit the number of defects injected.

The scenario

It is an unfortunate state of our industry that we accept bugs, or defects, on our software projects as a matter of course. When the software in question is not related to human safety, as it is in medical, nuclear power, or aeronautical systems, some threshold of bugs seems to be tolerated. However, there is a big difference between being in control of the bugs in the project and the bugs threatening the viability of the project. Once that line has been crossed, it is a very difficult path back.

The solution is never as simple as having a bug-bashing fest or instituting a 'bug killer of the week' award. I've been in that situation and have found that you often end up creating, or discovering, as many bugs as you fix. One of the tools you need in your arsenal as a software professional is the courage and ability to 'down tools', step away from the keyboard, and work out where all the bugs are coming from in the first place.

Despite its criticisms, performing root cause analysis using the '5 Whys' approach can be a simple but effective way for technically minded people like us to gain insights into how to prevent defects from being created in the first place.

The 5 Whys

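Originally developed at Toyota, the 5 Whys is an iterative question-asking approach for finding the source of a problem. You state the problem, ask why it happened, and then ask why again of each answer until you reach a cause you can act on.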

An example

Here is an example from a project I was working on where the defect count was starting to get out of control:

1. Why was this bug raised?
Because the expected behaviour was missing.

2. Why?
Because the developer forgot to implement it.

3. Why?
Because there was a long time between when they read the specification for the requirement and when they completed the work.

4. Why?
Because it was a large requirement with many edge cases.

5. Why?
Because it was thought that it would be more efficient for one set of related work to be done by one person.

...except it wasn't. A number of use cases were forgotten, a bug was raised for each, and they were fixed by different developers. And is it really a bug if it was an omission rather than an implementation fault?

Can you see how these questions helped us understand the issue?
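If you want to keep a record of the analysis alongside the bug, the chain of questions above can be captured in a few lines of code. This is only a minimal sketch in Python: the WhyChain class and its method names are made up for illustration and do not come from any tool we used on the project. The answers are the ones from the example above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WhyChain:
    """Hypothetical record of a 5 Whys session: a problem plus successive answers."""
    problem: str
    answers: List[str] = field(default_factory=list)

    def ask_why(self, answer: str) -> "WhyChain":
        # Each call records the answer to the next "why?" in the chain.
        self.answers.append(answer)
        return self

    def root_cause(self) -> str:
        # The deepest answer recorded so far is treated as the candidate root cause.
        if not self.answers:
            raise ValueError("no answers recorded yet")
        return self.answers[-1]

    def render(self) -> str:
        # Produce a numbered, plain-text summary of the session.
        lines = [f"Problem: {self.problem}"]
        for depth, answer in enumerate(self.answers, start=1):
            lines.append(f"{depth}. Why? {answer}")
        return "\n".join(lines)


chain = (
    WhyChain("A bug was raised")
    .ask_why("Because the expected behaviour was missing.")
    .ask_why("Because the developer forgot to implement it.")
    .ask_why("Because there was a long time between reading the specification and completing the work.")
    .ask_why("Because it was a large requirement with many edge cases.")
    .ask_why("Because it was thought more efficient for one person to do one set of related work.")
)

print(chain.render())
print("Candidate root cause:", chain.root_cause())
```

The code does no analysis for you. Writing the chain down just makes it easier to compare several bugs and spot the root causes that keep recurring.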

The result

Based on these insights, the team made two changes:

Firstly, we started breaking down requirements into smaller chunks, to a size where a single developer could complete one in 2 to 3 days.
Secondly, developers started talking testers through the changes they had made prior to committing their code to source control. This introduced a straw man effect, as developers started finding their own omissions prior to, or during, the discussion. Additionally, the testing was more focussed and more effective.

These practices slowed our defect injection rate and brought the project back under control again.

Next steps

I hope you don't find yourself on a project with a high defect count, but if you do, give this a go and see if it helps.

Share your experiences in the comments below.
