What’s the right time to ensure our products are free from defects? Is it before we launch, after, or at some other time?
I’ve written recently on the topic of how to handle defects and bugs through feature development and the purpose and value of stabilisation before release. The conversations that this sparked led me to reflect on when the right time is to handle them, and I found the following analogy to be quite useful.
Imagine you are the general manager of a toy company. When would the best time be to ensure you have no harmful defects in your products?
Several months ago you released ‘Loko’, a new toy train, to market and your advertising has been effective at getting many eager children to pester their parents into getting them one. You now have many hundreds of happy children playing with it, at home, in the playground, everywhere. It has become ubiquitous.
Then you start to hear reports, from retailers and on social media, of complaints from parents whose children have needed a doctor after becoming ill, and you discover that, when wet, Loko releases mildly toxic odours. You issue a product advisory that children should not allow their Loko to get wet, but more and more reports come in. You do some checking yourself and realise that this is potentially very serious, so you issue a product recall, pulling it off shop shelves and providing collection boxes at the retailers so that parents can return the toy and get a refund.
The cost to your company was $500,000 and a tarnished brand. This has been an expensive lesson to learn, both in terms of financials and reputation; and while there has been no lasting damage, the potential harm that could have been caused is very sobering. So your mind turns to thinking about what has gone wrong, and where you could have caught this problem.
This scenario is all too familiar from software products released to market. There occasionally seems to be an attitude that customers will inevitably find defects in your software and that, because you have processes to handle that, it can’t be too bad, right? Wrong. As with our fictitious toy scenario, allowing defective software into the market can harm your reputation and, depending on the application, could potentially lead to more far-reaching effects too.
Your research discovers that the toys had been manufactured some months before shipment and stored in one of your warehouses in the meantime. While talking to staff at the warehouse, you hear people say that they do remember a funny smell from the racks on which the toys were stored, but they didn’t say anything because they didn’t want to “rock the boat”.
This is an insidious problem: people are aware that there might be an issue but don’t speak out. Lots of things have gone wrong in history because of that attitude … to misappropriate Edmund Burke’s famous quote: “all it takes for defects to triumph is for good men to do nothing”.
In our example, people suspected that there might be something wrong, but thought that it was not their responsibility to do anything about it or, worse yet, that they would be in trouble if they pointed it out. The product had already been made, so people might have thought that nothing could be done about the smell anyway, but the cost of releasing it to market was far higher than the cost of stopping it at the warehouse.
Catching the problems with Loko before it was shipped would have meant writing off the cost of production to that point — $150,000 — still a lot of money, but far easier to swallow than half a million dollars. You formulate some training and posters to reinforce that this is everyone’s responsibility, that stopping shipments going out when staff had concerns would have prevented all that cost and heartache. Your mind now turns to the root cause; what had gone wrong to allow these to have been manufactured in this way?
On our software projects, unless we are fortunate enough to be working with a company that practises continuous deployment, there is always a period of time between completing the software and putting it into production. This is a period where we might expect to put our software through additional testing (for performance, security, compliance, operational acceptance, etc.). How fastidious are we with ensuring that any defect we uncover is resolved before the software is certified for release? We can let low severity defects go, because the next project can resolve those, right? Wrong! Releasing software with known issues is a common cause of technical debt, and can cause many problems later on. We need to focus on releasing software with as near ‘zero debt’ as we can.
I once worked with a telecommunications company whose released technical debt included more than 5,000 low severity defects. This was clearly embarrassing, so every six months they would convene a meeting to cull the oldest defects off the list. That is, they removed them from the list rather than actually fixing them, on the basis that if something truly is a problem it will be raised again. In other words, they acted as if the defects had never been found in the first place.
You look into the manufacturing process and can see from the records that Loko was tested at key points in its production. While you are reassured that quality control is happening, you are concerned that this didn’t pick up the odour or, if it did, it wasn’t thought to be a problem.
However, that’s relatively straightforward to fix. Some time with the quality department would lead to an improved approach to sampling and some additional tests to run on the samples. You calculate that this would have caught the problem within the first sample of the first batch, and while scrapping that production run would have meant writing off $50,000, that is even better than the figures we’ve already been looking at. Where did the problem originate, though?
This is why we test as we go on agile projects. Especially on Scrum projects, we have testers and programmers embedded in the same development team, testing each component and each feature as it is developed. Nothing is classified as ‘done’ unless it is agreed to meet our ‘acceptance criteria’ and ‘definition of done’. While this often means we are doing testing much earlier, and implies a higher cost in staffing levels, being able to fix problems as soon as they occur is far cheaper overall. We reinforce this by using a different name: when we find problems at the same time as the feature is developed, we call them just bugs or errors, rather than defects.
Finally, you trace the problem down to the type of plastic used. You find that the raw materials were imported, and by looking into the supplier you discover that other companies that used the same supplier also had the same problems.
Perhaps there was a way to have avoided the problem altogether, by considering key decisions made during design and material selection. Taking greater care at this point would have meant changing to a different supplier, and while this would have cost an additional $10,000 for more expensive materials, this again is far cheaper than catching the problem at any later point.
If we can recognise that our quality controllers have something to contribute, and that we need to consider the whole process from the beginning, we can legitimately say that we are talking about ‘quality assurance’ rather than just ‘quality control’, because we are designing quality in, right from the start.
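To put the four figures from the Loko story side by side, here is a small sketch that expresses each one as a multiple of the cheapest option. The dollar amounts are the ones quoted in the story; the stage labels and the names `COSTS_BY_STAGE` and `cost_multipliers` are mine, made up purely for illustration.

```python
# Costs quoted in the Loko story, ordered from earliest to latest point
# at which the problem could have been caught. Stage labels are paraphrased.
COSTS_BY_STAGE = [
    ("design and material selection", 10_000),
    ("quality control in production", 50_000),
    ("warehouse, before shipment", 150_000),
    ("after release (recall)", 500_000),
]


def cost_multipliers(costs):
    """Return (stage, cost, multiple) where multiple is relative to the first stage."""
    baseline = costs[0][1]
    return [(stage, cost, cost / baseline) for stage, cost in costs]


for stage, cost, multiple in cost_multipliers(COSTS_BY_STAGE):
    print(f"{stage:<32} ${cost:>9,}  {multiple:>3.0f}x")
```

On these numbers, catching the problem in production costs 5 times as much as addressing it at design, catching it in the warehouse 15 times, and the recall 50 times. These are the figures of a fictitious scenario, not measurements, but the shape of the curve is the point.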
On our software projects, involving all the disciplines right from the outset pays dividends in a similar way. Consider including testers, operators, credit control, sales, and customer support at the beginning to ensure that all groups affected have an opportunity to explain their needs. On most software projects, nearly 45% of defects arise from requirements and design alone.
At my current company, the product owner refines the backlog 1-2 sprints ahead, usually involving an analyst, architect, UX designer, developer, and tester. This refining ensures much sharper acceptance criteria and more clearly defined scenarios of how the feature should work. Following this approach, over a period of 18 months, our teams have reduced the volume of defects by around 70% each release, from over 900 defects to the low tens — while delivering lots of new features and supporting a far wider range of smartphones and tablets.
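As a rough sanity check on how a reduction of around 70% per release compounds, the sketch below starts from 900 defects and applies the same reduction repeatedly. The number of releases is not something I have stated above, so the four used here is an assumption for illustration only, as is the function name `defects_per_release`.

```python
def defects_per_release(start, reduction, releases):
    """Return the defect count before the first release and after each later one,
    assuming the same fractional reduction is achieved every release."""
    counts = [float(start)]
    for _ in range(releases):
        counts.append(counts[-1] * (1 - reduction))
    return counts


# Assumed inputs: 900 starting defects, 70% reduction, four releases.
for release, count in enumerate(defects_per_release(900, 0.70, 4)):
    print(f"release {release}: about {count:.0f} defects")
```

With these assumed inputs the count falls to roughly 270, then 81, then 24, then 7, so a few releases at that rate are enough to move from the high hundreds to the low tens.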