Root cause analysis (RCA) is a systematic process of identifying the origin of an incident.
When you’re feeling under the weather, it’s perfectly natural to treat the pain or discomfort with first aid or some other superficial remedy. If you consult a medical professional, however, the approach is usually more thorough. You might be asked a series of specific questions about your condition, and you might even undergo laboratory tests to get to the source of your illness.
The same is true for plant and maintenance incidents. While an immediate response is usually required, there's always value in performing a systematic analysis of possible root causes.
RCA is the process that aims to identify the cause of a particular event. In the plant setting, this event usually refers to any potential problems that will disrupt standard operations. At a very high level, the usual causes of problems can be categorized as:
The general process of RCA requires you to describe what happened, why and how it happened, and what steps are needed to prevent the same event from happening in the future. The process can get very complex depending on the situation. Thankfully, some common methods were developed to aid in identifying the root cause.
Root cause analysis makes use of a number of methods that help teams brainstorm and pinpoint likely causes of issues in a facility. The following methods can assist maintenance teams when performing root cause analysis.
The name of the method pretty much explains the steps: ask why and ask it again. Asking “why?” five times usually gets to the bottom of the problem, but don’t let the name stop you from asking more times. The idea is to drill down to the details of an event until you are left with the actual root cause.
An example involving a faulty mixer subjected to 5 Whys is shown below.
The 5 Whys method is the simplest RCA tool at your disposal. As such, it’s often best for operators and others performing the day-to-day labor in the facility.
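To make the drill-down concrete, here is a minimal Python sketch of a 5 Whys chain. The function name `five_whys` and every answer in the mixer scenario are invented for illustration and are not taken from a real investigation. Each answer simply becomes the subject of the next "why."

```python
def five_whys(problem: str, answers: list[str]) -> list[tuple[str, str]]:
    """Pair each answer with the "why" question it responds to."""
    chain = []
    current = problem
    for answer in answers:
        chain.append((f"Why: {current}", answer))
        current = answer  # the next "why" drills into this answer
    return chain


# Invented answers for a stalled mixer, for illustration only.
chain = five_whys(
    "The mixer stopped mid-batch.",
    [
        "The motor overheated.",
        "The gearbox was running without enough lubricant.",
        "The gearbox was not lubricated on schedule.",
        "The lubrication task was missing from the PM checklist.",
        "The checklist was not updated when the mixer was installed.",
    ],
)

# The last answer is the candidate root cause.
candidate_root_cause = chain[-1][1]

for question, answer in chain:
    print(f"{question}\n  -> {answer}")
```

Nothing in the sketch limits the list to five answers, which matches the advice above: keep asking until the answer points at something you can actually fix.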
A more visual method to determine root causes is the fault tree diagram. A fault tree diagram starts with the problem in the uppermost block. The immediate causes of the problem event are listed beneath it, forming the second layer of the diagram. Each immediate cause then branches out to its own prior causes. This process continues until the most basic events are identified, which then become your potential root causes.
The same mixer problem can be represented by the following fault tree diagram:
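The branching structure described above maps naturally onto a tree data structure. The following Python sketch is one way to model it, assuming a hypothetical `Event` class and invented mixer causes. It leaves out the AND/OR gates used in formal fault tree analysis and only collects the basic events at the bottom of the tree.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """One block in the diagram, with the prior causes branching beneath it."""
    description: str
    causes: list["Event"] = field(default_factory=list)


def basic_events(event: Event) -> list[str]:
    """Collect the leaves of the tree: events with no prior causes listed."""
    if not event.causes:
        return [event.description]
    leaves = []
    for cause in event.causes:
        leaves.extend(basic_events(cause))
    return leaves


# Invented causes for illustration only.
top = Event(
    "Mixer fails to start",
    [
        Event("No power to motor", [Event("Tripped breaker"), Event("Damaged cable")]),
        Event("Motor seized", [Event("Bearing not lubricated")]),
    ],
)

potential_root_causes = basic_events(top)
print(potential_root_causes)
```

The leaves returned by `basic_events` correspond to the "most basic events" in the diagram, which are the candidates to investigate further.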
Another visual method to identify root causes is the fishbone diagram (also known as an Ishikawa diagram, named after its creator, Kaoru Ishikawa). It starts by specifying the problem on the rightmost part of the diagram. The factors contributing to the main problem are then listed as categories. Specific causes under each category are then listed to identify the source of the problem.
As a general guide, the following categories are used as starting points:
Applying these base categories as a starting point, the mixer problem can be translated into a fishbone diagram.
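If you don't have a diagramming tool at hand, the same structure can be captured as a simple mapping from category to causes. The Python sketch below is only an illustration. The helper `fishbone_outline`, the category names, and the causes are all placeholders, so substitute whichever categories your team uses.

```python
def fishbone_outline(problem: str, causes_by_category: dict[str, list[str]]) -> str:
    """Render the problem, its categories, and their causes as an indented outline."""
    lines = [f"Problem: {problem}"]
    for category, causes in causes_by_category.items():
        lines.append(f"  {category}")
        for cause in causes:
            lines.append(f"    - {cause}")
    return "\n".join(lines)


# Placeholder categories and invented causes, for illustration only.
outline = fishbone_outline(
    "Mixer stops mid-batch",
    {
        "Machine": ["Worn gearbox seal"],
        "Method": ["Lubrication step missing from checklist"],
        "Material": ["Batch thicker than the mixer is rated for"],
    },
)
print(outline)
```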
The following two methods—FMEA and the Pareto method—tend to be more forward-looking than most other RCA tools, and work best when performed on a routine basis rather than only after an equipment failure.
Failure Mode and Effects Analysis, or FMEA for short, is a method of identifying ways in which assets might fail. One takes stock of the potential failure modes that individual assets might experience and analyzes how those failures might impact business processes.
FMEA differs from the other RCA tools discussed so far because it looks forward at what might happen rather than hypothesizing over a failure that already occurred. However, it can still be useful when it comes to finding root causes. Facilities that take the time to perform FMEA will have a ready-to-use database of potential causes and effects to draw upon when analyzing a failure event, ultimately expediting the process.
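As a rough idea of what that ready-to-use database could look like, here is a minimal Python sketch. The `FailureMode` record, the `candidate_causes` lookup, and the sample entries are all hypothetical. A real FMEA worksheet usually carries more fields, and a CMMS would typically store this data for you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One row of a simplified FMEA worksheet."""
    asset: str
    mode: str
    cause: str
    effect: str


def candidate_causes(records: list[FailureMode], asset: str, observed_effect: str) -> list[str]:
    """Return the recorded causes whose effect matches what was observed on the asset."""
    return [r.cause for r in records if r.asset == asset and r.effect == observed_effect]


# Invented entries for illustration only.
fmea = [
    FailureMode("Mixer", "Motor overheats", "Insufficient lubrication", "Unplanned stop"),
    FailureMode("Mixer", "Belt slips", "Loose belt tension", "Uneven mixing"),
    FailureMode("Mixer", "Breaker trips", "Overloaded circuit", "Unplanned stop"),
    FailureMode("Conveyor", "Belt tears", "Misaligned rollers", "Unplanned stop"),
]

print(candidate_causes(fmea, "Mixer", "Unplanned stop"))
```

When a failure does occur, a lookup like this gives the team a starting list of causes to confirm or rule out instead of a blank page.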
The Pareto method is based on what’s commonly called the Pareto principle, the rule of thumb that roughly 80% of all problems result from roughly 20% of all causes.
When drawn as a chart, potential causes of the problem are listed from left to right in order of impact (greatest on the left, least on the right). Each cause is represented in the diagram as a bar, and that bar’s height represents its frequency, which serves as the measure of impact.
In addition to the bars in the chart, a line is also charted across the diagram to show the cumulative impact of each cause (ascending from left to right).
A Pareto diagram can be used to visualize data from FMEA in a way that helps maintenance teams target the most important issues first. That way, the team spends less time on tasks that don’t matter.
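The numbers behind a Pareto chart are simple to compute. The Python sketch below sorts causes by frequency and adds the cumulative percentage that the line on the chart would trace. The function names `pareto_table` and `vital_few`, the 80% threshold default, and the failure counts are assumptions made for illustration.

```python
def pareto_table(counts: dict[str, int]) -> list[tuple[str, int, float]]:
    """Sort causes by frequency (descending) and attach the cumulative percentage."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no occurrences to analyze")
    rows = []
    running = 0
    for cause, count in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        running += count
        rows.append((cause, count, round(100 * running / total, 1)))
    return rows


def vital_few(rows: list[tuple[str, int, float]], threshold: float = 80.0) -> list[str]:
    """Return the leading causes needed to reach the cumulative threshold."""
    selected = []
    for cause, _count, cumulative in rows:
        selected.append(cause)
        if cumulative >= threshold:
            break
    return selected


# Invented failure counts for illustration only.
counts = {
    "Worn bearing": 40,
    "Unclear procedure": 5,
    "Loose belt": 30,
    "Sensor drift": 15,
    "Power dip": 10,
}

rows = pareto_table(counts)
for cause, count, cumulative in rows:
    print(f"{cause:<18} {count:>3} {cumulative:>6}%")

print(vital_few(rows))
```

With these sample counts, the first three causes account for 85% of the recorded failures, so they would be the ones to tackle first.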
Effective root cause analysis helps maintenance teams focus on fixing the core causes of problems rather than constantly treating symptoms. A few ways in which RCA achieves that include the following.
Whenever a machine breaks down, maintenance teams often focus solely on bringing it back online. In fact, about 56% of all facilities use a run-to-failure maintenance strategy with at least some of their assets.
However, without researching the root causes of these breakdowns, they're unlikely to go away. Odds are the asset will break down again in the future.
When performed correctly, RCA helps teams focus on important preventive maintenance tasks. Given that as much as half of all preventive maintenance tasks (PMs) ultimately accomplish nothing, that could translate into vastly reduced maintenance costs.
When getting to the root of a problem, it’s common for individuals to blame other people, departments, etc. One goal of RCA is to avoid this type of situation where everyone blames one another for problems instead of looking at core systemic issues.
The problem here is that issues related to human error need to be resolved with adequate processes and controls—the issue won’t necessarily be solved by removing a given human being from the situation since any other person could make the same mistake. As such, the root cause is related to processes and procedures, not people.
Proper RCA avoids this problem by helping the team work together to identify issues that are related to systems, processes, and machines while driving toward actionable plans. Ultimately, it helps people get on the same page.
By focusing on identifying root causes of problems, maintenance teams switch their perspective from maintaining the status quo toward continuous improvement. In fact, a core facet of Kaizen (or continuous improvement) is the analysis of existing processes, which RCA embodies perfectly.
As maintenance teams perform RCA on failed equipment, that process naturally translates into finding ways to improve existing processes as well. After all, the purpose of root cause analysis is to get to the fundamental causes of an issue and work on repairing those rather than focusing on fixing failed equipment alone.
When root causes are discovered and properly dealt with, equipment runs more reliably, resulting in fewer breakdowns, overall better processes, and more consistent output quality.
While RCA methods are very common and well-known to the maintenance community, there can be challenges to making RCA thrive.
The first step to mastering this process is knowing the methods that are available to conduct RCAs. The next steps are setting the proper mindset and improving the quality of execution to drive the initiative toward success.
Keep in mind the importance of collecting data accurately and involving the correct groups to analyze that data. To be effective, RCA should be a repeatable process that the group executes collaboratively.
RCA delivers its full benefits only when it’s done correctly. The following pointers can help you implement root cause analysis effectively in your facility.
Good information is vital to completing any process successfully, and RCA is no exception to that rule. In order to get the most out of it, you’ll need to make sure you’re collecting data from your facility’s processes.
There are several ways to do this, of course. One of the simplest is to implement a CMMS at your facility if you haven’t already. Computerized maintenance management systems provide a way to collect data from work orders, meter readings, and so forth, all of which can be invaluable when analyzing an issue.
As you consistently collect good information on your facility’s equipment and processes, you’ll make RCA more precise. In addition, the practice of collecting that information supports proactive RCA as you notice trends in the data leading up to potential future problems.
Generally, the most effective processes aren’t the most polished ones, but the ones that can be easily repeated. Continuously improving your root cause analysis is important, but RCA is unlikely to become a regularly used tool in your facility if it isn’t fundamentally repeatable.
Some ways to create a repeatable RCA process include:
In order to analyze incidents, you first need to be aware of them. Logging asset data can help with that, but it’s absolutely vital for your employees to feel free to report incidents or problems when they occur.
As such, incident reporting in your facility should be fear-free and open to everyone. One way to accomplish this is to make your incident reporting process anonymous. Employees can fill out a form without having their own name attached to it, which helps eliminate the anxiety that’s often associated with reporting an equipment breakdown, fault, or accident.
RCA is most effective when you’re able to prioritize causes. Rather than spreading your time and efforts across numerous potential causes, you’re able to focus on resolving the issues that have the most impact (and the greatest cost).
As mentioned above, FMEA and Pareto diagrams can help your team prioritize the right causes. After figuring out a number of potential causes, it’s often worthwhile to analyze the potential impact of each one to see where you can make the greatest difference.
It’s important not to rush the RCA process. While you don’t want to delay it or spend too much time analyzing the issue—resulting in “analysis paralysis”—neither do you want to rush to a superficial conclusion of what caused your problem.
Make sure you’ve assessed as many probable causes as are reasonable to consider and have gotten to the true underlying issues in your facility before creating a plan of action. Remember, it’s often important to try to find multiple potential causes rather than stopping after the first since most complex problems have multiple contributing factors.
RCA is best done as a collaborative effort. After all, there may be multiple issues at play, and it’s important to have a variety of skillsets and expertise at the table. Potential qualified team members include:
In addition, you’ll want someone who has enough authority to help the team overcome organizational roadblocks in the investigation process.
Finally, at least one person you select for your RCA team should have solid investigation skills. They should be the sort of person who's naturally diligent and impartial with a keen eye for detail.
Even with a repeatable process and a solid team, RCA will still get you nowhere if you’re unclear on the actual problems you’re discussing. Before beginning your discussions, you’ll need to pinpoint exactly what the problem is and how it shows itself in your processes.
Without that, one of two things might happen:
Neither result will help you solve the actual issue, so make sure everyone is clear on the problem before you begin your analysis.
Finally, it’s important to measure the results of your RCA process in order to gauge its success. If the same incident occurs again, that’s your cue to perform a more in-depth analysis or adjust your process. Over time, your RCA and other processes will stay in a constant state of improvement.
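One simple way to check for a repeat incident is to search your incident log for the same failure on the same asset after the corrective action was put in place. The Python sketch below assumes a hypothetical `Incident` record and an invented log. In practice this data would likely come from your CMMS work orders.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Incident:
    asset: str
    failure: str
    occurred_on: date


def recurrences(log: list[Incident], asset: str, failure: str, fixed_on: date) -> list[Incident]:
    """Return incidents of the same failure on the same asset after the fix date."""
    return [
        incident
        for incident in log
        if incident.asset == asset
        and incident.failure == failure
        and incident.occurred_on > fixed_on
    ]


# Invented log entries for illustration only.
log = [
    Incident("Mixer", "Motor overheated", date(2024, 1, 10)),
    Incident("Mixer", "Belt slipped", date(2024, 2, 2)),
    Incident("Mixer", "Motor overheated", date(2024, 3, 18)),
]

repeats = recurrences(log, "Mixer", "Motor overheated", fixed_on=date(2024, 1, 15))
if repeats:
    print(f"{len(repeats)} repeat incident(s) since the fix: revisit the analysis.")
```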
Root cause analysis is a powerful process that enables an organization to identify the source of a problem. Performed effectively, RCA can significantly improve a plant’s performance by leading to correct solutions that last.