Root Cause Analysis

Root cause analysis (RCA) is a systematic process of identifying the origin of an incident.

Overview

When feeling under the weather, it’s perfectly natural to address any pain or discomfort with some sort of first aid treatment or superficial remedy. However, if you consult a medical professional, the approach will likely be more thorough. You might be asked a series of specific questions about your condition, and might even undergo some laboratory tests to get to the source of your illness.

The same is true for plant and maintenance incidents. While an immediate response is usually required, there's always value in performing a systematic analysis of possible root causes.

RCA is a process that aims to identify the underlying cause of a particular event. In the plant setting, this event is usually a problem that disrupts standard operations. At a very high level, the usual causes of problems can be categorized as:

  • Technical issues affecting physical parts
  • Human causes, or when an assigned individual does not perform a task correctly
  • System causes, or lapses in processes

The general process of RCA requires you to describe what happened, why and how it happened, and what steps are needed to prevent the same event from happening again. The process can get very complex depending on the situation. Thankfully, some common methods have been developed to aid in identifying the root cause.

Common RCA Methods

Root cause analysis makes use of a number of methods that help teams brainstorm and pinpoint likely causes of issues in a facility. The following methods can assist maintenance teams when performing root cause analysis.

5 Whys

The name of the method pretty much explains the steps: ask why and ask it again. Asking “why?” five times usually gets to the bottom of the problem, but don’t let the name stop you from asking more times. The idea is to drill down to the details of an event until you are left with the actual root cause.

An example involving a faulty mixer subjected to 5 Whys is shown below.

[Figure: 5 Whys method diagram]

The 5 Whys method is the simplest RCA tool at your disposal. As such, it’s often best for operators and others performing the day-to-day labor in the facility.
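The drill-down described above can also be recorded as a simple chain of answers. The following is a minimal Python sketch, not a prescribed tool: the `WhyChain` class is a made-up name, and the mixer answers are hypothetical, invented only to illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WhyChain:
    """A problem statement plus the ordered answers to each 'why?'."""
    problem: str
    answers: List[str] = field(default_factory=list)

    def ask_why(self, answer: str) -> "WhyChain":
        # Each recorded answer becomes the subject of the next "why?".
        self.answers.append(answer)
        return self

    @property
    def root_cause(self) -> Optional[str]:
        # The deepest answer recorded so far is the candidate root cause.
        return self.answers[-1] if self.answers else None


# Hypothetical mixer example; every answer below is invented for illustration.
chain = (
    WhyChain("Mixer stopped mid-batch")
    .ask_why("The motor overheated")
    .ask_why("The bearing seized")
    .ask_why("The bearing was not lubricated")
    .ask_why("The lubrication task was skipped")
    .ask_why("The task was missing from the PM schedule")
)

print(f"Problem: {chain.problem}")
for depth, answer in enumerate(chain.answers, start=1):
    print(f"Why #{depth}: {answer}")
print(f"Candidate root cause: {chain.root_cause}")
```

Nothing in the sketch limits the chain to five answers, which matches the advice to keep asking if five is not enough.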

Fault Tree Analysis

A more visual method to determine root causes is the fault tree diagram. A fault tree diagram starts with the problem in the uppermost block. The immediate causes of the problem event are listed beneath it, forming the second layer of the diagram. Each immediate cause then branches out to its own prior causes. This process continues until the most basic events are identified, which then become your potential root causes.

The same mixer problem can be represented by the following fault tree diagram:

[Figure: fault tree analysis diagram]
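The branching structure described above maps naturally onto a tree in code. The sketch below is one possible Python representation, assuming a simplified tree without the AND/OR gates that formal fault trees typically use. The `Event` class, the `basic_events` helper, and the mixer events are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    """A node in a simplified fault tree: an event and its prior causes."""
    name: str
    causes: List["Event"] = field(default_factory=list)


def basic_events(event: Event) -> List[str]:
    """Return the leaf events (no recorded prior causes), left to right."""
    if not event.causes:
        return [event.name]
    leaves: List[str] = []
    for cause in event.causes:
        leaves.extend(basic_events(cause))
    return leaves


# Hypothetical mixer fault tree; the events are invented for illustration.
tree = Event("Mixer fails to start", [
    Event("No power at motor", [
        Event("Breaker tripped"),
        Event("Damaged supply cable"),
    ]),
    Event("Motor seized", [
        Event("Bearing not lubricated"),
    ]),
])

# The leaves are the potential root causes to investigate.
for name in basic_events(tree):
    print(name)
```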

Fishbone Diagram (AKA Ishikawa Diagram)

Another visual method to identify root causes is the fishbone diagram (also known as an Ishikawa diagram, named after its creator, Kaoru Ishikawa). It starts by specifying the problem on the rightmost part of the diagram. The factors contributing to the main problem are then listed as categories. Specific causes are then listed under each category to identify the source of the problem.

As a general guide, the following categories are used as starting points:

  • Environmental
  • People
  • Equipment/material
  • Procedures

Applying these base categories as a starting point, the mixer problem can be translated into a fishbone diagram.

[Figure: fishbone diagram]
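Before drawing the diagram, it can help to collect causes under the starting categories as plain data. The following Python sketch assumes the four categories listed above. The `build_fishbone` helper and the example causes are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# The four starting categories listed above.
CATEGORIES = ("Environmental", "People", "Equipment/material", "Procedures")


def build_fishbone(causes: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (category, cause) pairs under the starting categories."""
    bones: Dict[str, List[str]] = {category: [] for category in CATEGORIES}
    for category, cause in causes:
        if category not in bones:
            raise ValueError(f"Unknown category: {category!r}")
        bones[category].append(cause)
    return bones


# Hypothetical causes for the mixer problem, invented for illustration.
problem = "Mixer stopped mid-batch"
fishbone = build_fishbone([
    ("Equipment/material", "Bearing seized"),
    ("Procedures", "Lubrication task missing from PM schedule"),
    ("People", "Operator not trained to spot overheating"),
    ("Environmental", "High ambient temperature near the motor"),
    ("Equipment/material", "Worn drive belt"),
])

print(f"Problem: {problem}")
for category, items in fishbone.items():
    print(f"  {category}: {', '.join(items) if items else '(none yet)'}")
```

Rejecting unknown categories is a design choice for this sketch. A team that adds its own categories would extend `CATEGORIES` instead.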

Tip: You won’t necessarily use all of these tools. In fact, you should only choose the RCA methods that will be of most benefit to your situation.

Additional RCA Tools

The following two methods—FMEA and the Pareto method—tend to be more forward-looking than most other RCA tools, and work best when performed on a routine basis rather than only after an equipment failure.

FMEA

Failure Mode and Effects Analysis, or FMEA for short, is a method of identifying ways in which assets might fail. One takes stock of the potential failure modes that individual assets might experience and analyzes how those failures might impact business processes.

FMEA differs from the other RCA tools discussed so far because it looks forward at what might happen rather than hypothesizing over a failure that already occurred. However, it can still be useful when it comes to finding root causes. Facilities that take the time to perform FMEA will have a ready-to-use database of potential causes and effects to draw upon when analyzing a failure event, ultimately expediting the process.
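A database of failure modes like the one mentioned above is often ranked so that the riskiest items stand out. One widely used convention, which this article does not itself describe, is the risk priority number (RPN): severity times occurrence times detection, each scored from 1 to 10. The Python sketch below assumes that convention. The `FailureMode` class and all of the scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    asset: str
    mode: str
    effect: str
    severity: int    # 1 (negligible) to 10 (catastrophic)
    occurrence: int  # 1 (rare) to 10 (frequent)
    detection: int   # 1 (almost always detected) to 10 (almost never detected)

    @property
    def rpn(self) -> int:
        # Risk priority number: a higher value means a higher priority.
        return self.severity * self.occurrence * self.detection


# Hypothetical entries; the scores are invented for illustration.
modes = [
    FailureMode("Mixer", "Seal leak", "Product contamination", 9, 2, 3),
    FailureMode("Mixer", "Bearing seizure", "Batch lost, line down", 8, 4, 5),
    FailureMode("Mixer", "Belt slip", "Slow mixing", 4, 6, 2),
]

ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for m in ranked:
    print(f"{m.rpn:>4}  {m.asset}: {m.mode} -> {m.effect}")
```

When a failure does occur, a ranked list like this gives the team a starting set of candidate causes and effects to check.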

Pareto Method

The Pareto method is based on what’s commonly called the Pareto principle, which states that roughly 80% of all problems result from 20% of all causes.

When drawn as a chart, potential causes of the problem are listed from left to right in descending order of frequency (greatest on the left, least on the right). Each cause is represented in the diagram as a bar, and that bar’s height represents its frequency.

In addition to the bars in the chart, a line is also charted across the diagram to show the cumulative contribution of the causes (ascending from left to right).

A Pareto diagram can be used to visualize data from FMEA in a way that helps maintenance teams target the most important issues first. That way, the team spends less time on tasks that don’t matter.
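The numbers behind a Pareto chart are straightforward to compute: count each cause, sort by frequency, and keep a running cumulative percentage. The Python sketch below shows one way to do that. The `pareto_table` function and the work-order data are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def pareto_table(causes: List[str]) -> List[Tuple[str, int, float]]:
    """Return (cause, count, cumulative percent) rows, most frequent first."""
    counts = Counter(causes)
    total = sum(counts.values())
    rows: List[Tuple[str, int, float]] = []
    running = 0
    for cause, count in counts.most_common():
        running += count
        rows.append((cause, count, round(100 * running / total, 1)))
    return rows


# Hypothetical causes pulled from 20 work orders, invented for illustration.
work_order_causes = (
    ["Lubrication"] * 10
    + ["Misalignment"] * 6
    + ["Operator error"] * 3
    + ["Power supply"] * 1
)

rows = pareto_table(work_order_causes)
for cause, count, cumulative in rows:
    # The count is the bar height; the cumulative percent is the line.
    print(f"{cause:<15} {count:>3}  {cumulative:>5}%")
```

In this invented data set, the first two causes account for 80% of the work orders, which is where the team would focus first.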

Advantages of Effective Root Cause Analysis

Effective root cause analysis helps maintenance teams focus on fixing the core causes of problems rather than constantly treating symptoms. A few ways in which RCA achieves that include the following.

More Efficient Problem Resolution

Whenever a machine breaks down, maintenance teams often focus solely on bringing it back online. In fact, about 56% of all facilities use a run-to-failure maintenance strategy with at least some of their assets.

However, unless the team researches the root causes of these breakdowns, they're unlikely to go away. Odds are the asset will break down again in the future.

When performed correctly, RCA helps teams focus on the preventive maintenance tasks (PMs) that matter. Given that as many as half of all PMs ultimately accomplish nothing, that could translate into vastly reduced maintenance costs.

Puts Everyone on the Same Page

When getting to the root of a problem, it’s common for individuals to blame other people, departments, etc. One goal of RCA is to avoid this type of situation where everyone blames one another for problems instead of looking at core systemic issues.

The problem here is that issues related to human error need to be resolved with adequate processes and controls—the issue won’t necessarily be solved by removing a given human being from the situation since any other person could make the same mistake. As such, the root cause is related to processes and procedures, not people.

Proper RCA avoids this problem by helping the team work together to identify issues that are related to systems, processes, and machines while driving toward actionable plans. Ultimately, it helps people get on the same page.

Builds a Culture of Continuous Improvement

By focusing on identifying root causes of problems, maintenance teams shift their perspective from maintaining the status quo to continuous improvement. In fact, a core facet of Kaizen (or continuous improvement) is the analysis of existing processes, which RCA embodies perfectly.

As maintenance teams perform RCA on failed equipment, that process naturally translates into finding ways to improve existing processes as well. After all, the purpose of root cause analysis is to get to the fundamental causes of an issue and work on repairing those rather than focusing on fixing failed equipment alone.

Overall Better Quality

When root causes are discovered and properly dealt with, equipment runs more reliably, resulting in fewer breakdowns, overall better processes, and more consistent output quality.

Implementing Root Cause Analysis

While RCA methods are very common and well-known to the maintenance community, there can be challenges to making RCA thrive.

The first step to mastering this process is knowing the methods that are available to conduct RCAs. The next steps are setting the proper mindset and improving the quality of execution to drive the initiative toward success.

Keep in mind the importance of collecting data accurately and involving the correct groups to analyze that data. To implement RCA effectively, it should be a repeatable process that's collaboratively executed by the group.

8 Tips for Performing Effective Root Cause Analysis

To receive the full benefits of RCA, you must perform it correctly. The following pointers can help you implement root cause analysis effectively in your facility.

1. Collect Solid Information

Good information is vital to completing any process successfully, and RCA is no exception to that rule. In order to get the most out of it, you’ll need to make sure you’re collecting data from your facility’s processes.

There are several ways to do this, of course. One of the simplest is to implement a CMMS at your facility if you haven’t already. Computerized maintenance management systems provide a way to collect data from work orders, meter readings, and so forth, all of which can be invaluable when analyzing an issue.

As you consistently collect good information on your facility’s equipment and processes, you’ll make RCA more precise. In addition, the practice of collecting that information supports proactive RCA as you notice trends in the data leading up to potential future problems.

2. Create a Repeatable Process

Generally, the most effective processes aren’t necessarily the most refined, but the ones that can be easily repeated. While continuously improving your root cause analysis is important, RCA is unlikely to become a regularly used tool in your facility if it’s not fundamentally repeatable.

Some ways to create a repeatable RCA process include:

Tip: A few ways to develop a culture of continuous improvement include utilizing preventive maintenance practices, regularly checking for sources of waste, and tracking data.

3. Facilitate Incident Reporting

In order to analyze incidents, you first need to be aware of them. Logging asset data can help with that, but it’s absolutely vital for your employees to feel free to report incidents or problems when they occur.

As such, incident reporting in your facility should be fear-free and open to everyone. One way to accomplish this is to make your incident reporting process anonymous. Employees can fill out a form without having their own name attached to it, which helps eliminate the anxiety that’s often associated with reporting an equipment breakdown, fault, or accident.

4. Prioritize Causes

RCA is most effective when you’re able to prioritize causes. Rather than spreading your time and efforts across numerous potential causes, you’re able to focus on resolving the issues that have the most impact (and the greatest cost).

As mentioned above, FMEA and Pareto diagrams can help your team prioritize the right causes. After figuring out a number of potential causes, it’s often worthwhile to analyze the potential impact of each one to see where you can make the greatest difference.

5. Take Your Time

It’s important not to rush the RCA process. While you don’t want to delay it or spend too much time analyzing the issue—resulting in “analysis paralysis”—neither do you want to rush to a superficial conclusion of what caused your problem.

Make sure you’ve assessed as many probable causes as are reasonable to consider and have gotten to the true underlying issues in your facility before creating a plan of action. Remember, it’s often important to try to find multiple potential causes rather than stopping after the first since most complex problems have multiple contributing factors.

6. Get a Qualified Team Together

RCA is best done as a collaborative effort. After all, there may be multiple issues at play, and it’s important to have a variety of skillsets and expertise at the table. Potential qualified team members include:

  • Maintenance professionals
  • Operators
  • Reliability engineers

In addition, you’ll want someone who has enough authority to help the team overcome organizational roadblocks in the investigation process.

Finally, at least one person you select for your RCA team should have solid investigation skills. They should be the sort of person who's naturally diligent and impartial with a keen eye for detail.

7. Be Clear on the Problem

Even with a repeatable process and a solid team, RCA will still get you nowhere if you’re unclear on the actual problems you’re discussing. Before beginning your discussions, you’ll need to pinpoint exactly what the problem is and how it shows itself in your processes.

Without that, one of two things might happen:

  1. Your team finds a solution to a problem you don’t actually have, or
  2. Each member of the team has their own mental concept of the issue, turning the discussion into an unproductive argument.

Neither result will help you solve the actual issue, so make sure everyone is clear on the problem before you begin your analysis.

8. Measure Your Results

Finally, it’s important to measure the results of your RCA process in order to gauge its success. If the same incident occurs again, that’s your cue to perform a more in-depth analysis or adjust your process. Over time, your RCA and other processes will be in a constant state of improvement.
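One simple way to check for recurrence is to search your incident log for the same asset and failure mode after the corrective action was put in place. The Python sketch below assumes incidents are logged with an asset, a failure mode, and a report date. The `Incident` type, the `recurrences` helper, and the log entries are hypothetical.

```python
from datetime import date
from typing import List, NamedTuple


class Incident(NamedTuple):
    asset: str
    failure_mode: str
    reported: date


def recurrences(incidents: List[Incident], asset: str, failure_mode: str,
                fix_date: date) -> List[Incident]:
    """Return incidents with the same asset and failure mode reported after the fix."""
    return [
        i for i in incidents
        if i.asset == asset
        and i.failure_mode == failure_mode
        and i.reported > fix_date
    ]


# Hypothetical incident log, invented for illustration.
log = [
    Incident("Mixer 1", "Bearing seizure", date(2023, 1, 10)),
    Incident("Mixer 1", "Bearing seizure", date(2023, 2, 2)),
    Incident("Mixer 1", "Seal leak", date(2023, 4, 18)),
    Incident("Mixer 1", "Bearing seizure", date(2023, 6, 5)),
]

repeats = recurrences(log, "Mixer 1", "Bearing seizure", fix_date=date(2023, 3, 1))
if repeats:
    print(f"{len(repeats)} recurrence(s) since the fix: revisit the analysis")
else:
    print("No recurrence since the fix")
```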

Tip: Be sure to avoid common RCA mistakes during the process.

Conclusion

Root cause analysis is a powerful process that enables an organization to identify the source of a problem. Performing RCA effectively can significantly improve a plant’s performance by leading to correct solutions that last.
