Root Cause Analysis During Incident Investigations
Posted September 04, 2020
The Occupational Safety and Health Administration (OSHA) and the Environmental Protection Agency (EPA) urge employers, owners and operators to conduct a root cause analysis following an incident or near miss at a facility. A root cause is a fundamental, underlying or system-related reason why an incident occurred that identifies one or more correctable system failures.
By conducting a root cause analysis and addressing root causes, an employer may be able to substantially or completely prevent the same or a similar incident from recurring.
OSHA Process Safety Management and EPA Risk Management Program Requirements
Employers covered by OSHA’s Process Safety Management (PSM) standard are required to investigate incidents that resulted in, or could reasonably have resulted in, catastrophic releases of highly hazardous chemicals. Similarly, owners or operators of facilities regulated under the EPA’s Risk Management Program (RMP) regulations must conduct incident investigations.
During an incident investigation, an employer must determine which factors contributed to the incident, and both OSHA and the EPA encourage employers to go beyond the minimum investigation required and conduct a root cause analysis. A root cause analysis allows an employer to discover the underlying or systemic—rather than the generalized or immediate—causes of an incident. Correcting only an immediate cause may eliminate the symptom of a problem, but not the problem itself.
How to Conduct a Root Cause Analysis
A successful root cause analysis identifies all root causes; there is often more than one.
Consider the following example: A worker slips on a puddle of oil on the floor and falls. A traditional investigation may find the cause to be “oil spilled on the floor,” with the remedy limited to cleaning up the spill and instructing the worker to be more careful. A root cause analysis would reveal that the oil on the floor was merely a symptom of a more basic or fundamental problem in the workplace. An employer conducting a root cause analysis to determine whether there are systemic reasons for an incident should ask the following questions:
- Why was the oil on the floor in the first place?
- Were there changes in conditions, processes or the work environment?
- What is the source of the oil?
- What tasks were underway when the oil was spilled?
- Why did the oil remain on the floor?
- How long had the oil been there?
- Was the spill reported?
It is important to consider all possible “what,” “why” and “how” questions to discover the root causes of an incident.
In this case, a root cause analysis might have revealed that the root cause of the spill was the lack of an effective mechanical integrity program to prevent or detect oil leaks. In contrast, an analysis that focused only on the immediate cause (a failure to clean up the spill) would not have prevented future incidents, because it would have left the workplace with no system to prevent, identify and correct leaks.
Properly framing and conducting a root cause investigation is important for a PSM- or RMP-related incident. Take, for example, an incident involving an overfill and subsequent leak of hydrocarbons from a relief valve system that ignites and kills multiple workers.
Prior to this fatal incident, there were multiple flammable releases from the relief valve system, but none ignited. The employer investigated those earlier, non-lethal incidents and determined that operator error was the cause of the overfill and subsequent leaks. However, a proper root cause investigation would have looked deeper and determined that funding cuts, which resulted in a deficient mechanical integrity program and malfunctioning instrumentation, led to a dangerous situation that operators could not have prevented. Had these root causes been identified earlier, the employer could have taken action to improve the mechanical integrity program and repair the instrumentation system, preventing the fatal incident.
Root Cause Analysis Tools
Below is a list of tools that employers may use to conduct a root cause analysis. No tool is meant to be used on its own. Ideally, employers will use a combination of tools, such as the following:
- Logic or event trees
- Timelines or sequence diagrams
- Causal factor determination
For simpler incidents, brainstorming and checklists may be sufficient to identify root causes. For more complicated incidents, logic or event trees should also be considered. Timelines, sequence diagrams and causal factor determination are often used to support the logic or event tree.
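To make the logic tree idea concrete, here is a minimal Python sketch that applies it to the oil spill example above. It is an illustration only, not an OSHA or EPA method. The `Cause` class, the `root_causes` function and the wording of the intermediate steps are assumptions made for the example. Each node records an event or condition together with whatever contributed to it, and the nodes with nothing beneath them are treated as candidate root causes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cause:
    """One node in a simple logic tree: an event or condition and what led to it."""
    description: str
    contributing: list[Cause] = field(default_factory=list)


def root_causes(node: Cause) -> list[str]:
    """Return the descriptions of all leaf nodes, treated here as candidate root causes."""
    if not node.contributing:
        return [node.description]
    found: list[str] = []
    for child in node.contributing:
        found.extend(root_causes(child))
    return found


# Illustrative tree for the oil spill example; the intermediate wording is assumed.
incident = Cause(
    "Worker slipped and fell",
    [
        Cause(
            "Oil was on the floor",
            [
                Cause(
                    "Equipment leaked oil",
                    [Cause("No effective mechanical integrity program to prevent or detect leaks")],
                ),
            ],
        ),
    ],
)

for cause in root_causes(incident):
    print(cause)
```

A real investigation would usually have several branches, because a single incident often has more than one root cause. Each additional "why" question that turns up a contributing condition becomes another child node.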
Regardless of the combination of tools chosen, employers should use these tools to answer four important questions:
- What happened?
- How did it happen?
- Why did it happen?
- What needs to be corrected?
Interviews and reviews of documents, such as maintenance logs, can be used to help answer these questions. Involving employees in the root cause investigative process—and sharing the results of those investigations—will also go a long way toward preventing future similar incidents.
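One simple way to keep the four questions in view is to record an answer to each and check which remain open. The sketch below is a hypothetical illustration in Python. The `InvestigationRecord` class, its field names and the sample answers are assumptions made for the example and do not come from any OSHA or EPA form.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class InvestigationRecord:
    """Hypothetical record of an investigation's answers to the four questions."""
    what_happened: str = ""
    how_it_happened: str = ""
    why_it_happened: str = ""
    what_needs_correcting: str = ""


def unanswered(record: InvestigationRecord) -> list[str]:
    """List the questions that still have no answer recorded."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Sample answers are illustrative, based loosely on the oil spill example.
record = InvestigationRecord(
    what_happened="A worker slipped on oil and fell.",
    how_it_happened="Oil had leaked onto the floor.",
)
print(unanswered(record))
```

An investigation that stops at the first two answers has described the immediate cause only. The open "why" and "what needs to be corrected" entries are where the root cause analysis takes place.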