In this article, we've got an interview with safety professional Joe Estey to learn a whole bunch about incident investigations and looking for root causes. This article focuses on explaining what an incident investigation is, how to perform one, and some common errors people commit while performing one.
In addition to this article, we've also held an interview with Joe on the search for root causes while performing a root-cause analysis during the incident investigation process.
We're excited about and grateful for Joe's participation in this interview, and think you'll enjoy learning from his knowledge and insights about incident investigations.
Finally, we've also had a chat with Joe about performing pre-task pre-mortems, discussions just before performing the job that help to avoid incidents and reduce the need for incident investigations.
Let's get right to the interview about incident investigations.
OK, great. I started out as a utility operator involved in managing water, electrical, and radio-chemical systems, which has to do with the production of things like uranium and plutonium, and then I was upgraded for a time to a production operator, lead operator, and then I entered the world of management for the Westinghouse Electric Corporation. Part of my duties were to understand events that may have happened on my shift, or within our plant, and then troubleshoot, repair, and offer a diagnosis, and mainly I did that early in my career through trial by fire, so we would figure out, through the process of elimination, who was the likely culprit, who we could blame, and then move on to continue the operations, and so I didn't get a good rearing in that area.
That came when I was assigned by the Westinghouse Electric Corporation to be on the national team to look at pollution prevention, waste minimization, and industrial ecology opportunities within their divisions, so they took me out of the plant and they put me at a division level to go and work with teams who would look at the use of resources, the way they were processed, and the waste that was being generated. And I started learning investigation tools, analysis tools, and remediative and preventive methodologies to try to optimize the process by understanding why things were happening and trying to mitigate the consequences of the wrong things being done.
An incident investigation is an attempt to understand and learn from the unexpected. So an incident is defined as anything you didn't want to have happen in your process or anything that can be improved from the way things are currently being done. People routinely confuse that with consequences. So it is far better to be curious about the practices you are using to obtain an outcome and figure out where you might improve those than it is to wait for a consequence to draw your attention.
Sure. Somebody can be doing a job or executing a task in a certain way, and there may be nine different ways to do that. Some of those ways can lead to injury, even if they haven't happened yet, or damage a product, even if that hasn't happened yet, or be less productive than optimal. So if an organization has not defined a standardized process that reduces injury, improves quality, and optimizes performance, really those are incidents. If you are working outside of the best that you can be doing, you've got an incident.
Unfortunately, too many organizations continue working outside those optimum work conditions and they wait until the worst thing happens, and then they go back and fix why it happened at that time. So incident investigation really is being curious enough about your process to see how you might improve it.
Exactly. And primarily the reason is that you've already had the thing you didn't want to have happen happen, based now solely upon an outcome, when you could have been studying your process the entire time, to determine that outcome could have been prevented in the first place.
Number one, people have to be trained within an organization to recognize whether an incident has occurred or not. Otherwise, once again, they are limited to only following up on outcomes they didn't want. So if an organization detects that "hey, this is something that is outside of expectations" (a job that should have taken two hours took four, we got 10 widgets out when we should have had 30, or somebody narrowly missed getting injured), we still have something worth looking at.
So once an incident notification is made, the next step is how much effort are you going to throw at investigating it? It can be a person who is trained to investigate the incident, it can be a small work team, so how much are we willing to spend to try to keep this from happening again or escalating?
So number one, you've got notification, and number two, you determine how many resources you're going to throw at it. Now obviously, if it's a severe event, maybe you've got an environmental release, a significant safety issue, you've already got that answer. You know you've met the threshold that deems this incident worthy of investigating. And a lot of organizations I work with did not have at the time but now have in their possession what they call a threshold table. So they have defined for their organization and their people the events that constitute when an incident investigation is required, instead of just saying "let's wait until it gets as bad as it can be until we do it." So somewhere there's a table, they've had this thing happen, they go to the table, the table says if you've had a release or you've had property damage that costs this much, or you've had a first aid case but not an OSHA recordable, this is the level of effort that you need to use to study it. So the level of effort is defined by the threshold.
And then the next thing is there are a couple of real fun "don'ts" along with some "do's" in just the next few minutes following an incident.
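[Note: For readers who keep their threshold table in software, the lookup Joe describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the dollar figure, and the effort levels are all hypothetical and do not come from Joe or from any regulation. Each organization would define its own rows.]

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical effort levels, ordered from least to most effort.
EFFORT_LEVELS = [
    "log and review",
    "single trained investigator",
    "small work team",
    "full investigation team",
]


@dataclass
class Incident:
    # All fields are illustrative assumptions, not a standard schema.
    environmental_release: bool = False
    property_damage_usd: float = 0.0
    first_aid_case: bool = False
    osha_recordable: bool = False


# The threshold table: (condition, index into EFFORT_LEVELS).
# The 5,000 dollar figure is a made-up placeholder.
THRESHOLD_TABLE: List[Tuple[Callable[[Incident], bool], int]] = [
    (lambda i: i.first_aid_case and not i.osha_recordable, 1),
    (lambda i: i.property_damage_usd >= 5_000, 2),
    (lambda i: i.osha_recordable, 3),
    (lambda i: i.environmental_release, 3),
]


def required_effort(incident: Incident) -> str:
    """Return the highest effort level triggered by any matching row."""
    level = 0  # anything reported still gets logged and reviewed
    for condition, row_level in THRESHOLD_TABLE:
        if condition(incident):
            level = max(level, row_level)
    return EFFORT_LEVELS[level]


if __name__ == "__main__":
    print(required_effort(Incident(first_aid_case=True)))
    print(required_effort(Incident(property_damage_usd=12_000)))
```

[The point of the sketch is the one Joe makes: the level of effort is decided by the table ahead of time, not negotiated after the event, and an incident that matches no row is still recorded rather than ignored.]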
The first is to not have anyone who hasn't been properly trained conduct an interview, except to gather only the facts and not to read into those facts and intentions. Because that's what happens with an untrained interviewer. So normally, if you're in a processing plant or a construction site, the first person who responds is a supervisor or a working lead, and they haven't always been properly trained to perform an interview that is unbiased in its approach. They might walk up to an individual and ask "What were you thinking? Why didn't you pay attention when you were exceeding the safety limit? Why didn't you catch it?" There are all kinds of really poor questions that can be asked. What happens is you get one shot when an incident has happened to do a proper interview, you won't get two and three, because the minute the wrong question has been asked, the individual involved in the incident is going to put up defenses, they are going to try to recalculate what they should have done versus what they actually did, they're going to come up with a story that seems plausible, because they're going to try to make sense of the incident as well, and so bottom line is you won't get a mulligan if you fail to do a proper interview the first time.
So if an organization doesn't properly train the first-line folks who will do the interviews, the best they can do is teach them the SWIM Principle, which simply is to have them STOP, WARN others of the incident, ISOLATE the incident to make sure it's not going to escalate in some unreasonable or inappropriate fashion, and then simply MITIGATE what they can until a person who has been trained to do the interview can come in and do it.
Then once we've got the SWIM acronym in, then the next biggest thing is to preserve the scene. If there is a single biggest error following improper interviewing, it's not preserving the scene. So, somebody had an incident with a forklift because the masts were about half-staff high, the driver couldn't see around the load so they were using a spotter, the spotter was on the incorrect side of the load, so they couldn't see the other side, and the bottom line is they ran into a car, ran into a piece of equipment, or ran into the side of the building. And then, when the incident investigator shows up, all of it is neat and tidy, put away, swept up, forklift is back in the service bay, and now you've only got the interview to use as a guide to what happened, and trust me, the interviewee's story is not the best evidence ever.
So proper interviewing and scene preservation, those are the first two things you want to make sure you get right in the incident investigation process.
Oh, absolutely. For instance, let's say in a paper mill, that's a continuous process, and something at the business end of that machine, the dry end or where they are cutting up the jumbo roll or the winder, let's say someone unfortunately put their hand where it shouldn't have gone, and you don't want to shut the process down, because (and I hate to say this) it might cost a fortune to start up again, and you could end up with a real mess on your hands, and yet you have to treat the individual, you have to take him to triage, and people are going to keep working. You know, there's that mentality, it's something bad, it's a significant event, and yet we've got to keep working. And the more time that passes between the incident and the story that you've asked them to tell about it, the more perishable the information. So while they're running, while they're restoring activities, there is a very delicate balance, and there's a tension there because the longer we wait, I have seen a story change from a Friday, where an incident occurred mid-afternoon, and we were there two hours after the event to talk to people, and then by Monday, a lot of details about the story were let's say created over the weekend in the minds of the individual so it didn't even sound like the same story on Monday morning. The facts were not the same.
And there are a couple of big caveats there. Number one, the stories of the interviewees change, first of all if we make them feel like the object of the investigation, which is always improper, instead of making them feel like part of the process to fix the system, you're going to get the wrong result. So the axiom is never treat anyone involved in an incident as the object of an investigation, and always treat them as part of the process to fix it.
And the second one is, people will always try to make sense of an accident, when by definition, most accidents don't make sense. If they were using the procedure, and if the equipment worked the way it was supposed to, and if they were applying their training, you wouldn't have had the accident in the first place. So there's something nonsensical about every incident, and as some experts in that field state, the worst thing we can do is try to line things up so it makes logical sense. Because it shouldn't--you're going to find things that are going to surprise you.
That's a great point, and there are three ways of doing it: internally only, externally only, and a hybrid approach, where there's a mix of internal working with external. The internal folks can do a real good job because they understand their systems, they understand what should have happened, they understand the expectations, they know how to measure the outcome, and they have a history there.
The downside obviously is that they may be, as the old saying goes, too down in the forest to see the trees, and might be unwilling to change any of their practices, so with an internal-only kind of focus, it will be more about preserving the status quo with the idea being "we know we have a great system, have great procedures, and we have great people, this is a one-off or anomaly, let's look for all the possible anomalies and not consider anything about our system being in error." So that's the downside: "Why would I change my practice when we haven't had this problem in six years?" And yet, they've only been lucky in not experiencing a consequence for six years, they've had the problem every time they did the job, and in the past, the system compensated for it, it just didn't fester until it caught up to them. So that's the internal only.
Externally, the upside of that is you don't suffer from "The Boomerang Effect," and The Boomerang Effect is: if I find it, I have to fix it. So if internally, I'm the finder and the fixer, trust me, I'm not going to find the problems that are going to take me a lot of effort to fix. Now, if I'm external, I don't care about that. I'm going to walk in and I could care less about what this is going to cost you to fix, because this is your problem and here's how you have to change your process. So externally, I probably will be more unbiased.
However, there are external consultants who have a bias themselves, that they always or in many cases gear the investigation towards. Especially if they are proprietary in selling a product, like training, a checklist, an automated software package. You know, if I'm from a work-planning consultancy, and you're having a problem with work execution, and I have a process like Phoenix or Maximo or one of the other proprietary softwares, you bet that my solution is probably going to be using my system. And so that happens.
The hybrid is usually a good one, when you've got an (external) person who doesn't have anything proprietary that they're selling, whether it be continued training, or checklists, or safety observation programs, they don't own any of that. Or if they do, they don't see it as a prism to see the incident through. And then working as a team with the internal person, they can ask questions that the internal person wouldn't ask, and the internal person can provide the history that an external person wouldn't understand.
Almost always in hybrid. It's a rarity I'm alone as an external third party, unless there has been a contractor-driven process and then a state regulatory enforcement process, and they are in disagreement. Now what they're looking for is a third party to objectively look at all the evidence and then propose a report, and then they see it, and in that case I never look at either of their work products before I do my own investigation, because I don't want to be biased going in.
Ah, that's a great question, because that really is where everything starts. They should always communicate to their employees that the incident investigation process is part of a continuous improvement process, and that its purpose is to understand why something happened and the defenses or mitigations we were using at the time, which should have prevented it from happening, so that we can improve those defenses to keep it from happening in the future. And so we are in this together to try to understand, because nobody wants any of these incidents to occur at any level in the company. And their support is needed to stop when they recognize there might be an incident, even at the lowest level possible (a near miss or close call), to tell somebody about the incident in terms of the action, that this isn't a hunt for intentions, it is never about why we think you did it, all we want to understand is what happened at the time, and then to work with the employees, who are of paramount importance in understanding what corrective actions can be put in place so they're not more burdensome or unreasonable in order to get the work done.
Question: OK, good answer. And the last of the questions I had posed for our first blog post, and then we'll move on to the second, is, in your experience, what are some of the most common errors you see people commit when they're conducting an incident investigation?
Number one is improper interviewing--treating them like the object of the investigation instead of the process. Number two is asking, and this is a big one in interviewing, asking WHY? questions about their behavior instead of WHAT? and HOW? questions about their actions. And that's a big one.
OK. It's very common for people who have not been trained to ask something like "Well, why did you miss that step in the procedure?" or "Why didn't you put that machine guard in place?" Number one, if they knew why they did it, they wouldn't have done it, unless it was a deliberate violation, and if it was a deliberate violation, they're not likely to tell you the truth anyway. And most incidents are not caused by deliberate violations. The history of reporting tells us that less than 2% have to do with a deliberate, intentional violation, and the majority have to do with honest, unintentional mistakes.
But if you set the stage for me to believe I'm being investigated, rather than being part of the investigation process, I'm going to think you're looking at me as a culprit. And so, even though I know it was an honest mistake, and I didn't mean to do what I did, my answers to you are going to be cloaked like they would on a witness stand, and you'll never know. The NTSB, the National Transportation Safety Board, has a saying that if a pilot survives a crash, you may never know why it actually happened. And as tragic as that sounds, it's because the story that is being told may not match the physical evidence that is unfolding.
So that's an error. We don't ask WHY? questions. We wouldn't ask "Why did you miss a step in the procedure?" A better question would be, "Hey, how do you use the procedure in this process?" Or, better yet, can you just walk me through your process, and then you let them explain how they do those things.
And then the next mistake is not properly recording the information, so what happens is you have to come back two or three times to the same individual, and about the second or third time, there's what we call "The Proctoring Hazard," which means the individual begins to believe you're only there to test them like the proctor for an exam, as to what they told you the first time. You're not really there to get information, no matter how much you say that ("Hey, you know, I missed a few things," or "I didn't record that the first time"), they're thinking, "OK, you're trying to test me like a defense attorney or prosecutor on the stand, you're trying to see where I went the errant way," and so that's a common error, is not properly recording the facts at the time that you're gathering them.
You know, there is, and it has to do with your own bias. You know, the biggest danger for anyone in an incident investigation is if they have done what they are investigating as part of their vocation, it's very hard to separate your own story from the story you're supposed to be learning. And so when I was an operator, if I go back and have to work for a manufacturing plant or an operations plant, I continually can hear myself fill in the blank of the story that the individual is telling me, and I might bias the investigation with thoughts about what they should have been doing, what they could have been doing, instead of what they were doing. And that is a big one--the bias errors are a tough one.
And in the bias errors, there are a few: the confirmation bias, where I'm only going to look for evidence that supports what I already believe; and the availability bias, which is I only have available to me the things you share with me, so I will not know how to look at other things. So you know, I have available to me your work package, your procedure, your training records, but maybe there's something else I need to be looking at, and you have not made it available to me. And then the primacy bias, which is an interesting one, and that is, if early in the investigation process, I begin to hear the same thing about people or the organization, like "they always put safety above production" versus "they always put production above safety," the first few times I hear that, it begins to bias me the entire time to only hear those similar patterns or messages the rest of the time. And maybe somebody says something that should have caught my attention, but it didn't, because it doesn't match my pattern. So we call that the primacy bias.
We'd like to thank Joe Estey for his time, knowledge, and insights on explaining more about incident investigations, including what an incident investigation is, how and when to perform one, who to involve in performing an incident investigation, some common errors that occur during incident investigations, and more.
If you need some help with incident investigations at your work, you may be interested in our Incident Management Software, explained briefly in the video below.
If you need help training employees about incident investigations, you may find our online incident investigation training course helpful. You guessed it--that's a short sample video below.
Here's a little more about Joe.
As Principal Performance Improvement Specialist for Lucas Engineering and Management Solutions, Joe Estey mentors and trains executives, managers and front line workers from a variety of industries on Human Performance Improvement and Leadership. Clients include national research and development laboratories, manufacturing plants, construction and demolition sites and one of a kind/first of their kind production facilities. As the recipient of three National Awards from the White House Executive Leadership council for his work in public outreach and education, he frequently speaks to public agencies, corporate and small businesses across North America.
Serving as one of six members on the National Board of Directors for the Human Performance Root Cause Trending Organization (HPRCT.ORG), Joe works with principal investigators, managers and analysts from fields as varied as aviation, pharmaceutical, medical, manufacturing and power generation to implement best management practices for reducing the frequency and severity of human error.
His book, The Tomorrow Tapestry, Life Woven on the Fabric of Change, was one of Publish America’s top ten business books in 2008 and has been used in leadership and organizational development courses throughout North America.