I have had the pleasure of working in a number of unique engineering environments and for the most part they all performed Failure Modes and Effects Analysis (FMEA). Whether it was a Systems FMEA (SFMEA), Design FMEA (DFMEA), Process FMEA (PFMEA), or Failure Modes, Effects, and Criticality Analysis (FMECA), they executed and delivered them in good faith to reduce risk. The question is, did it bring them value? I will take you through several examples of how they were used and how they brought value to those organizations. Then, I will present how I perform an FMEA, the thinking behind that process, and an example of how it brought value.
As I walk you through this, there are a couple of points about FMEAs that are central to my thinking and that I must state up front. First, the goal is to reduce the risk to the end user. Second, the effects that we tie to a severity relate to what happens when the failure occurs while the product is in operation. Important considerations here are: Who are the users? What are the operations? What might they interact with?
At one of my employers, I was surprised by some of the items identified as functions in the DFMEA. Examples of these were: the product shall be white, the product shall fit in the vehicle, and the product shall be RoHS compliant. These do not represent functions. I had been taught that the functions in an FMEA should represent the actions the component under analysis was to perform when in operation by the end user. While none of the items identified suggests functionality, they certainly were constraints on the product design. When I asked the team about managing these in a requirements management tool rather than the FMEA, they did not know what I meant by a requirements management tool. It was clear then that the FMEA was their requirements management tool.
I was also shocked by how they had rated the severity of the effects (7, 9, and 9, respectively). I could not see how these severity values were even remotely possible in operation. It appeared obvious to me then that they were identifying risk related to delivery of the requirements. A severity of 9 for failing to fit in the vehicle would certainly be appropriate, since that would be a disastrous failure for product development. While it may not have been how I would use an FMEA, the value for this team was minimizing risk to delivering on requirements.
At another company, I noticed that the failure modes were not very descriptive. They were as simple as “failed open,” “failed short to ground,” and “failed short to power.” While these failure modes were consistent with the way that individual components could fail in the design, you could not directly see the functionality of the component and how it would be affected. You had to have an in-depth knowledge of the system to identify what effect the failure would have. Someone else had to go and fill in the information about the effect and severity in order to determine the risk. This always seemed to get done at the last minute before product release.
I realized that while there were no severity numbers in the FMEA, there were always detection actions in place and tied to verification and validation activities. While this may not have identified the risk to the field, it did identify whether the component failure mode was able to be detected during testing. For this employer, the value was in ensuring that there was a test to catch every possible component failure mode.
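As a rough illustration of that kind of coverage check, here is a minimal Python sketch. It is not that employer's tooling. The three failure modes are the ones quoted above, while the component designators, the test IDs, and the `Component` and `undetected_modes` names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Failure modes quoted in the text above.
FAILURE_MODES = ("failed open", "failed short to ground", "failed short to power")


@dataclass
class Component:
    name: str  # hypothetical reference designator
    # Maps a failure mode to the verification/validation tests said to detect it.
    detection: Dict[str, List[str]] = field(default_factory=dict)


def undetected_modes(components: List[Component]) -> List[Tuple[str, str]]:
    """Return (component, failure mode) pairs that have no detecting test."""
    gaps = []
    for comp in components:
        for mode in FAILURE_MODES:
            # A missing entry and an empty test list both count as a gap.
            if not comp.detection.get(mode):
                gaps.append((comp.name, mode))
    return gaps


# Hypothetical data: component names and test IDs are made up.
components = [
    Component("R12", {
        "failed open": ["TP-014"],
        "failed short to ground": ["TP-014"],
        "failed short to power": ["TP-020"],
    }),
    Component("Q3", {
        "failed open": ["TP-031"],
        "failed short to ground": [],
    }),
]

gaps = undetected_modes(components)
for name, mode in gaps:
    print(f"No detection test for {name}: {mode}")
```

Note that a check like this says nothing about severity or field risk. It only answers whether each component failure mode has a test that would catch it, which is the value this team was getting from the FMEA.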
My last example is from an exercise that I was pulled into. I was asked to review a PFMEA on a process that was measuring the weight of the product to determine if the product was acceptable to continue the manufacturing process. I jumped right in and started asking questions about what happens in the field if the product weighed too much or too little. I found myself with people looking at me strangely. They did not understand why I was concerned about how it failed in the field.
When I started to look at the details of the PFMEA, there were no effects that had been tied to the product in use. The effects were all related to business impacts or overall plant objectives. The one that surprised me most was that the process had rejected a good product. This was rated with a severity of 9. While it did not make sense to me at first, it was a way of managing the risk of manufacturing cost. A good product that did not ship has no risk of issue in the field because it never got to the field. But it can have a huge impact on the business value of the product. The value for this team was to avoid cost associated with rejecting good parts.
The Potential Value
While these examples all show that you can apply an FMEA for risk management in many ways, I find that the greatest value in utilizing it is as a systems engineering and/or systems thinking exercise. I might be biased here as a systems engineer, but maybe that’s because I am trying to use the tool to help manage the risk of the unknown.
The first step in an FMEA, for me, is to identify the functions of the system/subsystem/component. A methodology for doing this is to establish a black box view of the system with the inputs and outputs of the item clearly identified. In addition to this, the external entities to which the inputs and outputs are connected need to be identified. Focusing on the outputs when thinking of functions is essential, as these are the things that impact what happens outside of the system.
Once the functions have been identified, questions about the behavior of the system should be asked. What does normal operation look like? What does abnormal operation potentially look like? It is the potentially abnormal behavior that needs to be the focus, because it presents risk to the end user. The more potential abnormal conditions considered, the more coverage for preventing risks. For systems engineers, it is fundamental to the job to identify and prevent negative emergence (failure modes).
Once the failure modes have been identified, an understanding of the impact needs to be established. How are others reacting to the abnormal behavior? These are the effects. Assigning the effects with a severity allows prioritization of further analysis.
The design then needs to be evaluated taking into account the most severe effects. Looking at the design, the question becomes: In what way could the design fail and cause that behavior (failure mode)? The process intends for us to look at the design as a whole and identify the potential causes. Too often, people look at the design and say the component can fail in this way, and that is then associated with the failure mode and effect. You might be wondering if it matters. The answer is yes.
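The flow described above (functions from the black box outputs, then failure modes, then effects with severities, then causes found by looking at the design as a whole) can be sketched as a small data model. This is only a minimal sketch under stated assumptions: the class names, the 1 to 10 severity scale, and every value in the example data are illustrative and do not come from a real FMEA.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Effect:
    description: str  # how something outside the system reacts
    severity: int     # assumed scale: 1 (negligible) to 10 (most severe)


@dataclass
class FailureMode:
    description: str  # abnormal behavior of a function
    effects: List[Effect] = field(default_factory=list)
    # Causes are attached to the failure mode, not derived from a parts list.
    causes: List[str] = field(default_factory=list)

    def max_severity(self) -> int:
        return max((e.severity for e in self.effects), default=0)


@dataclass
class Function:
    output: str       # black box output the function drives
    description: str
    failure_modes: List[FailureMode] = field(default_factory=list)


def prioritized(functions: List[Function]) -> List[Tuple[int, str, FailureMode]]:
    """List failure modes, most severe effect first, to focus cause analysis."""
    rows = [
        (fm.max_severity(), fn.description, fm)
        for fn in functions
        for fm in fn.failure_modes
    ]
    return sorted(rows, key=lambda row: row[0], reverse=True)


# Illustrative data only; severities and wording are made up.
functions = [
    Function(
        output="coil current",
        description="regulate current to the commanded value",
        failure_modes=[
            FailureMode(
                "no current delivered",
                effects=[Effect("actuator does not respond", 6)],
                causes=["driver component failed open"],
            ),
            FailureMode(
                "erratic current control",
                effects=[Effect("actuator moves unpredictably", 8)],
                causes=["component failure", "PCB design"],
            ),
        ],
    )
]

ranked = prioritized(functions)
for severity, function_name, fm in ranked:
    print(severity, function_name, "->", fm.description, "| causes:", fm.causes)
```

The design choice worth noticing is the direction of the links. A cause hangs off a failure mode of a function, so a design-level cause such as the PCB has a place to live even though it is not a component failure.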
In the development of DFMEAs for electronics, I observed that a typical methodology was to list the components as the functions and define the failure modes of each component. The evaluation of the component failure modes led to a limited set of effects. Because electronic circuits can have very dynamic behaviors, pure component failures cannot tell the whole story.
The following example is one I took from my involvement in a project on a closed loop current control circuit.
I led a team of electrical engineers through an FMEA on a hydraulic control module and used the process described above. While we still identified the typical component causes of failure, we added a number of new causes, failure modes, and effects not previously considered. A cause that had not been identified in previous FMEAs was a failure in the design of the printed circuit board (PCB). Starting from the failure mode of erratic current control and asking what in the design could cause it led us to look at the PCB, which we might otherwise have overlooked.
Unfortunately, we had a failure in verification. Because of the work on the FMEA, we were able to find the cause within an hour of notification of the failure. When we reported the issue to the customer that same day and demonstrated that we had identified the cause, they accused us of hiding the failure from them until we understood the cause. Fortunately for us, date and time stamps of test data cleared us.
When we reported the true root cause of the failure, it turned out to be an engineering process issue. We had failed to follow up on our preventive action and verify the review of this cause in the PCB. While this cause had already been identified in our standard PCB review checklist, it still wasn’t caught. A specific review for this cause was needed.
Now it may not seem like a lot of value because there was a failure in verification, but if we had not been able to identify the cause as quickly, how would we have appeared? We had clearly demonstrated that a good FMEA, with good functional connection to failure modes and proper cause analysis, leads to rapid problem solving.
While this is all well and good, it is not to say that my use is any better than any other use. It is more important that we understand FMEA with the end in mind. The value it will bring to your organization depends on the goal and objective for using it. Will it bring you value? Or is it just a Dumb Form Making Engineers Angry (DFMEA)?
For more on FMEA, check out this recorded webinar: https://www.vitechcorp.com/failure-modes-effects-analysis-2/