Traditional safety approaches in aviation used to start with smoking holes in the ground. Once we had an accident, we analyzed it and developed corrective actions to prevent such an accident from recurring. The problem with this approach was that it required an accident before we could recognize a problem to solve. Aviation has evolved into an industry that constantly seeks better ways to solve safety problems without waiting for the smoking hole.
SMS, or Safety Management Systems, is the buzzword on everyone’s lips; but what is SMS all about, anyway?
People have lauded SMS as the next great safety system. If it is implemented correctly then it has the potential to create a safer environment than traditional quality management systems because it is focused on proactive identification of hazards. This means that it doesn’t wait for a problem to find the solution … instead it identifies future potential issues and proactively mitigates them before they become real problems.
SMS is a system for managing a company’s compliance with the safety regulations of the relevant regulatory authorities. This makes it seem like a quality management system. While SMS does have some things in common with a quality management system, at its roots it is very different. It is important to understand this “root difference” when implementing SMS because misunderstanding it means that the implementer could miss many of the key benefits of SMS.
SMS, in aviation circles, is composed of four “pillars.” The four pillars are: 1. Safety Policy and Objectives; 2. Safety Risk Management; 3. Safety Assurance; and 4. Safety Promotion.
I like to focus on the second pillar: Safety Risk Management. Safety Risk Management has three elements: Hazard Identification, Safety Risk Assessment, and Safety Risk Mitigation.
The most important difference between SMS and other quality systems is hazard identification. It is normal in quality systems to identify occurrences, perform root cause analysis, and apply corrective action based on that analysis. SMS takes this one step further by using processes to identify all of the possible hazards that could arise, ideally before they become occurrences. This includes hazards that are adequately mitigated today. These hazard identification processes can be quite resource- and time-intensive, and they are often ongoing, as the company continues to supplement its database of possible hazards.
A reasonable method for a repair station to approach hazard identification – a method that can focus hazard identification on the processes most likely to yield hazards – is to begin with the processes contained in the repair station’s manuals, and then continue with the specific maintenance processes (usually from manuals) that are most often used in the facility. For example, in an engine shop this might be the overhaul manual for the engine that the facility most often handles. In each case, divide the processes into manageable chunks and analyze them. There are many ways to analyze the processes for hazards. One formal mechanism is a hazard and operability (HAZOP) study.
As you identify hazards, think about what can go wrong with each element of the process you are studying. If you are examining a cleaning step, then what happens if you skip the step? What happens if you use the wrong cleaning chemical? What happens if you apply too much or too little of the cleaning chemical? What happens if you apply the cleaning chemical using the wrong applicator (such as an abrasive applicator)? What could happen if the technicians are inadequately trained? These are the sorts of questions that a facilitator asks in a formal hazard study of cleaning processes.
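The questions above follow a pattern that can be generated mechanically from a process step. The sketch below is only an illustration of that pattern: the guide words are loosely modeled on HAZOP practice, and the short list, the question wording, and the function name prompts_for are all assumptions made for this example rather than a complete or authoritative set.

```python
# Guide words loosely modeled on HAZOP practice. This short list and the
# question templates are assumptions for illustration only.
GUIDE_WORDS = {
    "NO": "What happens if '{step}' is skipped entirely?",
    "MORE": "What happens if too much {parameter} is applied during '{step}'?",
    "LESS": "What happens if too little {parameter} is applied during '{step}'?",
    "OTHER THAN": "What happens if the wrong {parameter} is used during '{step}'?",
}


def prompts_for(step, parameters):
    """Return facilitator questions for one process step.

    One "NO" question is produced for the step itself, then one question
    per remaining guide word for each parameter of the step.
    """
    questions = [GUIDE_WORDS["NO"].format(step=step)]
    for parameter in parameters:
        for word in ("MORE", "LESS", "OTHER THAN"):
            questions.append(
                GUIDE_WORDS[word].format(step=step, parameter=parameter)
            )
    return questions


if __name__ == "__main__":
    for question in prompts_for("clean the part", ["cleaning chemical"]):
        print(question)
```

A generated list like this is only a starting point for the facilitator. Questions about training, applicators, and other context still have to come from people who know the process.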
As hazards are recognized, they should be documented in a centralized and comprehensive hazard log. The hazard log can be a database that is tied to mitigations (corrective actions). Among other benefits, such a log allows repeat hazards to be recognized and mitigated as a group. This allows a better assessment of the success of risk mitigation activities. The hazard log can also help to organize mitigations (including those already implemented) in order to build a successful safety assurance program (including auditing of the mitigations to ensure they are in place and working as expected).
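One way to picture such a log is as two linked tables, one for hazards and one for mitigations. The following is a minimal sketch under that assumption. The class names, field names, identifiers such as "H-1" and "M-1", and the sample entries are all hypothetical, and a real facility would more likely use a database than in-memory objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Mitigation:
    mitigation_id: str
    description: str
    implemented: bool = False


@dataclass
class Hazard:
    hazard_id: str
    description: str
    mitigation_ids: List[str] = field(default_factory=list)


class HazardLog:
    """Hypothetical in-memory hazard log that ties hazards to mitigations."""

    def __init__(self) -> None:
        self.hazards: Dict[str, Hazard] = {}
        self.mitigations: Dict[str, Mitigation] = {}

    def add_mitigation(self, mitigation: Mitigation) -> None:
        self.mitigations[mitigation.mitigation_id] = mitigation

    def add_hazard(self, hazard: Hazard) -> None:
        # Refuse links to mitigations that were never recorded in the log.
        missing = [m for m in hazard.mitigation_ids if m not in self.mitigations]
        if missing:
            raise KeyError(f"unknown mitigation(s): {missing}")
        self.hazards[hazard.hazard_id] = hazard

    def audit_candidates(self) -> List[str]:
        """Implemented mitigations that at least one hazard depends on."""
        linked = {m for h in self.hazards.values() for m in h.mitigation_ids}
        return sorted(m for m in linked if self.mitigations[m].implemented)

    def open_hazards(self) -> List[str]:
        """Hazards that have no implemented mitigation yet."""
        return sorted(
            h.hazard_id
            for h in self.hazards.values()
            if not any(self.mitigations[m].implemented for m in h.mitigation_ids)
        )


# Sample entries, invented for illustration.
log = HazardLog()
log.add_mitigation(Mitigation("M-1", "Chemical labels checked at issue", implemented=True))
log.add_mitigation(Mitigation("M-2", "Refresher training on cleaning steps"))
log.add_hazard(Hazard("H-1", "Wrong cleaning chemical used", ["M-1"]))
log.add_hazard(Hazard("H-2", "Abrasive applicator used", ["M-2"]))
log.add_hazard(Hazard("H-3", "Cleaning step skipped", ["M-1", "M-2"]))
```

In this sketch, audit_candidates() gives safety assurance a list of mitigations worth auditing, and open_hazards() lists hazards whose mitigations are still only planned.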
It is normal to use a taxonomy – a tree-like structure of classifications – to group hazards together. The taxonomy will have high-level categories (like “Maintenance Instructions,” “Human Factors,” etc.). Below these will be additional sub-categories. For example, a “Maintenance Performance” category might have a sub-category like “Tooling” and that might, in turn, have sub-sub-categories like “calibration,” “tool missing,” and “wrong tool.” A robust taxonomy allows similar hazards to be identified together in order to mitigate their risks together, and to look for trends. For example, a single tool that mysteriously does not have calibration records might lead you to send the tool out for calibration. A series of tools that do not have calibration records might lead you to look for a more systemic issue, like calibration records being removed by well-meaning cleaning staff!
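The difference between a single report and a trend can be sketched by storing each hazard report as a path through the taxonomy and counting reports at each level. This is a simplified illustration: the sample reports, the “Fatigue” and “shift length” nodes, the function names, and the threshold of three are assumptions chosen for the example, not recommended values.

```python
from collections import Counter

# Each report is classified by its path through the taxonomy:
# (category, sub-category, sub-sub-category). Sample data is invented.
reports = [
    ("Maintenance Performance", "Tooling", "calibration"),
    ("Maintenance Performance", "Tooling", "calibration"),
    ("Maintenance Performance", "Tooling", "calibration"),
    ("Maintenance Performance", "Tooling", "wrong tool"),
    ("Human Factors", "Fatigue", "shift length"),
]


def count_at_depth(paths, depth):
    """Count reports per taxonomy node at the given depth (1 = top level)."""
    return Counter(path[:depth] for path in paths)


def possible_trends(paths, depth, threshold):
    """Return taxonomy nodes with at least `threshold` reports."""
    counts = count_at_depth(paths, depth)
    return sorted(node for node, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    for node in possible_trends(reports, depth=3, threshold=3):
        print(" > ".join(node))
```

Counting at a shallower depth groups more reports together, which can surface a systemic issue that is invisible when each leaf category is viewed on its own.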
A robust taxonomy also makes it easier to track similar hazards for precursor data that might suggest the onset of contributing factors that could affect (or effect) the hazard. A set of hazards with a uniform cause (even if it is not the “root cause”) might be corrected through a single mitigation that targets that identified causal factor.
In other cases, grouping similar hazards together can help to recognize and document common corrections that already exist and that are already mitigating the risks posed by the hazards. For example, an inspection might be catching potential hazards and preventing those hazards. This could be identified as a mitigating action for each of those hazards. If this is the case, then the inspection needs to be identified in the hazard database as an important mitigation related to each of those hazards. Using the database to research a future decision to potentially eliminate the inspection should reveal that the inspection is an important mitigation related to a number of hazards; before eliminating the inspection, the facility will want to ensure that each of the hazards is adequately mitigated using other mechanisms. By creating our hazard database, we are taking the first step in creating a tool that will help to manage safety throughout the life cycle of the business.
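The inspection example amounts to a reverse lookup: given a mitigation, find every hazard that relies on it, and then find the hazards that would have no other mitigation if it were removed. A minimal sketch of that check follows. The hazard identifiers, mitigation names, and function names are hypothetical, and the mapping stands in for whatever hazard database the facility actually uses.

```python
# Maps each hazard to the set of mitigations currently credited against it.
# All identifiers and names here are invented for illustration.
hazard_mitigations = {
    "H-101": {"final inspection", "torque checklist"},
    "H-102": {"final inspection"},
    "H-103": {"final inspection"},
    "H-104": {"tool calibration program"},
}


def hazards_relying_on(mitigation, mapping):
    """Hazards that list the given mitigation at all."""
    return sorted(h for h, ms in mapping.items() if mitigation in ms)


def uncovered_if_removed(mitigation, mapping):
    """Hazards whose only recorded mitigation is the given one."""
    return sorted(h for h, ms in mapping.items() if ms == {mitigation})


if __name__ == "__main__":
    print("Relies on inspection:", hazards_relying_on("final inspection", hazard_mitigations))
    print("Uncovered if removed:", uncovered_if_removed("final inspection", hazard_mitigations))
```

In this sample data, three hazards rely on the inspection and two of them have nothing else recorded against them, so those two would need another mitigation before the inspection could be dropped.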