Tag: HRO

  • What it means to tailor a system

    A tailor’s shop, somewhere downtown. A back room, two mirrors, a table covered with bolts of fabric, chalk and pins on a board. A man stands on a low platform, in a rough cut of light wool, and the tailor walks around him. He doesn’t just take measurements. He observes. He sees how the customer shifts his weight, how he holds his shoulders, whether the seam at the back pulls to one side. A chalk mark where it doesn’t yet sit right. Then the measuring tape again, then a stitch, then a fitting. Adjust. Try again.

    What’s happening here isn’t fitting a suit. It’s a conversation between fabric, body, and habit. The tailor knows that no human stands exactly the way the pattern assumes. He knows the seams meant to sit centred will shift the moment the person moves. He plans for it. He builds in reserve at points where he knows the fabric needs room to settle. He isn’t surprised when the customer has to come back twice more. That’s his craft.

    What does this workshop have to do with safety? That’s the question this magazine owes its name to. I made the case in the opening article Three Assumptions We Need to Leave Behind; in short: safety doesn’t arise when people adapt to systems, but when systems are designed so they can be adapted to people. Tailoring Safer Systems. Measure, draft, fit, wear, adjust. The cycle repeats, only with different material. And just as in tailoring, it isn’t a one-off act but a stance.

    What this stance asks of us, in concept and tool, I want to lay out here. Three principles, each with a term you’ll recognise from the literature.

    Measure, don’t assume

    The tailor who doesn’t put down the measuring tape knows something that many safety departments treat as needless effort: that reality isn’t in the pattern.

    Steven Shorrock and Claire Williams, in Human Factors and Ergonomics in Practice, frame the distinction that’s been at the centre of the human-factors tradition since Hollnagel so simply that it works as a test. Work-as-Imagined is the picture designers, auditors, and executives have of how work gets done. Work-as-Done is what people actually do. Between them there’s regularly a gap. The question isn’t whether the gap exists. It always does. The question is whether the organisation knows it.

    Whoever doesn’t know it tailors into the assumption. They design procedures on the basis of what their model says. And the model says what’s convenient, what’s legible by audit standards, what sounds executive-ready. The procedure fits the assumption, not the practice. Within a short time, practice and procedure drift apart without anyone noticing, because no one ever measured how the fabric actually hangs.

    What measuring actually means isn’t spectacular. It means observing. It means walking the floor: what the Lean tradition calls a Gemba Walk, and what the safety world circulates under terms like “operational learning visit”. It means shadowing across more than one shift. It means asking questions open enough that they don’t contain the answer: not “Do you stick to the procedure?”, but “When was the last time the procedure didn’t fit your situation, and what did you do instead?”

    These questions regularly produce answers no one wants to hear. People describe workarounds that look like violations to compliance but, from inside the system, look like the only way through on a day when a tool is missing, a stand-in is new, or the plant has been moody since the update. The temptation is to read these answers as defect. And to close the case there. The work is to read them as finding.

    Whoever measures accepts what they see. What they see is regularly not what’s in the pattern. That’s exactly why they’re there.

    Measuring isn’t compliance on trial. It’s the willingness to see something that contradicts your own assumption.

    Respect the fabric

    Not every fabric can be tailored any way you like. Treat a soft knit like a firm wool and the seam won’t hold. The tailor knows the material’s properties before drafting the cut, and adapts the design to the fabric, not the other way around.

    Transposed to organisations: context, culture, and history are the material with which a system is tailored. What works in an airline where Crew Resource Management has been embedded practice for decades doesn’t translate directly to an industrial organisation where hierarchies are lived differently and “Stop the Line” still has to be explained as a concept. What takes hold on a ward where the unit lead has built a reporting culture over years comes to nothing on another ward, where every report passes through two layers of HR before anyone gets to see it.

    David Snowden’s Cynefin framework helps at this point. Simply put, it distinguishes between two kinds of problems: complicated and complex. Complicated problems are those where the link between cause and effect can be made visible with enough expertise: a machine, an accounting system, a construction plan. Best practices work here. Complex problems are those where cause and effect are only readable in hindsight, because the system shifts on every intervention. Culture, risk behaviour, learning capacity belong in this category. Best practices don’t work here. What worked in one organisation isn’t guaranteed to work in the next.

    The most common mistake in safety programmes I work with is mixing these two up. A proven concept from a best-practice collection gets sold as a universal solution, draped over an organisation made of different fabric. And everyone’s surprised when the seam doesn’t hold. What the organisation needed wasn’t the solution. It was the diagnosis: what kind of fabric is in front of us?

    Respecting the fabric doesn’t mean finding everything fine as it is. It means checking the cut against the material before reaching for the scissors. Whoever skips that builds a safety programme that fits the quarterly report, not the practice.

    Build adjustment in

    A good cut has give. The tailor doesn’t pull the fabric so tight that it tears at the first breath. He knows the body changes, the day changes, the fabric settles after the first few wearings. He builds that in. Where he leaves room, where he doesn’t, is craft. Eliminate the give, bind yourself to the exact measurement, and you get a garment that fits exactly once. Not the next moment.

    Erik Hollnagel’s work has circled this insight in safety language for years. In FRAM (the Functional Resonance Analysis Method), he argues against linear incident models that read variation as defect. Variation, Hollnagel writes, isn’t the opposite of function. It’s a condition of function. Complex socio-technical systems work because their components (people, tools, procedures) are flexible enough to respond to conditions that aren’t in the plan. When the plan tries to switch off this variation, it switches off adaptive capacity at the same time.

    In practice this means: a good procedure describes not only the intended path, but makes visible the conditions under which it holds. It knows the assumptions it makes, and it knows the places where it will break if those assumptions fail. A good procedure is aware of its limits. More than that: a good system keeps resources free that aren’t tied to the plan (slack in the staffing plan, time in the shift, room in the communication), because without these no adjustment is possible. What looks like inefficiency is the precondition for the system to make it through the day on which reality departs from the plan. And it departs. Every day.
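    One way to make this concrete is to write a procedure’s assumptions down next to its steps, so that a changed situation shows where the procedure stops holding. The following is only a sketch under stated assumptions: the `Procedure` and `Assumption` types, the example procedure, and the situation keys are all hypothetical illustrations, not taken from Hollnagel or from any real rule-book.

```python
from dataclasses import dataclass, field


@dataclass
class Assumption:
    """A condition the procedure silently relies on (hypothetical model)."""
    description: str
    key: str  # name under which the condition is reported in a situation


@dataclass
class Procedure:
    """A procedure that carries its own assumptions alongside its steps."""
    name: str
    steps: list[str]
    assumptions: list[Assumption] = field(default_factory=list)

    def broken_assumptions(self, situation: dict[str, bool]) -> list[str]:
        # An assumption that is not reported at all is treated as not holding:
        # unknown is a reason to look, not a reason to proceed.
        return [
            a.description
            for a in self.assumptions
            if not situation.get(a.key, False)
        ]


# Hypothetical example procedure and shift situation.
valve_service = Procedure(
    name="Valve service",
    steps=["Isolate the line", "Replace the seal", "Torque to specification"],
    assumptions=[
        Assumption("Two qualified operators are on shift", "two_operators"),
        Assumption("A calibrated torque tool is available", "torque_tool"),
    ],
)

today = {"two_operators": True, "torque_tool": False}
gaps = valve_service.broken_assumptions(today)

for gap in gaps:
    print(f"{valve_service.name}: assumption does not hold today: {gap}")
```

    A check like this settles nothing on its own. Its only point is that the gap between plan and day becomes visible before the work starts, rather than after the workaround.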

    Building adjustment in means giving the system permission to adjust. Not afterwards, in the case of damage, but beforehand, in the design. It means shaping room deliberately rather than tolerating it by default. And it means making visible what otherwise stays hidden: that the workarounds no one admits to are often the last adjustments an over-standardised system still allows.

    What tailoring isn’t

    Off-the-rack sits in the warehouse waiting for someone it fits. It’s efficient, it’s cheap, it’s clean in the reporting. It’s a complete solution as long as the measurement is right. When it isn’t, it becomes the source of a quiet compromise: the person adapts to the suit, holds their shoulders differently, breathes shallower, moves as if they belonged in the pattern. For a while, this goes well.

    Safety from the compliance catalogue works on exactly this logic. It comes with finished procedures, standardised KPIs, audit templates that fit everything because they look at nothing. The problem isn’t that it’s structured. The problem is that it takes its own description of the system for the system. When reality departs from it (and it does), no adjustment is provided for in the catalogue. What remains is the admonition to please stick to the procedure.

    In contrast stands the tailor who doesn’t put down the measuring tape. Who knows he’ll have to come back twice. Who respects the give in the fabric. Who doesn’t finish the cut today, but draws it in conversation with what’s in front of him. Who accepts that the end product isn’t perfect on the first attempt, and that adjustment is part of the craft, not an admission of error.

    This is what Tailoring Safer Systems means. Shape room rather than eliminate it. Make adjustment visible so the system can learn from it. This is harder work than a dense catalogue. It’s also the only thing that works under conditions where the next measurement is already a different one.

    Sources

    • Steven Shorrock & Claire Williams (Eds.) – Human Factors and Ergonomics in Practice: Improving System Performance and Human Well-Being in the Real World, CRC Press 2017
    • Erik Hollnagel – FRAM: The Functional Resonance Analysis Method – Modelling Complex Socio-technical Systems, Ashgate 2012
    • David J. Snowden & Mary E. Boone – A Leader’s Framework for Decision Making, Harvard Business Review, November 2007
    • Erik Hollnagel – Safety-II in Practice, Routledge 2018
  • Three Assumptions We Need to Leave Behind

    It is the night of 28 March 1979, shortly after four in the morning. In the control room at Three Mile Island, Unit 2, a light is on: pressure relief valve closed. The light says that because it doesn’t measure position. It displays the control signal, the command that was sent to the valve to close. What the valve is actually doing, nobody in the room knows. It has been open for two minutes and thirteen seconds, and it will stay open for the next two hours.

    In the hours that follow, the operators will do something that the later investigation will identify as the primary cause of the partial meltdown: they throttle back the emergency cooling. They do it because their instruments tell them the system is over-pressurised, and because their training has taught them to avoid exactly that condition. They act rationally given what they see. In the days that follow, the press will speak of “human error.”

    This reflex (the diagnosis of “human error” that follows a scene like this almost automatically) sits behind most of the safety conversations I have in consulting practice. Not because those involved are unwise. But because three assumptions are so deeply embedded in our safety tradition that they pass as common sense. We read them differently. What follows are three counter-positions, one per assumption.

    “Human error” is a diagnosis, not a finding

    Anyone working in this field knows the statistic: 80 to 90 percent of all incidents are attributed to “human error.” The number has been cited since the 1980s in talks, audits, executive reports, and it works: it makes plausible that the answer to safety problems must lie with people. More training, clearer standards, stricter discipline. The logic is clean: if the problem sits in the cockpit, the solution must sit in the cockpit too.

    The problem with this logic isn’t the statistic. It’s the interpretation. Sidney Dekker puts it in his Field Guide so sharply it hurts: “human error” is never the end of an investigation, it is the beginning. Whoever explains incidents this way has stopped asking: they have found a label and settled into it. Local rationality, the concept Dekker keeps sharpening, says: nobody comes to work intending to take a reactor into meltdown, harm a patient, or bring an aircraft down. What looks like failure from the bird’s-eye view of an investigation made sense at the moment of action, given what the person could see, given the pressure, given the training.

    Reconstructing that sense is the actual work.

    Hollnagel adds a second thread. His Safety-II argument runs, simplified: the same thing we call “failure” is the other side of an adaptive capacity without which the system wouldn’t function for an hour. People accomplish daily what procedures cannot accomplish on their own: they interpret context, they improvise when reality diverges from what the script assumes (which it does constantly), they fill the gaps that designers and rule-books have left open. Whoever treats people as a weak point cuts themselves off from the only real source of resilience the system has.

    Back in the TMI control room, read through this lens: the operators throttle the emergency cooling because their instruments say the system is over-pressurised, and because their training has sensitised them to exactly that risk. At the moment of action, their decision is the only coherent interpretation of the data available to them. That we know today the valve was open and the system under-pressurised rather than over-pressurised is information from the investigation, not information the operators had. This asymmetry between investigator and actor, “hindsight bias” in the research vocabulary, is not a cosmetic flaw of method. It is the structural condition under which every incident investigation operates. Whoever doesn’t reflect on it sees in every past event what those involved could have done. And overlooks what they could actually see.

    In training sessions, I now routinely ask participants: what is the most frequent cause of incidents and accidents in your operation? The answer comes every time, without exception: human error. It comes fast, it comes self-evidently, and it comes before the actual work of the training has begun. Over the hours that follow, there is regularly a moment when something dawns on the participants. It isn’t a new term or an additional tool, but a shift of perspective: their own incident investigations, as they themselves recognise, have ended exactly where they should have begun. What that costs isn’t only a weaker investigation. It is the willingness of employees to report anything at all next time.

    The question that interests us more than “How do we prevent human errors?” is this: How does our system support the adaptive work people have to do for it to function at all?

    Human error is never an explanation. It is a diagnosis that says more about those diagnosing than about the incident.

    Compliance is a minimum, not safety

    The second assumption follows the first like a shadow. If people are the risk, then regulations, audits, and certifications are the instruments of control. Safety becomes a question of whether the right boxes are ticked. Executive teams read safety KPIs (lost-time injury rate, audit findings, training completion rates) and draw conclusions about the state of the organisation. The governance is clear, the reporting is clean, the responsibility is distributed. There is a reason this model survives so robustly: it interfaces well with law, insurance, and corporate reporting.

    The model has just one problem: compliance and safety regularly come apart. Boeing’s 737 MAX held FAA certification, a compliance status that was green by every auditable measure. And an MCAS system whose malfunction cost 346 people their lives. The Bristol Heart Scandal of the 1990s revealed a hospital whose internal safety indicators showed no clear anomalies, while paediatric cardiac surgery mortality had climbed to twice the British average. In both cases the signals were reported, by insiders no one wanted to listen to, because the compliance picture was clean.

    What happens between the audits is the actual safety story. Diane Vaughan, in her study of the Challenger disaster, coined a term for it: “normalisation of deviance.” Drift rarely arises as deliberate rule-breaking. It arises because, under real conditions, the system gradually departs from the norm (a small tolerance here, a step shortened in time there) and because these deviations mostly turn out fine. Every repetition without consequence widens the bandwidth of the acceptable, without anyone ever having made a conscious decision. From the audit perspective, this drift is invisible: on audit day the picture aligns again, because everyone knows what to show. From the perspective of learning capacity, it would be visible, if the organisation had the mechanisms to see it.

    What these cases share is not a compliance failure. It is a learning failure. Compliance is a property of a moment: it says that at time X rule Y was being followed. Safety is a property of a process: it says that the organisation is able to pick up weak signals, revise assumptions, and correct its own behaviour, before the next audit date enters the stage. The one is a state, the other is a capability. An organisation can be fully compliant at any given moment and at the same time completely blind to the drift it is in.

    The operational question that follows from this is not “Are we compliant?” It is: Do weaknesses become visible without being punished? Are near-misses treated as learning opportunities, or as reputational risks? Does the system get smarter after every incident, or just more defensive? Just culture, in the precise sense of Reason and Dekker, is the precondition. It is not the poster in the break room.

    It is the lived answer to what happens when someone admits something they could have kept quiet about.

    Standardisation creates brittleness, not resilience

    The third assumption is the most stubborn, because it speaks most directly to the safety reflex. When something goes wrong, we raise the level of standardisation. We write the next step into the SOP, we narrow the latitude, we formalise what used to be a matter of experience. The underlying assumption is clean and mechanical: variation is defect, uniformity is safety. What does not deviate cannot go wrong.

    The assumption holds for simple, linear systems. It does not hold for the systems we deal with in HRO-adjacent contexts. Erik Hollnagel uses a precise word for the consequence of this reflex: brittleness. An over-standardised system loses the capacity to adapt to conditions its designers did not anticipate. It functions exactly as long as reality follows the script. And reality never follows the script all the way. The moment deviation arrives, the system has no reserve, no improvisational capacity, no repertoire other than “continue as planned.”

    What the HOP movement around Todd Conklin and others has been showing since the 2010s is banal and consequential at once: every functioning shift deviates from the script daily. Nurses combine orders that formally were not designed to be combined, because the original procedure does not fit the specific situation. Industrial operators put in small workarounds because a tool is missing or a step under time pressure has to be skipped. Pilots interpret checklists in an order that fits the situation. These deviations are not the problem. They are the safety. They are what carries the system through the day at all.

    Behind this stands a deeper insight from the resilience-engineering tradition: safety is not the absence of variation, but the capacity to absorb it. David Woods calls this “graceful extensibility”: the question of how far a system can be stretched before it breaks, and how it behaves while being stretched. Over-standardisation optimises for the normal case and ignores exactly this question. It makes the system efficient under ideal conditions and prone to brittle failure under real ones.

    What tailoring means is exactly this: shaping the latitude rather than eliminating it. Setting guardrails (the limits beyond which it becomes dangerous) and, within those guardrails, allowing adaptability, making it visible, keeping it learnable. This is more demanding than a thick rule-book, because it requires trust, conversation, and contextual knowledge. It is also the only thing that works under conditions where variation cannot be eliminated. Pilots who set the manual aside can be heroes or culprits. What they are depends on the system, not on themselves.

    What this means for us

    From this follows the position we write from: safety does not arise when people adapt to systems, but when systems are designed so they can be adapted to people: continuously, in operation, not in the audit room. Exactly this tailoring (this ongoing adaptation under real conditions) is the craft we want to lay out here. Not because the New View line is fashionable. It has been established in the literature for more than two decades. But because the operational gap between it and daily practice is still wide.

    In practice this means: We write about incidents to reconstruct conditions: the conditions under which reasonable people made reasonable decisions that turned out, in retrospect, to be consequential. Methods we treat as craft, requiring practice, judgement, and contextual knowledge. Organisations we read as learning-capable (or learning-incapable) systems.

    Back to Three Mile Island, shortly after four in the morning. Three operators stand in front of indicators, one of which shows the control signal rather than the position. They follow their training, they throttle the emergency cooling, because under suspected overpressure the procedure asks for exactly that. We can read them as the weak point of the system, or as the last people that night who acted by the rules they had been given. Which interpretation we choose decides what we build differently next time.

    What we build differently here is not, in the first place, an indicator that shows position rather than control signal. It is the willingness to change the question: not “Who failed?”, but “What made this, in that moment, plausible?” This question is more demanding. It does not lead to a person who can be sanctioned. It leads to a system that has to be rebuilt.

    Sources

    • Sidney Dekker – The Field Guide to Understanding Human Error, 3rd ed., CRC Press 2014
    • Erik Hollnagel – Safety-II in Practice, Routledge 2018
    • Todd Conklin – Pre-Accident Investigations, Ashgate 2012
    • Karl E. Weick & Kathleen M. Sutcliffe – Managing the Unexpected, 3rd ed., Wiley 2015
    • Charles Perrow – Normal Accidents: Living with High-Risk Technologies, Princeton University Press 1999 (on the TMI analysis)
    • Diane Vaughan – The Challenger Launch Decision: Risky Technology, Culture, and Deviance at NASA, University of Chicago Press 1996
  • How our mental picture can endanger us

    For nearly 20 years I have served as a volunteer firefighter. Over those two decades I have experienced a great deal of meaningful work in countless operations, but also recurring moments that weighed on me. Alongside my fire service, I built my professional career. I worked intensively on safety, human factors, organizational culture and structure, and resilience, and turned these topics into my profession. Over the years my view of people and organizations has kept changing and developing, including my view of the fire service as an institution. I want to share part of those reflections in this article.

    It has struck me repeatedly that automatic alarms connected to fire detection systems (FDS) get labelled as “false alarms”. Firefighters amble casually into the fire station; it’s “just an FDS alarm”. While changing in the station, a colleague says “this is just a false alarm anyway”; the drive to the scene isn’t urgent because it’s “just an FDS alarm”. The examples are many.

    In this article I want to look at the topics of mental picture, situational awareness, and confirmation bias, using FDS alarms as the example. I will show the danger of a false mental picture and present some first approaches to addressing it.

    Some statistics to start with

    How large is the share of automatic alarms from fire detection systems? To answer this, I did a short analysis of the operations data on the website of the Dübendorf – Wangen-Brüttisellen fire service, where all operations are recorded and publicly accessible. I analyzed the years 2018 to 2020. Of 649 operations, 130 were FDS alarms, about 20%, with the annual percentages ranging between 16% and 23%. These figures did not match my own sense of it, so I also evaluated “my” alarms over the same period. Of 159 alarm calls I received, 57 were FDS alarms, which is 36%. Either way, the numbers show that FDS alarms make up a substantial part of firefighting work.
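    The shares quoted above can be reproduced with a few lines of arithmetic. The following is a minimal sketch: the counts are the ones given in the text, while the function and variable names are illustrative assumptions and not part of the original analysis.

```python
def share_percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Counts quoted in the text for the years 2018 to 2020.
station_operations, station_fds_alarms = 649, 130  # all station operations
own_alarm_calls, own_fds_alarms = 159, 57          # the author's own alarm calls

station_share = share_percent(station_fds_alarms, station_operations)
own_share = share_percent(own_fds_alarms, own_alarm_calls)

print(f"Station: {station_share:.1f}% of operations were FDS alarms")
print(f"Own calls: {own_share:.1f}% of alarm calls were FDS alarms")
```

    Rounded to whole percentages, the two results correspond to the roughly 20% and 36% given in the text.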

    These are also usually operations where the all-clear can be given quickly. Often the cause is a technical malfunction of the detection system. In other cases the system fulfils its purpose and raises the alarm early, typically about small events – burnt toast in the staff room, an overheated water boiler, a small fire in an electrical distribution cabinet. However, we also keep encountering FDS alarms behind which lie events that can become genuinely dangerous for us firefighters.

    Our mental picture and our situational awareness

    But why all the fuss about a small thing? Because it is a dangerous small thing, the kind that can lead to accidents – in the worst case with fatal consequences. Countless accident investigation reports from all kinds of fields name a “lack of situational awareness” as the cause. That means indicators of the impending accident were essentially there, but the actors didn’t discover them or didn’t perceive them as such. Unfortunately, this is where most investigations end, without asking the really relevant question: why?

    How we approach an operation shapes our situational awareness during it. On the drive to the scene we “paint” our mental picture of the operation ahead. Once that mental picture is painted, we consciously and unconsciously look for cues that confirm it. So if we go into the operation with the mental picture “false alarm”, we unconsciously look for signs that confirm a false alarm. More importantly: we unconsciously filter out signs that don’t match our mental picture. This confirmation bias is something we can hardly escape. If thick black smoke is already coming out of a building, that’s less of a problem, because we perceive the obvious danger immediately. But in a fire operation, many dangers are not obvious. They hide behind doors, or they are gases that can’t be seen or smelled. If we unconsciously filter these signs out, we may put ourselves in mortal danger with a false situational awareness.

    The role of the organization

    The mental picture a person takes into a firefighting operation isn’t determined solely by that person. We are all shaped by the organization, or rather by the organizational culture: how we as a collective deal with such situations. This shows in existing processes, in the language used in internal and external communication, in the leadership behaviour of supervisors, and in the behaviour of colleagues. Does the organization commit itself to a strong safety culture and give cultural aspects sufficient weight?

    High Reliability Organizations (HRO) lead by example. These organizations are very aware of exactly these hidden dangers in their system and actively address them. High Reliability Organizations operate according to five principles:

    • Preoccupation with failure
    • Reluctance to simplify interpretations
    • Sensitivity to operations
    • Commitment to resilience
    • Deference to expertise

    Labelling FDS alarms as “false alarms” violates the second principle, the reluctance to simplify interpretations. The mental picture that arises from this simplification doesn’t necessarily match reality, and in the worst case it can prove fatal.

    So how can a fire service organization reduce or minimize this risk? Initial, immediate measures could be:

    • Clearly defined and documented procedure for (FDS) alarms
    • Specific sensitisation of firefighters to the “false alarm” topic
    • Language guidelines for internal and external communication
    • Consistently pointing out the language guidelines and their background to firefighters who use the term “false alarm”

    These are of course only first, targeted measures. As mentioned above, organizational culture plays an important role, and these measures alone will not change it adequately. To improve safety sustainably, the topics of high reliability organization and safety culture have to be tackled comprehensively.

    Conclusion

    The mental picture with which we go into an operation every day significantly influences our situational awareness and thus our actions in the operation. In the – fortunately rare – extreme case it can decide between life and death. But the mental picture is not solely a matter for each individual firefighter; it is shaped substantially by the surroundings and the organizational culture. Here the organization can and must take the lead to sustainably improve the safety of its members. The High Reliability Organization model offers one possible approach.

  • The characteristics of High Reliability Organizations

    Today, companies are faced with ever-increasing complexity. On the one hand, companies themselves are complex, socio-technical systems, and on the other hand, they are embedded in a complex environment with numerous known and unknown factors. How can an organization successfully keep up with these constantly increasing demands?

    In this article, I will focus on High Reliability Organizations (HRO) and the related High Reliability Theory (HRT). The idea of High Reliability Organizations originally comes from organizations that successfully operate in high-risk industries. Examples include air traffic control, nuclear power plants, aircraft carriers, power grid operators, and similar fields of activity. However, I am personally convinced that the insights gained from HRO and the associated operational principles can be transferred to any organization and, as with the classic HROs, will make it more successful.

    The emergence of the High Reliability Theory

    In 1984, sociologist Charles Perrow published his book “Normal Accidents: Living with High-Risk Technologies”, in which he introduced Normal Accident Theory (NAT) through an analysis of the 1979 reactor accident at the Three Mile Island nuclear power plant in the USA. His reasoning was that complex and tightly coupled systems are bound to lead to a catastrophic accident sooner or later. It was therefore irresponsible, he argued, to operate such systems, for example nuclear power plants. The theory appears entirely plausible, but it has a problem: it cannot be falsified. Perrow’s repeated answer to criticism was that even if there has never been an accident, it is just a matter of time before one happens. Who can refute a forward-looking statement?

    NAT has rightly received a lot of attention in the safety sciences, and Perrow has been cited thousands of times. But the question arose as to why there are nevertheless organizations that operate complex, tightly coupled systems with virtually no incidents. This question inspired the Berkeley scholars Gene I. Rochlin, Todd R. La Porte, and Karlene H. Roberts to study such organizations more closely. In 1987, they published an article on High Reliability Organizations that answered Perrow’s NAT with the High Reliability Theory (HRT). Using a best-practice approach, they examined why operations on an aircraft carrier (in peacetime), where aircraft move at high speed in a tight space in the presence of dangerous goods such as fuel and weapons, do not (or did not yet) lead to a catastrophic accident.

    Like NAT, HRT received a lot of attention. Numerous publications followed, and articles and books on the topic are still being published today. Perrow’s answer to HRT was not long in coming. While the Berkeley scholars considered their work complementary to NAT, Perrow disagreed with this assessment and contradicted it directly. After 1987, numerous studies and surveys were conducted in industries such as nuclear power plants, aircraft carriers (in peacetime), power grid operators, and air traffic control. These studies contributed to the further development of HRT. In 2001, Karl E. Weick and Kathleen M. Sutcliffe published the first edition of “Managing the Unexpected”, in which they distilled the findings into five HRO principles, which I explain below. A second edition followed in 2007 and a third in 2015.

    The five HRO principles

    The first three of the five HRO principles are primarily to be understood from a prevention perspective. The focus is on preventing serious incidents and accidents.

    Preoccupation with failure. Dealing with possible failure means thinking about what could go wrong and looking for weak signals in the system. These activities can be taken on by an effective risk management function but should also be firmly anchored in the everyday work routine of all employees. When analyzing risks, it is important to understand what could happen. It is even more important, however, to get to the bottom of why it could happen.

    Reluctance to simplify. We tend to look at something in a way that fits our personal worldview. This applies to individuals and to entire organizations alike. Reluctance to simplify means questioning the existing worldview and viewing situations from different perspectives.

    Sensitivity to operations. In an HRO, operational processes set the pace of the organization. The organization has a high awareness of details and identifies weak signals. The quality of relationships is strong. There is a culture of trust, which enables employees to speak openly about irregularities and concerns in connection with operational processes.

    But an HRO is also aware that, despite all efforts, mistakes cannot be completely avoided. The last two of the five HRO principles focus primarily on coping with errors so that they do not develop into a crisis.

    Commitment to resilience. Errors also occur in HROs. HROs do not try to be error-free. Instead, they are able to recover quickly from mistakes before a crisis arises and, if a crisis does occur, to get out of it quickly.

    Deference to expertise. If an HRO is in a critical situation or even in a crisis, the decisions on how to deal with it are made “at the front” by the relevant experts and not necessarily by management. Figuratively speaking, the hierarchy pyramid turns upside down. Once the crisis has been overcome, the pyramid returns to its normal orientation.

    Why become a High Reliability Organization?

    The five HRO principles help a company become a mindful organization that is able to focus on operational processes, detect weak signals, learn from irregularities, and manage potentially dangerous situations before they turn into a full-blown crisis. The resulting benefits are manifold. For example, losses in connection with a crisis, be it a production shutdown or a far-reaching scandal, can be avoided or limited, and optimizing operational processes can yield a significant competitive advantage. If a company is perceived from the outside as an HRO, this can have a positive impact on public perception and can also play a decisive role in the war for talent.

    In conclusion, the concept of the High Reliability Organization supports not only companies in high-risk areas but essentially all companies in sustaining themselves in their increasingly complex environment and in creating a competitive advantage.