It is the night of 28 March 1979, shortly after four in the morning. In the control room at Three Mile Island, Unit 2, a light is on: pressure relief valve closed. The light says that because it doesn’t measure position. It displays the control signal, the command that was sent to the valve to close. What the valve is actually doing, nobody in the room knows. It has been open for two minutes and thirteen seconds, and it will stay open for the next two hours.

In the hours that follow, the operators will do something that the later investigation will identify as the primary cause of the partial meltdown: they throttle back the emergency cooling. They do it because their instruments tell them the system is over-pressurised, and because their training has taught them to avoid exactly that condition. They act rationally given what they see. In the days that follow, the press will speak of “human error.”

This reflex (the diagnosis of “human error” that follows a scene like this almost automatically) sits behind most of the safety conversations I have in consulting practice. Not because those involved are unwise. But because three assumptions are so deeply embedded in our safety tradition that they pass as common sense. We read them differently. What follows are three counter-positions, one per assumption.

“Human error” is a diagnosis, not a finding

Anyone working in this field knows the statistic: 80 to 90 percent of all incidents are attributed to “human error.” The number has been cited since the 1980s in talks, audits, and executive reports, and it works: it makes it seem plausible that the answer to safety problems must lie with people. More training, clearer standards, stricter discipline. The logic is clean: if the problem sits in the cockpit, the solution must sit in the cockpit too.

The problem with this logic isn’t the statistic. It’s the interpretation. Sidney Dekker puts it in his Field Guide so sharply it hurts: “human error” is never the end of an investigation, it is the beginning. Whoever explains incidents this way has stopped asking: they have found a label and settled into it. Local rationality, the concept Dekker keeps sharpening, says: nobody comes to work intending to take a reactor into meltdown, harm a patient, or bring an aircraft down. What looks like failure from the bird’s-eye view of an investigation made sense at the moment of action, given what the person could see, given the pressure, given the training.

Reconstructing that sense is the actual work.

Hollnagel adds a second thread. His Safety-II argument runs, simplified: the same thing we call “failure” is the other side of an adaptive capacity without which the system wouldn’t function for an hour. People accomplish daily what procedures cannot accomplish on their own: they interpret context, they improvise when reality diverges from what the script assumes (which it does constantly), they fill the gaps that designers and rule-books have left open. Whoever treats people as a weak point cuts themselves off from the only real source of resilience the system has.

Back in the TMI control room, read through this lens: the operators throttle the emergency cooling because their instruments say the system is over-pressurised, and because their training has sensitised them to exactly that risk. At the moment of action, their decision is the only coherent interpretation of the data available to them. That we know today the valve was open and the system under-pressurised rather than over: that is knowledge the investigation has, not knowledge the operators had. This asymmetry between investigator and actor, “hindsight bias” in the research vocabulary, is not a cosmetic flaw in the method. It is the structural condition under which every incident investigation operates. Whoever doesn’t reflect on it sees in every past event what those involved could have done, and overlooks what they could actually see.
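The gap between what the room could see and what the valve was doing can be put into a few lines. What follows is a minimal sketch in Python, not a model of the actual TMI instrumentation: the names ReliefValve, lamp_from_command and lamp_from_position are invented for illustration, and the position sensor is a hypothetical one.

```python
from dataclasses import dataclass
from enum import Enum


class ValveState(Enum):
    OPEN = "open"
    CLOSED = "closed"


@dataclass
class ReliefValve:
    commanded: ValveState  # the last signal sent to the valve
    actual: ValveState     # the physical position; differs if the valve sticks


def lamp_from_command(valve: ReliefValve) -> ValveState:
    """Indicator wired to the control signal: shows what was asked for."""
    return valve.commanded


def lamp_from_position(valve: ReliefValve) -> ValveState:
    """Indicator wired to a (hypothetical) position sensor: shows what is."""
    return valve.actual


# A valve that was told to close and stayed open.
stuck = ReliefValve(commanded=ValveState.CLOSED, actual=ValveState.OPEN)

print("lamp wired to command: ", lamp_from_command(stuck).value)
print("lamp wired to position:", lamp_from_position(stuck).value)
```

Both lamps are “correct” with respect to what they are wired to. Only one of them answers the question the operators are actually asking, and nothing on the panel says which of the two they are looking at.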

In training sessions, I now routinely ask participants: what is the most frequent cause of incidents and accidents in your operation? The answer comes every time, without exception: human error. It comes fast, it comes self-evidently, and it comes before the actual work of the training has begun. Over the hours that follow, there is regularly a moment when something dawns on the participants. It isn’t a new term or an additional tool, but a shift of perspective: their own incident investigations, as they themselves recognise, have ended exactly where they should have begun. What that costs isn’t only a weaker investigation. It is the willingness of employees to report anything at all next time.

The question that interests us more than “How do we prevent human errors?” is this: How does our system support the adaptive work people have to do for it to function at all?

Human error is never an explanation. It is a diagnosis that says more about those diagnosing than about the incident.

Compliance is a minimum, not safety

The second assumption follows the first like a shadow. If people are the risk, then regulations, audits, and certifications are the instruments of control. Safety becomes a question of whether the right boxes are ticked. Executive teams read safety KPIs (lost-time injury rate, audit findings, training completion rates) and draw conclusions about the state of the organisation. The governance is clear, the reporting is clean, the responsibility is distributed. There is a reason this model survives so robustly: it interfaces well with law, insurance, and corporate reporting.

The model has just one problem: compliance and safety regularly come apart. Boeing’s 737 MAX held FAA certification, a compliance status that was green by every auditable measure. It also carried an MCAS system whose malfunction cost 346 people their lives. The Bristol Heart Scandal of the 1990s revealed a hospital whose internal safety indicators showed no clear anomalies, while paediatric cardiac surgery mortality had climbed to twice the British average. In both cases insiders reported the warning signs, and no one wanted to listen, because the compliance picture was clean.

What happens between the audits is the actual safety story. Diane Vaughan, in her study of the Challenger disaster, coined a term for it: “normalisation of deviance.” Drift rarely arises as deliberate rule-breaking. It arises because, under real conditions, the system gradually departs from the norm (a small tolerance here, a step cut short there) and because these deviations mostly turn out fine. Every repetition without consequence widens the bandwidth of the acceptable, without anyone ever having made a conscious decision. From the audit perspective, this drift is invisible: on audit day the picture aligns again, because everyone knows what to show. From the perspective of learning capacity, it would be visible, if the organisation had the mechanisms to see it.
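The mechanism can be played through as a toy model. The sketch below is my own illustration, not Vaughan’s method. It assumes a single measurable deviation in arbitrary integer units, a crew that goes one small step beyond whatever last turned out fine, and an audit that only ever sees nominal behaviour. All names and numbers (simulate_drift, nominal, hard_limit, step, audit_every) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DriftResult:
    audits_passed: int             # audits that saw nominal behaviour
    audits_total: int
    accepted_limit: int            # what counts as "normal" at the end
    incident_shift: Optional[int]  # first shift beyond the hard limit, if any


def simulate_drift(nominal: int = 100, hard_limit: int = 200, step: int = 5,
                   shifts: int = 40, audit_every: int = 10) -> DriftResult:
    """Toy model of normalisation of deviance (all values are assumptions)."""
    accepted = nominal  # the deviation the crew currently treats as fine
    audits_passed = audits_total = 0

    for shift in range(1, shifts + 1):
        if shift % audit_every == 0:
            # Audit day: everyone knows what to show, so work runs at nominal.
            observed = nominal
            audits_total += 1
            if observed <= nominal:
                audits_passed += 1
            continue

        # Ordinary day: go one small step beyond what last turned out fine.
        observed = accepted + step
        if observed > hard_limit:
            return DriftResult(audits_passed, audits_total, accepted, shift)

        # Nothing happened, so this deviation becomes the new normal.
        accepted = observed

    return DriftResult(audits_passed, audits_total, accepted, None)


result = simulate_drift()
print(result)
```

With these default values, every audit before the incident passes, while the accepted limit has quietly doubled from 100 to 200 by the time the hard limit is crossed on shift 23. No single step in the loop looks like a decision to break a rule.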

What these cases share is not a compliance failure. It is a learning failure. Compliance is a property of a moment: it says that at time X rule Y was being followed. Safety is a property of a process: it says that the organisation is able to pick up weak signals, revise assumptions, and correct its own behaviour, before the next audit date comes around. The one is a state, the other is a capability. An organisation can be fully compliant at any given moment and at the same time completely blind to the drift it is in.

The operational question that follows from this is not “Are we compliant?” It is: Do weaknesses become visible without being punished? Are near-misses treated as learning opportunities, or as reputational risks? Does the system get smarter after every incident, or just more defensive? Just culture, in the precise sense of Reason and Dekker, is the precondition. It is not the poster in the break room.

It is the lived answer to what happens when someone admits something they could have kept quiet about.

Standardisation creates brittleness, not resilience

The third assumption is the most stubborn, because it speaks most directly to the safety reflex. When something goes wrong, we raise the level of standardisation. We write the next step into the SOP, we narrow the latitude, we formalise what used to be a matter of experience. The underlying assumption is clean and mechanical: variation is defect, uniformity is safety. What does not deviate cannot go wrong.

The assumption holds for simple, linear systems. It does not hold for the systems we deal with in and around high-reliability organisations (HROs). Erik Hollnagel uses a precise word for the consequence of this reflex: brittleness. An over-standardised system loses the capacity to adapt to conditions its designers did not anticipate. It functions exactly as long as reality follows the script. And reality never follows the script all the way. The moment deviation arrives, the system has no reserve, no improvisational capacity, no repertoire other than “continue as planned.”

What the Human and Organizational Performance (HOP) movement around Todd Conklin and others has been showing since the 2010s is banal and consequential at once: every functioning shift deviates from the script daily. Nurses combine orders that formally were not designed to be combined, because the original procedure does not fit the specific situation. Industrial operators put in small workarounds because a tool is missing or a step has to be skipped under time pressure. Pilots work through checklists in an order that fits the situation. These deviations are not the problem. They are the safety. They are what carries the system through the day at all.

Behind this stands a deeper insight from the resilience-engineering tradition: safety is not the absence of variation, but the capacity to absorb it. David Woods calls this “graceful extensibility”: the question of how far a system can be stretched before it breaks, and how it behaves while being stretched. Over-standardisation optimises for the normal case and ignores exactly this question. It makes the system efficient under ideal conditions and prone to brittle failure under real ones.

What tailoring means is exactly this: shaping the latitude rather than eliminating it. Setting guardrails (the limits beyond which it becomes dangerous) and, within those guardrails, allowing adaptability, making it visible, keeping it learnable. This is more demanding than a thick rule-book, because it requires trust, conversation, and contextual knowledge. It is also the only thing that works under conditions where variation cannot be eliminated. Pilots who set the manual aside can be heroes or culprits. What they are depends on the system, not on themselves.

What this means for us

From this follows the position we write from: safety does not arise when people adapt to systems, but when systems are designed so they can be adapted to people: continuously, in operation, not in the audit room. Exactly this tailoring (this ongoing adaptation under real conditions) is the craft we want to lay out here. Not because the New View line is fashionable. It has been established in the literature for more than two decades. But because the operational gap between it and daily practice is still wide.

In practice this means three things. We write about incidents to reconstruct the conditions under which reasonable people made reasonable decisions that turned out, in retrospect, to be consequential. We treat methods as craft, requiring practice, judgement, and contextual knowledge. And we read organisations as learning-capable (or learning-incapable) systems.

Back to Three Mile Island, shortly after four in the morning. Three operators stand in front of indicators, one of which shows the control signal rather than the position. They follow their training, they throttle the emergency cooling, because under suspected overpressure the procedure asks for exactly that. We can read them as the weak point of the system, or as the last people that night who acted by the rules they had been given. Which interpretation we choose decides what we build differently next time.

What we build differently here is not, in the first place, an indicator that shows position rather than control signal. It is the willingness to change the question: not “Who failed?”, but “What made this, in that moment, plausible?” This question is more demanding. It does not lead to a person who can be sanctioned. It leads to a system that has to be rebuilt.

Sources

  • Sidney Dekker – The Field Guide to Understanding Human Error, 3rd ed., CRC Press 2014
  • Erik Hollnagel – Safety-II in Practice, Routledge 2018
  • Todd Conklin – Pre-Accident Investigations, Ashgate 2012
  • Karl E. Weick & Kathleen M. Sutcliffe – Managing the Unexpected, 3rd ed., Wiley 2015
  • Charles Perrow – Normal Accidents: Living with High-Risk Technologies, Princeton University Press 1999 (on the TMI analysis)
  • Diane Vaughan – The Challenger Launch Decision: Risky Technology, Culture, and Deviance at NASA, University of Chicago Press 1996