Tag: Hindsight Bias

  • Field Guide to Understanding Human Error – a note on Sidney Dekker

    Sidney Dekker’s Field Guide to Understanding Human Error sits within reach of my desk, third edition 2014. It’s not the book I quote most often, but the one I most often return to. With every re-reading, what strikes me is how strange it is: written as a guide (subtitle, clear table of contents, plain language) and yet at its core a systematic attack on the standard way of reading incidents.

    Dekker’s core operation can be put in one sentence: he shifts the position of judgement, away from the view from above that already knows the outcome and toward the view of those who stood in the moment of action. The Old View, in his vocabulary, asks from the outside: who acted wrongly, who failed, who didn’t meet the standard? The New View asks from the inside: what was visible, plausible, and reasonable to the acting person at that moment, given what they saw, knew, and could put together? The shift isn’t a swap of tools. It’s a shift of the position from which judgement happens at all.

    Local rationality. The decisive adjective is “local”, not “rational”. That actors are rational is something every investigation tacitly assumes anyway. The word “local” marks something else: the binding to a concrete horizon. Local means what the person could see, know, and combine in the moment, with the indicators in front of them, the pressure on their shoulders, the training in their head. Dekker’s standard move in the Field Guide is to reconstruct every “wrong” decision from this local horizon first. And the book shows this reconstruction as craft, through concrete incidents, often with transcribed radio logs or witness statements. Little theory, much workshop. In the back of the book there’s a structured set of questions meant to help any investigation reconstruct local rationality: what did the person have in front of them, what didn’t they, what did each indicator appear to say, which indications did they know from training, which resources were available at that moment. This turns a theoretical concept into a workable investigative discipline.

    Sharp End / Blunt End. The pair of terms comes from James Reason, and Dekker uses them consistently. Sharp End is whoever stood in the incident: the nurse at the bedside, the operator at the console, the pilot in the cockpit. Blunt End is what created the conditions under which Sharp End works: design decisions, rule sets, resource allocations, trade-offs between safety and speed. Dekker’s point isn’t that the Sharp End is innocent. It’s that most Sharp End actions are responses to Blunt End conditions. Whoever looks only at the Sharp End sees the hand on the lever. And misses the pressure that brought the hand there. And with it the only place where a correction would even be possible.

    “Causes don’t exist, you construct them.” Dekker’s sharpest provocation and most frequently misunderstood thesis. It doesn’t mean: everything is equal, everything is relative. It means: what we identify at the end of an investigation as “the cause” is always a selection from many contributing conditions. And the selection says something about the analytical lens we’re looking through. Which factor becomes “the cause” and which stays “context” is a decision of the investigation. A conscious one sometimes, an unconscious one mostly. Dekker’s invitation isn’t to arbitrariness. It’s to self-reflection: what are we doing when we identify a cause? What choice are we making without marking it as a choice?

    What I read differently today

    What the book has shaped, after years of practice, can be named pretty precisely: better investigations, more context interviews before the question of blame, less reflex toward “employee sensitisation” as a recommendation. Dekker supplied the vocabulary with which I now turn down assignments where the answer is already prescribed. The limit I notice more strongly with every re-reading: the book is excellent at diagnosis, how to read incidents differently, how to conduct investigations more openly, how to deconstruct the reading reflexes of the Old View. It’s noticeably less explicit on the operational rebuilding question: how to actually build an organisation differently so the New View reading happens not only in investigations but in daily operations. Whoever looks for the next step after the Field Guide typically lands with Conklin (HOP, Pre-Accident Investigations, Operating Principles), more operational, closer to the shop floor. Dekker and Conklin together make the set: first the lens, then the tool.

    Who this book is for

    Required reading for anyone who investigates incidents or shares responsibility for them: safety officers, auditors, line managers in HRO-adjacent industries, investigation commissions of every kind. Especially for those who notice their investigations were over too quickly, without being able to name exactly why. The book gives the observation a vocabulary. If you read only one book on human error, read this one, not because it gives the most answers, but because it changes the way you ask questions.

    Sources

    • Sidney Dekker – The Field Guide to Understanding Human Error, 3rd ed., CRC Press 2014 (main source)
    • Sidney Dekker – Drift into Failure, CRC Press 2011
    • Erik Hollnagel – Safety-II in Practice, Routledge 2018
    • Todd Conklin – Pre-Accident Investigations, Ashgate 2012
    • James Reason – Managing the Risks of Organizational Accidents, Ashgate 1997 (for Sharp End / Blunt End)
  • People are not the weak link

    15 January 2009. US Airways Flight 1549 takes off from LaGuardia at 3:25 p.m., heading for Charlotte. About two minutes later, at 2,800 feet over the Bronx, the Airbus A320 flies into a flock of Canada geese. Both engines lose thrust, almost simultaneously. What Captain Chesley “Sully” Sullenberger and First Officer Jeffrey Skiles do in the next three minutes is in no manual. There is no checklist for “dual engine flameout at 2,800 feet over New York”. The engine restart procedure they work through for form’s sake is designed for altitudes above 20,000 feet. It doesn’t fit from the very first step. Sully decides not to return to LaGuardia, as the controller first offers, and not to divert to Teterboro, the alternative he asks about (he sees in twenty seconds that neither is within reach), but to set down on the Hudson. A decision no procedure provides for, because no procedure can. All 155 people on board survive.

    In the later NTSB analysis it’s calculated that the aircraft could have reached a runway at LaGuardia or Teterboro, had the crew turned immediately, without attempting the engine restart, without spending the seconds in which a human tries to assess the impossible. “Could have reached”, under conditions no one in the cockpit knew: a crew in the simulator, prepared for the scenario, with engine data that no one could realistically have had. Sully himself said in the hearing: It was not realistic. He was right.

    This is the story that became a classic. It’s shown in trainings, quoted in talks, shared on LinkedIn. What’s rarely said about it is the place where it becomes uncomfortable, the place where praise for the captain and the safety logic of our industry should fall apart.

    What saved Sully that day wasn’t the procedure. It was the willingness to set the procedure aside the moment it became clear it didn’t fit. It was the experience, built over thousands of flight hours, to position an aircraft in seconds against a geography he knew. It was a cockpit in which two people could communicate quickly and without hierarchical friction. And it was an organisation that had built enough trust over the preceding years that a captain took responsibility for a water landing. And was not rebuked afterwards for departing from the script.

    In the dominant safety logic of our time, exactly this moment is an anomaly. “Human error” is the standard explanation for most incidents. What do we call what Sully did, in the same language?

    The usual diagnosis

    The usual diagnosis after an incident goes predictably. It runs in two steps: first “human error”, then “more standardisation”. Who should have done it better, what should they have done, which procedure wasn’t followed? The vocabulary is well-rehearsed, the conclusion usually stands before the investigation: more precise manual, sharper training, stronger compliance.

    What this logic doesn’t reach is the asymmetry between what counts as “failure” and what gets registered as “success”. Sully is a hero today. The moment he left the restart checklist, every formally driven investigation would have had to read him as “procedural deviation under pressure”. Had the aircraft not reached the Hudson, Sully would today be an example of “inadequate procedural compliance”. The story hangs on the outcome, not the action.

    This is exactly where Erik Hollnagel’s point in Safety-II in Practice becomes operational: the same behaviour we classify as failure after an incident is the condition under which the system makes it through most days. People continuously adapt procedures to a reality in which the procedures don’t fit. When things go well, no one talks about it. When things go wrong, the adaptation becomes the symptom that needs to be prevented.

    This isn’t a cosmetic flaw in how incidents are investigated. It’s the structural foundation of a safety logic in which the term “human error” doesn’t describe what happened, but what shouldn’t have happened. A diagnosis that knows in advance where the problem lies (with the human), and accordingly stops learning.

    It would be too easy to file this reading pattern as mere knowledge lag. The Old View doesn’t survive because its proponents have read too little. It survives because it serves a set of institutional needs very efficiently. It delivers clear attribution: one person, one fault, one closed case. It fits insurance and liability logic, which asks about individual responsibility. It’s representable in executive reporting without loss in translation: Employee X didn’t follow procedure Y, training Z is the answer. Above all, it minimises the need to question the system itself (and with it the decisions of those who designed it). Arguing against all of this is not primarily a matter of better knowledge. It’s a matter of who bears the cost of the shift.

    The unspoken truth

    If we look honestly at an average workday in a high-risk organisation, we don’t see what’s in the manual. We see thousands of small adjustments, most of which are never written down. And without which the system wouldn’t survive.

    A nurse combines orders because the original procedure doesn’t fit the specific situation. An industrial operator takes a step early because the tool named in the procedure is currently out for maintenance. A pilot follows the checklist in an order more fitting to the situation than the one prescribed in the manual. A firefighter sets the nozzle two metres closer than standard formation would dictate, because he reads the geometry of the fire differently.

    What Hollnagel calls the efficiency-thoroughness trade-off (the constant balancing between efficiency and thoroughness that cannot be trained away under real conditions) is not an exception. It’s the form in which work is done. Steven Shorrock, in his articles on humanisticsystems.com, accordingly speaks of adjustments as the actual substance of safety: the constant, invisible stream of small corrections through which procedures stay connected to reality.

    These adjustments don’t enter the statistics anywhere. They don’t show up in safety KPIs. They aren’t part of compliance reports. They happen because they have to. And because no one talks about them, no one knows how many there are daily or what they rest on. The organisation depends on a resilience whose existence it doesn’t officially acknowledge.

    Exactly what the safety logic demands (strict procedural adherence) is what undermines safety under real conditions.

    What Old View thinking costs

    As long as the official logic addresses people as the weak link, this invisible adaptive work has an implicit status: it’s tolerated as long as nothing happens, and sanctioned the moment something does. This has two consequences, which together hollow out the organisation’s learning system.

    First, employees learn (quickly, in every operation) that adjustments are best left undocumented. Whoever does something that deviates from the procedure and records it in a report risks consequences that don’t lie in the adjustment itself, but in the fact that it became visible. The rational response is not to make it visible. This costs the organisation its only access to the question of how it actually works.

    Second, the employees whose adaptive work carries the system are simultaneously the ones to whom responsibility is assigned when the system nevertheless fails. This isn’t just unfair. It’s destructive. It trains people to think less, observe less, compensate less, because every compensation, if it becomes visible, can become an accusation.

    What would work instead

    The alternative isn’t: abolish procedures. The alternative is to treat procedures as what they are: a first approximation to a complex reality that must be recalibrated in every single application. What happens between procedure and application isn’t a defect. It’s the place where safety is produced.

    In operational terms: make adjustment visible without elevating it to a new rule. An organisation that regularly asks “Where did we deviate from the procedure this week, why, and with what result?” learns something that audits can’t deliver. It learns how its work is actually done. Whoever doesn’t want to hear the answer shouldn’t ask. Whoever wants to hear it must be willing to adjust the procedure when needed, not the person who went around it in the moment of truth.

    Todd Conklin’s HOP line makes a tool of this insight: Learning Teams instead of Investigations, Pre-Job Briefs instead of formalised Job Safety Analyses, Operational Learning instead of Root Cause Analysis. The shift in vocabulary isn’t cosmetic. It shifts the question from “who failed?” to “what haven’t we understood yet, and how do we understand it better next time?”.

    In practice these are small, regular formats that work below the threshold of a formal investigation. A weekly Learning Team of 30 minutes, in which someone briefly tells where the procedure didn’t fit this week, without consequences, without documentation. A Pre-Job Brief before an unusual operation that asks what’s different this time and which assumption won’t hold today. An After-Action Review even after ordinary days, because every ordinary day contains something learnable. These formats are well known. They fail in most organisations not for lack of knowledge, but because they lead to nothing without psychological safety. Whoever admits something in the learning group that could later end up in their personnel file stays silent. And the learning group decays into an empty ritual.

    This names a precondition that doesn’t exist in many organisations: that speaking without punishment is possible. Just Culture in the strict sense. Without this precondition, everything else stays cosmetic: HOP as well, Safety-II as well, the friendliest learning group in the world as well. With it, the invisible adaptive work becomes what it could be: the learning source of an organisation that wants to speak honestly about its own operations.

    The uncomfortable question

    Back to Sully, shortly after 3:31 p.m., one of the most unusual water landings in civil aviation. The story became a Hollywood film, the captain became a hero. What gets forgotten in the telling is the question that runs along in the background: in which organisation could he have done that, without being sanctioned afterwards for the procedural deviation?

    In most high-risk industries the honest answer is: not in many. Whoever consistently treats their employees as the weak link will end up with employees who become exactly that, not out of malice, but out of self-protection. They will stop deviating from scripts, and resign themselves to the fact that a day on which something unforeseen happens simply won’t be a good day, because they have nothing left in hand that isn’t in the manual.

    If safety is what we want, we have to stop reading the human as the problem to be fixed. We have the choice: either we treat adaptive work as what it is (the invisible substance of our safety), or we talk it away until there’s no one left to put the next aircraft down on the Hudson.

    Sources

    • Erik Hollnagel – Safety-II in Practice, Routledge 2018
    • Steven Shorrock – Articles on humanisticsystems.com (Work-as-Done, Adjustments)
    • Sidney Dekker – The Field Guide to Understanding Human Error, 3rd ed., CRC Press 2014
    • Todd Conklin – Pre-Accident Investigations, Ashgate 2012
    • NTSB – Accident Report AAR-10/03, Loss of Thrust in Both Engines After Encountering a Flock of Birds and Subsequent Ditching on the Hudson River, US Airways Flight 1549, 2010
  • Three Assumptions We Need to Leave Behind

    It is the night of 28 March 1979, shortly after four in the morning. In the control room at Three Mile Island, Unit 2, a light is on: pressure relief valve closed. The light says that because it doesn’t measure position. It displays the control signal, the command that was sent to the valve to close. What the valve is actually doing, nobody in the room knows. It has been open for two minutes and thirteen seconds, and it will stay open for the next two hours.

    In the hours that follow, the operators will do something that the later investigation will identify as the primary cause of the partial meltdown: they throttle back the emergency cooling. They do it because their instruments tell them the system is over-pressurised, and because their training has taught them to avoid exactly that condition. They act rationally given what they see. In the days that follow, the press will speak of “human error.”

    This reflex (the diagnosis of “human error” that follows a scene like this almost automatically) sits behind most of the safety conversations I have in consulting practice. Not because those involved are unwise. But because three assumptions are so deeply embedded in our safety tradition that they pass as common sense. We read them differently. What follows are three counter-positions, one per assumption.

    “Human error” is a diagnosis, not a finding

    Anyone working in this field knows the statistic: 80 to 90 percent of all incidents are attributed to “human error.” The number has been cited since the 1980s in talks, audits, executive reports, and it works: it makes plausible that the answer to safety problems must lie with people. More training, clearer standards, stricter discipline. The logic is clean: if the problem sits in the cockpit, the solution must sit in the cockpit too.

    The problem with this logic isn’t the statistic. It’s the interpretation. Sidney Dekker puts it in his Field Guide so sharply it hurts: “human error” is never the end of an investigation, it is the beginning. Whoever explains incidents this way has stopped asking: they have found a label and settled into it. Local rationality, the concept Dekker keeps sharpening, says: nobody comes to work intending to take a reactor into meltdown, harm a patient, or bring an aircraft down. What looks like failure from the bird’s-eye view of an investigation made sense at the moment of action, given what the person could see, given the pressure, given the training.

    Reconstructing that sense is the actual work.

    Hollnagel adds a second thread. His Safety-II argument runs, simplified: the same thing we call “failure” is the other side of an adaptive capacity without which the system wouldn’t function for an hour. People accomplish daily what procedures cannot accomplish on their own: they interpret context, they improvise when reality diverges from the script assumption (which it does constantly), they fill the gaps that designers and rule-books have left open. Whoever treats people as a weak point cuts themselves off from the only real source of resilience the system has.

    Back in the TMI control room, read through this lens: the operators throttle the emergency cooling because their instruments say the system is over-pressurised, and because their training has sensitised them to exactly that risk. At the moment of action, their decision is the only coherent interpretation of the data available to them. That we know today the valve was open and the system under-pressurised rather than over is information the investigation has, not information the operators had. This asymmetry between investigator and actor, “hindsight bias” in the research vocabulary, is not a cosmetic flaw in the method. It is the structural condition under which every incident investigation operates. Whoever doesn’t reflect on it sees in every past event what those involved could have done. And overlooks what they actually could see.

    In training sessions, I now routinely ask participants: what is the most frequent cause of incidents and accidents in your operation? The answer comes every time, without exception: human error. It comes fast, it comes self-evidently, and it comes before the actual work of the training has begun. Over the hours that follow, there is regularly a moment when something dawns on the participants. It isn’t a new term or an additional tool, but a shift of perspective: their own incident investigations, as they themselves recognise, have ended exactly where they should have begun. What that costs isn’t only a weaker investigation. It is the willingness of employees to report anything at all next time.

    The question that interests us more than “How do we prevent human errors?” is this: How does our system support the adaptive work people have to do for it to function at all?

    Human error is never an explanation. It is a diagnosis that says more about those diagnosing than about the incident.

    Compliance is a minimum, not safety

    The second assumption follows the first like a shadow. If people are the risk, then regulations, audits, and certifications are the instruments of control. Safety becomes a question of whether the right boxes are ticked. Executive teams read safety KPIs (lost-time injury rate, audit findings, training completion rates) and draw conclusions about the state of the organisation. The governance is clear, the reporting is clean, the responsibility is distributed. There is a reason this model survives so robustly: it interfaces well with law, insurance, and corporate reporting.

    The model has just one problem: compliance and safety regularly come apart. Boeing’s 737 MAX held FAA certification, a compliance status that was green by every auditable measure, and it carried an MCAS system whose malfunction cost 346 people their lives. The Bristol Heart Scandal of the 1990s revealed a hospital whose internal safety indicators showed no clear anomalies, while paediatric cardiac surgery mortality had climbed to twice the British average. In both cases the signals were reported, by insiders no one wanted to listen to, because the compliance picture was clean.

    What happens between the audits is the actual safety story. Diane Vaughan, in her study of the Challenger disaster, coined a term for it: “normalisation of deviance.” Drift rarely arises as deliberate rule-breaking. It arises because, under real conditions, the system gradually departs from the norm (a small tolerance here, a shortened step there) and because these deviations mostly turn out fine. Every repetition without consequence widens the bandwidth of the acceptable, without anyone ever having made a conscious decision. From the audit perspective, this drift is invisible: on audit day the picture aligns again, because everyone knows what to show. From the perspective of learning capacity, it would be visible, if the organisation had the mechanisms to see it.

    What these cases share is not a compliance failure. It is a learning failure. Compliance is a property of a moment: it says that at time X rule Y was being followed. Safety is a property of a process: it says that the organisation is able to pick up weak signals, revise assumptions, and correct its own behaviour, before the next audit date enters the stage. The one is a state, the other is a capability. An organisation can be fully compliant at any given moment and at the same time completely blind to the drift it is in.

    The operational question that follows from this is not “Are we compliant?” It is: Do weaknesses become visible without being punished? Are near-misses treated as learning opportunities, or as reputational risks? Does the system get smarter after every incident, or just more defensive? Just culture, in the precise sense of Reason and Dekker, is the precondition. It is not the poster in the break room.

    It is the lived answer to what happens when someone admits something they could have kept quiet about.

    Standardisation creates brittleness, not resilience

    The third assumption is the most stubborn, because it speaks most directly to the safety reflex. When something goes wrong, we raise the level of standardisation. We write the next step into the SOP, we narrow the latitude, we formalise what used to be a matter of experience. The underlying assumption is clean and mechanical: variation is defect, uniformity is safety. What does not deviate cannot go wrong.

    The assumption holds for simple, linear systems. It does not hold for the systems we deal with in HRO-adjacent contexts. Erik Hollnagel uses a precise word for the consequence of this reflex: brittleness. An over-standardised system loses the capacity to adapt to conditions its designers did not anticipate. It functions exactly as long as reality follows the script. And reality never follows the script all the way. The moment deviation arrives, the system has no reserve, no improvisational capacity, no repertoire other than “continue as planned.”

    What the HOP movement around Todd Conklin and others has been showing since the 2010s is banal and consequential at once: every functioning shift deviates from the script daily. Nurses combine orders that formally were not designed to be combined, because the original procedure does not fit the specific situation. Industrial operators put in small workarounds because a tool is missing or a step has to be skipped under time pressure. Pilots work through checklists in an order that fits the situation. These deviations are not the problem. They are the safety. They are what carries the system through the day at all.

    Behind this stands a deeper insight from the resilience-engineering tradition: safety is not the absence of variation, but the capacity to absorb it. David Woods calls this “graceful extensibility”: the question of how far a system can be stretched before it breaks, and how it behaves while being stretched. Over-standardisation optimises for the normal case and ignores exactly this question. It makes the system efficient under ideal conditions and prone to brittle failure under real ones.

    What tailoring means is exactly this: shaping the latitude rather than eliminating it. Setting guardrails (the limits beyond which it becomes dangerous) and, within those guardrails, allowing adaptability, making it visible, keeping it learnable. This is more demanding than a thick rule-book, because it requires trust, conversation, and contextual knowledge. It is also the only thing that works under conditions where variation cannot be eliminated. Pilots who set the manual aside can be heroes or culprits. What they are depends on the system, not on themselves.

    What this means for us

    From this follows the position we write from: safety does not arise when people adapt to systems, but when systems are designed so they can be adapted to people: continuously, in operation, not in the audit room. Exactly this tailoring (this ongoing adaptation under real conditions) is the craft we want to lay out here. Not because the New View line is fashionable. It has been established in the literature for more than two decades. But because the operational gap between it and daily practice is still wide.

    In practice this means: we write about incidents to reconstruct conditions, the conditions under which reasonable people made reasonable decisions that turned out, in retrospect, to be consequential. We treat methods as craft, requiring practice, judgement, and contextual knowledge. We read organisations as learning-capable (or learning-incapable) systems.

    Back to Three Mile Island, shortly after four in the morning. Three operators stand in front of indicators, one of which shows the control signal rather than the position. They follow their training, they throttle the emergency cooling, because under suspected overpressure the procedure asks for exactly that. We can read them as the weak point of the system, or as the last people that night who acted by the rules they had been given. Which interpretation we choose decides what we build differently next time.

    What we build differently here is not, in the first place, an indicator that shows position rather than control signal. It is the willingness to change the question: not “Who failed?”, but “What made this, in that moment, plausible?” This question is more demanding. It does not lead to a person who can be sanctioned. It leads to a system that has to be rebuilt.

    Sources

    • Sidney Dekker – The Field Guide to Understanding Human Error, 3rd ed., CRC Press 2014
    • Erik Hollnagel – Safety-II in Practice, Routledge 2018
    • Todd Conklin – Pre-Accident Investigations, Ashgate 2012
    • Karl E. Weick & Kathleen M. Sutcliffe – Managing the Unexpected, 3rd ed., Wiley 2015
    • Charles Perrow – Normal Accidents: Living with High-Risk Technologies, Princeton University Press 1999 (on the TMI analysis)
    • Diane Vaughan – The Challenger Launch Decision: Risky Technology, Culture, and Deviance at NASA, University of Chicago Press 1996