Tag: Local Rationality

  • Field Guide to Understanding Human Error – a note on Sidney Dekker

    The third edition (2014) of Sidney Dekker’s Field Guide to Understanding Human Error sits within reach on my desk. It’s not the book I quote most often, but it is the one I return to most often. With every re-reading, what strikes me is how strange it is: written as a guide (subtitle, clear table of contents, plain language) and yet at its core a systematic attack on the standard way of reading incidents.

    Dekker’s core operation can be put in one sentence: he shifts the position of judgement – away from above with outcome knowledge, toward the view of those who stood in the moment of action. Old View, in his vocabulary, asks from the outside: who acted wrongly, who failed, who didn’t meet the standard? New View asks from the inside: what was visible, plausible, reasonable to the acting person at that moment, given what they saw, knew, and could put together? The shift isn’t a swap of tools. It’s a shift of the position from which judgement happens at all.

    Local rationality. “Local” is the decisive adjective here, not “rational” alone: that actors are rational is something every investigation tacitly assumes anyway. The word “local” marks something else: the binding to a concrete horizon. Local means what the person could see, know, and combine in the moment, with the indicators in front of them, the pressure on their shoulders, the training in their head. Dekker’s standard move in the Field Guide is to reconstruct every “wrong” decision from this local horizon first. And the book shows this reconstruction as craft, through concrete incidents, often with transcribed radio logs or witness statements. Little theory, a lot of workshop. In the back of the book there’s a structured set of questions meant to help any investigation reconstruct local rationality: what did the person have in front of them and what did they lack, what did each indicator actually say, which indicators did they know from training, which resources were available at that moment. This turns a theoretical concept into a workable investigative discipline.

    Sharp End / Blunt End. The pair of terms comes from James Reason, and Dekker uses them consistently. Sharp End is whoever stood in the incident: the nurse at the bedside, the operator at the console, the pilot in the cockpit. Blunt End is what created the conditions under which Sharp End works: design decisions, rule sets, resource allocations, trade-offs between safety and speed. Dekker’s point isn’t that the Sharp End is innocent. It’s that most Sharp End actions are responses to Blunt End conditions. Whoever looks only at the Sharp End sees the hand on the lever. And misses the pressure that brought the hand there. And with it the only place where a correction would even be possible.

    “Causes don’t exist, you construct them.” Dekker’s sharpest provocation and most frequently misunderstood thesis. It doesn’t mean: everything is equal, everything is relative. It means: what we identify at the end of an investigation as “the cause” is always a selection from many contributing conditions. And the selection says something about the analytical lens we’re looking through. Which factor becomes “the cause” and which stays “context” is a decision of the investigation. A conscious one sometimes, an unconscious one mostly. Dekker’s invitation isn’t to arbitrariness. It’s to self-reflection: what are we doing when we identify a cause? What choice are we making without marking it as a choice?

    What I read differently today

    What the book has shaped, after years of practice, can be named pretty precisely: better investigations, more context interviews before the question of blame, less reflex toward “employee sensitisation” as a recommendation. Dekker supplied the vocabulary with which I now turn down assignments where the answer is already prescribed. The limit I notice more strongly with every re-reading: the book is excellent at diagnosis, how to read incidents differently, how to conduct investigations more openly, how to deconstruct the reading reflexes of the Old View. It’s noticeably less explicit on the operational rebuilding question: how to actually build an organisation differently so the New View reading happens not only in investigations but in daily operations. Whoever looks for the next step after the Field Guide typically lands with Conklin (HOP, Pre-Accident Investigations, Operating Principles), more operational, closer to the shop floor. Dekker and Conklin together make the set: first the lens, then the tool.

    Who this book is for

    Required reading for anyone who investigates incidents or shares responsibility for them: safety officers, auditors, line managers in HRO-adjacent industries, investigation commissions of every kind. Especially for those who notice their investigations were over too quickly, without being able to name exactly why. The book gives the observation a vocabulary. If you read only one book on human error, read this one, not because it gives the most answers, but because it changes the way you ask questions.

    Sources

    • Sidney Dekker – The Field Guide to Understanding Human Error, 3rd ed., CRC Press 2014 (main source)
    • Sidney Dekker – Drift into Failure, CRC Press 2011
    • Erik Hollnagel – Safety-II in Practice, Routledge 2018
    • Todd Conklin – Pre-Accident Investigations, Ashgate 2012
    • James Reason – Managing the Risks of Organizational Accidents, Ashgate 1997 (for Sharp End / Blunt End)
  • Three Assumptions We Need to Leave Behind

    It is the night of 28 March 1979, shortly after four in the morning. In the control room at Three Mile Island, Unit 2, a light is on: pressure relief valve closed. The light says that because it doesn’t measure position. It displays the control signal, the command that was sent to the valve to close. What the valve is actually doing, nobody in the room knows. It has been open for two minutes and thirteen seconds, and it will stay open for the next two hours.
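    The indicator logic described above can be made concrete in a few lines. What follows is a minimal sketch, not a model of the actual plant instrumentation: the class `ReliefValve`, its `stuck` flag, and the two light functions are hypothetical names introduced only for illustration. It shows how a light wired to the control signal and a light wired to a position sensor diverge once the valve stops following commands.

```python
from dataclasses import dataclass


@dataclass
class ReliefValve:
    """Hypothetical valve model: the command and the position are separate facts."""
    commanded_open: bool = False   # last command sent to the valve
    actually_open: bool = False    # physical position of the valve
    stuck: bool = False            # if True, the position no longer follows commands

    def command(self, open_: bool) -> None:
        """Send a command; the position only changes if the valve is not stuck."""
        self.commanded_open = open_
        if not self.stuck:
            self.actually_open = open_


def signal_light(valve: ReliefValve) -> str:
    """Indicator wired to the control signal, as described in the text."""
    return "OPEN" if valve.commanded_open else "CLOSED"


def position_light(valve: ReliefValve) -> str:
    """Hypothetical alternative: indicator wired to a position sensor."""
    return "OPEN" if valve.actually_open else "CLOSED"


valve = ReliefValve()
valve.command(True)     # valve opens to relieve pressure
valve.stuck = True      # valve sticks in the open position
valve.command(False)    # close command is sent, the valve does not move

print(signal_light(valve))    # CLOSED
print(position_light(valve))  # OPEN
```

    Both lights are "correct" with respect to what they are wired to. Only one of them answers the question the person in front of the panel is actually asking.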

    In the hours that follow, the operators will do something that the later investigation will identify as the primary cause of the partial meltdown: they throttle back the emergency cooling. They do it because their instruments tell them the system is over-pressurised, and because their training has taught them to avoid exactly that condition. They act rationally given what they see. In the days that follow, the press will speak of “human error.”

    This reflex (the diagnosis of “human error” that follows a scene like this almost automatically) sits behind most of the safety conversations I have in consulting practice. Not because those involved are unwise. But because three assumptions are so deeply embedded in our safety tradition that they pass as common sense. We read them differently. What follows are three counter-positions, one per assumption.

    “Human error” is a diagnosis, not a finding

    Anyone working in this field knows the statistic: 80 to 90 percent of all incidents are attributed to “human error.” The number has been cited since the 1980s in talks, audits, executive reports, and it works: it makes plausible that the answer to safety problems must lie with people. More training, clearer standards, stricter discipline. The logic is clean: if the problem sits in the cockpit, the solution must sit in the cockpit too.

    The problem with this logic isn’t the statistic. It’s the interpretation. Sidney Dekker puts it in his Field Guide so sharply it hurts: “human error” is never the end of an investigation, it is the beginning. Whoever explains incidents this way has stopped asking: they have found a label and settled into it. Local rationality, the concept Dekker keeps sharpening, says: nobody comes to work intending to take a reactor into meltdown, harm a patient, or bring an aircraft down. What looks like failure from the bird’s-eye view of an investigation made sense at the moment of action, given what the person could see, given the pressure, given the training.

    Reconstructing that sense is the actual work.

    Hollnagel adds a second thread. His Safety-II argument runs, simplified: the same thing we call “failure” is the other side of an adaptive capacity without which the system wouldn’t function for an hour. People accomplish daily what procedures cannot accomplish on their own: they interpret context, they improvise when reality diverges from the script assumption (which it does constantly), they fill the gaps that designers and rule-books have left open. Whoever treats people as a weak point cuts themselves off from the only real source of resilience the system has.

    Back in the TMI control room, read through this lens: the operators throttle the emergency cooling because their instruments say the system is over-pressurised, and because their training has sensitised them to exactly that risk. At the moment of action, their decision is the only coherent interpretation of the data available to them. That the valve was open and the system under-pressurised rather than over-pressurised is something we know today: it is information from the investigation, not information the operators had. This asymmetry between investigator and actor, “hindsight bias” in the research vocabulary, is not a cosmetic flaw of method. It is the structural condition under which every incident investigation operates. Whoever doesn’t reflect on it sees, in every past event, what those involved could have done, and overlooks what they could actually see.

    In training sessions, I now routinely ask participants: what is the most frequent cause of incidents and accidents in your operation? The answer comes every time, without exception: human error. It comes fast, it comes self-evidently, and it comes before the actual work of the training has begun. Over the hours that follow, there is regularly a moment when something dawns on the participants. It isn’t a new term or an additional tool, but a shift of perspective: their own incident investigations, as they themselves recognise, have ended exactly where they should have begun. What that costs isn’t only a weaker investigation. It is the willingness of employees to report anything at all next time.

    The question that interests us more than “How do we prevent human errors?” is this: How does our system support the adaptive work people have to do for it to function at all?

    Human error is never an explanation. It is a diagnosis that says more about those diagnosing than about the incident.

    Compliance is a minimum, not safety

    The second assumption follows the first like a shadow. If people are the risk, then regulations, audits, and certifications are the instruments of control. Safety becomes a question of whether the right boxes are ticked. Executive teams read safety KPIs (lost-time injury rate, audit findings, training completion rates) and draw conclusions about the state of the organisation. The governance is clear, the reporting is clean, the responsibility is distributed. There is a reason this model survives so robustly: it interfaces well with law, insurance, and corporate reporting.

    The model has just one problem: compliance and safety regularly come apart. Boeing’s 737 MAX held FAA certification, a compliance status that was green by every auditable measure, and it carried an MCAS system whose malfunction cost 346 people their lives. The Bristol Heart Scandal of the 1990s revealed a hospital whose internal safety indicators showed no clear anomalies, while paediatric cardiac surgery mortality had climbed to twice the British average. In both cases the warning signals had been raised by insiders, and no one wanted to listen to them because the compliance picture was clean.

    What happens between the audits is the actual safety story. Diane Vaughan, in her study of the Challenger disaster, coined a term for it: “normalisation of deviance.” Drift rarely arises as deliberate rule-breaking. It arises because, under real conditions, the system gradually departs from the norm (a small tolerance here, a step cut short there) and because these deviations mostly turn out fine. Every repetition without consequence widens the bandwidth of the acceptable, without anyone ever having made a conscious decision. From the audit perspective, this drift is invisible: on audit day the picture aligns again, because everyone knows what to show. From the perspective of learning capacity, it would be visible, if the organisation had the mechanisms to see it.

    What these cases share is not a compliance failure. It is a learning failure. Compliance is a property of a moment: it says that at time X rule Y was being followed. Safety is a property of a process: it says that the organisation is able to pick up weak signals, revise assumptions, and correct its own behaviour before the next audit date comes around. The one is a state, the other is a capability. An organisation can be fully compliant at any given moment and at the same time completely blind to the drift it is in.
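    The difference between a state and a process can be sketched in a few lines. This is an illustration under invented assumptions, not a measurement method: the written limit, the daily deviation values, and the function names `is_compliant` and `accepted_band` are all hypothetical. The first function looks at one moment, the way an audit does. The second tracks how far the informally accepted practice has widened through repetitions that passed without consequence.

```python
def is_compliant(deviation: float, limit: float) -> bool:
    """Point-in-time check: is the observed deviation within the written limit?"""
    return deviation <= limit


def accepted_band(history: list[float], initial_band: float) -> float:
    """Informal band of 'acceptable' practice.

    Starts at the written limit and widens to the largest deviation
    that has so far passed without consequence.
    """
    band = initial_band
    for deviation in history:
        band = max(band, deviation)
    return band


WRITTEN_LIMIT = 1.0                 # hypothetical limit from the rule-book
daily = [0.8, 1.1, 1.3, 1.6]        # invented values: ordinary days, no incident
audit_day = 0.9                     # invented value: what is shown on audit day

print(is_compliant(audit_day, WRITTEN_LIMIT))   # True
print(accepted_band(daily, WRITTEN_LIMIT))      # 1.6
```

    The audit sees a compliant moment. The history shows a band of accepted practice that has moved well past the written limit, and only an organisation that keeps and reads that history can notice it.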

    The operational question that follows from this is not “Are we compliant?” It is: Do weaknesses become visible without being punished? Are near-misses treated as learning opportunities, or as reputational risks? Does the system get smarter after every incident, or just more defensive? Just culture, in the precise sense of Reason and Dekker, is the precondition. It is not the poster in the break room.

    It is the lived answer to what happens when someone admits something they could have kept quiet about.

    Standardisation creates brittleness, not resilience

    The third assumption is the most stubborn, because it speaks most directly to the safety reflex. When something goes wrong, we raise the level of standardisation. We write the next step into the SOP, we narrow the latitude, we formalise what used to be a matter of experience. The underlying assumption is clean and mechanical: variation is defect, uniformity is safety. What does not deviate cannot go wrong.

    The assumption holds for simple, linear systems. It does not hold for the systems we deal with in HRO-adjacent contexts. Erik Hollnagel uses a precise word for the consequence of this reflex: brittleness. An over-standardised system loses the capacity to adapt to conditions its designers did not anticipate. It functions exactly as long as reality follows the script. And reality never follows the script all the way. The moment deviation arrives, the system has no reserve, no improvisational capacity, no repertoire other than “continue as planned.”

    What the HOP movement around Todd Conklin and others has been showing since the 2010s is banal and consequential at once: every functioning shift deviates from the script daily. Nurses combine orders that formally were not designed to be combined, because the original procedure does not fit the specific situation. Industrial operators put in small workarounds because a tool is missing or a step has to be skipped under time pressure. Pilots work through checklists in an order that fits the situation. These deviations are not the problem. They are the safety. They are what carries the system through the day at all.

    Behind this stands a deeper insight from the resilience-engineering tradition: safety is not the absence of variation, but the capacity to absorb it. David Woods calls this “graceful extensibility”: the question of how far a system can be stretched before it breaks, and how it behaves while being stretched. Over-standardisation optimises for the normal case and ignores exactly this question. It makes the system efficient under ideal conditions and prone to brittle failure under real ones.

    What tailoring means is exactly this: shaping the latitude rather than eliminating it. Setting guardrails (the limits beyond which it becomes dangerous) and, within those guardrails, allowing adaptability, making it visible, keeping it learnable. This is more demanding than a thick rule-book, because it requires trust, conversation, and contextual knowledge. It is also the only thing that works under conditions where variation cannot be eliminated. Pilots who set the manual aside can be heroes or culprits. What they are depends on the system, not on themselves.
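    As a rough illustration of guardrails with latitude inside them, here is a small sketch. Everything in it is hypothetical: the `Guardrail` class, the parameter name `line_speed`, and all numbers are invented for the example. The point is only the shape of the idea: values beyond the hard limits are refused, values inside them are allowed, and every departure from the nominal value is recorded together with its reason so that it stays visible and can be learned from.

```python
from dataclasses import dataclass, field


@dataclass
class Guardrail:
    """Hypothetical parameter with a nominal value and hard limits."""
    name: str
    nominal: float
    hard_min: float
    hard_max: float
    # Visible record of adaptations, kept for later review.
    adaptations: list = field(default_factory=list)

    def apply(self, value: float, reason: str) -> bool:
        """Allow any value inside the hard limits; record departures from nominal."""
        if not (self.hard_min <= value <= self.hard_max):
            return False  # beyond the guardrail: not allowed
        if value != self.nominal:
            self.adaptations.append((value, reason))  # keep the adaptation visible
        return True


speed = Guardrail("line_speed", nominal=50.0, hard_min=20.0, hard_max=80.0)

print(speed.apply(65.0, "material thicker than usual"))  # True
print(speed.apply(95.0, "running late"))                 # False
print(speed.adaptations)  # [(65.0, 'material thicker than usual')]
```

    A thick rule-book would correspond to `hard_min == hard_max == nominal`. The sketch keeps the limits where it becomes dangerous and treats what happens between them as information rather than as violation.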

    What this means for us

    From this follows the position we write from: safety does not arise when people adapt to systems, but when systems are designed so they can be adapted to people: continuously, in operation, not in the audit room. Exactly this tailoring (this ongoing adaptation under real conditions) is the craft we want to lay out here. Not because the New View line is fashionable. It has been established in the literature for more than two decades. But because the operational gap between it and daily practice is still wide.

    In practice this means: We write about incidents to reconstruct conditions: the conditions under which reasonable people made reasonable decisions that turned out, in retrospect, to be consequential. Methods we treat as craft, requiring practice, judgement, and contextual knowledge. Organisations we read as learning-capable (or learning-incapable) systems.

    Back to Three Mile Island, shortly after four in the morning. Three operators stand in front of indicators, one of which shows the control signal rather than the position. They follow their training, they throttle the emergency cooling, because under suspected overpressure the procedure asks for exactly that. We can read them as the weak point of the system, or as the last people that night who acted by the rules they had been given. Which interpretation we choose decides what we build differently next time.

    What we build differently here is not, in the first place, an indicator that shows position rather than control signal. It is the willingness to change the question: not “Who failed?”, but “What made this, in that moment, plausible?” This question is more demanding. It does not lead to a person who can be sanctioned. It leads to a system that has to be rebuilt.

    Sources

    • Sidney Dekker – The Field Guide to Understanding Human Error, 3rd ed., CRC Press 2014
    • Erik Hollnagel – Safety-II in Practice, Routledge 2018
    • Todd Conklin – Pre-Accident Investigations, Ashgate 2012
    • Karl E. Weick & Kathleen M. Sutcliffe – Managing the Unexpected, 3rd ed., Wiley 2015
    • Charles Perrow – Normal Accidents: Living with High-Risk Technologies, Princeton University Press 1999 (on the TMI analysis)
    • Diane Vaughan – The Challenger Launch Decision: Risky Technology, Culture, and Deviance at NASA, University of Chicago Press 1996