  • What Happens Between Audits

    An audit is a snapshot. It checks, on a fixed date, whether the documentation meets requirements and whether the procedures are described as they need to be described. What it does not check is whether the organisation sees its own weak signals and learns from near misses. The distinction sounds academic but is operationally consequential. It decides whether an organisation builds its safety on the audit date or in between.

    There is no need for polemic against audits. Audits do what they should. The serious question is what they structurally cannot do. And what needs to stand beside them so that what happens between audits does not become a blind spot for the organisation.

    What audits actually measure

    Audits measure conformity within a defined observation window. They check whether documentation at the time of inspection shows what it must show, and whether procedures are described as they must be described. This is a legitimate and non-trivial task. It has its place in the trust landscape that complex societies require. No one would board an airliner that lacked certification, no hospital operates without accreditation, no industrial operator runs without a regulatory framework. Audits produce this trust through a social procedure whose function Michael Power described precisely in the late 1990s in The Audit Society: they are “rituals of verification”, not measuring instruments for the property they claim to test. They produce a legible picture of order, and this picture is compatible with insurance, law and corporate reporting.

    Power’s point is not that audits are useless. It is that what they produce is not identical to what they appear to produce. A passed audit says that what was documented at the time of inspection met the requirements. It says nothing about whether the organisation sees the weak signals of its own practice, whether it learns from near misses, whether its adaptive capacity holds under real pressure. These properties are not documentable in the form an audit requires for its findings. They are processes, not states, and an audit is built to test states.

    Anyone who does not draw this distinction builds a conception of safety in which compliance and safety are the same thing. Before the Texas City explosion in 2005, BP had a “lost time injury” rate below the industry average, and an audit result confirming that number. What was deteriorating simultaneously in the same plant was process safety: a domain not captured by the prevailing KPIs. Andrew Hopkins described this in Failure to Learn with a clarity that still hurts: the organisation managed what it could measure, and screened out what evaded measurement. The audit confirmed the measurement.

    Audits check whether what is supposed to be documented is documented. They do not check whether the organisation sees what it should be seeing.

    Audit preparation becomes a permanent task

    From this structural property follows a second, which becomes visible in many organisations as soon as you look at how effort is distributed across the year. Audit preparation has become a permanent task, with its own resources, its own roles, its own quarterly rhythms. An internal compliance function that works all year toward a smooth external inspection. Pre-audits, mock audits, action lists, “gap analyses” meant to anticipate the result. This isn’t mindless bureaucracy. It is the understandable response to an audit regime that has grown denser, more formalised and more consequential over the past twenty years.

    The difficulty in this rationality arises where it displaces attention. Attention is finite, line time is finite, and what pays off in an audit competes for these exact resources with what does not appear in an audit. Erik Hollnagel describes this effect in Safety-II in Practice as a systematic reinforcement of Work-as-Imagined: the denser the specification, the more energy the organisation puts into maintaining the world of specification. The attention it spends there is missing from the observation of Work-as-Done. The gap between the two grows widest precisely where the most is documented, because the documentation creates its own reality that needs maintenance.

    The displacement happens without bad intent. It follows the mechanics of prioritisation under time pressure: an operational signal that isn’t audit-relevant gets moved to the back, and “to the back” means, on the annual rhythm, until the next quarter, when the same logic will apply again. At the moment of prioritisation, each deferral is defensible on its merits. Over time, the deferrals form a pattern: what doesn’t fit the audit form rarely comes back on the table.

    Concretely: in November, a shift lead reports an anomaly at a measurement point that doesn’t appear on any audit checklist. Her supervisor files it as “forward to Q1”. The Q4 audit runs cleanly. In Q1, preparation for the next audit cycle is already underway, and the anomaly sits on a list nobody opens any more. In April, a different shift at the same measurement point has a near miss that is connected to the original report. Nobody sees the connection any more.

    To see this pattern, you have to step outside the audit logic. From within its sorting, every individual deferral looks like clean work.

    What needs to stand beside it

    Abolishing audits is neither possible nor sensible. They have their function, they are built into the logics of insurance and regulation, they produce the trust a division-of-labour economy requires. What they cannot deliver has to be delivered alongside. Alongside, not instead.

    That isn’t an elegant answer. It demands a second layer that belongs in normal operations and works on a different logic from the audit layer. This second layer has a name in the resilience and HOP literature: Operational Learning. It is not the collection of Lessons Learned from incident reports. It is the ongoing reconciliation of the picture of work with what actually happens in operations, before it becomes an event.

    In Pre-Accident Investigations, Todd Conklin develops two tools designed exactly for this second layer. The first are Learning Teams: small, time-limited groups of operating personnel and a facilitator, who sit down for one to two hours after a near miss or a routine task. Their job is not to find a solution. It is to reconstruct what was actually done and compare it with what should have been done. The output is an observation, not an action item. Precisely this refusal of the action-item format is the condition under which the observation becomes sharp. Whoever starts looking for solutions immediately stops seeing.

    The second are Pre-Job Briefs. These are not the formal safety briefings everyone knows, but short structured conversations at the beginning of a non-routine task: What could go wrong? What trigger means we abort? Who has authority in which situation? The output is a shared mental model, not a list. A well-run Pre-Job Brief practice is hard to show in an audit, because it leaves no paper trail. It is effective in daily safety because it brings what can go wrong into the conversation before the doing.

    Both tools share a principle. They are oriented toward seeing, not toward steering. The audit layer steers what is visibly documented. The learning layer makes visible what doesn’t reach the steering. Steven Shorrock and Claire Williams, in their work on human factors in practice, call this the “professional curiosity” of a learning organisation: the willingness to keep reconciling one’s own picture of work with the actual work. This is not a one-off project. It is an ongoing practice.

    The distinction from what many organisations report as “Lessons Learned” is important. Lessons Learned are an output format: what we take from a closed incident, formulated as an action item or insight, filable in a system, citable in the next audit. Operational Learning in Conklin’s sense is not an output but an ongoing mode in which the organisation continuously refines its picture of work. One closes something. The other holds something open.

    One example shows the difference most sharply. A near miss in a control room: an operator categorises an alarm differently from how the designers intended. In the Lessons-Learned format, this becomes an action item. “Refine alarm labelling, retrain operator.” Done, ticked off, filable in the system. In the Operational-Learning format, it becomes an observation. “Operator under load X reads the alarm in the context of other signals differently from how the designer assumed. The pattern recurs under conditions Y. What we don’t know is which contextual cues drive the interpretation.” A question instead of a result. Rather than closing the investigation, it keeps it open.

    Whoever forces Operational Learning into the Lessons-Learned format has just lost the concept.

    Why this is harder than it sounds

    The audit logic and the learning logic compete for the same resource: the attention of the line. The audit logic almost always wins this competition because its consequences are short-term and visible. A failed audit pulls inquiries, reports, justifications upward. A skipped Learning Team session pulls nothing. It simply doesn’t happen, and nobody notices, until an event happens whose connection with the omitted learning work can no longer be cleanly shown.

    On top of that: Operational Learning has no pretty KPI. An organisation can count the number of Learning Teams conducted, but the moment it does, it starts fulfilling the format rather than using it. Charles Goodhart described this effect in 1975 for economic regulation: once a metric becomes the target of control, it loses the property that made it a good metric. This makes the learning layer awkward in upward reporting, and tolerance for awkwardness is scarce in organisations under efficiency pressure.

    Whoever wants to build this second layer accepts that it isn’t reportable in the same language as the audit layer. It requires a leadership willing to release time and protected spaces without immediately demanding an impact measurement. It requires a line that recognises observation as a legitimate activity in its own right, not merely as a means to an action-item end. This is the more demanding form of safety work, and it is precisely the kind that happens between the audits, or doesn’t.

    Where the audit belongs

    Audits will stay because they are functional. What is missing beside them is the second layer: an ongoing learning practice that doesn’t replace the audit but takes on the safety work the audit structurally cannot do. Only with this second layer does the audit become what it should be: a confirmation of the state the organisation knows. Not the principal source of its safety knowledge.


    Sources

    • Michael Power – The Audit Society: Rituals of Verification, Oxford University Press 1997
    • Todd Conklin – Pre-Accident Investigations: An Introduction to Organizational Safety, Ashgate 2012
    • Erik Hollnagel – Safety-II in Practice: Developing the Resilience Potentials, Routledge 2018
    • Andrew Hopkins – Failure to Learn: The BP Texas City Refinery Disaster, CCH Australia 2008
    • Steven Shorrock & Claire Williams – Human Factors and Ergonomics in Practice, CRC Press 2017
  • People are not the weak link

    15 January 2009. US Airways Flight 1549 takes off from LaGuardia at 3:25 p.m., heading for Charlotte. Just under a minute and a half later, at 2,800 feet over Manhattan, the Airbus A320 flies into a flock of Canada geese. Both engines lose thrust, almost simultaneously. What Captain Chesley “Sully” Sullenberger and First Officer Jeffrey Skiles do in the next three minutes is in no manual. There is no checklist for “dual engine flameout at 2,800 feet over Manhattan”. The engine restart procedure they work through for form’s sake is designed for altitudes above 20,000 feet. It fails to fit from the very first step. Sully decides not to divert to Teterboro airport, which the tower offers (he sees within twenty seconds that the aircraft won’t reach it), but to set down on the Hudson. A decision no procedure provides for, because no procedure can. All 155 people on board survive.

    In the later NTSB analysis it’s calculated that the aircraft could have reached Teterboro, had the crew turned immediately, without attempting the engine restart, without spending the seconds in which a human tries to assess the impossible. “Could have reached”, under conditions no one in the cockpit knew: a crew in the simulator, prepared for the scenario, with engine data that no one could realistically have had. Sully himself said in the hearing: It was not realistic. He was right.

    This is the story that became a classic. It’s shown in trainings, quoted in talks, shared on LinkedIn. What’s rarely said about it is the place where it becomes uncomfortable, the place where praise for the captain and the safety logic of our industry should fall apart.

    What saved Sully that day wasn’t the procedure. It was the willingness to set the procedure aside the moment it became clear it didn’t fit. It was the experience, built over thousands of flight hours, to position an aircraft in seconds against a geography he knew. It was a cockpit in which two people could communicate quickly and without hierarchical friction. And it was an organisation that had built enough trust over the preceding years that a captain took responsibility for a water landing. And was not rebuked afterwards for departing from the script.

    In the dominant safety logic of our time, exactly this moment is an anomaly. “Human error” is the standard explanation for most incidents. What do we call what Sully did, in the same language?

    The usual diagnosis

    The usual diagnosis after an incident is predictable. It runs in two steps: first “human error”, then “more standardisation”. Who should have done it better, what should they have done, which procedure wasn’t followed? The vocabulary is well-rehearsed, and the conclusion usually stands before the investigation begins: more precise manual, sharper training, stronger compliance.

    What this logic doesn’t reach is the asymmetry between what counts as “failure” and what gets registered as “success”. Sully is a hero today. The moment he left the restart checklist, every formally driven investigation would have had to read him as “procedural deviation under pressure”. Had the aircraft not reached the Hudson, Sully would today be an example of “inadequate procedural compliance”. The story hangs on the outcome, not the action.

    This is exactly where Erik Hollnagel’s point in Safety-II in Practice becomes operational: the same behaviour we classify as failure after an incident is the condition under which the system makes it through most days. People continuously adapt procedures to a reality in which the procedures don’t fit. When things go well, no one talks about it. When things go wrong, the adaptation becomes the symptom that needs to be prevented.

    This isn’t a cosmetic flaw in the method of incident investigations. It’s the structural foundation of a safety logic in which the term “human error” doesn’t describe what happened, but what shouldn’t have happened. A diagnosis that knows in advance where the problem lies (with the human), and accordingly stops learning.

    It would be too easy to file this reading pattern as mere knowledge lag. The Old View doesn’t survive because its proponents have read too little. It survives because it serves a set of institutional needs very efficiently. It delivers clear attribution: one person, one fault, one closed case. It fits insurance and liability logic, which asks about individual responsibility. It’s representable in executive reporting without loss in translation: Employee X didn’t follow procedure Y, training Z is the answer. Above all, it minimises the need to question the system itself (and with it the decisions of those who designed it). Arguing against all of this is not primarily a matter of better knowledge. It’s a matter of who bears the cost of the shift.

    The unspoken truth

    If we look honestly at an average workday in a high-risk organisation, we don’t see what’s in the manual. We see thousands of small adjustments, most of which are never written down. And without which the system wouldn’t survive.

    A nurse combines orders because the original procedure doesn’t fit the specific situation. An industrial operator takes a step early because the tool named in the procedure is currently out for maintenance. A pilot follows the checklist in an order more fitting to the situation than the one prescribed in the manual. A firefighter sets the nozzle two metres closer than standard formation would dictate, because he reads the geometry of the fire differently.

    What Hollnagel calls the efficiency-thoroughness trade-off (the constant balancing between efficiency and thoroughness that cannot be trained away under real conditions) is not an exception. It’s the form in which work is done. Steven Shorrock, in his articles on humanisticsystems.com, accordingly speaks of adjustments as the actual substance of safety: the constant, invisible stream of small corrections through which procedures stay connected to reality.

    These adjustments don’t enter the statistics anywhere. They don’t show up in safety KPIs. They aren’t part of compliance reports. They happen because they have to. And because no one talks about them, no one knows how many there are daily or what they rest on. The organisation depends on a resilience whose existence it doesn’t officially acknowledge.

    Exactly what the safety logic demands (strict procedural adherence) is what undermines safety under real conditions.

    What Old View thinking costs

    As long as the official logic addresses people as the weak link, this invisible adaptive work has an implicit status: it’s tolerated as long as nothing happens, and sanctioned the moment something does. This has two consequences, which together hollow out the organisation’s learning system.

    First, employees learn (quickly, in every operation) that adjustments are best left undocumented. Whoever does something that deviates from the procedure and records it in a report risks consequences that don’t lie in the adjustment itself, but in the fact that it became visible. The rational response is not to make it visible. This costs the organisation its only access to the question of how it actually works.

    Second, the employees whose adaptive work carries the system are simultaneously the ones to whom responsibility is assigned when the system nevertheless fails. This isn’t just unfair. It’s destructive. It trains people to think less, observe less, compensate less, because every compensation, if it becomes visible, can become an accusation.

    What would work instead

    The alternative isn’t: abolish procedures. The alternative is to treat procedures as what they are: a first approximation to a complex reality that must be recalibrated in every single application. What happens between procedure and application isn’t a defect. It’s the place where safety is produced.

    In operational terms: make adjustment visible without elevating it to a new rule. An organisation that regularly asks “Where did we deviate from the procedure this week, why, and with what result?” learns something that audits can’t deliver. It learns how its work is actually done. Whoever doesn’t want to hear the answer shouldn’t ask. Whoever wants to hear it must be willing to adjust the procedure when needed, not the person who went around it in the moment of truth.

    Todd Conklin’s HOP line makes a tool of this insight: Learning Teams instead of Investigations, Pre-Job Briefs instead of formalised Job Safety Analyses, Operational Learning instead of Root Cause Analysis. The shift in vocabulary isn’t cosmetic. It shifts the question from “who failed?” to “what haven’t we understood yet, and how do we understand it better next time?”.

    In practice these are small, regular formats that work below the threshold of a formal investigation. A weekly Learning Team of 30 minutes, in which someone briefly describes where the procedure didn’t fit this week, without consequences, without documentation. A Pre-Job Brief before an unusual operation that asks what’s different this time and which assumption won’t hold today. An After-Action Review even after ordinary days, because every ordinary day contains something learnable. These formats are well known. They fail in most organisations not for lack of knowledge, but because they lead to nothing without psychological safety. Whoever fears that something admitted in the learning group could later end up in their personnel file stays silent. And the learning group decays into an empty ritual.

    This names a precondition that doesn’t exist in many organisations: that speaking without punishment is possible. Just Culture in the strict sense. Without this precondition, everything else stays cosmetic: HOP, Safety-II, even the friendliest learning group in the world. With it, the invisible adaptive work becomes what it could be: the learning source of an organisation that wants to speak honestly about its own operations.

    The uncomfortable question

    Back to Sully, shortly after 3:31 p.m., one of the most unusual water landings in civil aviation. The story became a Hollywood film, the captain became a hero. What gets forgotten in the telling is the question that runs along in the background: in which organisation could he have done that, without being sanctioned afterwards for the procedural deviation?

    In most high-risk industries the honest answer is: not in many. Whoever consistently treats their employees as the weak link will end up with employees who become exactly that, not out of malice, but out of self-protection. They will stop deviating from scripts, and resign themselves to the fact that a day on which something unforeseen happens simply won’t be a good day, because they have nothing left in hand that isn’t in the manual.

    If safety is what we want, we have to stop reading the human as the problem to be fixed. We have the choice: either we treat adaptive work as what it is (the invisible substance of our safety), or we talk it away until there’s no one left to put the next aircraft down on the Hudson.

    Sources

    • Erik Hollnagel – Safety-II in Practice: Developing the Resilience Potentials, Routledge 2018
    • Steven Shorrock – Articles on humanisticsystems.com (Work-as-Done, Adjustments)
    • Sidney Dekker – The Field Guide to Understanding Human Error, 3rd ed., CRC Press 2014
    • Todd Conklin – Pre-Accident Investigations: An Introduction to Organizational Safety, Ashgate 2012
    • NTSB – Accident Report AAR-10/03, Loss of Thrust in Both Engines After Encountering a Flock of Birds and Subsequent Ditching on the Hudson River, US Airways Flight 1549, 2010
  • What it means to tailor a system

    A tailor’s shop, somewhere downtown. A back room, two mirrors, a table covered with bolts of fabric, chalk and pins on a board. A man stands on a low platform, in a rough cut of light wool, and the tailor walks around him. He doesn’t just take measurements. He observes. He sees how the customer shifts his weight, how he holds his shoulders, whether the seam at the back pulls to one side. A chalk mark where it doesn’t yet sit right. Then the measuring tape again, then a stitch, then a fitting. Adjust. Try again.

    What’s happening here isn’t fitting a suit. It’s a conversation between fabric, body, and habit. The tailor knows that no human stands exactly the way the pattern assumes. He knows the seams meant to sit centred will shift the moment the person moves. He plans for it. He builds in reserve at points where he knows the fabric needs room to settle. He isn’t surprised when the customer has to come back twice more. That’s his craft.

    What does this workshop have to do with safety? That’s the question this magazine owes its name to. I made the case in the opening article Three Assumptions We Need to Leave Behind; in short: safety doesn’t arise when people adapt to systems, but when systems are designed so they can be adapted to people. Tailoring Safer Systems. Measure, draft, fit, wear, adjust. The cycle repeats, only with different material. And just as in tailoring, it isn’t a one-off act but a stance.

    What this stance asks of us, in concept and tool, I want to lay out here. Three principles, each with a term you’ll recognise from the literature.

    Measure, don’t assume

    The tailor who doesn’t put down the measuring tape knows something that many safety departments treat as needless effort: that reality isn’t in the pattern.

    Steven Shorrock and Claire Williams, in Human Factors and Ergonomics in Practice, frame the distinction that’s been at the centre of the human-factors tradition since Hollnagel so simply that it works as a test. Work-as-Imagined is the picture designers, auditors, and executives have of how work gets done. Work-as-Done is what people actually do. Between them there’s regularly a gap. The question isn’t whether the gap exists. It always does. The question is whether the organisation knows it.

    Whoever doesn’t know it tailors into the assumption. They design procedures on the basis of what their model says. And the model says what’s convenient, what’s legible by audit standards, what sounds executive-ready. The procedure fits the assumption, not the practice. Within a short time, practice and procedure drift apart without anyone noticing, because no one ever measured how the fabric actually hangs.

    What measuring actually means isn’t spectacular. It means observing. It means walking the floor: what the Lean tradition calls a Gemba Walk, and what the safety world circulates under terms like “operational learning visit”. It means shadowing across more than one shift. It means asking questions open enough that they don’t contain the answer: not “Do you stick to the procedure?”, but “When was the last time the procedure didn’t fit your situation, and what did you do instead?”

    These questions regularly produce answers no one wants to hear. People describe workarounds that look like violations from a compliance perspective and like the only way through from the system’s perspective, on a day when a tool is missing, a stand-in is new, the plant has been moody since the update. The temptation is to read these answers as a defect, and to close the case there. The work is to read them as a finding.

    Whoever measures accepts what they see. What they see is regularly not what’s in the pattern. That’s exactly why they’re there.

    Measuring isn’t compliance on trial. It’s the willingness to see something that contradicts your own assumption.

    Respect the fabric

    Not every fabric can be tailored any way you like. Treat a soft knit like a firm wool and the seam won’t hold. The tailor knows the material’s properties before drafting the cut, and adapts the design to the fabric, not the other way around.

    Transposed to organisations: context, culture, and history are the material with which a system is tailored. What works in an airline where Crew Resource Management has been embedded practice for decades doesn’t translate directly to an industrial organisation where hierarchies are lived differently and “Stop the Line” still has to be explained as a concept. What takes hold on a ward where the unit lead has built a reporting culture over years goes nowhere on another ward, where every report passes through two layers of HR before anyone gets to see it.

    David Snowden’s Cynefin framework helps at this point. Of the domains it distinguishes, two matter here: complicated and complex. Complicated problems are those where the link between cause and effect can be made visible with enough expertise: a machine, an accounting system, a construction plan. Expert analysis and proven practice transfer here. Complex problems are those where cause and effect are only readable in hindsight, because the system shifts on every intervention. Culture, risk behaviour, learning capacity belong in this category. Imported best practices don’t work here. What worked in one organisation isn’t guaranteed to work in the next.

    The most common mistake in safety programmes I work with is mixing these two up. A proven concept from a best-practice collection gets sold as a universal solution, draped over an organisation made of different fabric. And everyone’s surprised when the seam doesn’t hold. What the organisation needed wasn’t the solution. It was the diagnosis: what kind of fabric is in front of us?

    Respecting the fabric doesn’t mean finding everything fine as it is. It means checking the cut against the material before reaching for the scissors. Whoever skips that builds a safety programme that fits the quarterly report, not the practice.

    Build adjustment in

    A good cut has give. The tailor doesn’t pull the fabric so tight that it tears at the first breath. He knows the body changes, the day changes, the fabric settles after the first few wearings. He builds that in. Where he leaves room, where he doesn’t, is craft. Eliminate the give, bind yourself to the exact measurement, and you get a garment that fits exactly once. Not the next moment.

    Erik Hollnagel’s work has circled this insight in safety language for years. In FRAM (the Functional Resonance Analysis Method), he argues against linear incident models that read variation as defect. Variation, Hollnagel writes, isn’t the opposite of function. It’s a condition of function. Complex socio-technical systems work because their components (people, tools, procedures) are flexible enough to respond to conditions that aren’t in the plan. When the plan tries to switch off this variation, it switches off adaptive capacity at the same time.

    In practice this means: a good procedure describes not only the intended path, but makes visible the conditions under which it holds. It knows the assumptions it makes, and it knows the places where it will break if those assumptions fail. A good procedure is aware of its limits. More than that: a good system keeps resources free that aren’t tied to the plan (slack in the staffing plan, time in the shift, room in the communication), because without these no adjustment is possible. What looks like inefficiency is the precondition for the system to make it through the day on which reality departs from the plan. And it departs. Every day.

    Building adjustment in means giving the system permission to adjust. Not afterwards, in the case of damage, but beforehand, in the design. It means shaping room deliberately rather than tolerating it by default. And it means making visible what otherwise stays hidden: that the workarounds no one admits to are often the last adjustments an over-standardised system still allows.

    What tailoring isn’t

    Off-the-rack sits in the warehouse waiting for someone it fits. It’s efficient, it’s cheap, it’s clean in the reporting. It’s a complete solution as long as the measurement is right. When it isn’t, it becomes the source of a quiet compromise: the person adapts to the suit, holds their shoulders differently, breathes shallower, moves as if they belonged in the pattern. For a while, this goes well.

    Safety from the compliance catalogue works on exactly this logic. It comes with finished procedures, standardised KPIs, audit templates that fit everything because they look at nothing. The problem isn’t that it’s structured. The problem is that it takes its own description of the system for the system. When reality departs from it (and it does), no adjustment is provided for in the catalogue. What remains is the admonition to please stick to the procedure.

    In contrast stands the tailor who doesn’t put down the measuring tape. Who knows he’ll have to come back twice. Who respects the give in the fabric. Who doesn’t finish the cut today, but draws it in conversation with what’s in front of him. Who accepts that the end product isn’t perfect on the first attempt, and that adjustment is part of the craft, not an admission of error.

    This is what Tailoring Safer Systems means. Shape room rather than eliminate it. Make adjustment visible so the system can learn from it. This is harder work than a dense catalogue. It’s also the only thing that works under conditions where the next measurement is already a different one.

    Sources

    • Steven Shorrock & Claire Williams (Eds.) – Human Factors and Ergonomics in Practice: Improving System Performance and Human Well-Being in the Real World, CRC Press 2017
    • Erik Hollnagel – FRAM: The Functional Resonance Analysis Method – Modelling Complex Socio-technical Systems, Ashgate 2012
    • David J. Snowden & Mary E. Boone – A Leader’s Framework for Decision Making, Harvard Business Review, November 2007
    • Erik Hollnagel – Safety-II in Practice, Routledge 2018
  • Three Assumptions We Need to Leave Behind

    It is the night of 28 March 1979, shortly after four in the morning. In the control room at Three Mile Island, Unit 2, a light is on: pressure relief valve closed. The light says that because it doesn’t measure position. It displays the control signal, the command that was sent to the valve to close. What the valve is actually doing, nobody in the room knows. It has been open for two minutes and thirteen seconds, and it will stay open for the next two hours.

    In the hours that follow, the operators will do something that the later investigation will identify as the primary cause of the partial meltdown: they throttle back the emergency cooling. They do it because their instruments tell them the system is over-pressurised, and because their training has taught them to avoid exactly that condition. They act rationally given what they see. In the days that follow, the press will speak of “human error.”

    This reflex (the diagnosis of “human error” that follows a scene like this almost automatically) sits behind most of the safety conversations I have in consulting practice. Not because those involved are unwise. But because three assumptions are so deeply embedded in our safety tradition that they pass as common sense. We read them differently. What follows are three counter-positions, one per assumption.

    “Human error” is a diagnosis, not a finding

    Anyone working in this field knows the statistic: 80 to 90 percent of all incidents are attributed to “human error.” The number has been cited since the 1980s in talks, audits, executive reports, and it works: it makes plausible that the answer to safety problems must lie with people. More training, clearer standards, stricter discipline. The logic is clean: if the problem sits in the cockpit, the solution must sit in the cockpit too.

    The problem with this logic isn’t the statistic. It’s the interpretation. Sidney Dekker puts it in his Field Guide so sharply it hurts: “human error” is never the end of an investigation, it is the beginning. Whoever explains incidents this way has stopped asking: they have found a label and settled into it. Local rationality, the concept Dekker keeps sharpening, says: nobody comes to work intending to take a reactor into meltdown, harm a patient, or bring an aircraft down. What looks like failure from the bird’s-eye view of an investigation made sense at the moment of action, given what the person could see, given the pressure, given the training.

    Reconstructing that sense is the actual work.

    Hollnagel adds a second thread. His Safety-II argument runs, simplified: the same thing we call “failure” is the other side of an adaptive capacity without which the system wouldn’t function for an hour. People accomplish daily what procedures cannot accomplish on their own: they interpret context, they improvise when reality diverges from the script assumption (which it does constantly), they fill the gaps that designers and rule-books have left open. Whoever treats people as a weak point cuts themselves off from the only real source of resilience the system has.

    Back in the TMI control room, read through this lens: the operators throttle the emergency cooling because their instruments say the system is over-pressurised, and because their training has sensitised them to exactly that risk. At the moment of action, their decision follows from the only coherent interpretation of the data available to them. That we know today that the valve was open and the system under-pressurised rather than over-pressurised is information from the investigation, not information the operators had. This asymmetry between investigator and actor, “hindsight bias” in the research vocabulary, is not a cosmetic flaw of method. It is the structural condition under which every incident investigation operates. Whoever doesn’t reflect on it sees in every past event what those involved could have done. And overlooks what they could actually see.

    In training sessions, I now routinely ask participants: what is the most frequent cause of incidents and accidents in your operation? The answer comes every time, without exception: human error. It comes fast, it comes self-evidently, and it comes before the actual work of the training has begun. Over the hours that follow, there is regularly a moment when something dawns on the participants. It isn’t a new term or an additional tool, but a shift of perspective: their own incident investigations, as they themselves recognise, have ended exactly where they should have begun. What that costs isn’t only a weaker investigation. It is the willingness of employees to report anything at all next time.

    The question that interests us more than “How do we prevent human errors?” is this: How does our system support the adaptive work people have to do for it to function at all?

    Human error is never an explanation. It is a diagnosis that says more about those diagnosing than about the incident.

    Compliance is a minimum, not safety

    The second assumption follows the first like a shadow. If people are the risk, then regulations, audits, and certifications are the instruments of control. Safety becomes a question of whether the right boxes are ticked. Executive teams read safety KPIs (lost-time injury rate, audit findings, training completion rates) and draw conclusions about the state of the organisation. The governance is clear, the reporting is clean, the responsibility is distributed. There is a reason this model survives so robustly: it interfaces well with law, insurance, and corporate reporting.

    The model has just one problem: compliance and safety regularly come apart. Boeing’s 737 MAX held FAA certification, a compliance status that was green by every auditable measure, and it carried an MCAS system whose malfunction cost 346 people their lives. The Bristol heart scandal of the 1990s revealed a hospital whose internal safety indicators showed no clear anomalies, while paediatric cardiac surgery mortality had climbed to twice the British average. In both cases the signals were reported, by insiders no one wanted to listen to, because the compliance picture was clean.

    What happens between the audits is the actual safety story. Diane Vaughan, in her study of the Challenger disaster, coined a term for it: “normalisation of deviance.” Drift rarely arises as deliberate rule-breaking. It arises because, under real conditions, the system gradually departs from the norm (a small tolerance here, a step shortened in time there) and because these deviations mostly turn out fine. Every repetition without consequence widens the bandwidth of the acceptable, without anyone ever having made a conscious decision. From the audit perspective, this drift is invisible: on audit day the picture aligns again, because everyone knows what to show. From the perspective of learning capacity, it would be visible, if the organisation had the mechanisms to see it.

    What these cases share is not a compliance failure. It is a learning failure. Compliance is a property of a moment: it says that at time X rule Y was being followed. Safety is a property of a process: it says that the organisation is able to pick up weak signals, revise assumptions, and correct its own behaviour, before the next audit date enters the stage. The one is a state, the other is a capability. An organisation can be fully compliant at any given moment and at the same time completely blind to the drift it is in.

    The operational question that follows from this is not “Are we compliant?” It is: Do weaknesses become visible without being punished? Are near-misses treated as learning opportunities, or as reputational risks? Does the system get smarter after every incident, or just more defensive? Just culture, in the precise sense of Reason and Dekker, is the precondition. It is not the poster in the break room.

    It is the lived answer to what happens when someone admits something they could have kept quiet about.

    Standardisation creates brittleness, not resilience

    The third assumption is the most stubborn, because it speaks most directly to the safety reflex. When something goes wrong, we raise the level of standardisation. We write the next step into the SOP, we narrow the latitude, we formalise what used to be a matter of experience. The underlying assumption is clean and mechanical: variation is defect, uniformity is safety. What does not behave deviantly cannot go wrong.

    The assumption holds for simple, linear systems. It does not hold for the systems we deal with in contexts adjacent to high-reliability organisations (HROs). Erik Hollnagel uses a precise word for the consequence of this reflex: brittleness. An over-standardised system loses the capacity to adapt to conditions its designers did not anticipate. It functions exactly as long as reality follows the script. And reality never follows the script all the way. The moment deviation arrives, the system has no reserve, no improvisational capacity, no repertoire other than “continue as planned.”

    What the Human and Organisational Performance (HOP) movement around Todd Conklin and others has been showing since the 2010s is banal and consequential at once: every functioning shift deviates from the script daily. Nurses combine orders that formally were not designed to be combined, because the original procedure does not fit the specific situation. Industrial operators put in small workarounds because a tool is missing or a step has to be skipped under time pressure. Pilots interpret checklists in an order that fits the situation. These deviations are not the problem. They are the safety. They are what carries the system through the day at all.

    Behind this stands a deeper insight from the resilience-engineering tradition: safety is not the absence of variation, but the capacity to absorb it. David Woods calls this “graceful extensibility”: the question of how far a system can be stretched before it breaks, and how it behaves while being stretched. Over-standardisation optimises for the normal case and ignores exactly this question. It makes the system efficient under ideal conditions and prone to brittle failure under real ones.

    What tailoring means is exactly this: shaping the latitude rather than eliminating it. Setting guardrails (the limits beyond which it becomes dangerous) and, within those guardrails, allowing adaptability, making it visible, keeping it learnable. This is more demanding than a thick rule-book, because it requires trust, conversation, and contextual knowledge. It is also the only thing that works under conditions where variation cannot be eliminated. Pilots who set the manual aside can be heroes or culprits. What they are depends on the system, not on themselves.

    What this means for us

    From this follows the position we write from: safety does not arise when people adapt to systems, but when systems are designed so they can be adapted to people: continuously, in operation, not in the audit room. Exactly this tailoring (this ongoing adaptation under real conditions) is the craft we want to lay out here. Not because the New View line is fashionable. It has been established in the literature for more than two decades. But because the operational gap between it and daily practice is still wide.

    In practice this means: We write about incidents to reconstruct conditions: the conditions under which reasonable people made reasonable decisions that turned out, in retrospect, to be consequential. Methods we treat as craft, requiring practice, judgement, and contextual knowledge. Organisations we read as learning-capable (or learning-incapable) systems.

    Back to Three Mile Island, shortly after four in the morning. Three operators stand in front of indicators, one of which shows the control signal rather than the position. They follow their training, they throttle the emergency cooling, because under suspected overpressure the procedure asks for exactly that. We can read them as the weak point of the system, or as the last people that night who acted by the rules they had been given. Which interpretation we choose decides what we build differently next time.

    What we build differently here is not, in the first place, an indicator that shows position rather than control signal. It is the willingness to change the question: not “Who failed?”, but “What made this, in that moment, plausible?” This question is more demanding. It does not lead to a person who can be sanctioned. It leads to a system that has to be rebuilt.

    Sources

    • Sidney Dekker – The Field Guide to Understanding Human Error, 3rd ed., CRC Press 2014
    • Erik Hollnagel – Safety-II in Practice, Routledge 2018
    • Todd Conklin – Pre-Accident Investigations, Ashgate 2012
    • Karl E. Weick & Kathleen M. Sutcliffe – Managing the Unexpected, 3rd ed., Wiley 2015
    • Charles Perrow – Normal Accidents: Living with High-Risk Technologies, Princeton University Press 1999 (on the TMI analysis)
    • Diane Vaughan – The Challenger Launch Decision: Risky Technology, Culture, and Deviance at NASA, University of Chicago Press 1996
  • Measuring corporate culture in real time

    Corporate culture is rightly receiving ever-greater attention. It is described as the DNA of an organization, and there are as many corporate cultures as there are companies. But what makes a positive corporate culture? How can it actually be measured, and how can it be improved?

    In this article I introduce an innovative measurement method that is clearly superior to conventional employee surveys. It enables companies and business units, with minimal effort and low cost, to measure corporate culture practically in real time, to identify trends early, and to implement specific, targeted improvement measures.

    Corporate culture

    A culture is made up of shared values and convictions. In a corporate context, this can mean shared norms and values, a shared understanding of the company’s strategy, a particular way of communicating and interacting, and so on. Corporate culture is often described simply as “how we do things around here”.

    In a positive corporate culture, employees identify with the company’s strategy and values. They actively contribute to the company’s success. Good interaction among employees and supervisors enables them to contribute without fear of negative consequences, to express their opinions, to critically question processes and products, to bring in ideas, and to point out problems in the organization. For this they receive appreciation. They show a high level of loyalty and commitment to their employer and are willing to deliver peak performance. A positive corporate culture is visible not only inside the company but manifests outwards as well. Customers and potential employees notice it, and it constitutes a significant competitive advantage, not least in attracting talent.

    A positive corporate culture creates the foundation for the sharing of information, and thus for organizational learning and innovation. It enables a company to fully unfold the potential of its employees, and it means that the company’s know-how is not the sum of individual employee know-how, but a multiple of it.

    If corporate culture plays such an important role, we definitely want to know where our company stands. But how can we measure culture?

    Measuring corporate culture

    Many companies conduct annual or biennial employee surveys to measure employee satisfaction and corporate culture. This type of survey has several disadvantages:

    • Participating in the survey requires a significant time investment from employees.
    • The survey is a snapshot and is highly susceptible to external influences.
    • Evaluating the answers takes a long time and produces an overwhelming amount of information. The top-down transfer into the organizational units takes time, and any measures can only be taken long after the survey.
    • While survey results can be compared, the large interval between them makes it difficult to analyze the effectiveness of specific measures in isolation. The reasons for an improvement, an absence of change, or even a deterioration remain unclear.
    • Static surveys over several years no longer do justice to today’s dynamism in the corporate environment.

    Figure: Example question from a FRIDAY6 survey

    The collaboration platform LutherOne with its module FRIDAY6 [1] offers a solution. FRIDAY6 is an employee-survey instrument that works with weekly, intelligent mini-surveys of six questions [2] (statements to be rated on a Likert scale from 1 to 10). The questions come from a pool of 100-120 questions covering a wide range of topics and are assigned individually to each employee. The question sets therefore differ across employees, and employees receive different questions each week. Answering the questions takes 1-2 minutes and can be done conveniently on a computer or smartphone. Management thus receives a comprehensive weekly picture of the situation in the company and in the various business units. FRIDAY6 offers companies a number of advantages, some of which are outlined below:

    • FRIDAY6 can be tailored extensively to the specific needs of the company.
    • The results of the surveys appear weekly in a comprehensive management cockpit, with numerous dimensions relevant to corporate culture – company climate, leadership, trust, engagement, customer focus, strategy, and so on. The cockpit shows both the status quo and the corresponding trends.
    • The user-friendly presentation and handling, and the very low effort required to respond, ensure a high participation rate over time.
    • The close-meshed surveying and clearly structured dimensions allow specific development areas to be identified and targeted improvements to be made. The effectiveness of the improvement measures can be measured within a few weeks, and corrective action can be taken quickly.
    • The regular surveys lead by themselves to an improvement of corporate culture. Employee motivation rises because they can actively contribute. This effect is reinforced when they sense that they are being heard and that their feedback leads to improvements in the organization.
    • A reduced form of the management cockpit is accessible to all employees and creates transparency and trust.

    Figure: Management cockpit of the FRIDAY6 platform (sample dashboard, test platform)
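
    How a weekly rotation of this kind might be organised can be sketched in a few lines of Python. This is a minimal illustration and not the FRIDAY6 implementation: the function name `assign_weekly_questions`, the pool size of 110, and the idea of seeding a shuffle with the employee ID are all assumptions made for the example. It shows each employee walking through a personal ordering of the question pool, six questions per week.

```python
import random

POOL_SIZE = 110          # assumed; the article states a pool of 100-120 questions
QUESTIONS_PER_WEEK = 6   # from the article: six questions per weekly mini-survey


def assign_weekly_questions(employee_id: str, week: int,
                            pool_size: int = POOL_SIZE,
                            per_week: int = QUESTIONS_PER_WEEK) -> list[int]:
    """Return the question indices one employee sees in a given week (weeks start at 0).

    Hypothetical scheme: every employee gets a personal shuffled ordering of the
    pool and works through it six questions at a time, wrapping around at the end.
    """
    order = list(range(pool_size))
    # Seeding with the employee ID keeps the ordering stable for that employee
    # from week to week, while other IDs produce other orderings.
    random.Random(employee_id).shuffle(order)
    start = (week * per_week) % pool_size
    return [order[(start + i) % pool_size] for i in range(per_week)]


# Usage: the question sets of two hypothetical employees for the first two weeks.
for employee in ("emp-001", "emp-002"):
    for week in range(2):
        print(employee, week, assign_weekly_questions(employee, week))
```

    With these assumed numbers, an employee sees no question twice during the first 18 weeks (18 × 6 = 108 of 110 questions). A real instrument would presumably also balance the dimensions covered each week, which this sketch ignores.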

    FRIDAY6 creates optimal conditions for measuring corporate culture and for initiating cultural change. Through its many dimensions, companies are able to take specific, smaller measures without getting lost in the complexity of cultural challenges. Especially in today’s environment, which is shaped by constant change, this is indispensable.

    Has this caught your interest? Get in touch – I’d be glad to discuss with you how you can measure the culture in your company in real time and continuously develop it.

    safety & risk solutions GmbH, Tel. +41 76 343 44 09 or email fabian.landherr@safetyrisksolutions.ch


    [1] The FRIDAY6 module is available stand-alone. There is no requirement to implement further modules of the LutherOne collaboration platform.

    [2] For smaller organizations, instead of FRIDAY6 with its weekly six questions, the monthly 16-question Monthly16 is available. This extends the evaluation cycle but delivers qualitatively better results.

  • Achieve outstanding performance with psychological safety!

    Psychological safety of employees is one of the key factors for the sustainable success of companies. Employees who feel psychologically safe are willing to contribute their knowledge and ideas and enable a company to constantly learn and reinvent itself, thus gaining a decisive competitive advantage.

    The environment in which companies operate is subject to a constantly high level of dynamism. Not only in the area of (system) safety, but also in daily competition, companies are forced to reinvent themselves again and again. They must adapt to volatility, deal with uncertainty, successfully manage omnipresent complexity and make quick decisions in situations of ambiguity (VUCA) in order to successfully hold their own against their competitors.

    Employees at all levels are confronted with the same challenges when performing their tasks. In most cases, they are no longer just a cog in the system, exercising a specific, well-defined task at a predefined pace, but are confronted with dynamic situations and actively contribute to the continuous improvement of the system – be it in the area of innovation or safety/risk management. Employees are therefore no longer just resources performing a clearly defined task, but an important source of information within a company, which is the prerequisite for continuous learning and further development of the organization.

    In order to be able to make this contribution, the basis of a high level of psychological safety must be in place.

    The psychologically unsafe, toxic work environment

    In a toxic work environment, employees cannot express themselves openly for fear of negative consequences, and they withhold information. The work climate is characterized by mistrust. Whenever possible, employees try to sweep mistakes under the carpet to avoid exposure. Knowledge is seen as power and is not shared with colleagues. Managers no longer hear about what is going on in the company. The system falls silent and merely “functions”, although in retrospect it can hardly be said to have functioned at all. There is a high risk that the company will slip unnoticed into a crisis. The lack of knowledge transfer nips the necessary further development of the company in the bud.

    The psychologically safe working environment

    A psychologically safe work environment is characterized by personal respect and appreciation. Employees feel safe and motivated to actively contribute and share information without fear of negative consequences, whether in the form of ideas or reports of problems and mistakes. There is positive collaboration and teams excel. The company has motivated and inspired employees who trust their colleagues and are actively involved in the further development of the organization. The innovation potential of the organization can be fully exploited. Conflicts within teams are used positively and seen as an enriching opportunity to learn from different perspectives and to move forward.

    A study conducted at Google (known as Project Aristotle) identified five key factors for successful teams. These are (1) psychological safety, (2) clear roles and responsibilities within the team, (3) reliable colleagues, (4) personally meaningful work, and (5) the conviction that the work makes an impact. Psychological safety emerged as by far the most important element and formed the basis for the other four key factors.

    The benefits of high psychological safety

    The benefits of high psychological safety are many and can be felt at all levels within an organization. They include:

    • Fulfilment at work and, as a result, a high level of employee loyalty to the employer
    • Demonstrably better team performance
    • Constructive use of conflicts with the aim of improvement
    • Improved information flow within the company, which forms the basis for a learning organization
    • A positive and inspiring corporate culture, which enables a pronounced risk and safety culture
    • Improved resilience of the company
    • Increased innovation potential of the company

    If you look at the above list – which is still far from complete – no company can really afford not to put psychological safety high on its agenda. A toxic work environment creates immense damage in the form of daily inefficiency, high staff turnover, missed innovation, and even the demise of companies due to a crisis or loss of competitiveness. Improving psychological safety can make all the difference.

  • Measuring safety

    There is a lot of talk about measuring safety. That is easier said than done. This article shares some reflections.

    Measuring what?

    Before starting to measure, one needs to know what one is measuring. How you define safety will determine what you measure and how you measure. Let us illustrate the problem with three quite common views on safety. As you will see, none of them covers the subject entirely and all have advantages and disadvantages.

    Safety as compliance

    A very basic way of thinking: safety is following the safety rules. Being compliant with these rules is being safe. This corresponds to the almost automatic reaction that many people have after an accident: if only they had followed the rules, this would not have happened. Many investigations therefore focus on breaches of protocol and deviations. Also, in ‘normal’ situations there is emphasis on compliance. Wear the mandatory safety gear. Hold the railing. Striving for compliance also appeals to the human tendency towards conformity. We are social creatures, after all.

    Safety rules are important. They are a basic form of how we teach safety: “Don’t touch the stove, it’s hot!” “Watch left, right, left before crossing the street.” These things we teach our kids, our workers, etc. Safety as compliance works reasonably well in rather simple, ordered and predictable systems. In these situations, you have a reasonable chance to foresee what can happen and conceive actions to deal with variations. If you are on known territory, you can deal with the things that happen by applying prescribed routines. Following ‘best practice’ means acting safely, while acting outside of these scripts is regarded as unsafe.

    Safety rules are not perfect, however. We live and work in a world with a lot of variability and we have a limited amount of foresight. This means that we cannot write rules for every eventuality. If we could, the rules would be impossible to handle because of their sheer volume. Besides, rules depend on context. In London it is smarter to look right, left, right before crossing, while this is not the best strategy for Zürich.

    Rules are compromises and may sometimes not be enough to keep you safe. Even if you follow all the traffic rules, you can have an accident, for example when others do not follow the rules. In some situations, following rules is even the unsafe option. One (in)famous example is the Piper Alpha disaster, where the people who followed the emergency procedures died while those who ignored the procedures and just jumped overboard survived.

    Safety as an absence of accidents

    Go out on the street and ask a hundred randomly chosen people, “What is safety?” Chances are that many will answer something along the lines of “Not having any accidents”. Thinking this way makes intuitive sense to most people. It feels right because in our minds safety and accidents are very much linked. When we do not have any accidents, we have been safe. Or have we? Actually, not necessarily. That nothing has happened does not mean that things are safe. In many cases it only means that nothing has happened yet. Although it may very well be that nothing ever happens.

    A simple test is to reverse the definition and see whether it still works. Is “the absence of accidents is safety” true? Absence of accidents can be achieved in other ways. Randomness or luck are possible factors. Your definition of accident is another. Whether people choose to report accidents is yet another. However, accidents do give an indication about safety, or rather unsafety. An accident can be regarded as a manifestation of risk, bringing us to the next definition.

    Safety as acceptable risk

    Whatever you do, there is some risk involved. We cannot avoid this. We even want some risk, but not too much. We need to compromise between various goals (financial, safety, production, quality, etc.), between uncertainty and control. We have only limited resources (money, time, expertise, etc.). Therefore, we must make trade-offs and search for balance.

    This view of safety appeals to rational creatures. It suggests deliberation and decision based on ‘facts’. We will always face risks; we just have to make sure that they are acceptably low. The question is therefore what the right level of risk is. We should obviously try to put as much ‘distance’ as possible between ourselves and the hazard and the possible negative futures the hazard could lead to. But we do not want too much distance either. It has to be practicable and affordable. Besides, some hazards we actually do desire. Just think of drinking coffee. We want our coffee hot, but we do not want to burn ourselves. Therefore, we tend to sip our coffee carefully at first, or maybe blow a bit on it, instead of gulping it down at once.

    The view of safety-as-acceptable-risk is useful, but it also has some drawbacks. One is its reliance on knowledge. Another is that it can lead to quantitative approaches to risk that look more objective than they are. It may also lead to a static view of safety, and it raises the problem of monitoring the risk level. Then there is of course the problem of who decides what is ‘acceptable’ and based on what. Who determines what is included in the assessment and what factors weigh in (and how much)? Who is allowed to participate in the process and how can they participate? What language is used during the process and in the communication of the results?

    One example of the latter is how consequences are selected and expressed. Certain risk assessments focus on fatalities, but those are often not the only bodily consequences. So, what to do with injuries? Should one choose a number of severe injuries that equals a fatality? Or should we, as one often sees, translate fatalities and injuries into monetary units? Is that really a good and fair measure? Can you put a number on a human life? And if so, what number? Sure, you can estimate one person’s economic contribution to society and his/her family, but a person is so much more than his/her economic contribution.
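
    How much the outcome depends on such a choice can be shown with a small worked example. All figures below are invented for illustration, and the conversion factors (10 or 100 severe injuries per fatality) are hypothetical, not taken from any standard. The sketch compares two fictitious options by a “fatality-equivalent” score and shows that the ranking flips when the assumed conversion factor changes.

```python
# Expected yearly consequences of two fictitious options (invented numbers).
options = {
    "option_1": {"fatalities": 0.010, "severe_injuries": 0.20},
    "option_2": {"fatalities": 0.020, "severe_injuries": 0.05},
}


def fatality_equivalents(consequences: dict[str, float],
                         injuries_per_fatality: float) -> float:
    """Collapse fatalities and severe injuries into one number.

    injuries_per_fatality is the assumed number of severe injuries that
    'equals' one fatality; this is exactly the contestable choice.
    """
    return (consequences["fatalities"]
            + consequences["severe_injuries"] / injuries_per_fatality)


def rank(injuries_per_fatality: float) -> list[str]:
    """Return option names ordered from lowest to highest score."""
    return sorted(
        options,
        key=lambda name: fatality_equivalents(options[name], injuries_per_fatality),
    )


# Usage: the same data, ranked under two different conversion factors.
for factor in (10, 100):
    print(f"{factor} severe injuries per fatality:", rank(factor))
```

    With a factor of 10, option_2 scores lower (0.025 against 0.030). With a factor of 100, option_1 does (0.012 against 0.0205). Nothing about the hazard has changed between the two runs, only the assumption about how consequences are weighed.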

    Challenges

    The above views of safety all bring their own ways of measuring safety. Regard safety as compliance and you may be tracking citations from the inspectorate, or observations of unsafe acts (e.g. not wearing protective equipment). If safety is seen as the absence of accidents, you will naturally follow up on accident and injury reports. Those who adopted a risk view of safety may have some kind of a risk register, present the most important risks in a risk matrix or heat map and follow up on actions to control the risks.

    How you define safety will influence your choice of things you measure – and vice versa. What you measure may very well become your definition of safety, consciously or not. If corporate policy, an ISO standard or the regulator requires you to record accidents and near misses as part of your monitoring, it will become very natural to talk about these metrics when someone asks “How are we doing at safety?”

    Another challenge is that management dashboards and scorecards allow only limited space for the presentation of how things are going. Managers are busy people and they would very much like to get clear, concise, unambiguous and short answers. However, safety is a complex phenomenon. Therefore, we need a variety of measures to give a reasonable description. No one view captures everything. Every view shows some elements of safety, but never the full picture. A good answer thus needs rich information and nuances. There is a tension between the space and attention available and what is needed to give a high-quality answer.

    Dumbing it down into an easy measure, no matter how intuitive, will not do justice to the subject. A fatality/injury-based metric only captures a tiny part of a very complex phenomenon. It would be like describing a river exclusively by its temperature – which, by the way, depends more on its surroundings, location and season than on ‘itself’, just as injury rates may correlate more strongly with the context than with safety efforts initiated by the organisation. A trade-off between thoroughness and efficiency is inevitable, and carefully addressing this in the management system is essential.

    This article is an adapted and abbreviated chapter from the book If You Can’t Measure It… Maybe You Shouldn’t. Reflections on Measuring Safety, Indicators, and Goals.

  • The Human-Centred Organisation

    The Human-Centred Organisation

    The world today is highly complex and fast-changing. New technologies become available and change the way we work, communicate and live our lives. The complex socio-economic and socio-political systems can make it difficult to anticipate the needs and requirements of tomorrow. This article discusses issues organisations have to deal with and the benefit of becoming more human-centred with the help of a model aiming to influence organisations on policy level.

    A changing world

    The introduction of new technologies, automation in particular, has shifted the nature of work and made certain tasks performed by personnel obsolete. This becomes more obvious when we look at how tasks have changed over time. Routine work of a cognitive and manual nature has decreased. However, non-routine work of both categories, but especially cognitive non-routine tasks, has increased greatly over the last 30 years, as illustrated in the following chart:

    [Figure: Change in routine vs. non-routine, cognitive vs. manual tasks over time]

    Being able to adapt and evolve in a sustainable way requires a workforce that is diverse and skilled and able to deal with complex problems. To accommodate this, frameworks are needed that could give guidance on how organisations can use their human resource in a better, more human-centred way.

    Management

    The aim of management is to maximise profits made by the company, which in turn increases shareholder value. This is as true today as it was in the early 20th century, when Scientific Management (Taylor 1911) was first introduced in manufacturing, particularly in the steel industry, to increase productivity and reduce costs.

    Industries have to adapt to changes in demand and the development of new technologies at an ever-increasing rate. This is creating complex problems that must be confronted each day anew (Scott Page, 2011).

    The constant demand on organisations to cope with complexity brings the need to develop better strategies and to become smarter. However, decision making, in many cases, depends on the knowledge and wisdom of a few people with potentially limited understanding of the problem and no time to gather additional information. The knowledge and wisdom of experts often goes unused: it is dispersed within the organisation, difficult to access and unknown to decision makers.

    One reason for this could be the process of employment, in which the criteria for a job are regularly narrowed to a set of specific requirements, ignoring the whole range of a person’s skills and knowledge and how he or she could add value to the team under changing circumstances. On the other hand, personnel who proactively volunteer their expertise outside their defined job description are often seen as rebels or troublemakers and discouraged from contributing.

    Frederick Winslow Taylor, in the early 1900s, described the good worker as someone whose job was to “just do what he was told to do, and no back talk.”
    – James Surowiecki, The Wisdom of Crowds, 2004

    Employees

    The struggle of an organisation to change and adapt is often blamed on its employees, and most managers know the difficulty of convincing them that the company needs to adapt.

    Employees may already be conditioned to simply perform the job they were hired to do, and in many cases, they are happy knowing exactly what is expected of them with no need for further development of their skills and education. Others see the issues which make work difficult, but their suggestions are not taken into consideration and they become frustrated.

    The view that a good worker is one who just does as he or she is told with no back talk is still present in today’s work environment, regardless of the nature of the job. If, however, employees are suddenly expected to embrace a new way of working, it is not surprising when they respond with scepticism and appear apathetic and unwilling to engage.

    In fact, this very issue was observed by Frederick Winslow Taylor during the introduction of scientific management (Taylor 1911) in the early 20th century, and that was a time when change was less rapid than it is today.

    In fast-changing environments, it becomes all the more difficult to precisely specify roles and responsibilities across a diverse set of jobs.
    – Royal & Agnew, The Enemy of Engagement, 2012

    ISO 27500 – Human-Centred Organisation

    As illustrated previously, the need to constantly adapt to a changing environment is of vital importance for organisations in ever more dynamic economic environments. Often how work is done needs to change, which can mean that new technology needs to be introduced. New technology may impose the need for employees to adapt, which can have a tremendous impact not only on employees but also on customers. Therefore, it is important to anticipate the impact of new technology on human behaviour and to consider a human-centred approach not only on design but also on the wider organisation.

    Many standards have been developed to address ergonomic and human factors requirements. These mainly address specific issues and focus on the technical side of human interaction with technology. However, the rapid pace of technological development makes it difficult to keep standards up to date. This led to the development of Human-Centred Design standards, which are not technology specific but focus on who the design is for and what they need from the product and systems (Tom Stewart, 2017). In 2016 a new ISO standard was introduced focusing on the human-centred organisation – general principles.

    ISO 27500 is a “Hearts and Minds” standard aimed at corporate boards and at influencing policies. It consists of seven top-level principles. Each one has been endorsed by successful companies. It lays the foundation for the application of ergonomics and human factors, which not only address risk in terms of safety but can also improve quality, efficiency and wellbeing.

    ISO 27500 – Human-Centred Organisation provides principles that can help management with the process of becoming more human-centred. Below are some useful practical suggestions.

    Capitalise on individual differences as an organisational strength. Having a diverse workforce should not be seen as a “must do thing” imposed by legislation or stakeholders, but a chance to improve resilience and performance within the organisation. People with different backgrounds think differently and make an organisation smarter. This should be reflected within human resource policies.

    Adopt a total system approach. Understanding how the organisation works from a systems perspective helps in understanding its behaviour. This requires the organisation to take a closer look at feedback loops and make sure the flow of information is also going bottom-up. The application of system thinking can help to create better models of the dynamic processes relevant to the organisation. Try to understand the relationship between the different agents and components of the whole organisation. This can be achieved through applying methods which are able to model dynamic socio-technical systems.

    Make usability and accessibility strategic business objectives. Application of a human-centred design process helps to understand users’ needs and provides a framework for engineering to design more usable and accessible products. Having systems in place that are usable and support optimal human performance will not only increase reliability but also reduce frustration within the workforce. Special attention should be paid to the distribution of information and how this is presented. Written information may not be ideal for a significant portion of the workforce.

    Ensure health, safety, and wellbeing are business priorities. With more work being of cognitive non-routine nature, the focus should not only be on conventional safety but also on workload and mental health. Understanding the system and its constraints will help identify bottlenecks and be proactive in prevention of mental health issues.

    Value personnel and create meaningful work. Do not consider employees as just another replaceable piece in the process and acknowledge their contribution. Their feedback might be of critical importance to the organisation. Attempt to understand the capability of your workforce and conduct a “what is already there” analysis to understand the variety of skills and competencies which are already available in the organisation. Finding a way to allow creativity to thrive increases the organisation’s ability to innovate and be more resilient to change. Listen to “rebels” carefully; what they have to say can be of critical importance. Create an environment where thoughts and ideas can be shared, and critical voices are valued.

    Be open and trustworthy. Openly and transparently communicate difficult decisions and admit organisational shortfalls. Accept different views and critical feedback from employees. Make sure you have an effective way to collect opinions from stakeholders.

    Act in socially responsible ways. This principle links to ISO 26000 which provides guidance on social responsibility. Social responsibility may depend on the cultural context the organisation is working in. If an organisation changes its operation from a regional or national to an international stage, the requirements may change rapidly.

    Conclusion

    ISO 27500 is currently not a certifiable standard, but this does not mean it should be ignored. The principles mentioned can provide a framework for policies and lay the foundation for a more sustainable utilisation of the human resource.

    Organisations do not need to find a good reason to follow a standard; they need a good reason not to follow it.

  • The New View of Human Error

    The New View of Human Error

    In this article, I will focus on the Old View and the New View of human error. This is a first, short introduction, which lays the ground for further articles on this interesting topic.

    The term ‘New View’ is already 20 years old and basically not that new anymore. However, in many minds and subsequently in numerous organizations, the New View has not yet become established.

    Errors occur in every company. Fortunately, these mistakes usually have no consequences and often they are not even noticed. But unfortunately, sometimes there is financial impact or even personal injury.

    But why do these errors happen? Are they avoidable mistakes by individuals who should just have been more careful? Or are errors emergent properties of a complex socio-technical system that have little to do with the individual?

    The Old View

    One possible view is that human error and thus its negative consequence would be avoidable if everyone adhered to the rules. If an error occurs due to carelessness, it is sufficient to point this out to the person and, if necessary, to punish them, to solve the problem. In extreme cases, the punishment can go as far as to remove the culprit from the system. Criminal consequences are also conceivable. These are not necessarily initiated by the company, but in the case of ex officio offences by the prosecutor.

    The system itself is considered to be inherently safe. People in the system are seen as potential sources of error and system weakness. If all acting persons make an effort and adhere to the rules, nothing can actually happen. The safety level of the system can be measured by the number of incidents or accidents within a period.

    But how does this view help to make a system safer?

    I am inclined to say: not at all. Companies are complex socio-technical systems. A characteristic of these systems is that not all effects of the interaction of different system components are known. Errors, but also system safety, are emergent system properties.

    But what is actually an error?

    We distinguish between different types of errors. There are unintentional or unconscious errors, which people make without knowing their effects on the system. And there are intended or deliberate mistakes. These are mostly deliberate deviations from existing procedures or rules. Such deviations occur, for example, in the event of conflicting goals, under high production pressure, or because no better alternatives are available. Thus, they are a result of inadequate systems. It is also often the case that these deviations have achieved better results for some time than official procedures.

    Whether an action was an error or not often has to do with the result itself. The term ‘error’ is therefore a backward-looking view of an action whose result has become known in the meantime. Especially in an environment with high complexity and incomplete information, the same action can lead to a positive result on one occasion and to a negative result on another. So whether someone made an error or not can be due to circumstances that were still unknown at the time.

    The New View

    The New View distances itself from the perspective of the human as a source of error and as the weakest link in the chain. Humans are seen much more as a system component that enables high system safety. The starting point is that people come to work to do a good job. If an error occurs, it cannot simply be reduced to the action of an individual. It is necessary to consider the error in the system context, because the action that later turned out to be an error was considered by the acting person to be useful for achieving the goal at the time of execution.

    People make decisions under high pressure, with conflicting goals and in great uncertainty. In a complex system, decisions have to be made with incomplete information, or the amount of information is so large that it cannot be processed at all. This can lead to information being overlooked or deliberately not being included in decision making.

    Make the system safer

    In this context, I consider a system to be an organization or organizational unit with employees, technical systems and processes. If appropriate, the term system can also be extended to external components.

    Fortunately, as mentioned at the beginning, most errors remain without consequences. This is primarily due to people’s resilience and sometimes simply due to chance.

    Errors provide an opportunity to learn and make the system safer. If errors occur, it is not expedient to limit the analysis to the actions of the individual (the Old View). Removing the ‘culprit’ from the system does not improve it. It is crucial to take a system perspective in the analysis and to understand why the decision made sense to this individual in this specific situation (the New View). It must also be taken into account what information the person had available and what conflicting goals they were exposed to. This also raises the question of whether another, comparably competent person might have made the same decision in a comparable situation. If this question is answered with ‘yes’, an adjustment in the system is required in order to achieve sustainable improvement.

  • Reaching optimal Human Performance through effective System Design

    Reaching optimal Human Performance through effective System Design

    Designing automation for complex socio-technical systems, to ensure optimal Human Performance of human operators, is a challenging endeavour. Especially in safety-critical environments, humans may need to adapt quickly to changing levels of demands, complexity and uncertainty, in order to maintain optimal performance, efficiency and safety of operations. Under these conditions, humans may benefit from automation. In most cases, automation is designed to take over low-value tasks, i.e. tasks that are simple and easy to automate. However, designing automation to support the human with cognitively demanding tasks such as problem solving and complex decision-making is more challenging for various reasons.

    First, it is required to build an understanding of all high-level tasks and underlying (human) cognitive functions, and to identify to what extent these tasks are currently supported by automation, and what humans need in terms of resources to execute them. Second, automating tasks requires re-thinking the new distribution of (cognitive) functions between humans and automation on a higher level, what organizational structures are required, and how cognition is shared amongst humans and automation (i.e. how humans are able to work effectively with automation). Third, it needs to be understood how automation should be designed so it can support humans optimally in managing complex tasks, in particular when decision-making or problem solving is required under rapidly changing demands, high levels of complexity, and uncertainty.

    Therefore, creating automation to support humans requires a deep understanding of the strategies humans adopt when engaging in complex problem solving and decision making, and of what they need as automation support. This article provides an overview of how to tackle these challenges.

    Step 1: Understanding tasks and underlying (cognitive) functions of a system

    We have to consider that in most cases, we do not develop systems from scratch. Rather, we are building upon existing systems for improvements in terms of safety, efficiency, or other performance dimensions. This means we have to understand what tasks and underlying (cognitive) functions currently exist and what functions currently are supported by automation, in order to identify possibilities to further automate complete tasks or underlying (cognitive) functions or improve existing automated functions.

    In order to identify what automation optimally supports the human in complex tasks (ensuring human-centric decision-making), we first need to identify all tasks and corresponding (cognitive) functions. We also need to identify the current allocation of tasks (and underlying cognitive functions) between humans and automation. Some tasks may be allocated to humans, with various levels of automation support; some tasks may be allocated fully to automation. But it is also possible that tasks are dynamically allocated to humans or automation. It is necessary to understand how changing the allocation of tasks may impact the overall system in terms of interdependencies between humans and automation. A Cognitive Function Analysis (CFA) (Boy, 1998) is an important instrument for Human Factors Engineers and Designers (e.g. UX Engineers) to generate an understanding of all tasks and underlying functions of a system, and the implications of changing the allocation of functions between humans and automation. When doing a CFA, it is important that a wide range of techniques is used, including interviews, observations as well as documentation study. Interviews and observations are important as in most cases, humans may have evolved to use the system differently than intended, which often is not documented.

    Step 2: Understanding the impact of function allocation on system stability

    Changing the allocation of functions between humans and automation may have an impact on system stability (Straussberger et al., 2008). When automating existing functions currently allocated to humans, it therefore needs to be assessed what impact redesigning human and machine cognitive functions through increasing automation will have on the overall stability of a complex socio-technical system. This will ultimately determine the resilience of the system in responding to all operational demands. Stability exists on various layers. It is the result of organizational structures linked to procedures and technical systems and will reflect a system’s ability to recover after disturbance. The stability of socio-technical systems is defined through two processes (Straussberger et al., 2008):

    • Global socio-cognitive stability
    • Local socio-cognitive stability

    Global socio-cognitive stability is concerned with the appropriateness of functions allocated to humans or automation, the pace of information flows and related coordination, through designing appropriate structures linked to:

    • Authority
    • Responsibility
    • Controllability
    • Ability

    Issues may arise if these structures have not been adequately designed. For example, when humans have formal responsibility but do not have controllability or ability to execute certain tasks or high-level functions. Or, alternatively, functions become fully allocated to automation, yet humans maintain formal responsibility for these functions, whereas they have no control or ability to intervene in their execution. Issues may also arise when functions are dynamically allocated to humans or automation or delegated to the system by humans, and the conditions which must be met for delegation are not transparent to humans or are simply not defined.
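    The mismatches described above can be written down as simple checks on a table of function allocations. The sketch below is an illustration only: the record fields, the example functions and the wording of the findings are assumptions made for this example and are not taken from Straussberger et al. (2008).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FunctionAllocation:
    name: str
    human_responsible: bool      # human holds formal responsibility
    human_can_control: bool      # controllability: human can intervene
    human_is_able: bool          # ability: skills, time and information
    dynamic: bool = False        # allocation can shift at run time
    delegation_conditions: Optional[str] = None  # when may it shift?


def stability_issues(allocation: FunctionAllocation) -> list:
    """Return the structural mismatches found for one function."""
    issues = []
    if allocation.human_responsible and not allocation.human_can_control:
        issues.append("responsibility without controllability")
    if allocation.human_responsible and not allocation.human_is_able:
        issues.append("responsibility without ability")
    if allocation.dynamic and not allocation.delegation_conditions:
        issues.append("dynamic allocation without defined delegation conditions")
    return issues


# Hypothetical functions, invented for illustration.
allocations = [
    FunctionAllocation("conflict detection", human_responsible=True,
                       human_can_control=False, human_is_able=True),
    FunctionAllocation("route planning", human_responsible=True,
                       human_can_control=True, human_is_able=True,
                       dynamic=True),
    FunctionAllocation("data logging", human_responsible=False,
                       human_can_control=False, human_is_able=False),
]

for allocation in allocations:
    for issue in stability_issues(allocation):
        print(f"{allocation.name}: {issue}")
```

    A check like this only flags candidates for discussion. Whether a human really has the control or ability recorded in the table is something interviews and observations have to establish.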

    Local socio-cognitive stability refers to humans’ workload, situation awareness, and ability to make appropriate decisions and take action. Local socio-cognitive stability will mainly rely on humans’ ability to understand automation and to gain a mental model of the system. Automated systems need to be designed such that humans are able to predict (anticipate) the responses of automated systems to human input, receive adequate feedback, and regain authority if needed (Boy, 1998). Also, transparency of automated functions needs to be considered, so that humans can develop a valid mental model of the system, its functions, and its behaviour.

    Ensuring both global and local socio-cognitive stability will ensure a common frame of reference, supporting joint situation awareness between humans and automated systems.

    Step 3: Design automation to support expert decision-making

    Designing automation to support human macro cognitive functions starts with understanding how human operators respond to high levels of complexity and uncertainty. Humans may need to adapt to changing demands, which requires anticipating, extrapolating into the future, and creating an assessment based on experience. They may also need to plan ahead and build capacity to be able to manage situations in the near future, and to engage in strategies to deal with future demands and unexpected situations. Such strategies may aim either to reduce or to manage complexity and uncertainty. Examples of complexity and uncertainty management strategies include (Corver & Grote, 2016):

    • Anticipatory thinking (extrapolating the current situation into the future based on past experience on observed deviations)
    • Adaptive planning (i.e. creating back-up plans)
    • Weighing pros and cons of different options (comparing alternative solutions)
    • Forestalling (improving readiness, e.g. to manage resources for future demands)
    • Reducing uncertainty (e.g. increase accuracy and reliability of data through the integration and validation of information from different sources)

    Understanding these strategies is an important starting point for designing useful automation to support human operator decision-making and task execution in highly dynamic situations with high levels of complexity. The following questions should be asked: what information is required from which sources, and what data accuracy is required? What cues do human operators need in order to be alerted to deviations in time to respond quickly and appropriately? What do humans consider when analyzing a situation and engaging in complex decision-making? Automated support tools can be designed to support humans’ ability to filter and cluster information where it is needed, to extrapolate into the future, to be alerted when the situation deviates, or to make complex decisions based on operational trade-offs (Corver & Grote, 2016). Finally, an understanding of the tasks and information needs can support the design of automation that helps humans cluster, integrate and filter information from different sources for improved and quicker decision-making.

    In summary, the identification of human macro cognitive strategies allows us to understand how automation can support human needs and will allow us to increase overall performance of a system.

    References

    • Corver, S.C. & Grote, G. (2016). Uncertainty management in en route air traffic control: a field study exploring controller strategies and requirements for automation. Cognition, Technology & Work.
    • Boy, G. (1998). Cognitive Function Analysis. Westport, CT: Ablex, Greenwood Publishing Group.
    • Straussberger, S., et al. (2008). PAUSA for the future – A synthesis of Phase 1. June 2008. Final Report.