Tailoring Safer Systems: The magazine of safety & risk solutions

Author: admin

The New View of Human Error

In this article, I will focus on the Old View and the New View of human error. This is a first, short introduction, which lays the ground for further articles on this interesting topic.

The term ‘New View’ is already 20 years old and basically not that new anymore. However, in many minds and subsequently in numerous organizations, the New View has not yet become established.

Errors occur in every company. Fortunately, these mistakes usually have no consequences and often they are not even noticed. But unfortunately, sometimes there is financial impact or even personal injury.

But why do these errors happen? Are they avoidable mistakes by individuals who should just have been more careful? Or are errors emergent properties of a complex socio-technical system that have little to do with the individual?

The Old View

One possible view is that human error, and thus its negative consequences, would be avoidable if everyone adhered to the rules. If an error occurs due to carelessness, the problem is considered solved once this has been pointed out to the person and, if necessary, the person has been punished. In extreme cases, the punishment can go as far as removing the culprit from the system. Criminal consequences are also conceivable. These are not necessarily initiated by the company; in the case of ex officio offences they are initiated by the prosecutor.

The system itself is considered to be inherently safe. People in the system are seen as potential sources of error and system weakness. If all acting persons make an effort and adhere to the rules, nothing can actually happen. The safety level of the system can be measured by the number of incidents or accidents within a period.

But how does this view help to make a system safer?

I am inclined to say: not at all. Companies are complex socio-technical systems. A characteristic of these systems is that not all effects of the interaction of different system components are known. Errors, but also system safety, are emergent system properties.

But what is actually an error?

We differentiate between different types of errors. There are the unintentional or unconscious errors that happen without knowing the effects on the system. And there are intended or deliberate mistakes. These are mostly deliberate deviations from existing procedures or rules. Such deviations occur, for example, in the event of conflicting goals, under high production pressure, or because no better alternatives are available. Thus, they are a result of inadequate systems. It is also often the case that these deviations have achieved better results for some time than official procedures.

Whether an action counts as an error often depends on its result. The term ‘error’ is therefore a backward-looking judgement of an action whose result has since become known. Especially in an environment with high complexity and incomplete information, the same action can lead to a positive result in one case and to a negative result in another. So whether someone made an error or not can depend on circumstances that were still unknown at the time of the action.

The New View

The New View distances itself from the perspective of the human as a source of error and as the weakest link in the chain. Humans are instead seen as a system component that enables high system safety. The starting point is that people come to work to do a good job. If an error occurs, it cannot simply be reduced to the action of an individual. The error has to be considered in the system context, because the action that later turned out to be an error was considered by the acting person to be useful for achieving the goal at the time of execution.

People make decisions under high pressure, with conflicting goals and in great uncertainty. In a complex system, decisions have to be made with incomplete information, or the amount of information is so large that it cannot be processed at all. This can lead to information being overlooked or deliberately not being included in decision making.

Make the system safer

In this context, I consider a system to be an organization or organizational unit with employees, technical systems and processes. If appropriate, the term system can also be extended to external components.

Fortunately, as mentioned at the beginning, most errors remain without consequences. This is primarily due to people’s resilience and sometimes simply due to chance.

Errors provide an opportunity to learn and make the system safer. If errors occur, it is not expedient to limit the analysis to the actions of the individual (the Old View). Removing the ‘culprit’ from the system does not improve it. It is crucial to take a system perspective in the analysis and to understand why the decision made sense to this individual in this specific situation (the New View). The analysis must also take into account what information the person had available and what conflicting goals they were exposed to. This raises the question of whether another, comparably competent person might have made the same decision in a comparable situation. If the answer is ‘yes’, an adjustment to the system is required in order to achieve sustainable improvement.

March 18, 2020
Reaching optimal Human Performance through effective System Design
Designing automation for complex socio-technical systems in a way that ensures optimal performance of the human operators is a challenging endeavour. Especially in safety-critical environments, humans may need to adapt quickly to changing levels of demand, complexity and uncertainty in order to maintain optimal performance, efficiency and safety of operations. Under these conditions, humans may benefit from automation. In most cases, automation is designed to take over low-value tasks, i.e. tasks that are simple and easy to automate. Designing automation to support the human with cognitively demanding tasks such as problem solving and complex decision-making is more challenging, for three reasons.

First, it requires an understanding of all high-level tasks and the underlying (human) cognitive functions, of the extent to which these tasks are currently supported by automation, and of the resources humans need to execute them. Second, automating tasks requires re-thinking, on a higher level, the distribution of (cognitive) functions between humans and automation, the organizational structures that are required, and how cognition is shared between humans and automation (i.e. how humans are able to work effectively with automation). Third, it needs to be understood how automation should be designed so that it supports humans optimally in managing complex tasks, in particular when decision-making or problem solving is required under rapidly changing demands, high levels of complexity, and uncertainty.

Creating automation to support humans therefore requires a deep understanding of the strategies humans adopt when engaging in complex problem solving and decision-making, and of the automation support they need. This article provides an overview of how to tackle these challenges.

Step 1: Understanding tasks and underlying (cognitive) functions of a system

We have to consider that in most cases, we do not develop systems from scratch. Rather, we are building upon existing systems for improvements in terms of safety, efficiency, or other performance dimensions. This means we have to understand what tasks and underlying (cognitive) functions currently exist and what functions currently are supported by automation, in order to identify possibilities to further automate complete tasks or underlying (cognitive) functions or improve existing automated functions.

In order to identify which automation optimally supports the human in complex tasks (ensuring human-centric decision-making), we first need to identify all tasks and the corresponding (cognitive) functions. We also need to identify the current allocation of tasks (and underlying cognitive functions) between humans and automation. Some tasks may be allocated to humans, with various levels of automation support; some tasks may be allocated fully to automation. It is also possible that tasks are dynamically allocated to humans or automation. It is necessary to understand how changing the allocation of tasks may impact the overall system in terms of interdependencies between humans and automation. A Cognitive Function Analysis (CFA) (Boy, 1998) is an important instrument for Human Factors Engineers and Designers (e.g. UX Engineers) to generate an understanding of all tasks and underlying functions of a system, and of the implications of changing the allocation of functions between humans and automation. When doing a CFA, it is important to use a wide range of techniques, including interviews, observations and the study of documentation. Interviews and observations are important because, in most cases, humans have come to use the system differently than intended, and this is often not documented.

Step 2: Understanding the impact of function allocation on system stability

Changing the allocation of functions between humans and automation may have an impact on system stability (Straussberger et al., 2008). When automating functions that are currently allocated to humans, it therefore needs to be assessed what impact the redesign of human and machine cognitive functions will have on the overall stability of a complex socio-technical system. This will ultimately determine the resilience of the system, i.e. its capacity to respond to all operational demands. Stability exists on various layers. It is the result of organizational structures linked to procedures and technical systems, and it reflects a system’s ability to recover after a disturbance. The stability of socio-technical systems is defined through two processes (Straussberger et al., 2008):
- Global socio-cognitive stability
- Local socio-cognitive stability
Global socio-cognitive stability is concerned with the appropriateness of functions allocated to humans or automation, the pace of information flows and related coordination, through designing appropriate structures linked to:
- Authority
- Responsibility
- Controllability
- Ability
Issues may arise if these structures have not been adequately designed, for example when humans have formal responsibility but lack the controllability or ability to execute certain tasks or high-level functions. Alternatively, functions may become fully allocated to automation while humans retain formal responsibility for them without having any control over, or ability to intervene in, their execution. Issues may also arise when functions are dynamically allocated to humans or automation, or delegated to the system by humans, and the conditions that must be met for delegation are not transparent to humans or are simply not defined.
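
The first two mismatches described above can be sketched as a simple consistency check. The following Python snippet is only an illustration under strong assumptions: the names `Agent`, `FunctionAllocation` and `stability_issues` are hypothetical and not taken from Straussberger et al., and the model assumes that each of the four structures is held by exactly one agent per function, which real systems rarely satisfy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Agent(Enum):
    HUMAN = "human"
    AUTOMATION = "automation"


@dataclass(frozen=True)
class FunctionAllocation:
    """One function and the holder of each structure (hypothetical model)."""
    name: str
    responsibility: Agent   # who is formally accountable for the function
    authority: Agent        # who is entitled to decide
    controllability: Agent  # who can actually intervene in execution
    ability: Agent          # who has the skills or resources to execute


def stability_issues(alloc: FunctionAllocation) -> List[str]:
    """List every structure that is not held by the responsible agent."""
    issues = []
    for structure in ("authority", "controllability", "ability"):
        holder = getattr(alloc, structure)
        if holder is not alloc.responsibility:
            issues.append(
                f"{alloc.name}: {alloc.responsibility.value} is responsible, "
                f"but {structure} lies with {holder.value}"
            )
    return issues


# Assumed example: the function is fully automated, yet the human stays responsible.
automated = FunctionAllocation(
    name="conflict detection",
    responsibility=Agent.HUMAN,
    authority=Agent.HUMAN,
    controllability=Agent.AUTOMATION,
    ability=Agent.AUTOMATION,
)
for issue in stability_issues(automated):
    print(issue)
```

A check like this only flags candidate issues for discussion; whether a flagged allocation is actually problematic still has to be judged in the operational context.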

Local socio-cognitive stability refers to humans’ workload, situation awareness, and ability to make appropriate decisions and take action. It relies mainly on humans’ ability to understand the automation and to build a mental model of the system. Automated systems need to be designed such that humans are able to predict (anticipate) the responses of automated systems to human input, receive adequate feedback, and regain authority if needed (Boy, 1998). Transparency of automated functions also needs to be considered, so that humans can develop a valid mental model of the system, its functions, and its behaviour.

Ensuring both global and local socio-cognitive stability provides a common frame of reference, which supports joint situation awareness between humans and automated systems.

Step 3: Design automation to support expert decision-making

Designing automation to support human macro cognitive functions starts with understanding how human operators respond to high levels of complexity and uncertainty. Operators may need to adapt to changing demands, which requires anticipating, extrapolating into the future, and creating an assessment based on experience. They may also need to plan ahead and build capacity to be able to manage situations in the near future, and to engage in strategies to deal with future demands and unexpected situations. Such strategies may be aimed at either reducing or managing complexity and uncertainty. Examples of complexity and uncertainty management strategies include (Corver & Grote, 2016):
- Anticipatory thinking (extrapolating the current situation into the future based on past experience on observed deviations)
- Adaptive planning (i.e. creating back-up plans)
- Weighing pros and cons of different options (comparing alternative solutions)
- Forestalling (improving readiness, e.g. to manage resources for future demands)
- Reducing uncertainty (e.g. increase accuracy and reliability of data through the integration and validation of information from different sources)
Understanding these strategies is an important starting point for designing useful automation that supports human operators’ decision-making and task execution in highly dynamic situations with high levels of complexity. The following questions should be asked: What information is required from which sources, and what data accuracy is required? What cues do human operators need in order to be alerted to deviations so that they can respond quickly and appropriately? What do humans consider when analyzing a situation and engaging in complex decision-making? Automated support tools can be designed to support humans’ ability to filter and cluster information where it is needed, to extrapolate into the future and be alerted when the situation deviates, or to make complex decisions based on operational trade-offs (Corver & Grote, 2016). Finally, an understanding of the tasks and information needs can inform the design of automation that helps humans cluster, integrate and filter information from different sources for better and quicker decision-making.
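
As a rough illustration of a support tool that extrapolates into the future and raises an alert when the situation is about to deviate, consider the following Python sketch. It is not taken from Corver & Grote (2016). The function names, the equally spaced observations, the straight-line projection and the fixed upper limit are all assumptions chosen to keep the example small.

```python
from typing import Sequence


def extrapolate(values: Sequence[float], steps_ahead: int) -> float:
    """Project the latest value forward using the mean change per step.

    Assumes equally spaced observations and a roughly linear trend.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    mean_change = (values[-1] - values[0]) / (len(values) - 1)
    return values[-1] + mean_change * steps_ahead


def deviation_alert(values: Sequence[float], steps_ahead: int, limit: float) -> bool:
    """Return True if the projected value exceeds the (assumed) upper limit."""
    return extrapolate(values, steps_ahead) > limit


# Assumed example: a demand indicator sampled at three consecutive intervals.
observed = [10.0, 12.0, 14.0]
print(extrapolate(observed, steps_ahead=3))                  # 20.0
print(deviation_alert(observed, steps_ahead=3, limit=18.0))  # True
```

In practice, the choice of the cue, the look-ahead horizon and the limit would have to come from the analysis of operator strategies described above, not from the tool designer alone.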

In summary, the identification of human macro cognitive strategies allows us to understand how automation can support human needs and will allow us to increase overall performance of a system.

References
- Corver, S.C. & Grote, G. (2016). Uncertainty management in en route air traffic control: a field study exploring controller strategies and requirements for automation. Cognition, Technology & Work.
- Boy, G. (1998). Cognitive Function Analysis. Westport, CT: Ablex, Greenwood Publishing Group.
- Straussberger, S., et al. (2008). PAUSA for the future – A synthesis of Phase 1. June 2008. Final Report.
February 21, 2020
The characteristics of High Reliability Organizations

Today, companies are faced with ever-increasing complexity. On the one hand, companies themselves are complex, socio-technical systems, and on the other hand, they are embedded in a complex environment with numerous known and unknown factors. How can an organization successfully keep up with these constantly increasing demands?

In this article, I will focus on High Reliability Organizations (HRO) and the related High Reliability Theory (HRT). The idea of High Reliability Organizations originally comes from organizations that successfully operate in high-risk industries. Examples include air traffic control, nuclear power plants, aircraft carriers, power grid operators, and similar fields of activity. However, I am personally convinced that the insights gained from HRO and the associated operational principles can be transferred to any organization and – like the classic HRO – will make them more successful.

The emergence of the High Reliability Theory

In 1984, sociologist Charles Perrow published his book “Normal Accidents: Living with High-Risk Technologies”, in which he introduced the Normal Accident Theory (NAT) using an analysis of the 1979 reactor accident at the Three Mile Island nuclear power plant in the USA. His reasoning was that complex and tightly coupled systems would inevitably lead to a catastrophic accident sooner or later. In his view, it was therefore irresponsible to operate such systems, for example nuclear power plants. This theory appears entirely plausible; however, it has a problem: it cannot be falsified. Even if there has never been an accident, so Perrow repeatedly answered his critics, it is just a matter of time before one happens. Who can refute a forward-looking statement?

NAT has rightly received a lot of attention in safety sciences and Perrow has been cited thousands of times. But the question arose as to why there are still organizations that can successfully operate complex, tightly coupled systems with virtually no incidents. This question inspired the Berkeley scholars Gene I. Rochlin, Todd R. La Porte, and Karlene H. Roberts to study such organizations more closely and to publish an article on High Reliability Organizations in 1987 in response to Perrow’s NAT: the High Reliability Theory (HRT). Using a best-practice approach, the Berkeley scholars examined why operations on an aircraft carrier (in peacetime), where aircraft move at high speed and in tight space in the presence of dangerous goods such as fuel and weapons, do not (or did not yet) lead to a catastrophic accident.

Like NAT, HRT received a lot of attention. It was followed by numerous publications, and articles and books on this topic are still being published today. Perrow’s answer to HRT was not long in coming. While the Berkeley scholars considered their work complementary to NAT, Perrow did not agree with this assessment and said so directly. After 1987, numerous studies and surveys were conducted in different industries such as nuclear power plants, aircraft carriers (in peacetime), power grid operators, air traffic control, etc. These studies have contributed to the further development of the HRT. In 2001, Karl E. Weick and Kathleen M. Sutcliffe published the first edition of “Managing the Unexpected”, in which they condensed the findings into five HRO principles, which I explain below. “Managing the Unexpected” was published in a second edition in 2007 and in a third edition in 2015.

The five HRO principles

The first three of the five HRO principles are primarily to be understood from a prevention perspective. The focus is on preventing serious incidents and accidents.

Preoccupation with failure. To deal with possible failure means to think about what could go wrong and to look for weak signals in the system. These are activities that can be taken over by effective risk management but should also be firmly anchored in the everyday work routine of all employees. When analyzing risks, it is important to understand what could happen. However, it is even more important to get to the bottom of the question of why something could happen.

Reluctance to simplify. We tend to look at something in a way that fits our personal worldview. This applies to both individuals and entire organizations. With the reluctance to simplify, the existing worldview is questioned, and situations are viewed from different perspectives.

Sensitivity to operations. In an HRO, operational processes set the pace of the organization. The organization has a high awareness of details and identifies weak signals. The quality of relationships is strong. There is a culture of trust, which enables employees to speak openly about irregularities and concerns in connection with operational processes.

But an HRO is also aware that, despite all efforts, mistakes cannot be completely avoided. The last two of the five HRO principles focus primarily on coping with errors so that they do not develop into a crisis.

Commitment to resilience. Errors also occur in HROs. HROs do not try to be error-free. Instead, they are able to recover quickly from mistakes before a crisis arises and, if a crisis does occur, to get out of it quickly.

Deference to expertise. If an HRO is in a critical situation or even in a crisis, the decisions to deal with this situation are made “at the front” by the relevant experts and not necessarily by the management. Figuratively speaking, the hierarchy pyramid turns upside down. Once the crisis has been overcome, the hierarchy pyramid will normalize again.

Why become a High Reliability Organization?

The five HRO principles help a company to become a mindful organization that is able to focus on operational processes, detect weak signals, learn from irregularities, and manage potentially dangerous situations before they turn into a full-blown crisis. The resulting benefits are manifold. For example, losses in connection with a crisis – be it a production shutdown or a far-reaching scandal – can be avoided or limited, and a significant competitive advantage can be achieved by optimizing operational processes. If a company is perceived as an HRO from outside, this can have a positive impact on public perception and can also play a decisive role in the war for talent.

In conclusion, it can be said that the concept of the High Reliability Organization not only supports companies in high-risk areas, but basically supports all companies in successfully sustaining themselves in their increasingly complex environment and in creating a competitive advantage.

February 12, 2020
How good reporting makes your organisation safer

A reporting system combined with a positive reporting culture enables a company to learn from incidents and reduces the likelihood of further incidents or even accidents. But the way towards a good reporting culture is challenging and has many stumbling blocks. It is about addressing fears and creating trust. In this way, the organization can become safer and more productive in the long term with the positive effects of its reporting system and reporting culture.

Today, numerous companies are subject to regulatory requirements to have an incident reporting system. A corresponding tool and the associated process can be introduced relatively quickly, which often means that the regulatory requirements are met. But even if a reporting system is physically available, this does not say anything about the reporting culture and thus about the quality of the reporting system.

In this article, I will look into the necessity and the advantages of a positive reporting culture and the prerequisites for it.

Of course, an incident reporting system is not only useful for companies that are legally obliged to have one. Every organization benefits from a good-quality reporting system and a positive reporting culture, because it becomes visible what is happening in the organization, and weaknesses can be identified and remedied. Depending on the industry, this leads to an increased level of safety and trust among stakeholders, or to higher efficiency and a decrease in production losses. With a positive reporting culture, a significant competitive advantage can be achieved. Furthermore, a reporting system is an important management tool for senior management.

Companies are complex socio-technical systems, and one property of such systems is that it is not possible to know, let alone understand, all of the interactions within the organization. With reports directly from inside the system, that is, from employees at all levels, an organization receives important information from the various areas. These reports can provide information about existing processes, risks, established standards, hidden or open deviations from processes, uncertainties among employees, and so on. Such information enables the organization to identify weaknesses and thus to continuously improve and develop.

For this it is important to get away from the attitude that humans are the weakest link in the chain and that mistakes are seen as weaknesses. Rather, it is helpful to consider that employees come to work to do a good job. If mistakes happen, it is important to understand why this action made sense to the person in this specific situation. If the employee is just punished for their mistake or – in an extreme case – excluded from the organization, the system has not learned anything from the mistake, and it is just a matter of time until some other employee makes the same mistake.

The first step is the introduction of a reporting process, which is supported by a more or less extensive IT tool, depending on the situation. Unfortunately, many organizations fail in the following step of establishing a positive reporting culture. Reasons for failing are insufficient employee trust in the organization, fear of suffering negative consequences, or doubts about the effectiveness of incident reports.

Employee trust

At the beginning, voluntarily reporting errors is a difficult task for many employees. Often our first reflex is to look around to see whether someone has noticed our mistake. We would prefer to sweep the mistake under the carpet, especially when nothing has happened, which fortunately is the case with most mistakes. We do so partly out of shame and partly out of fear of negative consequences, be it direct consequences for our employment or negative reactions from our colleagues. Reporting an error means making yourself vulnerable to others – superiors and colleagues – and requires a high level of trust.

Even when it is a colleague’s mistake that is noticed, the hesitation to report it is often great. The mistake may be a manager’s, or you may not want to be perceived as someone who denounces others. There may also be the fear that someone else might in turn report your own mistakes.

In order to be able to deal with errors openly, employees must have a high degree of trust in the organization and in all of its members. It is important that errors are reported solely to improve the system and that the report does not contain any hidden personal agenda. Employees need to know that they will be valued and not punished for their reports.

Anonymous reporting

Of course, an organization may substitute anonymity for trust. However, no organization can fully guarantee this anonymity. Furthermore, anonymity significantly limits the organization’s ability to learn from a report, since the reporter cannot be asked follow-up questions and a detailed analysis of the report is hardly possible. As a first step, such a substitution may make sense, but it should not stay that way. In a sustainable, positive reporting culture, anonymity hinders the organization’s ability to learn.

Doubts about the effectiveness of a reporting process

The step of submitting a report is not only associated with the expectation that the report will not have a negative effect on the reporter, but also that the report will be taken seriously. If there is no feedback from the reporting process, the reports will soon be viewed as meaningless and a waste of time. The number of reports will decrease.

For the credibility of the reporting system, it is essential that the reporter receives feedback on the report. This can be a comprehensible reason why the report is not being followed up, or information about what will be done with the report. When changes are introduced on the basis of a report, it is also important to link them in the communication to the corresponding report and, in a mature reporting culture, even to the reporter themselves. This further underlines the meaningfulness of the reporting system.
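
For organizations that support the process with an IT tool, the two requirements above (every report gets feedback, and every resulting change is traced back to its report) can be sketched as follows. This is a minimal Python illustration, not a description of any existing reporting tool. The `Report` fields and both helper functions are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Report:
    """A single incident report (hypothetical, simplified structure)."""
    report_id: int
    summary: str
    feedback: Optional[str] = None          # reply given to the reporter, if any
    resulting_change: Optional[str] = None  # change introduced because of the report


def awaiting_feedback(reports: List[Report]) -> List[Report]:
    """Return all reports whose reporter has not yet received feedback."""
    return [r for r in reports if not r.feedback]


def change_notice(report: Report) -> Optional[str]:
    """Build a notice that links an introduced change to its originating report."""
    if report.resulting_change is None:
        return None
    return (
        f"Change '{report.resulting_change}' was introduced "
        f"on the basis of report #{report.report_id}."
    )


# Assumed example data.
reports = [
    Report(1, "Unclear handover checklist", feedback="Checklist will be revised.",
           resulting_change="revised handover checklist"),
    Report(2, "Alarm not audible in workshop"),
]
print([r.report_id for r in awaiting_feedback(reports)])  # [2]
print(change_notice(reports[0]))
```

Whether the reporter is named in such a notice is deliberately left out of the sketch, since that depends on the maturity of the reporting culture and on the reporter’s consent.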

What needs to be reported?

Reporting is often limited to reports of events that have already occurred. These are events that must be reported in accordance with the reporting process, or events where reporting is voluntary, i.e. at the discretion of the employee. These backward-looking reports are without any doubt useful and help companies to make adjustments to the system to make it more robust based on past events.

It often happens that people in the organization say that they have seen this event coming for a long time and that it was only a matter of time for the event to happen. It could have been avoided proactively. For this reason, it is important to give employees the opportunity to express concerns and fears in the reporting system. Optimally, the employees also actively participate in problem solving through a suggestion for improvement.

Conclusion

The introduction of an effective reporting system is a challenge that should not be underestimated. The implementation of the process and, if necessary, an IT tool is followed by the demanding cultural change within the organization. To achieve this change, the numerous employee fears that initially stand in its way must be addressed. If employees can be shown in a top-down approach that they can trust the organization and that the newly implemented reporting system will benefit everyone, the ideal conditions are created for a cultural change towards a positive reporting culture.

February 12, 2020