Being a Short Treatise on the Nature of Failure; How Failure is Evaluated; How Failure is Attributed to Proximate Cause; and the Resulting New Understanding of Patient Safety
Richard I. Cook, MD
Cognitive Technologies Laboratory
University of Chicago
-
Complex systems are intrinsically hazardous systems. All of the interesting systems (e.g. transportation, healthcare, power generation) are inherently and unavoidably hazardous by their own nature. The frequency of hazard exposure can sometimes be changed but the processes involved in the system are themselves intrinsically and irreducibly hazardous. It is the presence of these hazards that drives the creation of defenses against hazard that characterize these systems.
-
Complex systems are heavily and successfully defended against failure. The high consequences of failure lead over time to the construction of multiple layers of defense against failure. These defenses include obvious technical components (e.g. backup systems, ‘safety’ features of equipment) and human components (e.g. training, knowledge) but also a variety of organizational, institutional, and regulatory defenses (e.g. policies and procedures, certification, work rules, team training). The effect of these measures is to provide a series of shields that normally divert operations away from accidents.
-
Catastrophe requires multiple failures – single point failures are not enough. The array of defenses works. System operations are generally successful. Overt catastrophic failure occurs when small, apparently innocuous failures join to create opportunity for a systemic accident. Each of these small failures is necessary to cause catastrophe but only the combination is sufficient to permit failure. Put another way, there are many more failure opportunities than overt system accidents. Most initial failure trajectories are blocked by designed system safety components. Trajectories that reach the operational level are mostly blocked, usually by practitioners.
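The arithmetic behind this point can be made concrete with a toy probability sketch. The sketch below is not part of the treatise; the number of defense layers, their failure probabilities, and the assumption that layers fail independently are all invented for illustration only.

```python
import random

# Toy model of defense-in-depth (illustrative assumptions, not from the text).
random.seed(0)

N_TRIALS = 1_000_000                         # failure opportunities (initiating events)
LAYER_HOLE_PROB = [0.05, 0.05, 0.05, 0.05]   # assumed chance each defense layer fails to block

accidents = 0
for _ in range(N_TRIALS):
    # An overt accident occurs only when every layer fails on the same trajectory.
    if all(random.random() < p for p in LAYER_HOLE_PROB):
        accidents += 1

print(f"failure opportunities: {N_TRIALS}")
print(f"overt accidents:       {accidents}")  # on the order of N_TRIALS * 0.05**4, i.e. a handful
```

Even with these made-up numbers, a million failure opportunities yield only a handful of overt accidents; this is the sense in which single point failures are not enough.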
-
Complex systems contain changing mixtures of failures latent within them. The complexity of these systems makes it impossible for them to run without multiple flaws being present. Because these are individually insufficient to cause failure they are regarded as minor factors during operations. Eradication of all latent failures is limited primarily by economic cost but also because it is difficult before the fact to see how such failures might contribute to an accident. The failures change constantly because of changing technology, work organization, and efforts to eradicate failures.
-
Complex systems run in degraded mode. A corollary to the preceding point is that complex systems run as broken systems. The system continues to function because it contains so many redundancies and because people can make it function, despite the presence of many flaws. After-accident reviews nearly always note that the system has a history of prior ‘proto-accidents’ that nearly generated catastrophe. Arguments that these degraded conditions should have been recognized before the overt accident are usually predicated on naïve notions of system performance. System operations are dynamic, with components (organizational, human, technical) failing and being replaced continuously.
-
Catastrophe is always just around the corner. Complex systems possess potential for catastrophic failure. Human practitioners are nearly always in close physical and temporal proximity to these potential failures – disaster can occur at any time and in nearly any place. The potential for catastrophic outcome is a hallmark of complex systems. It is impossible to eliminate the potential for such catastrophic failure; the potential for such failure is always present by the system’s own nature.
-
Post-accident attribution to a ‘root cause’ is fundamentally wrong. Because overt failure requires multiple faults, there is no isolated ‘cause’ of an accident. There are multiple contributors to accidents. Each of these is necessarily insufficient in itself to create an accident. Only jointly are these causes sufficient to create an accident. Indeed, it is the linking of these causes together that creates the circumstances required for the accident. Thus, no isolation of the ‘root cause’ of an accident is possible. The evaluations based on such reasoning as ‘root cause’ do not reflect a technical understanding of the nature of failure but rather the social, cultural need to blame specific, localized forces or events for outcomes.¹

¹ Anthropological field research provides the clearest demonstration of the social construction of the notion of ‘cause’ (cf. Goldman L (1993), The Culture of Coincidence: accident and absolute liability in Huli, New York: Clarendon Press; and also Tasca L (1990), The Social Construction of Human Error, Unpublished doctoral dissertation, Department of Sociology, State University of New York at Stonybrook).
-
Hindsight biases post-accident assessments of human performance. Knowledge of the outcome makes it seem that events leading to the outcome should have appeared more salient to practitioners at the time than was actually the case. This means that ex post facto accident analysis of human performance is inaccurate. The outcome knowledge poisons the ability of after-accident observers to recreate the view that practitioners had of those same factors before the accident. It seems that practitioners “should have known” that the factors would “inevitably” lead to an accident.² Hindsight bias remains the primary obstacle to accident investigation, especially when expert human performance is involved.

² This is not a feature of medical judgements or technical ones, but rather of all human cognition about past events and their causes.
-
Human operators have dual roles: as producers & as defenders against failure. The system practitioners operate the system in order to produce its desired product and also work to forestall accidents. This dynamic quality of system operation, the balancing of demands for production against the possibility of incipient failure, is unavoidable. Outsiders rarely acknowledge the duality of this role. In non-accident filled times, the production role is emphasized. After accidents, the defense against failure role is emphasized. At either time, the outsider’s view misapprehends the operator’s constant, simultaneous engagement with both roles.
-
All practitioner actions are gambles. After accidents, the overt failure often appears to have been inevitable and the practitioner’s actions as blunders or deliberate willful disregard of certain impending failure. But all practitioner actions are actually gambles, that is, acts that take place in the face of uncertain outcomes. The degree of uncertainty may change from moment to moment. That practitioner actions are gambles appears clear after accidents; in general, post hoc analysis regards these gambles as poor ones. But the converse, that successful outcomes are also the result of gambles, is not widely appreciated.
-
Actions at the sharp end resolve all ambiguity. Organizations are ambiguous, often intentionally, about the relationship between production targets, efficient use of resources, economy and costs of operations, and acceptable risks of low and high consequence accidents. All ambiguity is resolved by actions of practitioners at the sharp end of the system. After an accident, practitioner actions may be regarded as ‘errors’ or ‘violations’ but these evaluations are heavily biased by hindsight and ignore the other driving forces, especially production pressure.
-
Human practitioners are the adaptable element of complex systems. Practitioners and first line management actively adapt the system to maximize production and minimize accidents. These adaptations often occur on a moment-by-moment basis. Some of these adaptations include: (1) Restructuring the system in order to reduce exposure of vulnerable parts to failure. (2) Concentrating critical resources in areas of expected high demand. (3) Providing pathways for retreat or recovery from expected and unexpected faults. (4) Establishing means for early detection of changed system performance in order to allow graceful cutbacks in production or other means of increasing resiliency.
-
Human expertise in complex systems is constantly changing. Complex systems require substantial human expertise in their operation and management. This expertise changes in character as technology changes but it also changes because of the need to replace experts who leave. In every case, training and refinement of skill and expertise is one part of the function of the system itself. At any moment, therefore, a given complex system will contain practitioners and trainees with varying degrees of expertise. Critical issues related to expertise arise from (1) the need to use scarce expertise as a resource for the most difficult or demanding production needs and (2) the need to develop expertise for future use.
-
Change introduces new forms of failure. The low rate of overt accidents in reliable systems may encourage changes, especially the use of new technology, to decrease the number of low consequence but high frequency failures. These changes may actually create opportunities for new, low frequency but high consequence failures. When new technologies are used to eliminate well understood system failures or to gain high precision performance they often introduce new pathways to large scale, catastrophic failures. Not uncommonly, these new, rare catastrophes have even greater impact than those eliminated by the new technology. These new forms of failure are difficult to see before the fact; attention is paid mostly to the putative beneficial characteristics of the changes. Because these new, high consequence accidents occur at a low rate, multiple system changes may occur before an accident, making it hard to see the contribution of technology to the failure.
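A back-of-the-envelope expected-harm calculation illustrates the trade this point describes. The figures below are invented purely for illustration and do not come from the treatise; they only show how removing frequent, minor failures can still raise total expected harm once a rare, severe failure mode is introduced.

```python
# Illustrative sketch with made-up numbers (not from the original text).

# Before a technology change: frequent but minor failures.
old_rate, old_severity = 200.0, 1.0    # failures per year, harm per failure
# After the change: the minor failures are gone, but a rare catastrophic mode appears.
new_rate, new_severity = 0.5, 500.0    # failures per year, harm per failure

print("expected harm per year, before:", old_rate * old_severity)  # 200.0
print("expected harm per year, after: ", new_rate * new_severity)  # 250.0
```

Because the new failure mode surfaces only rarely, the after-change system would look safer than the old one for years at a time, which is exactly why such changes are hard to evaluate before the fact.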
-
Views of ‘cause’ limit the effectiveness of defenses against future events. Post-accident remedies for “human error” are usually predicated on obstructing activities that can “cause” accidents. These end-of-the-chain measures do little to reduce the likelihood of further accidents. In fact, the likelihood of an identical accident is already extraordinarily low because the pattern of latent failures changes constantly. Instead of increasing safety, post-accident remedies usually increase the coupling and complexity of the system. This increases the potential number of latent failures and also makes the detection and blocking of accident trajectories more difficult.
-
Safety is a characteristic of systems and not of their components. Safety is an emergent property of systems; it does not reside in a person, device or department of an organization or system. Safety cannot be purchased or manufactured; it is not a feature that is separate from the other components of the system. This means that safety cannot be manipulated like a feedstock or raw material. The state of safety in any system is always dynamic; continuous systemic change ensures that hazard and its management are constantly changing.
-
People continuously create safety. Failure free operations are the result of activities of people who work to keep the system within the boundaries of tolerable performance. These activities are, for the most part, part of normal operations and superficially straightforward. But because system operations are never trouble free, human practitioner adaptations to changing conditions actually create safety from moment to moment. These adaptations often amount to just the selection of a well-rehearsed routine from a store of available responses; sometimes, however, the adaptations are novel combinations or de novo creations of new approaches.
-
Failure free operations require experience with failure. Recognizing hazard and successfully manipulating system operations to remain inside the tolerable performance boundaries requires intimate contact with failure. More robust system performance is likely to arise in systems where operators can discern the “edge of the envelope”. This is where system performance begins to deteriorate, becomes difficult to predict, or cannot be readily recovered. In intrinsically hazardous systems, operators are expected to encounter and appreciate hazards in ways that lead to overall performance that is desirable. Improved safety depends on providing operators with calibrated views of the hazards. It also depends on providing calibration about how their actions move system performance towards or away from the edge of the envelope.
Other Materials
-
Cook, Render, & Woods (2000). Gaps in the continuity of care and progress on patient safety. British Medical Journal 320: 791-794.
-
Woods & Cook (1999). Perspectives on Human Error: Hindsight Biases and Local Rationality. In Durso, Nickerson, et al., eds., Handbook of Applied Cognition. New York: Wiley, pp. 141-171.
-
Woods & Cook (1998). Characteristics of Patient Safety: Five Principles that Underlie Productive Work. Chicago: CtL.
-
Cook & Woods (1994). Operating at the Sharp End: The Complexity of Human Error. In MS Bogner, ed., Human Error in Medicine. Hillsdale, NJ, pp. 255-310.
-
Woods, Johannesen, Cook, & Sarter (1994). Behind Human Error: Cognition, Computers and Hindsight. Wright-Patterson AFB: CSERIAC.
-
Cook, Woods, & Miller (1998). A Tale of Two Stories: Contrasting Views of Patient Safety. Chicago, IL: NPSF.