Proceedings of HF 2002, Nov. 25-27, 2002, Melbourne, Australia

Analysis and Visualisation of Complex Behavioural Data: A Case Study of Disturbance Management in Anaesthesia

K.N.Keogh 

Department of Computer Science and Software Engineering

The University of Melbourne,

Parkville, 3052 Australia

 kathleen@cs.mu.oz.au

 

      E.A.Sonenberg

 Department of Information Systems

The University of Melbourne,

 Parkville, 3052 Australia

 l.sonenberg@dis.unimelb.edu.au

 

 

Keywords:  Complex data analysis, cognitive modelling

Abstract 

 We have studied the reasoning of anaesthetists involved in resolving a critical incident. This reasoning incorporates the diagnosis of disturbances simultaneously with therapeutic patient management activities. The cognitive behavioural data and activity representing each scenario is dense and rich with meaning although difficult to conceptualise as raw tabular traces of actions and events occurring over time. We have developed an approach for the transformation of our raw temporal data into novel visual representations that facilitate the interpretation of the data and enable comparative analysis of model/subject performance during cognitive modelling and in the assessment of subjects’ performance during training exercises. We make some preliminary observations as to how the approach illustrated here could be adapted to other dynamic domains.

1.      Introduction 

Studies in cognitive modelling often involve complex and dynamic processes that may be influenced by external actions as well as internal change. Domains of interest include critical care, military aviation and industrial (chemical and nuclear power) processing plants. Disturbance management defines the class of problems in which one or more faults interfere with a process that is dynamically changing. An anaesthetist’s management of critical incidents occurring on the operating table is an example of disturbance management. The environment in the operating theatre is busy, noisy and complex (Watson and Sanderson, 1998). Anaesthesia is a complex medical field. The expertise required to cope well during the occasional ‘moments of terror’ that occur in the operating theatre is not easily defined (Gaba, 1994). Training anaesthetists to develop this crisis management expertise has involved the use of realistic simulators and acted-out crises involving realistic teams of clinicians and nurses, using role play and verbal simulations (Gaba, Fish and Howard, 1994). Educators generally agree that self-reflection and assessment of performance are important in teaching environments to facilitate learning. An ability to assess performance beyond the outcome - patient survival - may provide more in-depth training feedback. Being able to quantitatively answer the question: ‘How well did I manage that crisis?’ may provide more insights than the answer to the critical question: ‘Did I provide an appropriate outcome for the patient?’ The latter question may be answered in simple terms by examining the final state of the patient. The former question opens up a broader investigation toward an examination of the clinician’s behaviour, actions and decisions undertaken during the exercise.

In this paper, we introduce methods to transform raw temporal behavioural data into visual ‘pictures’ of performance, enabling one to distil a ‘story’ from data that is otherwise difficult to represent. Descriptive metrics derived from the Event Matrices that we present enable numeric comparison of behaviour between subjects and executable cognitive models. We have used these tools in the iterative development of a (partial) computational model of disturbance management in anaesthesia (Keogh, Sonenberg and Lawrence, 1995) and have demonstrated their usefulness in identifying patterns in the data and illustrating differences in subject behaviour (Keogh, 2002). In the final section below, we make some preliminary observations as to how the approach illustrated here could be adapted to other dynamic domains.

2.      Background

2.1.     Anaesthesiology

The focus of the work reported here is a body of protocol data of clinician performance with simulated patients (Sonenberg, Lawrence and Zelcer, 1992, 1993). This rich data describes the management of patients having trouble under anaesthetic and needing urgent attention. The data provides some insights into the hypotheses and reasoning of the anaesthetists.  Characteristics of the domain are: high risk, uncertainty, time pressure, complexity and incomplete data in a dynamic environment. Individual patients respond differently in the same environment and the patient state is changing constantly, not only in response to actions by the anaesthetist, but due to physiological changes. The anaesthetist does not have time to consider all the alternatives. At any point in time, a decision must be made based on a sub-selection of data chosen to represent the patient state. The anaesthetist needs to share their attention between therapeutic actions to keep the patient alive and diagnostic reasoning and actions to resolve the difficulties which arise and that may be due to multiple causes. Gaba (1992, 1995) suggests that the most important skills needed in such a complex environment are the meta-level management skills of attention control and resource management.

2.2.     The data

Sonenberg, Lawrence and Zelcer (1992, 1993) developed a Macintosh-based patient simulator, the Poor Ventilation Simulator (PVS), which was used to produce patient data representative of ventilation problems that might occur soon after a patient has been anaesthetised and endotracheal intubation has been performed. PVS was built on simple heuristics that captured the patterns of patient degeneration in these situations quite credibly (Sonenberg, Lawrence and Zelcer, 1992). PVS enabled the same incident to be presented to multiple subjects and automatically recorded the anaesthetists' actions whilst managing the problem. The incidents simulated by PVS involved difficulties with patient ventilation and presented symptoms that may have had multiple explanations. The anaesthetist needed to consider multiple hypotheses simultaneously. During the simulation, the anaesthetist was asked to manage ‘the patient’ whilst the simulation was driven by the experimental team.

Three converging data sources were used to obtain evidence of the subjects' knowledge structures and steps in decision making whilst managing the problem: a computer-generated (PVS) record of actions taken and the clinical context in which they occurred; a verbalised record of the problem-solving session; and a verbalised record of a stimulated recall session immediately following the simulation exercise. The concurrent think-aloud record of the session was used as the primary source of data for our analysis, although the stimulated recall data was also used. This data, representative of the anaesthetists’ cognitive and actual behaviour, was further coded into hypothesis categories and action categories as discussed in section 3. This coding was performed by the researchers with some independent informal verification by a domain expert.

Seven subjects were given the problem scenario simulating Oesophageal Intubation (the tube carrying gases is accidentally inserted into the oesophagus instead of the trachea). Five subjects successfully diagnosed the problem. The subjects’ experience as anaesthetists ranged from 4 to 19 years. Below we discuss two subjects in detail to illustrate the power of the data visualisation techniques. Subject A took 4 minutes and 14 seconds to stabilise the patient but decided on an incorrect diagnosis. Subject B took 1 minute and 38 seconds to correctly diagnose the problem and stabilise the patient.

3.      Distilling the story from the data

3.1.     The raw data

The raw data was transcribed from the audio-tape into time segments aligning the think-aloud protocol with the retrospective protocol and the values for visible patient data available from a log file. A sample of the integrated output log from subject A is shown in Figure 1. The columns show the values of clinical data as displayed on the screen during the simulation, the think aloud protocol, a summary of events in terms of IRA (Information, Reasoning, Action) and the stimulated recall protocol.

Figure 1. Partial raw data for subject A

When the data was presented in its raw tabular form as shown in Figure 1, several pages of output were created for each subject. In this format, it was not easy to compare subjects or to gain an understanding of how each subject performed. This provided some motivation for exploring other forms of representation of the subjects’ reasoning processes.

Subject A failed to recognise the problem. This subject performed the highest number of different actions and monitored the patient’s overall condition less frequently than other subjects. Subject A considered and tested more hypotheses than other subjects in the study and, at one stage, neglected to review the general patient state for three minutes and fourteen seconds. This was a long period compared with the mean latency between general patient checks across all the successful subjects of just twenty-seven seconds. Examining subject A’s reasoning data in a table similar to Figure 1 does not highlight any of these observations. However, building an Event Matrix (Streufert and Nogami, 1997), adapted to suit the data available, enabled us to easily generate metrics describing the actions performed and the reasoning tasks that seem to have occurred.

Cognitive events (diagnostic reasoning, including data abstraction and hypothesis generation) were noted when results of reasoning, such as clinical findings and hypotheses, were verbalised in the protocol data. The cognitive task of abstraction involves a transformation of clinical data cues to more general findings. For example, the raw data ‘pressure 27, O2Sat 96’ might be transformed to “pressure is fine”, “sats are low”. When a hypothesis was verbalised, we assumed that hypothesis to be the outcome of some abductive reasoning that occurred at approximately the same time. We categorised this cognitive event as hypothesis generation. The protocol data, together with the stimulated recall, provided us with details regarding hypotheses and when they were eliminated. All other events were action events, for example: inspect patient, get patient data, listen for bronchospasm, hand ventilate. Action events include diagnostic data gathering tasks and therapeutic tasks.

3.2.     Event Matrices

We have adapted the work of Streufert and Nogami (1997) to suit our data by omitting (unavailable) strategic or planning data and including hypotheses generated as cognitive events. We found Event Matrices and the associated metrics to be a useful tool for describing and presenting our data. Our intention in creating these visual representations was to make evident the interaction of cognitive events with actions (diagnostic and therapeutic). The Event Matrix for subject A is shown in Figure 2.

Figure 2. Time-Event Matrix for subject A.

Event Matrices were drawn for each of the seven subjects in our study and comparative generic metrics were calculated. These metrics provided the opportunity to compare the activity levels of each subject during the simulation. The metrics calculated were:

·         Number of actions performed

·         Number of initiative actions (driven by diagnostic reasoning (focused actions) or a strategic decision to interrupt a reasoning task in order to monitor the patient (unfocused actions))

·         Number of respondent actions (a response to prior data/information)

·         Breadth of functioning, that is, the number of unique action types considered during the scenario

·         Number of hypotheses considered during the scenario.
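As an illustration only, the five metrics listed above could be computed from a coded event log along the following lines. The Event record, its field names and the sample events are assumptions made for this sketch; they are not the coding scheme or the data used in the study. In particular, counting hypotheses as distinct labels is just one possible reading of "number of hypotheses considered".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical coded event; field names are assumptions for this sketch."""
    time_s: int      # seconds from the start of the scenario
    kind: str        # "action" or "cognitive"
    label: str       # event type, e.g. "check vitals"
    role: str = ""   # actions only: "initiative" or "respondent"


def event_metrics(events):
    """Return the five descriptive metrics for one subject's coded events."""
    actions = [e for e in events if e.kind == "action"]
    # Assumption: hypothesis events carry a "hypothesis:" prefix in their label.
    hypotheses = {
        e.label
        for e in events
        if e.kind == "cognitive" and e.label.startswith("hypothesis:")
    }
    return {
        "actions": len(actions),
        "initiative": sum(1 for e in actions if e.role == "initiative"),
        "respondent": sum(1 for e in actions if e.role == "respondent"),
        "breadth": len({e.label for e in actions}),  # unique action types
        "hypotheses": len(hypotheses),
    }


# Invented sample log, not taken from any subject in the study.
sample_events = [
    Event(0, "action", "check vitals", "initiative"),
    Event(10, "cognitive", "abstraction: sats are low"),
    Event(15, "cognitive", "hypothesis: kinked tube"),
    Event(20, "action", "get data", "initiative"),
    Event(30, "action", "long squeeze", "respondent"),
    Event(40, "action", "check vitals", "initiative"),
    Event(45, "cognitive", "hypothesis: oesophageal intubation"),
]

sample_metrics = event_metrics(sample_events)
print(sample_metrics)
```

Because repeated actions such as check vitals count towards the number of actions but only once towards breadth, the two figures diverge as soon as any action is repeated.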

The event matrix was constructed by plotting a mark for each event against the event time at which it occurred. Each event type was assigned a unique integer to represent that event on the vertical axis. Time was shown on the horizontal axis. Events were spaced evenly on the horizontal axis without consideration of the time lag between events. It is therefore important to note that each tick mark along the X-axis represents the next event in sequence, rather than a specific period. A scenario with many events will have more tick marks than a scenario with few events, regardless of elapsed time. The distinction between cognitive reasoning and action was achieved by placing cognitive events on the positive Y-axis and action events on the negative Y-axis.
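The plotting rule just described can be sketched in a few lines. This is a minimal illustration under stated assumptions: the tuple layout, the function name and the choice to number event types in order of first appearance are ours for the purpose of the example, and the sample events are invented.

```python
def event_matrix_points(events):
    """Map (time_s, kind, label) events to (x, y) plot coordinates.

    x is the position of the event in the sequence (1, 2, 3, ...), so
    events are evenly spaced regardless of the time between them.
    y is the integer code of the event type: positive for cognitive
    events, negative for action events.
    """
    # Assumption: codes are assigned per kind, in order of first appearance.
    codes = {"cognitive": {}, "action": {}}
    points = []
    ordered = sorted(events, key=lambda e: e[0])  # chronological order
    for x, (_time_s, kind, label) in enumerate(ordered, start=1):
        table = codes[kind]
        code = table.setdefault(label, len(table) + 1)
        points.append((x, code if kind == "cognitive" else -code))
    return points


# Invented events; note the 80 second gap between the third and fourth.
example_events = [
    (0, "action", "check vitals"),
    (12, "cognitive", "abstraction"),
    (15, "cognitive", "hypothesis generation"),
    (95, "action", "get data"),
    (100, "action", "check vitals"),
]

example_points = event_matrix_points(example_events)
print(example_points)
```

In this example the 80 second gap and the 3 second gap each occupy a single step on the horizontal axis, which is the property the text above asks the reader to keep in mind.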

The matrix for subject A shows 20 action events and 27 cognitive events. Related events were linked by arrowed lines from the first event to the related event that followed. For example, each of the five abstraction events has an arrow pointing to it from the corresponding action (check vitals or get data) that generated the data leading to the abstraction. Similarly, respondent actions were linked by an arrowed line from the first action to the second action that was performed in response. In Figure 2, the respondent action - long squeeze: ETCO2 - at time 2:44 is performed in response to the action get data at time 2:21. The (unfocused) initiative action - check vitals - occurs four times.

The generic categorisation of actions as either initiative or respondent corresponded naturally to reasoning-driven or data-driven actions, respectively, in the anaesthesia domain. Categorising the action events in this way enabled comparison between subjects’ management of the crisis in terms of how they reacted to data and how often diagnostic or strategic reasoning led them to initiative actions. As some actions were repeated during each simulation experiment, the total number of actions including repeats and the number of unique actions were counted separately. Subject A performed 20 action events: 18 were initiative actions and 2 were respondent actions. In total, 13 unique actions were performed. Compare this to the metrics for subject B: 11 action events, 8 initiative actions, 3 respondent actions and 5 unique actions. This comparison makes it clear that subject A performed many more initiative actions (i.e. actions based on reasoning) than subject B. In fact, subject A performed the highest number of unique actions of all subjects in this study, and monitored the patient’s overall condition less frequently than other subjects. It was possible to make such observations easily and objectively by comparing the metrics for all subjects in the study.

Deciding when to act, when to give attention to new patient vitals and when to devote attention to diagnostic reasoning requires skill. The absence of attention control skills in disturbance management can introduce errors that reduce the chance of success (Woods, 1994). An unfocused initiative action of particular interest to this study was ‘check vitals’: the task of observing the current patient status (checking vital patient cues). This action is clearly tied to an active decision to interrupt diagnostic reasoning to monitor the patient. It was based not so much on diagnostic reasoning as on a strategic decision to interrupt that reasoning. Deciding when to interrupt reasoning in order to update the current knowledge can be part of a management strategy that can influence the successful resolution of the critical incident (Gaba, 1994). Subject A had a mean period of 1 minute 26 seconds between check-vitals tasks. By comparison, subject B had a mean period of 35 seconds between check-vitals tasks. Subject A did not review the patient vitals as often as other subjects and overlooked serious deterioration in the patient whilst engaged in diagnostic reasoning and in focused initiative actions generating data requests based on that reasoning. Employing a different strategy - interrupting the diagnostic reasoning activity to monitor vital patient cues more often - may have led to a more successful outcome for subject A. Subject A suffered from a lack of new data on which to base reasoning and generate new hypotheses. This subject engaged in more hypothesis-driven reasoning than reasoning based on data.
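The mean period between check-vitals tasks is simple to derive once the times of those tasks are known. The sketch below shows one way to do it, assuming times are recorded as "m:ss" strings; the function names and the sample times are hypothetical and are not taken from either subject's record.

```python
def to_seconds(clock):
    """Convert an 'm:ss' or 'mm:ss' time stamp to seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


def check_gaps(check_times):
    """Return (mean gap, longest gap) in seconds between successive checks."""
    ordered = sorted(to_seconds(t) for t in check_times)
    if len(ordered) < 2:
        raise ValueError("need at least two checks to measure a gap")
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps), max(gaps)


# Invented check-vitals times, for illustration only.
hypothetical_checks = ["0:10", "0:30", "1:10", "1:40"]
mean_gap, longest_gap = check_gaps(hypothetical_checks)
print(mean_gap, longest_gap)
```

Reporting the longest gap alongside the mean is a choice made for this sketch: a single long period without a check can be hidden by an otherwise unremarkable mean.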

Analysis of the data toward describing the strategic behaviour of each subject necessarily focused on the cognitive events in addition to the actions performed. Transforming the Time-Event Matrix format into Event Maps, as described in the next section, revealed a visual map of the patterns in the cognitive behaviour of each subject, in line with the tasks performed. By following the peaks from left to right, a visual time trace of decisions made and actions chosen was produced.

3.3.     Event Maps

Event Maps were created by connecting each plotted point on the Time-Event Matrices and considering the shape of the area enclosed by the resulting graph and axes.

 Figure 3 shows the Event Map for subject A.

Figure 3. Event Map for subject A

The Event Maps served to provide a useful picture of the behaviour of subjects that allowed for a quick visual comparison between subjects. Visual comparison of these Event Maps for each subject revealed interesting observations. Activity above the horizontal axis represents cognitive tasks and activity below represents action. It was possible to look at an Event Map and immediately get a picture of how active a subject was, how many hypotheses were considered and for how long each hypothesis was maintained.

The shape of the Event Map above the X-axis indicates a pattern of hypothesis generation. When hypotheses were maintained over a period, there are plateaus rather than sharp peaks. The actual height of the peaks and the angles of the enclosing lines are not relevant; however, the number of peaks is of interest. For example, consider Figure 4, the Event Map for subject B. Four hypothesis plateaus appear at 00:52. One ends at 01:02 and another at 01:09, indicating that the hypotheses ventilator problem and circuit disconnected, respectively, were eliminated at these times. The other two plateaus extend a little longer, showing that two hypotheses were maintained as plausible: kinked tube and oesophageal intubation. At 01:21 a new (related) hypothesis, tube in wrong place, appears, and a test performed at 01:38 confirms the final hypothesis: oesophageal intubation.
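The plateau reading of subject B's Event Map can be restated as a set of time intervals, one per hypothesis, from which the hypotheses alive at any moment follow directly. The sketch below uses the times quoted above. The text does not say when kinked tube or tube in wrong place were dropped, so their end times are left open (None) and treated as lasting to the end of the scenario; that treatment, like the function and variable names, is an assumption of this example.

```python
def to_seconds(clock):
    """Convert an 'mm:ss' time stamp to seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


def active_at(intervals, clock):
    """Return the sorted names of hypotheses under consideration at a time."""
    t = to_seconds(clock)
    return sorted(
        name
        for name, (start, end) in intervals.items()
        if start <= t and (end is None or t < end)
    )


# Start and end times (seconds) as read from the description of Figure 4.
# None marks an end time that the text does not state (assumed open).
subject_b = {
    "ventilator problem": (to_seconds("00:52"), to_seconds("01:02")),
    "circuit disconnected": (to_seconds("00:52"), to_seconds("01:09")),
    "kinked tube": (to_seconds("00:52"), None),
    "oesophageal intubation": (to_seconds("00:52"), None),
    "tube in wrong place": (to_seconds("01:21"), None),
}

for clock in ("00:55", "01:05", "01:15", "01:25"):
    print(clock, active_at(subject_b, clock))
```

A plateau on the Event Map corresponds to an interval that spans several events, whereas a sharp peak corresponds to an interval that starts and ends between neighbouring events.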

In our analysis, the Event Maps for both unsuccessful subjects shared some characteristics in shape that differed from those of other subjects. Comparing the Event Maps for each subject, the Event Map for subject A shows more activity points than that for subject B. Subject A’s Event Map has more peaks, particularly below the X-axis, due to the higher number of tasks performed. The count of initiative actions was much higher for subject A than for subject B, probably indicating more hypothesis-driven reasoning, which is not a strategy associated with diagnostic expertise (Patel, Groen, Ramoni and Kaufman, 1992). By performing ‘extra’ tasks, the activity workload (mental and physical) for subject A was higher than that for subject B. Both unsuccessful subjects in our sample performed more initiative actions and had a higher number of peaks on their Event Maps than the successful subjects. Subject A’s Event Map shows a peak appearing and then being repeated later in time, rather than being maintained as a plateau. This indicates that rather than keeping alive a plausible set of hypotheses, this subject dropped or eliminated hypotheses and then reconsidered them later.

Figure 4. Event Map for subject B

Event Maps do not indicate when a new related hypothesis is generated, for example, as a refinement of a previous more general hypothesis. This information is depicted in the Hypothesis Trees discussed in the next section.

3.4.     Hypothesis Trees

For a diagrammatic comparison of hypothesis generation that enabled the identification of related hypotheses, we designed Hypothesis Trees (Keogh, 2002). The Hypothesis Trees were drawn manually from the hypothesis generation data to complement the Event Matrices and Event Maps. Each time a hypothesis was considered, a node was drawn. Each level in the tree represented a new time in the scenario, so the root of the tree was the first hypothesis considered and, moving down the tree, child nodes represented subsequent hypotheses. Related, refined and reconsidered hypotheses were connected. While a hypothesis remained under consideration, its node was re-drawn at each subsequent time level and connected to the previous node representing the same hypothesis. When a hypothesis was refined, its node became the root of a sub-tree of related hypothesis nodes. Branches were left disconnected from higher branches in the tree if there was no obvious relationship between the previous hypotheses and newer hypotheses. Leaf nodes represented hypotheses that were eliminated. If a hypothesis was eliminated and later reconsidered, the node representing the time of reconsideration was drawn vertically beneath, and connected to, the node that represented the first consideration of that hypothesis. The highest-level leaf node (at the bottom of the tree) represented the final diagnosis. The set of hypotheses under consideration at any one time was positioned in line horizontally.
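Although the trees in this study were drawn by hand, the drawing rules above suggest a simple data structure. The following sketch is one possible encoding and not the procedure used in the study: each time level is given as a list of (hypothesis, refined_from) pairs, a node is linked to the most recent earlier node for the same hypothesis, or otherwise to the hypothesis it refines, and anything else starts an unconnected branch. Linking a reconsidered hypothesis to its most recent earlier node, rather than to its first, is a simplification. All names and the sample levels, including the root hypothesis, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    hypothesis: str
    level: int                                   # time level in the scenario
    parent: Optional["Node"] = field(default=None, repr=False)
    children: list = field(default_factory=list)


def build_tree(levels):
    """Build hypothesis-tree nodes from per-level (hypothesis, refined_from) pairs."""
    latest = {}   # hypothesis -> most recent node from an earlier level
    nodes = []
    for level, entries in enumerate(levels):
        current = {}
        for hypothesis, refined_from in entries:
            # Same hypothesis carried forward, else a refinement, else unconnected.
            parent = latest.get(hypothesis) or latest.get(refined_from)
            node = Node(hypothesis, level, parent)
            if parent is not None:
                parent.children.append(node)
            nodes.append(node)
            current[hypothesis] = node
        latest.update(current)
    return nodes


def level_widths(nodes):
    """Number of hypotheses under consideration at each time level."""
    counts = {}
    for node in nodes:
        counts[node.level] = counts.get(node.level, 0) + 1
    return [counts[level] for level in sorted(counts)]


# Invented scenario: one general hypothesis refined into four, then narrowed.
sample_levels = [
    [("ventilation problem", None)],
    [("ventilator problem", "ventilation problem"),
     ("circuit disconnected", "ventilation problem"),
     ("kinked tube", "ventilation problem"),
     ("oesophageal intubation", "ventilation problem")],
    [("kinked tube", None), ("oesophageal intubation", None)],
    [("oesophageal intubation", None)],
]

tree_nodes = build_tree(sample_levels)
widths = level_widths(tree_nodes)
leaves = [(n.hypothesis, n.level) for n in tree_nodes if not n.children]
print(widths)
print(leaves)
```

In this encoding the width of each level gives a numeric handle on whether a tree fans out and then narrows, and nodes with no parent mark the start of unconnected branches.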

The shape of the tree provides information regarding the hypothesis generation behaviour of the subject. Based on our visual inspection of the trees for all seven subjects in our sample, the successful subjects’ trees followed one of the following patterns:

·         tree spanned out from a root node to a subtree containing many (4-5) nodes at level 1. The tree then narrowed from a larger number of nodes (hypotheses) at level 1 (early in the scenario) to fewer nodes at higher levels (later in the scenario).

·         tree had only one or two unconnected branches.

To illustrate Hypothesis Trees further, we introduce two additional subjects to our discussion. Subject C was successful and reached a diagnosis in 2 minutes 52 seconds. Subject D was unsuccessful, taking 3 minutes and 35 seconds. The Hypothesis Trees of subjects B and C followed the first pattern, as shown in Figures 5 and 6. In contrast, the trees representing hypotheses considered by the unsuccessful subjects, A and D, contained unconnected nodes or longer single unconnected branches, each leading to a leaf node. These trees did not narrow to a smaller number of nodes late in the scenario. Figures 7 and 8 show the Hypothesis Trees for subjects A and D respectively.

In our analysis, the successful subjects seemed to follow strategy patterns such as constraining the problem space rather than expanding it and employing data-driven reasoning based on current data. Such strategies have been associated with expertise (Patel, Groen, Ramoni and Kaufman, 1992).  

Figure 5. Hypothesis Tree for subject B

Figure 6. Hypothesis Tree for subject C

Figure 7. Hypothesis Tree for subject A

Figure 8. Hypothesis Tree for subject D

4.      Conclusion

We have discussed the description, analysis and modelling of performance data in a complex dynamic domain. Because of the rapidly changing context and the number of actions involved, the data were difficult to conceptualise in tabular form. Event Matrices, Event Maps and Hypothesis Trees were introduced to represent the reasoning and events involved in each scenario, providing succinct visual presentations for more convenient and informative analysis and comparison of the data between subjects.

Based on our experience with data in the domain of Anaesthesia, we found a natural connection between the generic categorisation of actions as initiative or respondent adopted by Streufert and Nogami, and the concepts of reasoning-driven actions and data-driven actions identified separately in our domain (Sonenberg, Lawrence and Zelcer, 1992). These categorisations gave rise to insights in highlighting how our subjects managed to interleave reasoning with action. Event Maps create a map of the behaviour over time showing the interaction between reasoning and action, allowing for visual comparison between subjects which may suggest different approaches or strategies that each has adopted. Hypothesis Trees present the hypothesis data set as it is developed and refined to a solution. The shape of the resulting tree may help identify behavioural patterns in hypothesis generation. These visual approaches could be used to present similar complex dynamic data from any domain in which events may be grouped into categories of comparative interest, and in which hypotheses can be elicited from subjects in some form.

The analysis methods we describe could be used in similar work comparing subject behaviour to model behaviour, or in describing the behaviour of individuals involved in training exercises. Before embarking on such a mapping exercise, it is essential to draw on expert knowledge of the domain to identify, within the stream of available data on individual subject performance, the key facets whose representation will enable meaningful comparisons between subjects. While some elements are likely to have a domain-independent character (cf. the concepts of respondent and initiative actions in Streufert and Nogami's analysis), expert knowledge will be required to translate these to a specific domain, and will be essential in other aspects of the analysis exercise. Once these definitions are available, much useful work can be done without major expert input and, depending on the nature of the events that need to be recognised and tagged, at least some of this coding could potentially be automated.

5.      References

·         Gaba, D.M. (1992) Dynamic decision making in anesthesiology: Cognitive models and training approaches, in Evans, D.A. and Patel, V.L. (eds.) Advanced models of cognition for medical training and practice, (Springer-Verlag) pp. 123-147.

·         Gaba, D.M. and Howard, S.K. (1995) Situation awareness in anesthesiology, Human Factors, 37:20-31.

·         Gaba, D.M., Fish, K.J. and Howard, S.K. (1994) Crisis Management in Anesthesiology, (Churchill Livingstone).

·         Keogh, K.N. (2002) A Computational model of disturbance management in anaesthesia. Masters Thesis, Department of Computer Science and Software Engineering, The University of Melbourne.

·         Keogh, K.N., Sonenberg, E.A. and Lawrence, J.A. (1995) Towards a computational model of disturbance management in anaesthesia, in Yao, X. (ed.) Proceedings of the Eighth Australian Joint Conference on Artificial Intelligence, (World Scientific: Singapore) pp. 115-122.

·         Patel, V.L., Groen, G.J., Ramoni, M.F. and Kaufman, D.R. (1992) Machine depth versus psychological depth: A lack of equivalence, in Keravnou, E. (ed.) Deep models for Medical Knowledge Engineering, (Elsevier) pp. 249-272.

·         Sonenberg, E.A., Lawrence, J.A. and Zelcer, J. (1992) Modelling disturbance management in anesthesia: a preliminary report, Artificial Intelligence in Medicine, 4:447-61.

·         Sonenberg, E.A., Lawrence, J.A. and Zelcer, J. (1993) Keeping the patient asleep and alive: Investigating specialists’ disturbance management in anesthesia, Proceedings of the Second Australian Cognitive Science Conference, (The University of Melbourne).

·         Streufert, S. and Nogami, G.Y. (1997) Analysis and assessment of planning: The view from complexity theory, in Friedman, S.L. and Scholnick, E.K. (eds.) The Developmental Psychology of Planning: Why, How, and When Do We Plan? (Lawrence Erlbaum Associates).

·         Watson, M. and Sanderson, P. (1998) Work domain analysis for the evaluation of human interaction with anaesthesia alarm systems, Proceedings 1998 Australasian Computer Human Interaction Conference, (OzCHI) p. 228.

·         Woods, D. (1994) Cognitive demands and activities in dynamic fault management: abductive reasoning and disturbance management, in Stanton (ed.) Human factors in alarm design, (Taylor and Francis).