Commentary
David Ogilvie takes you through the paper and discusses what it means
All medical students will be aware of recent concern about medical errors, and of moves to reduce the working hours of junior doctors. It seems reasonable to wonder whether doctors' performance is affected by staying up for most of the night, but how do you turn this into an answerable research question? Doctors' work involves a complex mixture of analysis, decision making, and practical tasks. Some of these are easier to study than others. In this paper, the authors have chosen to isolate one aspect of medical work—the basic practical tasks of laparoscopic (keyhole) surgery—and see whether these are affected by sleep deprivation.
What did they do?
Rather than studying real-life operations, they tested a small group of surgeons in training on a set of tasks using a surgical simulator. To minimise differences between the surgeons, they first gave them all identical training on the simulator. They then scored each surgeon's operating performance on a normal day and again on the morning after a night on call, using three measures: time taken, number of errors, and number of unnecessary movements.
What did they find?
The figures show the results for task 6, which was the most complex of the tasks and, they say, the task most likely to reflect real surgical performance.
These graphs are called box and whisker plots. They are a convenient way of showing the distribution of numerical scores. Each box runs from the 25th centile (the value below which a quarter of the scores fall) to the 75th centile (the value below which three quarters of the scores fall). In other words, the middle half of the results lies within the box. The whiskers (the lines extending beyond the boxes) show the full range of the results, from the lowest to the highest.
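To make the description of the box and whiskers concrete, here is a minimal Python sketch that works out the values such a plot draws, assuming whiskers that span the full range (as described above) and the conventional line across the box at the median. The function name `box_plot_summary` and the error counts are hypothetical, invented for illustration; they are not the study's data.

```python
import statistics


def box_plot_summary(scores):
    """Return the five values a full-range box and whisker plot draws."""
    ordered = sorted(scores)
    # quantiles(n=4) returns the three cut points: 25th, 50th and 75th centiles.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],   # end of the lower whisker
        "q1": q1,            # bottom of the box (25th centile)
        "median": median,    # line across the box
        "q3": q3,            # top of the box (75th centile)
        "max": ordered[-1],  # end of the upper whisker
    }


# Hypothetical error counts for 14 surgeons (invented, not the study's data).
errors = [0, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 10, 14]
summary = box_plot_summary(errors)
print(summary)  # {'min': 0, 'q1': 1.25, 'median': 2.5, 'q3': 4.75, 'max': 14}
```

Here the box (1.25 to 4.75) sits near the bottom of a range that stretches from 0 to 14, which is exactly the kind of skew discussed next. Different statistical packages use slightly different rules for calculating centiles, so the box edges they draw may differ a little from these figures.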
You can see from the graphs that some of the distributions are skewed (lopsided, with a longer tail on one side). For example, look at the box and whisker plot for the number of errors after a night on call. The box sits towards the lower end of the range covered by the whiskers, which shows that most results are bunched at the lower end, with fewer, more widely spread results above them. When data are skewed like this, statistical tests that assume a normal distribution may give misleading results. This may be one reason why the authors chose a non-parametric test (one that does not assume the data follow a normal distribution) for their analysis. The P values shown on the figures indicate that the scores on a normal day and after a night on call were significantly different. In other words, a difference as large as the one observed would be unlikely to arise by chance alone if a night on call had no real effect.
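The commentary does not say which non-parametric test the authors used. As an illustration only, here is a minimal Python sketch of one of the simplest non-parametric tests for paired scores, the sign test. It ignores the sizes of the differences and only asks whether "worse after a night on call" happens more often than a coin toss would predict. The function name `sign_test` and the before and after scores are hypothetical and invented for this example; they are not the study's method or data.

```python
from math import comb


def sign_test(before, after):
    """Two-sided exact sign test for paired scores.

    Pairs with no change are dropped. Under the null hypothesis, each
    remaining pair is equally likely to go up or down, like a fair coin.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    ups = sum(1 for d in diffs if d > 0)
    k = min(ups, n - ups)  # size of the smaller group (ups or downs)
    # Chance of a split at least this lopsided in one direction...
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # ...doubled to allow for either direction, capped at 1.
    return min(1.0, 2 * one_tail)


# Hypothetical error counts for 14 surgeons (invented, not the study's data).
normal_day = [1, 0, 2, 1, 3, 1, 2, 0, 1, 4, 2, 1, 3, 2]
after_call = [3, 2, 3, 4, 4, 5, 1, 2, 2, 9, 3, 0, 5, 5]

p = sign_test(normal_day, after_call)
print(round(p, 4))  # 0.0129
```

In this invented example, 12 of the 14 surgeons made more errors after a night on call and 2 made fewer. Because the test uses only the direction of each change, one surgeon with a very large increase cannot dominate the result, which is why tests of this kind cope well with skewed data.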
What do the results mean?
The authors summarise their findings by saying that surgeons show impaired speed and accuracy in simulated laparoscopic performance after a night on call. If you read the electronic responses to the study on bmj.com you can see what other readers have said about the study and the authors' conclusions.
We should ask whether these results are generalisable. It is worth remembering that only 14 surgeons took part in the experiment, and they were all inexperienced at laparoscopic surgery. We do not know whether more experienced surgeons would show the same impairment, or whether a study on a larger number of surgeons would find the same effect.
We should also ask whether these results actually matter in practice. For example, the surgeons took an average of 5.6 seconds longer to complete task 6 after a night on call. Even if we accept that this is a real difference, would it matter to the patient on the operating table? The authors do not directly address this question, which frequently arises in research: “Is a statistically significant difference also a clinically significant difference?”
A third question about the results is: “Are they valid?” In other words, did the study measure what we actually want to know? Operating on a simulator is not the same as operating on a real patient, and we are surely more interested in surgeons' real operations than in simulated ones. We need to know how well performance on the simulator reflects real surgical practice.
So what?
Results always require interpretation. Even if we agree that results are valid, significant, and generalisable, we may still disagree about their implications. The authors say that research should be directed towards measures to counter fatigue, but others may draw different conclusions: for example, that working patterns need to change so that surgeons do not have to operate after a night on call. What do you think?
David Ogilvie, specialist registrar in public health medicine, Hamilton, Lanarkshire
Email: david.ogilvie@lanarkshirehb.scot.nhs.uk
studentBMJ 2002;10:1-44 February ISSN 0966-6494