published 07/29/09, updated 9/30/09
Many painful problems are surprisingly mysterious, and there are many theories about why people hurt. Debate can rage for years about whether or not a problem even exists. For instance, chiropractic “subluxations” have been a hot topic for decades now: are these little spinal dislocations actually real? What if five different chiropractors all looked at you, but each diagnosed different spots in your spine that were supposedly “out” and in need of adjustment?
That’s a reliability study.
Reliability studies are emotionally compelling. Evidence of unreliable diagnosis tends to make further discussion moot. If chiropractors can’t agree on where subluxations are in the same patient (and some studies have shown that they can’t[1]) then the debate about whether or not subluxations actually exist gets less interesting. A reliability study with a negative result doesn’t necessarily prove anything,[2] but it is strongly suggestive, and can be a handy shortcut for consumers. Who wants a diagnosis that will probably be contradicted by each of five other therapists? No one, that’s who.
In reliability science, we talk about “raters.” A rater is a judge … of anything. One who rates. The person who makes the call. All health care professionals are raters whenever they are assessing and diagnosing.
Reliability studies are studies of “inter-rater” reliability, agreement, or concordance. In other words, how much do raters agree with each other? Not in a meeting about it later, but on their own. Do they come to similar conclusions when they assess the same patient independently?
There are formulas that express reliability as a score, such as a “concordance correlation coefficient.” For the non-statistician, that boils down to: how often are health care professionals going to come to the same or similar conclusions about the same patient? Every time? Half the time? One in ten?
Ever?
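For the curious, here is roughly what one of those formulas looks like in practice. This is a minimal Python sketch of Lin's concordance correlation coefficient for two raters who score the same patients on a numeric scale. The function name and the example scores are made up for illustration and do not come from any study.

```python
def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for two raters' numeric scores.

    x and y are equal-length lists: x[i] and y[i] are the two raters'
    scores for the same patient i.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population (divide-by-n) variances and covariance.
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("both raters gave one identical constant score; undefined")
    return 2 * cov_xy / denominator


# Hypothetical pain-severity scores (0-10) from two raters for four patients.
rater_a = [2, 4, 6, 8]
rater_b = [3, 5, 7, 9]  # same ranking, but always one point higher
print(concordance_correlation(rater_a, rater_a))
print(concordance_correlation(rater_a, rater_b))
```

A score of 1 means the two raters match exactly, a score near 0 means their scores have little to do with each other, and a negative score means they tend to disagree systematically. Note that the second pair scores below 1 even though the raters rank the patients identically: the formula penalizes a consistent offset, not just scrambled rankings.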
This reliability thing is not subtle: you don’t need a second opinion for a gunshot wound. Ten out of ten doctors will agree: “Yep, that’s definitely a gunshot wound!” Well, almost.[3]
That’s high inter-rater reliability.
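The crudest way to put a number on this is simple head-counting: out of every possible pair of raters, what fraction made the same call? Here is a minimal Python sketch of that idea. The function name and the diagnosis labels are hypothetical, and raw agreement like this does not correct for agreement by chance the way statistics such as Cohen's kappa do.

```python
from itertools import combinations


def pairwise_agreement(diagnoses):
    """Fraction of rater pairs who gave the same diagnosis for one patient."""
    if len(diagnoses) < 2:
        raise ValueError("need at least two raters")
    pairs = list(combinations(diagnoses, 2))
    matching = sum(1 for a, b in pairs if a == b)
    return matching / len(pairs)


# Ten doctors, one gunshot wound: every pair agrees.
doctors = ["gunshot wound"] * 10
print(pairwise_agreement(doctors))  # 1.0

# Five hypothetical chiropractors, each naming a different spinal level.
chiropractors = ["C2", "C5", "T4", "L3", "L5"]
print(pairwise_agreement(chiropractors))  # 0.0
```

Real studies land somewhere between those two extremes, which is exactly why they are worth doing.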
Lots of diagnostic challenges are harder, of course. Humans are complex. It’s not always obvious what’s wrong with them. This is why you need second and third opinions sometimes. And it’s perfectly fine to have low reliability regarding difficult medical situations. Patients are pretty forgiving of low diagnostic reliability when professionals are candid about it. All a doctor has to say is, “I’m not sure. I don’t know. Maybe it’s this, and maybe it isn’t.”
What you have to watch out for is low reliability combined with high confidence: the professionals who claim to know, but can’t agree with each other when tested. Unfortunately, this is a common pattern in alternative medicine. And it is a strong argument that it’s actually alternative medicine practitioners who are “arrogant,” not doctors.
True story: a patient of mine, a young woman with chronic neck pain and nausea, went to a “body work” clinic for her problem. Three deeply spiritual massage therapists hovered over her for three hours, charging $100/hour each, for a total of $900, and provided (among some other things) a running commentary/translation of what her stomach was “trying to tell her” about her psychological issues.
True story: my eyes rolled out of their sockets. And my patient was absolutely horrified.
Obviously, if she’d gone to another gurgle-interpreter down the road, her gastric messages would have been interpreted differently.
That’s low inter-rater reliability.
There are numerous common diagnoses and theories of pain that suffer from lousy inter-rater reliability. Here are some good examples:
And so on and on. Over the months and years, I’ll add other nice examples to this list as they occur to me.