There’s an article by Jack Carney, DSW, on this topic on Mad in America. Jack refers to the DSM-5 field trials published earlier this year in the American Journal of Psychiatry.
Inter-rater reliability is measured by a statistic called a kappa score. A score of 1 means perfect inter-rater agreement; a score of 0 means the raters agreed no more often than they would have by chance alone. In psychosocial research a kappa score of 0.7 or above is generally considered good.
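For readers who like to see the arithmetic, here is a minimal sketch of one common form of the statistic, Cohen's kappa for two raters. The field trials may well have used a different variant, and the function name and the ten "diagnoses" below are invented purely for illustration.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: share of cases where the two labels match.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: probability that both pick the same label if each
    # assigned labels independently at their own observed rates.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_chance == 1:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical data: ten people, each assessed by two clinicians.
clinician_1 = ["MDD"] * 5 + ["other"] * 5
clinician_2 = ["MDD", "MDD", "MDD", "other", "other",
               "MDD", "MDD", "other", "other", "other"]

kappa = cohen_kappa(clinician_1, clinician_2)
print(round(kappa, 2))  # 0.2
```

In this made-up example the two clinicians agree on six cases out of ten, yet kappa is only 0.2, because half of that agreement would be expected from guessing alone.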
Only one DSM-5 “diagnosis” was higher than 0.7 in the field trials. This was major neurocognitive disorder (essentially dementia). Major depressive disorder was 0.32; antisocial personality disorder was 0.22; obsessive compulsive disorder was 0.31; and so on. Even schizophrenia, the flagship “diagnosis,” scored only 0.46. You can see other values in Jack’s article.
What this means is that in the field trials, if one psychiatrist “diagnosed” a person with major depression, for instance, another psychiatrist was quite likely to come up with another “diagnosis.” They weren’t consistent. And remember, people participating in field trials are on their best behavior. They probably studied the new criteria, and were very conscious of the fact that their findings were being checked and scrutinized.
Most psychiatrists in their offices, I venture to predict, will buy DSM-5, glance at the changes, and put it on the shelf. Their inter-rater agreements will likely be lower.
APPEARANCE vs. REALITY
This is important, because the APA continues to push the notion that the manual is based on solid science. In fact, it isn’t, and never has been. Its purpose is to create the appearance of science, and to provide an umbrella under which psychiatrists can do pretty much whatever they like.
Here’s a little known quote from DSM-IV.
“The specific diagnostic criteria included in DSM-IV are meant to serve as guidelines to be informed by clinical judgment and are not meant to be used in a cookbook fashion. For example, the exercise of clinical judgment may justify giving a certain diagnosis to an individual even though the clinical presentation falls just short of meeting the full criteria for the diagnosis as long as the symptoms that are present are persistent and severe.” (p xxiii)
In lay circles this is known as having your cake and eating it too. Or perhaps it could be called “fuzzy science.”
SIGNIFICANCE OF INTER-RATER AGREEMENT
The poor inter-rater agreement is a serious problem, but as an issue it needs to be kept in perspective. One could have 100% agreement in this area and still be talking utter nonsense. For instance, suppose I were to form a society for the detection and prosecution of witches. We have a meeting and decide that we need to have hard-and-fast criteria for identifying these wicked ladies. So we get a panel of experts (which is easily achieved by shaking a big box of money). The experts draw up a list of identifying signs, each of which is sharp and unambiguous. Personally, I’m no expert on witchcraft, but I can imagine that they might produce items like: extra digit on left hand; red birthmark on thigh; owns a black cat; etc. Then, provided that each criterion is clear and precise, and that each rater sticks to the criteria, we will have 100% rater agreement.
But we’re still talking nonsense, because there’s no such thing as a witch. And DSM is nonsense because there’s no such thing as a mental illness.
Actually, I’m surprised that the DSM-5 figures weren’t better, because it’s not very difficult to get good reliability. Psychosocial researchers do it all the time. In fact, you can’t really do good research without good reliability. Suppose, for instance, you want to study violence in schoolyards. You must first make sure that all your raters are on the same sheet of music when it comes to recording an incident of violence. If one rater is recording pushing as an act of violence, but another is not, then clearly the research will be fundamentally flawed.
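The schoolyard example can be put into numbers. The sketch below is purely illustrative: the incidents, the two coding rules, and the helper name are all made up, and the kappa used is the simple two-rater, yes/no form.

```python
# Hypothetical playground incidents, each seen by both raters.
incidents = ["punch", "push", "kick", "push", "shout",
             "push", "tag", "punch", "shout", "push"]

# Hypothetical coding rules: rater A counts pushing as violence, rater B does not.
VIOLENT_A = {"punch", "kick", "push"}
VIOLENT_B = {"punch", "kick"}

def kappa_binary(x, y):
    """Two-rater kappa for yes/no codes (assumes chance agreement is below 1)."""
    n = len(x)
    p_obs = sum(a == b for a, b in zip(x, y)) / n
    rate_x, rate_y = sum(x) / n, sum(y) / n
    # Chance agreement: both say yes, or both say no, at their own base rates.
    p_chance = rate_x * rate_y + (1 - rate_x) * (1 - rate_y)
    return (p_obs - p_chance) / (1 - p_chance)

codes_a = [incident in VIOLENT_A for incident in incidents]
codes_b = [incident in VIOLENT_B for incident in incidents]

print(round(kappa_binary(codes_a, codes_b), 2))  # 0.31
print(kappa_binary(codes_a, codes_a))            # 1.0 when both use the same rule
```

The raters disagree only about pushing, yet that one difference in definition is enough to drag kappa down to about 0.31. Give both raters the same rule and agreement is perfect.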
Which means that any research based on DSM-5 will, of course, be fundamentally flawed, but we knew that anyway, because the concept of mental illness is fundamentally flawed.
DSM-5 vs. DSM-IV
The agreement figures for DSM-5 are noticeably poorer than the figures for DSM-IV. The likely reason for this is the APA’s persistent desire to widen the net. One way to do this is to make the criteria less precise, which inevitably means that different raters will apply them differently.
So what can the APA do now? Will they have to scrap DSM-5 and start again? No. As I said earlier, it’s never been about science. It’s about marketing. My prediction is that they will either ignore the poor reliability matter, or spin it somehow into a positive feature. For instance, they might try to promote the notion that psychiatrists are less concerned about excessive fastidiousness than with providing real help to real people. If there’s one thing the APA is good at (and it may well be the only thing), it’s spin!
FLIP THE SCRIPT
The last job I had before retirement was in a prison. One of my major responsibilities was meeting with groups of prisoners, and facilitating discussions on subjects like anger, critical self-scrutiny, coping with conflict, etc.
One morning I was on my way to one of these meetings when I overheard a confrontation between a prisoner and an officer. Apparently the prisoner had stolen a loaf of raisin bread from the kitchen, and the officer was giving him a hard time. To which the prisoner replied, “If you people would give us enough food, we wouldn’t have to steal!”
I thought this was a beautiful piece of spin, but also that it was a mode of thinking that keeps people coming back into prison.
During the group session that morning, I mentioned the incident. All the guys started to laugh, and one man at the back said in a loud voice: “flip the script.” I asked them to explain, and the notion goes like this. If you’re ever accused of wrongdoing, your first priority is to neutralize or deflect the accusation. With the loaf of raisin bread, for instance, one could point out that it was stale, and that you were just saving the kitchen staff the trouble of throwing it away. So an act of theft becomes an act of civic responsibility. Or you can shift the “real” responsibility to someone else, which is what had been attempted in the incident I had witnessed earlier. A third variation that the men mentioned was deflecting attention, and they gave as an example something like: “Hey, a loaf of raisin bread is nothing. I saw one of the senior officers backing his truck up here yesterday, and he took out a crateful of beef!”
Politicians are good at this sort of thing too. One of the first rules of campaigning is that if you’re asked a difficult question, ignore it and answer a different question. This is a variation of flip the script.
And there’s a beautiful flip the script in DSM-IV, which the APA published in 1994. By that time there were rumblings of dissent in various circles with regard to the general concept of mental disorders/mental illnesses. And the various DSM-IV committees had to be aware of this. In their introduction to the revision, they might have addressed this matter, but they didn’t. Instead, they talked about reliability (i.e., inter-rater agreement) and, in a notable display of self-congratulation, they proclaimed: “more than any other nomenclature of mental disorders, DSM-IV is grounded in empirical evidence,” and the reader is referred to a five-volume sourcebook of research findings.
But the thornier question about the ontological status of these disorders was deflected with a single sentence. “The need for a classification of mental disorders has been clear throughout the history of medicine, but there has been little agreement on which disorders should be included and the optimal method for their organization.” This is called preemptive strike flip the script, and my guys back at the prison would have been proud of the APA!
REAL SCIENCE vs. SHAM SCIENCE
We’ve heard a great deal in the news lately about the Higgs boson. I’m no expert on quantum physics, but I understand that this elusive particle is very important to physicists, who first proposed its existence way back in 1964. If it didn’t exist, they could think of no other way to explain the existence of mass. So they were very attached to the idea, but like true scientists, they refused to just take it for granted. They insisted that its existence had to be verified experimentally.
Well, they built this enormous underground circular tunnel on the Swiss-French border (1998-2008), and for four years drove sub-atomic particles around it at close to the speed of light. They arranged for them to crash into each other and all sorts of other stuff. Until finally – a few weeks ago – they found the Higgs boson! Well – tentatively. They still have some minor reservations, and work continues, but it looks very promising.
What I can’t figure out is: why didn’t they just get together and take a vote, the way the APA does? It would have saved a lot of time and money.