I saw an interesting post complaining about the “magic” of clinical trials and the refusal by many people to consider other types of evidence rationally. One commenter there replies, “Let me give you an example I like to use. George Burns smoked 4 or 5 cigars every day and lived to be 100. Does that mean smoking cigars is harmless?”
I’d like to offer my 2 cents.
Let’s try a different example first, not George Burns. You go to the airport, planning to fly to Peoria and visit your aged mother. The ticket agent informs you that, statistically, fewer than .01 of the passengers leaving your airport go to Peoria, and therefore Peoria wouldn’t work for you and isn’t really what you want. She takes your money and hands you a ticket to Orlando.
Special circumstances? Not at all. The news is that every single person in the airport is going someplace “special” — namely someplace individually predetermined and non-statistical. But does that apply to clinical patients? Yes, of course it does.
My shoe salesman tells me very few people wear 15 EE. He sells me 10 C.
Statistics does have value. (Good! Otherwise, I’d be unemployed.) But it’s tricky when applied to individuals rather than groups.
It’s true that my chances of success would be poor if I had an unneeded ticket to Peoria and hoped to scalp it to some randomly selected person in the airport. But, if you’re not a random person — if you’re actually already you — and you want to go to Peoria, then you want to go to Peoria. In advance, Peoria is an unlikely prediction; but, retrospectively, about a given individual going there, it’s a slam dunk.
It’s true that, in the aggregate, the airlines are wise to run more planes for Orlando than for Peoria. But, to heck with the statistics of everyone else, I personally am still on my way to Peoria and I’d appreciate it if there were some flights that go there. I’d like my doctor to be considerate too.
Another scenario (unfortunately an actual one). My urologist diagnoses me as having benign prostatic hyperplasia (a misdiagnosis, as it turns out: my urinary infections were caused by kidney stones he didn’t think to check for). He wants to do a TURP (transurethral resection of the prostate) but informs me that there’s a risk of impotence (me, not him). However, I shouldn’t worry — because the rate is “only 5-15%.” I wonder about the fact that he, a physician, has no concept of the meaning of numbers. Think about it by making it concrete. Let’s suppose the rate is 15%. And suppose he averages 7 of these procedures each month. What he’s proposing is essentially that on the first of the month he will gather 7 of us in the parking lot. He’ll hand out seven envelopes, one of which randomly has a black spot. Would I be willing to accept such a gamble, as I stand out there in his parking lot wearing only my skivvies, slightly shivering in the cool morning air? No! Considering that the risk is impotence, are you out of your mind? Well, then, if the chance were “only” 5%, would I do it? You mean 20 envelopes and then I’m impotent? You mean .05 is supposed to be enough protection? Nope. Sorry. Not this month.
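The envelope arithmetic is easy to check. A quick sketch (the numbers follow the story above — a 15% risk and 7 procedures a month — but the function names are mine):

```python
# Back-of-the-envelope check of the parking-lot analogy: with a
# per-patient risk p and n patients per month, how many black spots
# show up, and how often does at least one patient draw one?

def expected_harmed(p, n):
    """Expected number of patients harmed out of n."""
    return p * n

def prob_at_least_one(p, n):
    """Probability that at least one of n independent patients is harmed."""
    return 1 - (1 - p) ** n

# At the quoted 15% risk and 7 procedures a month:
print(expected_harmed(0.15, 7))    # ~1.05 -- about one black spot per month
print(prob_at_least_one(0.15, 7))  # ~0.68 -- most months, somebody draws it
# At "only" 5%, one black spot among about 20 envelopes:
print(expected_harmed(0.05, 20))   # ~1.0
```

So the "gather 7 of us in the parking lot" picture is quantitatively right: one envelope a month, on average, has the black spot.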
There’s a basic question of risks and benefits.
What George Burns illustrates is that there’s a population subgroup in which cigars can work. . . . Er, uh, I mean, in which cigars don’t hurt. . . . Er, I guess I mean, in which other factors predominate so the cigar effect isn’t noticeable one way or the other. Some factor(s) made him able to live to 100. If we’d known to recognize them, we’d have known not to nag him to quit smoking. But we didn’t know. We need more humility in our predictions — and a lot more curiosity. Our fault lies not in our stars, but in our laziness and self-assurance.
More seriously, many treatments may only be effective in subgroups: in specific categories of patients or in specific forms of the disease. Similarly, as in the case of cigars, if you’re more worried about safety than efficacy, any risks may be subgroup specific. Unfortunately, very few controlled clinical trials are designed with adequate protection against Type I and Type II errors in obviously relevant subgroups; and the subgroup analyses are designed as secondary analyses, rather than as part of the prospective primary analysis that’s the only one that counts scientifically. So “science” might not find out.
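The underpowering problem is concrete. Here is a rough illustration (my construction, not from the post) using a standard two-sample z-test power approximation: a trial sized for about 80% power on the full sample has far less power in a quarter-size subgroup, so a real subgroup effect is likely to be missed (a Type II error).

```python
# Why subgroup analyses are underpowered: power of a two-sided
# two-sample z-test, under a normal approximation (a sketch; the
# effect size and sample sizes are illustrative assumptions).
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

def power_two_sample(delta, sigma, n_per_arm, alpha=0.05):
    """Approximate power to detect mean difference delta with SD sigma."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    se = sigma * sqrt(2 / n_per_arm)          # standard error of the difference
    return 1 - Z.cdf(z_crit - delta / se)

# Trial sized for ~80% power on the whole population:
print(power_two_sample(delta=0.5, sigma=1.0, n_per_arm=64))  # ~0.81
# Same effect, but only a quarter of patients fall in the relevant subgroup:
print(power_two_sample(delta=0.5, sigma=1.0, n_per_arm=16))  # ~0.29
```

With 29% power, the subgroup analysis will miss a genuine effect more than two times out of three — and then "science" reports no effect.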
Admittedly, it’s easy to go wrong using case reports and anecdotes, if only because they usually violate a basic principle of science: change just one factor at a time. (I should know: I’m a hypochondriac with a tendency to rationalize obsessively.) Still, anecdotes are heuristically useful in exactly the way just mentioned: they can point out the shape of the categories we’re dealing with.
Anecdotes may not be irrefutable, but they do provide some evidence. The more anecdotes you’ve got, the stronger they are.
See the previous post Clinical trial design — for beginners if you want to get into the nitty gritty of clinical trials and making them scientific.
The boundary between “anecdote” and “science” is less clear than some people think. I was once the statistician on a pilot clinical trial in acute spinal cord injury. It had a strongly positive result that we reported in NEJM. Later, an investigator colleague argued heatedly that, until the results of our multi-center trial came in, there was “no evidence” that the treatment works — because the accepted “criterion” of “science” is the one the FDA uses: there have to be two “pivotal” trials establishing the effect. He contended that, if a neurosurgeon were to call and inform me that my son was in his ICU with spinal cord injury, it would be “superstitious” of me to request the experimental treatment. Well, poppycock! In that situation, I wouldn’t be delivering an eternal scientific verdict that could wait to stand the test of time, but rather I’d be an individual making an informed, risky, time-critical pragmatic decision. The pilot trial is not “no evidence.” It’s the best, imperfect evidence I have available and it would be irrational to ignore it.
Conversely, the classic FDA criteria are not immune to disbelief. Getting two independent pivotal studies with p<.05 means the combined chance of Type I error is .05 x .05, or 1 in 400. That means a double false positive can be expected about once in every 400 such pairs of trials. And you don’t have to wait until the last of a series of 400 in order to get the erroneous one: it can happen equally well at any point in the series, the beginning as easily as the end. The one you’re looking at could be it.
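The multiplication is easy to verify exactly (a sketch, treating the two trials as independent, as the argument above does):

```python
from fractions import Fraction

# Chance that a single pivotal trial is a false positive at p < .05:
alpha = Fraction(1, 20)

# Chance that BOTH independent pivotal trials are false positives:
both = alpha * alpha
print(both)  # 1/400
```

And because each pair of trials is an independent draw, the position of the erroneous one in a series is memoryless: it is exactly as likely to be the first pair as the four-hundredth.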
I have a friend, Fred Geisler, who points out that there is no controlled, randomized evidence to prove the superiority of having a parachute to not having one when jumping out of an airplane. Nonetheless, he insists on using one. He feels he has valid reasons for this policy and remains unembarrassed about relying on judgment. (I personally do not jump out of airplanes at all, even with a parachute, but that’s not the issue.)
Speaking as a monk of science, I regard no “fact” as ever completely proven. Many facts have overwhelming evidence — like Ptolemaic Astronomy did, or 19th Century physics — and the practical difficulty in overturning them would be formidable. Still, there’s always a mathematical chance of disproof. Given the history of progress in science, eventual overturn is inevitable. As a monk of science, I can wait for the final answer and I know that in principle no answer can ever be final.
However, as a person participating in decisions about health for myself and those I love, I’m not a scientist, precisely because I can’t wait. Time is going by and chances are being lost. I make practical judgment calls. I’d be a fool not to use the best science I can find as part of my process, but I’m not a scientist today. I need to come to closure on a decision. My evidence is imperfect; I use what I can find. My intellectual processing power is finite. But I can be rational. And I can exert myself to do my best.
“Science” is not getting p<.05, or even getting it twice; and it isn’t rigidly following any such arbitrary criterion. That’s, in fact, the opposite of science: it’s superstition, mumbo jumbo.
“Science” is also not some slowly expanding collection of static, “established” results.
“Science” is a way of life, an activity of questioning. It’s a willingness to evaluate experience objectively, aware of the specific kind and degree of reliability for each bit of evidence and the role it plays in the whole pattern.
Readers of this post may also be interested in Clinical trial design for beginners.