Not being able to reproduce famous published results is putting the credibility of all scientists in deep trouble
Big data crisis: There’s an increasing concern among scholars that, in many areas of science, famous published results tend to be impossible to reproduce.
This crisis can be severe. For example, in 2011, Bayer HealthCare reviewed 67 in-house projects and found that they could replicate less than 25 per cent. Furthermore, over two-thirds of the projects had major inconsistencies. More recently, in November, an investigation of 28 major psychology papers found that only half could be replicated.
Similar findings are reported across other fields, including medicine and economics. These striking results put the credibility of all scientists in deep trouble.
What is causing this big problem? There are many contributing factors. As a statistician, I see huge issues with the way science is done in the era of big data. The reproducibility crisis is driven in part by invalid statistical analyses that stem from data-driven hypotheses, the opposite of how things are traditionally done.
In a classical experiment, the statistician and the scientist first frame a hypothesis together. Then scientists conduct experiments to collect data, which are subsequently analyzed by statisticians.
A famous example of this process is the “lady tasting tea” story. Back in the 1920s, at a party of academics, a woman claimed to be able to tell by flavor whether the tea or the milk had been added to a cup first. Statistician Ronald Fisher doubted that she had any such talent. He hypothesized that she was merely guessing, in which case, out of eight cups of tea, prepared such that four cups had milk added first and the other four cups had tea added first, the number of correct guesses would follow a probability model called the hypergeometric distribution.
The experiment was carried out with the eight cups of tea presented to the lady in a random order, and, according to legend, she categorized all eight correctly. This was strong evidence against Fisher’s hypothesis. The chance that the lady had achieved all correct answers through random guessing was an extremely low 1.4 per cent, or 1 in 70.
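For readers who want to check that figure, here is a minimal sketch of the arithmetic in Python. It assumes the lady knows that exactly four cups had milk added first and picks four cups at random; the function name `p_correct` and its parameters are illustrative only and are not part of the original story.

```python
from math import comb

def p_correct(k, cups=8, milk_first=4):
    """Chance of picking exactly k of the true milk-first cups
    when choosing `milk_first` cups purely at random."""
    # Hypergeometric probability: choose k of the true milk-first cups
    # and the remaining picks from the tea-first cups.
    ways = comb(milk_first, k) * comb(cups - milk_first, milk_first - k)
    return ways / comb(cups, milk_first)

# Probability of getting every cup right by guessing: 1 / C(8, 4) = 1 / 70.
p_all = p_correct(4)
print(f"{p_all:.4f}")  # 0.0143
```

There are 70 equally likely ways to choose four cups out of eight, and only one of them is entirely correct, which is where the 1.4 per cent comes from.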
That process – hypothesize, then gather data, then analyze – is rare in the big data era. Today’s technology can collect huge amounts of data, on the order of 2.5 exabytes a day.
While this is a good thing, science often develops at a much slower speed, and so researchers may not know how to specify the right hypothesis before analyzing the data. For example, scientists can now measure the expression levels of tens of thousands of genes in people, but it is very hard to decide whether a particular gene should be included in or excluded from the hypothesis. In this case, it is tempting to form the hypothesis based on the data. While such hypotheses may appear compelling, conventional inferences from these hypotheses are generally invalid. This is because, in contrast to the “lady tasting tea” process, the order of building the hypothesis and seeing the data has been reversed.
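A small simulation can make the reversal concrete. The sketch below is an illustration under simplifying assumptions, not an analysis of any real study: every "gene" is pure noise with known unit variance, the genes are independent, and the two groups have 10 people each. The function `simulate` and all of its parameters are hypothetical. It compares testing a gene chosen in advance with testing whichever gene looks most extreme in the data, both at the conventional 5 per cent level.

```python
import random
from math import sqrt

def simulate(trials=400, genes=50, n=10, seed=0):
    """Return false-positive rates for a pre-specified gene and for a
    gene picked after looking at the data, when no gene has any effect."""
    rng = random.Random(seed)
    z_crit = 1.96      # two-sided 5% cutoff for a standard normal statistic
    se = sqrt(2 / n)   # standard error of a difference in means (unit variance)
    pre_hits = post_hits = 0
    for _ in range(trials):
        z = []
        for _ in range(genes):
            # Both groups are drawn from the same distribution: no real effect.
            mean_a = sum(rng.gauss(0, 1) for _ in range(n)) / n
            mean_b = sum(rng.gauss(0, 1) for _ in range(n)) / n
            z.append((mean_a - mean_b) / se)
        # Hypothesis fixed before seeing the data: always test gene 0.
        pre_hits += abs(z[0]) > z_crit
        # Hypothesis formed from the data: test the most extreme-looking gene.
        post_hits += max(abs(v) for v in z) > z_crit
    return pre_hits / trials, post_hits / trials

pre_rate, post_rate = simulate()
print(f"pre-specified gene: {pre_rate:.2f}, data-chosen gene: {post_rate:.2f}")
```

Under these assumptions the pre-specified test should raise a false alarm about 5 per cent of the time, while the data-chosen test should do so in roughly 92 per cent of runs (1 − 0.95^50), even though nothing real is there to find.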