In case you’ve been living under a rock, science is in a replication crisis. Disappointingly often, when an interesting effect is claimed to be statistically significant, other scientists or laboratories cannot reproduce it, even under similar conditions to the original experiment. If nobody can reproduce what other scholars are claiming is true, or test and therefore extend their hypotheses, then it’s unclear how science can progress.
The single phrase which perhaps best captures the reason why is ‘researcher degrees of freedom’. There are just so many different ways that a scientist can creatively decide which outcomes to report, which analyses to run, how to chase arbitrary thresholds for statistical significance (‘p-hacking’), and so on. Even with perfectly good intentions, scientists are human, and it was not realised until relatively recently just how unreliable these practices have made large bodies of evidence1.
In my view, the moniker ‘crisis’ is justified. In one famous review paper from the Center for Open Science (COS), only a third of psychology papers were found to be replicable. In this context, ‘replicable’ means that the re-run version of the experiment finds a statistically significant effect in the same direction. The crisis is perhaps most acute in psychology, but many other fields have been seriously undermined, including medicine. By some estimates, less than half of pre-clinical research in the life sciences is replicable2. When Sanjay Srivastava came to design his seminar series at the University of Oregon about the state of scientific practice in psychology, he called it ‘Everything is F*cked’.
The COS also tried to reproduce a corpus of cancer papers, and were unable to do so in 59 per cent of cases. To be clear, that doesn’t mean that six out of every ten cancer papers are bogus. It could very well be that COS doesn’t have the right methodology or tacit knowledge to achieve the same conditions as the original research. The optimal level of reproducibility is not 100 per cent, and it shouldn’t be the same across fields3. Still, I believe it’s fair to say that reproducibility ought to be the norm.
As an aside, the Center for Open Science is an independent non-profit and operates outside any university. One of the many troubling aspects of the replication crisis is the ways in which academia has failed to be a self-correcting system, and has been relying on outside observers or groups to properly scrutinise research.
Non-replicable research leads society down wild goose chases at immense cost. By some estimates, $28 billion is spent per year in the United States alone on preclinical research whose results cannot be reproduced. These expenditures have been in the news recently in the context of Alzheimer’s research. Sylvain Lesné’s landmark 2006 paper, and the subsequent influential body of research supporting the controversial Aβ*56 hypothesis, turns out to have rested on image manipulation and other suspicious practices. After a lengthy saga, the paper was eventually retracted in 2024. While I don’t subscribe to apocalyptic pronouncements that this has caused us to waste two decades of Alzheimer’s research, huge amounts of researcher time and money were misdirected because of a failure to properly incentivise replication.
Thankfully, outright scientific fraud is rare. But it is depressing to see how lazy some examples of academic fraud turn out to be. One infamous case is Dan Ariely at Duke University. Data forgery was spotted in one of his famous journal articles because the number of miles driven by cars in his dataset was, “by coincidence”, almost perfectly uniformly distributed, and nobody noticed until after the paper had accrued hundreds of citations. In a tragic irony, the subject matter of the paper was… honesty.
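To give a flavour of how cheaply that kind of forgery can be caught: real-world odometer readings bunch up around typical usage, whereas numbers generated by a spreadsheet formula often come out suspiciously flat. Below is a minimal Python sketch, using an invented `reported_miles` array as a stand-in for the real dataset and a Kolmogorov–Smirnov test as one plausible check, not the method the original sleuths actually used.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for the suspicious dataset: mileage figures that were
# (allegedly) generated uniformly at random between 0 and 50,000 miles.
rng = np.random.default_rng(0)
reported_miles = rng.uniform(0, 50_000, size=10_000)

# Genuine mileage would cluster around typical annual usage; data drawn from a
# flat distribution is a red flag. The Kolmogorov-Smirnov test asks how closely
# the sample matches a uniform distribution on [0, 50,000].
statistic, p_value = stats.kstest(reported_miles, "uniform", args=(0, 50_000))

print(f"KS statistic: {statistic:.4f}, p-value: {p_value:.3f}")
# A tiny KS statistic (and a large p-value) means the data is indistinguishable
# from pure uniform noise, which for odometer readings is itself suspicious.
```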
The definitive history of the replication crisis has yet to be written, but for a general audience, Stuart Ritchie’s Science Fictions: How Fraud, Bias, Negligence, and Hype Undermine the Search for Truth is as good a place to start as any. Gavin Leech had a wonderful project called Reversals in Psychology, documenting many of the worst reversals of psychology studies4. The most worrying cases were the ones in which the true effect was in the opposite direction from what was originally reported.
Many solutions have been proposed to the replication crisis. For example, more and more studies are required or encouraged to be preregistered, meaning it is stated in advance which outcomes and analyses will be reported. Others have argued that the social sciences need a more consistent paradigm for the concept of ‘replication’ to even make sense. Similarly, Duncan Watts argues that some fields inherently need to become more solution-oriented, so that their results are ‘replicated’ by their real-life applications. It’s sometimes said that there is no replication crisis in physics, electrical engineering, or chemistry, because we “replicate” them every day by using technologies that depend on them. That is a loose description, but it certainly contains more than a kernel of truth.
Generally speaking, those reforms sound good to me. But they are all aimed at preventing non-reproducible research from being created in the first place. Even under the best possible reforms, the problems wouldn’t be solved overnight. And, one way or another, we’re still left with a past body of scientific evidence to evaluate, one produced using the previous (faulty) methods. All of which is to say: it would be great to have some kind of mechanism to directly incentivise more replication studies.
While we don’t have a complete understanding of what caused the replication crisis, one can think of plenty of reasons. For one thing, replications are tedious: reproducing someone else’s results requires an extraordinary level of patience, as well as a certain disagreeableness and the courage to be disliked. I can’t imagine that systematically trying to reproduce one hundred psychology experiments that you might not even be interested in is particularly fun. Also, once you’ve decided to conduct a replication, it can be difficult to get your results published. While not many journals explicitly reject replications, few actively seek them out, and they are implicitly discouraged by a strong focus on novelty. Matt Clancy has a nice literature review about how academia’s “publish-or-perish” culture creates a systematic bias toward non-reproducible research5. Opinions differ on the magnitude of publication bias, but few doubt that it exists: a preference for publishing positive and original results, and against publishing negative results and replications6.

I like to think of investing in science in terms of how much truth you get per euro. From what we’ve learned about the replication crisis, I expect that funding replications has one of the best, if not the best, conversion rates from euro into useful knowledge.
Some people say that the centrality of replication to science has been overstated. It’s also said that the “replication” framing overgeneralises from psychology. Those things might both be true. For many of my favourite social science papers, it’s not clear what a replication would even look like. There are many ways that a paper can advance the literature. This is entirely compatible with lack-of-replication being a huge problem, about which far too little is being done.
Incentives for replication are a special case of encouraging more red-teaming in science. There is much else to be said about this topic – some are fond of the idea of assigning, wherever possible, data collection and data analysis to separate individuals – but for now we will stay focused on replication.
Once, when I was at a conference which featured some discussion about the replication crisis, I overheard a passerby say that “only academics would call learning a crisis”. Indeed, we’re learning a lot about opportunities to do things better. What I want to talk about in this post is what creative ways of allocating funding specifically for replications might look like.
Ireland could be the world’s hub for replications
In last month’s post, I mentioned how, even though Ireland is small, if we are willing to specialise, there are individual research fields in which we could become a major or even by far the dominant funder. Replication studies are an example. The director of the National Institutes of Health, Jay Bhattacharya, has said that, if we were serious about fixing the replication crisis, then a significant fraction of the total budget of the NIH would be directed toward replications. To date, the NIH has dedicated only $2 million to replication, or less than 0.01 per cent of its budget.
After some kind of lottery system for distributing grants, earmarking a general pool of money for replications is probably the most commonly proposed reform inspired by the field of metascience. But such attempts have run into a few issues. In the case of the NIH, the offer came in the form of an email to all 37,500 principal investigators on its system, offering to pay for another lab to poke holes in their research. The result was a programme that received only a handful of applicants. It’s a hard sell to get academics to volunteer their work for additional scrutiny. In the end, the NIH expects to fund around six projects through this programme. Still, it is an interesting pilot, and a step in the right direction.
Other efforts include a venture by the Dutch Research Council, in which researchers were invited to repeat major studies, but whose database lists only 24 completed or in-progress projects. There have also been replication efforts by a Brazilian nonprofit and by Germany’s science ministry.
To the best of my knowledge, Ireland has never had dedicated funding for replications. As yet, we have no equivalent of the UK Reproducibility Network. Of course, the normal funding channels do sometimes distribute money to replications, and Ireland is home to some great figures in replication science, including Dermot Lynott at Maynooth University. I would also single out the Sports Science Replication Centre at TU Dublin for praise. Their review paper from June this year is worth a read. It would be well within even the current level of Irish expenditure on science to open many more such centres and become a global hub for replications. It may not make the Irish popular at conferences or among their colleagues, but as an investment in promoting scientific truth it would be hard to beat.
Replication residencies
Lauren Gilbert, who leads much of the work related to metascience at Renaissance Philanthropy, suggested an idea to me that she calls the Randomised Replication Residency7. This would be a two-year largely autonomous fellowship for a scholar with a demonstrated track record of taking down bad science to work on grantmaking teams, both to replicate existing work, and to teach those teams about the replication process. You can read Lauren’s full proposal here.
If I were given €10 million to distribute however I wanted to improve the state of science, I would be tempted to just give it all to Andrew Gelman. In his blog, textbooks, and papers, Andrew has been amazingly effective at poking holes in dodgy research, including papers which have been the explicit basis of policy. Another personal hero of mine is Elisabeth Bik, author of the Science Integrity Digest. She is well known for her extraordinary eye for spotting fishy elements in scientific figures8. Problems that she noticed in the preliminary research on hydroxychloroquine and ivermectin as COVID treatments led to a cascade involving dozens of retractions. She works outside traditional academia, and her work is partly funded via Patreon.
Alas, one worries about how scalable such a replication fellowship would be. I suppose that any plan which relies on recruiting a maverick genius is not really a plan. On the other hand, how much positive impact just a handful of Replication Residents could have is an untested proposition.
Replication bounties
Attempts to promote replication so far suggest that there needs to be more of an incentive for academics to cooperate with and voluntarily sign up to these efforts. Currently, we’re all stick and no carrot.
When I talk to scientists with an interest in replication, they often tell me that targeted replications and bounties would be a better way to increase funding than a general pool of replication funds. After all, not all replications are interesting, nor is all original research. I’ve seen this being done by some philanthropists and private foundations, as when two researchers were hired to investigate a famous economics paper about the centrality of the potato (really!) to the history of urbanisation.
The crudest way to structure a replication bounty might be to have the funding for an attempted replication scale with how many citations the original research has accrued. There is a cottage industry of ‘smart citations’ and other metrics that more closely track a piece of research’s impact. The State could also offer bounties to conduct replications of the research most relevant to its current decisions. This could tie in with the Replication Resident idea, where such a person would have the discretion to offer a bounty for the replication of research they think is important.
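To make the ‘crudest way’ concrete, here is a toy sketch of what a citation-scaled bounty schedule might look like. The base amount, the logarithmic scaling, and the cap are all invented for illustration; nothing here is a real proposal.

```python
import math

def replication_bounty(citation_count: int,
                       base: float = 5_000.0,
                       per_log_citation: float = 10_000.0,
                       cap: float = 100_000.0) -> float:
    """Illustrative bounty schedule: a flat base payment plus an amount that
    grows with the (log of the) original paper's citation count, up to a cap.

    All parameters are invented for the sake of the example; a real scheme
    would presumably use 'smart citation' metrics rather than raw counts.
    """
    bounty = base + per_log_citation * math.log1p(citation_count)
    return min(bounty, cap)

# A barely-cited paper attracts a token bounty; a heavily cited one much more.
print(replication_bounty(3))        # roughly 18,900
print(replication_bounty(20_000))   # hits the 100,000 cap
```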
Conclusion: The importance of spillovers
The most frequent comment I received over email and in person about last month’s post was: “How many of the spillovers from science are really national?”. The new technologies and positive social developments flowing out of Irish research may benefit mostly other countries. The subtext was usually: sure, altruistically Ireland should be funding science more, but selfishly shouldn’t we just free ride and let other countries do all the serious science?
Similarly, the benefits of replication studies will flow largely abroad. Is it as true for a country as it is for an individual that it doesn’t really pay to be obsessed with replication? There are a few things to say to this:
The Alzheimer’s example and others suggest that large amounts of money are currently being misdeployed, in Ireland as elsewhere, because of non-reproducible research. You may expect the large pharmaceutical sector here to have a long-term interest in this.
It can be difficult to get political support to finance research that is done entirely abroad. But I suspect that there might be more political tolerance for Ireland funding replications that are conducted overseas. One way to frame replications is as a high-leverage way to improve Irish research, acting as a multiplier on existing efforts.
In a small open economy like Ireland, much more of the value of spillovers will go abroad than in large, less trade-dependent economies9. All else being equal, we would expect a much higher fraction of the value created by American research to stay in America than of the value created in a place like Ireland. In the Jones-Summers model we discussed last time, the estimated 67 per cent social rate of return was calculated before accounting for international spillovers, which would raise it even higher. This indicates that, even if it didn’t care one iota about the rest of the world, the US would still be underinvesting in research. Another way of looking at it is that, even if the bucket of social benefits from science has many big holes in it, it’s an awfully big bucket.
There are reasons why I suspect that spillovers are more geographically concentrated than you might expect. For example, I take one of the lessons of Ed Glaeser’s pioneering work on economic geography to be that universities are one of the key factors in explaining which cities have declined in population and economic outcomes, and which have not. There are many reasons for that, but I find it at least suggestive that research spillovers often occur at a scale smaller than an entire nation, even at the level of an individual city. The way I read Glaeser’s research, the main causal pathway for that effect is that cities with universities attract and retain more human capital, rather than benefiting from direct research spillovers. On the other hand, if a city promotes good science, won’t more educated, thoughtful, and interesting people want to live there?
Perhaps the connection to economic geography sounds a bit tenuous. We really don’t know how spillovers work, or how the calculus of funding research might change in small open economies. But I would argue that that very fact illustrates several of my points. The body of research which produces estimates of the social returns on government funding of science is tiny. There are only a handful of papers on this topic which either use modern causal inference or have a sound theoretical basis. I haven’t been able to find a single one about the EU. If it turns out that those papers don’t replicate, that would be a huge issue! It could invalidate the entire thesis of last month’s post, which was that governments are massively underinvesting in research. If the original body of research has enough problems, it could even reverse the implications for policymakers. And yet, there is currently essentially no incentive for anyone to closely scrutinise that body of evidence. To me, Replication Residencies and bounties seem like a promising way to do exactly that.
In future posts, I would like to explore these cases where government decisions hinge upon a small number of papers or on low-quality evidence. It is more common than you might expect for the best available evidence informing a decision in medicine, economics, or public policy to be extraordinarily weak. Among the many reasons we want replications is to improve that.
Sam Enright is the Innovation Policy Lead at Progress Ireland, and editor-in-chief of The Fitzwilliam. If you have thoughts about this piece, you can email sam@progressireland.org. You can follow him on Twitter here.
For those of you wanting to brush up on Leaving Cert statistics: a significance level is a cutoff for declaring a p-value significant, where a p-value is the probability of observing a result at least as extreme as the one obtained, conditional on the null hypothesis being true. A common misunderstanding is that p-values give the probability that a result occurred “purely by random chance”, and in fact misapplication of such methods was one of the root causes of the replication crisis to begin with. I digress…
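For the simulation-minded, here is a small, purely illustrative Python sketch of why that matters: when the null hypothesis is true, about 5 per cent of tests come out ‘significant’ at the 0.05 level anyway, and the false-positive rate balloons once a researcher is free to report whichever of several outcome measures looks best.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_experiments, n_per_group, n_outcomes = 10_000, 30, 5

naive_hits = 0   # one pre-specified outcome per experiment
hacked_hits = 0  # report whichever of five outcomes looks best

for _ in range(n_experiments):
    # The null is true: treatment and control come from the same distribution.
    p_values = []
    for _ in range(n_outcomes):
        treated = rng.normal(0, 1, n_per_group)
        control = rng.normal(0, 1, n_per_group)
        p_values.append(stats.ttest_ind(treated, control).pvalue)
    naive_hits += p_values[0] < 0.05
    hacked_hits += min(p_values) < 0.05

print(f"False-positive rate, one outcome:  {naive_hits / n_experiments:.3f}")   # ~0.05
print(f"False-positive rate, best of five: {hacked_hits / n_experiments:.3f}")  # ~0.23
```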
I appreciate the irony of making policy recommendations about replication informed by what is, in many cases, a small number of papers that may themselves not replicate. However, I will note that I draw disproportionately from economics, and that economics has weathered the storm of the replication crisis better than other social sciences.
You might distinguish between “computational replicability” (in which the methods are clear enough that a reader can re-run your code on the same observations and produce the same results) and “result replicability” (where other labs get the same answer). We should be aiming for 100 per cent computational replicability, but the optimal level of result replicability will depend on a complicated mix of factors, including how much relevant environmental variation there is to begin with. Unfortunately, we are still nowhere near achieving even computational reproducibility, as was memorably demonstrated in one of my favourite spoof papers, Data is not available upon request.
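For concreteness, here is a minimal, hypothetical sketch of the habits computational replicability asks for: fix the random seed, write the exact results to a file, and record the software versions so that a failure to reproduce can at least be diagnosed. The analysis itself is a made-up stand-in.

```python
import json
import platform
import numpy as np

# Fixing the seed means anyone re-running this script gets exactly the same
# numbers, which is the bare minimum for computational replicability.
rng = np.random.default_rng(seed=2024)

# A stand-in for the real analysis: estimate a mean from simulated data.
sample = rng.normal(loc=0.3, scale=1.0, size=500)
estimate = float(sample.mean())

# Record the environment alongside the result, so a failure to reproduce can
# at least be traced to a library or interpreter mismatch.
record = {
    "estimate": round(estimate, 6),
    "numpy_version": np.__version__,
    "python_version": platform.python_version(),
    "seed": 2024,
}
with open("results.json", "w") as f:
    json.dump(record, f, indent=2)
```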
That project had a remarkable afterlife. Gavin’s list was taken up by the Framework for Open and Reproducible Research Training (FORRT) as the basis of their Reversals & Replications database. Yours truly can be found on the contributors page.
Commenters will often gesture toward “incentives” as the explanation for the replication crisis, but the picture is more nuanced than that. There is something of an existence proof in the fact that some fields (generally in the hard sciences) have been making breathtaking progress, while others (generally the social sciences) have been progressing much more slowly and have faced a replication crisis. This is despite the fact that the “incentives” faced by both sets of academics are very similar.
There is a working paper by Witold Więcek, which suggests that when you use the appropriate Bayesian modelling, the importance of publication bias has been overstated. I look forward to blogging about it after it’s published. This is a good example of how reasonable people can disagree about the precise causes of the replication crisis, while agreeing that currently replications are under-supplied.
Lauren is currently seeking funding for this idea, for anyone interested in running a pilot programme… if so, make sure to CC sam@progressireland.org!
One of the individuals tasked with identifying suspicious aspects of Lesné’s Alzheimer’s papers was – you guessed it – Elisabeth Bik. As an aside, one hopes that modern AI and computer vision will successfully automate and massively scale the detection of scientific fraud and methodological sloppiness. I would welcome emails from readers interested in these issues.
It is not as well known as it should be that America is one of the least trade-dependent major economies.