The “big data” revolution has transformed more than industry – it is reshaping how academic research is conducted as well. As computational quantitative methods spread into nontraditional academic disciplines, they are upending the traditional balance in which academia pioneers new approaches and industry commercializes them. In the big data era, only a handful of universities possess the datasets, computational resources and expertise to make seminal advances, and even those institutions typically partner with major companies to gain access to their unique resources. This raises the very real question of how top granting agencies like the US National Science Foundation (NSF) can remain relevant in an era in which their reviewers, drawn largely from academia, are increasingly unable to keep pace with the rapidly changing field of big data.
As someone who writes about the field of big data, I receive at least a few press releases per week from universities touting their latest grant award, publication or website release related to big data or deep learning. I also make a habit of regularly scanning the awards databases of major funders like the NSF to keep tabs on interesting new research and general funding trends in areas such as big data and deep learning.
One trend that has struck me is the growing divide between the work coming out of academia and the advances pouring forth each day from the commercial and open data sectors, the overwhelming majority of which are published outside academic venues, from blogs to social media. The individuals performing that research are rarely affiliated with any academic institution, and most have never published in an academic venue.
When I see that an academic research group has received a million dollars from a funding agency to search a few million tweets, or several million dollars to load a few hundred gigabytes into Elasticsearch, or a few hundred thousand dollars to run a few dozen documents through an off-the-shelf commercial software package requiring about 15 seconds of computational time, I have to stop and ask how these projects are slipping through a peer review process that, in theory, should have stopped them at the first stage of the review pipeline.
Computational grants display a similarly strange disconnect. When I see a grant awarding hundreds of thousands of hours of computing time on a top supercomputer to run a specific dataset through a specific piece of software, with the claim that the machine is the only one in the world capable of the analysis, when I had in fact run that exact dataset through that exact software on my five-year-old MacBook Air a year earlier, I have to wonder how there can be such a disconnect between what funding agencies approve and the actual state of the world outside academia.