Big Data Predictions-With just over a week left on the 2019 calendar, it’s now time for predictions. We’ll run several stories featuring the 2020 predictions of industry experts and observers in the field. It all starts today with what is arguably the most critical aspect of the big data question: The data itself.
There’s no denying that Hadoop had a rough year in 2019. But is it completely dead? Haoyuan “HY” Li, the founder and CTO of Alluxio, says that Hadoop storage, in the form of the Hadoop Distributed File System (HDFS) is dead, but Hadoop compute, in the form of Apache Spark, lives strong.
“There is a lot of talk about Hadoop being dead,” Li says. “But the Hadoop ecosystem has rising stars. Compute frameworks like Spark and Presto extract more value from data and have been adopted into the broader compute ecosystem. Hadoop storage (HDFS) is dead because of its complexity and cost and because compute fundamentally cannot scale elastically if it stays tied to HDFS. For real-time insights, users need immediate and elastic compute capacity that’s available in the cloud. Data in HDFS will move to the most optimal and cost-efficient system, be it cloud storage or on-prem object storage. HDFS will die but Hadoop compute will live on and live strong.”
As HDFS data lake deployments slow, Cloudian is ready to swoop in and capture the data into its object store, says Jon Toor, CMO of Cloudian.
“In 2020, we will see a growing number of organizations capitalizing on object storage to create structured/tagged data from unstructured data, allowing metadata to be used to make sense of the tsunami of data generated by AI and ML workloads,” Toor writes.
The end of one thing, like Hadoop, will give rise the beginning of another, according to ThoughtSpot CEO Sudheesh Nair.