Many organizations have shifted to a cloud-first mentality for deploying their big data applications. But without expending effort to optimize or tune these cloud apps, customers will waste billions of dollars’ worth of computing resources, according to a new report.
Pepperdata today released a report detailing how efficiently its customers’ cloud clusters are running (or rather, how inefficiently they were running before Pepperdata applied its machine learning-based optimization software to them).
With more than 4.5 million customer apps running on 5,000 nodes storing 400PB of data in the cloud, the company has a unique view of the state of big data cloud performance, and the results are not pretty.
“Our report reveals that, within enterprise workloads that are not optimized by solutions that allow for observability and continuous tuning, there exists enormous waste–and enormous potential to optimize workloads and cut that waste,” the company says in the report, titled “Pepperdata 2020 Big Data Performance Report.”
Take Apache Spark applications, for example. Spark has largely usurped Hadoop as the big data platform of choice, whether on-prem or in the cloud. However, while Spark is head-and-shoulders better than Hadoop in multiple ways, it still suffers performance bugaboos, principally because it is so notoriously difficult to tune.
Among its customers’ Spark applications, Pepperdata found that the median rate of maximum memory utilization was just 42%. That means at least half of the Spark applications never used more than 42% of the memory that customers had allocated to them, leaving nearly six-tenths of it idle even at peak.
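To make the statistic concrete, here is a minimal Python sketch of how a median of maximum memory utilization could be computed. The function names and the sample figures are hypothetical and chosen for illustration; they are not taken from the Pepperdata report, and this is not necessarily how Pepperdata calculates the number.

```python
from statistics import median


def peak_utilization(peak_used_gb, allocated_gb):
    """Fraction of its allocated memory that one application used at peak."""
    if allocated_gb <= 0:
        raise ValueError("allocated_gb must be positive")
    return peak_used_gb / allocated_gb


def median_peak_utilization(apps):
    """Median of per-application peak utilization across (used, allocated) pairs."""
    return median(peak_utilization(used, allocated) for used, allocated in apps)


# Hypothetical (peak used GB, allocated GB) pairs; NOT data from the report.
SAMPLE_APPS = [(2.0, 8.0), (3.0, 8.0), (4.2, 10.0), (6.0, 8.0), (15.0, 16.0)]

if __name__ == "__main__":
    m = median_peak_utilization(SAMPLE_APPS)
    print(f"median peak utilization: {m:.0%}")
    print(f"memory idle at peak for the median app: {1 - m:.0%}")
```

Because the figure is a median, half of the applications sit at or below it. A handful of well-sized jobs, like the last entry in the sample, does not change the picture for the typical job.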
The average wastage across 40 large clusters was 60%, the company found. The vast majority of the wastage occurs in just 5% to 10% of the jobs, Pepperdata concluded. This is why application optimization is “inherently such a needle-in-a-haystack challenge,” the company commented.
As applications spin up on the cloud, there is a lot of money on the line–and a lot of waste to prevent or recoup. According to a Gartner forecast from November 2019, the worldwide public cloud services market will grow 17% in 2020 to total $266.4 billion, up from $227.8 billion in 2019.