The Apache Spark analytics framework underpins HANA Vora, the front-end query tool SAP users need to link HANA technology to data in the Hadoop Distributed File System.
The amount of digital data created annually is exploding. To process the massive volume of data generated by enterprise applications, as well as the information flowing in from a variety of external sources, organizations need a broad range of analytical capabilities. Some are now turning to the HANA big data tools offered by SAP.
“Big data is only going to get bigger and richer, as well as originate and flow from an increasing number of sources, both internal and external,” according to a report from Forrester Research titled “Ultra-Fast Data Access Is The Key To Unleashing Full Big Data Potential.”
Enterprises need a “modern data analytics strategy that provides a ubiquitous, real-time data access layer to all relevant data from all different sources,” the report noted.
To meet the needs of these enterprises, SAP is continuing to invest in providing business users with access to advanced analytics tools that use its HANA in-memory, column-oriented, relational database management system, said Anne Moxie, senior analyst at Boston-based Nucleus Research.
Werner Hopf, CEO of Dolphin Enterprise Solutions Corp., agreed with Moxie’s assessment. Dolphin is an SAP partner based in Morgan Hill, Calif.
“SAP invested a ton of development over the past two or three years to extend HANA capabilities, so it can also be used as the underlying database for transaction processing systems,” he said.
For example, last September, SAP announced HANA Vora, a new in-memory query engine for Hadoop that addresses the challenges companies face as they manage distributed big data, Moxie said.
HANA on its own, however, is not well suited to very large data volumes because it’s not cost-effective to put large amounts of information in memory, said John Appleby, U.S. general manager at London-based global consultancy and SAP partner Bluefin Solutions, a Mindtree company. “We’re pleased that SAP has embraced Hadoop.”
‘First-class citizen’ HANA Vora is main HANA big data tool
HANA Vora, which was made generally available in March, allows companies to analyze data stored in Hadoop, enterprise systems and other distributed data sources, according to SAP. HANA Vora makes use of and extends the Apache Spark execution framework to provide enriched interactive analytics on enterprise and Hadoop data, helping companies in various industries glean more insight from their big data.
CenterPoint Energy, an electric and natural-gas utility based in Houston, is one of the first SAP users to implement the HANA big data platform and HANA Vora to bring together its highly distributed enterprise data framework.
Hadoop will enable CenterPoint Energy to cut the information technology costs associated with increasing big-data storage requirements, while HANA Vora will allow for more informed business decisions through analytics, SAP said.
CenterPoint Energy, which delivers power to more than 2.3 million consumers in six states, collects electronic meter data every 15 minutes for energy use reporting — and that means hefty data-storage costs.
In six weeks, SAP and CenterPoint Energy built a testing environment that processed over 5 billion data records with Hadoop, HANA and HANA Vora, according to SAP. After that successful test deployment, CenterPoint Energy opted to implement and standardize on the HANA big data platform.
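To put those figures in perspective, a rough back-of-the-envelope calculation helps. The sketch below is illustrative only: it assumes one meter per consumer and one stored record per reading, neither of which the company has confirmed, and it treats the "more than 2.3 million" and "over 5 billion" figures as exact. The function and constant names are made up for this example.

```python
# Back-of-the-envelope estimate of smart-meter record volume.
# ASSUMPTIONS (not from CenterPoint Energy or SAP): one meter per consumer,
# one stored record per reading, and the rounded figures treated as exact.

METERS = 2_300_000            # "more than 2.3 million consumers"
INTERVAL_MINUTES = 15         # one reading every 15 minutes
TEST_RECORDS = 5_000_000_000  # "over 5 billion data records" in the test


def readings_per_meter_per_day(interval_minutes: int) -> int:
    """Number of readings a single meter produces in 24 hours."""
    return (24 * 60) // interval_minutes


def records_per_day(meters: int, interval_minutes: int) -> int:
    """Total records produced per day across all meters."""
    return meters * readings_per_meter_per_day(interval_minutes)


def days_covered(total_records: int, meters: int, interval_minutes: int) -> float:
    """How many days of readings a given record count would represent."""
    return total_records / records_per_day(meters, interval_minutes)


if __name__ == "__main__":
    daily = records_per_day(METERS, INTERVAL_MINUTES)
    print(f"Readings per meter per day: {readings_per_meter_per_day(INTERVAL_MINUTES)}")
    print(f"Records per day:            {daily:,}")
    print(f"Records per year:           {daily * 365:,}")
    print(f"Days covered by test set:   {days_covered(TEST_RECORDS, METERS, INTERVAL_MINUTES):.1f}")
```

Under those assumptions, each meter produces 96 readings a day, or roughly 220 million records daily across the customer base. At that rate, the 5 billion records in the test would correspond to a little over three weeks of readings.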
“Our initial analysis proved that SAP HANA paired with SAP HANA Vora is the right solution for us moving forward operationally,” said Gary Hayes, CIO and senior vice president of CenterPoint Energy, in a statement.
HANA Vora has a strong ability to handle structured as well as transactional data running in the HANA enterprise computing platform, said Irfan Khan, CTO of SAP’s global customer operations.
“But by deploying Vora on a cluster of machines that are running the Spark foundation, and, at the same time, sitting on top of the storage foundation of, say, Hadoop, we can push a variety of different types of work directly from HANA as a computing platform,” Khan said. The result is “much more of a business-coherent view of what’s going on.”
HANA Vora sits as a “first-class citizen” inside of the Spark foundation, allowing SAP to either push down very specific types of analytical workloads into Spark storage, or bring back contextual information into the transactional core to provide much more meaningful insight to customers, Khan said.
From a big-data analytics perspective, the main challenge with in-memory systems such as HANA is the cost-to-value ratio, said Dolphin’s Hopf. Main memory is expensive, and enterprises quickly reach a data volume where the cost simply outweighs the benefits of the analyses they can do for various scenarios.
That’s why adding Hadoop support was essential in making HANA big data practical, according to Hopf. Incorporating some of the HANA database technology in the HANA Vora analytics front end and having it sit on top of Hadoop and Spark “allows customers to run high-performance analytics on subsets of data that can be stored in fairly massive Hadoop data lakes,” he said.
Hadoop and Spark to play essential role in HANA big data projects
According to Moxie, combining HANA Vora with Hadoop and Spark is a big step in giving businesses full access to all of their data. As the internet of things (IoT) grows, Spark will be effective for the distributed processing and extraction of the data sets needed for that sort of work.