Microsoft, Databricks, and Intel were among those lining up in support of Spark at this week’s Spark Summit. We’ve got the full story in our Big Data Roundup for the week ending June 12, 2016.
It’s Apache Spark time in our Big Data Roundup for the week ending June 12. At the Spark Summit West 2016, vendors big and small made announcements supporting the real-time big data analytics platform. Microsoft is getting behind Spark with several of its products. Distribution company Databricks revealed general availability of its Community Edition. And Intel declared Spark to be at the center of the big data revolution.
Let’s start with Microsoft. This week the company announced general availability of Apache Spark 1.6.1 for Azure HDInsight, and Power BI support for Spark Streaming. Azure HDInsight is Microsoft’s answer to Hadoop in the Azure cloud. Based on the Hortonworks Data Platform Hadoop distribution, the service deploys and provisions managed Apache Hadoop clusters in the Azure cloud, providing a framework designed to process, analyze, and report on big data.
Now the company is adding Spark for HDInsight, and Microsoft says the service is already popular, with Spark being adopted in 50% of all new HDInsight clusters deployed.
“With GA, we are revealing improvements we’ve made to the service to make Spark hardened for the enterprise and easy for your users,” wrote Oliver Chiu, a senior product marketing manager for big data and data warehousing at Microsoft, in a blog post. “This includes improvements to the availability, scalability, and productivity of our managed Spark service.”
Microsoft also said it worked with Hortonworks to add capabilities to the YARN resource manager. In addition, Redmond co-led Project Livy with Cloudera and other organizations to create an open source, Apache-licensed REST web service for managing long-running Spark contexts and submitting Spark jobs.
Microsoft said it will offer an integration between Spark and the Azure Data Lake Store to enable Spark to store and process data of any size. Microsoft plans to enable role-based data access at the storage level through integration of Spark and the Data Lake Store.
And, for data scientists specifically, Microsoft introduced out-of-the-box integration with Jupyter data science notebooks.
Microsoft had something for business intelligence professionals and analysts as well. The company will offer integration with Power BI and other BI tools such as Tableau, SAP Lumira, and QlikView.
“This lets you build interactive visualizations over data of any size,” Chiu wrote. “In addition to the traditional dashboards, Power BI offers a streaming connector that has integration with Spark allowing you to publish real-time events from Spark Streaming directly to Power BI.”
Databricks is the chief commercial distribution company behind Apache Spark, and this week the company announced general availability of its Databricks Community Edition, a free version of its just-in-time data platform built on top of Apache Spark.