Hadoop data and traditional BI may seem worlds apart, but Yellow Pages Ltd. in Canada was able to successfully bridge the two and uncover new business insights in the process.
Three years ago, Montreal-based Yellow Pages Ltd. started expanding its use of Hadoop. The company, which offers a variety of mobile apps and digital marketing services in addition to traditional telephone directories in Canada, was in the process of moving some outsourced analytics applications back in-house, and one of the applications used a Hadoop cluster as its data transformation tool.
But after that application, which calculates and reports on ROI metrics for clients that advertise with Yellow Pages, was replicated internally, Richard Langlois, the company’s director of big data and analytics, noticed that the Hadoop cluster was underutilized. It was basically just being used to stage data for the ROI application, which only took about three hours per day. Langlois wondered if the cluster, based on Cloudera’s Hadoop distribution, could be put to use throughout the rest of the day — essentially, as a Hadoop BI system.
“When we brought back the application, the Hadoop part was simply used to sort our records — it was used as an ETL machine,” he said, referring to extract, transform and load data integration processes. Langlois decided to see if he could tune the cluster to also run more traditional BI applications on the Hadoop platform.
Choose your tools with users in mind
Langlois eventually brought in software vendor AtScale’s namesake technology, which aggregates and manages frequently queried Hadoop data in a server’s memory. That design makes the data quicker to access than it would be through traditional Hadoop queries, which typically are optimized for large-scale batch operations rather than interactive use.
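The general idea behind that kind of acceleration layer, sketched here in plain Python rather than AtScale's actual product, is to pre-compute frequently requested aggregates once and serve BI queries from the in-memory summary instead of rescanning the raw records. All names and figures below are illustrative.

```python
from collections import defaultdict

# Hypothetical raw fact rows, as they might sit in Hadoop:
# (client_id, region, ad_clicks)
FACT_ROWS = [
    ("c1", "QC", 120),
    ("c2", "QC", 80),
    ("c1", "ON", 45),
    ("c3", "ON", 200),
]

def build_aggregate(rows):
    """Pre-compute clicks per region once, so later queries hit
    an in-memory summary instead of the full fact table."""
    agg = defaultdict(int)
    for _client, region, clicks in rows:
        agg[region] += clicks
    return dict(agg)

# Built once (e.g., on a refresh schedule), then queried many times.
CLICKS_BY_REGION = build_aggregate(FACT_ROWS)

def clicks_for(region):
    # O(1) lookup against the cached aggregate.
    return CLICKS_BY_REGION.get(region, 0)
```

The trade-off is the usual one for aggregate caches: interactive queries get fast, but the summary is only as fresh as its last rebuild.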
In addition to the ROI calculator application, Yellow Pages now uses the Hadoop BI setup for tasks such as showing corporate customers how their digital ads rank on user engagement against ads placed by their competitors.
Don’t skip Hadoop BI governance steps
Exposing Hadoop data to a greater number of business users created some governance concerns for Langlois and his team. The idea of using Hadoop as a data lake continues to grow in organizations, as more businesses view Hadoop clusters as a relatively cheap storage option for new types of unstructured and semi-structured data that can fuel expanded BI and big data analytics initiatives. But it would be easy to dump potentially sensitive data into Hadoop without putting proper controls on who can access it and how it can be used.
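The control the article calls for can be as simple as a deny-by-default policy mapping data-lake paths to the roles allowed to read them. The sketch below is purely illustrative; in a real Hadoop deployment this policy would live in a governance tool such as Apache Sentry or Apache Ranger, not in application code, and the paths and roles are invented.

```python
# Illustrative access policy: each data-lake path lists the roles
# permitted to read it. Sensitive data gets a narrow role set.
POLICY = {
    "/lake/roi_metrics": {"analyst", "product_manager"},
    "/lake/raw_client_pii": {"data_steward"},
}

def can_read(role, path):
    """Deny by default: a path absent from the policy is unreadable,
    which is the safe failure mode for newly dumped data."""
    return role in POLICY.get(path, set())
```

The deny-by-default choice matters for exactly the scenario described above: data dumped into the lake without an explicit policy entry stays locked until someone decides who may use it.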
Sky’s the limit for using Hadoop data
Organizing the data in Hadoop for BI applications has also had the side benefit of staging it for other analytics uses. For example, Langlois and his team recently implemented the Spark processing engine on top of Hadoop for a machine learning application. He said the application learns from experiences with previous Yellow Pages clients how successful marketing campaigns were structured and then takes information about new clients, such as their industry and regional location, to prescribe specific marketing strategies. The analytics team is also looking at ways to use similar machine learning techniques to move beyond selling ad placements to selling customer leads directly to businesses.
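The prescriptive application described above, matching a new client's attributes against past campaigns and recommending what worked for similar clients, can be sketched as a simple similarity-weighted scorer. This is a stdlib Python illustration of the technique, not Yellow Pages' Spark implementation; every field name and number is invented.

```python
# Hypothetical history of past campaigns and their measured ROI.
PAST_CAMPAIGNS = [
    {"industry": "restaurant", "region": "QC", "strategy": "mobile_ads",  "roi": 3.1},
    {"industry": "restaurant", "region": "QC", "strategy": "display_ads", "roi": 1.4},
    {"industry": "plumbing",   "region": "ON", "strategy": "search_ads",  "roi": 2.7},
    {"industry": "restaurant", "region": "ON", "strategy": "mobile_ads",  "roi": 2.9},
]

def recommend_strategy(industry, region):
    """Score each past campaign by how closely its client resembles
    the new one (shared industry, shared region), weight that
    similarity by the campaign's ROI, and return the strategy with
    the highest total score."""
    scores = {}
    for c in PAST_CAMPAIGNS:
        similarity = (c["industry"] == industry) + (c["region"] == region)
        if similarity:
            key = c["strategy"]
            scores[key] = scores.get(key, 0.0) + similarity * c["roi"]
    return max(scores, key=scores.get) if scores else None
```

A production version on Spark would replace the hand-rolled scoring with a trained model over far more features, but the shape of the problem is the same: past outcomes in, prescribed strategy out.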
These ongoing developments are being driven in part by a strategy of making it easier for product managers and data scientists alike to query data for Hadoop BI uses. In turn, Langlois said the analytics efforts are helping Yellow Pages to maintain its business relevance in the 21st century; in fact, the company got 61% of its 2015 third-quarter revenue from digital offerings. “It’s a totally different Yellow Pages from five years ago,” he said. “This is why analytics is important. It changes our products and improves our processes.”