There’s no question that there are data lake skeptics. The term practically invites sarcastic variants — data swamp, data puddle — and visions of watery doom. And there are more substantive arguments against the validity of the Hadoop data lake architecture.
Gartner is a prominent doubter. The consulting and market research outfit stated its case in a July 2014 report punningly but sharply titled “The Data Lake Fallacy: All Water and Little Substance.” The report pointed to data lake challenges such as culture changes, lack of skills and data governance issues. In an accompanying press release, Gartner analyst Andrew White said that “without at least some semblance of information governance, the lake will end up being a collection of disconnected data pools or information silos all in one place.”
It’s enough to give you a sinking feeling. But not everyone is down on the data lake. In an article published on companion site SearchBusinessAnalytics in August 2014, consultant Wayne Eckerson wrote that analytics architectures of the future could plausibly revolve around Hadoop, which he described as “a scalable, flexible data processing platform that can meet most enterprise data analysis requirements.”
And indeed, there are organizations that are, yes, diving into the data lake by deploying Hadoop clusters as their lead platform for collecting raw data and then processing and analyzing it.
SearchDataManagement and SearchBusinessAnalytics have published a series of stories that highlight the experiences of some of those users. In one, we examine the issues that three companies faced in building and managing data lakes, and how they addressed those challenges. In another, we explore data lake deployments at insurer Allstate and managed services provider Solutionary. A third story looks at Hadoop’s potential business benefits through the eyes of one IT exec and data lake manager.
We also have insight on the data lake concept from various consultants and industry analysts. In a Q&A, Eckerson details hurdles and misconceptions that can hinder data lake development. In other interviews, Mike Gualtieri of Forrester Research discusses what needs to happen to make data lakes more broadly feasible, and consultant Joe Caserta advises against ignoring traditional IT principles when designing a Hadoop data lake architecture. And Andy Hayler of The Information Difference calls for more examples of viable big data use cases to help keep data lake adoption from bogging down.
A Hadoop data lake project isn’t all rest and relaxation
What the data lake buzz is all about, plus a reality check
The data lake’s key enabler: Hadoop 2 and its YARN resource manager