Hadoop is NOT “Big Data” is NOT Analytics

I am amazed at how haphazardly the words “Hadoop”, “Big Data” and “Analytics” are bandied about these days. For those who want to work in the field of Analytics (especially the very young, but also some not so young), my earnest entreaty is to understand that these three words mean very different things. Using them interchangeably demonstrates ignorance rather than expertise.

Perhaps a bit of history would help to give some perspective. Folks in academia have long been tackling “big data” problems, using the power of cluster and distributed computing to solve embarrassingly parallel problems. Before the advent of inexpensive “cloud-based” resources, universities and research organizations would build their own very large “super clusters” from commodity off-the-shelf (COTS) components or, going back even further, would use large shared-memory computers (the likes of Silicon Graphics sold these). As research and some large industrial organisations started building “Beowulf” clusters, they also started putting together operating system packages that made it easier for people to set up their clusters quickly. Of course, people still had to write distributed applications for these clusters using specialised languages, and that could become quite involved.

