Big data and data analytics offer the opportunity to gain insight into consumer shopping habits, preferences and sleep patterns. But some companies have trouble figuring out what all the information means. This primer can help ease the process.
What to Do With Data
The terms “big data” and “data analytics” have become popular buzzwords in recent years. While the terms conjure up notions of market omnipotence built on the incomprehensible amounts of data generated every instant in the digital age, many who use these terms don’t truly comprehend what they mean, what their capabilities and limitations are, or how to use them.
In this article, we’ll provide a basic introduction to the concept of big data, as well as the more practical concept of data analytics. We’ll also make the argument that big data, while a powerful tool, is not necessary for every data analysis project. We’ll then discuss some of the common pitfalls people encounter in data analytics. Finally, we’ll walk through how to structure a data analytics process that avoids those pitfalls while getting the most out of the available data resources.
What is big data?
Before getting into the best practices of data analytics, it’s important to get a firm understanding of some of the terms. Big data is the type of term that can spawn a different definition depending on whom you ask, but a widely accepted definition comes from the Gaithersburg, Maryland-based National Institute of Standards and Technology (NIST): “Big data consists of extensive data sets — primarily in the characteristics of volume, variety, velocity and/or variability — that require a scalable architecture for efficient storage, manipulation and analysis.”
NIST notes that this definition contains an inherent interplay between the characteristics of the data and the need to process it with sufficient performance (speed) and cost efficiency, i.e., the “architecture” element.
The architecture element is not fundamental to developing a process for data analytics, but the massive amounts of data potentially involved require one of two approaches: either a single, massively powerful computer to manage the collection, storage and analysis of those data (vertical scaling), or distribution of the collection, storage and processing among many integrated individual computers (horizontal scaling).
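To make the distinction between vertical and horizontal scaling concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function names, the made-up purchase amounts and the use of threads as stand-ins for separate machines are all hypothetical, and real big data platforms handle far more (storage, network transfer, failure recovery). The point is simply that a horizontally scaled calculation splits the data into shards, lets each "node" compute a partial result, and then combines those partial results into the same answer a single large machine would produce.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_nodes):
    """Split records into n_nodes shards, dealing them out round-robin."""
    return [records[i::n_nodes] for i in range(n_nodes)]


def partial_stats(shard):
    """Work one node does on its own shard: a partial sum and a count."""
    return sum(shard), len(shard)


def combine(partials):
    """Merge the per-node (sum, count) pairs into one overall mean."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else 0.0


def horizontal_mean(records, n_nodes=4):
    """Horizontal scaling: many small workers, each handling one shard."""
    shards = partition(records, n_nodes)
    # Threads stand in for separate machines in this sketch (an assumption).
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(partial_stats, shards))
    return combine(partials)


def vertical_mean(records):
    """Vertical scaling: one machine holds and processes everything."""
    return sum(records) / len(records) if records else 0.0


# Made-up order totals, purely for illustration.
purchases = [12.5, 40.0, 7.25, 19.0, 33.75, 8.5]

if __name__ == "__main__":
    print(vertical_mean(purchases))
    print(horizontal_mean(purchases, n_nodes=4))
```

Both functions arrive at the same average. What differs is where the work happens, which is why the choice between the two approaches is mostly a question of cost and performance rather than of analytic method.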