Big data and analytics are moving into more mature stages of deployment.
This is good news, especially for small and mid-sized companies that are deploying the technology but have struggled to define a big data architecture.
Uncertainty about how to define an overarching architecture is one reason why small and mid-sized companies have lagged in their big data and analytics deployments. In many cases, they have chosen to wait on the sidelines to see how trends like hybrid computing, data marts and master databases, and control over security and governance would play out.
At last, there seems to be an emerging best practice data architecture that everyone can follow. In this architecture:
Cloud services are being used to store and process big data; and
On-premise computing is being used to develop local data marts, which companies use to perform their own analytics.
Let's take a closer look at the reasoning behind this big data and analytics architecture:
The role of cloud
If your company is small or mid-sized, it is cost-prohibitive to start buying clusters of servers that process big data in parallel in your data center, not to mention hiring or cross-training the very expensive professionals who know how to optimize, upgrade and maintain a parallel processing environment. Companies that opt to process and store their data onsite also face considerable investments in hardware, software and storage. These economics point toward outsourcing your big data hardware, software, processing and storage to the cloud.
Governance (e.g., security and compliance concerns) is one reason why companies remain reluctant to consign all of their mission-critical data to the cloud, where it is more difficult to oversee the stewardship of this data. Consequently, many companies opt to move data into their own on-premise data centers once the data has been processed in the cloud.
There is also a second reason why many companies opt to go on-premise with their processed data: concern about the proprietary applications and algorithms they develop to mine this data. Many cloud providers have a policy that any applications their customers develop in the cloud may be shared with other customers.
By keeping their apps in-house and developing an on-premise master dataset from which smaller data marts can be split off, companies maintain direct control over their data and apps.
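To make the idea of carving data marts out of a master dataset concrete, here is a minimal Python sketch. It is only an illustration: the record fields, the sample rows and the `build_data_mart` helper are all hypothetical and do not come from any particular product. The sketch assumes the data has already been processed in the cloud and copied on-premise.

```python
from typing import Callable, Iterable

Record = dict[str, object]


def build_data_mart(
    master: Iterable[Record],
    keep: Callable[[Record], bool],
    columns: list[str],
) -> list[Record]:
    """Return a subset of the master dataset (hypothetical helper).

    Only rows accepted by `keep` are included, and only the named
    columns are copied, so the master dataset itself is left untouched.
    """
    return [{col: row[col] for col in columns} for row in master if keep(row)]


# Hypothetical master dataset held on-premise after cloud processing.
master_dataset: list[Record] = [
    {"order_id": 1, "region": "east", "department": "sales", "amount": 120.0},
    {"order_id": 2, "region": "west", "department": "sales", "amount": 75.5},
    {"order_id": 3, "region": "east", "department": "service", "amount": 40.0},
]

# Each data mart is a smaller, purpose-specific view of the master dataset.
sales_mart = build_data_mart(
    master_dataset,
    keep=lambda row: row["department"] == "sales",
    columns=["order_id", "region", "amount"],
)
service_mart = build_data_mart(
    master_dataset,
    keep=lambda row: row["department"] == "service",
    columns=["order_id", "amount"],
)
```

In practice the master dataset would live in a database rather than in memory, but the relationship is the same: the marts are derived copies, while the master dataset remains the single source under the company's direct control.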