Big data has been a major buzzword in technology circles for the last decade or so, but many people have only a limited awareness of what it actually means and how it can be used. Intuitively, big data means a lot of data, and it does. Around the turn of the century the "three Vs" definition of big data came into being, and it still holds good today: data of greater volume, velocity, and variety. Greater than what? Effectively, greater than anything that can be handled by conventional data processing systems. This creates enormous challenges in terms of data storage and analysis, but also, commercially, enormous opportunities to target what consumers really want more accurately than ever before.
Big data can come from virtually any source: traditional ones, such as surveys and focus groups; newer ones, such as data harvested from Facebook or Twitter; and even non-human ones, such as the Internet of Things, the network of internet-connected devices that feed back information about their own usage. This has led to an explosion in the amount of data available, but also to problems of quality control, which is why workers in the field have added two more Vs to those mentioned above: value and veracity. Data scientists spend around 80% of their time cleaning up data sets in order to extract the truest, most valuable information from them. A simple example is interrogating data to discover consumer satisfaction with a specific product. When taking data from a social media source, that data will clearly have more value if it is gathered from groups who definitely use, or are most likely to use, the product than from the source in its entirety. If you are gathering data on a new mobile phone, for instance, you would want to separate the responses of twentysomethings who are keen on such products from those of senior citizens who are more likely to be complaining about mobile phones as a concept.
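The demographic filtering described above can be sketched in a few lines of Python. This is a minimal illustration, not a real pipeline: the record format and the age cut-off are assumptions made purely for the example.

```python
# A toy data set of product comments (field names and values are
# hypothetical, chosen only to illustrate the cleaning step).
records = [
    {"age": 24, "comment": "love the new phone"},
    {"age": 71, "comment": "phones these days are too complicated"},
    {"age": 29, "comment": "battery could be better"},
]

# Keep only the target demographic (here, under-35s) before analysis,
# so the signal reflects likely buyers rather than the whole population.
target = [r for r in records if r["age"] < 35]

print(len(target), "of", len(records), "records kept")
```

In a real project this kind of filtering would be one of many cleaning steps, applied to millions of records with tools built for the scale, but the principle is the same: discard the data that cannot speak to your question before you start counting.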
The biggest challenge with big data is, as the name implies, that it's big. The available amount of big data is estimated to have risen from 4.4 zettabytes (ZB) in 2012 to 44 ZB in 2020, and forecasters predict a further rise to around 165 ZB by 2025 (a zettabyte is a trillion gigabytes, so these are truly mind-blowing quantities). This not only creates the need for new and innovative ways of sorting data for value and veracity; it also creates an astonishing demand for data storage. Cloud services have been particularly valuable here; indeed, without them, big data storage and analysis would be next to impossible.
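To make the scale concrete, the zettabyte-to-gigabyte conversion in the parenthesis above can be checked with simple arithmetic (using decimal units, 1 ZB = 10^21 bytes and 1 GB = 10^9 bytes):

```python
# Gigabytes per zettabyte: 10^21 bytes / 10^9 bytes = 10^12 (a trillion).
GB_PER_ZB = 10**21 // 10**9

# The estimated 2020 figure of 44 ZB, expressed in gigabytes.
print(44 * GB_PER_ZB, "GB")  # 44 trillion gigabytes
```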
So, what is the advantage of big data? Put simply, the more input you have, the more accurate your answers will be, provided that you have properly cleaned the data set to start with. Statisticians are familiar with the jellybean jar scenario: if you ask one person to guess the number of jellybeans in a jar, it's highly unlikely that you'll get the right answer. If you ask one thousand people to guess the number and take the average of their answers, it's likely to be pretty close. That's what big data does, just on a larger scale: it's like being able to ask a billion people for their answer. For any company that relies on large-scale consumer information for success, big data is set to become the essential tool of the new technological age.
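The jellybean jar effect is easy to simulate. In the sketch below the true count and the size of each guesser's error are invented numbers, chosen only to show the principle: individual guesses are wildly off, but their average homes in on the truth as the crowd grows.

```python
import random

random.seed(0)  # fixed seed so the simulation is repeatable
TRUE_COUNT = 500  # hypothetical true number of jellybeans

def guess():
    # Each person's guess is the true count plus a wide random error.
    return TRUE_COUNT + random.gauss(0, 150)

one_person = guess()
crowd_avg = sum(guess() for _ in range(100_000)) / 100_000

print("one guess:", round(one_person))
print("crowd average:", round(crowd_avg))
```

A single guess can miss by hundreds, but the average of 100,000 guesses lands within a fraction of a percent of the true count, because the individual errors cancel out. Big data analysis leans on exactly this property, at vastly greater scale.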