How to Keep “Big Data” From Turning into “Bad Data” Resulting in eDiscovery and Information Governance Risks
by Dean Gonsowski on October 10th, 2012
In a recent Inside Counsel article, I explored the tension between big data and the potentially competing notion of information governance by looking at the 5 Vs of Big Data…
“The Five Vs” of Big Data
1. Volume: Volume, not surprisingly, is the hallmark of the big data concept. With data creation doubling every 18 months, we’ve rapidly moved from a gigabyte world to a universe where terabytes and exabytes rule the day. In fact, according to a 2011 report from the McKinsey Global Institute, numerous U.S. companies now have more data stored than the U.S. Library of Congress, which holds more than 285 terabytes of data (as of early this year). To complicate matters, this trend is escalating exponentially and shows no sign of abating.
2. Velocity: According to the analyst firm Gartner, velocity can be thought of in terms of “streams of data, structured record creation, and availability for access and delivery.” In practical terms, this means organizations must constantly handle a torrential flow of data into and out of their information management systems. Take Twitter, for example, where users post more than 400 million tweets per day. As with volume, data velocity isn’t slowing down anytime soon.
3. Variety: Perhaps more vexing than either volume or velocity, the variety element of big data increases complexity exponentially, as organizations must account for data sources and types that are evolving in different directions. To name just a few variants, most organizations routinely wrestle with structured data (databases), unstructured data (loose files and documents), email, video, static images, audio files, transactional data, social media, cloud content and more.
4. Value: A newer big data concept, value hasn’t traditionally been part of the standard definition. Here, the critical question is whether the retained information is valuable, either on its own or in combination with other data elements that can reveal patterns and insights. Given the rampant spam, non-business data (like fantasy football emails) and duplicative content, it’s easy to see that data possessing the other three Vs isn’t inherently valuable from a big data perspective.
5. Veracity: Particularly in an information governance era, it’s vital that big data elements have the requisite level of veracity (or integrity). In other words, specific controls must be put in place to ensure that the integrity of the data is not impugned. Otherwise, any subsequent use of the data (particularly in a legal or regulatory proceeding, like e-discovery) may be needlessly compromised.”
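To put the volume and velocity figures above in perspective, here is a quick back-of-the-envelope calculation in Python. It takes the post’s figures at face value (data doubling every 18 months; 400 million tweets per day); the function names are illustrative only, and the tweet rate is a simple daily average that ignores peaks.

```python
# Back-of-the-envelope arithmetic using the figures cited above.
# Assumed inputs (taken from the text): data doubles every 18 months,
# and Twitter sees 400 million tweets per day.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def growth_factor(months: float, doubling_period_months: float = 18.0) -> float:
    """How many times larger a data store becomes after `months`,
    assuming it doubles every `doubling_period_months`."""
    return 2 ** (months / doubling_period_months)


def tweets_per_second(tweets_per_day: float) -> float:
    """Average tweet rate implied by a daily total (no peak modeling)."""
    return tweets_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    # Six years (72 months) is four 18-month doubling periods: 2**4.
    print(f"Growth over 6 years: {growth_factor(72):.0f}x")
    print(f"Average tweet rate: {tweets_per_second(400_000_000):,.0f} per second")
```

Four doublings in six years works out to a 16-fold increase in stored data, and 400 million tweets a day averages to roughly 4,630 tweets every second.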
“Many organizations sadly aren’t cognizant of the lurking tensions between the rapid acceleration of big data initiatives and competing corporate concerns around important constructs like information governance. Latent information risk is a byproduct of keeping too much data, which increases exposure to e-discovery costs and sanctions, potential security breaches and regulatory investigations. As evidence of this potential information liability, consider that it costs only $0.20 a day to manage 1 GB of storage. Yet, according to a recent RAND survey, it costs $18,000 to review that same gigabyte for e-discovery purposes.”
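The gap between those two per-gigabyte numbers is the heart of the argument. The Python sketch below simply does the arithmetic on the two figures quoted above; the constant and function names are illustrative, and a 365-day year is assumed.

```python
# Compare the two per-gigabyte figures quoted above:
# $0.20/day to manage 1 GB of storage vs. $18,000 to review 1 GB for e-discovery.

STORAGE_COST_PER_GB_PER_DAY = 0.20  # dollars, figure cited in the post
REVIEW_COST_PER_GB = 18_000.00      # dollars, RAND figure cited in the post


def annual_storage_cost(gigabytes: float) -> float:
    """Yearly cost to keep `gigabytes` under management (365-day year assumed)."""
    return gigabytes * STORAGE_COST_PER_GB_PER_DAY * 365


def review_cost(gigabytes: float) -> float:
    """One-time cost to review `gigabytes` for e-discovery."""
    return gigabytes * REVIEW_COST_PER_GB


def storage_days_equal_to_review(gigabytes: float = 1.0) -> float:
    """Number of days of storage spend that equals a single review."""
    return review_cost(gigabytes) / (gigabytes * STORAGE_COST_PER_GB_PER_DAY)


if __name__ == "__main__":
    gb = 1.0
    print(f"Storage for one year: ${annual_storage_cost(gb):,.2f}")
    print(f"One-time review:      ${review_cost(gb):,.2f}")
    days = storage_days_equal_to_review(gb)
    print(f"Review = {days:,.0f} days (~{days / 365:.0f} years) of storage")
```

Storing a gigabyte for a year costs about $73, while reviewing it once costs $18,000, the equivalent of roughly 90,000 days (about 247 years) of storage. That ratio is why keeping data “just in case” can become the costlier choice.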