By Mark Andrejevic
Pomona College, USA via International Journal of Communication
Image Source: Visualising Information for Advocacy by Tactical Technology Collective, Creative Commons Attribution-ShareAlike 3.0 Unported License.
In the sense of standing for more information than any individual human or group of humans can comprehend, the notion of big data has existed since the dawn of consciousness. The world and its universe are, to anything or anyone with senses, incomprehensibly big data. The contemporary usage is distinct, however, in that it marks the emergence of the prospect of making sense of an incomprehensibly large trove of recorded data—the promise of being able to put it to meaningful use even though no individual or group of individuals can comprehend it.
More prosaically, big data denotes the moment when automated forms of pattern recognition known as data analytics can catch up with automated forms of data collection and storage. Such data analytics are distinct from simple searching and querying of large data sources, a practice with a much longer legacy. Thus, the big data moment and the advent of data-mining techniques go hand in hand. The magnitude of what counts as big data, then, will likely continue to increase to keep pace with both data storage and data processing capacities. IBM, which is investing heavily in data mining and predictive analytics, notes that big data is not just about size but also about the speed of data generation and processing and the heterogeneity of data that can be dumped into combined databases. It describes these dimensions in terms of the three “Vs”: volume, velocity, and variety (IBM, 2012, para. 2).
Big-data mining is omnivorous, in part because it has embarked on the project of discerning unexpected, unanticipated correlations. As IBM puts it, “Big data is any type of data—structured and unstructured data such as text, sensor data, audio, video, click streams, log files and more. New insights are found when analyzing these data types together” (IBM, 2012, para. 9). Data can be collected, sorted, and correlated on an unprecedented scale that promises to generate useful patterns far beyond the human mind’s ability to detect or even explain.
As data-mining consultant Colleen McCue (2007) puts it, “With data mining we can perform exhaustive searches of very large databases using automated methods, searching well beyond the capacity of human analysts or even a team of analysts” (p. 23). In short, data mining promises to generate patterns of actionable information that outstrip the reach of the unaided human brain. In his book Too Big to Know, David Weinberger (2011) describes this “new knowledge” as requiring “not just giant computers but a network to connect them, to feed them, and to make their work accessible. It exists at the network level, not in the heads of individual human beings” (p. 130).
Such observations trace the emerging contours of a “big data divide” insofar as putting the data to use requires access to and control over costly technological infrastructures, expensive data sets, and the software, processing power, and expertise for analyzing them. If, as Weinberger puts it, in the era of big data “the smartest person in the room is the room,” then much depends on who owns and operates the room. The forms of “knowing” associated with big data mining are available only to those with access to the machines, the databases, and the algorithms. Assuming for the sake of argument that the big data prognosticators (e.g., Mayer-Schönberger & Cukier, 2013) are correct, the era of big data—characterized by the ability to make use of databases too large for any individual or group of individuals to comprehend—ushers in powerful new capabilities for decision making and prediction unavailable to those without access to the databases, storage, and processing power.
In manifold spheres of social practice, then, those with access to databases, processing power, and data-mining expertise will find themselves advantageously positioned compared to those without such access. But the divide at issue is not simply between what boyd and Crawford (2011) describe as database “haves” and “have-nots”; it is also about asymmetric sorting processes and different ways of thinking about how data relate to knowledge and its application.
This article is an excerpt from the research paper "The Big Data Divide" by Mark Andrejevic, published in International Journal of Communication 8 (2014), 1673–1689. Copyright © 2014 (Mark Andrejevic). Licensed under the Creative Commons Attribution Non-commercial No Derivatives (by-nc-nd).
Declaration: The cover image is for representation purposes only and is not part of the original content.
IBM. (2012). Bringing big data to the enterprise. Retrieved from http://www-01.ibm.com/software/in/data/bigdata
McCue, C. (2007). Data mining and predictive analysis: Intelligence gathering and crime analysis. New York, NY: Butterworth-Heinemann.
Weinberger, D. (2011). Too big to know: Rethinking knowledge now that the facts aren’t the facts, experts are everywhere, and the smartest person in the room is the room. New York, NY: Basic Books.
Mayer-Schönberger, V., & Cukier, K. (2013). Big data: A revolution that will transform how we live, work, and think. Boston, MA and New York, NY: Eamon Dolan/Houghton Mifflin Harcourt.
boyd, d., & Crawford, K. (2011, September). Six provocations for big data. Presentation at A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society, Oxford Internet Institute, Oxford University, Oxford, UK. Available at SSRN http://ssrn.com/abstract=1926431 or http://dx.doi.org/10.2139/ssrn.1926431