I’ve seen a lot of writing lately on Big Data vs. Small Data. I know this is something I should pay attention to, because they are capitalizing words that you usually don’t capitalize! Here are some still-forming thoughts…
Rufus Pollock, Director of the Open Knowledge Foundation, recently wrote on Al Jazeera that:
Size doesn’t matter. What matters is having the data, of whatever size, that helps us solve a problem or addresses the question we have – and for many problems and questions, Small Data is enough.
He argues that Small Data is about the enabling potential of the laptop computer, combined with the communicative ability unleashed by the internet. His post, and others like it, sparked me to jot down some of my own thoughts about these newly capitalized things.
How Do I Define Big Data?
Big Data is getting loads of press. Supporters focus on the idea that ginormous sets of data reveal hidden patterns and truths that would otherwise be impossible to see. Many critics respond that supporters are missing inherent biases and ignoring ethical considerations, and they remind us that data never holds absolute truths. In any case, data literacy is on people’s minds, and it is getting funding.
My working definition of Big Data focuses more on the “how” of it all. For one, most Big Data projects run on implicit, unknown, or purposely hidden data collection. Cell phone providers don’t exactly advertise that they are tracking everywhere you go. Another aspect of the “how” of Big Data is that the datasets are large enough to require computer-assisted analysis. You can’t sit down and draw raw Big Data on a piece of paper on a wall. You have to use tools that perform algorithmic computations on the raw data for you. And what do people use these tools for? They try to describe what is going on, and they try to predict what might happen next.
So What Does Small Data Mean to Me?
Small Data is the new term many people are using to argue against Big Data, so its definition is malleable and depends on each person’s goal! For me, Small Data is what community groups have always used to do their work better, in a few ways:
- Evaluate: Groups use Small Data to evaluate programs so they can improve them
- Communicate: Groups use Small Data to communicate about their programs and topics with the public and the communities they serve
- Advocate: Groups use Small Data to make evidence-based arguments to those in power
The “how” of Small Data is very different from the ideas I laid out for Big Data. Small Data runs on explicitly collected data: the data is collected in the open, with notice, and on purpose. Small Data can be analyzed by interested laypeople. Small Data doesn’t depend on technology-assisted analysis, but can draw on it as appropriate.
Do my definitions present a useful distinction? I imagine that is what you’re thinking right now. Well, for me the primary difference lies in the activities I can use to empower people to play with data. My workshops and projects focus on finding stories, and telling stories, with data. With Small Data, I have techniques for doing both. With Big Data, I don’t have good hands-on activities for learning how to find stories.
I connect this primarily to the fact that Big Data relies on algorithmic investigations, and I haven’t figured out how to get around that. Algorithms aren’t hands-on. You can do engaging activities to understand how they work, but not to actually carry them out by hand. In addition, most of the community groups, organizations, and local governments I work with don’t have Big Data problems.
Put those two things together and you’ll see why I don’t focus on Big Data in my work. Philosophically, I want to empower people to use information to make the change they want, and right now that means using Small Data. That’s my current thinking, and it guides my current focus.