So, not surprisingly, I was excited to see Data Cuisine coming to Boston for a workshop by Suzanne Jaschko and Moritz Stefaner! Sadly I’m out of town and can’t make the workshop, but it got me thinking a bit about food and data, and about creative data representation.
He worked with a farmer to breed super colorful red peppers, then fed a mash of them to chickens to create the red yolk you see above! Why? All to start a person about to eat the egg wondering how it got that red. What did the chicken eat to make that happen? Why have I never thought about the supply chain going into this egg before?
To me, Barber’s red pepper egg is a wonderful example of data representation as food. The food chain data is beautifully captured in the red yolk, and it prompts you to ask questions directly aligned with his goals in presenting it. Wonderful!
More abstract representations of data in food feel like a missed opportunity to me. The artistic merit can be there, but they leave the viewer hungry for more. A strong mapping between the medium of your data presentation, and the data and story itself, is key to creating a lasting impression.
With my focus on capacity building, I’m trying to find fun ways for NGOs to learn about accuracy and data at a very basic level. Patrick argues, from his background in human rights data, that in fact you need rigorous statistical analysis to do this well. I pushed a bit, asking him if there was an 80/20 shortcut. His response was to paint a great distinction between homogeneous and heterogeneous observability of data. For instance, there are many examples of questions that don’t require quantitative rigor – case existence, case history, etc. This sparked a fun conversation about visual techniques for conveying uncertainty.
Watch the video to see the short conversation, or just catch the audio below.
Data can be used for a variety of things. In thinking about setting up architectures for data use within your organization, you need to focus on two main questions:
Does the data we have align with our goals?
How can we use data to further our mission?
Alignment with Your Goals
People see data everywhere now, and get overly excited about it. When you think about using data within your organization, you have to return to the roots of what your organization is all about and make sure the data is in alignment with that.
There are a few common patterns organizations fall into when using data. First, many collect data simply because it is easy to collect, without considering whether and how it can be used. Second, many tend to focus on quantitative over qualitative data, when in fact the strongest arguments are often made using both. You have to understand what kind of data you have before you can use it effectively:
All these types of data need to align with your goals. You can use data in a wide variety of your efforts, from inspiring more activism to changing behavior. The key is that your use of data must support those activities.
Using Data to Further Your Mission
Your data is not an end in itself. It is an asset you can use to do your work more effectively.
You can use data in lots of ways to further your mission. Three quick examples:
improve operations: you can monitor engagement on social media campaigns
spread the message: you can use data in your communications materials to advocate for change in new ways
bring people together: you can gather around the data to find stories (and paint murals)
Of course there are loads of other things you can do as well. The key here is that this framing encourages you to be goal-centric, rather than technology-centric (which is a big danger when working with data). You don’t want to get lost in the hype around the latest and greatest tools. That approach does not help you advance your mission. A beautiful external-facing infographic that doesn’t fit into your ladder of engagement, or includes no call to action, is useless. A dashboard showing key indicators doesn’t mean much if they aren’t the right key indicators.
I hope this quick intro helps ground some of the hype out there around data use, and helps you figure out what architectures to support for data use within your organization.
do restaurants with more waste go out of business at a higher rate?
are certain towns more wasteful than others?
This process of asking questions helps you move beyond the data you have, to getting the data you need to answer the questions you have. This question-centric approach is critical to make sure the dataset you have in hand doesn’t become a constraint that stops you from finding an interesting story.
An Example of Getting More Data
So how do you go from these questions to more data? I encourage folks to go “data shopping” (a term I enjoy stealing from my colleagues at the Tactical Technology Collective). This involves taking each of your questions and thinking about what other data you need to answer it, and where you might get that data. Returning to the food waste example above, to answer the question of whether more expensive restaurants waste more food, you need to categorize restaurants as expensive or not. My students remembered that most restaurant review sites, like Yelp, have a dollar-sign scale that tells you how expensive a restaurant is.
How could you get that data? You could do it by hand, but that would take a while for all the restaurants in the food waste spreadsheet. Instead, they pointed out that Yelp has an API, and you could write some software to query that and ask Yelp for the dollar-rating of each restaurant on the list.
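Here’s a rough sketch of what that software might look like in Python. It does not call Yelp’s real API. The `lookup_price` argument is a hypothetical stand-in for whatever function you write to ask a review site for a rating, and the restaurant names, the `waste_tons` column, and the numbers are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def attach_price_tier(restaurants, lookup_price):
    """Return a copy of each restaurant row with a 'price' field added.

    lookup_price is a hypothetical function: given a restaurant name it
    returns a dollar-sign rating such as "$$", or None if nothing is found.
    """
    enriched = []
    for row in restaurants:
        enriched.append({**row, "price": lookup_price(row["name"])})
    return enriched


def average_waste_by_price(rows):
    """Group rows by price tier and average the (assumed) waste column."""
    groups = defaultdict(list)
    for row in rows:
        if row["price"] is not None:  # skip restaurants we could not match
            groups[row["price"]].append(row["waste_tons"])
    return {tier: mean(values) for tier, values in groups.items()}


# Made-up rows standing in for the food waste spreadsheet.
food_waste = [
    {"name": "Cafe A", "waste_tons": 2.0},
    {"name": "Bistro B", "waste_tons": 6.0},
    {"name": "Diner C", "waste_tons": 4.0},
    {"name": "Grill D", "waste_tons": 8.0},
]

# Stand-in for a real API call; in practice this would query the review site.
fake_ratings = {"Cafe A": "$", "Diner C": "$", "Bistro B": "$$$", "Grill D": "$$$"}

rows = attach_price_tier(food_waste, fake_ratings.get)
print(average_waste_by_price(rows))
```

Keeping the lookup as a separate function means you can test the joining and averaging with fake ratings first, and only swap in real API calls once the rest works.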
Types of Data Sources
This example uses one source of data – a private company. There are, of course, others. Here’s the list I tend to introduce:
Private Companies – There is tons of data collected and stored by private companies, and sometimes they will give or sell it to you.
Governments – There is loads of official data collected by government agencies, and you have a right to the vast majority of it (depending on where you live).
Non-Profits or Advocacy Groups – Interest groups typically collect datasets to back up and inform the advocacy they are doing.
Crowdsourcing / Do-It-Yourself – Sometimes the data isn’t there, so you need to make it yourself!
That’s the list I use. Am I missing a category?
Ways to Get Data
Fine, so there is data in a lot of places… how do we get it? Here’s my list of techniques:
Download Open Data – Yes, sometimes the data is just out there waiting for you to find and download it. This doesn’t mean it is usable, but it is often there. Usually large non-profits and governments have big data repositories you can poke around. Sometimes it will be stuck in a PDF or HTML table, but you can still get it out.
Ask For It – I mean it. Sometimes you just need to make a phone call and ask. A little social engineering goes a long way!
Scrape It – Far too often the data is out there, but not in a nicely usable form… you need to scrape it from a website. Scraping involves taking data that is scattered around a website and using a process to get it all in one place in the same format. Nowadays there are lots of tools to help you scrape websites.
Manually Collect It – If the data isn’t there, you gotta make it yourself. This might involve crowd-sourced data collection, a focus group, or asking on social media.
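To make the “Scrape It” technique a bit more concrete, here’s a minimal sketch that pulls the rows out of an HTML table using only Python’s standard library. The `TableScraper` class and the sample page are my own illustration, not a tool mentioned above, and real websites are usually messier (nested tables, missing closing tags), so treat this as a starting point.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row being built, or None when outside a <tr>
        self._cell = None  # text pieces of the cell being built

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table contents as a list of rows, each a list of strings."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# A made-up page fragment for illustration.
page = """
<table>
  <tr><th>Street</th><th>Species</th></tr>
  <tr><td>Highland Rd</td><td>Spruce</td></tr>
  <tr><td>Somerville Ave</td><td>Maple</td></tr>
</table>
"""

print(scrape_table(page))
```

Once the rows are in a plain list like this, it is easy to write them out as a spreadsheet and combine them with your other data.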
Answering Your Questions
I introduce these two lists, of data sources and ways to get data, in order to support the data shopping process. With a richer set of data in hand, you’re better positioned to find the most interesting and meaningful stories in your data.
Can a vegetable tell a story about food access in Somerville? Yep.
In public settings, it can be quite hard to get folks walking by interested in a data-driven argument about your cause. We often argue that a creative data sculpture can grab their attention… like maybe a vegetable laser cut with some data about food security!
Here’s all the veggies we cut – eggplant, cucumber, zucchini, bread, and watermelon:
In addition, we prompted folks to interact with two questions – both of which they could answer with M&Ms and raisins. Asking folks to take an M&M survey is a highly effective way to get them to interact with their data!
Recently I’ve seen a number of new examples of physically-embodied data presentations – examples where each person participates with their body representing the data that they are. Using your body to act as the data in this way is not only fun, but reminds me of the work I used to do with the concept of “body syntonicity” here at the MIT Media Lab’s Lifelong Kindergarten group. Seymour Papert coined this term to describe how children would program and predict a LOGO Turtle’s motion by imagining they were the Turtle (1).
A Corporate Example
The first connection I saw recently was a video ad for Prudential while I listened to Pandora Radio. They are trying to tell a data story about how long people live after retirement, with the goal of getting them to set up a retirement plan with Prudential. The campaign is very appealing from a data-presentation point of view. In one ad they asked people how much money they thought they needed for retirement, then gave each a length of ribbon, and had them walk from the center of a circle to the length of the ribbon:
Another let people put a sticker on a big chart to build a histogram of the oldest person they knew:
These are cool, and look fun. Letting people be the data connects them with the information in a real, body-syntonic way. I’m sure this makes the people more likely to be interested in Prudential’s product offerings and planning services.
An Academic Example
In the academic realm – my colleague Nathan recently went to the Computer-Supported Cooperative Work conference, where he learned about the MyPosition project from Nina Valkanova, Robert Walker and others. Her recent work revolves around concepts of presenting information in public spaces. Here’s an academic paper describing the MyPosition project. It allows people to stand in front of a projected poll and add their vote by holding up their hand:
Their findings in the paper around social pressure are interesting, as is the fact that people got around the fancy tech to actually engage in the question they were polling. Also the idea that people used it more when it showed real people’s faces is interesting. All in all, it presents a fascinating example and some usable insights into how to design these types of public interactive data presentations.
Their pieces are embodied data sculptures – wearable objects that represent the data story they want to tell. This example is a fantastic piece of empowerment, data literacy, and art work. I enjoy it in so many ways and look forward to talking with the creators sometime in the future.
Be the Turtle
So what’s the takeaway? As a young participant in a robotics workshop I ran years ago said – “Be the turtle”. Think about ways you can engage people to actively be the data in the story you’re trying to tell.
(1) Papert built on Freud’s notion of “ego syntonicity”, which concerned the mind. This presentation I found online digs into this more in relation to computer programming.
One of the things I emphasize in my workshop is building a toolbox of presentation techniques. With a toolbox ready at hand, it is a lot less intimidating to pick an appropriate technique for a specific audience and goal. I’ve defined my own list of techniques, but it is by no means the only way to slice up the space.
One other particularly useful list comes from a classic academic paper called “Narrative Visualization: Telling Stories with Data” by Edward Segel and Jeffrey Heer (download it here). The paper meticulously reviews about 60 online visualizations, mostly from newspapers, to define some recurring genres. If you can stomach the academic prose, the paper is worth a read.
Their “genres” focus on 2-d visual presentations of data stories, to be expected based on the title of the paper and the examples they pull from. However, within that space it is a particularly wonderful list:
magazine style: “an image embedded in a page of text”
annotated chart: a traditional chart or graph with textual callouts highlighting specific data points
partitioned poster: a “multiple view visualization”
flow chart: a directed series of pieces of information
comic strip: multiple frames in a linear path
slide show: a series of visuals presented one at a time to assemble a narrative
film/video/animation: fairly self-descriptive
These vary based on the number of “frames” (visuals) presented, and how they are shown over time. This list breaks down the set of techniques differently than I usually do, and that’s a nice thing so I thought I’d share it!
From there they move to a discussion of author- vs. reader-driven approaches. That’s a wonderful reminder to decide early on whether you are building an exploratory or explanatory presentation. Are you trying to tell a strong narrative, or showing information and letting the viewer take away a story?
Many people have written about techniques for telling data driven stories (1). However, I’m struggling to find a similar list of techniques to help people in finding stories in their data. To do that you need to have a sense of what kind of data stories can be told. Here’s my current take at a few categories of data stories that can be told (expanding on earlier thoughts I had written about). I use this list to help community groups find stories in their data that they want to tell. Each includes a real example based on data scraped from the Somerville tree audit (the town I live in). All of these techniques benefit from existing statistical techniques that can be used to back up the patterns they illustrate. You can find stories of factoids, connections, comparisons, changes over time, and personal connections in your data.
There’s only one Eastern Redbud tree in all of Somerville! What’s the story of that tree? Turns out the leaves change to bright pink in fall, but everything else is yellow and orange.
Sometimes in large sets of data you find the most interesting thing is the story of one particular point. This could be an “outlier” (a data point not like the others) like the Redbud example above, or it could be the data point that is most common (can we tap more of the Maple trees that dominate Somerville?). Going in depth on one particular piece of your data can be a type of data story that fascinates and surprises people.
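If your data is in a spreadsheet column, hunting for these factoids can be as simple as counting. Here’s a small sketch in Python. The `find_factoids` function is just an illustrative helper, and the species list is made up (it is not the real Somerville audit data).

```python
from collections import Counter


def find_factoids(values):
    """Return the most common value and every value that appears only once."""
    counts = Counter(values)
    most_common, _ = counts.most_common(1)[0]
    singletons = sorted(v for v, n in counts.items() if n == 1)
    return most_common, singletons


# Made-up species column for illustration.
species = ["Maple", "Maple", "Oak", "Maple", "Eastern Redbud", "Oak", "Linden", "Linden"]

common, rare = find_factoids(species)
print("Most common:", common)
print("Appears only once:", rare)
```

The counting only points you at candidates. The story still comes from asking why that one tree is there, or why that species dominates.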
How come Somerville Ave has so many trees in the best condition? Oh, it was recently renovated… that is why those are all new trees. There’s a story about more aesthetic outcomes of big street resurfacing projects.
When two aspects of your data seem related, you can tell a story about their connection. The fancy name for this is “correlation”, and you of course need to be careful attributing causes for the connection. That said, finding a connection between two aspects of your data can lead to a good story that connects things people otherwise don’t think about together.
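If you want a number to back up a hunch about a connection, one common choice is the Pearson correlation coefficient, which runs from -1 (the two columns move in opposite directions) to 1 (they move together). Here’s a minimal sketch in plain Python. The two columns are invented for illustration, and a strong value here still says nothing about what caused what.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a column never varies")
    return cov / sqrt(var_x * var_y)


# Made-up numbers: years since a street was resurfaced, and the share of
# its trees rated in good condition.
years_since_work = [1, 3, 5, 8, 12]
share_good = [0.9, 0.8, 0.7, 0.5, 0.4]

r = pearson(years_since_work, share_good)
print(round(r, 2))
```

A value close to -1 for these invented numbers would fit the “recently renovated streets have healthier trees” story, but it is the follow-up questions that turn it into something worth telling.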
Walking down Somerville Ave. gives you a good sense of the most populous trees across the city. That street is a good representative of the tree population in the city as a whole. Is your street different?
Comparing between sections of your data can be a good way to find an illustrative story to tell. Often one part of your data tells one story, but another part tells a totally different story. Or, as in the example above, maybe there is a more human slice of your data that serves as an exemplar of an overall pattern.
Stories of Change
Turns out there was a big die-off of trees in 2008. Was the climate weird that year? (I made this up since I don’t have any time-based data)
People like thinking about things changing over time. We experience and think about the world based on how we interact with it over time. Telling a story about change over time appeals to people’s interest in understanding what caused the change.
You live on Highland Rd? Did you know that ALL 9 Spruce trees in Somerville are on Highland Rd? Maybe we should rename it “Spruce Rd”?
Another way to find a story in data is to think about how it relates to your life. People with map literacy like maps because they can place themselves on them. This personalization of the story creates a connection to the real world meaning of the data and can be a powerful type of story for small audiences. Stories about your personal experiences can be grounding and real.
This is just one take on the types of data stories that can be told. Please let me know how you think about this! Telling that story effectively is a whole different topic, but I find the story finding exercise much easier when I introduce a bunch of categories like this. Most of these benefit from multiple sets of data, so remember to go data “shopping” during your story finding process.
Even if your visual data presentation looks awesome, that doesn’t mean the message is getting across. One reason this happens is that sometimes the numbers don’t mean anything to the audience; they don’t have the number in a context they can relate to. This is one of the powers of map-based presentations: viewers can often place themselves in the map and say things like “let me compare my town to the next one over”. That offers a relevant context for the information. So how do you do this with raw numbers?
Recently, I attended the OpenVis Conf event here in Boston. It was a fantastically nerdy collection of smart folks talking about visualization. One of the speakers was Amanda Cox, from the New York Times. One of the ideas she touched on was the concept of the “Kooky Comparison” (check out the video of her talk if you have an hour to spare). She particularly likes graphics that include this comparison of a piece of information to something else in a silly or surprising way. For instance, comparing the cost of printer ink to the cost of blood!
Even better, Glen Chiacchieri built a Chrome browser plugin called The Dictionary of Numbers. It looks at the webpage you are reading, and if it finds quantities in the text it tries to automatically insert a comparison in human terms:
So cool! I’d say this offers relevant and irrelevant comparisons that set the number in context 🙂
Getting back to the point… if you find your story muddled by questions of scale and context, try a comparison (kooky or not) to make the number relevant and understandable to your audience.
There are lots of people excited about fancy-pants computer-generated data pictures right now, but I want to remind you that doing things in the physical world can often be more compelling. Externalizing our ideas into real objects gives us something we can interact with and talk around with other people. Here’s a concrete example.
This photo shows a soda bottle filled up with just the amount of sugar in that drink. This is a bit of a classic public health example; most people are surprised at the amount of sugar in a soda. Representing this physically brings home the idea that when you drink the bottle, you’re consuming that amount of sugar. A bar chart would be far less compelling, and you wouldn’t be able to relate to it. This is a simple example, but the underlying concept is clear.
What You Should Do:
Consider whether your data can be brought off the page (or screen). We live in an interactive, three-dimensional world, so you should be creative about bringing your data presentation into it. Surprising your audience with a novel display can engage them long enough for you to tell the rest of your story.