Making Tools More Learner-Friendly

I often advise learners to be careful with what tools they choose to spend time learning.  Some powerful ones have steep learning curves, full of jargon and technical hurdles.  Others are simple and self-explanatory, but can’t do more than one thing.  I’ve been trying to find better ways to connect with tool builders and talk to them about how they need to build learner-centered tools.

Catherine D’Ignazio and I put these thoughts together into a talk for OpenVisConf this year.  This is a super-dorky conference for data viz professionals… just the place to find more tool builders to talk to!  We put together an argument that data visualization tools can serve as informal learning spaces.  Watch the video below:

New DataBasic Tool Lets You “Connect the Dots” in Data

Catherine and I have launched a new DataBasic tool and activity, Connect the Dots, aimed at helping students and educators see how their data is connected with a visual network diagram.

By showing the relationships between things, networks are useful for finding answers that aren’t readily apparent through spreadsheet data alone. To that end, we’ve built Connect the Dots to help teach how analyzing the connections between the “dots” in data is a fundamentally different approach to understanding it.

The new tool gives users a network diagram to reveal links as well as a high level report about what the network looks like. Using network analysis helped Google revolutionize search technology and was used by journalists who investigated the connections between people and banks during the Panama Papers Leak.
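The kind of analysis a tool like this automates can be sketched in a few lines of Python. This is an illustrative sketch, not Connect the Dots itself, and the edge list is made up:

```python
from collections import defaultdict

# Hypothetical edge list: each pair is a connection between two "dots".
edges = [("Ana", "Ben"), ("Ana", "Cal"), ("Ana", "Dee"), ("Ben", "Cal")]

# Build an adjacency list from the edges.
neighbors = defaultdict(set)
for a, b in edges:
    neighbors[a].add(b)
    neighbors[b].add(a)

# Degree: nodes with many connections are central to the network.
degree = {node: len(links) for node, links in neighbors.items()}
most_connected = max(degree, key=degree.get)
print(most_connected, degree[most_connected])  # Ana has the most connections
```

Even this tiny computation answers a question a spreadsheet view hides: who sits at the center of the network?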

Connect the Dots is the fourth and most recent addition to DataBasic, a growing suite of easy-to-use web tools launched last year and designed to make data analysis and storytelling more accessible to a general, non-technical audience.

As with the previous three tools released in the DataBasic suite, Connect the Dots was designed so that lessons around it can be easily planned to help students learn how to use data to tell a story. Connect the Dots comes with a learning guide and introductory video made for classes and workshops with participants from middle school through higher education. The learning guide has a 45-minute activity that walks people through an exercise in naming their favorite local restaurants and seeking patterns in the networks that result. To get people started, the tool includes sample data sets – such as Donald Trump’s inside connections and characters from the play Les Miserables – that introduce users to vocabulary terms and the algorithms at work behind the scenes. Like the other DataBasic tools, Connect the Dots is available in English, Portuguese, and Spanish.

Learn more about Connect the Dots and all the DataBasic tools here.

Have you used DataBasic tools in your classroom, organization, or personal projects? If so, we’d love to hear your story! Write to help@databasic.io and tell us about your experience.

Thoughts on “Big Data” & “Small Data”

I’ve seen a lot of writing lately on Big Data vs. Small Data.  I know this is something I should pay attention to, because they are capitalizing words that you usually don’t capitalize! Here are some still-forming thoughts…

Rufus Pollock, Director of the Open Knowledge Foundation, recently wrote on Al Jazeera that:

Size doesn’t matter.  What matters is having the data, of whatever size, that helps us solve a problem or addresses the question we have – and for many problems and questions, Small Data is enough.

He argues that Small Data is about the enabling potential of the laptop computer, combined with the communicative ability unleashed by the internet. I was sparked by his post, and others, to jot down some of my own thoughts about these newly capitalized things.

How do I Define Big Data?

Big Data is getting loads of press.  Supporters focus on the idea that ginormous sets of data reveal hidden patterns and truths otherwise impossible to see.  Many critics respond that these projects miss inherent biases, ignore ethical considerations, and remind us that the data never holds absolute truths.  In any case, data literacy is on people’s minds, and getting funding.

My working definition of Big Data focuses more on the “how” of it all.  For one, most Big Data projects run on implicit, unknown, or purposely hidden data collection.  Cell phone providers don’t exactly advertise that they are tracking everywhere you go.  Another aspect of the “how” of Big Data is that the datasets are large enough that they require computer-assisted analysis.  You can’t sit down and draw raw Big Data on a piece of paper on a wall.  You have to use tools that perform algorithmic computations on the raw data for you.  And what do people use these tools for?  They try to describe what is going on, and they try to predict what might happen next.

So What Does Small Data Mean to Me?

Small Data is the new term many are using to argue against Big Data – as such it has a malleable definition based on each person’s goal!  For me, Small Data is the thing that community groups have always used to do their work better in a few ways:

  1. Evaluate: Groups use Small Data to evaluate programs so they can improve them
  2. Communicate: Groups use Small Data to communicate about their programs and topics with the public and the communities they serve
  3. Advocate: Groups use Small Data to make evidence-based arguments to those in power

The “how” of Small Data is very different from the ideas I laid out for Big Data.  Small Data runs on explicitly collected data – the data is collected in the open, with notice, and on purpose.  Small Data can be analyzed by interested laypeople.  Small Data doesn’t depend on technology-assisted analysis, but can engage it as appropriate.

So What?

Do my definitions present a useful distinction?  I imagine that is what you’re thinking right now.  Well, for me the primary difference is around the activities I can do to empower people to play with data.  My workshops and projects focus on finding stories, and telling stories, with data.  With Small Data, I have techniques for doing both.  With Big Data, I don’t have good hands-on activities for understanding how to find stories.

I connect this primarily to the fact that Big Data relies on algorithmic investigations, and I haven’t thought about how to get around that.  Algorithms aren’t hands-on.  You can do engaging activities to understand how they work, but not to actually do them.  In addition –  most of the community groups, organizations, and local governments I work with don’t have Big Data problems.

Put those two things together and you’ll see why I don’t focus on Big Data in my work. Philosophically, I want to empower people to use information to make the change they want, and right now that means using Small Data.  That’s my current thought, and guides my current focus.

Finding Data Stories

Many people have written about techniques for telling data-driven stories (1).  However, I’m struggling to find a similar list of techniques to help people find stories in their data.  To do that you need to have a sense of what kinds of data stories can be told. Here’s my current take on a few categories of data stories (expanding on earlier thoughts I had written about).  I use this list to help community groups find stories in their data that they want to tell.  Each includes a real example based on data scraped from the Somerville tree audit (the town I live in). All of these techniques benefit from existing statistical techniques that can be used to back up the patterns they illustrate.  You can find stories of factoids, connections, comparisons, changes over time, and personal connections in your data.

Factoid Stories

There’s only one Eastern Redbud tree in all of Somerville! What’s the story of that tree?  Turns out the leaves change to bright pink in fall, but everything else is yellow and orange.

An Eastern Redbud tree (from Wikipedia – not the actual tree in Somerville)

Sometimes in large sets of data you find the most interesting thing is the story of one particular point.  This could be an “outlier” (a data point not like the others) like the Redbud example above, or it could be the data point that is most common (can we tap more of the Maple trees that dominate Somerville?).  Going in depth on one particular piece of your data can be a type of data story that fascinates and surprises people.

Connection Stories

How come Somerville Ave has so many trees in the best condition? Oh, it was recently renovated… that is why those are all new trees.  There’s a story there about the aesthetic outcomes of big street resurfacing projects.

A map of Somerville with healthy trees in green (created in TableauPublic)

When two aspects of your data seem related, you can tell a story about their connection.  The fancy name for this is “correlation”, and you of course need to be careful attributing causes for the connection.  That said, finding a connection between two aspects of your data can lead to a good story that connects things people otherwise don’t think about together.
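One quick way to check whether two aspects of your data move together is to compute a correlation coefficient. Here’s a minimal sketch in Python; the numbers are made up for illustration, not from the Somerville tree audit:

```python
import math

# Hypothetical data: years since a street was renovated, and the share
# of its trees rated in good condition (made-up numbers for illustration).
years_since_renovation = [1, 2, 5, 8, 12]
share_good_trees = [0.95, 0.90, 0.75, 0.60, 0.40]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

r = pearson(years_since_renovation, share_good_trees)
print(round(r, 2))  # strongly negative: older streets, fewer healthy trees
```

A strong coefficient like this is a clue to a connection story, not proof of a cause – the renovation and the tree health could both be driven by something else.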

Comparison Stories

Walking down Somerville Ave. gives you a good sense of the most populous trees across the city.  That street is a good representative of the tree population in the city as a whole.  Is your street different?

Comparison of tree populations in the city and along one street (large bubbles mean more trees)

Comparing between sections of your data can be a good way to find an illustrative story to tell.  Often one part of your data tells one story, but another part tells a totally different story. Or, as in this example above, maybe there is a more human slice of your data that serves as an exemplar of an overall pattern.

Stories of Change

Turns out there was a big die-off of trees in 2008.  Was the climate weird that year? (I made this up since I don’t have any time-based data)

People like thinking about things changing over time.  We experience and think about the world based on how we interact with it over time.  Telling a story about change over time appeals to people’s interest in understanding what caused the change.

“You” Stories

You live on Highland Rd? Did you know that ALL 9 Spruce trees in Somerville are on Highland Rd? Maybe we should rename it “Spruce Rd”?

Map of spruce trees on Highland Rd, colored by tree health (created in TableauPublic)

Another way to find a story in data is to think about how it relates to your life.  People with map literacy like maps because they can place themselves on them.  This personalization of the story creates a connection to the real-world meaning of the data and can be a powerful type of story for small audiences.  Stories about your personal experiences can be grounding and real.

In Conclusion…

This is just one take on the type of data stories that can be told.  Please let me know how you think about this! Telling that story effectively is a whole different topic, but I find the story finding exercise much easier when I introduce a bunch of categories like this.  Most of these benefit from multiple sets of data, so remember to go data “shopping” during your story finding process.

Footnotes:

(1) For instance, I’m a huge fan of Segel and Heer’s Narrative Visualization paper, where they give a catalog of visual storytelling techniques.  Also good is Marije Rooze’s thesis work (particularly the tagged gallery of visualizations from the Guardian and New York Times).

Tools for Data Scraping and Visualization

Over the last few weeks I co-taught a short-course on data scraping and data presentation.  It was a pleasure to get a chance to teach with Ethan Zuckerman (my boss) and interact with the creative group of students! You can peruse the syllabus outline if you like.

In my Data Therapy work I don’t usually introduce tools, because there are loads of YouTube tutorials and written tutorials.  However, while co-teaching a short-course for incoming students in the Comparative Media Studies program here at MIT, I led two short “lab” sessions on tools for data scraping, interrogation, and visualization.

There are a myriad of tools that support these efforts, so I was forced to pick just a handful to introduce to these students.  I wanted to share the short lists of tools I chose to share.

Data Scraping:

As much as possible, avoid writing code!  Many of these tools can help you avoid writing software to do the scraping.  There are constantly new tools being built, but I recommend these:

  • Copy/Paste: Never forget the awesome power of copy/paste! There are many times when an hour of copying and pasting will be faster than learning any sort of new tool!
  • Import.io: Still nascent, but this is a radical re-thinking of how you scrape.  Point and click to train their scraper.  It’s very early, and buggy, but on many simple webpages it works well!
  • Regular Expressions: Install a text editor like Sublime Text and you get the power of regular expressions (which I call “Super Find and Replace”).  It lets you define a pattern and find it in any large document.  Sure the pattern definition is cryptic, but learning it is totally worth it (here’s an online playground).
  • jQuery in the browser: Install the bookmarklet, and you can add the jQuery JavaScript library to any webpage you are viewing.  From there you can use a basic understanding of JavaScript and the JavaScript console (in most browsers) to pull parts of a webpage into an array.
  • ScraperWiki: There are a few things this makes really easy – getting recent tweets, getting twitter followers, and a few others.  Otherwise this is a good engine for software coding.
  • Software Development: If you are a coder, and the website you need to scrape has javascript and logins and such, then you might need to go this route (ugh).  If so, here’s a functioning example of a scraper built in Python (with Beautiful Soup and Mechanize).  I would use Watir if you want to do this in Ruby.
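To give a rough sense of what the regular-expression approach looks like outside a text editor, here is the same idea in Python’s `re` module. The page snippet is made up for illustration:

```python
import re

# Made-up snippet of HTML you might have copied out of a webpage.
snippet = """
<li>Oak - 124 Elm St</li>
<li>Maple - 7 Highland Rd</li>
<li>Spruce - 9 Highland Rd</li>
"""

# Pattern: capture the species name and the street address from each line.
# (\w+) grabs the word before the dash; (.+?) lazily grabs the address.
pattern = re.compile(r"<li>(\w+) - (.+?)</li>")
rows = pattern.findall(snippet)
print(rows)  # [('Oak', '124 Elm St'), ('Maple', '7 Highland Rd'), ('Spruce', '9 Highland Rd')]
```

The same pattern pasted into Sublime Text’s find dialog would highlight the same matches – “Super Find and Replace,” exactly as described above.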

Data Interrogation and Visualization:

There are even more tools that help you here.  I picked a handful of single-purpose tools, and some generic ones to share.

  • Tabula: There are a few PDF-cleaning tools, but this one has worked particularly well for me.  If your data is in a PDF, and selectable, then I recommend this! (disclosure: the Knight Foundation funds much of my paycheck, and contributed to Tabula’s development as well)
  • OpenRefine: This data cleaning tool lets you do things like cluster rows in your data that are spelled similarly, look for correlations at a high level, and more!  The School of Data has written well about this – read their OpenRefine handbook.
  • Wordle: As maligned as word clouds have been, I still believe in their role as a proxy for deep text analysis.  They give a nice visual representation of how frequently words appear in quotes, writing, etc.
  • Quartz ChartBuilder: If you need to make clean and simple charts, this is the tool for you. Much nicer than the output of Excel.
  • TimelineJS: Need an online timeline?  This is an awesome tool. Disclosure: another Knight-funded project.
  • Google Fusion Tables: This tool has empowered loads of folks to create maps online.  I’m not a big user, but lots of folks recommend it to me.
  • TileMill: Google maps isn’t the only way to make a map.  TileMill lets you create beautiful interactive maps that fit your needs. Disclosure: another Knight-funded project.
  • Tableau Public: Tableau is a much nicer way to explore your data than Excel pivot tables.  You can drag and drop columns onto a grid and it suggests visualizations that might be revealing in your attempts to find stories.
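The clustering idea behind OpenRefine is worth a quick sketch.  Its simplest method, the “fingerprint,” normalizes each value and groups values that share a fingerprint.  This is an illustrative re-implementation of that idea, not OpenRefine’s actual code, and the survey answers are hypothetical:

```python
import string
from collections import defaultdict

def fingerprint(value):
    """Normalize a value roughly the way OpenRefine's fingerprint keyer does:
    lowercase, strip punctuation, then sort and de-duplicate the tokens."""
    cleaned = value.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))

# Messy, hypothetical survey answers that all mean the same place.
values = ["Somerville, MA", "MA Somerville", "somerville ma", "Boston, MA"]

# Group values by their fingerprint.
clusters = defaultdict(list)
for v in values:
    clusters[fingerprint(v)].append(v)

# Groups with more than one spelling are candidates for merging.
merge_candidates = [group for group in clusters.values() if len(group) > 1]
print(merge_candidates)  # [['Somerville, MA', 'MA Somerville', 'somerville ma']]
```

Seeing the three spellings collapse into one cluster makes it clear why this kind of cleaning matters before you start counting anything.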

I hope those are helpful in your data scraping and story-finding adventures!

Curious for More Tools?

Keep your eye on the School of Data and Tactical Technology Collective.

Helping a Community Find Stories in Their Data

My Data Mural work has led me into a new area – actually helping community groups find the stories they want to tell in their raw data.  Until now, all my data therapy work has focused on how to present the data-driven stories more creatively.  This post shares some of the techniques I’m trying out.

Step 1: Speak like a normal person

I know, it should be obvious, but too often when entering the realm of data-anything, we fall back into using big words.  That doesn’t fly when working with community groups that don’t have a shared meaning for those words. I tried to figure out how to use regular words to talk about the types of stories that you can look for.  I came up with this set to start with:


  • comparison: you see two pieces of data that are really interesting when compared to each other
  • factoid: you see one fact that jumps out at you as particularly interesting or startling
  • connection: you see a connection between two pieces of info – you can’t say one causes another, but they’re interesting when put together
  • personal: you have a compelling story or picture that is about one person
  • change: you see one of your measures changing over time

I used regular words to describe the types of data stories in order to make the activity less intimidating to non-data people. Many people nodded their heads as I described these categories (especially at the second workshop where I spoke about them better!).  I was inspired by the Data Stories section of the Data Journalism Handbook.

Step 2: Try it out together first

To come up with a shared definition of what these types of stories meant, I showed a few data points from an amusing data set – the Somerville “Happiness Survey” (raw data).


We quickly tried to find stories of each type in this tiny data set.  Practicing all together on a tiny dataset can create a shared language for finding stories in data. In the breakouts that followed this activity, I could hear people using some of these words with each other to talk about the data they were looking at.

Step 3: Use less data

Usually data analysis starts with a giant set of documents.  This model doesn’t really work for a small community group made up of people that aren’t data nerds.  For our “story-finding” workshops we culled down the full data they gave us, producing a 4-page data handout for people.  Limiting the data helped the community group not be overwhelmed by the task of finding a story they wanted to tell. We definitely made some “editorial” decisions that limited the stories they could find, but we did this with the help of a smaller group of our community partners so it wasn’t arbitrary.

So how did it go?

We scaffolded the story-finding around the idea of telling a story in our “The data say____” format.  This gave us a common way to talk about the stories with each other.  Just as importantly, this forced each person to justify why they thought it was a compelling story to tell in mural form.

So did we build the group’s capacity for data analysis?  Our pre-post survey did NOT show a noticeable increase in people’s self-assessed ease of finding stories in data. Damn. But wait… the answer is probably more nuanced than that.  They did say they came away with more knowledge about the topic the data was about.  They also said one of the most interesting things they learned was “telling data stories”, and in each of these two pilots they came out with a data-driven story that they wanted to tell.

Is exposure to data story-finding  a sufficient outcome?  Am I trying to do too much capacity building all at once?  I’m still pondering how to do this better, so please suggest any tips!

Curious about these pilots?  You can read some more on my collaborator Emily’s Connection Lab blog:

Cross-posted to the MIT Center for Civic Media blog.