data literacy – Data Therapy

data literacy, data sculptures, data-analysis, techniques, workshops

Data + Design + Impact: A Workshop on Data Sculptures and Civic Change

Teaching data storytelling is difficult. The norms, “canonical” readings, and the cutting edge evolve every month! With such an evolving landscape to work within, I relish the opportunities I have to explore something new and different with students in a focused way. This past week I had the great pleasure of doing precisely that at the invitation of the FHNW Institute of Industrial Design as part of their International Design Workshop. The week, which reminded me of MIT’s IAPsession, is an opportunity to bring in outside instructors to work with undergraduate industrial design students in the week that starts their spring semester. Fifteen students and I explored designing and building data sculptures based on exploring civic datasets. I’ve been writing about how I want to push the field of physical data storytelling, and I hope these contributions help. Here’s a run down of the project they made, and what I learned to inform my teaching practice.

My annotated postcard from the top of FNHW’s beautiful campus library.

recorded unrecorded

By Fiona Beer, Leon Rauscher and Stephan Jäger

Refugee crises around the world are heart-wrenching and complex. Measuring and understanding the scale of the movement is difficult. In Europe ongoing flows across the Mediterranean Sea have been amplifying for the past decade. This group decided to take on the challenge of telling a story about this refugee crisis to their fellow students by making a large data sculpture that you walk through.

The piece centers around two mirrors facing each other, with lines and text decorating them, and invites the viewer to walk through. As they state:

This installation tries to show the number we know of, but simultaneously to represent the dark figure of deaths, names, age, origins, and causes of death, we have no idea of.

They acknowledge and embrace that reliable and complete data about migration, refugees, and deaths are impossible to come by.

Echoing the aesthetic of topographic map contour lines, they represent the “known” data on the floor and up the glass; one side for the living, the other for the dead. In addition, the dotted line suggests the data we don’t know, and have little way to estimate – the “dark figure of migrating people and death toll.”

This is all fairly readable from outside the sculpture, but stepping in between the mirrors creates an infinitely repeated landscape. The contour lines extend on forever, echoing the endless journey undertaken by so many migrants. The numbers and lines become hard to read, echoing the truly unknowable nature (a creative approach to the problem of visualizing uncertainty the field struggles with).

I think the core metaphor of the contour lines and the reflections that underlie this piece are resonant and memorable. More importantly, for me they pulled off a visual and physical design that embraces the humanity of the data, rather than treating them as abstract numbers that mask the crisis.

Data Sources: 2018 migratory deaths, UN Refugee Agency Data Portal

Wealth Inequality in Switzerland

By Christina Bocken, David von Rotz and Silvan Häseli

Switzerland is a wealthy (and expensive) country, but there is large income inequality. This group was inspired by a piece from CBS This Morning to explore this inequality in Switzerland. They found, to their surprise, that just 2.5% of the Swiss population holds more than half all of the wealth. They decided to share this story, and what causes it, with a doll-house sculpture and a board game.

The multi-story dollhouse is divided into 3 sections about wealth ownership, each populated according to how much of the population holds that wealth. As the note on the yellow section states – “65% of people share 3.5% of all the wealth in Switzerland”.

To pull readers into why, they created a modified version of the popular snakes-and-ladders board game – called “Who Becomes a Millionaire”. Walk around back of the dollhouse and you’ll find a large playing board, dice, and instructions. Here’s the thing – the game is rigged. The rules change based on your gender, family background, education, and where you live! Playing as a woman nor born in Switzerland? Sorry, you don’t advance very quickly.

Inequality is hard to tell stories about, so I was excited to see this group take this challenge on. I think the house is a lovely invitation to come investigate the piece, and the board game is a creative pull to get the audience to dig in to the data more deeply. Building around the metaphors of the house, and the rules of the game being rigged, were key “anchors” to their design.

Data Sources: Swiss Federal Statistics Office

Medienkonsum 2020 (Media Consumption 2020)

By Eva Bieli, Valéire Huber and Sofia Leurink

Intrigued by our Media Cloud database of news reporting, and the surge of reporting on coronavirus, this group decided to explore the Swiss news landscape in 2020. News sucks us in; we want to read it, feel responsible to read it, but often end up left feeling powerless and upset by it. They found the top 3 themes written about were all depressing things – fears around coronavirus, aftermath from the assassination of Suleiman in Iran , and the tragedy of the Australian fires. This team asked “How consciously do you consume media? How do you digest bad news?”

This piece is built around a visual and explanatory pun – the idea of “consuming media” like one does food. Mapping each topic onto a pie-chart style plate, they laser-cut words into biscuits based on their frequency of use when discussing each of those top three topics. This is a literal representation of the news we consume, and I can tell you that picking up a cookie etched with “coronavirus” certainly does give you pause!

I think this piece is a delightful provocation to think about the news we consume each day. The representation in food is an inviting way into a thoughtful topic, hopefully breaking down barriers the audience might have to reflecting about their own news consumption. The 3 plates are playful, pulling you over to understand why they are different sizes.

Data Sources: Media Cloud (Swiss national and local media source collections)

The True Cost

By Nathan Blain, Rino Schläfli and Daniel Mankel

The true environmental cost of the goods we produce and purchase is almost impossible to comprehend – there are just too many variables. This group rose to this explanatory challenge, creating an immersive multi-sensory piece that assaults your senses when you choose an environmentally unfriendly option.

The piece present you with a choice between two identical-looking shirts, sit down and press the button associated with the one you want. Once your choice is made, three things happen that represent the environmental cost of the object:

The animation and sound of water on a screen increase based on the amount of water used to produce the shirt.
Smoke pours out at you (from a fog machine), obscuring your vision and making you pull back, based on the how many greenhouse gases are emitted in the production and transport of the shirt.
Your other hand, inserted (trustingly) into a dark box, is sprayed with water based on the amount of water used to produce the shirt. Well, it doesn’t actually spray your hand yet… they almost had it working by the opening so hopefully they’ll get it working in another day.

I’m delighted by their idea of creating a multi-channel assault on your senses. We don’t have good approaches to communicating the un-captured costs of these goods. This piece is a provocation to consider the physical impacts of those choices on your own body, rather than in some far-off rain forest or ocean.

Data Sources: TBD (I’m not sure what they used)

The Mask Collection

By Jasmin Schnellmann, Jasmin Vavrecka and Timo Lanz

This group was struck by the public reaction and news reporting about coronavirus. While certainly it is cause for concern, they realized that global air pollution is a far more serious problem right now, in terms of deaths and scope. Trying to find a way to communicate this risk, they struck on the symbolic power of the medical face mask.

They created two objects for the mythical fashion show about “the mask collection”. The first is a small wallet made out of masks and sized based on the deaths due to Corona virus. The second is a large purse made of sewn together masks, sized by deaths due to air pollution. The viewer reading more closely is likely to be surprised by the “price tags” on each, with the small one having a higher one while the bigger one has a lower cost. These price tags show more detailed information about deaths and media coverage – which are inverses of each other. Air pollution has far more deaths but far less coverage (in 2020).

The objects they created speak very strongly to me. They make you look twice to understand how they are made, with strong red thread echoing a sense of importance and dread. It is a simple visual mapping, but presented as a collection of daily use objects they speak to the choices we make, and the risks we face, each day.

Data Sources: Media Cloud, Guardian UK and NYTimes articles, World Health Organization data portal

My Takeaways

I was impressed by the students creativity, fortitude, and craftsmanship. They created these in just one week(!) with no previous experience with data storytelling. I continuously challenged them to think beyond simple tricks like physical bar charts and such, and I think they rose to the challenge.

From this one week I can picture expanding my approach into a semester-long course that explores the aesthetic principles of physical variables of data sculptures. Externalizing these models and stories creates an opportunity to communicate around and about them, and affords me the opportunity to revisit back on my training in the pedagogy of constructionism.

This builds on earlier thoughts I captured in a paper with Catherine D’Ignazio at the Pedagogy and Physicalization workshop at DIS 2017. It also has links to projects from many other academics working on these questions of data storytelling and 3d physical materials:

I look forward to exploring these techniques with students and building my own data sculpture practice more. I hope they inspire, resonate, and challenge you as well! If you happen to be in Basel this Feb or March, these are on display for a few weeks.

data culture, data literacy, ethics

Aligning Your Data Methods and Your Mission

This blog post is based on a keynote I gave recently at the 2019 SSIR Data on Purpose event.

Are you optimistic or pessimistic about data as a tool for good in the world? Over the last few years I’ve seen the shift in answers to this questions. People used to answer “optimistic”, but now most people indicate some mix of emotions. You’ve probably seen the Gartner hype cycle with its suggestion that a technology receives inflated expectations and then is overused to the point of disillusionment. I’d argue that the over-hyping and the disappointment happen at the same time… that time is now for social good organizations that are trying to use data to further their missions.

The response I’ve been crafting focuses on acknowledging the damaging history of the data practices we routinely employ, taking a step back from hype-driven roll-out of data programs, and working to align your data and your mission. The Data Culture Project, my collaboration with Catherine D’Ignazio, works with organizations large and small across the world to help make that happen.

One of the core problems in creating a data culture that aligns with your mission is the history we are fighting against. Data has been a tool for those in power to consolidate that power for centuries. For organizations working in the social good sector, this shouldfeel problematic! If you’re deploying some tool or process, you need to be wary of any pieces that reinforce that history. They can cultivate the opposite of the empowerment, engagement, and ownership goals that are probably at the heart of your mission.

Warning: In this post I’m going to depress you by reviewing some of that history. Don’t worry, I’ll close with some inspirations so it isn’t all doom and gloom. Make sure not to stop reading in the middle, otherwise you might walk away feeling pretty bad about the world!

A Depressing History of Data

I won’t surprise anyone by talking about human history as full of stories of those in power seeking to oppress others. However, I want to highlight a few of those instances that were data-driven. Going back as far as the ancient Egyptians, we can find evidence that they tabulated and tracked populations to determine how much labor they would use to construct their giant monuments to themselves (read more at the UK Office for National Statistics). They created census data to drive massive construction projects in their own likeness.

lots of egyptian laborers pulling a large sculpture — Source – A Popular Account of the Ancient Egyptians, Sir John Gardner Wilkinson (1854)

Fast forward to Britain in the 1700s and you find the horrors of the slave trade via cross-atlantic shipping; all cataloged in stunning details by the shopping industry and the strict regulations they were under. Many of these massive data records are still available today. This economic data numerically catalogues the human suffering of tens of millions at the hands of those in power across the western world.

Next consider another of the darkest times in recent history – the Nazi regime. Their massive atrocities were informed and guided by the tabulations of their census, driven by IBM’s custom manufactured counting machines (read more on Wikipedia). This is a history IBM would like us to forget with their new Watson AI initiatives, but Watson was in fact the one that oversaw all this work as CEO at the time.

A few decades later we find another example in South America, recipient of massive investment and development packages from large multi-national UN-style agencies. All this drove numbers that showed enormous economic growth, while in fact huge populations were suffering. As famed author Eduardo Galeano writes – “the more watched and desperate the people, the more the statistics smiled and laughed”.

As for our current times? You can barely throw a stone without hitting another story of a large centralized technology company using some data in morally questionable ways. Amazon pitching facial recognition to ICE here in the US to keep migrants and asylum seekers out (The Verge), Facebook building massive datasets about non-users to improve their invasive advertising targeting (Reuters), China creating a “social credit score” to control and guide citizen norms (Bloomberg) – the dystopia is here. We are all quantized without any meaningful consent in the massive databases of the big tech corporations.

I trust you’re on board now with the idea that data has a dark history full of examples like these ones that I’ve quickly touched on. Want more? Read Algorithms of Oppression, Weapons of Math Destruction, Automating Inequality, or one of the other recent books about this.

A Disempowering Process

I know, that was pretty depressing. Usually I don’t pull in forced labor, slavery, and the Nazi holocaust into one blog post. Sorry. This thing is, this is the past and present that your data process is living within. We can’t ignore it. You have to work hard to make sure you’re not part of it. To fight this history, first we have to understand the patterns that drove it.

All of these examples showcase a problematic pattern of data use that we can characterize in four ways:

Opaque processes– the subjects of the data aren’t given any insight into what happens to the data about them
Extractive collection– the data is pulled from the subjects and their community by those outside of it
High technological complexity– the mechanisms used to analyze the data, digital and non-digital, have a steep learning curve
Control of impact– the people the data is about have no say in the impacts of the decisions being made with the data

From my point of view, these are process problems(read more in my paper on with Catherine D’Ignazio). Ok, in some of the more egregious examples above these wouldn’t be described as “problems”, because clearly their goals were to actively oppress and kill the subjects of the data. However, that isn’t the goal of most data endeavors!

The thing is, even many well-meaning, pro-social data efforts use this problematic process. Consider the history of public health and epidemiology to start. In 1663 you have John Graunt carrying out the first recoded experiments in statical data analysis; the ancestor of epidemiology (learn more on Wikipedia). By recoding information about mortality, he theorized that he could design an early warning system for the bubonic plague ravaging Europe. Definitely working for the social good, but in a position of power with no engagement with the effected populations. Extractive dart collection, a complicated statistical process, and no control of impact for the population in mind.

Or how about the famed maps of John Snow, used to discover the origins of cholera in the early 1800s (learn more on Wikipedia). A noble, impactful, and meaningful example of data usage for sure – literally saving lives. The same process criticisms hold – a person of privilege mapping data about “the poor” to discover something without any role of the people that were the data themselves.

When we quickly read these two stories, they sound like amazing historical examples of using data for good! However, when you examine them more deeply, you find the same four criticisms weighed above. Their data methods didn’t match their mission.

Some Inspirations

Knowing this history, how do you make sure you’re not doomed to repeat it? So how do you avoid these pitfalls? You build a data culture within your organization that can do better. You empower staff up and down your org chart to identify problems data can help solve and them support them solving the those problems. You open up your process, you bring people together, you help them make decisions with data. You don’t need data scientists to to this, you need a data culture. This is what our Data Culture Project is all about. Here are some examples to help explain what I mean.

Two images from the “The Exhibit of American Negroes” exhibit created by W.E.B. Du Bois (source)

A wonderful historical example is the recently re-discovered works of W.E.B. Du Bois. He pulled census data, among other sources, to create a catalog of “the African American” in 1900. He brought the inventive and novel infographics to the world’s fair in Paris to showcase the work that needed to happen to create true freedom in the US post-slavery (work that is still being done today). He worked with African American students at the university to repurpose this census data to tell their story. These graphics are an example of self-determination – highlighting the problems the subjects of the data themselves have chosen. His statistical techniques were detailed, but he invented new ways to communicate them to a larger, less data-literate audience.

A data mural created by the Created by the Collaborate for Healthy Weight Coalition (August 2013)

A general theme in my work is using the arts as an invitation to bring people together around data to tell their own story. My work on data murals, a collaboration with my wife Emily Bhargava, is a prime example of this. We bring a group of people together around some data about them, help them find a story they want to tell, and then collaboratively design a mural to tell it. This puts the ownership of the data analysis and the data story in their hands, flipping the standard process on its head. The subjects of the data are empowering to tell the data story, with invitations to analyze the data that build their capacity and meet them where they are in skills and interests.

Data 4 Black Lives founders Yeshimabeit Milner, Lucas Mason-Brown, and Max Clermont

A more community-focused example comes from the Data 4 Black Lives organization(D4BL). The brutal legacy of slavery in the US permeates our culture, so it should be no surprise that it continues to poison the datasets we use as well. D4BL is working to highlight these problems, bring together organizers, data scientists, and students, and also influence policy to put data in service of the needs of black lives in the US. This is a traditionally marginalized community empowering themselves with the language of those in power (data) and trying to build community in service of their own goals.

hearts and a woman seeming to be in pain, half the hearts are filled in, half are empty — 1st place winner Danford Marco and his Khanga design

For a non-US example, we can look to the work done by the Tanzania Bhora Initiative and Faru Arts and Sports Development Organizatio as part of the Data Zetu Intiaitive in Tanzania. They ran a competition for designers to create khanga cloth patterns based on data (the khanga is a traditional cotton clothoften adorned with sayings or shout-outs). The project built the capacity of designers to speak data, and ended with a fashion show showcasing the winning designs (read more in their blog post). The first place winner (Danford Marco) created a design to reflect that 1 out of every 2 married women have faced some kind of abuse from their husband. A staggering statistic, and a stunning design to bring attention to the problem. This kind of creative approach to building data capacity is an example of a very different process, one that is inclusive, builds capacity, and gives ownership of the data to the people it is about.

Match your Mission and Methods

an arrow connecting data to a building with a heart — Align your data methods and your mission

I’m hoping by now that I’ve convinced you that you need to think harder about the data methods you use in order to avoid re-creating the terrible historical practice associated with data. I’m focused on organizations that work for the social good, but this argument holds true for anyone using data. The inspirational examples I highlight all paint a path forward that lets us match our mission and our methods. Which path will you follow?

data culture, data literacy, workshops

Workshop at the 2018 UN World Data Forum

A few years ago I went to the first UN World Data Forum and made some amazing connections with non-profits large and small (read more about that here). A common theme at that event was how to help organizations and governments get the data they needed to start work on the Sustainable Development Goals.

dt6wkqgx0aak6ux

I just returned from the 2018 event, and found a new message repeated over and over – how can we help those who have data communicate about its potential and its impact? I’ll write more about that later. For now I want to share a bit about the session I ran with my collaborator Maryna Taran from the World Food Program (WFP). It was a pleasure to return to the event where we first met and speak to the impact we’ve had at WFP, and how the Data Culture Project has grown to a suite of 7 hands-on activities you can use for free right now.

Empowering Those That Don’t “Speak” Data

Our session was designed to focus on bringing the non-data literate into the data-centered conversation. The idea is that we can help these folks learn to “speak” data with playful activities that try to meet them where they are, rather than with technical trainings that focus on specific tools.

We introduced our arts-centric approach to creating participatory invitations through the data cycle – from data collection, to story-finding, to story-telling. Specifically, we ran our Paper Spreadsheets activity and our Data Sculptures activity. Maryna also shared how the WFP has rolled out a data program globally, where the Data Culture Proect activities fit into it, and some of the impacts they’ve seen already.

dqhypxtwsaakt1j — Participants filling in a paper spreadsheet.

The Paper Spreadsheet activity led to a wonderful discussion of data types, survey question create, and security concerns. The Data Sculptures folks created were a great mix of different types of stories, so I highlighted some of the scaffolding we’ve created for finding stories in data.

One of the most rewarding comments at the end was from a woman who worked on the data analysis side creating charts and such for her team. She noted that she often will share a chart with others on the team and they’ll say “tell me the story”, much to her frustration – she just didn’t understand what they meant. What more did they want than the chart showing them the evidence of the claim or pattern? She was pleased to share that after this session, she finally had a way to think about the difference between the charts she was making and the story that her colleagues might be looking for! Such a wonderful comment that resonated with a lot of the points Maryna was making about how and why WFP is rolling out the Data Culture Project activities in parallel with their more technical data trainings.

Here are the slides we used, for reference:

Empowering those that don't "speak" data from rahulbot

data literacy, tools

Tools Won’t Write Your Data Story For You

When people think about working with data, they usually think about the technologies that help us capture, manipulate, and make arguments with data. Over the last decade we’ve seen radical growth and innovation in the toolchains available to do this, leading to a huge increase in the number of people that have started working with data in some capacity. For data literacy learners, it is tempting to let the tool dictate the outcomes. Need to make a chart? Let the tool recommend one. The problem is that these tools don’t help you with the process of working with data. The tools won’t write your data story for you — you have to run that process yourself. Here are a couple of examples that help illuminate this gap that I see, and how my work with Catherine D’Ignazio on the Data Culture Project helps address it.

Excel doesn’t help you ask the right questions

When you’re first starting to work with a dataset on some problem, it is critical to frame a good set of questions aligned with your goals. Do you have a hypothesis that you are trying to test? Is there a specific audience you are trying to engage with the data? What assumptions do you bring to the data? Tools like Excel help you dive in to a dataset with low friction, but can’t help you identify the right questions to ask. Sometimes you need to take a step back and think about what you’re trying to achieve.

We created our “Ask Good Questions” activity to introduce this idea. You use our WTFcsv tool to browse quick visual summaries of columns in your dataset, and then brainstorm questions that could be interesting to ask it, other datasets that might help you answer those questions, and how you’d get those datasets. You don’t try to answer the questions at this initial phase, you just capture them all as potential roads to follow.

A photo a from workshop in Boston, MA, USA where we tested some of these activites.

Diving into one dataset can be like putting on blinders; it limits your ability to see the possibilities outside of that single dataset. You have to take a step back to consider questions that might require you to pull in and use other datasets. Tools can’t do this for you; it is up to you to make the effort to distance yourself from where you start in order to make sure you stay on target for you goals with the data.

Tableau won’t write you a strong narrative arc

Humans think and understand the world through stories. Background context, strong characters, a clear flow from start to end — these are the key elements of story that help engage us all. Telling stories and making arguments with data is no different. Tools like Tableau can be very powerful for helping us debug the visual representation of our data stories, but they don’t scaffold the process of coming up with the narrative arc. Which characters appeal to the audience we have in mind? What resolution of our data story will drive action towards the goals we want?

Our “Data Storybook” activity helps you prototype your data story’s narrative arc. Once you’ve got a story to tell, follow our instructions to fold a big piece of paper into a small storybook. Then use comic-book or children’s-book style illustrations to tell your story over three page flips. Your book has to say “once upon a time” on the front cover, and say “the end” on the back cover. There is something magical about the page turn… it forces you to think about the beginning, middle and end of your story. This is how you work on a strong narrative arc.

Participants at a workshop in Belo Horizonte, Brazil, writing a data storybook.

When you’re telling a data story, you have to make sure the flow is supported at each point by the data you include. Any data you include must reinforce and connect to the main narrative arc, no matter how detailed it is, otherwise you’re going to lead your intended audience astray and lose them in a confusing non-central plot-line. The tools can’t do this process for you.

R can’t help you pick the right way to tell your story

Once you’ve got a strong narrative with your data, how do you decide what format to tell it in? This decisions has to be driven by strong definitions of your audience and goals. Tools like R can’t make those connections for you; you have to consider the constraints yourself. Will readers have the visual literacy and geographic awareness that it takes to read a map visualization? Are they pre-disposed to agree or disagree with you? Are you presenting in a formal setting to an engaged audience, or on the street at a festival? These kinds of questions are critical for making an informed decision about what medium to tell your data story in.

We built our “Remix an Infographic” and “ConvinceMe” activities to help you work on this skill. The first invites you to look at the argument an infographic is making and then try to “remix it” — telling the same story as a data sculpture, creative map, personal story, or data game. Understanding the affordances of each media is critical for picking an appropriate one for your audience and goal. The second activity, ConvinceMe, helps you practice brainstorming data arguments that can drive different audiences to action. You identify stakeholders that can effect change on the system your dataset represents, invite volunteers to role play them, and then collectively try to drive them to action with creative data-informed arguments.

Workshop participants in Boston, MA, USA planning out their remix on a whiteboard.

Any dataset contains a multitude of potential stories, all of which can be told in a variety of ways. The digital tools that exist don’t help you navigate that space to pick the best story, nor tell it in the more appropriate way.

Working with Data Relies on Strong Processes and Strong Tools

People think about technology as soon as you mention data. I hope these example help illustrate why I think it is so important to separate the issues of the processes and technological tools for working with data. The innovative tools can only help you so much.

Organizations around the world are using these activities I mentioned to work on the process pieces, often in parallel with their work introducing technological platforms. Non-profits, newsrooms, libraries — these are some of our many partners on the Data Culture Project.

Do you want to try these activities out in your organization or classroom? Visit http://datacultureproject.org now to see our free suites of tools for working with people in creative, arts-based ways at every point of the process of going from data to story.

data literacy, teaching

Final Projects from Data Storytelling Studio 2018

Each spring I have the pleasure of teaching MIT undergrads and grad students in my Data Storytelling Studio course. It is a hands-on, projects focused course built around creating quick prototypes of data-driven stories that try to get an audience to do something. This year the course focused on climate change as the theme, as part of the Boston Civic Media intiative. I provide relatively clean datasets they can use, sourced from online portals and local community groups. Over the course of the semester they work on building charts, creative charts, maps, creative maps, data sculptures, and interactive data experiences. Checkout the newly published full course content on MIT’s OpenCourseware site. Here is a quick run-down of their final projects.

Building Back Somerville’s Urban Forests

Students in this group combined a few datasets to create a public event that motivated people to ask the city to plant more trees. They created the shadow of a “missing” tree on the ground and filled it with facts about the impacts and dangers to trees in the city. Visitors were invited to fill out paper leaves with their favorite tree stories and hang them on a tree the community built together. Their conversations with visitors illuminated how the public thinks about the tree canopy of the city, and how and why people were motivated to advocate for more. Read more about the project in their write-up.

Save the Bees

This group used data about bees populations to create a playground game for 2nd and 3rd grade students. The game helps them learn about why bees are important to us, and how to protect them. The kids enjoyed the game thoroughly, and made great suggestions for how to improve it. The quotes from the kids illustrated their drive to understand more fully how bees help our planet, and what they could do to help. Learn more about the game in their write-up.

img_7497-1024x768

Adventures of a Frequent Flyer

This group created a game visitors to a farmers market could play to explore data about how bees are moved around the US to pollinate crops in various states. Their goal was to engage the visitors in advocating for better laws to protect bees that move through their states. Participants were invited to pick their favorite fruit, and then follow the bees as they travelled to the state that produced it most. Read their whole write-up here.

hao

It’s a Mysterbee

This group focused on the MIT population of students, appealing to they sense of curiosity to motivate them to work on projects that might help bee colonies thrive. This interactive data sculpture invited participants to dip crackers in honey to find out about the volume of bee colony production in two different years. We don’t fully understand how bee colonies thrive and/or fail, which turned out bothers MIT students desire to understand everything! The projects across campus that were shared sparked folks who participated in the activity to get involved in the research. Read more in their write-up.

img_20180513_194142252-768x432

data culture, data literacy

Data Literacy Workshop @ PDC 2018

All our data literacy and data culture work is grounded in real workshops with community groups, non-profits, governments and businesses. However, I am an academic working at a university, so I also publish papers and go to conferences and such. For any others in that vein, below is information about a Data Literacy workshop I’m planning with Catherine D’Ignazio and Firaz Peer at the Participatory Design Conference this August. This is part of larger our efforts to build a larger group of peers working on these topics, and translate our collective learnings for use with a non-academic audience!

Learn more at http://firazpeer.lmc.gatech.edu/pdcworkshop/

Data Literacy Workshops as Participatory Design

A workshop at the Participatory Design Conference, Belgium, Aug 21, 2018.

About the workshop

Big Data analysis and data-driven decision-making are buzzwords that are quickly becoming aspirational goals within industry and government settings. This so called data revolution has resulted in what some have called a data divide, where those with privileged access and knowledge about such data are given a seat at the bargaining table, while the voices of those who lack such skills, continue to be ignored. The data literacy workshop we are proposing is designed to work with the data newcomers within our communities, to give them a chance to use publically available data as a resource to advocate for change. Grounded within the Participatory Design goals of equalizing power relations through democratic practices, the workshop activities allow data newcomers to engage constructively with issues that they care about. Our goal in proposing these sets of activities as a workshop is to generate discussions around data literacy, engagement, empowerment, access, power and privilege that are typically associated with data and cities, and build connections between the PDC audience and the data literacy practitioners so they can take this research forward in innovative ways

Our goal in proposing this interactive data literacy workshop to the PDC audience is to offer it as a method that they can use to engage with those who are new to data and analysis. We hope to create connections between the PD discipline and practitioners within the data literacy space to learn from each other and inform this emerging field, to try to move the needle away from boring spreadsheet trainings conducted in dry online settings. We are interested in learning how our attendees define the term ‘data literacy’ within their own research and practice, and the tools, methods and techniques they use to operationalize it. In addition to demonstration of our methods, our workshop schedule also sets aside time for discussions and brainstorming of additional activities/techniques within this pedagogical realm. We would like to get a sense of what empowerment through data means to our participants and the communities they collaborate with. How can designers negotiate power and privilege differentials in relation to access and skills of working with data?

To participate

We invite researchers, practitioners, activists, educators and designers who are interested in furthering the state of data literacy within their communities to submit short position papers (upto 1500 words). We invite researchers, practitioners, activists, educators and designers who are interested in furthering the state of data literacy within their communities to submit short position papers (up to 1500 words). We are open to a range of paper topics. For example, your paper might discuss how you conceive of data literacy or your research methods of choice. Your paper might discuss examples of data literacy and raise questions over what constitutes ethical engagement and empowerment. Your paper might outline uncharted territory in relation to identity, power and data literacy – including challenging the concept and emerging norms of data literacy. Or, finally, the paper might talk about interesting approaches to data literacy and how they might be made part of the workshop activities.

Papers should be in the ACM format as suggested by the PDC organizers and should be submitted to the organizers before May 10, 2018. Final decision on acceptance will be communicated to the applicants by May 25th, 2018.

Please email your position papers to firazpeer@gatech.edu. We expect to select a minimum of 10 and a maximum of 20 participants to take part in this workshop. Accepted participants will need to register for the workshop through the conference website.

Conference website: https://pdc2018.org/

Workshop website: http://firazpeer.lmc.gatech.edu/pdcworkshop/

Organizers

Firaz Peer, Georgia Institute of Technology
Rahul Bhargava, MIT Media Lab
Catherine D’Ignazio, Emerson College

data literacy, workshops

Data Literacy as a STEAM Activity for Youth

I’ve been connecting with more and more educators that want to take a creative approach to building data literacy with their students. Schools traditionally introduce data with in-class surveys and charting. This approach to generating their own data can be a wonderful way to empower learners to collect and represent data themselves. A more recent movement has centered around the STEAM movement – including the Arts along with the Science, Technology, Engineering, and Math curricular focus. I’m seeing a pattern at the intersection of these two approaches – educators are seeing strong engagement and results when they introduce their students to working with data through arts-based activities. Here’s a case study from a collaboration with the MIT Museum to flesh out how this can work.

Environmental Data Mosaics at the MIT Museum

This case-study was contributed by Brian Mernoff, one of my collaborators at the MIT Museum.

Each February, during Massachusetts school break, the MIT Museum runs a week of hands-on activities and workshops called Feb Fest. This year, the event was themed around our temporary exhibit, Big Bang Data, which explored how the increasing use of data affects technology, culture, and society. The purpose of the workshop was to let students view data sets of interest, understand these data sets, and share what they have learned with others in a creative and accessible way; all pieces of building their data literacy.

Data Sculptures as a Quick Introduction

As soon as students entered the classroom, they were asked to create a data sculpture based upon one of the sets of data placed at on their table. This is an activity the MIT Museum Idea Hub has already been running regularly. These data sheets contained relatively straightforward data sets to analyze, such as happiness in Somerville, and the cost of college over time. Art supplies were on the table, and the students worked with each other to create these sculptures while getting to know one another. After about half an hour each team presented how they decided to represent their data to the class. This activity was a great way to get them to get used to talking about data with each other and representing it in a novel way.

steam-data-literacy1 — Data sculptures created by participants

Building a Collaborative Data Mosaic

After presenting the data sculptures, we began the main activity for the day. Students were given a list of websites (see below) that they could visit containing environmental data in either graphical or numeric form (see the Environmental Data Search worksheet). Once they had explored the websites, they discussed these websites with a second group of two at their table and determined which one of the links was most interesting to them to explore for the remainder of the project. Once the website was chosen, they again worked in their original group of two to find a story in one of the data sets on the website using the “Finding a Data Story” worksheet. After doing so, the two smaller groups recombined and chose which of the two stories they would like to tell in the final project.

In their story, students needed to explain the problem the data connects to, what the data is and shows, why the data is important, what the audience of the story should do about it, and what would happen in the long run if the reader did what was suggested (see the Data Story Mosaic Layout worksheet).

steam-data-literacy2 — Some of the tiles participants made for the data mosaic

Learning Outcomes

Beyond these physical artifacts, the students’ discussions about data were particularly impressive. One group brought up a very interesting question about rare bird sightings and proceeded to debate it for about 15 minutes. They noticed that certain areas of the United States had more overall sightings of rare birds. At the same time, they looked at another data set on the same website showing the number of reporting bird observers across the country. Combining these graphs, they noticed that more rare birds are spotted where there are more reporters. This brought up the question of whether or not rare birds are actually as rare as shown by the data if there is such a close relationship between the two data sets. Both sides of the debate made good arguments and they eventually settled on the idea that the data was still valid, but incomplete. They would need more experiments in order to say anything conclusively. This demonstrates that the learners were in the “data headspace”, thinking about standard questions of representation, outliers, and normalization.

A second group, studying data on arable land, was trying to combine their data set with information on organic farming. This brought up good questions about what the terms “organic” and “GMO” actually meant, as well as whether or not it is related to the ability to reuse land over time. To their surprise, the students did some more research and realized that genetically modified foods and some types of “non-organic” farming actually increase what land can be farmed. Again, the activity pulled the learners into a space where they were curious and driven to understand the real-world approaches and impacts the data might be representing; making sure they understood what they had in front of them before finding a story to tell with it.

Overall, these projects allowed students not just to analyze data to find trends, but to think about why data is important and it can be used to find solutions to problems. Through their mosaics, students explored and discussed different potential solutions to determine which one they wanted to communicate with a larger audience.

The Opportunity of STEAM

Brian’s workshop is a wonderful example of how a creative arts-based approach to working with data can engage and proboke students in novel ways. It matches results we’ve seen in previous work on creating data murals with youth in Brazil, and working with a network of school on data challenges. These workshops are starting to help us build an evidence base for using the arts as an introduction to working with data. This can meet a larger set of students where they are. The physical artifacts and conversations around them are assets we use for evaluation and assessment. Are you an educator? We’d enjoy hearing how you are approaching this.

References

Websites with Environmental Data

data culture, data literacy

Building Data Capacity Roundtable (Video Available)

Our partners at the Stanford’s Digital Impact initiative recently invited us to host a virtual roundtable discussion focused on building data capacity. In case you missed it, the recording and transcript are now online!

We gave a quick background on the Data Culture Project. Then we tried a quick online data sculpture activity; asking participants to make and share a photo of a physical data story just using things they found around their office. From there we pivoted into a discussion of how the World Food Programme and El Radioperiódico Clarín are building capacity to work with data in creative ways.

Panelists included:

Catherine D’Ignazio, Assistant Professor of Civic Media & Data Visualization at Emerson College,
Rahul Bhargava, Research Scientist at MIT Center for Civic Media;
Maryna Taran, Data and Digital Collaboration Product Manager at World Food Programme; and
Andrés Felipe Vera Ramírez, Strategic Director at El Radioperiódico Clarín.

data literacy, teaching

Launching the Data Culture Project

Learning to work with data is like learning a new language — immersing yourself in the culture is the best way to do it. For some individuals, this means jumping into tools like Excel, Tableau, programming, or R Studio. But what does this mean for a group of people that work together? We often talk about data literacy as if it’s an individual capacity, but what about data literacy for a community? How does an organization learn how to work with data?

About a year ago we (Rahul Bhargava and Catherine D’Ignazio) found that more and more users of our DataBasic.io suite of tools and activities were asking this question — online and in workshops. In response, with support from the Stanford Center on Philanthropy and Civil Society, we’ve worked together with 25 organizations to create the Data Culture Project. We’re happy to launch it publicly today! Visit datacultureproject.org to learn more.

Update: Join our webinar on April 12th to learn more!

The Data Culture Project is a hands-on learning program to kickstart a data culture within your organization. We provide facilitation videos to help you run creative introductions to get people across your organization talking to each other — from IT to marketing to programs to evaluation. These are not boring spreadsheet trainings! Try running our fun activities — one per month works as a brown bag lunch to focus people on a common learning goal. For example, “Sketching a Story” brings people together around basic concepts of quantitative text analysis and visual storytelling. “Asking Good Questions” introduces principles of exploratory data analysis in a fun environment. What’s more, you can use the sample data that we provide, or you can integrate your organization’s data as the topic of conversation and learning.

Developing Together

We built DataBasic.io to help individuals build their data literacy in more creative ways. We’ve baked in design principles that focused on learners (read our paper), argued to tool designers that their web-based tools are in fact informal learning spaces (watch our talk video), documented how our activities are particularly well suited to data literacy learners (read another paper), and focused them on building a data mindset (read our opinion piece).

These activities and tools were designed and iterated on with interested users (with support from the Knight Foundation). We develop all our tools based on the problem organizations bring to us. Our latest grant was a partnership with Tech Networks of Boston, who brought years of experience working with organizations to develop their capacity and skills in a variety of ways. We prototyped a first set of videos, for the WordCounter “Sketch a Story” activity with them, and tried it out in a local workshop with some of their partners and clients.

Trying Out a Model — the Data Culture Pilot

Based on how that went, we recruited 25 organizations from around the world to help us build the Data Culture Project. Non-profits, newsrooms, libraries, community groups were included in this cohort, and we created a network to help us guide our prototyping. Over the last 6 months, each group ran 3 activities within their organizations as brown-bag lunches.

It was wonderful to have collaborators that were willing to try out some half-baked things! After each workshop, they shared how it went on a group mailing list. Then each month we hosted an online chat to get feedback and share insights and common points from the feedback.

Even in these prototype sessions, the participants shared some wonderful insights. Here are just a few:

“It did lead to a pretty significant rethink fo the communications director for what is coming out in the spring.”
“I hear back from participants regularly about how much they enjoyed the activities and wondering what comes next.”
“As they were working through their data sets, they kept coming up with more questions it made them wonder about and more things to consider about those questions.”
“They can relate everything back to their own situations / data / organizations.”

We were heartened and excited to see that our design partners were able to see impacts already!

How to Join the Community

We are launching the Data Culture Project today. Here’s how to make the best use of the project and the community:

Read about why you don’t need a data scientist; you need a data culture to understand why data literacy needs to be understood as a community capacity, in addition to an individual capacity.
Run one or more of the activities listed on the Data Culture Project home page. We found in the pilot that running one per month (and providing pizza) can work to bring people together.
Remix and modify the activity to work for you and tell us about it! At the bottom of each activity page, you’ll see a “Learn With Others” comment box where you can tell others what worked for you (á la Internet food recipe sites).
Join our mailing list to connect with others working on creative approaches to building capacity in their organizations (and be the first to hear about new activities and projects).

Remix and modify the activity to work for you and tell us about it! At the bottom of each activity page in the Data Culture Project, you’ll see a “Learn With Others” comment box where you can tell others what worked for you (á la Internet food recipe sites).

We are grateful to the Stanford Center on Philanthropy and Civil Society for supporting the development of the Data Culture Project. The Data Culture Project is headed by Rahul Bhargava and Catherine D’Ignazio, undertaken as a collaboration between the MIT Center for Civic Media and the Engagement Lab@Emerson College, and with the assistance of Becky Michelson (project manager) and Jon Elbaz (research assistant).

data literacy, ethics

The algorithms aren’t biased, we are

Excited about using AI to improve your organization’s operations? Curious about the promise of insights and predictions from computer models? I want to warn you about bias and how it can appear in those types of projects, share some illustrative examples, and translate the latest academic research on “algorithmic bias”.

First off – language matters. What we call things shapes our understanding of them. That’s why I try to avoid the hype-driven term “artificial intelligence”. Most projects called that are more usefully described as “machine learning”. Machine learning can be described as the process of training a computer to make decisions that you want help making. This post describes why you need to worry about the data in your machine learning problem.

This matters in a lot of ways. “Algorithmic bias” is showing up all over the press right now. What does that term mean? Algorithms are doling our discriminatory sentence recommendations for judges to use. Algorithms are baking in gender stereotypes to translation services. Algorithms are pushing viewers towards extremist videos on YouTube. Most folks I know agree this is not the world we want. Let’s dig into why that is happening, and put the blame where it should be.

Your machine is learning, but who is teaching it?

Physics is hard for me. Even worse – i don’t think I’ll ever be good at physics. I attribute a lot of this to a poor high school physics teacher, who was condescending to me and the other students. On the other hand, while I’m not great at complicated math, I like trying to learn it better. I trace this continued enthusiasm to my junior high school math teacher, who introduced us to the topic with excitement and playfulness (including donut rewards for solving bonus problems!).

My point in sharing this story? Teachers matter. This is even more true in machine learning – machines don’t bring prior experience, contextual beliefs, and all the other things that make it important to meet human learners where they are and provide many paths into content. Machines only learn from only what you show them.

So in machine learning, the questions that matter are “what is the textbook” and “who is the teacher”. The textbook in machine learning is the “training data” that you show to your software to teach it how to make decisions. This usually is some data you’ve examined and labeled with the answer you want. Often it is data you’ve gathered from lots of other sources that did that work already (we often call this a “corpus”). If you’re trying to predict how likely someone receiving a micro-loan is to repay it, then you might pick training data that includes previous payment histories of current loan recipients.

The second part is about who the teacher is. The teacher decides what questions to ask, and tells learners what matters. In machine learning, the teacher is responsible for “feature selection” – deciding what pieces of the data the machine is allowed to use to make its decisions. Sometimes this feature selection is done for you by what is and isn’t included in the training sets you have. More often you use some statistics to have the computer pick the features most likely to be useful. Returning to our micro-loan example: some candidate features could be loan duration, total amount, whether the recipient has a cellphone, marital status, or their race.

These two questions – training data and training features – are central to any machine learning project.

Algorithms are mirrors

Let’s return to this question of language with this in mind.. perhaps a more useful term for “machine learning” would be “machine teaching”. This would put the responsibility where it lies, on the teacher. If you’re doing “machine learning”, you’re most interested in what it is learning to do. With “machine teaching”, you’re most interested in what you are teaching a machine to do. That’s a subtle difference in leanguage, but a big difference in understanding.

Putting the responsibility on the teacher helps us realize how tricky this process is. Remember this list of biases examples I started with? That sentencing algorithm is discriminatory because it was taught with sentencing data for the US court system, which data shows is vey forgiving to everyone except black men. That translation algorithm that bakes in gender stereotypes was probably taught with data from the news or literature, which we known bakes in our-of-date gender roles and norms (ie. Doctors are “he”, while nurses are “she”). That algorithm that surfaces fake stories on your feed is taught to share what lots of other people share, irrespective of accuracy.

All that data is about us.

Those algorithms aren’t biased, we are! Algorithms are mirrors.

They reflect the biases in our questions and our data. These biases get baked into machine learning pejects in both feature selection and training data. This is on us, not the computers.

Corrective lenses

So how do we detect and correct this? Teachers feel a responsibility for, and pride in, their students’ learning. Developers of machine learning models should feel a similar responsibility, and perhaps should be allowed to feel a similar pride.

I’m heartened by examples like Microsoft’s efforts to undo gender bias in publicly available language models (trying to solve the “doctors are men” problem). I love my colleague Joy Buolamwini’s efforts to reframe this as a question of “justice” in the social and technical intervention she calls the “Algorithmic Justice League” (video). ProPublica’s investigative reporting is holding companies accountable for their discriminatory sentencing predictions. The amazing Zeynep Tufekci is leading the way in speaking and writing about the danger this poses to society at large. Cathy O’Neil’s Weapons of Math Destruction documents the myriad of implications for this, raising a warning flag for society at large. Fields like law are debating the implications of algorithm-driven decision making in public policy settings. City ordinances are started to tackle the question of how to legislate against some of the effects I’ve described.

These efforts can hopefully serve as “corrective lenses” for these algorithmic mirrors – addressing the troubling aspects we see in our own reflections. The key here is to remember that it is up to us to do something about this. Determining a decision with an algorithm doesn’t automatically make it reliable and trustworthy; just like quantifying something with data doesn’t automatically make it true. We need to look at our own reflections in these algorithmic mirrors and make sure we see the future we want to see.