Workshop at the 2018 UN World Data Forum

A few years ago I went to the first UN World Data Forum and made some amazing connections with non-profits large and small (read more about that here).  A common theme at that event was how to help organizations and governments get the data they needed to start work on the Sustainable Development Goals.


I just returned from the 2018 event, and found a new message repeated over and over – how can we help those who have data communicate about its potential and its impact? I’ll write more about that later.  For now I want to share a bit about the session I ran with my collaborator Maryna Taran from the World Food Program (WFP).  It was a pleasure to return to the event where we first met and speak to the impact we’ve had at WFP, and how the Data Culture Project has grown to a suite of 7 hands-on activities you can use for free right now.

Empowering Those That Don’t “Speak” Data

Our session was designed to focus on bringing the non-data literate into the data-centered conversation.  The idea is that we can help these folks learn to “speak” data with playful activities that try to meet them where they are, rather than with technical trainings that focus on specific tools.

We introduced our arts-centric approach to creating participatory invitations through the data cycle – from data collection, to story-finding, to story-telling.  Specifically, we ran our Paper Spreadsheets activity and our Data Sculptures activity.  Maryna also shared how the WFP has rolled out a data program globally, where the Data Culture Project activities fit into it, and some of the impacts they’ve seen already.

Participants filling in a paper spreadsheet.

The Paper Spreadsheet activity led to a wonderful discussion of data types, survey question creation, and security concerns. The data sculptures folks created told a great mix of different types of stories, so I highlighted some of the scaffolding we’ve created for finding stories in data.

One of the most rewarding comments at the end was from a woman who worked on the data analysis side creating charts and such for her team.  She noted that she often will share a chart with others on the team and they’ll say “tell me the story”, much to her frustration – she just didn’t understand what they meant.  What more did they want than the chart showing them the evidence of the claim or pattern? She was pleased to share that after this session, she finally had a way to think about the difference between the charts she was making and the story that her colleagues might be looking for!  Such a wonderful comment that resonated with a lot of the points Maryna was making about how and why WFP is rolling out the Data Culture Project activities in parallel with their more technical data trainings.

Here are the slides we used, for reference:

Tools Won’t Write Your Data Story For You

When people think about working with data, they usually think about the technologies that help us capture, manipulate, and make arguments with data. Over the last decade we’ve seen radical growth and innovation in the toolchains available to do this, leading to a huge increase in the number of people that have started working with data in some capacity. For data literacy learners, it is tempting to let the tool dictate the outcomes. Need to make a chart? Let the tool recommend one. The problem is that these tools don’t help you with the process of working with data. The tools won’t write your data story for you — you have to run that process yourself. Here are a couple of examples that help illuminate this gap that I see, and how my work with Catherine D’Ignazio on the Data Culture Project helps address it.

Excel doesn’t help you ask the right questions

When you’re first starting to work with a dataset on some problem, it is critical to frame a good set of questions aligned with your goals. Do you have a hypothesis that you are trying to test? Is there a specific audience you are trying to engage with the data? What assumptions do you bring to the data? Tools like Excel help you dive into a dataset with low friction, but can’t help you identify the right questions to ask. Sometimes you need to take a step back and think about what you’re trying to achieve.

We created our “Ask Good Questions” activity to introduce this idea. You use our WTFcsv tool to browse quick visual summaries of columns in your dataset, and then brainstorm questions that could be interesting to ask it, other datasets that might help you answer those questions, and how you’d get those datasets. You don’t try to answer the questions at this initial phase, you just capture them all as potential roads to follow.
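If you want a rough sense of what WTFcsv shows you without leaving a notebook, a few lines of pandas can produce similar per-column summaries. This is just a sketch of the idea, not part of the activity itself; the filename "survey.csv" is a placeholder for whatever dataset you’re brainstorming questions about.

```python
# A minimal sketch of the kind of per-column summary WTFcsv gives you,
# using pandas. "survey.csv" is a placeholder filename.
import pandas as pd

df = pd.read_csv("survey.csv")

for column in df.columns:
    series = df[column]
    print(f"--- {column} ({series.dtype}) ---")
    if pd.api.types.is_numeric_dtype(series):
        # Numeric columns: a quick sense of the range and center.
        print(series.describe()[["min", "max", "mean"]])
    else:
        # Text or categorical columns: the most common values.
        print(series.value_counts().head(5))
    print(f"missing values: {series.isna().sum()}\n")
```

The point isn’t the code – it’s that a quick summary of each column gives you just enough footing to start brainstorming what you might want to ask.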

A photo from a workshop in Boston, MA, USA where we tested some of these activities.

Diving into one dataset can be like putting on blinders; it limits your ability to see the possibilities outside of that single dataset. You have to take a step back to consider questions that might require you to pull in and use other datasets. Tools can’t do this for you; it is up to you to make the effort to distance yourself from where you start in order to make sure you stay on target for your goals with the data.

Tableau won’t write you a strong narrative arc

Humans think and understand the world through stories. Background context, strong characters, a clear flow from start to end — these are the key elements of story that help engage us all. Telling stories and making arguments with data is no different. Tools like Tableau can be very powerful for helping us debug the visual representation of our data stories, but they don’t scaffold the process of coming up with the narrative arc. Which characters appeal to the audience we have in mind? What resolution of our data story will drive action towards the goals we want?

Our “Data Storybook” activity helps you prototype your data story’s narrative arc. Once you’ve got a story to tell, follow our instructions to fold a big piece of paper into a small storybook. Then use comic-book or children’s-book style illustrations to tell your story over three page flips. Your book has to say “once upon a time” on the front cover, and say “the end” on the back cover. There is something magical about the page turn… it forces you to think about the beginning, middle and end of your story. This is how you work on a strong narrative arc.

Participants at a workshop in Belo Horizonte, Brazil, writing a data storybook.

When you’re telling a data story, you have to make sure the flow is supported at each point by the data you include. Any data you include must reinforce and connect to the main narrative arc, no matter how detailed it is; otherwise you’re going to lead your intended audience astray and lose them in a confusing, non-central plot line. The tools can’t do this process for you.

R can’t help you pick the right way to tell your story

Once you’ve got a strong narrative with your data, how do you decide what format to tell it in? This decision has to be driven by strong definitions of your audience and goals. Tools like R can’t make those connections for you; you have to consider the constraints yourself. Will readers have the visual literacy and geographic awareness that it takes to read a map visualization? Are they predisposed to agree or disagree with you? Are you presenting in a formal setting to an engaged audience, or on the street at a festival? These kinds of questions are critical for making an informed decision about what medium to tell your data story in.

We built our “Remix an Infographic” and “ConvinceMe” activities to help you work on this skill. The first invites you to look at the argument an infographic is making and then try to “remix it” — telling the same story as a data sculpture, creative map, personal story, or data game. Understanding the affordances of each medium is critical for picking an appropriate one for your audience and goal. The second activity, ConvinceMe, helps you practice brainstorming data arguments that can drive different audiences to action. You identify stakeholders that can effect change on the system your dataset represents, invite volunteers to role play them, and then collectively try to drive them to action with creative data-informed arguments.

Workshop participants in Boston, MA, USA planning out their remix on a whiteboard.

Any dataset contains a multitude of potential stories, all of which can be told in a variety of ways. The digital tools that exist don’t help you navigate that space to pick the best story, nor to tell it in the most appropriate way.

Working with Data Relies on Strong Processes and Strong Tools

People think about technology as soon as you mention data. I hope these examples help illustrate why I think it is so important to separate the issues of the processes and technological tools for working with data. The innovative tools can only help you so much.

Organizations around the world are using these activities I mentioned to work on the process pieces, often in parallel with their work introducing technological platforms. Non-profits, newsrooms, libraries — these are some of our many partners on the Data Culture Project.

Do you want to try these activities out in your organization or classroom? Visit http://datacultureproject.org now to see our free suite of tools for working with people in creative, arts-based ways at every point of the process of going from data to story.

Final Projects from Data Storytelling Studio 2018

Each spring I have the pleasure of teaching MIT undergrads and grad students in my Data Storytelling Studio course. It is a hands-on, project-focused course built around creating quick prototypes of data-driven stories that try to get an audience to do something.  This year the course focused on climate change as the theme, as part of the Boston Civic Media initiative.  I provide relatively clean datasets they can use, sourced from online portals and local community groups.  Over the course of the semester they work on building charts, creative charts, maps, creative maps, data sculptures, and interactive data experiences. Check out the newly published full course content on MIT’s OpenCourseware site. Here is a quick run-down of their final projects.

Building Back Somerville’s Urban Forests

Students in this group combined a few datasets to create a public event that motivated people to ask the city to plant more trees.  They created the shadow of a “missing” tree on the ground and filled it with facts about the impacts and dangers to trees in the city.  Visitors were invited to fill out paper leaves with their favorite tree stories and hang them on a tree the community built together. Their conversations with visitors illuminated how the public thinks about the tree canopy of the city, and how and why people were motivated to advocate for more. Read more about the project in their write-up.


Save the Bees

This group used data about bee populations to create a playground game for 2nd and 3rd grade students.  The game helps them learn about why bees are important to us, and how to protect them.  The kids enjoyed the game thoroughly, and made great suggestions for how to improve it.  The quotes from the kids illustrated their drive to understand more fully how bees help our planet, and what they could do to help. Learn more about the game in their write-up.


Adventures of a Frequent Flyer

This group created a game visitors to a farmers market could play to explore data about how bees are moved around the US to pollinate crops in various states. Their goal was to engage the visitors in advocating for better laws to protect bees that move through their states. Participants were invited to pick their favorite fruit, and then follow the bees as they travelled to the state that produced it most. Read their whole write-up here.


It’s a Mysterbee

This group focused on the MIT student population, appealing to their sense of curiosity to motivate them to work on projects that might help bee colonies thrive. Their interactive data sculpture invited participants to dip crackers in honey to find out about the volume of bee colony production in two different years. We don’t fully understand how bee colonies thrive and/or fail, which turned out to bother MIT students, who want to understand everything! Sharing related projects from across campus sparked participants in the activity to get involved in the research. Read more in their write-up.


Data Literacy Workshop @ PDC 2018

All our data literacy and data culture work is grounded in real workshops with community groups, non-profits, governments and businesses. However, I am an academic working at a university, so I also publish papers and go to conferences and such.  For any others in that vein, below is information about a Data Literacy workshop I’m planning with Catherine D’Ignazio and Firaz Peer at the Participatory Design Conference this August.  This is part of our larger effort to build a group of peers working on these topics, and to translate our collective learnings for use with a non-academic audience!

Learn more at http://firazpeer.lmc.gatech.edu/pdcworkshop/ 

Data Literacy Workshops as Participatory Design

A workshop at the Participatory Design Conference, Belgium, Aug 21, 2018.

About the workshop

Big Data analysis and data-driven decision-making are buzzwords that are quickly becoming aspirational goals within industry and government settings. This so-called data revolution has resulted in what some have called a data divide, where those with privileged access and knowledge about such data are given a seat at the bargaining table, while the voices of those who lack such skills continue to be ignored. The data literacy workshop we are proposing is designed to work with the data newcomers within our communities, to give them a chance to use publicly available data as a resource to advocate for change. Grounded within the Participatory Design goals of equalizing power relations through democratic practices, the workshop activities allow data newcomers to engage constructively with issues that they care about. Our goal in proposing these sets of activities as a workshop is to generate discussions around data literacy, engagement, empowerment, access, power and privilege that are typically associated with data and cities, and to build connections between the PDC audience and data literacy practitioners so they can take this research forward in innovative ways.

Our goal in proposing this interactive data literacy workshop to the PDC audience is to offer it as a method that they can use to engage with those who are new to data and analysis. We hope to create connections between the PD discipline and practitioners within the data literacy space to learn from each other and inform this emerging field, and to try to move the needle away from boring spreadsheet trainings conducted in dry online settings. We are interested in learning how our attendees define the term ‘data literacy’ within their own research and practice, and the tools, methods and techniques they use to operationalize it. In addition to demonstrations of our methods, our workshop schedule also sets aside time for discussions and brainstorming of additional activities/techniques within this pedagogical realm. We would like to get a sense of what empowerment through data means to our participants and the communities they collaborate with. How can designers negotiate power and privilege differentials in relation to access and skills of working with data?

To participate

We invite researchers, practitioners, activists, educators and designers who are interested in furthering the state of data literacy within their communities to submit short position papers (up to 1500 words). We are open to a range of paper topics. For example, your paper might discuss how you conceive of data literacy or your research methods of choice. Your paper might discuss examples of data literacy and raise questions over what constitutes ethical engagement and empowerment. Your paper might outline uncharted territory in relation to identity, power and data literacy – including challenging the concept and emerging norms of data literacy. Or, finally, the paper might talk about interesting approaches to data literacy and how they might be made part of the workshop activities.

Papers should be in the ACM format as suggested by the PDC organizers and should be submitted to the organizers before May 10, 2018. Final decision on acceptance will be communicated to the applicants by May 25th, 2018.

Please email your position papers to firazpeer@gatech.edu. We expect to select a minimum of 10 and a maximum of 20 participants to take part in this workshop. Accepted participants will need to register for the workshop through the conference website.

Conference website: https://pdc2018.org/

Workshop website: http://firazpeer.lmc.gatech.edu/pdcworkshop/

Organizers

  • Firaz Peer, Georgia Institute of Technology
  • Rahul Bhargava, MIT Media Lab
  • Catherine D’Ignazio, Emerson College

Data Literacy as a STEAM Activity for Youth

I’ve been connecting with more and more educators who want to take a creative approach to building data literacy with their students. Schools traditionally introduce data with in-class surveys and charting; having students generate their own data can be a wonderful way to empower learners to collect and represent data themselves. A more recent trend is the STEAM movement – including the Arts along with the Science, Technology, Engineering, and Math curricular focus.  I’m seeing a pattern at the intersection of these two approaches – educators are seeing strong engagement and results when they introduce their students to working with data through arts-based activities. Here’s a case study from a collaboration with the MIT Museum to flesh out how this can work.

Environmental Data Mosaics at the MIT Museum

This case-study was contributed by Brian Mernoff, one of my collaborators at the MIT Museum.

Each February, during Massachusetts school break, the MIT Museum runs a week of hands-on activities and workshops called Feb Fest. This year, the event was themed around our temporary exhibit, Big Bang Data, which explored how the increasing use of data affects technology, culture, and society. The purpose of the workshop was to let students view data sets of interest, understand those data sets, and share what they learned with others in a creative and accessible way – all pieces of building their data literacy.

Data Sculptures as a Quick Introduction

As soon as students entered the classroom, they were asked to create a data sculpture based upon one of the sets of data placed on their table. This is an activity the MIT Museum Idea Hub has already been running regularly. These data sheets contained relatively straightforward data sets to analyze, such as happiness in Somerville, and the cost of college over time. Art supplies were on the table, and the students worked with each other to create these sculptures while getting to know one another. After about half an hour each team presented how they decided to represent their data to the class. This activity was a great way to get them used to talking about data with each other and representing it in a novel way.

Data sculptures created by participants

Building a Collaborative Data Mosaic

After presenting the data sculptures, we began the main activity for the day. Students were given a list of websites (see below) that they could visit containing environmental data in either graphical or numeric form (see the Environmental Data Search worksheet). Once they had explored the websites, they discussed these websites with a second group of two at their table and determined which one of the links was most interesting to them to explore for the remainder of the project. Once the website was chosen, they again worked in their original group of two to find a story in one of the data sets on the website using the “Finding a Data Story” worksheet. After doing so, the two smaller groups recombined and chose which of the two stories they would like to tell in the final project.

In their story, students needed to explain the problem the data connects to, what the data is and shows, why the data is important, what the audience of the story should do about it, and what would happen in the long run if the reader did what was suggested (see the Data Story Mosaic Layout worksheet).

Some of the tiles participants made for the data mosaic

Learning Outcomes

Beyond these physical artifacts, the students’ discussions about data were particularly impressive. One group brought up a very interesting question about rare bird sightings and proceeded to debate it for about 15 minutes. They noticed that certain areas of the United States had more overall sightings of rare birds. At the same time, they looked at another data set on the same website showing the number of reporting bird observers across the country. Combining these graphs, they noticed that more rare birds are spotted where there are more reporters. This brought up the question of whether or not rare birds are actually as rare as shown by the data if there is such a close relationship between the two data sets. Both sides of the debate made good arguments and they eventually settled on the idea that the data was still valid, but incomplete. They would need more experiments in order to say anything conclusively.  This demonstrates that the learners were in the “data headspace”, thinking about standard questions of representation, outliers, and normalization.
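For readers who like to see that kind of normalization question in code rather than on paper, here is a toy sketch of the students’ debate. All the numbers are invented for illustration, not taken from the website they used.

```python
# A toy sketch of the normalization question the students debated:
# raw rare-bird sightings vs. sightings per reporting observer.
# All numbers here are invented.
import pandas as pd

birds = pd.DataFrame({
    "region": ["Northeast", "Southwest", "Midwest"],
    "rare_sightings": [340, 120, 90],
    "reporting_observers": [1700, 400, 450],
})

# Raw counts make the Northeast look like the rare-bird capital...
birds["sightings_per_observer"] = (
    birds["rare_sightings"] / birds["reporting_observers"]
)

# ...but the per-observer rate tells a different story.
print(birds.sort_values("sightings_per_observer", ascending=False))
```

Sorting by the per-observer rate instead of the raw count is exactly the kind of "take a step back" move the students talked themselves into.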

A second group, studying data on arable land, was trying to combine their data set with information on organic farming. This brought up good questions about what the terms “organic” and “GMO” actually meant, as well as whether or not they are related to the ability to reuse land over time. To their surprise, the students did some more research and realized that genetically modified foods and some types of “non-organic” farming actually increase the amount of land that can be farmed. Again, the activity pulled the learners into a space where they were curious and driven to understand the real-world approaches and impacts the data might be representing; making sure they understood what they had in front of them before finding a story to tell with it.

Overall, these projects allowed students not just to analyze data to find trends, but to think about why data is important and how it can be used to find solutions to problems. Through their mosaics, students explored and discussed different potential solutions to determine which one they wanted to communicate to a larger audience.

The Opportunity of STEAM

Brian’s workshop is a wonderful example of how a creative arts-based approach to working with data can engage and provoke students in novel ways. It matches results we’ve seen in previous work on creating data murals with youth in Brazil, and working with a network of schools on data challenges. These workshops are starting to help us build an evidence base for using the arts as an introduction to working with data. This can meet a larger set of students where they are.  The physical artifacts and conversations around them are assets we use for evaluation and assessment. Are you an educator? We’d enjoy hearing how you are approaching this.

References

Websites with Environmental Data

 

Building Data Capacity Roundtable (Video Available)

Our partners at Stanford’s Digital Impact initiative recently invited us to host a virtual roundtable discussion focused on building data capacity. In case you missed it, the recording and transcript are now online!

We gave a quick background on the Data Culture Project. Then we tried a quick online data sculpture activity, asking participants to make and share a photo of a physical data story using just things they found around their office. From there we pivoted into a discussion of how the World Food Programme and El Radioperiódico Clarín are building capacity to work with data in creative ways.

Panelists included:

Launching the Data Culture Project

Learning to work with data is like learning a new language — immersing yourself in the culture is the best way to do it. For some individuals, this means jumping into tools like Excel, Tableau, programming, or R Studio. But what does this mean for a group of people that work together? We often talk about data literacy as if it’s an individual capacity, but what about data literacy for a community? How does an organization learn how to work with data?

About a year ago we (Rahul Bhargava and Catherine D’Ignazio) found that more and more users of our DataBasic.io suite of tools and activities were asking this question — online and in workshops. In response, with support from the Stanford Center on Philanthropy and Civil Society, we’ve worked together with 25 organizations to create the Data Culture Project. We’re happy to launch it publicly today! Visit datacultureproject.org to learn more.

Update: Join our webinar on April 12th to learn more!

The Data Culture Project is a hands-on learning program to kickstart a data culture within your organization. We provide facilitation videos to help you run creative introductions to get people across your organization talking to each other — from IT to marketing to programs to evaluation. These are not boring spreadsheet trainings! Try running our fun activities — one per month works as a brown bag lunch to focus people on a common learning goal. For example, “Sketching a Story” brings people together around basic concepts of quantitative text analysis and visual storytelling. “Asking Good Questions” introduces principles of exploratory data analysis in a fun environment. What’s more, you can use the sample data that we provide, or you can integrate your organization’s data as the topic of conversation and learning.
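If you’re curious what the quantitative text analysis behind “Sketching a Story” boils down to, here is a tiny word-frequency sketch using only the Python standard library. It is a rough analogue of the counts our WordCounter tool produces; the sample text and the stopword list are placeholders, not part of the activity itself.

```python
# A tiny sketch of the word-frequency counting behind "Sketch a Story",
# using only the Python standard library. The text and stopword list
# are placeholders.
import re
from collections import Counter

text = """We provide facilitation videos to help you run creative
introductions to get people across your organization talking to each other."""

# Tokenize into lowercase words and drop a few common filler words.
words = re.findall(r"[a-z']+", text.lower())
stopwords = {"to", "you", "your", "we", "the", "each", "other", "across"}

counts = Counter(word for word in words if word not in stopwords)
print(counts.most_common(5))
```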

Developing Together

We built DataBasic.io to help individuals build their data literacy in more creative ways. We’ve baked in design principles that focused on learners (read our paper), argued to tool designers that their web-based tools are in fact informal learning spaces (watch our talk video), documented how our activities are particularly well suited to data literacy learners (read another paper), and focused them on building a data mindset (read our opinion piece).

These activities and tools were designed and iterated on with interested users (with support from the Knight Foundation). We develop all our tools based on the problems organizations bring to us. Our latest grant was a partnership with Tech Networks of Boston, who brought years of experience working with organizations to develop their capacity and skills in a variety of ways. We prototyped a first set of videos for the WordCounter “Sketch a Story” activity with them, and tried it out in a local workshop with some of their partners and clients.

Trying Out a Model — the Data Culture Pilot

Based on how that went, we recruited 25 organizations from around the world to help us build the Data Culture Project. Non-profits, newsrooms, libraries, and community groups were included in this cohort, and we created a network to help guide our prototyping. Over the last 6 months, each group ran 3 activities within their organizations as brown-bag lunches.

It was wonderful to have collaborators that were willing to try out some half-baked things! After each workshop, they shared how it went on a group mailing list. Then each month we hosted an online chat to gather feedback, share insights, and pull out common threads.

Even in these prototype sessions, the participants shared some wonderful insights. Here are just a few:

  • “It did lead to a pretty significant rethink for the communications director for what is coming out in the spring.”
  • “I hear back from participants regularly about how much they enjoyed the activities and wondering what comes next.”
  • “As they were working through their data sets, they kept coming up with more questions it made them wonder about and more things to consider about those questions.”
  • “They can relate everything back to their own situations / data / organizations.”

We were heartened and excited to see that our design partners were able to see impacts already!

How to Join the Community

We are launching the Data Culture Project today. Here’s how to make the best use of the project and the community:

  • Read about why you don’t need a data scientist; you need a data culture to understand why data literacy needs to be understood as a community capacity, in addition to an individual capacity.
  • Run one or more of the activities listed on the Data Culture Project home page. We found in the pilot that running one per month (and providing pizza) can work to bring people together.
  • Remix and modify the activity to work for you and tell us about it! At the bottom of each activity page, you’ll see a “Learn With Others” comment box where you can tell others what worked for you (à la Internet food recipe sites).
  • Join our mailing list to connect with others working on creative approaches to building capacity in their organizations (and be the first to hear about new activities and projects).


We are grateful to the Stanford Center on Philanthropy and Civil Society for supporting the development of the Data Culture Project. The Data Culture Project is headed by Rahul Bhargava and Catherine D’Ignazio, undertaken as a collaboration between the MIT Center for Civic Media and the Engagement Lab@Emerson College, and with the assistance of Becky Michelson (project manager) and Jon Elbaz (research assistant).

The algorithms aren’t biased, we are

Excited about using AI to improve your organization’s operations? Curious about the promise of insights and predictions from computer models? I want to warn you about bias and how it can appear in those types of projects, share some illustrative examples, and translate the latest academic research on “algorithmic bias”.

First off – language matters. What we call things shapes our understanding of them. That’s why I try to avoid the hype-driven term “artificial intelligence”. Most projects called that are more usefully described as “machine learning”. Machine learning can be described as the process of training a computer to make decisions that you want help making. This post describes why you need to worry about the data in your machine learning problem.

This matters in a lot of ways. “Algorithmic bias” is showing up all over the press right now. What does that term mean? Algorithms are doling out discriminatory sentencing recommendations for judges to use. Algorithms are baking gender stereotypes into translation services. Algorithms are pushing viewers towards extremist videos on YouTube. Most folks I know agree this is not the world we want. Let’s dig into why that is happening, and put the blame where it should be.

Your machine is learning, but who is teaching it?

Physics is hard for me. Even worse – I don’t think I’ll ever be good at physics. I attribute a lot of this to a poor high school physics teacher, who was condescending to me and the other students. On the other hand, while I’m not great at complicated math, I like trying to learn it better. I trace this continued enthusiasm to my junior high school math teacher, who introduced us to the topic with excitement and playfulness (including donut rewards for solving bonus problems!).

My point in sharing this story? Teachers matter. This is even more true in machine learning – machines don’t bring prior experience, contextual beliefs, and all the other things that make it important to meet human learners where they are and provide many paths into content. Machines learn only from what you show them.

So in machine learning, the questions that matter are “what is the textbook” and “who is the teacher”. The textbook in machine learning is the “training data” that you show to your software to teach it how to make decisions. This usually is some data you’ve examined and labeled with the answer you want. Often it is data you’ve gathered from lots of other sources that did that work already (we often call this a “corpus”). If you’re trying to predict how likely someone receiving a micro-loan  is to repay it, then you might pick training data that includes previous payment histories of current loan recipients.

The second part is about who the teacher is. The teacher decides what questions to ask, and tells learners what matters. In machine learning, the teacher is responsible for “feature selection” – deciding what pieces of the data the machine is allowed to use to make its decisions. Sometimes this feature selection is done for you by what is and isn’t included in the training sets you have. More often you use some statistics to have the computer pick the features most likely to be useful. Returning to our micro-loan example: some candidate features could be loan duration, total amount, whether the recipient has a cellphone, marital status, or their race.
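To make those two choices concrete, here is a hypothetical sketch of the micro-loan example using scikit-learn. The file name and column names are invented for illustration; this is not code from any real lending system, just a way to see where training data and feature selection show up.

```python
# A hypothetical sketch of the micro-loan example: the column names and
# "loans.csv" file are invented for illustration.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# The "textbook": repayment histories of past loan recipients.
loans = pd.read_csv("loans.csv")

# Feature selection: the "teacher" decides what the model is allowed to see.
features = ["loan_duration_months", "loan_amount", "has_cellphone"]
excluded = ["race", "marital_status"]  # deliberately kept out of the model
print("columns the model never sees:", excluded)

X = loans[features]
y = loans["repaid"]  # the label: did this past recipient repay?

# The model only ever learns from the history you hand it. If past lending
# decisions were biased, that bias is now in the textbook.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```

Note that simply leaving a column like race out of the feature list is not a complete fix – other features can act as proxies for it – which is part of why the training data question matters just as much as feature selection.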

These two questions – training data and training features – are central to any machine learning project.

Algorithms are mirrors

Let’s return to this question of language with this in mind: perhaps a more useful term for “machine learning” would be “machine teaching”. This would put the responsibility where it lies, on the teacher. If you’re doing “machine learning”, you’re most interested in what it is learning to do. With “machine teaching”, you’re most interested in what you are teaching a machine to do. That’s a subtle difference in language, but a big difference in understanding.

Putting the responsibility on the teacher helps us realize how tricky this process is. Remember the list of bias examples I started with? That sentencing algorithm is discriminatory because it was taught with sentencing data from the US court system, which data shows is very forgiving to everyone except black men. That translation algorithm that bakes in gender stereotypes was probably taught with data from the news or literature, which we know bakes in out-of-date gender roles and norms (i.e. doctors are “he”, while nurses are “she”).  That algorithm that surfaces fake stories on your feed is taught to share what lots of other people share, irrespective of accuracy.

All that data is about us.

Those algorithms aren’t biased, we are! Algorithms are mirrors.

They reflect the biases in our questions and our data. These biases get baked into machine learning projects in both feature selection and training data. This is on us, not the computers.

Corrective lenses

So how do we detect and correct this? Teachers feel a responsibility for, and pride in, their students’ learning. Developers of machine learning models should feel a similar responsibility, and perhaps should be allowed to feel a similar pride.

I’m heartened by examples like Microsoft’s efforts to undo gender bias in publicly available language models (trying to solve the “doctors are men” problem). I love my colleague Joy Buolamwini’s efforts to reframe this as a question of “justice” in the social and technical intervention she calls the “Algorithmic Justice League” (video). ProPublica’s investigative reporting is holding companies accountable for their discriminatory sentencing predictions. The amazing Zeynep Tufekci is leading the way in speaking and writing about the danger this poses to society at large. Cathy O’Neil’s Weapons of Math Destruction documents the myriad implications of this, raising a warning flag for society at large. Fields like law are debating the implications of algorithm-driven decision making in public policy settings.  City ordinances are starting to tackle the question of how to legislate against some of the effects I’ve described.

These efforts can hopefully serve as “corrective lenses” for these algorithmic mirrors – addressing the troubling aspects we see in our own reflections. The key here is to remember that it is up to us to do something about this. Determining a decision with an algorithm doesn’t automatically make it reliable and trustworthy; just like quantifying something with data doesn’t automatically make it true. We need to look at our own reflections in these algorithmic mirrors and make sure we see the future we want to see.

You don’t need complicated software to learn how to work with data

Most data trainings are focused on computer-based tools. Excel tutorials, Tableau trainings, database intros – these all talk about working with data as a question of learning the right technology. I’m here to argue against that. Building your capacity to work with data can be done without becoming a “magician” in some software tool.

Data literacy is not the same as computer literacy. This is an important distinction, because there are lots of people who are intimidated by computer technologies, but many of them are otherwise ready and excited to work with data. In my workshops with non-profits I find that this technological focus excludes far too many people.  Defining data literacy in technological terms doesn’t welcome those people to learn.

To support this argument, let me start by describing what I mean by the skills needed to work with data. In my workshops we focus on:

  • Asking good questions
  • Acquiring the right data to work with
  • Finding the data story you want to tell
  • Picking the right technique to tell that story
  • Trying it out to see if your audience understands your story

With Catherine D’Ignazio, I’ve been creating hands-on, participatory, arts-based activities to support each of these. Some involve simple web-based tools, but none are about mastering those tools as the skill to learn. They treat the technology as a one-button means to an end. The activity is designed to work the muscle.

Curious about how those work? If you want to learn how to start working with a set of data to ask good questions, use our WTFcsv activity. Struggling to learn about the types of stories you can find in data?  Try our data sculptures activity to quickly build some mental scaffolding you can use.

Those are two quick examples. Here’s a sketch of all the activities we are building out and how they fit into the process I just described:

Diagram of our activities mapped onto each step of the data-to-story process.

Some of these are old, and well documented on DataBasic.io; others are new and lightly sketched out on my Data Therapy Activities page; the rest are still nascent. We’re trying to build a road for many more people to learn to “speak” data, before they even touch tools like Excel or Tableau. These activities support this alternate entry point to data literacy; one that is fun and engaging to everyone!

Don’t get me wrong – there is certainly a place for learning how to use these amazing software tools. My point is that technology isn’t the only way to build data literacy.

You don’t need to be a computer whiz to work with data; you can exercise the muscles required with hands-on arts-based activities. We’re trying to build and document an evidence base demonstrating how the muscles you develop for working with data outside of computers easily transfer to computer based tools. Stay tuned for future blog posts that summarize that evidence…

Fight the Quick Chart Buttons

I despise the “quick chart” buttons. This post explains why, and tries to help you go from making charts to telling stories.

Here’s an example of the quick chart buttons in Excel:

Excel’s list of chart buttons doesn’t help you pick the right chart to show your data.  Caveat: newer versions try to help with a “Recommended Charts” option.

Most of our chart-making tools don’t help us pick the best chart to tell our data story, and this is a big problem for chart makers. They just offer up a set of options to let you quickly make a chart. That doesn’t help you put together a data story! We just end up with lots of bar charts and line charts 😦

I love chart picker guides like PolicyViz’s Graphic Continuum, Abela’s Chart Suggestions, and the FT’s Visual Vocabulary.  These guides reframe the question of picking a chart as a question of identifying your story. That is a crucial distinction.

The visual depiction of information in a chart is an editorial process, not some objective representation of the data. The mappings of the data onto shape, color, position, and size are all subjective choices you should be making. These should be conscious decisions, not left at the mercy of some tyrannical default button. The result of all these decisions should be a chart that is closer to a story than simple raw data.

Look at the difference between these two charts for an example:

Same data; different story.

The chart on the left might tell a story about Dragon Fruit underselling as compared to other fruits.  The chart on the right might tell a story about apples being a dominant player in the market that needs to be fought.  These are two very different stories; and all I did was change the color of one bar!
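If you build charts in code, that editorial choice is literally a one-line change. Here is a minimal matplotlib sketch of the comparison above; the fruit sales numbers are invented for illustration.

```python
# A minimal matplotlib sketch of the "same data, different story" comparison.
# The fruit sales numbers are invented for illustration.
import matplotlib.pyplot as plt

fruits = ["Apples", "Bananas", "Oranges", "Dragon Fruit"]
sales = [120, 95, 80, 15]

fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Left chart: highlight Dragon Fruit -> a story about the underseller.
left.bar(fruits, sales, color=["lightgray", "lightgray", "lightgray", "crimson"])
left.set_title("Dragon Fruit is underselling")

# Right chart: highlight Apples -> a story about the dominant player.
right.bar(fruits, sales, color=["crimson", "lightgray", "lightgray", "lightgray"])
right.set_title("Apples dominate the market")

plt.tight_layout()
plt.show()
```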

The key questions are: What is your story? What chart can help you tell that story?

Anyway, back to the quick chart buttons. They don’t help you pick which chart to make! Bar charts are good for showing comparisons between a few categories within a dataset. What about when you want to show changes over time (line chart)? Or a distribution of two variables (scatter plot)?  Or the proportional share of one category compared to the total (pie chart)?

Different stories demand different charts.  So next time you’re putting a chart together, start by thinking about the type of data story you’re trying to tell. Then use a guide to find the right chart to show it. Don’t be seduced by the promised simplicity of the “quick chart” button!