Workshop: Communicating Impact in the Arts

I just had the pleasure of co-presenting a workshop for the National Guild for Community Arts Education with their Boston Ambassador, Kathe Swaback of Raw Art Works.  We focused on inspiring arts organizations to use their data to demonstrate their impact in creative ways.  The presentation I used is hosted on


I shared some powerful examples and helped them talk to each other about the challenges and successes in their organizations.

One challenge in our conversations was getting from mission, to outcomes, to ways to measure those outcomes and evaluate impact.  We took the approach of inspiring folks with ways they could communicate those data-stories once they had the data, rather than getting mired down in their individual outcome-identification processes.  The Guild is creating separate programs to help them do that, so I didn’t feel bad about taking this jump.

We practiced using different types of data presentation techniques using an excerpt from the Mural Arts Porch Light evaluation done by the Yale School of Medicine.  After scanning the handout, I assigned each small group a technique to use.

They came up with amazingly creative ways to tell the impact stories they saw in the data – everything from expressive data dancing to participatory interviews where people move to answer questions!  I look forward to seeing how these organizations can adopt and try out some of these techniques.


Civic Visualization: Student Sketches

I just wrapped up teaching a 3-week, 5-session module for MIT undergraduates on Data Scraping and Civic Visualizations (read more posts about it).  As their final project I asked students to use some Boston-centric data to sketch a civic visualization.  Here’s a quick overview of their final projects, which I think are a wonderful example of the diversity of things folks can produce quickly.  Remember, these are sketches the students produced as their final projects… consider them prototypes and works-in-progress.  I think you’ll agree they did amazing work in such a short amount of time!

1.5 Million Hubway Trips


Ben Eysenbach and Yunjie Li dove into the Hubway bicycle sharing data release.  They wanted to understand how people perceive biking and help planners and bike riders make smart decisions to support the urban biking system. Ben and Yunjie found that Google bicycle time estimates are significantly off for female riders, and built some novel warped maps to show distances as-the-bike-rides across the city.  See more on their project website.

The Democratic Debate on Tumblr


Alyssa Smith, Claire Zhang, and Karliegh Moore collected and analyzed Tumblr posts about the first 2015 Democratic presidential debate.  They wanted to help campaigns understand how to use Tumblr as a social media platform, and delve into how tags are used as comments vs. classification.  Alyssa, Claire and Karliegh found Bernie Sanders, Hillary Clinton, and Donald Trump were the most discussed, with a heavy negative light on Trump.

Crime and Income Rates in Boston


Arinze Okeke, Benjamin Reynolds and Christopher Rogers explored data sets about crime and income in Boston from the city’s open data portal and the US Census.  They wanted to motivate people to think harder about income disparity and inform political debate to change policies to lower crime rates.  Arinze, Ben and Chris created a novel map and data sculpture to use as a discussion piece in a real-world setting, stacking pennies to represent income rates on top of a printed heatmap of crime data.

Should Our Children Be in Jail?


Andres Nater, Janelle Wellons and Lily Westort dug into data about children in actual prisons.  They wanted to argue to people that juveniles are being placed in prisons at an alarming rate in many states in the US.  Andres, Janelle and Lily created an infographic that told a strong story about the impact of the cradle-to-prison pipeline.

Visualizing to *Find* Your Data Story

I consistently run across folks interested in visualizing a data set to reveal some compelling insight, or tell a strong story to support an argument.  However, they inevitably focus on the final product, rather than the process to get there.  People get stuck on the visual that tells their story, forgetting about the visuals that help them find their story.  The most important visualizations of your data are the ones that help you find and debug your story, not the final one you make to tell your story.  This is why I recommend Tableau Public as a great tool to learn, because its native language is the visual representation of your data.  Excel’s native language is the data in a tabular form, not the visuals that show that data.

Here are some other tools I introduce in the Data Scraping and Civic Visualization short course I teach here at MIT (CMS.622: Applying Media Technologies in Arts and Humanities).

  • Use word clouds to get a quick overview of your qualitative text data (try Tagxedo)
  • Use Quartz ChartBuilder to make clean and simple charts, without all the chartjunk
  • Use timelines to understand a story over time (try TimelineJS)
  • Experiment with more complicated charting techniques with Raw (a D3.js chart generator)
  • Make simple maps with Google Maps, analyze your data cartographically with CartoDB, or make your own with Leaflet.js
  • Test your story’s narrative quickly with an infographic generator like Infogram

Curious for more?  See our website for more tools that we have reviewed.

What You Should Use to Scrape and Clean Data

I am currently teaching a short module for a class at MIT called CMS.622: Applying Media Technologies in Arts and Humanities.  My module focuses on Data Scraping and Civic Visualization.  Here are a few of the tools I introduce related to scraping and cleaning.

Tools for Scraping Data

As much as possible, avoid writing code!  Many of these tools can help you avoid writing software to do the scraping.  New tools are being built constantly, but I recommend these:

  • Copy/Paste: Never forget the awesome power of copy/paste! There are many times when an hour of copying and pasting will be faster than learning any sort of new tool!
  • Chrome Scraper Extension: This bare-bones plugin for the Chrome web browser gives you a right-click option to “scrape similar” and export in a number of spreadsheet formats.
  • This is a radical re-thinking of how you scrape.  Point and click to train their scraper.  It’s buggy, but on many simple webpages it works well!
  • jQuery in the browser: Install the bookmarklet, and you can add the jQuery JavaScript library to any webpage you are viewing.  From there you can use a basic understanding of JavaScript and the JavaScript console (in most browsers) to pull parts of a webpage into an array.
  • Software Development: If you are a coder, and the website you need to scrape has JavaScript and logins and such, then you might need to go this route (ugh).  If so, here are some example Jupyter notebooks that show how to use Requests and Beautiful Soup to scrape and parse a webpage.  If your source material is more complicated, try using Mechanize (or Watir if you want to do this in Ruby).
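If you do end up on the software-development route, the Requests plus Beautiful Soup approach mentioned above can be sketched in a few lines of Python.  Consider this a minimal illustration, not production code – the URL is a placeholder, and you’d adapt the parsing to the page you actually care about:

```python
# Minimal scraping sketch: fetch a page with Requests, then pull every
# row of every HTML table into a list of cell lists with Beautiful Soup.
import requests
from bs4 import BeautifulSoup

def parse_table(html):
    """Extract table rows from an HTML string as lists of cell text."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows

def scrape_table(url):
    """Download a page and return its table rows."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return parse_table(response.text)

# Example (placeholder URL -- swap in the page you actually want):
# rows = scrape_table("https://example.com/some-table-page")
```

Once you have the rows as lists, writing them out with the `csv` module gets you straight into spreadsheet-friendly territory.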

Tools for Cleaning Data

If you start with garbage, you end with garbage.  This is why clean data is such a big deal. I’ve written before about what clean data means to me, but here are some tools I introduce to help you clean your data:

  • Typing: Seriously.  If you don’t have much data to clean, just do it by hand.
  • Find/Replace: Again, I’m serious.  Don’t underestimate the power of 30 minutes of find/replace… it’s a lot easier than programming or using some tool.
  • Regular Expressions: Install a text editor like Sublime Text and you get the power of regular expressions (which I call “Super Find and Replace”).  It lets you define a pattern and find/replace it in any large document.  Sure the pattern definition is cryptic, but learning it is totally worth it (here’s an online playground).
  • Data Science Toolkit: This Swiss-army knife is a virtual machine you can install and use via APIs to do tons of data science things.  Go from address to lat/lng, quantify the sentiment of some text, pull the content from a webpage, extract people mentioned in text, and more.
  • CLIFF-CLAVIN: Our geo-parsing tool can identify places, people, and organizations mentioned in plain text.  You give it text and it spits out JSON, taking special effort to resolve the places to lat/lngs that make sense.
  • Tabula: Need to extract a table from a PDF?  Use Tabula to do it.  Try pdftables if you want to do the same in Python.  A web-based option is PDFTables (made by the ScraperWiki people).
  • OpenRefine: It has a little bit of a learning curve, but OpenRefine can handle large sets of data and do great things like cluster and eliminate typos.
  • Programming: If you must, programming can help you clean data.  CSVKit is a handy set of libraries and command line tools for managing and changing CSV files.  Messytables can help you parse CSV files that aren’t quite formatted correctly.
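To give a taste of the “Super Find and Replace” idea above, here’s a short sketch using Python’s standard `re` module to normalize messy phone numbers into one consistent format.  The messy example values are invented for illustration:

```python
# Regular-expression cleaning sketch: normalize US phone numbers that
# arrive in inconsistent formats.  The example values are made up.
import re

def clean_phone(raw):
    """Normalize a US phone number to the form 617-555-0123, or None."""
    digits = re.sub(r"\D", "", raw)          # strip everything but digits
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                   # drop a leading country code
    if len(digits) != 10:
        return None                           # not a recognizable number
    return "{}-{}-{}".format(digits[:3], digits[3:6], digits[6:])

messy = ["(617) 555-0123", "617.555.0123", "1-617-555-0123", "call me"]
print([clean_phone(n) for n in messy])
# → ['617-555-0123', '617-555-0123', '617-555-0123', None]
```

The same pattern-then-normalize approach works for dates, addresses, and any other field where the content is right but the formatting is all over the place.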

I hope those are helpful in your data scraping and data cleaning adventures!

DataPop White Paper: Beyond Data Literacy

The Data-Pop Alliance recently released a “working draft” of a white paper I co-authored: Beyond Data Literacy: Reinventing Community Engagement and Empowerment in the Age of Data.  The paper is a collaboration with folks there and at Internews; it attempts to put the nascent term “data literacy” in historical context and project forward to future uses and the role of data in culture and community.  Data-Pop published some of the presentation on their blog.


The paper begins with some history – focusing on the anthropologist Claude Lévi-Strauss and his ideas about literacy being used as a weapon by those in power to ensure an educated work populace.  We move into an argument that “literacy in the age of data” is a better way to start asking questions than “data literacy”.  As I talk about often, we focus on how data should serve the purpose of greater social inclusion.  This requires a focus on the words we use to talk about this stuff (i.e. “information” or “data”?).  This is all built on a definition of data literacy that includes the “desire and ability to constructively engage in society through and about data”.

If you’re interested in some academic reading about the history and potential of this type of work, give it a read!  It will be especially relevant to those trying to craft policies or programs that support building people’s capacity to work with data to create change.

Big Data’s Empowerment Problem

Catherine D’Ignazio and I just presented a paper titled “Approaches to Big Data Literacy” at the 2015 Bloomberg Data for Good Exchange.  This is a write-up of the talk we gave to summarize the paper.

When we talk about data science for good, collaborating with organizations that work for the social good, we immediately enter into a conversation about empowerment.  How can data science help these organizations empower their constituencies and create change in the world?  Catherine and I are educators, and strongly believe learning is about empowerment, so this area naturally appeals to us!  That’s why we wrote this paper for the Bloomberg Data for Good Exchange.

Data Literacy

We’ve been thinking and working a lot on data literacy, and how to help folks build their capacity to work with information to create social change.  We define “data literacy” as the ability to read, work with, analyze, and argue with data.  So how do we help build data literacy in creative and fun ways?  One example is the activity we do around text analysis.  We introduce folks to a simple word-counting website and give them lyrics of popular musicians to analyze.  Over the course of half an hour folks poke at the data, looking for stories comparing word usage between artists.  Then they sketch a visual to share a story.
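Under the hood, that word-counting exercise boils down to something like this short Python sketch.  The “lyrics” below are invented placeholders, not real song data:

```python
# A toy version of the lyrics activity: count word frequencies in two
# texts and look for comparison stories between them.
from collections import Counter
import re

def word_counts(text):
    """Lowercase a text and count every word in it."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

artist_a = word_counts("I got rhythm, I got music, I got my girl")
artist_b = word_counts("me and my girl, we got all night")

print(artist_a.most_common(2))                # artist A's favorite words
print(sorted(set(artist_a) & set(artist_b)))  # words both artists use
```

The stories folks find – who talks about themselves the most, which words two artists share – all come from simple comparisons like these.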

Photos of stories created by students showing the artist that talks about themselves the most, and the overlap in lyrics between Paul Simon and Kanye West.


Another example is my Data Murals project – where we help a community group find a story in their data, collaboratively design a visual to tell that story, and paint it as a community mural.

The Data Mural created by youth from Groundwork Somerville.


This stuff is fun, and makes learning to work with data accessible.  We focus on working with technical and non-technical audiences.  The technical folks have a lot to learn about how to use data to effect change, while the non-technical folks want to build their skills to use data in support of their mission.


However, this work has been focused on small data sets… when we think about “big data literacy” we see some gaps in our definition and our work.  Here are four empowerment problems we see in big data, mapped to our definition of data literacy:

  • lack of transparency: you can’t read the data if you don’t even know it exists
  • extractive collection: you can’t work with data if it isn’t available
  • technological complexity: you can’t analyze data unless you can overcome the technical challenges of big data
  • control of impact: you can’t argue for change with data unless you can effect that change

With these problems in mind, we decided we needed an expanded definition of “big data literacy”. This includes:

  • identifying when and where data is being collected
  • understanding the algorithmic manipulations
  • weighing the real and potential ethical impacts

Some extensions to our definition of data literacy, to support an idea of “Big Data Literacy”.

So how do we work on building this type of big data literacy?  First off we look to Freire for inspiration.  We could go on for hours about his approach to building literacy in Brazil, but want to focus on his “popular education” work.  That approach used literacy education as a vehicle for emancipation.  This second piece matters when you are doing data for good; it isn’t just about acquiring technical skills!


We want to work with you on how to address this empowerment problem, and have a few ideas of our own that we want to try out.  The paper has seven of these sketched out, but here are three examples.

Idea #1: Participatory Algorithmic Simulations

We want to create examples of participatory simulations for how algorithms function.  Imagine a linear search being demonstrated by lining people up and going from left to right searching for someone named “Anita”.  This would build on the rich tradition of moving your body to mimic and understand how a system functions (called “body syntonicity”).  Participatory algorithmic simulations would focus on understanding algorithmic manipulations.
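For reference, the algorithm being acted out is just this – a few lines of Python, with an invented line of people standing in for the data:

```python
# The linear search the participants embody: check each person in the
# line, left to right, until you find the one you're looking for.
def linear_search(line, target):
    """Return how many people we had to ask before finding the target."""
    for steps, name in enumerate(line, start=1):
        if name == target:
            return steps
    return None  # walked the whole line without finding them

line = ["Miguel", "Dana", "Anita", "Priya", "Jo"]
print(linear_search(line, "Anita"))  # → 3, the third person asked
```

Acting this out makes the cost of the algorithm visceral: the farther down the line “Anita” stands, the longer everyone waits.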

Idea #2: Data Journals

Data can be seen as the traces of the interactions between you and the world around you.  With this definition in mind, in our classes we ask students to keep a journal of every piece of data they create during a 24-hour period (see some examples).  This activity targets identifying when and where data is being collected.  We facilitate a discussion about these journals, asking students which ones creep them out the most, which leads to a great chance to weigh the real and potential ethical implications.

Idea #3: Reverse Engineering Algorithms

We’ve seen a bunch of great work recently on reverse engineering algorithms, trying to understand why Amazon suggests certain products to you, or why you only see certain information in your Facebook feed.  We think there are ways to bring this research to the personal level by designing experiments individuals can run to speculate about how these algorithms work.  Building on Henry Jenkins’s idea of “Civic Imagination”, we could ask people to design how they would want the algorithms to work, and perhaps develop descriptive visual explanations of their own ideas.

Get Involved!

We think each of these three can help build big data literacy and try to address big data’s empowerment problem.  Read the paper for some other ideas.  Do you have other ideas or experiences we can learn from?  We’ll be working on some of these and look forward to collaborating!

Announcing DataBasic!

I’m happy to announce we received a grant from the Knight Foundation to work with Catherine D’Ignazio (from the Emerson Engagement Lab) on a new suite of tools called DataBasic!  Expect to see more here as we build out this suite of tools for Data Literacy learners over the fall.  Follow our progress over on


We propose to create a suite of focused and simple tools for journalists, data journalism classrooms and community advocacy groups.  Though there are numerous data analysis and visualization tools for novices, there are some significant gaps that we have identified through prior research.  DataBasic is designed to fill these gaps for people who do not know how to code and provide a low barrier to further learning about data analysis for storytelling.

In the first iteration of this project we will build three tools, develop three training activities and run one workshop with journalists and students for feedback.  The three tools include:

  • WTFcsv: A web application that takes as input a CSV file and returns a summary of the fields, their data type, their range, and basic descriptive statistics.  This is a prettier version of R’s “summary” command and aids at the outset of the data analysis process.
  • WordCounter: A basic word counting tool that takes unstructured text as input and returns word frequency, bigrams (two-word phrases) and trigrams (three-word phrases).
  • TuffyDuff: A tool that runs TF-IDF algorithms on two or more corpora in order to compare which words occur with the most frequency and uniqueness.
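To make the WTFcsv idea concrete, here’s a minimal Python sketch of the kind of per-column summary it might produce – this is an illustration using only the standard library, not the actual DataBasic code, and the CSV content is invented:

```python
# Toy sketch of a WTFcsv-style summary for one numeric CSV column:
# count, range, and basic descriptive statistics.
import csv
import io
import statistics

data = io.StringIO("name,age\nAna,34\nBo,41\nCy,29\n")  # stand-in for a file
ages = [int(row["age"]) for row in csv.DictReader(data)]

summary = {
    "count": len(ages),
    "min": min(ages),
    "max": max(ages),
    "mean": round(statistics.mean(ages), 1),
    "median": statistics.median(ages),
}
print(summary)
```

Even a summary this simple answers the first questions you ask of a new file: how many rows, what the values look like, and whether anything is obviously out of range.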

Data, What is it Good For?

I recently led a short session at the inspiring Southern Poverty Law Center called “Using Data to Create Change: Real World Examples”.  Here is a short write-up of some of the examples I shared.

The hype around data has reached such heights that it is in danger of going into low-earth orbit!  Drenched in stories about the potential for data to change your organization and your work, it is sometimes hard to pick apart the motivations and reasons for using data.  Despite what my title suggests, I’m not here to argue that data is good for “absolutely nothing”.  I like to look at data as an asset for your organization, and want to focus in on how it can help you in three concrete ways:

  • You can use data to improve internal operations
  • You can use data to spread the message
  • You can use data to bring people together

Here are four short stories to help pick these apart.  I live and work here in the US, so these case studies are all American.

Designing a Mural

Groundwork Somerville is an organization that works in my hometown of Somerville, Massachusetts in the US.  One of their big projects involves reclaiming unused urban lots and helping youth build and maintain raised beds to grow vegetables.  They then sell these vegetables at cheap prices from a mobile market that visits multiple local sites weekly.  For those of you in other countries, this addresses a big problem here in the US, where unhealthy food is generally far cheaper than healthy fresh food.

Created by Groundwork Somerville (August 2013)


To build skills in their youth programs, share their work, argue for more support, and have fun, we worked with local youth to design and paint a Data Mural.  They looked at the urban landscape, quotes from youth in the program, public health data, and participation in the mobile market to craft a story and mural that speak to the internal and external impacts the program has.

We used this kind of playful engagement with data to bring people together and spread the message.

Using Metrics to Drive Engagement 

Here I’m going to retell a story that is often pointed to, most succinctly in Beth Kanter’s Measuring the Networked Nonprofit.  This is the story of how the online news site Grist uses social media metrics and other data to move people up their ladder of engagement.  Grist tries to bring a light, playful, and new framing to issues that are important to folks who care about the environment – folks that might not self-identify as “environmentalists” per se.


The ladder of engagement

Grist does deep dives into their web and social metrics to understand what is important to their readers from a short-term and long-term point of view.  They try to respond to these interests with editorial decision-making and sometimes in near-realtime content generation.  Grist uses a strong ladder of engagement to prompt people to engage and own the narratives of stories about environmental issues, knowing that that will make them more likely to act to solve problems.

This attention to metrics and constant checks of their ladder of engagement is a great example of using data to improve internal operations and spread the message.  Read more about this in the book Measuring the Networked Nonprofit (by Kanter and Paine).

Creating Insights and Action

The third story I want to share is about a small company in Detroit called LoveLand Technologies.  Over the last few years Detroit has been a city in crisis, recording record foreclosure rates, stuck with barely functioning public utilities, and having to file for bankruptcy protection.  In this context LoveLand started making some simple maps of properties in tax-related distress and foreclosure.  These were maps of people losing their homes.

The LoveLand map of foreclosures in Detroit (circa 2014)

The LoveLand map of foreclosures in Detroit (circa 2014)

Before they knew it, their maps were being used in a variety of unforeseen ways. Government officials were relying on them as the data source of record.  Churches were using them to raise funds for their neighbors in need.  Folks with deep pockets were ready to give them money to do even more work around urban blight in the city.

Their data was being used to improve internal operations, spread the message, and bring people together!  If you want to learn more read Ethan Zuckerman’s liveblog of a talk Mike Evans did recently at the MIT Center for Civic Media.

Guiding Program Decisions

My last story is the most high tech.  It comes from DataKind, an organization that pairs data scientists with nonprofits to think through and implement projects focused on data analysis.  GiveDirectly started working with DataKind to get help targeting their unconditional cash transfers to those the money could help the most.  They’re a very data-centric organization already, so working with DataKind volunteers on some advanced topics just made sense!


A screenshot of their UI identifying roof types from satellite images (from the DataKind blog)

Data scientists Kush Varshney and Brian Abelson worked with GiveDirectly to understand how satellite imagery could be analyzed by computers to identify areas where aid funds would best be directed.  Because existing research showed a strong correlation between a village’s wealth and the number of iron (vs. thatch) roofs, they created an algorithm that attempts to count iron and thatch roofs in satellite imagery.  It doesn’t quite work yet, but it is important to think about novel applications for data mining that can create new types of data to help your work.  Hopefully they can continue to tune the algorithm to improve their results and turn it into a useful tool.
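To give a flavor of the roof-counting idea, here’s a deliberately toy sketch that classifies small image patches by average brightness.  Everything here is illustrative – the threshold, the pixel values, and the assumption that iron roofs photograph brighter than thatch ones – and this is not how the DataKind team actually built their algorithm, which works on real satellite imagery:

```python
# Toy roof classifier: label a grayscale image patch "iron" or "thatch"
# by its average brightness.  All values below are made up.
def classify_roof(patch, threshold=128):
    """patch: a 2D list of grayscale pixel values (0-255)."""
    pixels = [p for row in patch for p in row]
    average = sum(pixels) / len(pixels)
    return "iron" if average >= threshold else "thatch"

bright_patch = [[200, 210], [190, 205]]   # a shiny metal roof
dark_patch = [[60, 70], [65, 55]]         # a thatch roof
counts = {"iron": 0, "thatch": 0}
for patch in [bright_patch, dark_patch]:
    counts[classify_roof(patch)] += 1
print(counts)  # → {'iron': 1, 'thatch': 1}
```

The real problem is far messier – lighting, clouds, and roof materials that don’t separate cleanly – which is exactly why the production algorithm “doesn’t quite work yet”.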

This analysis and tool building is trying to improve internal operations so GiveDirectly can do their work better.  Watch their technical talk to learn more.

Wrapping Up

These are just a handful of my favorite stories to illustrate the variety of ways you can use data to help you make change in the world.  Are there counter-examples illustrating the perils and pitfalls of using data in any of these ways?  Of course.  I strive to highlight those stories just as often… but that’s a list for a different blog post!  I hope these four help you start to think about creative and new ways your organization might be able to turn all the data hype into something useful.

For reference, here’s a link to the presentation that went along with this talk:


Architectures for Data Use

This is a summary of one section of my workshop on Data Architectures at the SSIR Data on Purpose workshop.

Data can be used for a variety of things.  In thinking about setting up architectures for data use within your organization, you need to focus on two main questions:

  • Does the data we have align with our goals?
  • How can we use data to further our mission?

Alignment with Your Goals

People see data everywhere now, and get overly excited about it. When you think about using data within your organization, you have to return to the roots of what your organization is all about and make sure the data is in alignment with that.

There are a few common patterns organizations fall into when using data.  First, many collect data simply because it is easy to collect, without considering whether and how it can be used.  Second, many tend to focus on quantitative over qualitative data, when in fact the strongest arguments are often made using both.  You have to understand what kind of data you have before you can use it effectively.

All these types of data need to align with your goals.  You can use data in a wide variety of your efforts, from inspiring more activism to changing behavior.  The key piece is that your use of data must support those activities.

Using Data to Further Your Mission

Your data is not an end in itself.  It is an asset you can use to do your work more effectively.


You can use data in lots of ways to further your mission.  Three quick examples:

  • improve operations: you can monitor engagement on social media campaigns
  • spread the message: you can use data in your communications materials to advocate for change in new ways
  • bring people together: you can gather around the data to find stories (and paint murals)

Of course there are loads of other things you can do as well.  The key here is that this framing encourages you to be goal-centric, rather than technology-centric (which is a big danger when working with data).  You don’t want to get lost in the hype around the latest and greatest tools; that approach doesn’t help you advance your mission.  A beautiful external-facing infographic that doesn’t fit into your ladder of engagement, or includes no call to action, is useless.  A dashboard showing key indicators doesn’t mean much if they aren’t the right key indicators.

I hope this quick intro helps ground some of the hype out there around data use, and helps you figure out what architectures to support for data use within your organization.