You Don’t Need a Data Scientist, You Need a Data Culture

Most of the larger non-profit organizations we work with are scrambling to figure out how to deploy complex technologies like machine learning and “AI” in service of the social good. These include inspiring examples ranging from poverty alleviation to home fire prevention to self-harm risk reduction. These stories have spread widely and have come to define what a data-centric organization should be doing – namely complicated data science. However, if you’re an organization thinking about how to use data better, this is not where you should start. You don’t need a data scientist, you need a data culture.

Catherine D’Ignazio and I have built the DataBasic.io tools to focus on helping people creatively build their data literacy. As more and more organizations have started using them, we’ve been pushed to think more deeply about what it means to take this approach to building a data culture. This post lays out our latest thinking about building a data culture, and how to overcome barriers you’re likely to run into.

The key problem we see is that organizations working for the social good don’t feel empowered to work with data in a variety of ways. This is a rank-and-file staff problem, not a data scientist problem. We’ve come to work on this in three ways:

  • Organizations don’t feel confident that they can work with data at all, so to build a data culture we prioritize building confidence through small, focused activities.
  • The technology they think they need to work with data is daunting, expensive, and requires technical expertise, so our approach focuses on methods that don’t rely on complex technology.
  • Organizations don’t have a good process for starting to work with data, so we introduce a step-by-step approach with hands-on activities.

We’re trying to help here by creating the “Data Culture Project” – you can expect to hear more about that early next year. It gives organizations a lightweight, self-service curriculum of video-facilitated activities. We’re piloting it with 30 organizations right now, learning from how they run these activities over three months within their organizations.

What is a “Data Culture”?

This phrase is becoming a bit of a buzzword right now. So what does it mean? After lots of conversations with organizations big and small, we’ve narrowed it down to this list:

  • Leadership prioritizes and invests in data collection, management and analysis/knowledge production.
  • Leadership prioritizes creative data literacy for the whole organization, not just IT and Evaluation.
  • Staff are encouraged and supported to access, combine and derive insight from the organization’s data.
  • Staff recognize data when they see it. They offer creative ways to use the organization’s data to solve problems, make decisions and tell stories.

This four-part definition focuses on leadership and staff responsibility very intentionally. You need buy-in across the organization to really make this work. We also focus on making sure data doesn’t get siloed into one department or another. Working with data is a core skill that can be valuable across an organization.

Why Build a Data Culture?

Why bother with building a data culture? Over the last 10 years we’ve seen a lot of data projects in our workshops and with our partners. These tend to cluster around three purposes.

WFP_DMC_building_a_data_culture.png

Data is most often used to improve operations: doing things like measuring delivery performance, changing how a program works, and then measuring it again to see if it improved. Over the last few years we’ve seen more and more uses of data to spread a message, giving rise to infographics and other formats where data is used to show the impact of programs. Data is less often used to bring people together, which is the focus of my work on arts-based hands-on activities, data murals, and more. We think this third purpose is central to building a strong data culture across your organization.

Barriers to Building a Data Culture

Of course, like any organizational change, there are barriers. We’ve listed six that we think are useful to keep in mind while thinking about any efforts you are taking to build a data culture.

Barrier #1: Confusion

Most introductions to data are confusing and overly technical.

Complicated words can alienate people who are just entering the field of working with data. Pick your words carefully to welcome them. For instance, you could introduce the idea of “correlation” by talking about “connections” between pieces of data that move together.

Piaget, the great educational psychologist, introduced us to the idea that people absorb new information by “assimilating” it into their existing mindset, or change their mental model to “accommodate” it. You have to understand people’s existing mental models if you want to introduce new information; if you know their background, you can make your outreach more effective. Your goal is not to turn everyone in the organization into a data scientist. A data culture means people recognize data and are able to pinpoint new opportunities for deriving knowledge and insight from it.

Tips:

  • Avoid technical jargon
  • Meet people where they are

Barrier #2: Not Knowing Your Data

Sometimes you don’t even know the data you have.

At a recent workshop we were talking with a medium-sized environmental advocacy group, and they lamented not having any data about participation at recent public events. I mentioned that I had seen photos on Facebook, and that those were data they could use. They were surprised; they had overlooked this source, yet it contained exactly the data they wanted.

Remember that data can be qualitative or quantitative.  If your development director shares photos and a headcount from your last fundraiser, that’s all data. Be creative about recognizing the data you have already.

It is hard to keep track of datasets within your organization that might be related to each other. Identify a person and a technology that can be a central clearinghouse for data. This could be as simple as a Word document with a bulleted list, or as complex as an internal data portal.
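To make the “simple” end of that spectrum concrete, here is a minimal sketch of what a data catalogue could look like: one row per dataset, recording where it lives and who to ask. The dataset names, owners, and columns here are all invented for illustration.

```python
import csv
import io

# A tiny data catalogue, one row per dataset. In practice this could
# live in a shared spreadsheet; the entries below are made up.
catalogue_csv = """name,owner,location,updated
Event photos,Communications,Facebook page,monthly
Donor list,Development,CRM export,weekly
Program surveys,Evaluation,shared drive,quarterly
"""

catalogue = list(csv.DictReader(io.StringIO(catalogue_csv)))

def find_owner(name):
    """Look up who to ask about a dataset; None if it isn't catalogued."""
    for row in catalogue:
        if row["name"] == name:
            return row["owner"]
    return None
```

Even this much lets staff answer “do we have data on X, and who do I talk to?” without emailing the whole organization.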

Tips:

  • Keep your eyes and ears open
  • Build a data catalogue, or library

Barrier #3: Organizational Silos

People will fight efforts to work across silos.

We were working with a large nonprofit to build a data culture across their organization, but they were stymied by people who thought they owned the data and were hoarding it from others as a form of job security. The only way we found to work on it was risky: to quietly use the data and then credit its successful use to the owner retroactively. It helped, but we can do better than that.

Most organizations suffer from these silos – independent functional units that take pains to control a slice of the overall work. You have to acknowledge these walls in order to break them down.

When you have an example of a data-centric project that cuts across existing silos, hold it up as an example of success. This is an opportunity for leadership to show buy-in and backing for a cross-sectional approach to data.

Tips:

  • Acknowledge your weaknesses
  • Highlight successes

Barrier #4: IT-Centric Thinking

Data gets locked away in the IT department.

Over and over we hear from organizations where IT is running Tableau trainings regularly, and they just can’t understand why people aren’t adopting the tool and approach. I’m like a broken record telling them that you need to separate the tool and the process: the tool training can be owned by IT, but the process training doesn’t need to be.

You need to make sure people don’t have to go to IT to pull out the latest numbers they need. Building a data culture means making sure every part of your organization can use data, for a variety of reasons.

Just because IT owns the data technology doesn’t mean they should own the process of creating a data culture. Building this capacity is better housed across multiple departments, or within the office of a Chief Data Scientist. That can lead to invitations to build data capacity that are more fun than just boring spreadsheet trainings.

Tips:

  • Data is for everyone
  • Create more invitations to work with data

Barrier #5: Irrelevance

Staff don’t connect to many high-level data dashboards.

High-level data summaries are great for leadership, but staff can’t always connect to them. You need to integrate data into their day-to-day operations. You can try ideas like mainstreaming quarterly data reports from each department, or attaching data outcomes to program reviews. If staff don’t understand the utility and use of the data they are collecting, it just becomes boring homework they have to do. This hurts not only your data culture, but also your data quality!

Showing a numerical summary of some data is great, but it is just the start. Asking “so what?” is when the real culture starts to emerge. Actionable data can help you drive your organizational goals. If people can’t answer the “so what?” question, then they don’t have the right data. Engage staff in figuring out why the data they collect is useful; they are best positioned to answer that question.

Tips:

  • KPIs aren’t for everyone
  • Remember to ask “So What?”

Barrier #6: Boredom

Data is seen as a boring chore.

Spreadsheet-driven activities are boring to the majority of people.  Use more fun activities, in novel settings, to bring a more creative approach to data. Make data sculptures in the lobby, or paint a data mural at your next retreat.  These approaches create multiple pathways into learning how to work with data.

Communicating in charts and graphs is the default for presentations. However, these alone don’t tell a story. Encourage your organization to put the data in context and talk about impact by focusing on how to tell a story with your data, rather than just introducing how to do Pivot Tables. People like telling stories, and they get interested and engaged in hearing them.

Tips:

  • Use creative data-centric activities
  • Tell stories with your data

Building Your Data Culture

Each organization is different. Hopefully this high-level summary of some of our latest thinking helps inspire ideas about what might work for you. In future posts we’ll dig into more concrete ways to build a data culture, the motivations behind them, and how they are working for various partner organizations.

This post is based on a presentation Catherine D’Ignazio and I gave to non-profit leaders convened by the Stanford Social Innovation Review. Thanks to Catherine D’Ignazio and Ethan Zuckerman for feedback and edits.

Fight the Quick Chart Buttons

I despise the “quick chart” buttons. This post explains why, and tries to help you go from making charts to telling stories.

Here’s an example of the quick chart buttons in Excel:

CreateHorizontalBarChart_png_733×514_pixels.png
Excel’s list of chart buttons doesn’t help you pick the right chart to show your data.  Caveat: newer versions try to help with a “Recommended Charts” option.

Most of our chart-making tools don’t help us pick the best chart to tell our data story, and this is a big problem for chart makers. The tools just offer up a set of options to let you quickly make a chart. That doesn’t help you put together a data story! We just end up with lots of bar charts and line charts 😦

I love chart picker guides like PolicyViz’s Graphic Continuum, Abela’s Chart Suggestions, and the FT’s Visual Vocabulary. These guides reframe the question of picking a chart as a question of identifying your story. That is a crucial distinction.

The visual depiction of information in a chart is an editorial process, not some objective representation of the data. The mappings of the data onto shape, color, position, and size are all subjective choices you are making. These should be conscious decisions, not left to the mercy of some tyrannical default button. The result of all these decisions should be a chart that is closer to a story than to simple raw data.

Look at the difference between these two charts for an example:

compare-charts.png
Same data; different story.

The chart on the left might tell a story about Dragon Fruit underselling as compared to other fruits. The chart on the right might tell a story about apples being a dominant player in the market that needs to be fought. These are two very different stories, and all I did was change the color of one bar!
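That one-bar color change is easy to script yourself. Here is a small illustrative sketch (the fruit sales numbers are made up) showing how the same data gets two different emphases depending on which bar you highlight:

```python
# Same data, two stories: which bar you emphasize is an editorial choice.
sales = {"Apples": 82, "Bananas": 45, "Cherries": 40, "Dragon Fruit": 12}

def highlight(data, focus, base="#9e9e9e", accent="#d62728"):
    """Return one color per bar, emphasizing a single category."""
    return [accent if name == focus else base for name in data]

# Story 1: Dragon Fruit is underselling.
story1 = highlight(sales, "Dragon Fruit")

# Story 2: Apples dominate the market.
story2 = highlight(sales, "Apples")

# Either color list can be handed to your plotting library of choice,
# e.g. matplotlib: plt.bar(sales.keys(), sales.values(), color=story1)
```

The data never changes; only the editorial decision about what to emphasize does.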

The key questions are: What is your story? What chart can help you tell that story?

Anyway, back to the quick chart buttons. They don’t help you pick which chart to make! Bar charts are good for showing comparisons between a few categories within a dataset. What about when you want to show changes over time (line chart)? Or the relationship between two variables (scatter plot)? Or the proportional share of one category compared to the total (pie chart)?

Different stories demand different charts.  So next time you’re putting a chart together, start by thinking about the type of data story you’re trying to tell. Then use a guide to find the right chart to show it. Don’t be seduced by the promised simplicity of the “quick chart” button!

Approaches to Teaching Data for Non-Profits

Recently The National Neighborhood Indicators Partnership and Microsoft Civic Technology Engagement Group launched a project to expand training on data and technology to improve communities.  I’m pleased they’ve included Data Therapy as one of the resources they highlight to help you think about building your data culture.  Check out their training guide and their catalog of resources!

training_pic

On a related note, if you are someone who does a lot of training and capacity building, or an organization that wants to be doing that, check out the podcast and recording of a conversation about enabling learning with School of Data.

Making Tools More Learner-Friendly

I often advise learners to be careful with what tools they choose to spend time learning.  Some powerful ones have steep learning curves, full of jargon and technical hurdles.  Others are simple and self-explanatory, but can’t do more than one thing.  I’ve been trying to find better ways to connect with tool builders and talk to them about how they need to build learner-centered tools.

Catherine D’Ignazio and I put these thoughts together into a talk for OpenVisConf this year. This is a super-dorky conference for data viz professionals… just the place to find more tool builders to talk to! We put together an argument that data visualization tools can serve as informal learning spaces. Watch the video below:

New DataBasic Tool Lets You “Connect the Dots” in Data

Catherine and I have launched a new DataBasic tool and activity, Connect the Dots, aimed at helping students and educators see how their data is connected with a visual network diagram.

By showing the relationships between things, networks are useful for finding answers that aren’t readily apparent through spreadsheet data alone. To that end, we’ve built Connect the Dots to help teach how analyzing the connections between the “dots” in data is a fundamentally different approach to understanding it.

The new tool gives users a network diagram to reveal links, as well as a high-level report about what the network looks like. Network analysis helped Google revolutionize search technology, and was used by journalists investigating the connections between people and banks during the Panama Papers leak.
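As a tiny illustration of why connections matter, here is a sketch (with made-up survey rows like those from the favorite-restaurants activity) that counts how connected each “dot” is, something a flat spreadsheet view of the same rows doesn’t surface:

```python
from collections import Counter

# Hypothetical survey data: each row links a person to a favorite
# local restaurant. All names are invented for illustration.
edges = [
    ("Ana", "Taqueria Sol"), ("Ben", "Taqueria Sol"),
    ("Ana", "Noodle Bar"), ("Cal", "Noodle Bar"),
    ("Ben", "Cafe Luna"),
]

# Degree: how many connections each "dot" has. This is the simplest
# network measure, and it already reveals which nodes are central.
degree = Counter()
for person, place in edges:
    degree[person] += 1
    degree[place] += 1

most_connected = degree.most_common(2)
```

Reading the rows one by one, it is hard to spot that Taqueria Sol and Noodle Bar sit at the center; counting connections makes it obvious, which is exactly the shift in perspective a network diagram gives you.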

Connect the Dots is the fourth and most recent addition to DataBasic, a growing suite of easy-to-use web tools launched last year and designed to make data analysis and storytelling more accessible to a general, non-technical audience.

As with the previous three tools released in the DataBasic suite, Connect the Dots was designed so that its lessons can be easily planned to help students learn how to use data to tell a story. Connect the Dots comes with a learning guide and introductory video made for classes and workshops for participants from middle school through higher education. The learning guide has a 45-minute activity that walks people through an exercise in naming their favorite local restaurants and seeking patterns in the networks that result. To get started using the tool, sample data sets such as Donald Trump’s inside connections and characters from the play Les Miserables have also been included to help introduce users to vocabulary terms and the algorithms at work behind the scenes. Like the other DataBasic tools, Connect the Dots is available in English, Portuguese, and Spanish.

Learn more about Connect the Dots and all the DataBasic tools here.

Have you used DataBasic tools in your classroom, organization, or personal projects? If so, we’d love to hear your story! Write to help@databasic.io and tell us about your experience.

Telling Your Story Well

I just hosted a workshop today at the Stanford Do Good Data / Data on Purpose “from Possibilities to Responsibilities” event.  My workshop, called “Telling Your Story Well”, focused on how to flesh out your audience and goals well so that you can pick a presentation technique that is effective.  We did some hands-on exercises to practice using those as criteria for telling your story well.

One key takeaway is the reminder to know your audience and your goals before deciding how to tell your data-driven story.

Folks dove into the activity we did – remixing an infographic to target a specific audience and an achievable change.

FullSizeRender.jpg

For example, here’s a sketch of one group’s idea of an interactive data sculpture that dumps stuff on you based on how much water your purchases at a grocery store took to generate!

img_5295

Creating Ethical Algorithms – Data on Purpose Live Blog

This is a live-blog from the Stanford Data on Purpose / Do Good Data “From Possibilities to Responsibilities” event. This is a summary of what the speakers talked about, captured by Rahul Bhargava and Catherine D’Ignazio. Any omissions or errors are likely my fault.

Human-Centered Data Science for Good: Creating Ethical Algorithms

Zara Rahman works at both Data & Society and the Engine Room, where she helps co-ordinate the Responsible Data Forum series of events. Jake Porway founded and runs DataKind.

Jake notes this is the buzzkill session about algorithms. He wants us all to walk away being able to critically assess algorithms.

How do Algorithms Touch our Lives?

They invite the audience to sketch out their interactions with digital technologies over the last 24 hours on a piece of paper. Stick figures and words are totally ok. One participant drew a clock, noting happy and sad moments with little faces. Uber and AirBnb got happy faces next to them. Trying to connect to the internet in the venue got a sad face. Here’s my drawing.

Next they ask where people were influenced by algorithms. One participant shares the flood warning we all received on our phones. Another mentioned a bot in their Slack channel that queued up a task. Someone else mentions how news that happened yesterday filtered down to him; for instance Hans Rosling’s death made it to him via social channels much more quickly than via technology channels. Someone else mentioned how their heating had turned on automatically based on the temperature.

What is an Algorithm?

Jake shares that the wikipedia-esque definition is pretty boring: “a set of rules that precisely defines a sequence of operations”. The examples we just heard demonstrate the reality of this. These are automated and do things on their own, like Netflix’s recommendation algorithm. The goal is to break down how these operate, and figure out how to intervene in what drives these thinking machines. Zara reminds us that even if you see the source code, that doesn’t really help you understand it. We usually just see the output.

Algorithms have some kind of goal they are trying to get to. It takes actions to get there. For Netflix, the algorithm is trying to get you to watch more movies; while the actions are about showing you movies you are likely to want to watch. It tries to show you movies you might like; there is no incentive to show you a movie that might challenge you.

Algorithms use data to inform their decisions. For Netflix, the data input is what you have watched before, and what other people have been watching. There is also a feedback loop, based on how it is doing. It needs some way to figure out whether it is doing a good thing – did you click the movie, how much of it did you watch, how many stars did you give it? We can speculate about what those measurements are, but we have no way of knowing their metrics.
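To make that goal / action / data / feedback structure concrete, here is a toy sketch. This is not Netflix’s actual algorithm; the genres and scores are invented, and the point is only the shape of the loop:

```python
# Data: a (made-up) record of what one viewer has watched, by genre.
watch_history = {"drama": 5, "comedy": 1}

def recommend(history):
    # Action in service of the goal (more watching): surface whatever
    # the data says this viewer is most likely to watch.
    return max(history, key=history.get)

def feedback(history, genre, watched):
    # Feedback loop: clicks and watch time adjust future recommendations.
    history[genre] += 1 if watched else -1

pick = recommend(watch_history)
feedback(watch_history, pick, watched=True)
```

Even in this toy version you can see the critique from the session: the loop only ever reinforces what the data already shows, so there is no incentive to recommend something that might challenge you.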

A participant asks about how Netflix is probably also nudging her towards content they have produced, since that is cheaper for them. The underlying business model can drive these algorithms. Zara responds that this idea that the algorithm operates “for your benefit” is very subjective. Jake notes that we can be very critical about their goal state.

Another participant notes that there are civic benefits; in how Facebook can influence how many people are voting.

The definition is tricky, notes someone else, because anything that runs automatically could be called an algorithm. Jake and Zara are focused on data-driven algorithms: ones that use information about you and learn to correct themselves. The purest definition and how the word is used in the media are very different. Data science, machine learning, artificial intelligence – these are all squishy terms that are evolving.

Critiquing Algorithms

They suggest looking at Twitter’s “Who to follow” feature. Participants break into small groups for 10 minutes to ask questions about this algorithm. Here are the questions and some responses that groups shared after chatting:

  • What is the algorithm trying to get you to do?
    • They want to grow their user base, and then shifted to growing ad dollars
    • Showing global coverage, to show they are the network to be in
    • People name some unintended consequences like political polarization
  • What activities does it use to do that?
  • What data drives these decisions?
    • Can you pay for these positions? There could be an agreement based on what you are looking at and what Twitter recommends
  • What data does it use to evaluate if it is successful?
    • It can track your hovers, clicks, etc. both on the recommendation and on ads later on
    • If you don’t click to follow someone, that could be just as much signal
    • They might track the life of your relationship with this person (who you follow later because you followed their recommendation, etc)
  • Who has the power to influence these answers?

A participant notes that there were lots of secondary outcomes, which affected other people’s products based on their data. Folks note that the API opens up possibilities for democratic use and use for social good. Others note that Twitter data is highly expensive and not accessible to non-profits. Jake notes problems with doing research with Twitter data obtained through strange and mutant methods. Another participant notes they talked about discovering books to read and other things via Twitter; these reinforced their world views. Zara notes that these algorithms reinforce the voices that we hear (by gender, etc). Jake notes the Filter Bubble argument, that these algorithms reinforce our views. Most of the features they bake in are positive ones, not negative.

But who has the power to change these things? Not just on Twitter, but in health-care recommendations, Google, etc. One participant notes that in human interactions he is honest and open, but online he lies constantly. He doesn’t trust the medium, so he feeds it garbage on purpose. This matches his experiences in impoverished communities, where destruction is a key/only power. Someone else notes that the user can take action.

A participant asks what the legal or ethical standards should be. Someone responds that in non-profits the regulation comes from self-regulation and collective pressure. Zara notes that Twitter is worth nothing without its users.

Conclusion

Jake notes that we didn’t talk about it directly, but the ethical issues come up in relation to all these questions. These systems aren’t neutral.

UN Data Forum: Data Journalism (live blog)

This is a liveblog written by Rahul Bhargava at the 2017 UN World Data Forum.  This serves as a summary of what the speakers spoke about, not an exact recording.  With that in mind, any errors or omissions are likely my fault, not the speakers. This was a virtual session, with all the speakers calling in via video.

Introductions 

John Bailer: News & Numbers is an old idea. Cohn’s book targeted journalists to help them communicate to a broader community. Alberto Cairo’s The Truthful Art is a more recent example of this. John runs the Stats & Stories podcast to explore these questions as well.

Trevor Butterworth: Trevor is an Irish journalist with a background in the arts. He wrote for major publications as a freelancer about cultural issues, back when this was called “computer-assisted reporting”.

Rebecca Goldin: Trained as a mathematician, Rebecca worked as a professor of mathematics. She reconnected to look at how people talk about numbers and statistics. Now she supports the educational needs of journalists, and how people think and communicate about statistics.

Brian Tarran: A journalist by training, Brian received no training on numbers. He ended up working with the Royal Statistics Society and that’s how he ended up working on stats.

David Spiegelhalter: Coming from a background as a mathematician and medical statistician, he is now a Professor for the Public Understanding of Risk. His job is to do outreach to the press and public. David does statistical communication, focused on risk. Numbers are used to persuade people, so we need to do this better: informing people so they think slowly about a problem (instead of manipulating their emotions).

Idrees Kahloon: Idrees is a practicing data journalist at the Economist, having studied mathematics and statistics. At the Economist he works on building statistical models.

How to make sure what you’re doing will work with statistics?

Idrees: He runs into this quite a bit, sitting between academics and journalists. This means applying rigorous methods, but on a deadline. It’s hard to explain a logistic regression to a lay audience. You have to be statistically sound, but also explainable. The challenge is to straddle this boundary.

David: Influenced by the risk communication field, but there is no easy answer there.  So you decide what you want to do, and then test if it is working the way you want. Use basic visual best practices, and then the crucial thing is to test the materials. Evaluate it.

Brian: At Significance Magazine, a membership/outreach magazine, the goal is to bring people into statistics. There are guidelines to follow, around engagement and ease of reading. The goal is to encourage authors to draw analogies to things readers understand. One example is in an upcoming issue about paleo-climatology, focusing on climate proxies in recent history. The author explains this by comparing it to how Netflix creates recommendations for users. That kind of metaphor is the best way to get these things across.

Rebecca: As David hinted at, you have to know your audience. The first step is to understand who it is you are writing for, and what their background is. So perhaps instead of logistic regression, you might need to focus on explaining the outcome (i.e. not the process). With journalists in a workshop, the main challenge is understanding how to express uncertainty. This is the greatest challenge that people face. Pictures and stories are often the best techniques here, rather than technical language.

Trevor: Our statistical understanding is very nascent. To build a better foundation, surveying journalists helps you understand what journalists do and do not know about science and statistics.  Journalists assume researchers know how to design a study and analyze results. You have to understand that isn’t necessarily the case. You have to ask basic questions about study design, data collection, and data analysis techniques.  One of the goals is to build a network of statisticians to help journalists do this.  So a parallel project is to help researchers understand these statistical concepts.

Examples of successful and/or unsuccessful communication, and why?

Trevor: Science USA created this network of statisticians at academic institutions around the US, and journalists are using this online widget to ask them questions. That interaction is a great success to build on. Science that supports a policy is taken up by various constituencies, and filtered by values. When studies turn out to be poorly done, communicating that gets really hard. People who have adopted knowledge to promote it are not equipped to make judgements about what process or technique was wrong. So they try to shoot you down from an ad hominem point of view. In the US, talking about policy with evidence without becoming tribal has become too hard. So the question of “is this a good study?” gets lost very quickly, replaced by a partisan/political interpretation of who you are and your motives for critiquing a study.

Rebecca: When a journalist does have more than an hour to sort through a concept is when we have an opportunity for great success. For example, Rebecca worked with a journalist looking at false positives vs. false negatives. The journalist created a graphic that ended up on 538. The conversation helped her clarify what the mathematics would tell her. Some failures involve speaking with a journalist who just can’t wrap their head around an idea; when they can’t slow down enough to understand something like an inference. This is the difference between writing about a certainty (which journalists want to do) and a quantified uncertainty. Other times there are just knowledge disconnects in the mathematics, like explaining a confidence interval without the listener understanding what an interval is. There are lots of requests coming in, which points to a shortage of people with these skills in the newsroom. So lots of people are recognizing this need.

Brian: The expertise didn’t exist in the newsroom 15 years ago. In his first year, Brian wrote about councils surveying citizens about an issue. This ended up putting citizens and council at odds, because the journalists couldn’t explain what the survey told them, or better ways to do this. We just did a terrible job of explaining the fundamentals in a way that could generate bridges between people. As for successes: in magazine form it is too hard to convey the details that would help people do statistics themselves. We need to show people how to think like a statistician. This is about a process, and the questions you ask. There is a new column called “Ask a Statistician” which tries to get at this directly. Hopefully over time this will build to something great.

David: One success is keeping certain stories out of the news that don’t have good science behind them. Another is the translation of relative risk to absolute risk. If there is a change in risk, you need to show the baseline risk. There was a story about eating bacon sandwiches and how it increased the risk of some disease. The morning story was terrible, but in the evening, after much promotion, the story was told correctly, indicating this would only increase cases by 1 out of 100. Even though the BBC training introduces this, the journalists cannot do it on their own. Another paper reported how a study said sex was decreasing in the UK, due to phones and technology. David made a joke about this being due to Game of Thrones, but a journalist didn’t get the joke and wrote up the headline “no sex by 2030 due to Game of Thrones”. This is the danger of clickbait, produced by secondary outlets republishing with a crazy headline.
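The relative-to-absolute translation David describes is simple arithmetic. Here is a sketch with purely illustrative numbers (an assumed baseline of 5 cases per 100 people, and an assumed “20% increased risk” headline, not the figures from the bacon study):

```python
# Illustrative numbers only: translating a relative-risk headline
# into the absolute framing described above.
baseline_cases_per_100 = 5        # assumed baseline risk
relative_increase = 0.20          # an assumed "20% higher risk" headline

absolute_cases_per_100 = baseline_cases_per_100 * (1 + relative_increase)
extra_cases_per_100 = absolute_cases_per_100 - baseline_cases_per_100

# 5 cases become 6 per 100: "one extra case in 100 people",
# a very different impression than "20% more risk".
```

Without the baseline, “20% increased risk” sounds alarming; with it, the reader sees the actual scale of the change.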

Idrees: The polls in the last year are a great example of how to do this both well and poorly. There were many models of the US election outcome; some set out what the uncertainty was (like 538 giving Trump a 30% chance of winning), but others did not (like the Princeton Election Consortium). Some think it is OK to just report the margin of error and ignore whether the sample is good. Idrees shares a paper about a reported 50,000 celebratory tweets about the death of Jo Cox. To test this, the researchers gathered a population of tweets, sampled it, and measured how many were celebratory. Their data shows the true number was an order of magnitude less.
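The checking approach Idrees describes, sampling a population and scaling the sample proportion back up, can be sketched roughly as follows (synthetic data; the paper's actual method and numbers will differ):

```python
import random

def estimate_count(population_size, sample, predicate):
    """Scale a sample proportion up to an estimated population count,
    with a rough 95% margin of error on that count."""
    p = sum(predicate(x) for x in sample) / len(sample)
    moe = 1.96 * (p * (1 - p) / len(sample)) ** 0.5
    return population_size * p, population_size * moe

# Hypothetical corpus: 50,000 tweets, of which (unknown to the analyst) ~5% match
random.seed(0)
tweets = ["celebratory" if random.random() < 0.05 else "other" for _ in range(50_000)]

estimate, moe = estimate_count(
    len(tweets), random.sample(tweets, 1000), lambda t: t == "celebratory"
)
print(f"estimated {estimate:.0f} ± {moe:.0f} matching tweets")
```

The hand-labeled sample gives both an estimate and an uncertainty, which is what lets you say a headline figure is an order of magnitude too high rather than just "wrong".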

Q&A

Responding to David and Rebecca’s comments, we’ve found that we need to separate percentages and chance. Has anyone come across guidelines about how to describe chance? Many believe you should do it in terms of “1 in 100” type language.

David: This is a disputed area. Words like “probability” and “chance” are contested, so people use an expected frequency: “of 100 people like you, 5 would have it”. This is slightly better than “1 in 100” language. There is always metaphor and analogy involved. The choice of phrase also depends on the imagery and its appropriateness for the audience.

Rebecca: When talking about 1 person having something and 99 not having it, you have to say “of people like you”. This is a critical piece that stops people from arguing against these types of descriptions. You must express what the denominator is: precisely who we are talking about. Visual depictions can help with this a lot. Comparing risks or frequencies can also help: how does each option affect your risks and outcomes? It is important to pair these.
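The expected-frequency phrasing David and Rebecca describe can be generated mechanically. A small hypothetical helper (the function and its wording are illustrative, not from any style guide):

```python
def expected_frequency(probability, group=100):
    """Phrase a probability as an expected frequency, naming the
    denominator explicitly as Rebecca recommends."""
    n = round(probability * group)
    return f"Of {group} people like you, {n} would be expected to have it."

print(expected_frequency(0.05))
# Of 100 people like you, 5 would be expected to have it.
```

Fixing the reference group ("100 people like you") is what makes the denominator explicit, which is the piece Rebecca flags as critical.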

For Trevor and Rebecca, who have been training journalists: what is the most important single skill for reporters to better work with data?

Trevor: To be pessimistic, most journalists can’t visualize the concepts in statistics, especially probability, uncertainty, and distributions. You have to start with the design of the data-gathering effort; this leads to a certain approach to reporting. The best thing to do is to bring journalists and statisticians together.

Rebecca: In terms of basic numeracy, the most important thing is understanding absolute vs. relative risk. Journalists understand proportions and percentages, so they could grasp this distinction in a short amount of time. So many studies report risk this way now, and people know how to interpret it. The intuition is there. This is attainable.

Brian: Read the book The Tiger That Isn’t. If everyone read it and appreciated the ways numbers can be misinterpreted, it would improve things a lot.

Idrees: The idea of being able to understand a distribution of outcomes. This is about getting across an expected value and a bell curve.  This is all tangled together though, so it is hard to understand one bit and not another.  Hard to see one silver bullet.

David: To agree with Rebecca, converting relative to absolute risk is vital, then doing it in whole numbers, and so on. Journalists are intelligent; they are used to critiquing, and their intuition is good. They often lack the confidence to go with their intuition when data comes in. They should go with their guts.

John: Look at some of the questions in the News & Numbers book mentioned earlier.

A key theme here has been counting people who aren’t usually counted. What alternative data sources do you use to capture and explain these populations?

David: Using mobile phone data is probably one piece of the discussion that is relevant.

John: The US census tried to enumerate populations like the homeless with formal study design, for example by looking at a proxy such as people receiving services related to their status. The audience is probably better informed than the panel on this.

A few years ago, we found that in 40% of journals data was incorrectly presented graphically.  We have to start really young to get people’s brains to start working differently. This goes beyond numeracy.

David: The Teaching Probability book is aimed at 10-to-15-year-olds. It uses the metaphor of expected frequency as a basis; if you start there, it leads naturally to probability. Converting relative to absolute risk is included, based on the idea of what a result means for 100 people. In the UK, probability has been taken out of the primary school curriculum. Recent psychological research says statistical literacy underlies general decision-making skills; it is crucial.

Trevor: The kind of information literacy we teach children is quite poor, but cultural change is possible. The News & Numbers book, despite nailing the problems, had little effect on the culture of journalism. Newer outlets like Wonkblog, The Upshot, 538, Vox and others suggest that cultural change around the importance of data is happening. There is a danger of naivete, though: the wrong idea that we don’t need statistics anymore because we have big data.

John: We need to be training the trainers, to help teachers become equipped to communicate these ideas.

Brian: At their local school they discuss improving the teaching of mathematics, but none of the teachers are confident enough to do it. They need more confidence. People are too willing to accept the idea that they’re “bad at math”; we need to break that down.

Closing Remarks

Rebecca: The takeaway is to tell a story. Veer a little from the technical truth to tell a story that frames the information in a non-technical way. Don’t be scared to say something slightly incorrectly to better convey what you want to say. People will remember it better, and become more curious.

Idrees: Data journalism is kind of a new thing, so we will have wrinkles. If you write to an editor about something that is egregious, they actually listen.

Brian: We want to be telling a story, like a feature article not an academic paper. Tell a story the way you want to be told a story. Present your work in that way, with a story structure that feels good.

Trevor: Statistics should not be dry; try to have a real conversation. Numbers don’t speak for themselves. Also, recognize the limits of your own background. Think like a designer who communicates knowledge. The name of the game is collaboration.

David: Respect the journalistic approach. That means working with journalists, but at a minimum it means working out the crucial points, developing a story, and trying it out with people.

John: This has been an outstanding conversation.