UN Data Forum: Data Journalism (live blog)

This is a liveblog written by Rahul Bhargava at the 2017 UN World Data Forum.  It serves as a summary of what the speakers said, not an exact recording.  With that in mind, any errors or omissions are likely my fault, not the speakers'. This was a virtual session, with all the speakers calling in via video.


John Bailer: News & Numbers is an old idea.  Cohn's book targeted journalists to help them communicate to a broader community. Alberto Cairo's Truthful Art book is a more recent example of this.  John runs a Stats & Stories podcast to explore these questions as well.

Trevor Butterworth: Trevor is an Irish journalist with a background in the arts. He wrote for major publications as a freelancer about cultural issues, back when this was called “computer-assisted reporting”.

Rebecca Goldin: Trained as a mathematician, Rebecca worked as a professor of mathematics.  She reconnected to look at how people talk about numbers and statistics.  Now she supports the educational needs of journalists, and how people think and communicate about statistics.

Brian Tarran: A journalist by training, Brian received no training on numbers. He ended up working with the Royal Statistics Society and that’s how he ended up working on stats.

David Spiegelhalter: Coming from a background as a mathematician and medical statistician, he is now a Professor for the Public Understanding of Risk.  His job is to do outreach to the press and public. David does statistical communication, focused on risk. Numbers are used to persuade people, so we need to do this better, informing people so they think slowly about a problem (instead of manipulating their emotions).

Idrees Kahloon: Idrees is a practicing data journalist at the Economist, having studied mathematics and statistics. At the Economist he works on building statistical models.

How to make sure what you’re doing will work with statistics?

Idrees: Runs into this quite a bit, sitting between academics and journalists. This means applying rigorous methods, but on a deadline.  It's hard to explain a logistic regression to a lay audience. You have to be statistically sound, but also explainable. The challenge is to straddle this boundary.

David: Influenced by the risk communication field, but there is no easy answer there.  So you decide what you want to do, and then test if it is working the way you want. Use basic visual best practices, and then the crucial thing is to test the materials. Evaluate it.

Brian: At Significance Magazine, a membership/outreach magazine, the goal is to bring people into statistics. There are guidelines to follow, around engagement and ease of reading. The goal is to encourage authors to draw analogies to things they understand.  One example is in an upcoming issue about paleo-climatology; focusing on climate proxies in recent history. The author explains this by comparing it to how Netflix creates recommendations to users. That kind of metaphor is the best way to get these things across.

Rebecca: As David hinted at, you have to know your audience. The first step is to understand who it is you are writing for, and what their background is. So perhaps instead of logistic regression, you might need to focus on explaining the outcome (i.e. not the process). With journalists in a workshop, the main challenge for them is understanding how to express uncertainty.  This is the greatest challenge that people face.  Pictures and stories are often the best techniques here, rather than technical language.

Trevor: Our statistical understanding is very nascent. To build a better foundation, surveying journalists helps you understand what journalists do and do not know about science and statistics.  Journalists assume researchers know how to design a study and analyze results. You have to understand that isn’t necessarily the case. You have to ask basic questions about study design, data collection, and data analysis techniques.  One of the goals is to build a network of statisticians to help journalists do this.  So a parallel project is to help researchers understand these statistical concepts.

Examples of successful and/or unsuccessful communication? and why?

Trevor: Sense About Science USA created this network of statisticians at academic institutions around the US, and journalists are using an online widget to ask them questions.  That interaction is a great success to build on. Science that supports a policy is taken up by various constituencies, and filtered by values. When studies turn out to be poorly done, communicating that gets really hard. People who have adopted knowledge to promote it are not equipped to make judgements about what process or technique was wrong. So they try to shoot you down from an ad hominem point of view. In the US, talking about policy with evidence without becoming tribal has become too hard. So the question of "is this a good study" gets lost very quickly, replaced by a partisan/political interpretation of who you are, and your motives for critiquing a study.

Rebecca: When a journalist does have more than an hour to sort through a concept, there is an opportunity for great success. For example, Rebecca worked with a journalist looking at false positives vs. false negatives. The journalist created a graphic that ended up on 538.  The conversation helped her clarify what the mathematics would tell her.  Some failures involve speaking with a journalist who just can't wrap their head around an idea, when they can't slow down enough to understand something like an inference. This is the difference between writing about a certainty (which journalists want to do) and a quantified uncertainty. Other times there are just knowledge disconnects in the mathematics, like explaining a confidence interval without the listener understanding what an interval is. There are lots of requests coming in, which points to a shortage of people with these skills in the newsroom. So lots of people are recognizing this need.
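The false-positive/false-negative confusion usually comes down to base rates. As a sketch (with invented numbers, not the ones from the 538 graphic Rebecca mentions), assuming a rare condition and a fairly accurate test:

```python
# Toy illustration (hypothetical numbers): even an accurate test
# produces mostly false positives when the condition is rare,
# which is why the distinction matters in reporting.
prevalence = 0.01   # assumed: 1 in 100 people have the condition
sensitivity = 0.99  # assumed true-positive rate of the test
specificity = 0.95  # assumed true-negative rate of the test

population = 100_000
sick = population * prevalence
healthy = population - sick

true_positives = sick * sensitivity
false_positives = healthy * (1 - specificity)

# Chance that a positive result is actually a true positive
ppv = true_positives / (true_positives + false_positives)
print(f"{true_positives:.0f} true vs {false_positives:.0f} false positives")
print(f"Positive predictive value: {ppv:.0%}")
```

With these assumed figures, only about one in six positive results is real, even though the test is "99% accurate" in the everyday sense.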

Brian: The expertise didn't exist in the newsroom 15 years ago.  In his first year, Brian wrote about councils surveying citizens about an issue. This ended up putting citizens and council at odds, because the journalists couldn't explain what the survey told them, or better ways to do this. We just did a terrible job of explaining the fundamentals in a way that could build bridges between people. For a success: in magazine form it is too hard to convey the details to help people do statistics themselves, so we need to show people how to think like a statistician.  This is about a process, and the questions you ask.  There is a new column called "Ask a Statistician" which tries to get at this directly. Hopefully over time this will build to something great.

David: One success is keeping certain stories out of the news that don't have good science behind them.   Another is the translation of relative risk to absolute risk: if there is a change in risk, you need to show the baseline risk. There was a story about how eating a bacon sandwich increased the risk of some disease. The morning story was terrible, but by the evening, after much prompting, the story was told correctly, indicating this would only mean 1 extra case out of 100. Even though BBC training introduces this, the journalists cannot do it on their own. Another story reported how a study said sex was decreasing in the UK, due to phones and technology. David made a joke about this being due to Game of Thrones, but a journalist didn't get the joke and wrote the headline "no sex by 2030 due to Game of Thrones". This is the danger of clickbait, produced by secondary outlets republishing with a crazy headline.
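The bacon-sandwich correction is a relative-to-absolute translation. A minimal sketch, with assumed figures (a 6-in-100 baseline and an 18% relative increase; the actual study numbers weren't given in the session):

```python
# Translating relative risk into absolute risk, with illustrative
# numbers (assumed here, not quoted by the panel): a "risk rises
# by 18%" headline means little without the baseline.
baseline = 6 / 100        # assumed baseline: 6 in 100 people affected
relative_increase = 0.18  # assumed relative risk increase from a study

new_risk = baseline * (1 + relative_increase)
extra_cases_per_100 = (new_risk - baseline) * 100

print(f"Baseline: {baseline * 100:.0f} in 100")
print(f"With exposure: {new_risk * 100:.1f} in 100")
print(f"Extra cases per 100 people: {extra_cases_per_100:.1f}")
```

Under these assumptions the scary-sounding 18% relative increase works out to roughly one extra case per 100 people, which is the kind of framing the corrected evening story used.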

Idrees: The polls in the last year are a great example of both how to do it well and poorly. There were many models in the US about the election outcome, where some set out what the uncertainty was (like 538 giving Trump a 30% chance of winning), but others did not (like the Princeton Election Consortium). Some think it is okay to just report the margin of error, and ignore whether the sample is good.  Idrees shares a paper about a claim of 50,000 tweets celebrating the death of Jo Cox.  To test this claim they gathered a population of tweets, sampled it, and measured how many were celebratory.  Their data shows the true number was an order of magnitude less.
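The margin-of-error point can be made concrete with the standard formula for a sampled proportion (the sample size and observed proportion below are invented). Note that this only quantifies sampling error, which is the limitation Idrees flags: it says nothing about whether the sample itself is representative.

```python
import math

# Standard 95% margin of error for a proportion under simple
# random sampling; n and p are hypothetical figures.
n = 1000  # assumed sample size
p = 0.30  # assumed observed proportion

moe = 1.96 * math.sqrt(p * (1 - p) / n)
print(f"{p:.0%} ± {moe:.1%}")
```

A biased sample can be wrong by far more than this interval suggests, which is why reporting the margin of error alone can lend false precision.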


Responding to David and Rebecca's comments, we've found that we need to separate percentages and chance. Has anyone come across guidelines about how to describe chance? Many believe you should do it in terms of "1 in 100" type language.

David: This is a disputed area. Rather than words like "probability" and "chance", people use an expected frequency – "of 100 people like you, 5 would have it".  This is slightly better than "1 in 100" language. There is always metaphor and analogy involved. Using a phrase also depends on the imagery and appropriateness for the audience.

Rebecca: When talking about 1 person having something, and 99 not having it, you have to say "of people like you".  This is a critical piece that stops people from arguing against these types of descriptions. You must express what the denominator is… precisely who we are talking about. Visual depictions can help this a lot.  Also comparing risks or frequencies can help: how does each option affect your risks and outcomes? It is important to pair these.

For Trevor and Rebecca, who have been training journalists: what is the most important single skill for reporters to better work with data?

Trevor: To be pessimistic, most journalists can’t visualize the concepts in statistics.  Especially for probability, uncertainty, and distributions. You have to start with design of the data gathering effort. This leads to a certain approach of doing reporting. The best thing to do is to bring journalists and statisticians together.

Rebecca: In terms of basic numeracy, the most important thing is understanding absolute vs. relative risk.  Journalists understand proportions and percentages, so they could understand this distinction in a short amount of time.  So many studies do this now, and people know how to interpret it. The intuition is there. This is attainable.

Brian: Read The Tiger That Isn't. If everyone read it and appreciated the ways numbers can be misinterpreted, this would improve things a lot.

Idrees: The idea of being able to understand a distribution of outcomes. This is about getting across an expected value and a bell curve.  This is all tangled together though, so it is hard to understand one bit and not another.  Hard to see one silver bullet.

David: To agree with Rebecca, changing relative to absolute risk is vital.  Then doing it in whole numbers, and so on. Journalists are intelligent; they are used to critiquing, and their intuition is good.  They often lack the confidence to go with their intuition when data comes in. They should go with their guts.

John: Look at some of the questions in the News & Numbers book mentioned earlier.

A key theme here has been about counting people who aren't usually counted.  What alternative data sources do you use to capture and explain these populations?

David: Using mobile phone data is probably one piece of the discussion that is relevant.

John: The census in the US tried to enumerate populations like the homeless with a formal study design… like looking at a proxy of people receiving services related to their status.  Probably the audience is better informed than the panel.

A few years ago, we found that in 40% of journals data was incorrectly presented graphically.  We have to start really young to get people’s brains to start working differently. This goes beyond numeracy.

David: The Teaching Probability book is aimed at 10 to 15 year olds.  It uses the metaphor of expected frequency as a basis; if you do that it leads to probability.  Converting relative to absolute risk is included in this, based on the idea of what a result means for 100 people.  In the UK probability has been taken out of the primary school curriculum. Recent psychological research says statistical literacy underlies general decision-making skills; it is crucial.

Trevor: The kind of information literacy we teach children is quite poor. Cultural change is possible. The News & Numbers book, despite nailing the problems, had little effect on the culture of journalism. New outlets like Wonkblog, the Upshot, 538, Vox, and others suggest cultural change around the importance of data is happening. There is a danger of naivete, suggesting the wrong idea that we don't need statistics anymore because we have big data.

John: We need to be training the trainers, helping teachers to be equipped to communicate these ideas.

Brian: At his local school they discuss improving the teaching of mathematics, but none of the teachers are confident enough to do this.  They need more confidence. People are too willing to accept the idea that you're "bad at math"; we need to break that down.

Closing Remarks

Rebecca: The takeaway is to tell a story.  Veer a little from the technical truth to try and tell a story that frames the information in a way that is non-technical. Don’t be scared to say something a little bit incorrectly, to better convey what you want to say.  People will remember better what you say, and become more curious.

Idrees: Data journalism is kind of a new thing, so we will have wrinkles. If you write to an editor about something that is egregious, they actually listen.

Brian: We want to be telling a story, like a feature article not an academic paper. Tell a story the way you want to be told a story. Present your work in that way, with a story structure that feels good.

Trevor: Statistics should not be dry; try to have a real conversation.  Numbers don't speak for themselves.  Also, recognize the limits of your own background. Think like a designer who communicates knowledge. The name of the game is collaboration.

David: Respect the journalistic approach. That means working with journalists, but at a minimum it means working out the crucial points, developing a story, and trying it out with people.

John: This has been an outstanding conversation.