Data journalism produced by two of the nation’s most prestigious news organizations — The New York Times and The Washington Post — has lacked transparency, often failing to explain the methods journalists or others used to collect or analyze the data on which the articles were based, a new study finds. In addition, the news outlets usually did not provide the public with access to that data.
In the study, media scholar Rodrigo Zamith, an assistant professor at the University of Massachusetts Amherst, examines the data journalism published by the Times and the Post during the first half of 2017. While previous research has focused on data journalism projects submitted for awards contests, Zamith wanted to assess newsrooms’ day-to-day data work.
He writes that the Times and Post articles fall short of the ideal of data journalism — to promote transparency and improve journalistic accountability.
“The finding from this study that nearly 9 of 10 articles did not link to any of the datasets used and that fewer than 2 in 10 included a section for methodological details — two important aspects of disclosure transparency — highlights that even elite news organizations have a long way to go to realize that ideal,” Zamith writes in the study, which will be published in a forthcoming edition of the journal Digital Journalism.
He also notes that the two news outlets relied heavily on third-party evidence, seldom using data they collected themselves. He examined 159 data journalism articles published by the Times or Post between Jan. 1, 2017 and June 30, 2017, and found that two-thirds relied on data from government sources. Only 10.4 percent of the Times’ articles and 6.5 percent of the Post’s articles drew on self-collected data.
For the purposes of the study, Zamith defined data journalism as “a news item produced by a journalist that has a central thesis (or purpose) that is primarily attributed to (or fleshed out by) quantified information (e.g., statistics or raw sensor data); involves at least some original data analysis by the item’s author(s); and includes a visual representation of data.”
For years now, scholars have voiced concerns about journalists’ reliance on “information subsidies,” or prepackaged information that organizations, including public relations firms, provide to the news media for free, Zamith writes. He explains that the growing popularity of data journalism could make news outlets “even more vulnerable to a dependence on a new type of information subsidy — data — that can be exploited due to news organizations’ inability to collect their own large datasets.”
In an interview with Journalist’s Resource, Zamith elaborated. He also stressed the importance of data journalists receiving training on research methods.
“When you collect data yourself, it’s easy to see the limitations of data,” he wrote via e-mail. “When you rely on outside data, those details can get lost or simply not be adequately explained. Worse yet, it wouldn’t surprise me if we saw more unethical people publish data as a strategic communication tool, because they know people tend to believe numbers more than personal stories. That’s why it’s so important to have that training on information literacy and methodology.”
Zamith offered more details about the kind of research training he thinks journalists need.
“A class that covers basic sampling theory and data collection, instrument design, and basic data analysis would be sufficient for most data journalists right now,” he wrote to Journalist’s Resource. “If they need more than that, it would be sensible to partner with a researcher or university. However, schools can step up their undergraduate offerings to teach those core skills because data — whether it’s data journalism or not — is becoming more prevalent in society, and thus in journalism.”
Here are some other takeaways from Zamith’s 21-page study, titled “Transparency, Interactivity, Diversity, and Information Provenance in Everyday Data Journalism”:
- While the Times had more journalists and nearly three times as many digital subscribers, the Post published more data journalism. The Times published 67 data journalism articles during the study period. The Post published 92.
- The Post was somewhat more transparent. It failed to provide links to any data in 82.6 percent of its data journalism articles. It offered links to some data in 9.8 percent of its articles, and linked to all the supporting data in 7.6 percent. Meanwhile, the Times failed to provide links to data in 94 percent of its articles. The Times linked to some data in 6 percent of its articles and never provided public access to all the data it used.
- Less than 12 percent of Times articles provided supplemental methodological information either on the same page as the article or via an external link or pop-up. Less than 20 percent of Post articles did.
- The Times relied slightly more often on government sources — 67.2 percent of Times articles drew on government sources, compared to 65.2 percent of Post articles. The Times used data from other news organizations 6 percent of the time while the Post did that 4.3 percent of the time.
- While the two outlets dedicated a similar proportion of articles to the topics of politics, sports and national security and defense, the Post did more work on the economy. Approximately 18 percent of its data journalism focused on the economy, compared to approximately 9 percent of the Times’ articles. But a greater percentage of Times articles focused on education and health. For example, about 18 percent of Times stories were about health, compared with about 10 percent of Post stories.
Looking for more on data journalism? Check out our tip sheet on visualizing economic data and the free data journalism syllabus we created for college instructors and faculty.