Posts Tagged ‘data journalism’

Seeing as tomorrow is Open Data Day and I claim to be a Data Journalist (I think JournoCoder is more suitable), here’s a little data food for journalistic thought.

RSS Feed of US Nuclear Reactor Events

Here is the site showing the US nuclear reactors power output status. Here is the scraper for that site written by ScraperWiki founder Julian Todd. Here is my script for catching the unplanned events and converting them to RSS format. And here is the URL you can use to subscribe to the feed yourself:

Oh, and a video (using the example to go through the ScraperWiki API)
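For anyone curious what the "catch the unplanned events and convert them to RSS" step might look like, here is a minimal sketch in Python. It is not my actual script: the record fields (title, summary, date, unplanned) and the function name events_to_rss are made up for illustration, and the real scraper's output will be shaped differently. It only shows the general idea of filtering records and wrapping them in RSS 2.0 using the standard library.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def events_to_rss(events, title, link, description):
    """Build an RSS 2.0 document (as a string) from event records.

    Each event is assumed to be a dict with hypothetical keys:
    'title', 'summary', 'date' (a datetime) and 'unplanned' (a bool).
    Only events flagged as unplanned end up in the feed.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description

    for event in events:
        if not event.get("unplanned"):
            continue  # skip planned outages and routine status rows
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = event["title"]
        ET.SubElement(item, "description").text = event["summary"]
        # RSS expects RFC 822 style dates, which format_datetime produces
        ET.SubElement(item, "pubDate").text = format_datetime(event["date"])

    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Made-up sample records, purely for demonstration
    sample = [
        {"title": "Unit 1 automatic shutdown", "summary": "Power fell to 0%.",
         "date": datetime(2011, 12, 2, tzinfo=timezone.utc), "unplanned": True},
        {"title": "Unit 2 refuelling outage", "summary": "Scheduled outage.",
         "date": datetime(2011, 12, 2, tzinfo=timezone.utc), "unplanned": False},
    ]
    print(events_to_rss(sample, "Reactor events", "http://example.com/feed",
                        "Unplanned events only"))
```

The string it returns can be served from any URL, and a feed reader pointed at that URL should pick up each unplanned event as a new item.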

Things have been quiet on the blog front and I apologize. What began as a tumultuous year with a big risk on my part has become even more turbulent. Happily with opportunities rather than uncertainties. Trips to Germany and the US have landed in my lap. Both hugely challenging and exciting.

I completed the Knight Mozilla Learning Lab successfully and have been invited to Berlin for the MoJoHackfest next week. I’m really looking forward to meeting all the participants and getting some in-depth, hands-on experience of creating applications built around a better news flow.

This is a level between the hack days ScraperWiki ran and the ScraperWiki platform development itself (I don’t play a part in this but work closely with those who do), which is more akin to the development newsroom.

My pitch for the Learning Lab, Big Picture, is asking a lot of developers coming with their own great ideas and prototypes. I would love to get some of the functionality working but that very much depends on the goodwill, skills and availability of a small group of relative strangers.

I have a tendency to bite off more than I can chew and ask a lot of people who have no vested interests in my development. I am acutely aware that I cannot build any part of the Big Picture project. That being said I have built a new project that can be added to with a basic knowledge of Python. I give you MoJoNewsBot:

If you want to know more about how the Special Advisers’ query was done, read my ScraperWiki blog post. Also, I fixed the bug in the Google News search so the links match the headline.

Come October I will be heading to the US to help fulfill part of ScraperWiki’s obligations to the Knight News Challenge. I am honoured to be one of ScraperWiki’s first full-time employees and actually get paid to further the field of data journalism!

Being part of a startup has its risks. No one’s role is ever fully defined. This really is a huge experiment and I’m not sure I can even describe what it is I am doing. I am not a noun, however. I am a verb. My definition is in my functionality and defining this through ScraperWiki, MoJo and any other opportunities that come my way will be the basis of this blog from now on. So my posts will be sporadic but I hope you look forward to them.

I’ve come specifically to the Open Knowledge Conference for the track on data journalism (although I’m very interested in the open data scene anyway). It was a call to action more than an educational exposition. Data journalism doesn’t have a set path or definition, which is why a lot of the journalism falling under the term ‘data journalism’ turns out to be, underneath it all, very different species. Just as mathematics is composed of a range of disciplines, yet most people encounter it as one overarching topic.

I’m having an amazing time in Berlin and I’m sure I’ve consumed more than I can digest in terms of data. But here are some points I noted from the speakers Simon Rogers, Stefan Candea, Caelainn Barr, Liliana Bounegru and Mirko Lorenz, which I’ve added my thoughts to here:

1. “There needs to be defined long-term goals for data journalism training as the field has widened” – I believe that the different disciplines are becoming evident as tools with wider uses are being tinkered with (I wouldn’t go so far as to say adopted), more so than the field has widened. I do not believe in long-term goals either. To evolve into a specialist species one has to adapt to one’s environment. Now the data environment is changing at a web rate, which is far too fast for long-term goals.

2. “It’s about stories AND words – it’s just another source” – Old school journalism used to rely on a network of sources. Data journalism relies on a network of resources. So all journalism today should rely on a network of sources and resources working in tandem, working together, in sync. Old school journalism applies today just as it always did. You need to be able to read and rely on the validity of your sources. You need to understand their agenda and their limitations. In the same way you need to be able to do all these things with data and the resources you are working with.

3. “Data for journalists is a great resource but not the golden bullet” – I agree. The golden bullet is the journalistic mindset. The ability to spot something that isn’t right, that shouldn’t be. This is one characteristic but with data journalism you’re using the other side of the brain. The ‘training’ that is needed is to learn to use your other numerical side as a resource also. If you don’t have a well tuned journalistic mindset you won’t be a good data journalist and I fear this mindset is being left at the door when journalists approach data (especially when being trained) because using the left hemisphere of their brain is so alien to them they feel they’re in a completely different microcosm.

4. “Not doing data journalism is not an option” – This was mentioned in reference to online journalism. I’m not sure I quite agree with this. I think there are a lot of institutions where doing data journalism isn’t an option. For future survival, you’d be amazed how much traffic can be generated by a saucy picture and a splashy headline. Combine it with a social-media-savvy policy and you’ll find the serious side of data journalism will easily go amiss. Most news institutions are doing some form of superficial data journalism in the form of infographics or interactives. JavaScript developers are quite common in the corner of the newsroom nearest the coffee machine, servers and exit. Social media has changed the way we view news but this did not come from within the journalistic institution. The change will only be implemented from the inside if it is pushed from the outside. This is why I am interested in open data. This is where I see (and hope) a symbiosis will form.

And here’s what Tim Berners-Lee, inventor of the World Wide Web, said regarding the subject of data journalism:

Journalists need to be data-savvy… [it’s] going to be about poring over data and equipping yourself with the tools to analyse it and picking out what’s interesting. And keeping it in perspective, helping people out by really seeing where it all fits together, and what’s going on in the country

How the Media Handle Data:

Data has sprung onto the journalistic platform of late in the form of the Iraq War Logs (mapped by The Guardian), the MPs’ expenses (bought by The Telegraph) and the leaked US Embassy Cables (visualized by Der Spiegel). What strikes me about these big hitters is that the existence of the data is a story in itself. Which is why they had to be covered. And how they can be sold to an editor. These data events force the journalistic platform into handling large amounts of data. The leaks are stories, so there’s your headline before you start actually looking for stories. In fact, the Fleet Street Blues blog pointed out the sorry lack of stories from such a rich source of data, noting the quick turn to headlines about Wikileaks and Assange.

Der Spiegel - The US Embassy Dispatches

So journalism so far has had to handle large data dumps, which has spurred on the area of data journalism. But they also serve to highlight the fact that the journalistic platform as yet cannot handle data. Not the steady stream of public data eking out of government offices and public bodies. What has caught the attention of news organizations is social media. And that’s a steady stream of useful information. But again, all that’s permitted is some fancy graphics hammered out by programmers who are glad to be dealing with something more challenging than picture galleries (here’s an example of how CNN used Twitter data).

So infographics (see the Stanford project: Journalism in the Age of Data) and interactives (e.g. New York Times: A Peek into Netflix Queues) have been the keystone from which the journalism data platform is being built. But there are stories and not just pictures to be found in data. There are strange goings-on that need to be unearthed. And there are players outside of the newsroom doing just that.

How the Data Journalists Handle Data:

Data, before it was made sociable or leakable, was the beat of the computer-assisted reporters (CAR). They date as far back as 1989 with the setting up of the National Institute for Computer-Assisted Reporting in the States. Which is soon to be followed by the European Centre for Computer Assisted Reporting. The French group, OWNI, are the latest (and coolest) revolutionaries when it comes to new age journalism and are exploring the data avenues with aplomb. CAR then morphed into Hacks/Hackers when reporters realized that computers were tools that every journalist should use for reporting. There’s no such thing as telephone-assisted reporting. So some wacky journalists (myself now included) decided to pair up with developers to see what can be done with web data.

This now seems to be catching on in the newsroom. The Chicago Tribune has a data center, to name just one. In fact, the data center at the Texas Tribune drives the majority of the site’s traffic. Data journalism is growing alongside the growing availability of data and the tools that can be used to extract, refine and probe it. However, at the core of any data-driven story is the journalist. And what needs to be fostered now, I would argue, is the data nose of a (any) journalist. Journalism, in its purest form, is interrogation. The world of data is an untapped goldmine and what’s lacking now is the data acumen to get digging. There are Pulitzers embedded in the data strata which can be struck with little use of heavy machinery. Data-driven journalism, and indeed CAR, have been around since long before social media, web 2.0 and even the internet. One of the earliest examples of computer-assisted reporting was in 1967, after riots in Detroit, when Philip Meyer used survey research, analyzed on a mainframe computer, to show that people who had attended college were equally likely to have rioted as were high school dropouts. This turned the public’s attention to the pervasive racial discrimination in policing and housing in Detroit.

Where Data Fits into Journalism:

I’ve been looking at the States and the broadsheets’ reputation for investigative journalism has produced some real gems. What struck me, looking at news data across the Atlantic, is that data journalism was seeded earlier, and possibly more prolifically, than in the UK. I’m not sure if it’s more established but I suspect so (but not by a wide margin). For example, at the end of 2004, the then Dallas Morning News analyzed the school test scores of the Texas Assessment of Knowledge and Skills and uncovered one school’s alleged cheating on standardized tests. This then turned into a story on cheating across the state. The Seattle Times piece of 2008, logging and landslides, revealed how a logging company was blatantly allowed to clear-cut unstable slopes. Not only did they produce an interactive, but the beauty of data journalism (which is becoming a trend) is to write about how the investigation was carried out using the requested data.

The Seattle Times: Landslides in the Upper Chehalis River Basin

Newspapers in the US are clearly beginning to realize that data is a commodity with which you can buy trust from your consumer. The need for speed seems to be diminishing as social media gets there first, and viewers turn to the web for richer information. News, in the sense of something new to you, is being condensed into 140-character alerts, newsletters, status updates and things that go bing on your mobile device. News companies are starting to think about news online as exploratory information that speaks to the individual (which is web 2.0). So The New York Times has mapped the census data in its project “Mapping America: Every City, Every Block”. The Los Angeles Times has also added crime data so that its readers are informed citizens, not just site surfers. My personal heroes are the investigative reporters at ProPublica who not only partner with mainstream news outlets for projects like Dollars for Doctors, they also blog about the new tools they’re using to dig the data. Proof the US is heading down the data mine is the fact that Pulitzer finalists for local journalism included a two-year data dig by the Las Vegas Sun into preventable medical mistakes in Las Vegas hospitals.

Lessons in Data Journalism:

Another sign that data journalism is on the up is the recent uptake at teaching centres for the next generation journalist. Here in the UK, City University has introduced an MA in Interactive Journalism which includes a module in data journalism. Across the pond, the US is again ahead of the game with Columbia University offering a dual master’s in Computer Science and Journalism. Voices from the journalism underground are now muttering terms like Google Refine, Ruby and ScraperWiki. O’Reilly Radar has talked about data journalism.

The beauty of the social and semantic web is that I can learn from the journalists working with data, the miners carving out the pathways I intend to follow. They share what they do. Big shot correspondents get a blog on the news site. Data journalists don’t, but they blog because they know that collaboration and information are the key to selling what it is they do (e.g. Anthony DeBarros, database editor at USA Today). They are still trying to sell damned good journalism to the media sector! Multimedia journalists for local news are getting it (e.g. David Higgerson, Trinity Mirror Regionals). Even grassroots community bloggers are at it (e.g. Joseph Stashko of Blog Preston). Looks like data journalism is working its way from the bottom up.

Back in Business:

Here are two interesting articles relating to the growing area of data and data journalism as a business. Please have a look: Data is the New Oil and News organizations must become hubs of trusted data in a market seeking (and valuing) trust.


Digital Editors Network


A two-day workshop for those who value turning data into stories with impact. Sign up here.


Thursday, 19 May at 9:30 AM – Friday, 20 May at 5:00 PM


100 Broadway
Media City
M50 2UW Salford


Because you’ll learn:

  • Collaboration tools for the newsroom team
  • Customizing search-and-retrieve data tools
  • Extracting data from documents
  • Data cleaning and formatting
  • An elementary introduction to scraping web sites for data
  • Using web “cloud” tools to clean and display data
  • Most important, how to tease meaning and STORIES out of data and then tell those stories in multiple ways

And the following will be there to teach you:

Actually being called Data Journalism that is.


It’s from and is headed by Kevin Anderson who I see at most of the events I attend.


An introduction to data journalism: Taming the numbers


Wednesday, 19th January at 10:00 am


Royal Society of Medicine

1 Wimpole Street

London W1G 0AE


Says it will cover:

  • The basics of data files and formats
  • Sources of data
  • How to collect your own data
  • Free and low-cost tools to analyse and visualise data
  • Editorial planning for data

Says it will leave you with an understanding of:

  • How to design news features with data in mind
  • How to extract data from PDFs
  • How to use Google Docs for data collection
  • How to visualise data, including charts, graphs and maps

That and it’s the first course I’ve found in London to actually call itself data journalism! I imagine the first port of call will be a definition.

Words are data also, and this video by Sir Ken Robinson shows how animation can make real-time visuals for stories which are information rich and picture poor. I really love this. Graphics, to a certain extent, make the story more understandable and much more entertaining.

And for all those interested in the topic of education (which should be everyone) here’s Sir Ken Robinson’s TED talk. His angle on how education has been structured to almost stifle creativity is, I think, particularly relevant to new journalism. Particularly data journalism, as the boundaries are being extended by those trying something new, who so could never take the form of teachers. You can’t teach something that will only take shape in a future no one can see.