Posts Tagged ‘open data’

Seeing as tomorrow is Open Data Day, and I claim to be a Data Journalist (I think ‘JournoCoder’ is more suitable), here’s a little data food for journalistic thought.

RSS Feed of US Nuclear Reactor Events

Here is the site showing the US nuclear reactors power output status. Here is the scraper for that site written by ScraperWiki founder Julian Todd. Here is my script for catching the unplanned events and converting them to RSS format. And here is the URL you can use to subscribe to the feed yourself:

Oh, and a video (using the example to go through the ScraperWiki API)
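For flavour, the events-to-RSS step can be sketched in a few lines of Python using only the standard library. The event fields below (`unit`, `status`, `date`, `note`) are illustrative stand-ins, not the actual scraper’s schema:

```python
# Minimal sketch of converting scraped unplanned-event records into an
# RSS 2.0 feed. Field names and example data are invented for illustration.
from xml.etree import ElementTree as ET

def events_to_rss(events, feed_title="US Nuclear Reactor Events"):
    """Build an RSS 2.0 document (as a string) from a list of event dicts."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "description").text = "Unplanned reactor power changes"
    for ev in events:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"{ev['unit']}: {ev['status']}"
        ET.SubElement(item, "pubDate").text = ev["date"]
        ET.SubElement(item, "description").text = ev["note"]
    return ET.tostring(rss, encoding="unicode")

print(events_to_rss([{"unit": "Browns Ferry 1", "status": "0% power",
                      "date": "Fri, 04 Mar 2011 00:00:00 GMT",
                      "note": "Unplanned shutdown"}]))
```

Serve that string with a `application/rss+xml` content type and any feed reader can subscribe to it.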

The functionality that has set the web world ablaze, created whole industries and churned out billionaires from fiddlers of code is ‘social’. It’s even shaken Google to its core. ‘Social’ has also made news organisations think ‘digital’; however, the phoenix that will emerge from the burning embers of the newspaper industry is ‘open’. The functionality of Open Data will separate the losers from the winners in the digital news (r)evolution. Curation, aggregation and live coverage are all currently thrown into the mix, but no one overarching model has yet ignited the flames of public engagement.

So I want to talk about Open Data. But what is Open Data? The best I can offer you is the open definition from the Open Data Manual which reads: “Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and share alike.” For the best understanding of Open Data I would highly recommend you read a report by Marco Fioretti for the Laboratory of Economics and Management of Scuola Superiore Sant’Anna, Pisa entitled Open Data: Emerging trends, issues and best practices (2011).

This blog post is really about how this report highlights the need, duty and opportunity for news to become part of the Open Data movement and how, in my opinion, the news industry can be what Open Data needs to cultivate an ethos of information access amongst the public. The first thing the report turns to under “Social and political landscape” is news; big news stories which many organisations struggled to keep up with as they unfolded. These are the Spanish “Indignados”, the Arab Spring, the Fukushima nuclear accident and Cablegate. Whilst Marco admits that Wikileaks may have caused some hostility towards Open Data, he notes that:

…while certainly both Open Data and Wikileaks are about openness and transparency in politics, not only are there deep differences between the two ideas but, in our opinion, the Wikileaks experience proves the advantages of Open Data.

Fighting for transparency through organisations which exist on the outer fringes, or even outside, of the law creates just another veil of secrecy. Indeed, recent events regarding the leak of unredacted Wikileaks data show how corrosive forcibly breaking through the layers of data protection can be for any organisation. Many within the news industry admire (praise is too strong a word) Wikileaks’ cause and argue that if journalism were performing its intended function then there would be no need for a Wikileaks.

Which brings me back to the newsroom. Unlike the web, the newsroom is not structured to handle large streams of data. The big data stories in the UK have been the Iraq War Logs, Cablegate and MPs’ expenses. These have been stories because the existence of the data itself is a story. Big data dumps can make headlines, but the masses of data produced by the public sector daily need to be mined to find stories. Newsrooms don’t do that, because as a journalist you have to pitch the ‘story’ to your editor, not the content.

The news medium produces content for stories, not stories from content. But the web feeds off content in the form of data. And online social networks are bringing the content to the user directly. News organisations need to work with this content, this data, these facts in plain sight, as “unlike the content of most Wikileaks documents, Open Data are almost always data that should surely be open”, and therein lies your public service responsibility. In the case of the data story on EU structural funds by the Bureau of Investigative Journalism and the Financial Times, an Italian reporter who picked up the story, Luigi Reggi, writes:

The use of open, machine-processable and linked-data formats have unexpected advantages in terms of transparency and re-use of the data … What is needed today is the promotion among national and local authorities of the culture of transparency and the raising of awareness of the benefits that could derive from opening up existing data and information in a re-usable way.

What distinguishes Open Data from “mere” transparency is reuse

The Open Data movement has taken off. Of course a lot more needs to be done, but the awareness and realisation of the need to publish public information is born of the web and will die with the web (i.e. never). Marco states that “In practice, public data can be opened at affordable costs, in a useful and easily usable way, only if it is in digital format … When data are opened, the problem becomes to have everybody use them, in order to actually realise Open Government.”

The relationship between media and state means that the traditional media bodies (broadcast and print) should be the ones to take that place. Why? Because it requires an organisational structure, the one thing the web cannot give to citizen journalists. It can give us the tools (print, audio and video upload and curation) but it cannot provide us with the external structures (editorship, management, legal, time and expertise) needed to unearth news not just package it. News organisations need to mine the data because structures are needed to find the truth behind data as it is not transparent to the average citizen. News needs to provide the analysis, insight and understanding.

There is no automatic cause-effect relationship between Open Data and real transparency and democracy … while correct interpretation of public data from the majority of average citizens is absolutely critical, the current situation, even in countries with (theoretically) high alphabetization and Internet access rates, is one in which most people still lack the skills needed for such analysis … It is necessary that those who access Open Data are in a position to actually understand them and use them in their own interest.

So why is ‘open’ the new ‘social’? Because services that make data open make it useful and usable. Open Data is about open democracy and allowing communities to engage through digital services built around the ideas of openness and empowerment. News needs to get on board. But just as ‘social’ was an experiment which some got right, so getting Open Data right will be the deal breaker for digital news. Just take a look at some of these:

And I’m sure there are many more examples out there. I’m not saying news organisations have to do the same. Open Data, as you can see, is a global movement and, just as ‘social’ triggered the advance of the web industry into the news industry’s territory, so news should look to ‘open’ to claim some of that back.

#opendata from Open Knowledge Foundation on Vimeo.

You might be wondering what this short documentary has to do with journalism, or even what open data has to do with journalism. No doubt you are aware that journalism has been facing a ‘crisis’ for a while now. Not just because of the recession and shrinking advertising revenues, but because of the dominance of the web in getting information to people and allowing them to share it amongst themselves.

Open data activists are working with the web to provide information in a way people can engage with and ultimately feel empowered by. Projects like FixMyStreet and Schooloscope are emblematic of this rise in civic engagement projects. Indeed, crime mapping in San Francisco led to local citizens demanding more policing in areas of high crime and a change in the policing schedule to reflect the hours when crime is at its highest.

News used to have some responsibility in this area of engagement but never quite understood the field, or didn’t quite know what to do with it. Now they have lost control completely, and the masters of the web platforms are again taking informational control of a growing area of interest. But news organizations are missing a very important trick. Data-driven journalist Mirko Lorenz has written about how news organizations must become hubs of trusted data in a market seeking (and valuing) trust.

Which is why I think anyone interested in the area of data journalism should watch this documentary: not only should traditional media be training journalists to engage with this new stream of social and civic data, but managers and execs should think about the possible shift of the traditional media market away from advertising and towards the trust market.

And here’s what Tim Berners-Lee, inventor of the World Wide Web, said on the subject of data journalism:

Journalists need to be data-savvy… [it’s] going to be about poring over data and equipping yourself with the tools to analyse it and picking out what’s interesting. And keeping it in perspective, helping people out by really seeing where it all fits together, and what’s going on in the country

How the Media Handle Data:

Data has sprung onto the journalistic platform of late in the form of the Iraq War Logs (mapped by The Guardian), MPs’ expenses (bought by The Telegraph) and the leaked US Embassy cables (visualized by Der Spiegel). What strikes me about these big hitters is that the existence of the data is a story in itself. Which is why they had to be covered. And how they could be sold to an editor. These data events force the journalistic platform into handling large amounts of data. The leaks are stories, so there’s your headline before you even start looking for stories. In fact, the Fleet Street Blues blog pointed out the sorry lack of stories from such a rich source of data, noting the quick turn to headlines about Wikileaks and Assange.

Der Spiegel - The US Embassy Dispatches

So journalism so far has had to handle large data dumps, which has spurred on the area of data journalism. But they also serve to highlight the fact that the journalistic platform as yet cannot handle data. Not the steady stream of public data trickling out of government offices and public bodies. What has caught the attention of news organizations is social media. And that’s a steady stream of useful information. But again, all that’s permitted is some fancy graphics hammered out by programmers who are glad to be dealing with something more challenging than picture galleries (here’s an example of how CNN used Twitter data).

So infographics (see the Stanford project: Journalism in the Age of Data) and interactives (e.g. New York Times: A Peek into Netflix Queues) have been the keystones on which the journalism data platform is being built. But there are stories, and not just pictures, to be found in data. There are strange goings-on that need to be unearthed. And there are players outside of the newsroom doing just that.

How the Data Journalists Handle Data:

Data, before it was made sociable or leakable, was the beat of the computer-assisted reporters (CAR). They date as far back as 1989 with the setting up of the National Institute for Computer-Assisted Reporting in the States, which is soon to be followed by the European Centre for Computer Assisted Reporting. The French group OWNI are the latest (and coolest) revolutionaries when it comes to new-age journalism and are exploring the data avenues with aplomb. CAR then morphed into Hacks/Hackers when reporters realized that computers were tools that every journalist should use for reporting. There’s no such thing as telephone-assisted reporting. So some wacky journalists (myself now included) decided to pair up with developers to see what can be done with web data.

This now seems to be catching on in the newsroom. The Chicago Tribune has a data center, to name just one. In fact, the data center at the Texas Tribune drives the majority of the site’s traffic. Data journalism is growing alongside the growing availability of data and the tools that can be used to extract, refine and probe it. However, at the core of any data-driven story is the journalist. And what needs to be fostered now, I would argue, is the data nose of any journalist. Journalism, in its purest form, is interrogation. The world of data is an untapped goldmine and what’s lacking now is the data acumen to get digging. There are Pulitzers embedded in the data strata which can be struck with little use of heavy machinery. Data-driven journalism, and indeed CAR, has been around since long before social media, web 2.0 and even the internet. One of the earliest examples of computer-assisted reporting was in 1967, after riots in Detroit, when Philip Meyer used survey research, analyzed on a mainframe computer, to show that people who had attended college were just as likely to have rioted as high school dropouts. This turned the public’s attention to the pervasive racial discrimination in policing and housing in Detroit.

Where Data Fits into Journalism:

I’ve been looking at the States, where the broadsheets’ reputation for investigative journalism has produced some real gems. What struck me, looking at news data from across the Atlantic, is that data journalism has been seeded earlier and possibly more prolifically than in the UK. I’m not sure if it’s more established, but I suspect so (though not by a wide margin). For example, at the end of 2004 the Dallas Morning News analyzed the school test scores of the Texas Assessment of Knowledge and Skills and uncovered one school’s alleged cheating on standardized tests. This then turned into a story on cheating across the state. The Seattle Times piece of 2008, logging and landslides, revealed how a logging company was blatantly allowed to clear-cut unstable slopes. Not only did they produce an interactive, but the beauty of data journalism (and this is becoming a trend) is to write about how the investigation was carried out using the requested data.

The Seattle Times: Landslides in the Upper Chehalis River Basin

Newspapers in the US are clearly beginning to realize that data is a commodity with which you can buy trust from your consumers. The need for speed seems to be diminishing as social media gets there first, and viewers turn to the web for richer information. News, in the sense of something new to you, is being condensed into 140-character alerts, newsletters, status updates and things that go bing on your mobile device. News companies are starting to think about news online as exploratory information that speaks to the individual (which is web 2.0). So the New York Times has mapped the census data in its project “Mapping America: Every City, Every Block”. The Los Angeles Times has also added crime data so that its readers are informed citizens, not just site surfers. My personal heroes are the investigative reporters at ProPublica who not only partner with mainstream news outlets for projects like Dollars for Doctors, but also blog about the new tools they’re using to dig the data. Proof that the US is heading down the data mine is the fact that the Pulitzer finalists for local journalism included a two-year data dig by the Las Vegas Sun into preventable medical mistakes in Las Vegas hospitals.

Lessons in Data Journalism:

Another sign that data journalism is on the up is the recent uptake at teaching centres for the next generation of journalists. Here in the UK, City University has introduced an MA in Interactive Journalism which includes a module in data journalism. Across the pond, the US is again ahead of the game, with Columbia University offering a dual master’s in Computer Science and Journalism. Words from the journalism underground now include terms like Google Refine, Ruby and ScraperWiki. O’Reilly Radar has talked about data journalism.

The beauty of the social and semantic web is that I can learn from the journalists working with data, the miners carving out the pathways I intend to follow. They share what they do. Big-shot correspondents get a blog on the news site. Data journalists don’t, but they blog because they know that collaboration and information are the key to selling what it is they do (e.g. Anthony DeBarros, database editor at USA Today). They are still trying to sell damned good journalism to the media sector! Multimedia journalists for local news are getting it (e.g. David Higgerson, Trinity Mirror Regionals). Even grassroots community bloggers are at it (e.g. Joseph Stashko of Blog Preston). Looks like data journalism is working its way from the bottom up.

Back in Business:

Here are two interesting articles relating to the growing area of data and data journalism as a business. Please have a look: Data is the New Oil and News organizations must become hubs of trusted data in a market seeking (and valuing) trust.


This is a fringe event to the E-Campaigning Forum run by Fairsay. Rolf Kleef (Open for Change) and Tim Davies (Practical Participation) are co-ordinating the day in a voluntary capacity, with support from Javier Ruiz (Open Rights Group).


The Open Data Campaigning Camp will immediately follow the annual E-Campaigning Forum (#ECF11), so will be targeted particularly at campaigners interested in increasing their understanding of how to engage with open data. They also invite developers and data experts interested in exploring the connections between data and campaigning.


Thursday 24th March at 09:30


St Anne’s College


The day will start with an introduction to open data, the history of open data campaigning, and short presentations on finding and using data, and on publishing data, for advocacy and campaigning. Then there will be action-learning – with participants choosing projects to work on throughout the day – exploring open data for campaigning around key themes including:

  • International Development
  • Environment and Climate
  • Public Spending Cuts

Projects might include: designing a campaign using open data; building a data visualisation; creating a data campaigning toolkit for local activists; creating a data-driven mobile app for campaigning; publishing a dataset for campaigners to create mash-ups with; exploring and updating data catalogues; and whatever other ideas you bring along. The great thing is that there will be support on hand to introduce different ways of engaging with data. Sign up for it here.


I recently attended an Open Data Master Class and I would like to share my thoughts, not as an expert but as a novice looking in on the CAR/Hacks/Hackers/data journalism embryo. This is really a reflection on the nuggets of advice offered by some chieftains in the global village of data miners.

Open is suddenly cool – reflections on words by Dr. Hanif Rahemtulla (Nottingham University):

Data as it stands is not freely available; it’s not truly open, because only the people who know where to find it, how to use it and how to visualize it truly have access to it. Raw numbers are useless. If you don’t understand the nature of the data you can’t mediate it. Yet there is an ongoing movement towards open data: in the UK, US, Canada, Australia, New Zealand, Ireland, Norway and even Kenya. ‘Open’ is suddenly cool because of open source, and because government policies towards data have changed (not in regard to embassy cables, mind). Now we have a possible chain: from information to data to website to app. Data can be made local and carry all the relevance that locality allows (not just geographic but interest-based). If we can connect enough people in this chain who have the relevant knowledge and expertise then we could see government as a platform for ‘citizen-assembled data’. I’ve been concentrating on this assembly chain as I’m tied up with what’s bringing home the bacon. What was truly great about this data class was the practical aspect, which I hope to build upon. But the more I delve into the world of data, the more I begin to realize that it is not as tidy or elegant as I’d like it to be…

Linking is the future of open data – reflections on words by James Forrester (

Data can be made pretty now – think Information is Beautiful. But we need to create many pliable tools so that we can customize the view to the viewer and to the story itself (note I mean story and not data). Data needs to be editorialized. You can pull a lot of things out of data that aren’t true or worthwhile. But it’s detailed data that enables data miners to go further. Data needs to come with metadata. Nothing should be in PDF. Locked-up data doesn’t serve anyone. We have to know what we are dealing with before we can make it usable, and it’s this information that the government needs to be pressed to produce. They also need to find out what data all the various bodies and factions hold and put it in one place! Only then can the links with value be established.

Making a web of data – reflections on words by Tim Hodson (Talis)

The simple fact of the matter is that more tasks can now be done by machines, so making a web of data means not creating a different web or a new web but making the web we have better – scraping and linking. So once data is open, the government needs to make it machine readable. Those interested in data mining – revealing the gem – need to provide the context by linking. But merging databases is not fun, so there aren’t many people willing to go down the mines. The first step towards a web of data lies with the government, which needs to agree on formats and standards. And they have been consulting web developers and making the right moves. It’s the next links in the chain that are in our hands, and that I want to be part of.
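That database-merging drudgery boils down to matching records on a shared identifier. A toy sketch in Python, with invented datasets and column names, of linking two open datasets by key:

```python
# Toy sketch of linking two open datasets on a shared identifier.
# Dataset contents and column names are invented for illustration --
# this is the kind of join that agreed formats and standards make easy.
import csv
import io

schools = io.StringIO("school_id,name\nS1,Preston High\nS2,Mill Road Primary\n")
spending = io.StringIO("school_id,budget\nS1,1200000\nS2,450000\n")

# Index the first dataset by its identifier, then fold in the second.
by_id = {row["school_id"]: row for row in csv.DictReader(schools)}
for row in csv.DictReader(spending):
    by_id.setdefault(row["school_id"], {}).update(row)

linked = sorted(by_id.values(), key=lambda r: r["school_id"])
print(linked)
```

The whole exercise hinges on both publishers using the same identifier in the same format; when they don’t, the matching step becomes the unglamorous bulk of the work.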

Mind mapping – reflections on words by Chris Parker and Ian Holt (Ordnance Survey)

Publication is not the same as communication. Nearly all data is published but very little gets communicated. One of the simplest (I did it in a day) and best tools for communicating data is to map it. The world we live in is 4D, not 3D; there is an extra informational dimension. But information is an organic entity that grows very quickly and dies very quickly. Applying data to a digital dimension can solve real problems but can cost a hell of a lot of time and money to maintain. I was talking to a developer at the Data Revolution event and he pointed me to an app called Layar. We were talking about #gmp24 and he wondered if a world could exist where, if you saw an accident you wanted to report, you could check your phone to see if someone else had already called it in. No more ‘I assumed someone else had done it’ excuses.
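As a taste of how simple the mapping step can be: most web mapping tools ingest GeoJSON, and turning a table of points into it takes only a few lines. The incident data below is made up for illustration:

```python
# Minimal sketch: turning a table of located incidents into GeoJSON,
# a format most web mapping tools can ingest directly.
# The incident records here are invented for illustration.
import json

incidents = [
    {"lat": 53.76, "lon": -2.70, "type": "burglary"},
    {"lat": 53.75, "lon": -2.71, "type": "vehicle crime"},
]

geojson = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            # GeoJSON coordinates are [longitude, latitude], in that order.
            "geometry": {"type": "Point", "coordinates": [i["lon"], i["lat"]]},
            "properties": {"type": i["type"]},
        }
        for i in incidents
    ],
}
print(json.dumps(geojson))
```

Drop a file like this onto a mapping service and the "communication" half of publication is mostly done for you.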

I’m not sure what this new age of digital information will look like but I sure as hell am going to do my best to be part of it.

If you want a summary of everything that was said go here. Here is a link to the practical and some of the slides.

I’ll be at this event all day tomorrow so expect your weekly diet of data, journalism and Gov 2.0. Here’s the blurb:

“The past few months have seen a number of high profile announcements on the release of central and local government data for free. The Prime Minister launched the portal to ‘open up data and promote transparency’ and the London Mayor announced the London Data Store to ‘give Londoners the chance to find out more about how the city is run’. There is great excitement in the developer community and many new mash-ups and apps have been produced from the released data already.

The Horizon Digital Economy Research institute and the Centre for Geospatial Science at the University of Nottingham, in partnership with GeoVation, are proud to announce a FREE one-day Open Data Master Class at the Royal Geographical Society to reach a wide cross-section of people (i.e. individuals, communities, grassroots organizations, NGOs, civil servants and professionals) who can benefit from a greater understanding of the opportunities around open data. Specifically, the one-day master class will provide individuals with the tools and techniques needed to use and analyse a range of open datasets that are of relevance and interest to them, such as, for example, school census data, health care provision, crime statistics and transportation data.”

Here’s a preview of what I did during the practical but there were real gems of advice from all the speakers. The first point I’ll make is that this wasn’t a journalism event. There were a lot of government and corporation types. But these open data events will form the real bread and butter of what data journalists will have to deal with. I’ve spoken to the organizers and have arranged to be made aware of these events. I found a great pool of spatial analysis experts which I hope to tap into in order to generate meaty data driven stories. I’ll post an update when the presentations are put online. Until then head to Geocommons and have a play (and yes, the data needs cleaning!).
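And on that note about cleaning: a minimal sketch of the kind of normalisation almost every open dataset needs before it can be mapped or analysed. The example values are invented:

```python
# A taste of the cleaning step: real open datasets mix formats for the
# same field (stray whitespace, thousands separators, "n/a" placeholders).
# The raw values below are invented for illustration.
def clean_number(raw):
    """Normalise strings like ' 1,234 ', '56' or 'n/a' to an int or None."""
    s = raw.strip().replace(",", "")
    return int(s) if s.isdigit() else None

raw_column = [" 1,234 ", "56", "n/a", "789"]
print([clean_number(v) for v in raw_column])  # → [1234, 56, None, 789]
```

Multiply that by every column in every dataset and you see why "the data needs cleaning" is the unglamorous truth behind most data-driven stories.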