Here are the videos from the Data Journalism stream at the Open Knowledge Conference this year held in Berlin featuring Mirko Lorenz, Simon Rogers and Caelainn Barr amongst others.




And just so you know, I will be heading back to Berlin at the end of September for the Knight-Mozilla Hackathon. I’m greatly looking forward to it, as I’ll be getting hands-on experience of quick-and-dirty platform building for the news. I’m also very excited about meeting some of the lab folk face to face. I will keep you posted and blog from a journo’s perspective on how I think this type of creativity is changing news.

I’ve come specifically to the Open Knowledge Conference for the track on data journalism (although I’m very interested in the open data scene anyway). It was a call to action more than an educational exposition. Data journalism has neither a set path nor a set definition, which is why a lot of the journalism falling under the term ‘data journalism’ consists of what are, underneath it all, very different species. In the same way, mathematics is composed of a range of disciplines, yet most people encounter it as one overarching topic.

I’m having an amazing time in Berlin and I’m sure I’ve consumed more than I can digest in terms of data. But here are some points I noted from the speakers Simon Rogers, Stefan Candea, Caelainn Barr, Liliana Bounegru and Mirko Lorenz, which I’ve added my thoughts to here:

1. “There needs to be defined long-term goals for data journalism training as the field has widened” – I believe that the different disciplines are becoming evident as tools with wider uses are being tinkered with (I wouldn’t go so far as to say adopted), more so than the field has widened. I do not believe in long-term goals either. To evolve into a specialist species one has to adapt to one’s environment. The data environment is now changing at web speed, which is far too fast for long-term goals.

2. “It’s about stories AND words – it’s just another source” – Old school journalism used to rely on a network of sources. Data journalism relies on a network of resources. So all journalism today should rely on a network of sources and resources working in tandem, working together, in sync. Old school journalism applies today just as it always did. You need to be able to read and rely on the validity of your sources. You need to understand their agenda and their limitations. In the same way you need to be able to do all these things with data and the resources you are working with.

3. “Data for journalists is a great resource but not the golden bullet” – I agree. The golden bullet is the journalistic mindset: the ability to spot something that isn’t right, that shouldn’t be. This is one characteristic, but with data journalism you’re using the other side of the brain. The ‘training’ that is needed is to learn to use your other, numerical side as a resource too. If you don’t have a well-tuned journalistic mindset you won’t be a good data journalist, and I fear this mindset is being left at the door when journalists approach data (especially when being trained), because using the left hemisphere of their brain is so alien to them that they feel they’re in a completely different microcosm.

4. “Not doing data journalism is not an option” – This was mentioned in reference to online journalism. I’m not sure I quite agree with this. I think there are a lot of institutions where doing data journalism isn’t an option. For future survival, you’d be amazed how much traffic can be generated by a saucy picture and a splashy headline. Combine it with a social-media-savvy policy and you’ll find the serious side of data journalism will easily go amiss. Most news institutions are doing some form of superficial data journalism in the form of infographics or interactives. JavaScript developers are quite common in the corner of the newsroom nearest the coffee machine, servers and exit. Social media has changed the way we view news, but this did not come from within the journalistic institution. The change will only be implemented from the inside if it is pushed from the outside. This is why I am interested in open data. This is where I see (and hope) a symbiosis will form.

At the Microsoft Centre for the latest News Rewired. The first thing that strikes me from the day is Joanna Geary pointing out that business strategy and journalistic code are now becoming one entity. Now that we can measure our audience, we feel we can control our revenue and drive it up. But eyeballs are people too, and when you start building relationships exciting things happen. This can become a serious business model. We can no longer judge our news sense by whether our editor thinks we’re right; it is now judged by how we service our community. We have to build loyalty around what we do, not who we are. Things start to change when you acknowledge people.

Building an Online Community from Scratch

Don’t replicate, try and encourage. Fill a niche, give the community something they can use, and link to everything else that is already being done. Online is not as intimidating as in person, so you can get closer and more personal with the people you work for. That means taking criticism and acting on it. Meeting each other face-to-face is important, so organise bloggers’ meetups and tweetups. [Ed Walker]

5 Things to Avoid: i) not having a clear objective, which will fragment your audience – you need to help the community do what it wants to do; ii) being obsessed with numbers – they are not directly related to influence and interaction; iii) broadcasting at your community – it’s not all about you; iv) making it about the technology – it’s about the people, and the important people in the community; v) not being a part of it yourself. [Neil Perkin]

99% of community attempts end in failure! That’s because the community already exists. You have to find it and find out how to connect all the pieces better. You need to ask yourself why you need the community, what’s it for, where are you going with it and how are you going to do it. Build a community using 4 C’s: connection, conversation, consultation and collaboration. Most importantly, every individual can do it. [Anthony Thornton]

You ultimately need to know when something is over and when you need to implement change to survive. A community is a fluid entity and what works today might not work tomorrow.

Branding and Entrepreneurialism

Entrepreneurs can teach journalists, as can bloggers and citizen journalists. Not just about their topic of interest but about business also. The best entrepreneurs are those who have been left out in the cold by traditional media. They know all about the old and have a need to embrace the new. Start with what you know and explore what you need that the mainstream does not provide. Look at the new habits you’ve adopted. Odds are there are more people like you. Put all your habits in one place. Online is about accessibility. Don’t employ people, it’s too expensive! Online is a place for personality, so a small team works. You need to think big about a small audience. The web is about the niche, but you can connect niches to build a web – this is a business with the possibility of big revenue. Think big or go home! [Rory Brown]

Even if you are an individual you need to build your own brand. Mobile devices can put the newsroom in the field. You can feed this live onto a site. The ability to be technical is the ultimate tool for the entrepreneurial journalist. You need to be visible online and offline. Identify key voices. A support network is vital. The internet democratises stories, and there’s a good business model to be made from good stories. [Alex Wood]

There’s a big difference between personal and personalised. If people aren’t taking ownership of your content, if they’re not using it on their own sites or in their own social networking spaces, then your content loses validity. Social media is a new tool, but it’s about what you can do with it. What do you have that people can connect with and feel there’s something substantial to? You need to be able to create conversation outside your own brand. People build their own spaces and they want to interact there, not in your space. There’s an interesting development between the digital and the personal world. People want to be themselves online, not split between brands and platforms. So collaborate and meet people offline. [Molly Flatt]

Linked Data and the Semantic Web

Linked data is not really linked in the sense we know, i.e. hyperlinks. And it’s not really data, in that it’s not all numbers. It’s about linking sociological processes that are governed by us, by our patterns and by the measures we need to make decisions on our behaviours. The web cannot distinguish these things. Linked data is meant to identify distinct entities so that the web may somehow be able to distinguish our behaviours in an intelligible way. It connects things in such a way that the web understands how they are connected. It’s trying to make the web more intelligent, more like how we think, in that our thinking is unique in the speed with which we can make distinctions. [Martin Moore]

Facts are sacred, bad data is sacrilege. Most time is spent putting crappy data into usable formats. What causes huge problems is semantics (Burma or Myanmar, Congo or Dem Rep Congo or DRC, Slough Council or Slough UA). COINS is an example of how not to release data. Just a load of CSV files with millions of items – completely unusable. A new role with data is to put information out there and see what we get back from our audience. A good example of structured datasets – the Iraq War Logs from Wikileaks. It’s not journalism in a traditional sense but it is journalism. Linked data tells a story. But you have to know how to look at data. It makes a practical difference day to day to how journalists do their jobs. [Simon Rogers]
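
The naming problem Simon Rogers describes (Burma or Myanmar, DRC or Dem Rep Congo) is the kind of clean-up that can be sketched in a few lines of Python. This is only a rough illustration, not how the Guardian does it: the alias table, the canonical labels and the `normalise` helper are all my own assumptions, and a bare "Congo" is ambiguous enough in real datasets that it would need a human check.

```python
# Hypothetical alias table. The variants come from the examples above;
# the canonical labels on the right are an assumption for this sketch.
ALIASES = {
    "burma": "Myanmar",
    "myanmar": "Myanmar",
    "congo": "DR Congo",          # ambiguous in practice, check by hand
    "dem rep congo": "DR Congo",
    "drc": "DR Congo",
    "slough council": "Slough UA",
    "slough ua": "Slough UA",
}


def normalise(name: str) -> str:
    """Return the canonical label for a name, or the trimmed name if unknown."""
    # Lower-case, drop full stops and collapse repeated whitespace
    # so that "Dem. Rep.  Congo" and "dem rep congo" produce the same key.
    key = " ".join(name.lower().replace(".", "").split())
    return ALIASES.get(key, name.strip())


if __name__ == "__main__":
    for raw in ["Burma", " DRC ", "Dem. Rep. Congo", "Slough Council", "France"]:
        print(repr(raw), "->", normalise(raw))
```

Anything not in the table passes through untouched, so the unknown names are easy to list and review rather than being silently changed.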

Content models rely on tags. Tags are entering into linked data. It means people can reuse content. Book reviews carry an ISBN so that the content API can be queried by ISBN. Artists have tags which are mapped to MusicBrainz IDs so people can link to what they know will be the music artists and not just gobbledygook. Data is published in XML and JSON. It’s about publishing more data in formats that are being used rather than coming up with new formats. Make sure you have the right license for re-use. You then get a great amount of engagement specifically around what you are doing. It’s still all about the story. [Martin Belam]
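
To make the idea of tagging content with a shared identifier a bit more concrete, here is a minimal Python sketch of looking up articles by ISBN. It does not call any real content API: the records, the field names and the `build_isbn_index` and `find_by_isbn` helpers are hypothetical, and the ISBNs are obvious placeholders rather than real books.

```python
from collections import defaultdict

# Hypothetical article records. Ids and ISBNs are placeholders, not real data.
ARTICLES = [
    {"id": "review-1", "type": "book-review", "isbn": "000-0-00-000000-1"},
    {"id": "review-2", "type": "book-review", "isbn": "0000000000001"},
    {"id": "review-3", "type": "book-review", "isbn": "000-0-00-000000-2"},
    {"id": "news-1", "type": "news"},  # no ISBN tag at all
]


def clean_isbn(isbn: str) -> str:
    """Strip hyphens and spaces so differently formatted ISBNs match."""
    return isbn.replace("-", "").replace(" ", "")


def build_isbn_index(articles):
    """Map each cleaned ISBN to the ids of the articles tagged with it."""
    index = defaultdict(list)
    for article in articles:
        if "isbn" in article:
            index[clean_isbn(article["isbn"])].append(article["id"])
    return index


def find_by_isbn(index, isbn: str):
    """Return the ids of all articles tagged with this ISBN (may be empty)."""
    return index.get(clean_isbn(isbn), [])


if __name__ == "__main__":
    index = build_isbn_index(ARTICLES)
    print(find_by_isbn(index, "000-0-00-000000-1"))
```

The point of the shared identifier is that two reviews of the same book end up under one key even when the tag was typed differently.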

Traditional publishing processes tend to struggle over time. You need to maximise the assets you have within traditional journalism. Once you get the tags sorted the model will handle all the data. Teach the model to infer and the story writes itself! Make a model that handles linking. When you have consistent and coherent linking you get very good SEO. The semantic web is doing for data what the web has done for documents. It creates a map for citizens to navigate society. [Silver Oliver]

Final Words

Digital news and data journalism are not a threat to mainstream journalism. They are the next step that needs to evolve to cope with the new technologies that are becoming societal, almost sentient in their embeddability within the social fabric. People demand usability, portability and individuality. The tech market has clearly provided these and created a new social space in which people prefer to communicate. Digital communication has become a part of our individual doxa. This has now become societal praxis. What is preventing the mainstream from evolving is the inflexible structure of business management and the difficulty in shifting workflow paradigms. The journalism workplace needs to look not at the technological functions but at technological praxis. [datamineruk]

This event is hosted by The Guardian. They say:

“The web not only gives easy access to billions of statistics on every matter – from MP’s expenses to the location of every public convenience in the UK – but also provides the tools to visualise said information, giving a clarity of voice and an equality of access to stories that pre-web could never have been told on such a scale.

But the data revolution has also brought with it the risk of confusion, misinterpretation and inaccessibility. How do you know where to look? What is credible or up to date? Official documents are often published as uneditable pdf files for example – useless for analysis except in ways already done by the organisation itself.”

This discussion will be chaired by an expert panel (people I know) consisting of David McCandless of ‘Information is Beautiful’ fame, Heather Brooke of FOI fame, Simon Rogers of Guardian DataBlog fame and Richard Pope of ScraperWiki fame.

Data journalism: our five point guide – Simon Rogers

None of this is new – need to visualize data to make a point. Table in the Guardian in May 1981 – data has always been around and has always been needed to know the truth. If you don’t know what’s going on, how can you change things in society?

Now, public spending visualizations. Beautiful but a lot of work. But then government requests it. Now we all have the tools. A lot doesn’t even involve hard core programming. Need to be inspired by telling stories. Story needs to drive the editorial need to use data.

Only computers will know what to ask e.g. Wikileaks data. Technical skills and design needed but can be built upon. Not all data is interesting. Need to have a nose for data to learn what will be good for a data driven story. Raw data is just numbers without the design to make it beautiful.

It’s about sharing. Data needs to be made as open as possible! People out there have much better knowledge than journalists sitting in the office. We need to harness that knowledge.

Information is Beautiful – David McCandless

You need to see patterns and connections that matter in the data. That is data journalism. You need to orientate your audience, take them on a journey.

Data is abstract. You need to contextualize to understand what it means. Need to make it relevant. If you make it beautiful/interesting everyone will love it. Looking at graph of most common break up time according to Facebook.

We’re saturated with data. Data is the new soil. Visualizations are the earthy blossoms!

We are saturated by data but if we use the right journalistic inkling we can grow beautiful stories. Our fears visualized using Google Insights. Check it out. Columbine shooting and violent video games co-dependent?

Data as a prism – use it to correct your vision. You can take all the other top ten military budgets and fit them into America’s. But it’s a vastly rich country: it can fit in all the other four top economies. So military budget as % of GDP? Myanmar is the biggest. Biggest army = China. But as % of population = North Korea.
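
The "prism" point, that the ranking flips depending on what you divide by, is easy to try yourself. Below is a minimal Python sketch. The three countries and every figure in it are made-up placeholders chosen only to show the effect, not real budgets, GDPs or troop numbers, and the `Country` class and `rank` helper are my own naming.

```python
from dataclasses import dataclass


@dataclass
class Country:
    name: str
    military_budget: float  # any currency unit, as long as it matches gdp
    gdp: float
    population: float       # millions of people
    troops: float           # millions of soldiers


# Placeholder figures, invented purely for illustration.
SAMPLE = [
    Country("A", military_budget=600, gdp=15000, population=300, troops=1.5),
    Country("B", military_budget=100, gdp=5000, population=1300, troops=2.2),
    Country("C", military_budget=5, gdp=20, population=25, troops=1.1),
]


def rank(countries, measure):
    """Return country names ordered from largest to smallest by `measure`."""
    return [c.name for c in sorted(countries, key=measure, reverse=True)]


if __name__ == "__main__":
    print("Raw budget:    ", rank(SAMPLE, lambda c: c.military_budget))
    print("Budget / GDP:  ", rank(SAMPLE, lambda c: c.military_budget / c.gdp))
    print("Army size:     ", rank(SAMPLE, lambda c: c.troops))
    print("Army / people: ", rank(SAMPLE, lambda c: c.troops / c.population))
```

With these placeholder numbers, country A tops the raw budget list, B has the biggest army, and tiny C comes first on both relative measures, which is the same kind of flip McCandless showed.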

The internet is a visualization design medium. We’ve been drenched in it. We’re constantly hunting for patterns in a sea of information. We’ve all been trained by our use of the web. We’re all information curious.

Heather Brooke

“The only way I could get answers to my questions to public bodies was through data”. Police in her local area were not turning up, and she wanted to know whether it was just her. The only way you could tell was through official logs and not their word.

Once you ask, data starts trickling out. But needed around 50 requests! And in the form of a complex spreadsheet. Riven with factual inaccuracies. Data is only as good and usable as the person who gathers/inputs it. “The public can’t be trusted with the raw data” – the attitude she got from public bodies. Need Freedom of Information Act.

Open data needs to start from the top – MPs’ expenses. A democratic state has a right to openness. We need true open data.

MPs’ expenses shifted everyone’s notion of who the government were actually working for. MPs felt their expenses were their data, not ours.

Simon Jefferies

Different structured forms are needed for different data. The structure gives it power. Data within data within context. Very rich stories. A new way of journalism. Allow users to interrogate data themselves. Information architecture!

You have to be sure your fact is right!

Richard Pope – ScraperWiki

Data is rarely useable for journalists. Data is rarely collected with journalists or the public interest in mind. ScraperWiki wants to make data useable and collaborative.

There’s a blending of skills needed to do data journalism. We need to democratise these skills to break a story.

These are early days but we can see that journalism is changing. A computer is another tool. When a journalist makes a call it’s not called ‘telephone-assisted-reporting’. It’s not new, we just need to learn to use more and more data. And we need to understand it.

This will not be a specialised area, it will just be reporting! It all comes down to asking the right questions.

Questions being tossed around the panel. Will go to Twitter and throw them out. Join in.

Numerical information is becoming more and more important in news reporting. This is not only due to interactive web abilities (I’ll write another post on this) but because big news is news of scale.

For instance, the floods in Pakistan are now being put into context with figures. The number of people suffering from the massive floods in Pakistan exceeds 13 million, more than the combined total of the 2004 Indian Ocean tsunami, the 2005 Kashmir earthquake and the 2010 Haiti earthquake, the United Nations said Monday (09/08/10). These figures are headline news even though the disaster itself is not new. We’ve just been given new context.

The scale of a disaster is usually revealed in the aftermath. The clean-up. That point in time when news cameras tend to move on. When donations tend to peter out. But putting an ongoing disaster in terms of recent ones, where images are fresh in people’s minds and scale is concretized in a pocket of their cerebral lobes, is an effective way of getting people to give and keeping the story in the news (which is obviously the UN’s agenda).

It’s a shame that news organizations don’t just do it on their own. It’s not that hard. In fact, I’m going to give it a try! But data of scale generally comes from press releases, as in this case. DIY data is the niche of a few good news organizations.

For instance, check out this visualization from the ever impressive Guardian Data Blog. Not only does it give a good comparison of the amount of money donated, it gives the funding per head of population. Because generosity is not just how much you give but how much of what you have that you can give. I also like the prettier (i.e. not just circles) Weather Crisis 2010 map.

Pictures truly paint a thousand words but interactives make 3D movies. And the best data is shared data! Thank you Simon Rogers.