Posts Tagged ‘datajournalism’

Although “data journalism” can encompass infographics, interactives, web apps, FOI, databases and a whole host of other numbering, coding and display techniques, the road less travelled by has certain steps, turns and speed bumps. In that sense, here’s a list of things to tick off if you’re interested in going down the data journalism road:

  1. Know the legal boundaries – get to know the Data Protection Act 1998 and the sections on access to personal data and unstructured personal data held by authorities. Do not set foot on your journey without reading the section on exemptions relating to journalism. Use legislation as a reference by downloading the Mobile Legislate app.
  2. Look at data – get to know what is out there, what format it’s in and where it’s coming from. Places like the London Datastore, the Office for National Statistics and Get the Data are good places to start for raw data, but don’t forget: anything on the web is data. The best data are often hidden. Data can be text and pictures, so even mining social media and catching the apps built from them can give you insight into what can be done with data.
  3. Read all about it – to make data and stats accessible you need to know how to frame them within a story. In that sense, you need to know how to understand the stories they tell. That doesn’t mean going on a stats course. There is a lot of accessible reading material, and I would recommend The Tiger That Isn’t.
  4. Get connected – find HacksHackers near you and join Meetup groups to point you in the right directions. Data journalists’ interests and abilities are unique to the individual (much like programmers), so don’t take any piece of advice as set in stone (the web changes too quickly for that!). Find your own way and your own set of people to guide you. Go to courses and conferences. Look outside the journalism bubble. Data is more than just news.
  5. Spread your bets – the easiest way to sort data is by using spreadsheets. Start with free options like Google Docs and OpenOffice. Industry standards include Microsoft Excel and Access. Learn to sort, filter and pivot. Find data you’re interested in and explore it using your eyeballs. Know what each piece of software does and can do to the data before mashing it with another piece of software.
  6. Investigate your data – query it using the simple language SQL and the software MySQL. It’s a bit tricky to set up, but by now you’ll know a hacker you can ask for help! Clean your data using Google Refine. There are tutorials and a help wiki. Know how these function, not just how to navigate the user interfaces, as these will change. These products go through iterations much more quickly than the spreadsheet software.
  7. Map your data – from Google spreadsheets the easiest way to build a map is by using MapAList. There is a long list of mapping software, from GeoCommons to ArcGIS. Find what’s easiest for you and most suitable for your data. See what landscapes can be revealed and home in on areas of interest. Understand the limitations of mapping data: you’ll find devolution makes it difficult to get data for the whole of the UK, and some postcodes will throw up errors.
  8. Make it pretty – visualize your data only once you fully understand it (source, format, timeframe, missing points, etc). Do not jump straight to this as visuals can be misleading. Useful and easy software solutions include Google Fusion Tables, Many Eyes and Tableau. Think of unique ways to present data by checking out what the graphics teams at news organizations have made but also what design sites such as Information is Beautiful and FlowingData are doing.
  9. Make your community – don’t just find one, build one. This area of journalism is constantly changing and, to keep up, you’ll need to source a custom-made community. So blog and tweet, but also source ready-made online communities from places like the European Journalism Centre, the National Institute for Computer Assisted Reporting (NICAR), BuzzData and DataJournalismBlog.
  10. Scrape it – do not be constrained by data. Liberate it, mash it, make it usable. Just like a story, data is unique, and bad data journalism comes from being constrained by the medium containing it. With code, there is no need to make the story ‘fit’ into the medium. “The medium is the message” (à la Marshall McLuhan). Scrape the data using ScraperWiki and make applications beyond storytelling. Make data open. For examples, check out OpenCorporates, Schooloscope and Planning Alerts. If you’re willing to give coding a try, the book “Learning Python the Hard Way” is actually the easiest way to learn for the non-programmer. There is also a Google Group for Python Journalists you should join.
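To give a flavour of step 6, here is a minimal sketch of querying data with SQL. It uses Python’s built-in sqlite3 module rather than MySQL purely so it runs with no setup; the council names and spending figures are invented for illustration:

```python
import sqlite3

# Build a tiny in-memory database -- no server setup needed.
# All figures below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spending (council TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO spending VALUES (?, ?, ?)",
    [("Cardiff", 2010, 120.5),
     ("Cardiff", 2011, 98.0),
     ("Swansea", 2010, 75.25),
     ("Swansea", 2011, 110.0)],
)

# Filter and aggregate -- the SQL equivalent of a spreadsheet pivot.
rows = conn.execute(
    "SELECT council, SUM(amount) FROM spending GROUP BY council ORDER BY council"
).fetchall()
for council, total in rows:
    print(f"{council}: {total}")
```

The same GROUP BY pattern works unchanged in MySQL once you have it set up; sqlite3 is just the quickest way to practise the query language itself.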
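And a sketch of what step 10 means in practice: pulling a data table out of a web page. This uses only Python’s standard-library HTML parser, with the page inlined so the example runs offline; in a real scraper you would fetch the page with urllib (or run it on ScraperWiki). The school names and ratings are made up:

```python
from html.parser import HTMLParser

# A made-up page fragment, inlined so the example runs offline.
HTML = """
<table>
  <tr><td>Ysgol y Wern</td><td>2</td></tr>
  <tr><td>St Teilo's High</td><td>5</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    """Collects each <tr> as a list of its <td> cell texts."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_td = [], None, False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self._row.append(data.strip())

scraper = TableScraper()
scraper.feed(HTML)
print(scraper.rows)
```

Once the rows are in a Python list, they can go straight into a spreadsheet, a SQL table or a map.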
These are guidelines, not a map for your journey. Your beat, the data landscape, changes at the speed of the web. You just need to be able to read the signs of the land, as there’s no end point, no goal and no one to guide you.

So I say this is a journey; I say this is an experiment. In that case, this blog is my journal/lab book. I am coming up to the end of the Knight-Mozilla Learning Lab and you’ll read my proposal shortly. Although it wasn’t coding-led, it did highlight the importance of coding for the future of journalism, and I was very happy to be invited to take part as it brought me back to my student days of lectures and homework. I’m starting to learn to code from the bottom up, and UX and design are more top-end programming. I wanted to leave that until later, but the opportunity arose sooner.

Opportunities: that’s something I haven’t been short of of late. Conferences, webinars and workshops. I am now giving them rather than just partaking. I’ve had to turn down an invite to the NUJ seminars. My blog has gone from sink to source. The top two search terms are ‘datamineruk’ and ‘data miner uk’. I never planned for any of this, nor the 1,000+ Twitter followers. This was all supposed to be an internal tool, for learning. But learning has turned to teaching, and teaching turns to knowing. And knowing is what turns you from a sink to a source. But the cycle must continue. And that’s my big concern. That’s why I am attracted to data journalism. You can never know it all.

Maybe that’s what deters traditional journalists from the niche. Or maybe they don’t feel learning is part of the job. I am a typical nerd. I need to study. But I’m disillusioned with institutional education, so I’m setting up my own coursework with no real qualification other than satisfaction. For me, it’s about the knowing, and my downfall is being able to mediate that. I come from a strong scientific background where discovery is my strong point and mediation my sore point. My communication was good when I was in science, but in journalism it was used as a stick to beat me. I didn’t see the point of making stories ‘punchy’ when they came from press releases.

I find myself now a data journalism advocate, which puts me in the limelight more than I had intended. I haven’t put my picture or CV on this site for a reason. It’s not about me. However, a really lovely journalism graduate interviewed me for her blog, DataJournalismBlog, so here I am:

It’s a very good site. I recommend you join. For most people who get into broadcast journalism, it’s all about them and getting their face known. For me, it was because I am more articulate in conversation than in writing, as I’m used to academic writing. That, and I liked the constructive nature of filming and editing. Although I am enjoying that aspect of programming more, and data mining using ScraperWiki. Or ‘the pursuit of facts in plain sight’ as I now like to call it, thanks to Evan Hansen, Chief Editor of

That being said, the Knight-Mozilla Learning Lab has taught me some great lessons that apply to journalism as much as code. Chris Heilmann, Mozilla Developer Evangelist, said “The web is amazing but where is the amazing?”; the same can be said for journalism. Jesse James Garrett, co-founder of Adaptive Path, said “Good design has human experience as the outcome and engagement as the goal”; so should journalism. Oliver Reichenstein, CEO of iA, said “Really understand what you need to do. If you don’t you can’t work” in terms of prototyping, but the same can be said for journalism. Echoing my blog, Mohamed Nanabhay, Head of Online at Al Jazeera English, said “Any [news] technology project should solve problems journalists have, even [ones] they don’t know they have”. Reflecting my life mantra, Shazna Nessa, Director of Interactive at the Associated Press, said “Frustration is part of the challenge, don’t let it poison your mission”.

Building a new product, working for a new business, exploring a new area of journalism means taking risks. I like taking risks. If you don’t take risks you can’t get lucky.

#opendata from Open Knowledge Foundation on Vimeo.

You might be wondering what this short documentary has to do with journalism, or even what open data has to do with journalism. No doubt you are aware that journalism has been facing a ‘crisis’ for a while now: not just because of the recession and shrinking advertising revenue, but because of the dominance of the web in getting information to people and allowing them to share it amongst themselves.

Open data activists are working with the web to provide information in a way people can engage with and ultimately feel empowered by. Projects like FixMyStreet and Schooloscope are emblematic of this rise in civic engagement projects. Indeed, crime mapping in San Francisco led to local citizens demanding more policing in areas of high crime and a change in the policing schedule to reflect the hours when crime is at its highest.

News organizations used to have some responsibility in this area of engagement but never quite understood the field, or didn’t know quite what to do with it. Now they have completely lost control, and the masters of the web platforms are again taking informational control of a growing area of interest. But news organizations are missing a very important trick. Data-driven journalist Mirko Lorenz has written on how news organizations must become hubs of trusted data in a market seeking (and valuing) trust.

Which is why I think anyone interested in data journalism should watch this documentary: not only should traditional media be training journalists to engage with this new stream of social and civic data, but managers and execs should think about the possible shift of the traditional media market away from advertising and towards the trust market.




Hacks and Hackers Hack Day
Friday, 25 March
Viewing Theatre, Pacific Quay
Just read up on how #hhhCar went on the ScraperWiki blog. There were schools from space and a catering college with a Food Hygiene Standard rating of 2!




A Hacks and Hackers Hack Day – where journalists (hacks) and developers (hackers) spend the day working with data and scraping away to reveal what lies within.

Friday, 11th March at 09:00 am
Cardiff School of Creative & Cultural Industries
University of Glamorgan, Adam Street
Cardiff CF24 2FN
Come along and see! It’s an opportunity to work in a team you’ve probably never worked with before so you’d be amazed what you can make. Oh, and there’ll be prizes for the best projects!

So far I’ve called myself a data journalist. But then again, Paris Hilton calls herself a businesswoman. From my previous post, you can see my interest growing. But I haven’t really done anything: I am trying to learn the skills. These skills aren’t clearly defined, and neither is the job description. But the best way to learn is to do. So here’s something I actually worked on.

This is a visual made from the most inaccessible (both in data and journalistic terms) PDFs of the National Asset Register. The information it contained was used for a Dispatches live debate, and this repurposing was put into an article on the Channel 4 News website. I was fortunate enough to be part of the ScraperWiki team that took on the project and produced it in a matter of days. I have written a blog post on ScraperWiki here.

We also made a map of county council brownfield sites available for redevelopment, which featured on Channel 4. I actually made the scraper for this data set, as it was contained in Excel sheets by region on the Homes and Communities Agency website. The links to all the scrapers and code can be found on the ScraperWiki blog post.

These show something of what ScraperWiki can do. I particularly like the fact that the bubbles link back to the data in the PDF. I think if you engage people in a data-driven story with simple and effective visuals, then they can consume the raw data and possibly provide better insights.

The map is good in that it allows users to get to their local data, where it matters to them. Local information from a global story. Yet what tickled my journalistic senses was the use of feedback. I suggested to the ScraperWiki team that we always have a feedback channel for what we do. We got responses to the story. Not great ones. One rather bluntly told us that a school was already being built on the land; the latest data available is only as recent as 2008.

But I engaged this user and found that the bluntness of the response was owing to the fact that the land was ‘once a well loved open space’. As he/she felt so strongly about this change of use, I suggested requesting the consultation documents from the council under the Freedom of Information Act and pointed them in the direction of WhatDoTheyKnow. Part of ScraperWiki’s remit is building a data democracy, and data-driven journalism should go some way to promoting an information democracy. I think news organizations fear this, as their revenue is linked to their role as information gatekeepers. But social media and the web are breaking down this ideology.

Power comes in breaking down information structures. I wanted to do more with the asset bubbles. Looking back, the orbits are connected to the bureaucratic structure of the data. Given enough time, I would have liked the visual to build an asset pyramid, where larger bubbles (those with assets of the highest value) would float to the top and lots of little bubbles would form the base of the pyramid. So by looking down you would see the more asset-intense areas of government which the country has invested in. But when you smash a bubble, the components would then fall to their various levels. So by looking into the levels you would see all the little areas (museums, barracks, hospitals, etc.) that equate to similar fixed-asset investments. This would break through the departmental structure that was built into the chapters of the PDFs.
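The asset-pyramid idea can be sketched in a few lines of Python: sort the assets by value and bucket them into tiers, so the highest-value holdings form the top of the pyramid regardless of which department they sit in. The asset names, figures and tier thresholds below are all invented for illustration:

```python
# A sketch of the asset-pyramid idea: bucket assets into value tiers
# so the highest-value holdings form the top of the pyramid.
# All names, figures (in GBP millions) and thresholds are invented.
assets = [
    ("Barracks A", 950), ("Museum B", 40), ("Hospital C", 900),
    ("Depot D", 35), ("Gallery E", 8), ("Archive F", 12),
]

def tier(value):
    """Assign an asset to a pyramid level by its value."""
    if value >= 500:
        return "top"
    if value >= 20:
        return "middle"
    return "base"

# Sort by value, largest first, then drop each asset into its tier.
pyramid = {"top": [], "middle": [], "base": []}
for name, value in sorted(assets, key=lambda a: a[1], reverse=True):
    pyramid[tier(value)].append(name)

for level in ("top", "middle", "base"):
    print(level, pyramid[level])
```

Grouping by value rather than by department is the whole point: the tiers cut straight across the chapter structure the PDFs imposed.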

For that, I need to learn to code. So I’d better get back to it. Again, if I’m able to do anything of interest you’ll hear about it!