10 Things to Get Started
Although “data journalism” can encompass infographics, interactives, web apps, FOI, databases and a whole host of other numbering, coding, displaying techniques; the road less travelled-by has certain steps, turns and speed bumps. In that sense, here’s a list of things to tick off if you’re interested in going down the data journalism road:
- Know the legal boundaries – get to know the Data Protection Act 1998 and the sections on access to personal data and unstructured personal data held by authorities. Do not set foot on your journey without reading the section on exemptions relating to journalism. Use legislation as a reference by downloading the Mobile Legislate app.
- Look at data – get to know what is out there, what format it’s in and where it’s coming from. Places like Data.gov.uk, London Datastore, Office for National Statistics and Get the Data are good places to start for raw data but don’t forget, anything on the web is data. The best data are often hidden. Data can be text and pictures so even mining social media and catching the apps built from them can give you insight into what can be done with data.
- Read all about it – to make data and stats accessible you need to know how to frame them within a story. In that sense, you need to know how to undertand the stories they tell. That doesn’t mean going on a stats course. There are a lot of accessible reading material and I would recommend The Tiger That Isn’t.
- Get connected – find HacksHackers near you and join Meetup groups to point you in the right directions. Data journalists’ interests and abilities are unique to the individual (much like programmers) so don’t take text of advice as set in stone (the web changes too quickly for that!). Find your own way and your own set of people to guide you. Go to courses and conferences. Look outside the journalism bubble. Data is more than just news.
- Spread your bets – the easiest way to sort data is by using spreadsheets. Start with free options like Google Docs and OpenOffice. Industry standards include Microsoft Excel and Access. Learn to sort, filter and pivot. Find data you’re interested in and explore the data using your eyes balls. Know what each piece of software does and can do to the data before mashing it with another piece of software.
- Investigate your data – query it using the simple language SQL and the software MySQL. It’s a bit tricky to set up but by now you’ll know a hacker you can ask for help! Clean your data using Google Refine. There are tutorials and a help wiki. Know how these function not just how to navigate the user interfaces, as these will change. These products go through iterations much more quickly than the spreadsheet software.
- Map your data – from Google spreadsheets the easiest way to build a map is by using MapAList. There is a long list of mapping software from GeoCommons to ArcGIS. Find what’s easiest for you and most suitable for your data. See what landscapes can be revealed and hone in on areas of interest. Understand the limitations of mapping data, you’ll find devolution makes it difficult to get data for the whole of the UK and some postcodes will throw up errors.
- Make it pretty – visualize your data only once you fully understand it (source, format, timeframe, missing points, etc). Do not jump straight to this as visuals can be misleading. Useful and easy software solutions include Google Fusion Tables, Many Eyes and Tableau. Think of unique ways to present data by checking out what the graphics teams at news organizations have made but also what design sites such as Information is Beautiful and FlowingData are doing.
- Make your community – don’t just find one, build one. This area in journalism is constantly changing and for you to keep up you’ll need to source a custom made community. So blog and tweet but also source ready-made online communities from places like the European Journalism Centre, National Institute for Computer Assisted Reporting (NICAR), BuzzData andDataJournalismBlog.
- Scrape it – do not be constrained by data. Liberate it, mash it, make it useable. Just like a story, data is unique and bad data journalism comes with constraining the medium containing it. With code, there is no need to make the story ‘fit’ into the medium. “The Medium is the Message” (a la Marshall McLuhan). Scrape the data using ScraperWiki and make applications beyond story telling. Make data open. For examples check out OpenCorporates, Schooloscope and Planning Alerts. If you’re willing to give coding a try, this book called “Learning Python the Hard Way” is actually the easiest way to learn for the non-programmer. There is also a Google Group for Python Journalists you should join.
These are guidelines and not a map for your journey. Your beat, the data landscape, changes at the speed of web. You just need to be able to read the signs of the land as there’s no end point, no goal and no one to guide you.
Note: if you use third party software you need to understand their legal guidelines and take down policies. Your publication may fight the public interest case to use data from Wikileaks, for example, but the third party may not legally be able to do so and so could take down your visual/application.