Posts Tagged ‘information is beautiful’

Although “data journalism” can encompass infographics, interactives, web apps, FOI, databases and a whole host of other numbering, coding and displaying techniques, the road less travelled has certain steps, turns and speed bumps. With that in mind, here’s a list of things to tick off if you’re interested in going down the data journalism road:

  1. Know the legal boundaries – get to know the Data Protection Act 1998 and the sections on access to personal data and unstructured personal data held by authorities. Do not set foot on your journey without reading the section on exemptions relating to journalism. Use legislation as a reference by downloading the Mobile Legislate app.
  2. Look at data – get to know what is out there, what format it’s in and where it’s coming from. The London Datastore, Office for National Statistics and Get the Data are good places to start for raw data, but don’t forget that anything on the web is data. The best data are often hidden. Data can be text and pictures, so even mining social media and checking out the apps built from them can give you insight into what can be done with data.
  3. Read all about it – to make data and stats accessible you need to know how to frame them within a story. In that sense, you need to know how to understand the stories they tell. That doesn’t mean going on a stats course. There is a lot of accessible reading material and I would recommend The Tiger That Isn’t.
  4. Get connected – find HacksHackers near you and join Meetup groups to point you in the right direction. Data journalists’ interests and abilities are unique to the individual (much like programmers), so don’t take any piece of advice as set in stone (the web changes too quickly for that!). Find your own way and your own set of people to guide you. Go to courses and conferences. Look outside the journalism bubble. Data is more than just news.
  5. Spread your bets – the easiest way to sort data is by using spreadsheets. Start with free options like Google Docs and OpenOffice. Industry standards include Microsoft Excel and Access. Learn to sort, filter and pivot. Find data you’re interested in and explore the data using your eyeballs. Know what each piece of software does and can do to the data before mashing it with another piece of software.
  6. Investigate your data – query it using the simple language SQL and the software MySQL. It’s a bit tricky to set up but by now you’ll know a hacker you can ask for help! Clean your data using Google Refine. There are tutorials and a help wiki. Know how these tools function, not just how to navigate the user interfaces, as these will change. These products go through iterations much more quickly than the spreadsheet software.
  7. Map your data – from Google spreadsheets the easiest way to build a map is by using MapAList. There is a long list of mapping software from GeoCommons to ArcGIS. Find what’s easiest for you and most suitable for your data. See what landscapes can be revealed and home in on areas of interest. Understand the limitations of mapping data: you’ll find devolution makes it difficult to get data for the whole of the UK and some postcodes will throw up errors.
  8. Make it pretty – visualize your data only once you fully understand it (source, format, timeframe, missing points, etc). Do not jump straight to this as visuals can be misleading. Useful and easy software solutions include Google Fusion Tables, Many Eyes and Tableau. Think of unique ways to present data by checking out what the graphics teams at news organizations have made but also what design sites such as Information is Beautiful and FlowingData are doing.
  9. Make your community – don’t just find one, build one. This area in journalism is constantly changing and for you to keep up you’ll need to source a custom-made community. So blog and tweet but also source ready-made online communities from places like the European Journalism Centre, National Institute for Computer Assisted Reporting (NICAR), BuzzData and DataJournalismBlog.
  10. Scrape it – do not be constrained by data. Liberate it, mash it, make it usable. Just like a story, data is unique, and bad data journalism comes from letting the medium that contains the data constrain it. With code, there is no need to make the story ‘fit’ into the medium. “The Medium is the Message” (a la Marshall McLuhan). Scrape the data using ScraperWiki and make applications beyond storytelling. Make data open. For examples check out OpenCorporates, Schooloscope and Planning Alerts. If you’re willing to give coding a try, the book “Learn Python the Hard Way” is actually the easiest way to learn for the non-programmer. There is also a Google Group for Python Journalists you should join.
These are guidelines and not a map for your journey. Your beat, the data landscape, changes at the speed of web. You just need to be able to read the signs of the land as there’s no end point, no goal and no one to guide you.
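To make steps 5 and 6 a little more concrete, here is a minimal sketch of sorting, filtering and pivoting a small table with SQL. It uses Python's built-in sqlite3 module rather than MySQL, purely because it needs no setup. The table name, column names and every figure in it are made up for illustration and do not come from any real dataset.

```python
import sqlite3

# Hypothetical rows: (area, year, category, amount).
# These values are invented for illustration only.
ROWS = [
    ("North", 2010, "transport", 120),
    ("North", 2011, "transport", 150),
    ("South", 2010, "transport", 90),
    ("South", 2011, "housing", 200),
    ("North", 2011, "housing", 80),
]


def load(rows):
    """Create an in-memory database and load the rows into one table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE spending "
        "(area TEXT, year INTEGER, category TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO spending VALUES (?, ?, ?, ?)", rows)
    return conn


def sort_by_amount(conn):
    """Sort: largest amounts first."""
    return conn.execute(
        "SELECT area, year, category, amount FROM spending "
        "ORDER BY amount DESC"
    ).fetchall()


def filter_by_year(conn, year):
    """Filter: keep only the rows for one year."""
    return conn.execute(
        "SELECT area, category, amount FROM spending "
        "WHERE year = ? ORDER BY area, category",
        (year,),
    ).fetchall()


def pivot_totals(conn):
    """Pivot (in the spreadsheet sense): total amount per area."""
    return conn.execute(
        "SELECT area, SUM(amount) FROM spending "
        "GROUP BY area ORDER BY area"
    ).fetchall()


conn = load(ROWS)
print(sort_by_amount(conn)[0])
print(filter_by_year(conn, 2011))
print(pivot_totals(conn))
```

The same three queries should carry over to MySQL with little or no change, which is the point of learning what the tools do rather than where the buttons are.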

This event is hosted by The Guardian. They say:

“The web not only gives easy access to billions of statistics on every matter – from MP’s expenses to the location of every public convenience in the UK – but also provides the tools to visualise said information, giving a clarity of voice and an equality of access to stories that pre-web could never have been told on such a scale.

But the data revolution has also brought with it the risk of confusion, misinterpretation and inaccessibility. How do you know where to look? What is credible or up to date? Official documents are often published as uneditable pdf files for example – useless for analysis except in ways already done by the organisation itself.”

This discussion will be chaired by an expert panel (people I know) consisting of David McCandless of ‘Information is Beautiful’ fame, Heather Brooke of FOI fame, Simon Rogers of Guardian DataBlog fame and Richard Pope of ScraperWiki fame.

Data journalism: our five point guide – Simon Rogers

None of this is new – we have always needed to visualize data to make a point. Table in the Guardian in May 1981 – data has always been around and has always been needed to know the truth. If you don’t know what’s going on, how can you change things in society?

Now, public spending visualizations. Beautiful but a lot of work. But then government requests it. Now we all have the tools. A lot doesn’t even involve hard core programming. Need to be inspired by telling stories. Story needs to drive the editorial need to use data.

Only computers will know what to ask e.g. Wikileaks data. Technical skills and design needed but can be built upon. Not all data is interesting. Need to have a nose for data to learn what will be good for a data driven story. Raw data is just numbers without the design to make it beautiful.

It’s about sharing. Data needs to be made as open as possible! People out there have much better knowledge than journalists sitting in the office. We need to harness that knowledge.

Information is Beautiful – David McCandless

You need to see patterns and connections that matter in the data. That is data journalism. You need to orientate your audience, take them on a journey.

Data is abstract. You need to contextualize to understand what it means. Need to make it relevant. If you make it beautiful/interesting everyone will love it. Looking at graph of most common break up time according to Facebook.

We’re saturated with data. Data is the new soil. Visualizations are the earthy blossoms!

We are saturated by data but if we use the right journalistic inkling we can grow beautiful stories. Our fears visualized using Google Insights. Check it out. Are the Columbine shooting and violent video games co-dependent?

Data as a prism – use it to correct your vision. Can take all the other top ten military budgets and fit them into America’s. But it’s a vastly rich country; it can fit in all the other four top economies. So military budget as % of GDP? Myanmar is the biggest. Biggest army = China. But as % of population = North Korea.

The internet is a visualization design medium. We’ve been drenched in it. We’re constantly hunting for patterns in a sea of information. We’ve all been trained by our use of the web. We’re all information curious.
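The “prism” idea is really just normalization: rank by the raw number, then rank again by the number divided by something that gives it context. A rough sketch of that below. The countries are deliberately anonymous and all of the figures are hypothetical placeholders, not real budgets, GDPs or army sizes.

```python
from typing import Callable, List, NamedTuple


class Country(NamedTuple):
    name: str
    military_budget: float  # any currency unit, as long as it matches gdp
    gdp: float
    soldiers: int
    population: int


# Hypothetical placeholder figures, chosen only to show how rankings flip.
SAMPLE = [
    Country("Country A", 600.0, 15000.0, 1_500_000, 300_000_000),
    Country("Country B", 100.0, 5000.0, 2_000_000, 1_000_000_000),
    Country("Country C", 5.0, 20.0, 1_000_000, 25_000_000),
]


def rank(countries: List[Country], key: Callable[[Country], float]) -> List[str]:
    """Return country names ordered from largest to smallest by `key`."""
    return [c.name for c in sorted(countries, key=key, reverse=True)]


# Raw totals versus the same numbers put in context.
by_budget = rank(SAMPLE, lambda c: c.military_budget)
by_budget_share = rank(SAMPLE, lambda c: c.military_budget / c.gdp)
by_army = rank(SAMPLE, lambda c: c.soldiers)
by_army_share = rank(SAMPLE, lambda c: c.soldiers / c.population)

print("Biggest budget:        ", by_budget)
print("Budget as share of GDP:", by_budget_share)
print("Biggest army:          ", by_army)
print("Army as share of pop.: ", by_army_share)
```

With these made-up numbers the leader changes each time the denominator changes, which is the same effect as in the talk: the story depends on what you divide by.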

Heather Brooke

“The only way I could get answers to my questions to public bodies was through data”. Police in her local area were not turning up and she wanted to know whether it was just her. Only way you could tell was through official logs and not their word.

Once you ask, data starts trickling out. But needed around 50 requests! And in the form of a complex spreadsheet. Riven with factual inaccuracies. Data is only as good and usable as the person who gathers/inputs it. “The public can’t be trusted with the raw data” – the attitude she got from public bodies. Need Freedom of Information Act.

Open data needs to start from the top – MPs’ expenses. A democratic state has a right to openness. We need true open data.

MPs’ expenses shifted everyone’s notion of who the government were actually working for. MPs felt their expenses were their data, not ours.

Simon Jefferies

Different structured forms are needed for different data. The structure gives it power. Data within data within context. Very rich stories. A new way of journalism. Allow users to interrogate data themselves. Information architecture!

You have to be sure your fact is right!

Richard Pope – ScraperWiki

Data is rarely usable for journalists. Data is rarely collected with journalists or the public interest in mind. ScraperWiki wants to make data usable and collaborative.

There’s a blending of skills needed to do data journalism. We need to democratise these skills to break a story.

These are early days but we can see that journalism is changing. A computer is another tool. When a journalist makes a call it’s not called ‘telephone-assisted-reporting’. It’s not new, we just need to learn to use more and more data. And we need to understand it.

This will not be a specialised area, it will just be reporting! It all comes down to asking the right questions.

Questions being tossed around the panel. Will go to Twitter and throw them out. Join in.