Posts Tagged ‘visualization’


If you have been keeping an eye on my blog you’ll know I scraped Cabinet Office Spending data. Few journalists will look at the mountain of CSVs of government data. Even fewer can code well enough to scrape them, although a lot of them want to learn, and I believe that would go some way to solving the first problem. More news institutions are interested in using data to create visualizations for their users. Give them something to play with and they spend more time on your site. So I’ve created my first visual from scraped data. Click on the image to be taken to the view page (sadly WordPress can’t embed ScraperWiki views).

With help from the amazing Tom Mortimer-Jones and Ross Jones at ScraperWiki, I made a word cloud with a date slider for all the companies the Cabinet Office give money to. This is a realtime visualization (well, as realtime as the data release). Here is the scraper where you can download all the data. I refined the supplier names using Google Refine and you can see this result in the ‘Refined’ table. I made the word cloud in this view. I summed up all the spending for each supplier in the SQL query and used the logarithm of that total as the font size of the supplier’s name in the word cloud. The final view then calls up the word cloud for the date range selected on the date slider (code was nicked from this jQuery plugin) by plugging the request into the SQL query of the word cloud view. That might seem very confusing but this blog is for all my workings. The code is open so you can take a look. I am open source.
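For anyone who wants to see the sum-then-logarithm step spelled out, here is a minimal sketch in Python. It is not the code from the ScraperWiki view: the table name `refined`, the columns `supplier`, `date` and `amount`, and the `scale` and `min_size` parameters are all assumptions made for illustration, and dates are assumed to be stored as ISO strings (YYYY-MM-DD) so they compare correctly as text.

```python
import math
import sqlite3


def supplier_font_sizes(conn, start_date, end_date, scale=4.0, min_size=8.0):
    """Return {supplier: font size} for spending within a date range.

    Table and column names are hypothetical; adjust them to your own scraper.
    """
    rows = conn.execute(
        """
        SELECT supplier, SUM(amount) AS total
        FROM refined
        WHERE date BETWEEN ? AND ?
        GROUP BY supplier
        HAVING SUM(amount) > 0
        """,
        (start_date, end_date),
    ).fetchall()
    # The logarithm compresses the range, so a supplier paid a hundred times
    # more gets a name only a few points bigger, not a hundred times bigger.
    return {
        supplier: max(min_size, scale * math.log10(total))
        for supplier, total in rows
    }
```

The date slider would then only need to pass its two ends in as `start_date` and `end_date`; the query does the rest.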

I want this to be a preview of what is possible. All government bodies are now required by law to release spending data of over £25,000. That’s a lot of data from a lot of bodies. OpenSpending will be tackling this. My thoughts have been about trying to get journalists/bloggers/students learning to scrape. I figure the most useful type of scraping for journalists will be CSV scraping. So I want volunteers to take the journey I have done with this view and learn to scrape one spending dataset.
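To give a flavour of what CSV scraping involves, here is a rough sketch in Python. The column headings `Supplier`, `Date` and `Amount` and the day/month/year date format are assumptions; every body publishes its spending slightly differently, which is exactly why the scrapers break.

```python
import csv
import io
from datetime import datetime


def parse_amount(raw):
    """Turn a string like '£60,000.00' into the number 60000.0."""
    return float(raw.replace("£", "").replace(",", "").strip())


def parse_spending_csv(text):
    """Parse the text of a spending CSV into a list of clean records.

    Rows that cannot be parsed (totals, blank lines, notes) are skipped.
    """
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            records.append({
                "supplier": row["Supplier"].strip(),
                # Store dates as ISO strings so they sort and filter easily.
                "date": datetime.strptime(
                    row["Date"].strip(), "%d/%m/%Y"
                ).date().isoformat(),
                "amount": parse_amount(row["Amount"]),
            })
        except (KeyError, ValueError, AttributeError):
            # Missing column, unparseable value, or a short row.
            continue
    return records
```

Silently skipping bad rows keeps the sketch short. A resilient scraper would probably want to log them instead, so you can see when a body changes its format.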

If I get 20 such people to work together to build a resilient scraper from a template then they can learn from each other i.e. when one person’s scraper breaks because a new bug has been introduced, no doubt one of the other 19 volunteers has come across and dealt with that same bug in their learning process. So by maintaining a community of scrapers the community will be learning to scrape. And the community can do more with the data. For example, by adding category columns such as central government, health, work and pensions, etc, these can be used as filters for the visualization (and to interrogate the data).
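The category idea could be as simple as the sketch below. The `body` field, the lookup table and the category labels are all made up for illustration; the point is only that one extra column makes the data filterable.

```python
# Hypothetical mapping from the publishing body to a broad category.
CATEGORIES = {
    "Cabinet Office": "central government",
    "Department of Health": "health",
    "Department for Work and Pensions": "work and pensions",
}


def add_category(records, categories=CATEGORIES, default="uncategorised"):
    """Return copies of the records with a 'category' column added."""
    return [
        {**record, "category": categories.get(record["body"], default)}
        for record in records
    ]


def filter_by_category(records, category):
    """Keep only the records in one category, e.g. for a filter control."""
    return [record for record in records if record["category"] == category]
```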

It’s an idea for an experiment. I’ll let you know how I get on. In theory this view can be kept as up-to-date as the data!


So far I’ve called myself a data journalist. But then again Paris Hilton calls herself a business woman. From my previous post, you can see my interest growing. But I haven’t really done anything. I am trying to learn the skills. These skills aren’t actually known and neither is the job description. But the best way to learn is to do. So here’s something I actually worked on.

This is a visual made from the PDFs of the National Asset Register, which are about as inaccessible as it gets (both as data and journalistically). The information they contained was used for a Dispatches live debate and this repurposing was put into an article on the Channel 4 News website. I was fortunate enough to be part of the ScraperWiki team that took on the project and produced it in a matter of days. I have written a blog post on ScraperWiki here.

We also made a map of county council brownfield sites available for redevelopment which featured on Channel 4. I actually made the scraper for this data set, as it was held in Excel sheets by region on the Homes and Communities Agency website. The links to all the scrapers and code can be found on the ScraperWiki blogpost.

These show something of what ScraperWiki can do. I particularly like the fact that the bubbles link back to the data in the PDF. I think if you engage people in a data driven story with simple and effective visuals then they can consume the raw data, and possibly provide better insights.

The map is good in that it allows users to get to their local data, where it matters to them. Local information from a global story. Yet what tickled my journalistic senses is the use of feedback. I suggested to the ScraperWiki team that we always ask for feedback on what we do. We got responses on the story. Not great ones. One rather bluntly told us that a school was already being built on the land. The latest data available is only as recent as 2008.

But I engaged this user and found that the bluntness of the response was owing to the fact that the land was ‘once a well loved open space’. If he/she felt so strongly about this change of use, I suggested requesting the consultation documents from the council under the Freedom of Information Act and pointed in the direction of WhatDoTheyKnow. Part of ScraperWiki’s remit is building a data democracy and data driven journalism should go some way to promoting an information democracy. I think news organizations fear this as their revenue is linked to their role as information gatekeepers. But social media and the web are breaking down this ideology.

Power comes in breaking down information structures. I wanted to do more with the asset bubbles. Looking back, the orbits are connected to the bureaucratic structure of the data. Given enough time, I would have liked the visual to build an asset pyramid, where larger bubbles (those with assets of the highest value) would float to the top and lots of little bubbles would form the base of the pyramid. So by looking down you see the more asset-intensive areas of government which the country has invested in. But when you smash a bubble the components would then fall to their various levels. So by looking into the levels you see all the little areas (museums, barracks, hospitals, etc.) that equate to similar fixed asset investments. This would break through the departmental structure that was built into the chapters of the PDFs.

For that, I need to learn to code. So I’d better get back. Again, if I’m able to do anything of interest you’ll hear about it!

Words are data too, and this video by Sir Ken Robinson shows how animation can make real time visuals for stories which are information rich and picture poor. I really love this. Graphics, to a certain extent, make the story more understandable and much more entertaining.

And for all those interested in the topic of education (which should be everyone) here’s Sir Ken Robinson’s TED talk. His angle on how education has been structured to almost stifle creativity is, I think, particularly relevant to new journalism. Particularly data journalism, as the boundaries are being extended by those trying something new and so could never take the form of teachers. You can’t teach something that will only take shape in a future no one can see.