Cabinet Office Spending Data Liberated

Posted: May 17, 2011 in Data Journalism, Good Data, My Data Journey

Seeing as I like to fly in the face of tradition, I’m going to turn things on their head and write a blog post about how I did it before I publish what “it” actually is. That is, I have scraped all the Cabinet Office spending data, cleaned it up and extracted it. But before I tell you what I’ve found (indeed, I haven’t got around to that properly yet!), I’m going to tell you how I found it.

Firstly, I scraped this page to pull out all the CSV files and put all the data in the ScraperWiki datastore. The scraper can be found here. It has over 1,200 lines of code but don’t worry, I did very little of the work myself!

Spending data is very messy, with trailing spaces, inconsistent capitals and various phenotypes. So I scraped the raw data first, which you can find in the “swdata” tab. I downloaded this and plugged it into Google Refine. I used the text facet functions to clean up the suppliers’ names as best I could (I figured these were of the most interest and would be the most suitable for cleaning).

Refine keeps a record of every cleaning step you apply, and you can get those steps out as code by going into the “Undo/Redo” tab and clicking on “Extract…”. Select the processes you want the code for, then copy the contents of the right-hand box. I pasted this prepackaged code into my scraper, so most of those 1,200 lines were written for me.
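To give a feel for what cleaning supplier names involves, here is a minimal Python sketch. It is not the code extracted from Refine and not what the scraper runs. It is loosely modelled on the "fingerprint" style of key that Refine's clustering can use to spot names that differ only in case, spacing, punctuation or word order. The function names and the sample supplier names are invented for illustration.

```python
import re
from collections import defaultdict


def fingerprint(name: str) -> str:
    """Build a comparison key: trim, lowercase, drop punctuation, sort unique words."""
    cleaned = re.sub(r"[^\w\s]", "", name.strip().lower())
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster_names(names):
    """Group raw names that share the same fingerprint."""
    clusters = defaultdict(list)
    for name in names:
        clusters[fingerprint(name)].append(name)
    return dict(clusters)


def canonical_names(names):
    """Map each raw name to the most common trimmed spelling in its cluster."""
    mapping = {}
    for variants in cluster_names(names).values():
        trimmed = [v.strip() for v in variants]
        # Most frequent spelling wins; on a tie, the first one seen is kept.
        best = max(trimmed, key=trimmed.count)
        for variant in variants:
            mapping[variant] = best
    return mapping


# Invented sample data: five messy spellings of one hypothetical supplier.
raw = [
    "Acme Supplies Ltd ",
    "ACME SUPPLIES LTD",
    "Acme Supplies Ltd",
    "acme supplies ltd.",
    "Supplies Acme Ltd",
]

for original, cleaned in canonical_names(raw).items():
    print(repr(original), "->", repr(cleaned))
```

A key like this only catches superficial differences. Abbreviations or misspellings still need a human eye, which is where Refine's interactive facets earn their keep.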

So if you want the cleaned data, make sure you select the “Refined” table by hitting its tab and then selecting “Download spreadsheet (CSV)”. If you want to use the amount as a numerical field (it was not put in as such in the original!), to get totals for each supplier for example, you’ll have to use the refined table, as I had to write code to get the “Amount” as numbers. Or if you know a bit of SQL and want to query the data from ScraperWiki, you can use my viewer, which can be found here.

Either way, here is the data. I have already found something of interest which I am chasing, but if you’re interested in data journalism, here is a data set to play with. Before I can advocate using, developing and refining the tools needed for data journalism, I need journalists (and anyone interested) to actually look at data. So before I say anything about what I’ve found, here are my materials plus the process I used to get them. Just let me know what you find and please publish it!
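If you would rather work from the raw table, the amounts have to be turned into numbers before they can be added up. The following is a rough Python sketch of that step, not the code in my scraper. It assumes the amounts arrive as text with pound signs and thousands separators, that negative values may be written in brackets, and that each row is a dictionary with “Supplier” and “Amount” keys. All of those are assumptions about the layout, and the sample rows are invented.

```python
from collections import defaultdict
from decimal import Decimal, InvalidOperation


def parse_amount(text):
    """Turn text such as '£1,234.50' or '(500.00)' into a Decimal; None if unparseable."""
    s = text.strip().replace("£", "").replace(",", "")
    # Accounting style: a value in brackets is treated as negative.
    negative = s.startswith("(") and s.endswith(")")
    if negative:
        s = s[1:-1]
    try:
        value = Decimal(s)
    except InvalidOperation:
        return None
    return -value if negative else value


def totals_by_supplier(rows):
    """Sum the parsed amounts per supplier, skipping rows that do not parse."""
    totals = defaultdict(Decimal)
    for row in rows:
        amount = parse_amount(row["Amount"])
        if amount is not None:
            totals[row["Supplier"]] += amount
    return dict(totals)


# Invented sample rows, not real Cabinet Office figures.
rows = [
    {"Supplier": "Supplier A", "Amount": "£1,234.50"},
    {"Supplier": "Supplier A", "Amount": " 765.50 "},
    {"Supplier": "Supplier B", "Amount": "(500.00)"},
    {"Supplier": "Supplier B", "Amount": "n/a"},
]

for supplier, total in totals_by_supplier(rows).items():
    print(supplier, total)
```

Decimal is used rather than float so that pounds and pence add up exactly.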

————————

Here is a table of the top 10 receivers of Cabinet Office money. I’ve put the image in here, but the original is a view that feeds off the scraper, so as new data gets published this table should update. So the information becomes living information, not a static visual. The story is being told, not catalogued.
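For anyone who wants to try something similar with a bit of SQL, a top 10 table boils down to one grouped query. The sketch below is not the code behind my view. It runs against an in-memory SQLite database standing in for the datastore, and it assumes the “Refined” table has a “Supplier” column alongside the numeric “Amount” (the “Supplier” column name is a guess). The rows are invented.

```python
import sqlite3

# Total spend per supplier, largest first, top 10 only.
TOP_SUPPLIERS_SQL = """
    SELECT Supplier, SUM(Amount) AS Total
    FROM Refined
    GROUP BY Supplier
    ORDER BY Total DESC
    LIMIT 10
"""


def top_suppliers(conn):
    """Return (supplier, total) pairs for the ten biggest recipients."""
    return conn.execute(TOP_SUPPLIERS_SQL).fetchall()


# Stand-in database with invented rows, not real spending figures.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Refined (Supplier TEXT, Amount REAL)")
conn.executemany(
    "INSERT INTO Refined VALUES (?, ?)",
    [
        ("Supplier A", 100.0),
        ("Supplier B", 250.0),
        ("Supplier A", 200.0),
        ("Supplier C", 50.0),
    ],
)

for supplier, total in top_suppliers(conn):
    print(supplier, total)
```

Because the query is run each time the table is drawn, any rows added to the table later show up without touching the code.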

Oh, and V is V inspired youth volunteering. They received nearly £44 million over a nine-month period. On their website they say they have received over £48 million from the private sector. I imagine £44 million of that has come straight from the Cabinet Office. The Big Society seems to be costing the government a lot of money at the moment, even though they say it will be mostly funded by the private sector.

Comments
  1. nicolahughes says:

    One simple and obvious thing to look for is which supplier received the most from the Cabinet Office. The answer is V, V inspired youth volunteering. They received nearly £44 million over a nine month period. On their website (http://vinspired.com) they say they have received over £48 million from the private sector. I imagine £44 million of that has come straight from the Cabinet Office. The Big Society seems to be costing the government a lot of money at the moment even though they say it will be mostly funded by the private sector.

  2. […] you have been keeping an eye on my blog you’ll know I scraped Cabinet Office Spending data. Few journalists will look at the mountain of CSVs on government data. Even fewer will code […]
