Seeing as I like to fly in the face of tradition, I’m going to turn things on their head and write a blog post on how I did it before I publish what “it” actually is. That is, I have scraped all the Cabinet Office spending data, cleaned it up and extracted it. But before I tell you what I’ve found (indeed, I haven’t got around to that properly yet!), I’m going to tell you how I found it.
Firstly, I scraped this page to pull out all the CSV files and put all the data in the ScraperWiki datastore. The scraper can be found here. It has over 1,200 lines of code but don’t worry, I did very little of the work myself! Spending data is very messy, with trailing spaces, inconsistent capitals and various spellings of the same name. So I scraped the raw data, which you can find in the “swdata” tab. I downloaded this and plugged it into Google Refine. I used the text facet functions to clean up the suppliers’ names as best I could (I figured these were of the most interest and would be the most suitable for cleaning). Refine then lets you export the cleaning steps you have applied as code: go into the “Undo/Redo” tab and click on “Extract…”. Select the processes you want the code for, then copy the contents of the right-hand box. I pasted this prepackaged code into my scraper.
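To give a feel for what that prepackaged code does, here is a minimal Python sketch of applying Refine’s extracted operations to scraped rows. It is not the code from my scraper. The JSON excerpt is a made-up example in the shape Refine produces for a “mass edit”, and the column name “Supplier”, the supplier names and the helper functions build_lookup and clean_row are all assumptions for illustration.

```python
import json

# Hypothetical excerpt of the JSON copied from Refine's "Extract..." box.
# The supplier names and the "Supplier" column are invented for this example.
REFINE_OPS = json.loads("""
[
  {
    "op": "core/mass-edit",
    "description": "Mass edit cells in column Supplier",
    "engineConfig": {"facets": [], "mode": "row-based"},
    "columnName": "Supplier",
    "expression": "value",
    "edits": [
      {"fromBlank": false, "fromError": false,
       "from": ["ACME LTD", "Acme Ltd."],
       "to": "Acme Ltd"}
    ]
  }
]
""")


def build_lookup(ops):
    """Map (column, old value) to the cleaned value for every mass-edit op."""
    lookup = {}
    for op in ops:
        if op.get("op") != "core/mass-edit":
            continue  # this sketch only handles mass edits
        column = op["columnName"]
        for edit in op["edits"]:
            for old in edit["from"]:
                lookup[(column, old)] = edit["to"]
    return lookup


def clean_row(row, lookup):
    """Return a copy of the row with whitespace tidied and mass edits applied."""
    cleaned = {}
    for column, value in row.items():
        if isinstance(value, str):
            tidy = " ".join(value.split())  # trim and collapse runs of spaces
            # Try the raw cell first (as Refine saw it), then the tidied one.
            value = lookup.get((column, value), lookup.get((column, tidy), tidy))
        cleaned[column] = value
    return cleaned
```

Calling clean_row({"Supplier": "ACME LTD  ", "Amount": "10"}, build_lookup(REFINE_OPS)) gives back the row with the supplier rewritten to “Acme Ltd”. Names that Refine never touched pass through with only their whitespace tidied.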
So if you want the cleaned data, make sure you select the “Refined” table by hitting the tab and selecting “Download spreadsheet (CSV)”. If you want to use the amount as a numerical field (it was not put in as such in the original!) to get totals for each supplier, for example, you’ll have to use the refined table, as I had to write code to turn the “Amount” into numbers. Or if you know a bit of SQL and want to query the data from ScraperWiki, you can use my viewer, which can be found here. Either way, here is the data. I have already found something of interest which I am chasing, but if you’re interested in data journalism, here is a data set to play with. Before I can advocate using, developing and refining the tools needed for data journalism, I need journalists (and anyone interested) to actually look at data. So before I say anything of what I’ve found, here are my materials plus the process I used to get them. Just let me know what you find and please publish it!
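If you would rather work from the raw table, the amount conversion can be sketched roughly like this in Python. This is an illustration, not the code in my scraper: the column names “Supplier” and “Amount”, the assumption that amounts arrive as text such as “£1,234.50” or “(200.00)”, and the functions parse_amount and totals_by_supplier are all mine for the sake of the example.

```python
from collections import defaultdict
from decimal import Decimal, InvalidOperation


def parse_amount(text):
    """Turn text such as '£1,234.50' or '(200.00)' into a Decimal.

    Returns None when the text is not a usable number.
    """
    cleaned = text.strip().replace("£", "").replace(",", "").replace(" ", "")
    # Assumption: accountants' brackets mean a negative amount (a refund).
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    if negative:
        cleaned = cleaned[1:-1]
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None
    if not value.is_finite():  # reject 'NaN' or 'Infinity' typed into a cell
        return None
    return -value if negative else value


def totals_by_supplier(rows):
    """Sum the parsed amounts for each supplier, skipping unparseable rows."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for row in rows:
        amount = parse_amount(row["Amount"])
        if amount is not None:
            totals[row["Supplier"]] += amount
    return dict(totals)
```

Decimal is used rather than float so that pounds and pence add up exactly. Rows whose amount cannot be parsed are silently skipped here, so for real work you would want to count them and check them by hand.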
Here is a table of the top 10 receivers of Cabinet Office money. I’ve put the image in here, but the original is a view that feeds off the scraper, so as new data gets published, this table should update. So the information becomes living information, not a static visual. The story is being told, not catalogued.
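For anyone who wants to rebuild a table like this themselves, the ranking boils down to one SQL query. Below is a rough, self-contained sketch using Python’s built-in sqlite3 module. It assumes a table called “Refined” with a text column “Supplier” and a numeric column “Amount”; the column names, the sample rows and the top_suppliers function are invented for illustration and are not taken from my view.

```python
import sqlite3


def top_suppliers(conn, limit=10):
    """Return (supplier, total) pairs, largest total first."""
    query = """
        SELECT Supplier, SUM(Amount) AS total
        FROM Refined
        GROUP BY Supplier
        ORDER BY total DESC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


def demo_connection():
    """Build a tiny in-memory table with invented rows to try the query on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Refined (Supplier TEXT, Amount REAL)")
    conn.executemany(
        "INSERT INTO Refined VALUES (?, ?)",
        [("A", 100.0), ("B", 300.0), ("A", 50.0), ("C", 20.0)],
    )
    return conn
```

Running top_suppliers(demo_connection(), 2) returns the two biggest totals from the invented rows. Because the query is run against whatever is in the table at the time, re-running it after new spending data arrives gives an updated ranking.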
Oh, and V is V inspired youth volunteering. They received nearly £44 million over a nine-month period. On their website they say they have received over £48 million from the private sector. I imagine £44 million of that has come straight from the Cabinet Office. The Big Society seems to be costing the government a lot of money at the moment, even though the government says it will be mostly funded by the private sector.