Posts Tagged ‘Cabinet Office’

If you follow Scrape_No10 on Twitter you will be receiving all the meetings, gifts and hospitality received at No.10 by ministers, special advisers and permanent secretaries. If you follow #Scrape10 then you should be getting those as well as all the tweets relating to any item of data. The current database should be tweeted out by July.

The information contained in each tweet comes from the data published by the Cabinet Office. I scraped each data set and sorted the aggregated information chronologically, so that it can be tweeted out in the order in which the events happened (not every day was given). The links to the original data source and the scraped data are given at the bottom of this post.
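For anyone who wants to try the chronological merge themselves, here is a minimal Python sketch of the idea. It is not the scraper's actual code: the field names (`date`, `what`) and the sample rows are made up for illustration, and it assumes every row already carries a proper date.

```python
from datetime import date

# Hypothetical rows for illustration only; these are not real entries.
meetings = [
    {"date": date(2010, 5, 18), "what": "Meeting: Minister A with Org X"},
    {"date": date(2010, 7, 2), "what": "Meeting: Minister B with Org Y"},
]
gifts = [
    {"date": date(2010, 6, 10), "what": "Gift: Minister A received item Z"},
]


def merge_chronologically(*datasets):
    """Combine several lists of rows into one list ordered by date."""
    combined = [row for dataset in datasets for row in dataset]
    # sorted() is stable, so rows sharing a date keep their original order.
    return sorted(combined, key=lambda row: row["date"])


timeline = merge_chronologically(meetings, gifts)
for row in timeline:
    print(row["date"].isoformat(), row["what"])
```

Each dataset stays as its own list until the last moment, so adding another source is just one more argument to `merge_chronologically`.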

With respect to what I have done, I would like to remind people that Freedom of Information does not equate to contextual knowledge regarding the information or making it useful. I am a data journalist. Data is my beat. But data is a public right. Not just your data but also the data of the people who work for you – government data.

Journalism involves information but also conversation. Each data entry now has the ability to start a conversation. Just use #Scrape10. If a tweet is interesting or someone somewhere has added a piece of news relating to the tweet, then #Scrape10 should trend and the tweet should be sent around to the community where the information matters. That’s the theory.

Information is now socially enabled and should be socially enabling. What you would like to know, what matters to you or your wider social community should not only be made available to you but should be made useable in a way that matters.

I am an experimental data journalist, playing with code. All my source code for collecting this data is open and you can download the entire dataset. The code to get it onto Twitter is not available, as the publication of the authorisation keys would allow people to hack into the account. I have also written a scraper to store #Scrape10 tweets into a database every day, so you can catch them all here if you want to.

You can also read a previous post on the Special Advisers’ gifts and hospitality dataset here.

Source | Scraper
Permanent Secretaries’ Meetings | Permanent Secretaries’ Meetings
Ministerial Meetings | Ministerial Meetings
Ministerial Hospitality | Ministerial Hospitality
Ministerial Gifts | Ministerial Gifts
Special Advisers’ Gifts and Hospitality | Special Advisers’ Gifts and Hospitality

Seeing as I like to fly in the face of tradition, I’m going to turn things on their head and write a blog post about how I did it before I publish what “it” actually is. That is, I have scraped all the Cabinet Office spending data, cleaned it up and extracted it. But before I tell you what I’ve found (indeed, I haven’t got around to that properly yet!), I’m going to tell you how I found it.

Firstly, I scraped this page to pull out all the CSV files and put all the data in the ScraperWiki datastore. The scraper can be found here. It has over 1,200 lines of code but don’t worry, I did very little of the work myself! Spending data is very messy, with trailing spaces, inconsistent capitals and various phenotypes. So I scraped the raw data, which you can find in the “swdata” tab. I downloaded this and plugged it into Google Refine. I used the text facet functions to clean up the suppliers’ names as best I could (I figured these were of the most interest and would be more suitable for cleaning). Refine records every cleaning step you make, and you can get those steps out as code by going into the “Undo/Redo” tab and clicking on “Extract…”. Select the processes you want the code for, then copy the right hand box. I pasted this prepackaged code into my scraper.
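To give a feel for what cleaning supplier names involves, here is a rough Python sketch. It is not the code Refine exports and not what the scraper runs. It is a simplified take on the general idea of grouping names that differ only in case, punctuation or spacing, then picking the most common spelling in each group. The supplier names are invented.

```python
import re
from collections import Counter, defaultdict


def fingerprint(name):
    """Build a key that ignores case, punctuation and extra whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", name.strip().lower())
    return " ".join(cleaned.split())


def canonical_names(names):
    """Map each raw name to the most common spelling with the same key."""
    groups = defaultdict(Counter)
    for name in names:
        groups[fingerprint(name)][name.strip()] += 1
    return {
        name: groups[fingerprint(name)].most_common(1)[0][0]
        for name in names
    }


# Hypothetical supplier names showing the usual mess.
raw = ["ACME Ltd", "ACME Ltd ", "Acme Ltd.", "acme  ltd", "Widget Co"]
mapping = canonical_names(raw)
for original, cleaned in mapping.items():
    print(repr(original), "->", repr(cleaned))
```

A key like this only catches the easy cases. Genuinely different spellings ("ACME Limited" versus "ACME Ltd") still need a human eye, which is where Refine's facets earn their keep.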

So if you want the cleaned data, make sure you select the “Refined” table by hitting the tab and selecting “Download spreadsheet (CSV)”. If you want to use the amount as a numerical field (it was not put in as such in the original!) to get totals for each supplier, for example, you’ll have to use the refined table, as I had to write code to get the “Amount” as numbers. Or if you know a bit of SQL and want to query the data from ScraperWiki, you can use my viewer, which can be found here. Either way, here is the data. I have already found something of interest which I am chasing, but if you’re interested in data journalism here is a data set to play with. Before I can advocate using, developing and refining the tools needed for data journalism I need journalists (and anyone interested) to actually look at data. So before I say anything of what I’ve found, here are my materials plus the process I used to get them. Just let me know what you find and please publish it!
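If you are working from the raw table instead, the amounts have to be turned into numbers before anything can be added up. The sketch below shows one way to do that in Python. The "Amount" column name comes from the data; the "Supplier" column name, the sample rows and the handling of bracketed negatives are assumptions for illustration, so check them against the real file.

```python
from collections import defaultdict
from decimal import Decimal


def parse_amount(text):
    """Turn a string such as '£1,234.50' into a Decimal."""
    cleaned = text.strip().replace("£", "").replace(",", "")
    # Assumption: accounting-style brackets, "(100.00)", mean a negative.
    if cleaned.startswith("(") and cleaned.endswith(")"):
        cleaned = "-" + cleaned[1:-1]
    return Decimal(cleaned)


def totals_by_supplier(rows):
    """Sum the parsed amounts for each supplier."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for row in rows:
        totals[row["Supplier"]] += parse_amount(row["Amount"])
    return dict(totals)


# Hypothetical rows; not real spending figures.
rows = [
    {"Supplier": "Supplier A", "Amount": "£1,000.50"},
    {"Supplier": "Supplier A", "Amount": " 250.00 "},
    {"Supplier": "Supplier B", "Amount": "(100.00)"},
]
totals = totals_by_supplier(rows)
print(totals)
```

`Decimal` is used rather than `float` so that pounds and pence add up exactly.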

————————

Here is a table of the top 10 receivers of Cabinet Office money. I’ve put the image in here but the original is a view that feeds off the scraper so as the data gets published, this table should update. So the information becomes living information not a static visual. The story is being told not catalogued.
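A top 10 like this is simple to recompute each time new data arrives. As a rough illustration only (the real view runs on ScraperWiki and its code is not reproduced here), this Python sketch picks the largest totals from a dictionary of supplier totals. The names and figures are invented.

```python
import heapq


def top_recipients(totals, n=10):
    """Return the n (supplier, total) pairs with the largest totals."""
    return heapq.nlargest(n, totals.items(), key=lambda item: item[1])


# Hypothetical totals in pounds; not real figures.
example_totals = {
    "Supplier A": 1200,
    "Supplier B": 45000,
    "Supplier C": 300,
}
for supplier, total in top_recipients(example_totals, n=2):
    print(supplier, total)
```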

Oh and V is V inspired youth volunteering. They received nearly £44 million over a nine-month period. On their website they say they have received over £48 million from the private sector. I imagine £44 million of that has come straight from the Cabinet Office. The Big Society seems to be costing the government a lot of money at the moment even though they say it will be mostly funded by the private sector.

The Cabinet Office, in a move towards greater transparency, are attempting to publish all their data online. This isn’t really news but I don’t think news organizations are looking at this data so I’m scraping it and seeing what it has to offer. So as an exercise I’m scraping the page where ministerial gifts, hospitality, travel and meetings with external organisations are published as CSV or PDF. All this should be pretty much covered by Who’s Lobbying but I’m hoping to set up a little social media experiment (more on that to come). So here is all the data, set to scrape the site every month. You can download it all.

I whacked it into Google Refine to deal with the different spellings, nuances and the change in the format of the date. The date transformation option never seems to work for me in Refine so I exported it and opened it up in Excel to get the data out in chronological order. This may sound cumbersome to those who don’t work with data, but it’s actually quite quick and easy once you’ve tried it. Anyway, I looked at some of the more popular reasons for meeting ministers and grabbed a screen shot of the Excel table (Refine allows you to export an HTML table but I’ll have to get it to open up in Firefox so I can use my full page grab add-on).
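If Excel is not to hand, a change of date format partway through a file can also be handled in code. Here is a minimal Python sketch that tries a list of formats in turn. The formats listed are guesses at what such files might contain, not a record of what the Cabinet Office actually used, and month names are assumed to be in English.

```python
from datetime import datetime

# Assumed formats; the real files may use others.
KNOWN_FORMATS = ("%d/%m/%Y", "%d-%b-%y", "%d %B %Y", "%B %Y")


def parse_date(text):
    """Try each known format in turn and return a date."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date: {text!r}")


# Hypothetical values in mixed formats.
raw_dates = ["03-Jun-10", "12/05/2010", "May 2010", "1 July 2010"]
in_order = sorted(raw_dates, key=parse_date)
print(in_order)
```

Entries that give only a month, such as "May 2010", are treated as the first of that month, so they sort ahead of any dated entry in the same month. Anything the list does not recognise raises an error rather than being silently misread.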

I looked at the meetings for Big Society:

The major meeting with the Prime Minister and Deputy Prime Minister in May involved Young Foundation, Community Links, Antigone, Big Society Network, Balsall Heath Forum, London Citizens, Participle, Talk About Local, CAN Breakthrough, Mayor of Middlesbrough, Business in the Community, Esmee Fairbairn, Greener Leith, St Giles Trust, Big Issue Invest, Kids Company. Since then there has been a steady trickle of over 30 meetings with Nick Hurd, Oliver Letwin and Francis Maude about Big Society. Note that these are all Conservative MPs so the Big Society is already looking smaller along coalition party lines.

Sure, they have the titles to be involved but the trend in the data seems to be more about big financing. Meetings with the likes of Goldman Sachs, Barclays, British Banking Association and Co-op Financial Services lead one to believe that Big Society is being outsourced to local communities but the big financing has to come from the top. In Building the Big Society, the Cabinet Office writes:

We will use funds from dormant bank accounts to establish a Big Society
Bank, which will provide new finance for neighbourhood groups, charities,
social enterprises and other nongovernmental bodies

What are ‘funds from dormant bank accounts’ and why didn’t they use these instead of looking to the government to bail them out? The banks and their reckless trading in toxic assets and credit default swaps led to a massive recession. This shed light on reckless government borrowing and the massive deficit. This led to budget cuts to local services and the need for the Big Society. Which is now being funded by the banks! Am I missing something?

The next thing to look at from the data is the category ‘Introductory Meeting’:

Introductory meetings interest me as I imagine it pays to be at the back of a politician’s mind. It must be worthwhile to have some ear time and get your points across. I’m sure not any old Joe can get an introductory meeting. There must be PR companies that specialise in getting these meetings (lobby firms) so it’s interesting how many large companies are going to appear on this list. In fact, the purpose for one meeting was put down as ‘Lobbying’ with UK Public Affairs Council. They have a register of firms and clients published in evil PDF (go figure). Will have to scrape that.

Lastly, I thought the ‘Renegotiation of Contract’ category might be of interest so here it is:

A lot of these are big technology companies yet the government is notorious for accumulating huge costs with little effectiveness when it comes to implementing new IT systems. I also wonder whether Vodafone’s tax dispute was known during the negotiation of their contract with the Cabinet Office.

I’m getting the data out so that anyone with inside knowledge can put two and two together to further the information. I’m churning the data in so that what can be churned out is journalism and not churnalism. That’s the idea anyway. Just looking at the data is a step in the right direction so anyone interested in data journalism, just keep on looking at what’s coming out. And I’ll try and put it into a context that has journalistic value.