Just a quick note: the Twitter account @Scrape_No10, which tweets out ministers’, special advisers’ and permanent secretaries’ meetings, gifts and hospitality, is back up and tweeting. You can read the post about its creation here and download all the data the account contains. This account needs more coding maintenance than the @OJCstatements account (read about it here) because the data is contained in CSV files posted on a webpage, and I write code that builds the sentences to be tweeted from their rows and columns. The scraper feeding the Twitter account draws on five separate scrapers of the CSV files, so it is more likely to throw up errors than the simple scraper of the Office for Judicial Complaints site.
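The step of building tweetable sentences from rows and columns can be sketched in Python. This is a minimal sketch under assumptions, not the account’s actual code: the column names `Minister`, `Date`, `Name of organisation` and `Purpose of meeting`, the sentence template and the 140-character limit are hypothetical stand-ins rather than the real Cabinet Office headers or the account’s real wording.

```python
import csv
import io

# Hypothetical template and column names; the real CSV headers differ between releases.
TEMPLATE = "{minister} met {organisation} on {date} to discuss {purpose}"
MAX_LEN = 140  # assumed tweet length limit


def rows_to_sentences(csv_text):
    """Yield one tweetable sentence per CSV row, skipping blank spacer rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # Strip stray whitespace from headers and cells; drop overflow cells (key None).
        cleaned = {k.strip(): (v or "").strip() for k, v in row.items() if k}
        if not any(cleaned.values()):
            continue  # a row of empty cells such as ",,,"
        sentence = TEMPLATE.format(
            minister=cleaned.get("Minister", ""),
            organisation=cleaned.get("Name of organisation", ""),
            date=cleaned.get("Date", ""),
            purpose=cleaned.get("Purpose of meeting", ""),
        )
        # Truncate over-long sentences so they still fit in one tweet.
        if len(sentence) > MAX_LEN:
            sentence = sentence[: MAX_LEN - 3] + "..."
        yield sentence


sample = (
    "Minister,Date,Name of organisation,Purpose of meeting\n"
    "Jane Doe,12/01/2011,Example Trust,housing policy\n"  # made-up example row
    ",,,\n"
)
for line in rows_to_sentences(sample):
    print(line)
```

Keeping the template in one place means that when a CSV’s headers change, a common cause of the account stopping, only the `.get()` keys need fixing.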

So, as I’m learning to code and structure scrapers, I decided to run the scrapers manually every time the Twitter account stops, fix the bugs and set the account tweeting again. There are probably better ways to structure the scrapers, but right now I’m concentrating on the coding.

Learning to scrape CSVs is very handy, as lots of government data is released as CSV. There is CSV documentation and a tutorial on ScraperWiki, although they are aimed at programmers. For those interested in learning to code and scrape, I would recommend “Learn Python the Hard Way” (which is the easiest for beginners; it’s only ‘hard’ for programmers because it involves typing out code!). For more front-end work I have recently discovered Codecademy. I can’t vouch for it, but it looks interesting. I have also put all the datasets for the @Scrape_No10 account on BuzzData as an experiment.

Data is the new word for information. But calling myself an ‘information journalist’ would imply that every other journalist is just a churnalist, which is most definitely not the case. If data is anything in a database, then I’m looking beyond that. For me, data is any piece of information that can be put to journalistic use. So rather than confine my scraping to CSVs and data releases, I can take anything from the web that I think will be useful for the public to know.

Here’s something that is in the public domain but not in the public sphere: statements from the Office for Judicial Complaints (OJC) in which judges are reprimanded or struck off. The OJC deals with complaints about the personal conduct of judges. Examples of possible personal misconduct include the use of insulting, racist or sexist language in court, or inappropriate behaviour outside court, such as a judge using their judicial title for personal advantage or preferential treatment. So the OJC can reprimand judges or strike them off for personal misconduct, but it does not have the power to investigate or call into question any of their previous judgements.

So I’ve put all the statements on Twitter, each with a link to the PDF document detailing the judge’s case with the OJC. Any new statements should be picked up by my scraper (which will run daily) and then tweeted out. If anyone who has dealt with a tweeted judge has something to add, please reply to the tweet or use the hashtag #OJC.
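A daily run only needs to tweet statements it hasn’t seen before. One simple way to do that, sketched here under assumptions, is to treat each statement’s PDF link as its unique identifier and keep the links already tweeted in a local JSON file. The file name `seen_statements.json` and the `pdf_url` field are hypothetical, and the scraping and tweeting steps are left out.

```python
import json
from pathlib import Path

SEEN_FILE = Path("seen_statements.json")  # hypothetical local store of tweeted links


def load_seen(path=SEEN_FILE):
    """Return the set of statement PDF links already tweeted."""
    if path.exists():
        return set(json.loads(path.read_text()))
    return set()


def find_new(statements, seen):
    """Return statements whose PDF link has not been seen, keeping page order."""
    return [s for s in statements if s["pdf_url"] not in seen]


def mark_seen(new_statements, seen, path=SEEN_FILE):
    """Record the new statements' links so they are not tweeted again."""
    seen.update(s["pdf_url"] for s in new_statements)
    path.write_text(json.dumps(sorted(seen)))
```

Each day the scraper would call `load_seen()`, pass the scraped statements to `find_new()`, tweet each result, and call `mark_seen()` only once the tweets have gone out, so a failed run is simply retried the next day.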

If you follow @Scrape_No10 on Twitter, you will receive tweets of all the meetings held, and the gifts and hospitality received, by ministers, special advisers and permanent secretaries at No.10. If you follow #Scrape10, you should get those as well as any tweets relating to an individual data item. The current database should be fully tweeted out by July.

The information in each tweet comes from data published by the Cabinet Office. I scraped each dataset and sorted the combined information chronologically, so that it can be tweeted out in the order in which the events happened (although not every entry gives a specific day). Links to the original data sources and the scraped data are given at the bottom of this post.
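Merging the five datasets into one timeline comes down to giving every row a comparable date. This sketch assumes the files mix full dates such as `12/01/2011` or `1 March 2011` with month-only dates such as `March 2011`; those formats and the `date` field name are assumptions, not a description of the actual files. Month-only dates are treated as the 1st of the month and placed after entries with an exact date on that day, and unparseable dates sort last instead of stopping the run.

```python
from datetime import datetime

# Date formats assumed to appear in the source files; extend as new ones turn up.
FULL_FORMATS = ("%d/%m/%Y", "%d %B %Y")
MONTH_FORMATS = ("%B %Y", "%b %Y")


def sort_key(raw_date):
    """Return (datetime, precision) for sorting.

    Precision 0 = exact day, 1 = month only (treated as the 1st),
    2 = unparseable (sorted to the end).
    """
    text = raw_date.strip()
    for fmt in FULL_FORMATS:
        try:
            return (datetime.strptime(text, fmt), 0)
        except ValueError:
            pass
    for fmt in MONTH_FORMATS:
        try:
            return (datetime.strptime(text, fmt), 1)
        except ValueError:
            pass
    return (datetime.max, 2)


def merge_chronologically(*datasets):
    """Combine rows from several scrapers into one chronological list."""
    combined = [row for rows in datasets for row in rows]
    return sorted(combined, key=lambda row: sort_key(row["date"]))
```

Because Python’s sort is stable, rows with identical keys keep the order in which their datasets were passed in.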

Regarding what I have done, I would like to remind people that Freedom of Information does not equate to contextual knowledge about the information, nor does it make that information useful. I am a data journalist. Data is my beat. But data is a public right: not just your data, but also the data of the people who work for you, namely government data.

Journalism involves information but also conversation. Each data entry now has the ability to start a conversation. Just use #Scrape10. If a tweet is interesting, or someone somewhere has added a piece of news relating to it, then #Scrape10 should trend and the tweet should spread to the community where the information matters. That’s the theory.

Information is now socially enabled and should be socially enabling. What you would like to know, and what matters to you or your wider social community, should not only be made available to you but should also be made usable in a way that matters.

I am an experimental data journalist, playing with code. All my source code for collecting this data is open, and you can download the entire dataset. The code that gets it onto Twitter is not available, as publishing the authorisation keys would allow people to hack into the account. I have also written a scraper that stores #Scrape10 tweets in a database every day, so you can catch them all here if you want to.
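Storing the #Scrape10 tweets daily without duplicating ones already saved can be done with an insert that ignores existing IDs. This sketch uses Python’s built-in `sqlite3` module rather than ScraperWiki’s own datastore helpers, and assumes each tweet has already been fetched into a dict with hypothetical `id`, `author`, `text` and `created_at` keys; fetching from Twitter is not shown.

```python
import sqlite3


def open_db(path="scrape10_tweets.sqlite"):  # hypothetical database file name
    """Open (or create) the database and make sure the tweets table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweets (
               id TEXT PRIMARY KEY,
               author TEXT,
               text TEXT,
               created_at TEXT
           )"""
    )
    return conn


def save_tweets(conn, tweets):
    """Insert tweets, skipping IDs already stored; return how many were new."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO tweets (id, author, text, created_at) "
        "VALUES (:id, :author, :text, :created_at)",
        tweets,
    )
    conn.commit()
    return conn.total_changes - before
```

Running it twice on the same batch stores each tweet once; the second call returns 0, so a daily run can safely re-fetch overlapping search results.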

You can also read a previous post on the Special Advisers’ gifts and hospitality dataset here.

Source | Scraper
Permanent Secretaries’ Meetings | Permanent Secretaries’ Meetings
Ministerial Meetings | Ministerial Meetings
Ministerial Hospitality | Ministerial Hospitality
Ministerial Gifts | Ministerial Gifts
Special Advisers’ Gifts and Hospitality | Special Advisers’ Gifts and Hospitality