Here are some sites I’ve found that publish open data in useful formats, help with getting data, aid visualization of data and use data in interesting ways:
Opening the data door
Data.gov.uk is your one-stop shop for open data. Raw and in the flesh, this is the UK government’s latest attempt at getting public data into the public domain. The search can be annoying to navigate: you have to use broad terms to pinpoint the data you want unless you know the wordy, protracted titles civil servants give to documents these days. The government has crowd-sourced this one, resulting in apps made by web developers and SPARQL access for web developers. If you are not one, then the data available as PDF and/or XLS is all it’s good for.
The Office for National Statistics publishes the data that makes it into the news headlines. Employment, economic growth and all the important stuff is on here first. This is probably the only site that newsrooms are familiar with, although they are mostly interested in the press release and not the data itself. The Financial Times is the most obvious exception, although it is now behind a paywall, most likely because putting the time, effort and cash into good visuals and interactives means money has to be made from the webpage. Good money.
The World Bank has global data, along with country profiles and indicator data. Most of the good stuff will come out in a press release, but comparisons and context can be found for particular stories. Don’t forget to widen your lens and look at global data and trends. It might not make for smashing news but it can add insight to a rather dull or dying story. Tell it globally if you’re on the web and the whole world is reading.
The London Datastore is cleaner and easier for browsing and searching. I really like this one as I live in London. I will be searching for news from this site but will not confine myself to the city. Hyperlocal sites in London should make good use of it, even if it’s just checking to see if there’s something that can be added to an FOI request or a local investigation. Hopefully local councils will adopt this layout for their open data initiatives.
The Guardian Data Store is a brilliant place to see how data can be used and visualized for the demanding day-to-day news outlet. Most of the visuals can be shared, downloaded and embedded. It is also a great resource for collecting data, as the data sheets are always available for you to download and play around with. I do wish the articles were longer and that the data store were given a higher place in the editorial pecking order. A great way to explore the news and gain valuable context which mainstream news usually misses.
If you’re looking for data on a particular topic then a search engine is of little use, mostly because the words and links associated with online data don’t actually describe the data at all, in a human way at least. This great little site from the Open Knowledge Foundation, called Get the Data, is a Q&A forum for those interested in data. As its membership grows it becomes more useful, so give it a shot; if you’ve got data on your mind then you might just have the answers to questions you never thought of asking.
NationMaster allows users to generate graphs based on numerical data extracted from the CIA Factbook and much more. It’s the best source for global data. The search function is fantastically useful for journalists. It takes a while to load but it’s well worth visiting for any global story.
Good uses of open data
OpenlyLocal is a project by Chris Taggart (@CountCulture) that aims to make council data, including upcoming meetings, accessible to everyone. It is an ongoing project which aims to make local government more transparent. You can search by council; councils are listed by region. There is also a very nice list of hyperlocal sites for some of the areas, with an accompanying map. All of these have been set up by local citizen journalists, and this springing up of hyperlocal sites has been championed by @willperrin. According to OpenlyLocal’s open data scoreboard, 19 out of 434 local authorities publish open data but only 10 are truly open. If you want that to change, stay tuned to the blog as I will be reporting on the open data movement.
Where Does My Money Go? is part of the Open Knowledge Foundation and deserves its own site as it handles the important task of analyzing and visualizing UK public spending. The project was a winner of the UK Government’s Show Us a Better Way competition. Click on Launch Spending Dashboard and you can spend hours exploring this data. You can also get involved by gathering, cleaning or visualizing data. These projects are a great place for the novice data miner.
They Work for You is for all those interested in UK politics. It’s a site which aims to keep tabs on UK parliaments and assemblies. Here you’ll find a list of MPs, Lords, parliamentary debates, written answers and statements from parliamentary questions, and a list of Public Bill committee debates. What I love about this site is the search function in each section, as well as the post code search. Very handy for finding out what’s going on in high places.
Schooloscope is a brilliant example of specific data being used in the public interest in a way that is the most useful. It’s still in progress but once officially launched, I imagine parents all over England will be flocking to this site. It is all built from public data and shows how a good idea can go a very long way.
This newsmap is a brilliant example of a scraper that is interactive and customizable. It is very cleverly done and beautiful to look at. It really makes users spend more time exploring the news.
Ways to get involved:
What Do They Know? is not just a good way to make Freedom of Information requests, it is the best way. Data that has to be unlocked should rightly be made public and this simple little site embraces that fact. Your request and the reply are open. So what they know becomes what you know becomes what everyone should know. It’s a good place to look at other requests that could be made into a story for your area, and a good place to start learning the tricks of the trade for getting a reply to an FOI request.
Help Me Investigate takes FOIs to a whole new level. It is a crowd-sourcing forum where you can start, help with or get help for an investigation. It’s handy just to ask people what they know or have heard. A nifty little idea and the first place you should go to if you think you’re onto something. It is all open, mind, so don’t let slip names, vicious rumors or anything that breaches privacy legislation. It is great for finding a community who are interested in what you’re working on but, as with all web resources, do check out the ethical and community guidelines.
The Open Knowledge Foundation concerns all types of data and all types of persons. It is a not-for-profit organization based on crowd-sourcing people with knowledge, wisdom or just plain interest. Each project is made up of a collective of volunteers from all over the world. They say they seek to promote open knowledge because of its potential to deliver far-reaching societal benefits and are governed by various boards made up of professionals and academics. They are based in London.
Rewired State is a fledgling project aimed at web developers, who are wanted to produce visualizations from government data. This is done through organized ‘hackdays’. I have been to a hackday, but not one run by Rewired State, so I’ll hold off on a review for now. However, they do publish all their data for download so it is definitely a site worth checking out. If you do have any views or reviews of RS events, do let me know.
Making data pretty
For a good look at creative and wonderful ways to explain, explore and play with data check out Information is Beautiful. It not only gives you great ideas for visualizations but also great ideas for stories as you can find out the original source of the visual and the team behind it. There’s a lot of quite up to date stuff which shows that it’s the slow uptake of news agencies to engage with data and interactives that is keeping such amazing works mostly out of the mainstream.
A quick and simple way to make word clouds or wordles is, well, Wordle. It’s much easier to adjust the look of your word cloud here than in Many Eyes. Your wordle is public once you save it and you can’t control how anyone else uses it. Sadly, there is no HTML code to embed a wordle, so to get it onto your site you have to take a screenshot, save the image and upload it.
A great prototype built by BERG for the BBC is How Big Really. Here, you can transpose loads of things to scale onto a Google map in order to truly comprehend size. The most public use of this site has been to transpose the Gulf Oil Spill. You just input a place or post code, choose from anything between the Saturn V launch pad and the Great Wall of China, and explore dimensions like never before.
The Legislation.gov.uk site is rather dull, as its name suggests. But it is an improvement on its predecessor, The Office of Public Sector Information, which was very dry. The government is doing its best to appear more trendy when it comes to web design. The good thing about this site is that it acknowledges devolution in that there are tabs for United Kingdom, Scotland, Wales and Northern Ireland. I recommend that anyone who is thinking about working with data and making FOI requests become familiar with the legislation governing freedom of information and data protection.
MySQL (pronounced ‘my sequel’ if you want to appear knowledgeable within the web sphere) is a database management system. It is usually used in conjunction with PHP in web development but for data mining purposes you only need the database manager. I first came across it at the Centre for Investigative Journalism summer school. The classes in computer-assisted reporting were being taught by David Donald from the Centre for Public Integrity. The centre published ProPublica online which, in my opinion, is one of the best sources of genuine first-class reporting. He recommended the book Learning MySQL, published by O’Reilly Media. I bought it, so watch the blog for progress, gripes and tips.
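To give a feel for the kind of question a database manager answers, here is a minimal sketch. It is an assumption-heavy stand-in: it uses Python’s built-in sqlite3 module rather than MySQL so that it runs without installing anything, and the spending table and its figures are invented purely for illustration. The SQL itself (group the rows, add them up, sort the result) is the sort of query you would also write in MySQL.

```python
import sqlite3

# In-memory database as a stand-in for a MySQL server (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical table and invented figures, for illustration only.
conn.execute("CREATE TABLE spending (department TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO spending VALUES (?, ?)",
    [("Health", 120), ("Transport", 80), ("Health", 30)],
)

# Total spend per department, biggest first.
totals = conn.execute(
    "SELECT department, SUM(amount) AS total "
    "FROM spending GROUP BY department ORDER BY total DESC"
).fetchall()
conn.close()

for department, total in totals:
    print(department, total)
```

The point is that one short query replaces a lot of manual sorting and summing in a spreadsheet.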
ScraperWiki is the latest business venture for data miners. It has a very good and experienced team behind it who organize Hackers and Hack Days, where journalists and web developers get together for a day to dig up and visualize potential story ideas. I have been to one of these events and will keep you posted. It offers scrapers, so it is very handy for people who prefer to do the visualizing rather than the cleaning and primping of data sets. For all you novices, what this service provides is a quick and accurate way to retrieve good data from sets which are being updated or which have changes in them. Web developers can create and share their own scrapers. There is an option to have your scraper open or closed. For the novice data journalist looking to explore coding there are online tutorials provided. I’ll blog more about them as part of my data journey.
For a look at really advanced data usage you can’t go any further than Timetric. This is where the real data nerds reside but also where there’s a combination of data visualisation with journalism in mind. They supply the Guardian Datastore, so it is worth a good rummage around even if you feel you don’t have that much expertise yourself.
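For novices wondering what a scraper actually does, here is a minimal sketch in Python. It is not ScraperWiki’s own code or API; it only uses Python’s standard library, and the sample page, council names and figures are made up for illustration. A real scraper would first download the page and would be re-run whenever the page changes.

```python
from html.parser import HTMLParser

# Made-up sample page (assumption); a real scraper would download the HTML first.
SAMPLE_HTML = """
<table>
  <tr><th>Council</th><th>Spend</th></tr>
  <tr><td>Council A</td><td>1200</td></tr>
  <tr><td>Council B</td><td>950</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of every table cell, one list per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a cell.
        if self._in_cell:
            self._row[-1] += data.strip()


def scrape_table(html):
    """Turn the table into a list of records keyed by the header row."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = scrape_table(SAMPLE_HTML)
for record in records:
    print(record)
```

The result is a tidy list of records that can go straight into a spreadsheet or a visualization, which is the cleaning and primping step a shared scraper saves you.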