From Scraper to Map to Information

Posted: February 10, 2011 in Data Journalism, My Data Journey, Open Data Movement

The media work with information. Fact. But what exactly is information? I’m beginning to realise that information is not static; it no longer exists in one concrete form. So how are the media reacting to this? I’m not sure, but I think their public, the community formerly known as the audience, is reacting quicker and better.

This is not through any failing by the media but has more to do with the monetization of social media. Technical gadgets have become part of our physical selves, software has become part of our mental selves and now social networking has become part of our societal selves. And society can accept and integrate this because businesses have found a way of making money from it.

The business structure of the media is an enigma in itself and the more it tends toward the business model the more the information journalists work with becomes warped into the social fabric of gossip, celebrity and shock.

But let’s not get carried away. Here is a new way of looking at information: a quick and easy way for you to make it useful, to make it explorable. I’ve just used three free things from the web.

Thing no.1 – ScraperWiki:

I found this data set on the site

It’s by Techbelly as I am only just learning to scrape. But you can request a scraper for any data you can find on the web here.

So I downloaded the CSV file you can see on the top right and imported it into Google Docs. Some scrapers allow you to import into Google Docs immediately, but if not it’s just a matter of download and upload.
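If you would like to peek inside the CSV before uploading it, a minimal sketch in Python could look like the one below. The sample text and the column names (category, description, address, postcode) are made up for illustration; the real file from ScraperWiki may use different headers.

```python
import csv
import io

# Hypothetical sample standing in for the downloaded CSV file.
# The real column names may differ.
SAMPLE_CSV = """category,description,address,postcode
Painting,Portrait of a lady,1 Example Street,AB1 2CD
Sculpture,Bronze horse,,
"""

def load_records(csv_text):
    """Parse CSV text into a list of dicts, one per row, keyed by header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]

records = load_records(SAMPLE_CSV)
print(len(records), "records; columns:", list(records[0].keys()))
```

For a file on disk you would open it with open(path, newline="") and hand the file object to csv.DictReader instead of wrapping a string.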

Thing no.2 – MapAList

The next easy step is to go to MapAList. This links up directly to your Google Docs and will find your spreadsheets. The great thing is that you don’t need longitude and latitude (very few data sets give these) as you can use Google Maps to plot by address.

Once you have an account you hit ‘Create Map’. Choose your source type as Google Spreadsheet, select the file you uploaded as your spreadsheet and, if you have more than one worksheet, choose the one which has the addresses listed. You can make sure you’ve chosen the right file by viewing a sample of the spreadsheet.

Just hit ‘Next’ at the bottom of the page and you’ll then choose the fields (i.e. the columns of your spreadsheet) that will allow the data to be mapped. This data gives the address and postcode in separate fields, which is great. It also gives longitude and latitude, although these are not in the original source of the data, which can be found as a search here. The first part is matching your columns to the fields MapAList will read to map your data. So if you’ve got address, postcode and/or longitude and latitude as columns, this should be easy enough. The second part is choosing which of your columns you’d like to see on your placemarkers. This should be a matter of which parts of your data are most interesting. For tax exempt works of art I’ve picked the category of art and a description of the object, as you can see. So hit ‘Apply’ and ‘Next’.

The next step troubleshoots your data. As you can see, only 1149 of 1177 records were geocoded. Below this shot are the entries that failed to be geocoded. By looking at them I can see that there were five records where no location was given. The remaining 23 failed because the art works were on loan to a public gallery. As the whole point of this exercise is to get people knocking on Lords’, Ladies’, Dames’ and Dukes’ doors, I thought I’d just go with the 1149 entries given.
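Some of these failures can be spotted before you ever upload the spreadsheet. The sketch below is only an illustration, with made-up records and assumed field names (address, postcode). It separates rows that have some location text from rows that have none. It would catch the blank-location kind of failure, but not rows whose location text simply can’t be geocoded.

```python
def split_mappable(records, location_fields=("address", "postcode")):
    """Split records into those with any location text and those with none."""
    mappable, unmappable = [], []
    for record in records:
        # A record counts as mappable if at least one location field is non-blank.
        has_location = any(
            (record.get(field) or "").strip() for field in location_fields
        )
        (mappable if has_location else unmappable).append(record)
    return mappable, unmappable

# Hypothetical rows; the field names are assumptions, not the real headers.
records = [
    {"description": "Portrait", "address": "1 Example Street", "postcode": "AB1 2CD"},
    {"description": "Bronze horse", "address": "", "postcode": ""},
    {"description": "Tapestry", "address": "  ", "postcode": "ZZ9 9ZZ"},
]

mappable, unmappable = split_mappable(records)
print(f"{len(mappable)} of {len(records)} records have a location")
```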

The next step is to choose what your placemarkers will look like. I’ve chosen to have a different placemarker for each category of art work, as you can see:

Then you put in a title and select more details depending on how you want your map presented.

You get a preview to make sure everything is as you like. Hit ‘Close this, and create new map’ and voila:

You can play around and get directions here. You can even embed it on a web page and send it to a friend using the share button at the bottom. Sadly, there’s no way to get it into a WordPress blog. But wait! There’s more. Download it as a KML using the download button at the bottom right.

Thing no.3 – Google Earth:

For more fun just open it up with Google Earth. That way, where there is more than one piece of art work at a particular location, Google Earth will split them up for you. It also means you can add layers from all your data sets. Now that’s what I call information!
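A KML file is just XML, so you can also check for yourself which locations hold more than one piece. The following is a rough sketch using Python’s standard library. The sample KML is invented, and it assumes the download uses the standard KML 2.2 namespace and simple Point placemarks, which may not match exactly what MapAList produces.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: the file uses the standard KML 2.2 namespace.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Invented sample; KML coordinates are written longitude,latitude,altitude.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark><name>Painting</name>
      <Point><coordinates>-1.5,52.0,0</coordinates></Point></Placemark>
    <Placemark><name>Sculpture</name>
      <Point><coordinates>-1.5,52.0,0</coordinates></Point></Placemark>
    <Placemark><name>Tapestry</name>
      <Point><coordinates>-0.1,51.5,0</coordinates></Point></Placemark>
  </Document>
</kml>"""

def placemarks_per_location(kml_text):
    """Count how many placemarks share each coordinate string."""
    root = ET.fromstring(kml_text)
    counts = Counter()
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        coords = placemark.findtext(
            "kml:Point/kml:coordinates", default="", namespaces=KML_NS
        ).strip()
        if coords:
            counts[coords] += 1
    return counts

counts = placemarks_per_location(SAMPLE_KML)
for coords, n in counts.items():
    print(coords, n)
```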

For those not familiar with the scheme, if you have a work of art, you can register it under the scheme to avoid paying certain taxes – including inheritance tax – but under the condition that you look after it and make it available for public viewing. A few of the people signed up to the scheme are keener on the tax advantages and less on the public availability…

Points to note:

MapAList can’t handle more than 8,000 records. You can update a map when the Google Spreadsheet updates by going into ‘Manage’ and hitting the update option. That means any extra information that you add to a spreadsheet can go on the map without having to create a new one. But what would be ultimately useful is to have it update whenever the web page (the original source of the data) gets updated. Now that would be a good journalism tool!
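If your data set is bigger than that, one workaround would be to split it into several spreadsheets and make a map from each. Here is a small sketch of the splitting step, using the 8,000 figure quoted above. The row data is a stand-in, and the limit may have changed since this was written.

```python
MAX_RECORDS = 8000  # limit quoted in this post; check MapAList for the current figure

def batches(rows, size=MAX_RECORDS):
    """Yield consecutive slices of rows, each at most `size` long."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

rows = list(range(17500))  # stand-in for 17,500 spreadsheet rows
sizes = [len(batch) for batch in batches(rows)]
print(sizes)
```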

I think this would be possible with ScraperWiki. I’m going to find out…
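One simple way to tell whether a source page has changed is to keep a fingerprint of its contents and compare it on each visit. This is only a sketch of that general idea, not how ScraperWiki works. The page text below is invented, and fetching the real page is left out.

```python
import hashlib

def fingerprint(page_text):
    """Return a SHA-256 hex digest of the page text."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()

def has_changed(page_text, last_fingerprint):
    """True if the page text no longer matches the stored fingerprint."""
    return fingerprint(page_text) != last_fingerprint

# Invented page contents standing in for two visits to the source page.
old_page = "<table><tr><td>Portrait</td></tr></table>"
new_page = "<table><tr><td>Portrait</td></tr><tr><td>Vase</td></tr></table>"

saved = fingerprint(old_page)
print(has_changed(old_page, saved), has_changed(new_page, saved))
```

Any change to the page, even a cosmetic one, alters the fingerprint, so in practice you would probably hash only the scraped rows rather than the whole page.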
