
Although “data journalism” can encompass infographics, interactives, web apps, FOI, databases and a whole host of other numbering, coding and display techniques, the road less travelled by has certain steps, turns and speed bumps. In that sense, here’s a list of things to tick off if you’re interested in going down the data journalism road:

  1. Know the legal boundaries – get to know the Data Protection Act 1998 and the sections on access to personal data and unstructured personal data held by authorities. Do not set foot on your journey without reading the section on exemptions relating to journalism. Use legislation as a reference by downloading the Mobile Legislate app.
  2. Look at data – get to know what is out there, what format it’s in and where it’s coming from. Places like Data.gov.uk, London Datastore, Office for National Statistics and Get the Data are good places to start for raw data but don’t forget: anything on the web is data. The best data are often hidden. Data can be text and pictures, so even mining social media and exploring the apps built from them can give you insight into what can be done with data.
  3. Read all about it – to make data and stats accessible you need to know how to frame them within a story. In that sense, you need to know how to understand the stories they tell. That doesn’t mean going on a stats course. There is a lot of accessible reading material out there, and I would recommend The Tiger That Isn’t.
  4. Get connected – find Hacks/Hackers near you and join Meetup groups to point you in the right directions. Data journalists’ interests and abilities are unique to the individual (much like programmers) so don’t take any piece of advice as set in stone (the web changes too quickly for that!). Find your own way and your own set of people to guide you. Go to courses and conferences. Look outside the journalism bubble. Data is more than just news.
  5. Spread your bets – the easiest way to sort data is by using spreadsheets. Start with free options like Google Docs and OpenOffice. Industry standards include Microsoft Excel and Access. Learn to sort, filter and pivot. Find data you’re interested in and explore it with your own eyeballs. Know what each piece of software does and can do to the data before mashing it with another piece of software.
  6. Investigate your data – query it using the simple language SQL and the software MySQL. It’s a bit tricky to set up but by now you’ll know a hacker you can ask for help! Clean your data using Google Refine. There are tutorials and a help wiki. Know how these function, not just how to navigate the user interfaces, as these will change. These products go through iterations much more quickly than the spreadsheet software.
  7. Map your data – from Google spreadsheets the easiest way to build a map is by using MapAList. There is a long list of mapping software from GeoCommons to ArcGIS. Find what’s easiest for you and most suitable for your data. See what landscapes can be revealed and home in on areas of interest. Understand the limitations of mapping data: you’ll find devolution makes it difficult to get data for the whole of the UK and some postcodes will throw up errors.
  8. Make it pretty – visualize your data only once you fully understand it (source, format, timeframe, missing points, etc). Do not jump straight to this as visuals can be misleading. Useful and easy software solutions include Google Fusion Tables, Many Eyes and Tableau. Think of unique ways to present data by checking out what the graphics teams at news organizations have made but also what design sites such as Information is Beautiful and FlowingData are doing.
  9. Make your community – don’t just find one, build one. This area in journalism is constantly changing and for you to keep up you’ll need to source a custom-made community. So blog and tweet but also source ready-made online communities from places like the European Journalism Centre, National Institute for Computer Assisted Reporting (NICAR), BuzzData and DataJournalismBlog.
  10. Scrape it – do not be constrained by data. Liberate it, mash it, make it usable. Just like a story, data is unique, and bad data journalism comes from being constrained by the medium containing it. With code, there is no need to make the story ‘fit’ into the medium. “The Medium is the Message” (à la Marshall McLuhan). Scrape the data using ScraperWiki and make applications beyond storytelling. Make data open. For examples, check out OpenCorporates, Schooloscope and Planning Alerts. If you’re willing to give coding a try, the book “Learn Python the Hard Way” is actually the easiest way to learn for the non-programmer. There is also a Google Group for Python Journalists you should join.
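To give a flavour of the scraping step in point 10: at its core a scraper just pulls structured rows out of a web page’s HTML. Here’s a minimal sketch using only Python’s standard library — the page content, table layout and field values are invented for illustration, not a real ScraperWiki scraper (in practice you’d fetch the page with urllib first):

```python
from html.parser import HTMLParser

# A stand-in for a page you might fetch with urllib.request.urlopen();
# the table and its contents are purely illustrative.
SAMPLE_PAGE = """
<table>
  <tr><td>The Hay Wain</td><td>Painting</td><td>SW1A 1AA</td></tr>
  <tr><td>Bronze Horse</td><td>Sculpture</td><td>EC1A 1BB</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = None
        elif tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())

scraper = TableScraper()
scraper.feed(SAMPLE_PAGE)
print(scraper.rows)
```

Once the rows are in a Python list like this, writing them out as CSV for a spreadsheet is one `csv.writer` call away.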
These are guidelines and not a map for your journey. Your beat, the data landscape, changes at the speed of the web. You just need to be able to read the signs of the land, as there’s no end point, no goal and no one to guide you.

The media work with information. Fact. But what exactly is information? I’m beginning to realise that information is not static. It just doesn’t exist in a concrete form anymore. So how are the media reacting to this? I’m not sure, but I think their public, the community formerly known as the audience, is reacting faster and better.

This is not through any failing by the media but more to do with the monetization of social media. Technical gadgets have become part of our physical selves, software has become part of our mental selves and now social networking has become part of our societal selves. And society can accept and integrate this because businesses have found a way of making money from it.

The business structure of the media is an enigma in itself and the more it tends toward the business model the more the information journalists work with becomes warped into the social fabric of gossip, celebrity and shock.

But let’s not get carried away; here is a new way of looking at information. A quick and easy way for you to make it useful. To make it explorable. I’ve just used three free things from the web.

Thing no.1 – ScraperWiki:

I found this data set on the site.

It’s by Techbelly as I am only just learning to scrape. But you can request a scraper for any data you can find on the web here.

So I downloaded the CSV file you can see on the top right and imported it into Google Docs. Some scrapers allow you to import into Google Docs directly, but if not it’s just a matter of download and upload.
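Before uploading, it can be worth sanity-checking the CSV in a few lines of Python — for instance, counting how many rows are missing the field you plan to map on. A minimal sketch using only the standard library; the column names and values here are invented, not the actual fields of the Techbelly data set:

```python
import csv
import io

# In practice you'd read the downloaded file with open("data.csv");
# a small inline sample stands in here, with made-up columns.
CSV_TEXT = """name,category,postcode
The Hay Wain,Painting,SW1A 1AA
Bronze Horse,Sculpture,
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Rows without a postcode will fail to geocode later on,
# so it pays to know about them up front.
missing = [r for r in rows if not r["postcode"].strip()]
print(len(rows), len(missing))
```

Knowing the gaps in advance makes the geocoding report further down much less of a surprise.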

Thing no.2 – MapAList

The next easy step is to go to MapAList. This links up directly to your Google Docs account and will find your spreadsheets. The great thing is that you don’t need longitude and latitude (very few data sets give these), as you can use Google Maps to plot by address.

Once you have an account, you hit ‘Create Map’. Choose your source type as Google Spreadsheet, select the file you uploaded as your spreadsheet and, if you have more than one worksheet, choose the one which has the addresses listed. You should be able to make sure you’ve chosen the right file by viewing a sample of the spreadsheet.

Just hit ‘Next’ at the bottom of the page and you’ll then choose the fields (i.e. the columns of your spreadsheet) that will allow the data to be mapped. This data set gave the address and postcode in separate fields, which is great. It also gives longitude and latitude; however, these are not given in the original source of the data, which can be found as a search here. The first part is matching your columns to the fields MapAList will read to map your data. So if you’ve got address, postcode and/or longitude and latitude as columns, this should be easy enough. The second part is choosing which of your columns you’d like to see on your placemarkers. This should be a matter of which parts of your data are most interesting. For tax-exempt works of art I’ve picked the category of art and a description of the object, as you can see. So hit ‘Apply’ and ‘Next’.

The next step troubleshoots your data. As you can see, only 1149 of 1177 records were geocoded. Below this shot will be the entries that failed to be geocoded. By looking at them I can see that there were five records where no location was given. The remaining 23 missing entries were due to the fact that the artworks were on loan to a public gallery. As the whole point of this exercise is to get people knocking on Lords’, Ladies’, Dames’ and Dukes’ doors, I thought I’d just go with the 1149 entries given.

The next step is to choose what your placemarkers will look like. I’ve chosen a different placemarker for each category of artwork, as you can see:

Then you put in a title and select more details, depending on how you want your map presented.

You get a preview to make sure everything is as you like. Hit ‘Close this, and create new map’ and voilà:

You can play around and get directions here. You can even embed it on a web page and send it to a friend using the share button at the bottom. Sadly, there’s no way to get it into a WordPress blog. But wait! There’s more. Download it as a KML using the download button at the bottom right.
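The KML you download is just XML, so you can poke at it with Python’s standard library before opening it in Google Earth — for instance, to count the placemarks and check they match the geocoding report. A sketch with a cut-down, invented KML file of the general kind a mapping tool exports:

```python
import xml.etree.ElementTree as ET

# A tiny, hand-written KML sample; real exports carry coordinates,
# styles and descriptions too. The placemark names are invented.
KML = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark><name>The Hay Wain</name></Placemark>
    <Placemark><name>Bronze Horse</name></Placemark>
  </Document>
</kml>
"""

# KML elements live in a namespace, so queries must spell it out.
KML_NS = "{http://www.opengis.net/kml/2.2}"
root = ET.fromstring(KML)
names = [p.findtext(f"{KML_NS}name")
         for p in root.iter(f"{KML_NS}Placemark")]
print(len(names), names)
```

If the placemark count here matched the 1149 geocoded records, you’d know the export was complete.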

Thing no.3 – Google Earth:

For more fun just open it up with Google Earth. That way, where there is more than one piece of artwork at a particular location, Google Earth will split them up for you. It also means you can add layers from all your data sets. Now that’s what I call information!

For those not familiar with the scheme, if you have a work of art, you can register it under the scheme to avoid paying certain taxes – including inheritance tax – but on the condition that you look after it and make it available for public viewing. A few of the people signed up to the scheme are keener on the tax advantages than on the public availability…

Points to note:

MapAList can’t handle more than 8,000 records. You can update a map when the Google Spreadsheet updates by going into ‘Manage’ and hitting the update option. That means any extra information that you add to a spreadsheet can go on the map without having to create a new one. But what would be ultimately useful is to have it update whenever the web page (the original source of the data) gets updated. Now that would be a good journalism tool!

I think this would be possible with ScraperWiki. I’m going to find out…
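One way a scraper could notice that the source page has changed is to store a fingerprint (a hash) of the page on each run and compare it next time, only re-scraping when it differs. A rough sketch of the idea — the page contents here are simulated strings, not a real fetch, and this is not a documented ScraperWiki feature, just the general technique:

```python
import hashlib

def fingerprint(page_html):
    """A short, stable fingerprint of the page's current content."""
    return hashlib.sha256(page_html.encode("utf-8")).hexdigest()

def needs_rescrape(page_html, last_fingerprint):
    """True when the page differs from what we scraped last time."""
    return fingerprint(page_html) != last_fingerprint

# Simulated runs: the first run always scrapes,
# later runs only when the source actually changed.
old_page = "<html>1177 works of art</html>"
new_page = "<html>1180 works of art</html>"

print(needs_rescrape(old_page, None))                    # first run
print(needs_rescrape(old_page, fingerprint(old_page)))   # unchanged
print(needs_rescrape(new_page, fingerprint(old_page)))   # source updated
```

Hook that check up to a scheduled scraper run and a spreadsheet export, and the map-update step above becomes automatic.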