Here’s another tool for the amateur data miner’s toolkit. I’m hoping it’ll be more of a pickaxe than a stick of dynamite. It’s from Google, so I’m hopeful. It’s called Google Refine and was previously known as Google Gridworks.
And the blurb sounds promising: “Google Refine is a power tool for cleaning up raw data, making it consistent, linking it to data registries like Freebase, augmenting it with more data from other data sources, transforming it into the required format for other tools to consume, and contributing it back to some data sources like Freebase. Google Refine is not a web service but a desktop app that runs on your own computer, so you can process sensitive data with privacy.”
I hope to tackle my first data sheet early in the new year (or so goes the plan). I’m planning to learn Python, so I’ll try to have my own set of scraped data to clean (oh, the joy!). Meanwhile, I will be posting about visualization techniques and the people working on new and wonderful ways to give data a voice. So stay tuned.
Here are the latest tutorials: