If you don’t already know, this year brings a fresh new challenge to a journo-coder wannabe who calls herself DataMinerUK. I am a Knight-Mozilla fellow at the Guardian and am looking to learn to code and make news in the open. As such I have moved this blog to a self-hosted website, http://datamineruk.com/, where I can embed iframes.

It is long overdue. Although I found managing a hosted blog much easier, my goal to build news using open frameworks means I need a proper platform. It should look and feel pretty much the same. I’m not looking to become a hot-shot developer and build an entire content management system (which the Guardian has done) but to work on content. Let’s use digital tools to find the message as well as build the medium.

My focus is on content. I’d describe myself more as a journalist than a coder. But with a lot of help from the Interactives team at the Guardian and my fellow fellows, I can harness the power of open source, open journalism and open news to bring you content in weird and wonderful ways. What those ways will be is anyone’s guess, yours as well as mine.

So stay tuned!

This post is for December’s Carnival of Journalism, where we were asked: what would be the best present from programmers/journalists that Santa Claus could leave under your Christmas tree? Stick people courtesy of xkcd.

My Christmas wish from both programmers and journalists would be the realisation that both cultures depend on the goodwill and morals of their community to avoid corruption and, ultimately, the disintegration of the integrity of their respective professions. No institution can govern fast enough to police either party. And from both I would like to see a sense of solidarity in our respective hopeless cases, as illustrated below. (Also I would like more coding lessons – teach a man to fish and whatnot…)

The Crisis

The Confrontation

The Cure?

Seeing as tomorrow is Open Data Day, and I claim to be a Data Journalist (I think JournoCoder is more suitable), here’s a little data food for journalistic thought.

RSS Feed of US Nuclear Reactor Events

Here is the site showing the US nuclear reactors’ power output status. Here is the scraper for that site, written by ScraperWiki founder Julian Todd. Here is my script for catching the unplanned events and converting them to RSS format. And here is the URL you can use to subscribe to the feed yourself:
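For the curious, the gist of such a script is small: take the scraped event rows and wrap them in RSS 2.0 XML. Here is a minimal sketch of the idea in Python; the field names (`unit`, `date`, `reason`) and the link URL are my illustrative assumptions, not the actual ScraperWiki schema.

```python
# Hypothetical sketch: turn scraped reactor-event rows into an RSS 2.0 feed.
# The dict keys and the channel link are made-up placeholders.
from xml.etree import ElementTree as ET

def events_to_rss(events, feed_title="US Nuclear Reactor Events"):
    """Build an RSS 2.0 document (as a string) from a list of event dicts."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = "http://example.com/reactor-events"
    ET.SubElement(channel, "description").text = "Unplanned power change events"
    for ev in events:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = "%s: %s" % (ev["unit"], ev["reason"])
        ET.SubElement(item, "pubDate").text = ev["date"]  # RFC 822 date string
        ET.SubElement(item, "description").text = ev["reason"]
    return ET.tostring(rss, encoding="unicode")

events = [{"unit": "Browns Ferry 1",
           "date": "Fri, 21 Oct 2011 00:00:00 GMT",
           "reason": "Unplanned scram"}]
print(events_to_rss(events))
```

Any feed reader pointed at the resulting URL then picks up new unplanned events as they are scraped.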


Oh, and a video (using the example to go through the ScraperWiki API)

ScraperWiki has had a little facelift of late so I’ve done a couple of screencasts (one more to come). Enjoy!

So things have gone quiet on the blog front but screamingly loud behind the scenes. I have met some amazing and inspiring people in both the US and Canada. A lot of them have been taken aback by my journey, which I have documented here and which I should update on my timeline.

I have called myself a ‘human experiment’. I am trying to create a so-called ‘Data Journalist’ and left the newsroom to retrain. Whilst working with a startup I have come to realise that I am not an experiment; I am a startup. A living, breathing, learning, iterating startup.

I have used the web to test my seed idea by finding organisations like HacksHackers and attending conferences like NewsRewired. I have used this blog and my journey to seek validation. I have acted to develop my business acumen, creating point stories, Twitter accounts and bots.

I now have ‘Angel’ funding provided by the Knight Foundation and invested by Mozilla. This means I have to go into testing. Across the Atlantic I was doing a lot of outreach. I believed in ScraperWiki more than myself; in some ways I still do. They are a proper startup, a business, an institution, and I believe in their power as a tool for social good. And that will never die. And so the startup will never fail in my eyes.

As the tech startup joke goes: “A million guys walk into a Silicon Valley bar. None of them buys anything. The bar is declared a rousing success.” Business is a hard world. Your passion for business is fundamentally what is needed to drive even the best ideas. My passion is not in business development; it’s in news development. And good news will always find a way to survive. I am not looking to succeed as a business but as a startup. So how can a startup not be a business?

In the same vein, how can an experiment not have a tangible result? I was never going to get to a stage when I could say ‘I have succeeded’. My experiment was to see if learning some programming (that journey still continues) and gathering news differently would be a viable route into a newsroom (when newsrooms are hemorrhaging people and resources) in order to produce news and not just talk about it. I wondered: can I restructure myself faster, and in the right direction of the evolving news industry, by using the lean startup model of discovery?

Now that has been verified, it’s back to the hypothesis. It’s back to testing and now I have a lab that is willing to take me on, The Guardian. And I couldn’t ask for a better lab. But it’s up to me to continue the research.

I was bowled over by the ‘congratulations’ I received at the announcement of my fellowship (which was uploaded here, here and here), but the fellowship is not an acknowledgement of what I have done; it is an opportunity to test out what I can do. It’s the end of one leg of my journey and the beginning of an even more daunting road.

My journey is made less daunting by the fact that I have travel partners in Cole Gillespie, Dan Schultz, Laurian Gridinoc and Mark Boas (although their skills and experience make for quite intimidating company). I have carriage provided by The Guardian and service by Knight.

So what is my success? What is my lesson learnt? I took the road less traveled by and that has made all the difference:

To gain knowledge, insight and foresight into the developing media landscape, the best forms of education lie outside the classroom. I am a huge proponent of self-learning through experimentation. So I constantly go to events, lectures, hackathons and conferences.

I have recently been to HacksHackers NYC and HacksHackers TO, as well as universities and newsrooms in the US. I find myself preaching the data journalism cause but also looking to learn more (code, as with journalism, is all about continuous learning).

An amazing opportunity that rolls everything into one brilliant bonanza of creativity, collaboration and coding is the Mozilla Festival taking place in London, UK on 4–6 November. The theme is Media, Freedom and the Web, and if that isn’t enough to entice you I suggest you take a look at the line-up as well as the star attendees.

ScraperWiki and DataMinerUK will be there as part of the Data Driven Journalism Toolkit. So come along if you wanna dig the data and do a whole lot more!

Posted: October 28, 2011 in Events

So I’m in the US, preparing to roll out events. To decide where to go I needed data: numbers on the type of people we’d like to attend our events. In order to generate good data projects we would need a cohort of guests: media folks (including journalism students) and people who can code in Ruby, Python and/or PHP. We’d also like to get the data analysts, or Data Scientists as they are known, particularly those who use R.

So with assistance from Ross Jones and Friedrich Lindenberg, I scraped the locations (with websites and telephone numbers where available) of all the media outlets in the US (changing the structure to suit its intended purpose, i.e. why there are various tables), data conferences, Ruby, PHP, Python and R meetups, B2B publishers (Informa and McGraw-Hill) and the top 10 journalism schools. I added small sets in by hand, such as HacksHackers chapters. All in all, nearly 12,800 data points. I had never used Fusion Tables before but I had heard good things. So I mashed up the data and imported it into Fusion Tables. So here it is (click on the image, as sadly wordpress.com does not support iframes):
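The mash-up step is essentially stamping each scraped dataset with a category and stacking them into one table that Fusion Tables can ingest. A minimal sketch of that idea, with made-up filenames and columns standing in for the real scraper exports:

```python
# Hypothetical sketch of combining several scraped datasets into one CSV for
# a Fusion Tables import. Each source gets a "category" column so the map can
# later be filtered and charted by type. Filenames and columns are
# illustrative, not the actual scraper output.
import csv

def merge_for_fusion(sources, out_path):
    """sources: list of (category, csv_path) pairs; writes one combined CSV."""
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["name", "location", "category"])
        for category, path in sources:
            with open(path, newline="") as f:
                for row in csv.DictReader(f):
                    writer.writerow([row["name"], row["location"], category])

# Tiny sample inputs standing in for the real scraped tables:
with open("media.csv", "w") as f:
    f.write("name,location\nExample Gazette,Austin TX\n")
with open("r_meetups.csv", "w") as f:
    f.write("name,location\nAustin R Users,Austin TX\n")

merge_for_fusion([("media_outlet", "media.csv"),
                  ("r_meetup", "r_meetups.csv")], "fusion_upload.csv")
print(open("fusion_upload.csv").read())
```

Fusion Tables then geocodes the location column on import, and the category column drives the filters and colours.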

Click to explore on Fusion Tables

Sadly there is a lot of overlap, so not all the points are visible. Google Earth explodes points that sit on the same spot; however, it couldn’t handle this much data when I exported it. Once we decide where best to go I can home in on exact addresses. I wanted to use it to pinpoint concentrations, so a heat map of the points was mostly what I was looking for.

Click to explore on Fusion Tables

Using Fusion Tables I then broke down the data for the hot spots. I looked at the category proportions and, using the filter and aggregate functions, made pie charts (see New York City, for example). The downside I found with Fusion Tables is that the colour schemes cannot be adjusted (I had to fix them up using Gimp) and the filters are AND statements only (there is no OR option). The downside with US location data is the similarity of place names across states (and sometimes a place and a state share the same name), so I had to eyeball the data. So here is the breakdown for each region, where the size of a pie chart corresponds to the number of data points for that location. Sizes are relative within each region, not across regions.

Of course media outlets would outnumber coding meetups, universities and HacksHackers chapters, but their numbers are a better measure of population size and city economy.

What I’ve learnt from this is:

  1. Free tools are simple to use if you play around with them
  2. They can be limiting for visual mashups
  3. The potential of your outcome is proportional to your data size, not your tool functionality (you can always use multiple tools)
  4. To work with different sources of data you need to think about your database structure and your outcome beforehand
  5. Manipulate your data in the database not your tool, always keep the integrity of the source data
  6. To have data feed into your outcome changes your efforts from event reporter to source

This all took me about a week, between doing other ScraperWiki stuff and speaking at HacksHackers NYC. If I were better at coding I imagine this could be done in a day, no problem.
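Lesson 5 above deserves a concrete illustration: do the manipulation in the database and hand the visualisation tool a derived summary, leaving the source rows untouched. A sketch using SQLite (which is what ScraperWiki datastores use), with made-up table and column names:

```python
# Illustrative sketch of lesson 5: aggregate in the database, not the tool.
# The "points" table and its columns are hypothetical stand-ins for the
# scraped data; the raw rows keep their integrity while a derived summary
# feeds the pie charts.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (name TEXT, city TEXT, category TEXT)")
conn.executemany("INSERT INTO points VALUES (?, ?, ?)", [
    ("Example Gazette", "New York", "media_outlet"),
    ("NYC Python", "New York", "python_meetup"),
    ("Example Tribune", "New York", "media_outlet"),
])

# One query produces the per-city category counts the charts need:
summary = conn.execute(
    "SELECT city, category, COUNT(*) FROM points "
    "GROUP BY city, category ORDER BY COUNT(*) DESC").fetchall()
print(summary)
```

The same pattern scales from three rows to 12,800: the tool only ever sees the summary, and the source table stays pristine for the next question you think to ask of it.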