Posts Tagged ‘HacksHackers’

To gain knowledge, insight and foresight into the developing media landscape, the best forms of education lie outside the classroom. I am a huge proponent of self-learning through experimentation. So I constantly go to events, lectures, hackathons and conferences.

I have recently been to HacksHackers NYC and HacksHackers TO, as well as universities and newsrooms in the US. I find myself preaching the data journalism cause but also looking to learn more (code, like journalism, is all about continuous learning).

An amazing opportunity that rolls everything into one brilliant bonanza of creativity, collaboration and coding is the Mozilla Festival, taking place in London, UK on 4-6 November. The theme is Media, Freedom and the Web, and if that isn’t enough to entice you, I suggest you take a look at the line-up as well as the star attendees.

ScraperWiki and DataMinerUK will be there as part of the Data Driven Journalism Toolkit. So come along if you wanna dig the data and do a whole lot more!


So I’m in the US, preparing to roll out events. To decide where to go, I needed data: numbers on the type of people we’d like to attend our events. To generate good data projects we would need a cohort of guests: media folks (including journalism students) and people who can code in Ruby, Python and/or PHP. We’d also like to attract data analysts, or Data Scientists as they are now known, particularly those who use R.

So with assistance from Ross Jones and Friedrich Lindenberg, I scraped the locations (plus websites and telephone numbers where available) of all the media outlets in the US (changing the structure to suit its intended purpose, which is why there are various tables), data conferences, Ruby, PHP, Python and R meetups, B2B publishers (Informa and McGraw-Hill) and the top 10 journalism schools. I added small sets in by hand, such as the HacksHackers chapters. All in all, nearly 12,800 data points. I had never used Fusion Tables before but I had heard good things. So I mashed up the data and imported it into Fusion Tables. Here it is (click on the image, as sadly wordpress.com does not support iframes):

Click to explore on Fusion Tables
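For the curious, here is roughly how the plumbing works. This is only a sketch: the table and column names below are my illustration, not the exact ScraperWiki schema. The idea is to pull every scraper table, tag each row with its category, and write one combined CSV ready for a Fusion Tables upload.

```python
# Sketch only: illustrative table/column names, not the real ScraperWiki schema.
import csv
import sqlite3

TABLES = ["media_outlets", "ruby_meetups", "python_meetups", "php_meetups",
          "r_meetups", "journalism_schools", "hackshackers_chapters"]

conn = sqlite3.connect("scraped_locations.db")  # a local copy of the scraper output

rows = []
for table in TABLES:
    # Table names can't be passed as SQL parameters, hence the string formatting.
    for name, lat, lng in conn.execute(
            "SELECT name, latitude, longitude FROM %s" % table):
        rows.append({"name": name, "latitude": lat,
                     "longitude": lng, "category": table})

with open("all_points.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, ["name", "latitude", "longitude", "category"])
    writer.writeheader()
    writer.writerows(rows)

print("Wrote %d data points" % len(rows))
```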

Sadly there is a lot of overlap, so not all the points are visible. Google Earth explodes points that share the same spot, but it couldn’t handle this much data when I exported it. Once we decide where best to go I can home in on exact addresses. I wanted to use it to pinpoint concentrations, so a heat map of the points was mostly what I was looking for.

Click to explore on Fusion Tables

Using Fusion Tables, I then broke down the data for the hot spots. I looked at the category proportions and, using the filter and aggregate functions, made pie charts (see New York City, for example). The downsides I found with Fusion Tables are that the colour schemes cannot be adjusted (I had to fix them up using Gimp) and that the filters are AND statements only (there is no OR option). The downside with US location data is the similarity of place names across states (and places that share a name with a state), so I had to eyeball the data. So here is the breakdown for each region, where the size of the pie chart corresponds to the number of data points for that location. Sizes are relative within each region, not across regions.

Of course media outlets would outnumber coding meetups, universities and HacksHackers Chapters, but they would be a better measure of population size and city economy.
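If Fusion Tables’ AND-only filters get in the way, the same per-city breakdown can be done offline. Here is a minimal sketch, assuming the combined CSV has “city”, “state” and “category” columns (my own naming, not anything Fusion Tables exports); keying on city and state together sidesteps the duplicate-place-name problem.

```python
# Minimal sketch: count categories per (city, state) from an assumed CSV layout.
import csv
from collections import Counter, defaultdict

breakdown = defaultdict(Counter)  # (city, state) -> category counts

with open("all_points.csv", newline="") as f:
    for row in csv.DictReader(f):
        breakdown[(row["city"], row["state"])][row["category"]] += 1

# The ten busiest locations, with their category proportions.
hot_spots = sorted(breakdown.items(), key=lambda kv: -sum(kv[1].values()))[:10]
for (city, state), counts in hot_spots:
    print(city, state, dict(counts))
```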

What I’ve learnt from this is:

  1. Free tools are simple to use if you play around with them
  2. They can be limiting for visual mashups
  3. The potential of your outcome is proportional to your data size, not your tool functionality (you can always use multiple tools)
  4. To work with different sources of data you need to think about your database structure and your outcome beforehand
  5. Manipulate your data in the database, not in your tool, and always keep the integrity of the source data
  6. Having data feed into your outcome changes your role from event reporter to source
This all took me about a week between doing other ScraperWiki stuff and speaking at HacksHackers NYC. If I were better at coding I imagine this could be done in a day no problem.

So I’m back from Berlin and in the US. I met some amazing people at the Knight Mozilla Hacktoberfest, a four-day hackathon with people from all over the world and from all walks of life. It was the most fun I’ve had all year and I’ve made some friends for life. The project ideas were brilliant and the discussion inspiring. Having the news partners (Al Jazeera, the BBC, the Guardian, the Boston Globe and Zeit) as active participants was a great move on Mozilla’s part. That big news organisations look outside for ideas and solutions shows they realise news is out there, not solely within structured organisations.

I remember first seeing a blog post about this partnership process and thinking: “Wow, I wish I could apply. Shame I’m not a developer”. I went along to the application process out of curiosity and thankfully my creative juices got the best of me.

Even then, my scepticism told me not to expect any part of my MozNewsLab pitch, the Big Picture, to be built in four days, so I made a little side project, MoJoNewsBot. On the third day of the hackathon I presented my data-stream-connected chat bot via the Big Discussion part of Big Picture. Thanks to an amazing participant, David Bello, we got a conference with website submission, approval and iframe designed and coded in two days. I only found out just before presenting that he is in management at a university in Colombia and doesn’t code for a living. I was truly blown away by how an idea, once developed, designed and pitched, can be made reality owing solely to the goodwill of someone who “plays” with code.

You can keep track of both projects, Big Picture and MoJoNewsBot, on the Mozilla wiki. I’m looking to build the first and third parts of Big Picture with further help and advice from the participants. Thanks to the magic of GitHub and DotCloud, I have a local version of Big Picture running on my computer. I’m going to learn JavaScript and add to/clean up Big Picture before I present it formally on my blog. As for my chat bot, I need to add error messages and tidy up the code a bit. Then I’ll relocate him from #botpark to #HacksHackers on IRC. During events in the US I’m going to add more modules with interesting data for journalists to reference.
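In the meantime, here is a toy illustration of the kind of module I mean. This is not MoJoNewsBot’s actual code: the server, channel, nick, command name and figures are all placeholders, and a real module would pull from a live data source rather than a hard-coded table.

```python
# Toy IRC bot sketch: answers a made-up !gdp command from a hard-coded table.
# Server, channel, nick and the figures themselves are placeholders.
import socket

SERVER, PORT = "irc.example.net", 6667
CHANNEL, NICK = "#botpark", "newsbot-sketch"
GDP = {"uk": "about US$2.4tn", "us": "about US$15tn"}  # illustrative only

sock = socket.create_connection((SERVER, PORT))

def send(line):
    sock.sendall((line + "\r\n").encode("utf-8"))

send("NICK " + NICK)
send("USER %s 0 * :data bot sketch" % NICK)
send("JOIN " + CHANNEL)

buffer = ""
while True:
    buffer += sock.recv(4096).decode("utf-8", "ignore")
    while "\r\n" in buffer:
        line, buffer = buffer.split("\r\n", 1)
        if line.startswith("PING"):                 # keep the connection alive
            send("PONG " + line.split(" ", 1)[1])
        elif "PRIVMSG %s :!gdp" % CHANNEL in line:  # someone asked the bot
            country = line.rsplit("!gdp", 1)[1].strip().lower()
            reply = GDP.get(country, "sorry, no data for %r" % country)
            send("PRIVMSG %s :%s" % (CHANNEL, reply))
```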

To all my viewers, whoever you are, I recommend you hop on the MoJo bandwagon next year. It’ll be the ride of your life! Almost as eventful as driving the ScraperWiki digger 😉

Just a couple of notices regarding journalism skills to be gained in the real world, not just the virtual one.

The HacksHackers London meetup is tomorrow, on their first anniversary. Be there or be an equilateral quadrangle. Speakers include Alec Muffett from Green Lane Security and Martin Belam from The Guardian.

Stanford University’s School of Engineering is offering a free online course, “Introduction to Databases”. I am signing up for it and I suggest you do too. There is a branch of data journalism known as database journalism, whose practitioners have largely been relegated to the world of B2B journalism, maintaining databases that serve as their main revenue source. I believe database skills are going to come in handy for mainstream journalists, and understanding databases is the key to unlocking your data journalism skills. And it’s free! This may not work so well in the virtual world, so I suggest forming study groups with other like-minded folk.
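To give a flavour of why it matters, here is a tiny, made-up example using Python’s built-in sqlite3 module. The table and figures are invented, but the GROUP BY query is the kind of thing an introductory databases course covers.

```python
# Tiny made-up example: a table of FOI requests and a GROUP BY query over it.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE requests (authority TEXT, topic TEXT, days_to_reply INTEGER)")
db.executemany("INSERT INTO requests VALUES (?, ?, ?)", [
    ("Council A", "spending", 18),
    ("Council A", "contracts", 35),
    ("Council B", "spending", 12),
    ("Council B", "contracts", 40),
])

# Which authorities are slowest to reply, on average?
query = ("SELECT authority, AVG(days_to_reply) AS avg_days FROM requests "
         "GROUP BY authority ORDER BY avg_days DESC")
for authority, avg_days in db.execute(query):
    print(authority, round(avg_days, 1))
```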

A documentary called “Page One”, coming out on 23rd September, follows a year at the New York Times and chronicles the impact of new media on the newsroom. Invite potential study buddies to watch it and fool them into thinking they’ll have fun. Here’s the trailer:

An article titled “Editors: ‘Traditional skills more important than new media’” in the Press Gazette today shows how little ‘editors’ understand what new media is. New media is not a reporting tool, it is a platform. Of course using new media should never be high on a list of priorities for journalistic skills: new media is not a ‘thing’ to be learnt like interview skills and video shooting. New media is a medium where skills are applied. So by using new media you’re not learning new media, you’re learning to apply skills without relying on the slim chance of being employed in a traditional newsroom. The modern journalism CV should not say “I can use Twitter and Facebook”; it should be a set of online links to what you have done, along with the audience you have reached by fostering your own news platform. New media is an opportunity to show what traditional skills you have applied to the real world. That is why it is so important.

Although “data journalism” can encompass infographics, interactives, web apps, FOI, databases and a whole host of other numbering, coding and displaying techniques, the road less travelled by has certain steps, turns and speed bumps. In that sense, here’s a list of things to tick off if you’re interested in going down the data journalism road:

  1. Know the legal boundaries – get to know the Data Protection Act 1998 and the sections on access to personal data and unstructured personal data held by authorities. Do not set foot on your journey without reading the section on exemptions relating to journalism. Use legislation as a reference by downloading the Mobile Legislate app.
  2. Look at data – get to know what is out there, what format it’s in and where it’s coming from. Places like Data.gov.uk, London Datastore, Office for National Statistics and Get the Data are good places to start for raw data but don’t forget, anything on the web is data. The best data are often hidden. Data can be text and pictures so even mining social media and catching the apps built from them can give you insight into what can be done with data.
  3. Read all about it – to make data and stats accessible you need to know how to frame them within a story. In that sense, you need to know how to understand the stories they tell. That doesn’t mean going on a stats course. There is a lot of accessible reading material out there, and I would recommend The Tiger That Isn’t.
  4. Get connected – find HacksHackers near you and join Meetup groups to point you in the right direction. Data journalists’ interests and abilities are unique to the individual (much like programmers’), so don’t take any written advice as set in stone (the web changes too quickly for that!). Find your own way and your own set of people to guide you. Go to courses and conferences. Look outside the journalism bubble. Data is more than just news.
  5. Spread your bets – the easiest way to sort data is by using spreadsheets. Start with free options like Google Docs and OpenOffice; industry standards include Microsoft Excel and Access. Learn to sort, filter and pivot. Find data you’re interested in and explore it with your own eyeballs. Know what each piece of software does and can do to the data before mashing it up with another piece of software.
  6. Investigate your data – query it using the simple language SQL and the software MySQL. It’s a bit tricky to set up but by now you’ll know a hacker you can ask for help! Clean your data using Google Refine. There are tutorials and a help wiki. Know how these function not just how to navigate the user interfaces, as these will change. These products go through iterations much more quickly than the spreadsheet software.
  7. Map your data – from Google spreadsheets the easiest way to build a map is by using MapAList. There is a long list of mapping software, from GeoCommons to ArcGIS. Find what’s easiest for you and most suitable for your data. See what landscapes can be revealed and home in on areas of interest. Understand the limitations of mapping data: you’ll find devolution makes it difficult to get data for the whole of the UK, and some postcodes will throw up errors.
  8. Make it pretty – visualize your data only once you fully understand it (source, format, timeframe, missing points, etc). Do not jump straight to this as visuals can be misleading. Useful and easy software solutions include Google Fusion Tables, Many Eyes and Tableau. Think of unique ways to present data by checking out what the graphics teams at news organizations have made but also what design sites such as Information is Beautiful and FlowingData are doing.
  9. Make your community – don’t just find one, build one. This area of journalism is constantly changing, and to keep up you’ll need a custom-made community. So blog and tweet, but also tap into ready-made online communities from places like the European Journalism Centre, National Institute for Computer Assisted Reporting (NICAR), BuzzData and DataJournalismBlog.
  10. Scrape it – do not be constrained by data. Liberate it, mash it, make it usable. Just like a story, data is unique, and bad data journalism comes from being constrained by the medium containing it. With code, there is no need to make the story ‘fit’ the medium. “The Medium is the Message” (a la Marshall McLuhan). Scrape the data using ScraperWiki and make applications beyond storytelling (there’s a bare-bones scraping sketch at the end of this list). Make data open. For examples check out OpenCorporates, Schooloscope and Planning Alerts. If you’re willing to give coding a try, the book “Learning Python the Hard Way” is actually the easiest way for a non-programmer to learn. There is also a Google Group for Python Journalists you should join.
These are guidelines, not a map for your journey. Your beat, the data landscape, changes at the speed of the web. You just need to be able to read the signs of the land, as there’s no end point, no goal and no one to guide you.
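To put some flesh on step 10 (and a little of step 6), here is a bare-bones scraping sketch. The URL is a placeholder and the page is assumed to be plain HTML with links worth collecting; ScraperWiki wraps this plumbing up for you, but the moving parts look something like this:

```python
# Bare-bones scraping sketch: collect every link on a (placeholder) page and
# store the results in a small SQLite database you can then query with SQL.
import sqlite3
import urllib.request
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every anchor tag on the page."""
    def __init__(self):
        super().__init__()
        self.links, self._href, self._text = [], None, ""
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = ""
    def handle_data(self, data):
        if self._href is not None:
            self._text += data
    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._text.strip(), self._href))
            self._href = None

html = urllib.request.urlopen("http://example.com/").read().decode("utf-8")
parser = LinkCollector()
parser.feed(html)

db = sqlite3.connect("scraped.db")
db.execute("CREATE TABLE IF NOT EXISTS links (text TEXT, href TEXT)")
db.executemany("INSERT INTO links VALUES (?, ?)", parser.links)
db.commit()
print("Saved %d rows" % len(parser.links))
```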

Who?

Hacks/Hackers London

What?

Journalists call themselves “hacks”: people who can churn out words in any situation. Hackers use the digital equivalent of duct tape to whip out code. Hacker-journalists try to bridge the two worlds.

When?

Wednesday, 22nd June 2011 at 07:00 pm

Where?

The Shooting Star
125-129 Middlesex St
London E1 7JF

Why?

James Ball, from The Guardian, will be talking about complex topics made comprehensible by infographics, network analysis that reveals the secret influencers behind the scenes (a technique central to exposing extraordinary rendition), and more. James, a data journalist working on the Guardian’s investigations desk, will draw on data from WikiLeaks (where he worked on the embassy cables), Ghost Plane and more, and will take a look at four of the best data analysis tricks and when their use might confuse, mislead or even kill your audience.

Neil Smith, an ex-police officer and fraud investigator-turned private investigator and trainer, will be running through some of the tools and advice he has collated on his internet investigation site Open Source Intelligence, as well as explaining some key ways to use social media to dig up information.

Who?

Hacks/Hackers London

What?

Journalists call themselves “hacks”: people who can churn out words in any situation. Hackers use the digital equivalent of duct tape to whip out code. Hacker-journalists try to bridge the two worlds.

When?

Wednesday, 25th May 2011 at 07:00 pm

Where?

The Shooting Star
125-129 Middlesex St
London E1 7JF

Why?

Kevin Marsh, Director of a new journalism education venture, OffspinMedia, will be giving a talk entitled: “It’s time we gave news audiences what they need, not what they want”.  He argues that the more traditional journalism chases what news audiences want, the less it delivers what they need to play an informed, decisive role in determining their own environment and futures. How in a world of social networking and real-time feedback do journalists deliver what communities need, rather than what individuals want?

Glyn Wintle will be giving a talk entitled: “Hacking for Good – White Hats and Web Security”. He makes a living from technical consulting, programming and security work. He will be explaining penetration testing, ethical hacking, and why telling the world about serious security problems in common software is a good idea.