Archive for July, 2011

I’m over halfway through the Knight Mozilla Learning Lab, with lofty goals of changing the way we experience news online. I came up with my idea after filming a rant by my friend and colleague. I threw something down very quickly on paper and have since made wireframe mock-ups using MockFlow (thanks to Chris Keller for suggesting that software).

I have no experience in UX (user experience) and no design skills to speak of. I’ve just started learning to programme, but my objective was never to build sites or applications; it was to get at data for story generation and knowledge in the newsroom.

The learning lab so far has included some really big hitters, and even though I don’t write directly about their words of wisdom (the criteria for forming my software proposal have come from blogs and an article by Jeff Jarvis, who will be giving the last webinar), their advice manifests itself as the cogs in my idea-making machine. So I give you my idea so far, for which I will have to write a proposal. It’s a bit long and rambling, but I don’t like to edit out the person so much. It is my ‘workings’, which will be refined and made succinct in the traditional media fashion.

Here is Phil Gyford’s blog post, Andy Rutledge’s blog post, Brad Colbow’s blog post, Anil Dash’s blog post and Jeff Jarvis’ article. Highly recommended reading. Also, here is the blog post about the calculated and strategic killing of future party leaders by someone whose thinking was so frighteningly designed he could not be insane. Here also is a blog post, in Hebrew, about comments on the Norwegian massacre, which sheds another horrific light on anti-Muslim sentiment worldwide. This is a particularly good example of how I need user-generated content to access parts of the web which would not otherwise be available to me.

This week, Shazna Nessa, Director of Interactive at AP, spoke about making changes in the newsroom and working with staff who have a multitude of different skills. Now I work with geeks. They make my world a better place to live in. I’ve been introduced to MakerNight, where I’m building a hamster feeder that looks for a Twitter hashtag, and GeekUp, where mostly we go to the pub! That being said, I presented “The Big Picture” at the last GeekUp in Liverpool to get some geeky guidance. Here’s the result; pardon the modulating audio, I didn’t have a mic:

I see ‘The Big Picture’ as a major collaborative effort between the public, the newsroom community managers, journalists (as they’ll know the topic and should be amongst the invited guests), and experts in the field, who may even be the presenter. Now the minimum-viable-person for this project is Joe Blogs, so only minimal technical skills are required. Just as a discussion show requires producers, journalists, researchers, directors and studio hands, so everyone should be involved in providing the big picture.

This is Cuddles. He’s a Russian Dwarf hamster. We have two males; Cuddles is the beta male. He was fat and lazy, which made him very docile. You’d like Cuddles. He’s a good little hamster. He’s also been made docile by the fact that he’s bullied by the alpha male.

We initially called the alpha male ‘Dimples’, but he turned out to be really evil. So we renamed him Morbo, after the news presenter from Futurama: Morbo the Annihilator. Because Morbo incessantly bullies Cuddles, he’s broken out in boils and is underweight. Whenever we give him food, Morbo steals it from him. Morbo often pushes him over and presses him until he squeaks. In fact, we’re going to separate them, even though Russian Dwarf hamsters are supposed to be social creatures.

To mark this point of separation I have written a piece of code to explain why it is necessary for poor Cuddles to be taken away from his brother. It’s also an exercise for me to learn about classes in Python, but nevertheless it is very poignant. Here it is:

It’s on ScraperWiki so you can play around with the parameters. It takes an initial hamster weight of 300g and assumes he loses 2g every time he’s chased by Morbo and gains 5g every time he eats. Morbo bullies him up to 10 times a day. It shows you how long he has to live. You can change these parameters to see how that changes his life expectancy.
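In case the embed doesn’t show up for you, here’s a simplified sketch of the sort of thing the code does. It is not the exact code on ScraperWiki: the starting weight, the 5g gain and the 2g loss come from the description above, but the meals per day and the weight at which I declare poor Cuddles done for are assumptions added for illustration:

    import random

    class Hamster:
        """A much-simplified Cuddles."""

        def __init__(self, name, weight=300):
            self.name = name
            self.weight = weight              # grams

        def eat(self):
            self.weight += 5                  # gains 5g every time he eats

        def get_chased(self):
            self.weight -= 2                  # loses 2g every time Morbo chases him

    def days_left(meals_per_day=1, max_chases_per_day=10, critical_weight=200):
        """Count the days until Cuddles wastes away below a critical weight.

        meals_per_day and critical_weight are my illustrative assumptions."""
        cuddles = Hamster("Cuddles")
        days = 0
        while cuddles.weight > critical_weight and days < 3650:
            days += 1
            for _ in range(meals_per_day):
                cuddles.eat()
            for _ in range(random.randint(0, max_chases_per_day)):
                cuddles.get_chased()
        return days

    print("Poor Cuddles has roughly %d days left living with Morbo." % days_left())

Because the bullying is random, the answer changes every time you run it, which is rather the point.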

Here’s a part of the output from my command line:

I’m a journalist learning to code and this is my storytelling through the medium of Python.

This will explain a part of the code:

Here’s a video of my friend Francis saying how online news is s**t. He started giving out, so, having a background in broadcast journalism, I filmed him for the purpose of the Knight-Mozilla News Challenge. Now Francis doesn’t have a TV or radio and doesn’t read the newspaper (he gets The Economist). For daily news, his laptop does it all. It is his only outlet to the worldwide media monster. Yet he thinks no one has got online news right:

So here is Today’s Guardian. I would highly recommend you read Phil Gyford’s post about his creation. He has tried to recreate a printed newspaper online, and it’s beautiful in its simplicity. Why try to reinvent the wheel? The newspaper’s structure, design and user experience have been fine-tuned for centuries now. It is the oldest medium. Why are we trying to kill it with graphics and clicks and a maze of navigation tunnels in an attempt to decipher what is relevant?

This obsession with personalisation is detracting from the fact that people want to get together and combine their knowledge to understand what is new in the world and how it affects them. Online news separates you from the crowd: it isolates your knowing by assuming an entry point (and placing background and foreground somewhere within each article), and it gives you no relevancy by limiting conversation to a stream of comments from unidentified sources at the bottom of each article. The article is not the right space. The principal element of news should be the story, not the article (there’s a difference). The article is no longer the atomic unit of news. So why are we trying to put everything there?

In Phil’s sense of Friction, Readability and Finishability, why don’t we try to take news pre-mediation? Why don’t we take it back to the conversation? You engage, you communicate, you understand. Social reduces friction, people’s understanding is what is most readable, and the conversation gives you something tangible in your head. It creates a “thing” which you can take away and personalise in your own head in the form of enlightenment.

In that vein, I give you my proposal for the Knight-Mozilla News Challenge. I call it: “The Big Picture” and it is a mashup of the studio discussion, Storify, Big Blue Button, a Reader and Phil’s creation. Sounds crazy right? But the key point is: the conversation is the navigation and you comment with content to create an editorial crowd-sourced democracy for a news issue.

Here’s a quick video going through the drawings I threw down later that night, after filming the above video (I’ll try to make a better proposal MVP; I’m not sure a prototype can be made in time):

Here are pictures of my musings late one night:

The Big Idea

The Big Bucket (of content)

The Big Discussion

The Big Picture

Within a very short period of time the term ‘journalist’ has changed drastically in meaning. Or did it even have a meaning to begin with? At a panel on the phone hacking scandal at the Centre for Investigative Journalism Summer School, Gavin Millar QC said that a journalist is like a terrorist: we have no legally defined term for what they are!

The term ‘citizen journalist’ has grown from the web, from the free and global publishing platforms that are blogs, Twitter and Facebook (and much more besides). The enabling ability of the web, its ability not just to spread information but to upload pictures, video and words, has shaken the traditional media model. The press journalist is no longer the gatekeeper of information. Just look at the Ryan Giggs superinjunction scandal.

So what do we mean when we say “The Press”? One of the consequences of the News of the World phone hacking scandal (which includes a lot more publications, not just News International titles) is that we are going to get a Press Inquiry. I say we, because we don’t know whether any regulatory outcome will include bloggers and twitterers, i.e. us, the public who have a social space on the web and use it to communicate within the open public sphere.

Another word that is taking on a new and worrying meaning is the term ‘hacker’. Now a lot of weight and attention has been given to the citizen journalist/web journalist/blogger phenomenon. The circle within which my journalistic persona travels is that of hacks/hackers. I am part hacker. I am a data journalism advocate for a developer platform called ScraperWiki. And I am very concerned about how this tumultuous time in journalism history will define the word ‘hack’ and all its related synonyms.

Wikipedia has one definition of ‘hacker’ as “a subculture of innovative uses and modifications of computer hardware, software, and modern culture”. I sit on the edge of this and want to look further into the nucleus as a possible future for online news and newsgathering. ScraperWiki is one of a core set of online tools being used by the Open Data community. The people who are part of this community (I flatter myself to be included) are ‘hackers’ by the best definition of the word. The web allows anyone to publish their code online, so these people are citizen hackers.

They are the creators of such open civic websites as Schooloscope, Openly Local, Open Corporates, Who’s Lobbying, They Work For You, Fix My Street, Where Does My Money Go? and What Do They Know? This is information in the public interest. This is a new subset of journalism. This is the web enabling civic engagement with public information. This is hacking. But it is made more important by the fact that not everyone can do it, unlike citizen journalism.

I have a Twitter account, @Scrape_No10, tweeting out meetings, gifts and hospitality at No.10. I made another, @OJCstatements, which tweets out statements by the Office for Judicial Complaints regarding judges who have been investigated over personal conduct, including racism, sexism and abuse of their position. This information is on the web, so it is in the public domain. But it is not in the public sphere, because the public don’t check the multitude of websites that may hold information in the public interest. So I have put it on the platform where it could be of most use to the public.

In that sense, I feel journalists need to be ‘hackers’; they need to hack. Information in the public interest is not often available to the public. More and more government data is being put on the web in the form of PDFs and CSVs. Now, under the Freedom of Information Act 2000, the government doesn’t have to answer your request directly if the information is published online or will be published online. That means that, with more and more information being released as spreadsheets or databases, the public are going to be pointed to a sea of columns and rows rather than given direct answers. So journalists need to get to grips with data to get the public their answers.
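To make that concrete, here’s the sort of thing I mean, in Python. The file name and the column headings are invented for illustration; a real release will differ:

    import csv

    # Hypothetical spending release published as a CSV. The file name and the
    # 'Supplier' / 'Amount' headings are made up for this example.
    totals = {}
    with open("department_spend.csv") as f:
        for row in csv.DictReader(f):
            supplier = row["Supplier"].strip()
            amount = float(row["Amount"].replace(",", ""))
            totals[supplier] = totals.get(supplier, 0) + amount

    # The direct answer a reader actually wants: who gets the most money?
    top_ten = sorted(totals.items(), key=lambda item: item[1], reverse=True)[:10]
    for supplier, total in top_ten:
        print("%s: %.2f" % (supplier, total))

A dozen lines turns a sea of columns and rows into a question you can actually answer.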

But as we know, any journalistic endeavour is open to abuse. So where do we draw the line? Even with citizen journalism, the Ryan Giggs ousting online has blurred the boundary between the right to get private information out in the open and the individual’s right to privacy. Now, deleting the voice messages of a missing girl is clearly overstepping the bounds in a horrendous way. The public will never forgive such behaviour, but invading politicians’ privacy for the purpose of uncovering corruption often is forgiven.

The argument can be made that information on the web is public information and can be used freely in a journalistic endeavour. But that isn’t always the case. The British and Irish Legal Information Institute portal, BAILII, does not allow scraping. Learning to scrape is my journalistic endeavour at the moment. Scraping is the programming practice that takes information from the web and pares it down into its raw programmatic ingredients, so it can be baked into something more digestible for the public.
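For anyone wondering what a scrape actually looks like, here’s a minimal Python sketch of the kind of thing I write on ScraperWiki. The page, table layout and column names are entirely invented (and, to be clear, it is not pointed at BAILII):

    import scraperwiki
    import lxml.html

    # An invented page with a simple HTML table on it -- not a real site.
    url = "http://example.com/court-fines.html"

    html = scraperwiki.scrape(url)        # fetch the raw HTML
    root = lxml.html.fromstring(html)     # parse it into a tree we can query

    for row in root.cssselect("table tr"):
        cells = [cell.text_content().strip() for cell in row.cssselect("td")]
        if len(cells) >= 2:
            # Pare the page down to its raw ingredients and save them.
            record = {"name": cells[0], "fine": cells[1]}
            scraperwiki.sqlite.save(unique_keys=["name"], data=record)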

Now I would love to make the legal system more digestible, but I can’t. That’s because BAILII has its own databases of information that it sells to private companies, and scraping could reform these. One of them is a database of court fines that it sells to a multitude of credit card companies. So we pay for the judicial system, and if we’re fined by it, they have the right to make money from the data to affect our credit rating. This keeps the information they put online locked into the format they chose to put it in: a complicated and convoluted web portal.

But equally, what about unearthing a deleted tweet, or matching social media accounts through email addresses which are not disclosed but which could be guessed at? Linking online personas that are set up to be separate? Not accessing their private emails, not getting past any firewall that requires a password, but using details behind the front end of the web to dig deeper into their online connections. The question is not where do we draw the boundary but whether we can. Or even, whether we should.

It’s not the technique that should be outlawed; it should be the endeavour. Please don’t let the News of the World define ‘hacking’. In the Shakespearean sense of “That which we call a rose by any other word would smell as sweet”, we should define journalism not by a word but by what it smells like. Something stank about the first phone hacking enquiry in 2009. Nick Davies smelt it and followed his nose. And that’s the definition of journalism.

The above article, by me, appeared in an edited form on the openDemocracy website. They say: “openDemocracy publishes high quality news analysis, debates and blogs about the world and the way we govern ourselves. We are not about any one set of issues, but about principles and the arguments and debates about those principles. openDemocracy believes there is an urgent need for a global culture of views and argument that is: i) Serious, thoughtful and attractively written; ii) Accessible to all; iii) Open to ideas and submissions from anywhere, part of a global human conversation that is not distorted by parochial national interests; and iv) Original and creative, able to propose and debate solutions to the real problems that we all face.” For further reading re who holds court data you should read this article.

Here’s an introduction to a little thing called ScraperWiki. Please watch it so that you don’t develop bad habits that could annoy the programmers you want helping you!

These are the exercises for the workshop I ran at the CIJ summer school. They are also on the ScraperWiki blog, where I’ll be posting the answers.

OBJECTIVES FOR THIS WORKSHOP

  • Have your own ScraperWiki account and understand all the features and navigation of the site
  • Scrape twitter for different terms and export the dataset
  • Query datasets on ScraperWiki using SQL and learn about our API
  • Create a table from the data in a scraper and understand how views work
  • The main objective is to understand ScraperWiki capabilities and potential!

Exercise 0. Make an account

Go to http://scraperwiki.com and make an account. You’ll need this to save your scrapers. We encourage you to use your full name, but it’s not obligatory! Before you start, you may like to open a second tab/window in your browser, as that will allow you to go back and forth between the blog instructions and the exercise. The exercises are all written using the Python programming language.

Outcome:

You’ll have your very own ScraperWiki account (lucky you!). Welcome to your dashboard. Please fill out your profile when you have time. You’ll automatically get email updates on how your scrapers are doing. We’ll show you all the features you have, where your data goes and how to schedule your scraper.

The following scrapers you’ll be copying and pasting can be found on my profile page: http://scraperwiki.com/profiles/NicolaHughes/

Exercise 1. Basic data scraping

We’ll start by looking at a scraper that collects results from Twitter. You can use this to store tweets on a topic of your choice into a ScraperWiki database.
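Before you open it, here’s roughly what a ScraperWiki Twitter scraper of this sort looks like. This is an illustrative approximation rather than the exact code of basic_twitter_scraper_2, and it assumes Twitter’s JSON search endpoint and the scraperwiki Python library:

    import json
    import urllib

    import scraperwiki

    QUERY = 'scraperwiki'       # swap this for your own search term
    RESULTS_PER_PAGE = 100
    NUM_PAGES = 5

    for page in range(1, NUM_PAGES + 1):
        url = 'http://search.twitter.com/search.json?q=%s&rpp=%d&page=%d' % (
            urllib.quote(QUERY), RESULTS_PER_PAGE, page)
        results = json.loads(scraperwiki.scrape(url))
        for tweet in results.get('results', []):
            data = {
                'id': tweet['id'],
                'from_user': tweet['from_user'],
                'text': tweet['text'],
                'created_at': tweet['created_at'],
            }
            # Save into the ScraperWiki datastore, keyed on the tweet id.
            scraperwiki.sqlite.save(unique_keys=['id'], data=data)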

1. Edit and run a scraper

  1. Go to http://scraperwiki.com/scrapers/basic_twitter_scraper_2/.
  2. Click Fork Scraper. You’ve now created a copy of the scraper. Click Save Scraper, located on the right side of the window, to save it.
  3. Position your cursor on Line 11 between the two quote marks ‘ ‘. This will allow you to edit the query to a Twitter search term of your choice. For example, if you want to find tweets about an event, type in the event hashtag, e.g. QUERY=’#cij2011’.
  4. Click Run and watch the scraper run.
  5. Save the scraper again by clicking Save Scraper.

2. Download your scraped data

  1. On the top right-hand corner there are four tabs; you are in the Edit (Python) tab. Click the Scraper tab and you will see all of your data in tabular format.
  2. Click on ‘Download spreadsheet (CSV)’ to download and open your data as a spreadsheet.  If you have Microsoft Excel or Open Office you can analyse that data using standard spreadsheet functions.
  3. Double-click on your spreadsheet to open it.

Outcome:

Congratulations! You have created your very own Twitter scraper (*applause*) that doesn’t depend on the twitchy Twitter API. You’ve scraped your data and added it to the datastore, and you’ve also exported the data to a CSV file. The scraper is scheduled to run daily and any new data will be added automatically. Check out the video on how to change the schedule.

Exercise 2. Analysing some real-life data

In this exercise, we’ll look at a real-life use of ScraperWiki.

The Press Complaints Commission doesn’t release the details of the complaints in a way that is easy to analyse, and it doesn’t release many statistics about its complaints.

It would be interesting to know which newspaper was the most complained about.

However, as one of our users has scraped the data from the PCC site it can be analysed – and crucially, other people can see the data too, without having to write their own scrapers.

Here is the link: http://scraperwiki.com/scrapers/pcc-decisions-mark-3/

There’s no need to fork it. You can analyse the data from any scraper, not just your own.

As the Open Knowledge Foundation say, ‘The best use of your data is one that someone else will find’.

Instead, we are going to create a new view on the data. However, we are not going to create a view from scratch – we will fork a view that has already been created! Go to http://scraperwiki.com/views/live_sql_query_view_3/, fork the view and save it to your dashboard. You can also change the name of the view by clicking on the top line beside your name, “Live SQL Query View”, and changing it to “My analysis of the PCC Data”. Save it again by clicking ‘Save’. Take a few moments to study the table and pay particular attention to the column headings.

  1. There are four tabs on the top right hand corner of the screen – click the ‘View’ tab.  This will take you to the ScraperWiki test card.  Click the centre of the test card “Click to open this view”.
  2. Using this SQL query view, find out which publications get the most complaints.
  3. Place the cursor in the ‘SELECT’ box and delete the ‘*’. Then click on the word ‘publication’, which appears on the second line of the yellow box to the right (tip: the yellow box contains all of the column headings in the table, listed under swdata). This will transfer the word ‘publication’ into your SELECT box. Position your cursor to the right of the word ‘publication’ and type ‘, count(publication)’. This creates a column containing a count of the number of times each publication appears in the dataset created by the original scraper.
  4. Place your cursor in the ‘GROUP BY’ box, and repeat the process above to select the word ‘publication’ from the yellow box. The GROUP BY statement groups the rows by publication, giving you the aggregated number of times each publication has appeared before the PCC.
  5. Place your cursor in the ‘ORDER BY’ box, remove the existing text and type ‘count(publication) desc’. The ORDER BY keyword is used to sort the result set. This puts the results in order of how many complaints each publication has received, with the most complained-about publication at the top.
  6. In the ‘LIMIT’ box type ‘10’ to see the top 10.
  7. Hit the ‘Run SQL query’ button and see your results. At the bottom of the column the last item is ‘full query’ – yours should read as follows (it may be truncated):

SELECT publication, count(publication) FROM swdata GROUP BY publication ORDER BY count(publication) DESC LIMIT 10

It should look like this!

This is a simple query showing you a quick result. The query is not being saved. You can use the Back and Forward browser buttons to move between the two screens; however, as soon as you go back to the test card and “Click to open this view”, the query will be reset.
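If you want to see what that query is actually doing, here’s a tiny standalone Python sketch with a handful of made-up rows (the publications and complainants are invented; only the shape of the table matches):

    import sqlite3

    # A toy, in-memory version of the swdata table, purely to show what
    # GROUP BY and count() are doing.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE swdata (publication TEXT, complainant TEXT)")
    conn.executemany("INSERT INTO swdata VALUES (?, ?)", [
        ("Daily Bugle", "A. Reader"),
        ("Daily Bugle", "B. Reader"),
        ("The Gazette", "C. Reader"),
    ])

    query = """SELECT publication, count(publication) FROM swdata
               GROUP BY publication
               ORDER BY count(publication) DESC
               LIMIT 10"""
    for publication, complaints in conn.execute(query):
        print("%s: %d" % (publication, complaints))
    # 'Daily Bugle' comes out on top with 2, 'The Gazette' has 1.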

For some really great SQL tutorials check out: http://www.w3schools.com/sql/default.asp

Your Challenge:

Now let’s find out who has been making the most complaints – not receiving them. You will be altering the query to find out which complainants have made the most complaints.

Outcome:

So you have just learned a bit about SQL, which is very simple and very powerful. You can query any open data set in ScraperWiki to find the story. Welcome to the wonderful realm of data journalism. Hop aboard our digger and explore.

Exercise 3. Making views out of data

Making a view is a way to present raw data within ScraperWiki.  It allows you to do things with the data like analyse it, map it, or create some other way of showing the data.

For example, in the previous exercise we used a view to run a simple SQL query. The view produced a table, but it did not save the results of the query. In this exercise we are going to make a table that gives you the latest result of your query every time you open it up! In effect, you will be creating your own live league table.

  1. Fork and save to your dashboard: http://scraperwiki.com/views/top_10_receivers_of_cabinet_office_money/
  2. On line 59 change “cabinet_office_spend_data” to “pcc-decisions-mark-3”. This is re-directing the view code to the original PCC scraper.
  3. On line 60 change the var sqlselect = “ SELECT publication, count(publication) FROM swdata GROUP BY publication ORDER BY count(publication) DESC LIMIT 10 ” query to the SQL query you worked out in the previous exercise.
  4. Click the ‘PREVIEW’ button which is positioned to the right of the orange documentation button above the ‘console’ window to make sure you’re getting the table you want.
  5. Click the X on the top right hand corner of the View Preview Window.
  6. The heading still refers to the old scraper! Go to the edit view, click on line 21 and replace the old title with ‘Top Complaints by Publication to the PCC’.
  7. Save the View and Preview.
  8. Go to line 18 and replace <a href=”http://scraperwiki.com/scrapers/cabinet_office_spend_data/”>Cabinet Office Spend Data</a> with <a href=”http://scraperwiki.com/scrapers/pcc-decisions-mark-3/”>Press Complaints Commission Data</a>. You can also change the ‘this page’ hyperlink from <a href=”http://www.cabinetoffice.gov.uk/resource-library/cabinet-office-spend-data”> to <a href=”http://www.pcc.org.uk/”>.
  9. Save the view and preview.
  10. Click View to return to the test card and scroll to the bottom of the screen, where you will see the paragraph heading ‘Add this view to your web site’. You could copy the code and add it to your web site to have the data linked directly. If you don’t have a site, this view is where you can return to get your live league table, which will update with your data (wow)!
ScraperWiki Testcard

Note: there is no data stored in a view, only a link to a scraper that is looking at the data in the datastore. This is why you can quickly and easily alter a view to look at another scraper at any time. So you can build a viewer once and use it many times.

Check out what the Media Standards Trust made from our PCC scrapers! They made the Unofficial PCC.

Your Challenge:

Build a table of the top 10 receivers of Cabinet Office money! The original scraper is here: http://scraperwiki.com/scrapers/cabinet_office_spend_data/.

So go to your SQL viewer and change the scraper name to “cabinet_office_spend_data”, i.e. the name of the scraper for our internal API is the string of letters after ‘scrapers/’ in the URL of the scraper.

Create your query (hint: you’ll want to look at the ‘Refined’ table, as that has the supplier names cleaned so each spelling is the same; just hit the word ‘Refined’ and the table selected for your query will appear in yellow. You’ll also want to use ‘sum’ instead of ‘count’, to sum up the money rather than count how many times the supplier got paid). Then make your table view.

If you want to keep your PCC table just fork it, change the name in the title, save it and change the code to look at “cabinet_office_spend_data” and your query.

Hint: the original table you forked will have the answers!

Also check out this view of the same data set: http://scraperwikiviews.com/run/cabinet_spending_word_cloud_date_slider/

And this view of HMRC spending data that was made in a couple of hours from forking my code: http://scraperwikiviews.com/run/hmrc_spending_pie_chart_date_slider/

Outcome:

You’ll have a live league table that feeds off your scraped data! You’ll see how our setup allows you to constantly view changing data on the web and tells you the stories as they develop. So you can get data out, but you can also keep it in and have lots of different views interrogating your data within ScraperWiki.

So let’s work on the journalism in ‘data journalism’ and not just the data. It’s not what you can do for your data; it’s what your data can do for you!

So let’s start scraping with ScraperWiki!

Note: Answers in screencast form will be put on the ScraperWiki blog in two weeks’ time. So no excuses!

Burt Herman, former bureau chief and correspondent for The Associated Press, CEO of Storify and founder of Hacks/Hackers, gave the following webinar to the participants of the Knight-Mozilla Learning Lab:

 

Here is my take on how the elements needed to build a business for a newsroom are also what you need to make building a virtual newsroom your business.

Follow your passion

My passion is data journalism: data and journalism in equal measure. Burt established his career before he took the big step of starting his own business. I had yet to start my career when I decided to make data journalism my business. My foray into coding is not to build an end-of-line product to mediate information but to build machines for the factory: tools that let journalists machine-read information and unearth stories that cannot be caught by the human eye. It may be a quick way to gauge where Cabinet Office money is going (click on the image to get to the view). Every time the data is updated, the visual will update automatically.

Or a way to get information to the public in a way where you can catch the conversation, such as judges who have been reprimanded over personal conduct (read the blog post).

Or an email alert system to bring potential stories to the journalist.
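Here’s roughly how such an alert could work on top of a ScraperWiki datastore. This is a sketch only: the addresses, SMTP server, table and column names are placeholders I’ve invented:

    import smtplib
    from email.mime.text import MIMEText

    import scraperwiki

    # Placeholder table/column names: assume the scraper saves rows with
    # 'date' and 'headline' columns, and we mark the ones already sent.
    new_rows = scraperwiki.sqlite.select("* from swdata where alerted is null")

    if new_rows:
        body = "\n".join("%s - %s" % (row["date"], row["headline"]) for row in new_rows)
        msg = MIMEText(body)
        msg["Subject"] = "Potential stories: %d new records" % len(new_rows)
        msg["From"] = "alerts@example.com"         # placeholder addresses
        msg["To"] = "journalist@example.com"

        server = smtplib.SMTP("smtp.example.com")  # placeholder SMTP server
        server.sendmail(msg["From"], [msg["To"]], msg.as_string())
        server.quit()

        # Mark the rows so they aren't emailed again tomorrow.
        for row in new_rows:
            row["alerted"] = 1
            scraperwiki.sqlite.save(unique_keys=["date", "headline"], data=row)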

Burt already had a name for himself in journalism. I have not. I’m not looking to make my name; I’m looking to make things that help me find stories that might not otherwise be told.

Build a community

Around this time last year I started this blog and my Twitter account, not to broadcast what I know but to act as a semantic sink for all things data journalism, so I could find the people who can educate me by what they publish: my data miners. As much as I was working for them, they were working for me. They led me to the Hacks/Hackers community and ultimately to the ScraperWiki team.

Build a team

ScraperWiki is not my team (as much as I love them). The ScraperWiki community is my team. The Australian who’s building planning alerts is on my team. The Icelander looking into foreclosures is on my team. I can see their code; I know what they build; I can ask them for help. My scrapers are my team (and I’ve built those!).

Just build it

Just scrape it. Data in the public interest is public data. Now I can write a scraper in a day, or add little things on. That’s how I’m learning to code, but every piece of code I write has to have a journalistic purpose (whatever way you define that!).

Listen to your users

Listening to the stream of information delivered to me by my data miners is what led me to take a leap of faith and leave CNN to join ScraperWiki. I would never have been able to judge, even from within a news organisation, that data journalism was worth pursuing. But I was able to glean this by tracking the metadata from my blog and my twitter footprint.

Stay flexible

Use backend, barebones code. Make it open. Mould it to the purpose of your journalistic endeavour. Here’s what I want to make (which is limited by what I can make!).

I thoroughly enjoy pottering along to hack days, workshops, meet-ups and data journalism camps. Anywhere that brings developers and journalists together is the place to be. The underlying trend I find in attendance (besides developers) is freelance journalists and teachers (university and industry). Now I love learning a bit too much, having spent far too much time at university, but I do admire those with a passion for teaching. I can’t get enough of the learning environment, especially when it is collaborative and crowd-sourced!

So it is with great pleasure that I have been invited to the Knight-Mozilla Learning Lab. It is run through P2PU and the webinars will be held using Big Blue Button. There are 60 participants in total and from their profiles, I’m sure nothing short of magic will come of this challenge. I am amazed and delighted that such a skilled set of individuals want to turn their talent to journalism and build atop the media platform.

They are all much more highly skilled at computer programming than I am, and I am happy to be invited to contribute my opinions on the stupendous series of webinars they have lined up. I do plan on building something with my meagre hacking skills and will keep you informed of the highs and lows (expect lots of lows). Weekly assignments (homework, beautiful wonderful homework – yes, I am a nerd) will involve blogging on the webinars, which is right down my data mine (why does that sound wrong!?). What I mean is, it is in keeping with the theme of my blog; however, I try to lean away from opinion pieces as I claim no authority on such a nebulous subject. So any post I tag with MozNewsLab will be my reflections, opinions and ruminations. That being said, here are some of the speakers of the webinars I (and you) have in store (via Phillip Smith):

  • Aza Raskin is a renowned interface designer who recently held the position of Creative Lead for Firefox. He is currently the co-founder of Massive Health, and probably up to many other design-meets-entrepreneurial things.
  • Burt Herman is an entrepreneurial journalist. He is the CEO of Storify and a co-founder of Hacks/Hackers.
  • John Resig is a programmer and entrepreneur. He’s the creator and lead developer of the jQuery JavaScript library, and has had his hands in more interesting open source projects than you can shake a stick at. Until recently, John was the JavaScript Evangelist at Mozilla. He’s currently the Dean of Open Source and head of JavaScript development at Khan Academy.
  • Chris Heilmann is a geek and hacker by heart. In a previous life, he was responsible for delivering Yahoo Maps Europe and Yahoo Answers. He’s currently a Mozilla Developer Evangelist, focusing on all things open web, HTML5, and working open.
  • Jeff Jarvis is the author of What Would Google Do? He blogs about media and news at Buzzmachine.com. He is associate professor and director of the interactive journalism program and the new business models for news project at the City University of New York’s Graduate School of Journalism.

Data is the new word for information. But ‘information journalist’ implies every other journalist is just a churnalist, which is most definitely not the case. If data is anything in a database, then I’m looking beyond that. For me, data is any piece of information that can be turned to journalistic use. So rather than confine my scraping to CSVs and data releases, I can take anything from the web that I think will be useful for the public to know.

Here’s something that is in the public domain but not the public sphere: statements from the Office for Judicial Complaints where judges are reprimanded or struck off. The OJC deals with complaints about the personal conduct of judges. Examples of personal misconduct might be the use of insulting, racist or sexist language in court, or inappropriate behaviour outside the court, such as a judge using their judicial title for personal advantage or preferential treatment. So judges can be reprimanded and struck off for personal misconduct by the OJC, but the OJC does not have the power to investigate or call into question any of their previous judgements.

So I’ve put all the statements on Twitter, each with a link to the PDF document detailing the case with the OJC. Any new statements should be picked up by my scraper (which runs daily) and then tweeted out. If anyone who has dealt with a tweeted judge has something to add, please reply to the tweet or use the hashtag #OJC.
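For anyone curious about the mechanics, the tweeting end works along these lines. This is a sketch rather than the actual code: the keys, table and column names are placeholders, and I’m using the tweepy library as one way of posting to Twitter:

    import scraperwiki
    import tweepy

    # Placeholder credentials -- you'd register an app with Twitter for real ones.
    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
    twitter = tweepy.API(auth)

    # Assume the scraping step further up saved each OJC statement with a
    # 'title' and 'pdf_url' column; tweet only the ones not yet sent.
    for row in scraperwiki.sqlite.select("* from swdata where tweeted is null"):
        status = "%s %s #OJC" % (row["title"][:100], row["pdf_url"])
        twitter.update_status(status=status)
        row["tweeted"] = 1
        scraperwiki.sqlite.save(unique_keys=["pdf_url"], data=row)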