How are we going to improve government opendata?

The last week has been exciting, with new ideas and suggestions at every corner.  So many of you have sent comments and emails, and we've had face-to-face chats and meetings... It's been amazing. Thank you for all your input.  (And keep it coming!)

The biggest question is, "How are we going to do this?"  To begin to answer that, I thought we should get a bit more into "What do we need to do?  Where are those pesky gaps in the opendata process we need to fill?"

The diagrams below are ones I've been scrawling in my notebook at least once a day for the past few weeks.  Because I like you all, I thought I'd spare you my handwriting and give it to you in a prettier format.

The opendata vision

[Diagram of the government opendata vision]

This diagram shows you the vision of government opendata: the way it's supposed to run.  Various parts of government (local and central, plus all the non-departmental public bodies) publish their datasets, either on their own websites or on data.gov.uk.  Developers come along and code neat apps and visualisations to do fun things with the data.  Users and citizens are then able to be more informed about their government's activities, more closely connected to the resources in their neighbourhoods, and able to mash up the data with activities in their own lives (or other sectors).

The problem we have in realising the opendata vision

As the previous post outlined, this process isn't exactly flowing the way it is meant to.  Differences in formatting and, most importantly, undefined codes and values mean that much of the published data can't be analysed or compared to any other data.  This frustrates us.

[Diagram showing the problem]

The solution we are working towards

In order to fix these problems, we have to build a few things:

  1. A process or mechanism to reformat the problem datasets into something we can work with.  (RDF? I know a lot of you aren't thrilled with that.  We'll sort this out at the planning workshop.)
  2. A site/module by which to crowdsource the missing metadata.
  3. Tools for querying, filtering and searching across all the government datasets (even the ones we haven't had to tidy) and APIs to release the data to developers.
  4. A web site with a simple search box, to provide basic answers to questions like "How many toilet rolls has my organisation purchased in the past year?"  (This part will be primarily aimed at government users, to encourage them to see value in the project and help us with the metadata -- but will be open to all.)

The diagram now looks like this:

[Diagram: what this project should do for opendata]

The things we need to build are in green.
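
For those who think better in code, here is a very rough Python sketch of how the first three pieces might fit together on a toy example.  Everything in it is made up for illustration: the two CSV snippets, the column names, the codes TR01 and PN02, and the helper names normalise, undefined_codes and total_purchased are all assumptions, not real datasets or a design we've settled on.

```python
import csv
import io

# Hypothetical publications from two departments: the same kind of
# purchasing data, but with different column names and unexplained codes.
DEPT_A_CSV = """Org,Item Code,Qty
Dept A,TR01,1200
Dept A,PN02,300
"""

DEPT_B_CSV = """organisation,item_code,quantity
Dept B,TR01,800
"""

# Piece 1 (reformatting): map each publisher's headers onto one shared schema.
COLUMN_ALIASES = {
    "org": "organisation",
    "organisation": "organisation",
    "item code": "item_code",
    "item_code": "item_code",
    "qty": "quantity",
    "quantity": "quantity",
}

# Piece 2 (missing metadata): definitions that would be crowdsourced.
# Here only one code has been explained so far.
CODE_DEFINITIONS = {"TR01": "toilet rolls"}


def normalise(raw_csv):
    """Parse one raw CSV publication into rows that use the shared schema."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_csv)):
        row = {COLUMN_ALIASES[key.strip().lower()]: value.strip()
               for key, value in record.items()}
        row["quantity"] = int(row["quantity"])
        # None means nobody has told us yet what this code stands for.
        row["item"] = CODE_DEFINITIONS.get(row["item_code"])
        rows.append(row)
    return rows


def undefined_codes(rows):
    """List the codes that still need a definition from the crowd."""
    return sorted({r["item_code"] for r in rows if r["item"] is None})


def total_purchased(rows, organisation, item):
    """Piece 3 (querying): add up the quantity of one item for one organisation."""
    return sum(r["quantity"] for r in rows
               if r["organisation"] == organisation and r["item"] == item)


ALL_ROWS = normalise(DEPT_A_CSV) + normalise(DEPT_B_CSV)
```

With the rows tidied like this, total_purchased(ALL_ROWS, "Dept A", "toilet rolls") answers the toilet roll question for one organisation, and undefined_codes(ALL_ROWS) tells us which codes we still need help with.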

And what's next?

As always, comments are very welcome!  I'll be getting further into the "How do we do this?" bits in posts over the next few days.  I have a lot more information and structure to share, which should make it much easier to see how we'll break down the tasks and actually build this thing.

There's also more info to come on the planning workshop.  Stay tuned!

Cheers, all!