This post explains a project that has been incubating for a few weeks now. It's a community effort to fill in the holes in the government opendata process, and to make an interface to search across all government datasets. This project is not for profit and is for the common good -- and we need your help.
The history
Fresh from my last project at the Department of Health, I found myself chatting over coffee to frustrated friends in the civil service who want to help with the opendata activities. "We keep getting questions about what various codes mean in COINS, for example," they said. "We have the answers, and we want to help, but they aren't always asking the right people. We need an easy way to get them the information -- and to do it once, for everyone."
I sympathised with them, listened carefully, and made a mental note: there's a gap between the government opendata vision and the reality. The datasets are often released full of unintelligible codes, and the developers outside government (building apps and visualisations) would love to have the information that explains them. This makes sense to me: I've seen budget codes, cost centre codes, programme codes in my various government roles... I can imagine that analysis would be complicated if you didn't have a legend or translations for them.
I've also seen datasets that need explanations. "We undercounted people using this service last year. We figured out why and we fixed it, but some of the numbers, historically, will be misleading. We need to make that clear to anyone using the data." Hmm... so the people in government who work with this data know valuable information about it. Metadata, if you will. Interesting quandary.
Moving on, I then found myself having coffee with developers who have been valiantly wrangling with the government datasets. (I get to drink a lot of coffee when I'm catching up, between projects. It's quite fun.) "We are having a horrible time finding out what each code means," they said. "We think other developers are having this problem too, and it would be great if we could join up efforts."
Now you're talking. I'm starting to see a solution here -- and I think we, as a community, can help.
Crowdsourcing metadata
The first thing we need is a tool to crowdsource metadata about government data. This should allow those who know something about the data (civil servants, local government officers, etc.) to easily mark it in such a way that everyone can see and use their knowledge.
Essentially, we will be adding to the datasets as they come out of government, so that everyone who wants to use them will have better data to work with.
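To make the idea a bit more concrete, here is a minimal sketch of what crowdsourced metadata might look like in Python. Nothing here is a decided design: the class names (`Annotation`, `DatasetMetadata`), the `enrich` helper, the dataset id, and the codes and their meanings are all made up for illustration. It simply shows a code legend and free-text notes sitting alongside a dataset, and being used to label the rows.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A free-text note contributed by someone who knows the data."""
    author: str
    note: str


@dataclass
class DatasetMetadata:
    """Crowdsourced knowledge attached to one published dataset (hypothetical design)."""
    dataset_id: str
    code_legend: dict[str, str] = field(default_factory=dict)
    annotations: list[Annotation] = field(default_factory=list)

    def explain_code(self, code: str) -> str:
        # Fall back to the raw code when nobody has documented it yet.
        return self.code_legend.get(code, code)


def enrich(rows: list[dict], meta: DatasetMetadata, code_field: str) -> list[dict]:
    """Return copies of the rows with a human-readable label added for each code."""
    return [
        {**row, f"{code_field}_label": meta.explain_code(row[code_field])}
        for row in rows
    ]


# Invented example: the dataset id, codes, meanings, and amounts are placeholders.
meta = DatasetMetadata(dataset_id="example-spending")
meta.code_legend["X100"] = "Office supplies"
meta.annotations.append(
    Annotation(
        author="a civil servant",
        note="Last year's counts were too low; the cause has since been fixed.",
    )
)

rows = [{"code": "X100", "amount": 250}, {"code": "Z999", "amount": 40}]
enriched = enrich(rows, meta, "code")
```

The original rows are left untouched, and a code nobody has explained yet (like `Z999` here) just keeps its raw value, so partial knowledge is still useful.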
Looking across all of government
Here's the project's next leap in logic: If we make this metadata crowdsourcing possible for all government datasets (why not dream big, right?), shouldn't we be able to analyse them all together? I'd love to know how my street's rubbish compares to national averages, or how an NHS trust's spend on cleaning correlates to their MRSA infection rates. The fun is in comparing across datasets.
(These datasets should include: spending data, geographic data, headcounts, street-level crime, house prices, etc. You name it, it should be in here.)
So... we really should make all datasets comparable. It would be fantastic if they all played nicely with each other. (RDF anyone?) Looks like we'll have some work to do there, as well.
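As a rough illustration of what "comparable" could mean in practice, here is a small Python sketch that lines up two datasets on a shared identifier. It assumes both datasets already use the same key for the same thing, which is exactly the hard part we would need to solve. The `join_on` function, the field names, the trust identifiers, and every number are invented for this example; none of it is real data.

```python
def join_on(key, left, right):
    """Inner-join two lists of dict rows on a shared key field."""
    right_by_key = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Merge the two rows; only rows present in both datasets survive.
            joined.append({**row, **match})
    return joined


# Invented values, for illustration only.
cleaning_spend = [
    {"trust": "trust-a", "cleaning_spend": 120},
    {"trust": "trust-b", "cleaning_spend": 80},
]
infection_counts = [
    {"trust": "trust-a", "infections": 3},
    {"trust": "trust-c", "infections": 5},
]

combined = join_on("trust", cleaning_spend, infection_counts)
```

Only `trust-a` appears in both lists, so it is the only row in `combined`. Any dataset that identifies its trusts (or streets, or schools) differently would silently drop out of a comparison like this, which is why shared identifiers matter so much.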
How will all this be used?
We'll build an API for apps and visualisations. (This is both in the spirit of openness, and as an incentive to people who want to use it. If you need tidier data, come help us make it happen!)
We will also make a web interface: a very simple search that will let you ask basic questions of the data. "How many children are in school in Kirklees?" or "How much money does the entire government spend on toilet rolls?" This site should be an everyday tool for grannies, researchers, school children, public servants... a very diverse group of people. (This presents us with a pretty sizeable UI challenge, which should be fun.)
Who is involved?
Pretty much everyone you can think of. As a community, open-source project, this should be the brainchild of a lot of people, all of whom have answers for some small part of the puzzle. (My job, thus far, is as organiser. I'm happy to pull everyone together and create a structure for this to happen, but I'm no more important than anyone else here!)
For this to work, we need quite a lot of people to:
- help design and build the pieces (developers, programmers, data structure experts, RDF enthusiasts, UX designers)
- make sure it fits with the way government handles data, and the ways they think about exceptions and comments on the data (civil servants, local government officers, maybe suppliers)
- help the community around this project (developers, users, government, volunteers) stay engaged and excited (community managers, open-source project managers, potential users)
- raise the profile of this work across the country: with government, schools, universities, the volunteer/community organisation sector, the media, private sector companies, etc. (comms and marketing professionals, journalists, civil servants, local government officers)
- keep the project plugged into and fitting the needs of the rest of the opendata/developer community (anyone involved in existing projects, programmes, and foundations)
- keep an eye on the wider, democratic issues around publishing government data. Are we making things better? How can we use this work to improve government? (policy thinkers, democracy enthusiasts, community builders and leaders)
There are probably others whose help we could use too... If you're not on this list and would like to join us, please let me know (in the comments or by email). Whatever your idea, we probably need you!
How can I help?
You can help this project in many ways!
- Right now, it would be great if you could spread the word about this work. We need as many people as possible to be aware of it. We are gathering ideas; the more the merrier.
- We will have a day-long kick-off workshop soon to hash out the details of the project. Please comment on this post to register your interest in coming along, or talk to me (@hadleybeeman).
- Also, this project desperately needs a name. Thoughts? Suggestions? Please put name suggestions in the comments, and we'll all vote.
Thank you!!