Effective citizen budget monitoring requires data – and lots of it – but, just as importantly, the data needs to be accessible and comprehensible. Thanks to the European Transparency Initiative driven forward by Commission Vice-President Siim Kallas, the EU has come a long way in a short time on accepting the principles of budget transparency, in particular through the new rules requiring the publication of end beneficiaries of EU funds. Where the Commission’s transparency advocates appear to have taken their eye off the ball is in how these new rules have been implemented.
Taking the case of EU farm subsidies, the implementing rules require each member state to maintain a website database where citizens can search for recipients of farm subsidies and find two pieces of information: how much they get and the municipality where they are located. It is a shame that there is no requirement to explain why the money was paid. As Agriculture Commissioner Mariann Fischer Boel told a farmsubsidy.org audience in July 2006,
“Telling the public about who gets how much money is only half of the story. The other half is explaining what the money is for.”
Gaps in the data aside, the main problem with the ‘web platform approach’ favoured by the Commission and many other public institutions is that the data is locked up in a website, so it’s not possible to analyse the data in its entirety. It becomes hard to find out even simple things, like which region got the most – or the least – in farm subsidies, who the largest recipients are, or how the payments are distributed. These basic tasks of analysis require having the entire dataset in one place, so it can be analysed using any one of a number of statistical software applications. With web platforms, the data is presented in the way that the government wishes it to be presented. The citizen, and the external budget monitor, is disempowered.
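To illustrate how little work these questions take once the whole dataset is in one file, here is a minimal Python sketch. The sample data, the column names (recipient, region, amount_eur) and the function names are all invented for illustration; no real member state file is assumed to look like this.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for a published raw data file.
# Column names and figures are invented for illustration only.
SAMPLE_CSV = """recipient,region,amount_eur
Farm A,North,120000.00
Farm B,North,30000.00
Farm C,South,45000.50
Farm D,South,5000.00
Farm E,East,250000.00
"""


def load_payments(text):
    """Parse the CSV text into a list of payment records."""
    payments = []
    for row in csv.DictReader(io.StringIO(text)):
        payments.append({
            "recipient": row["recipient"],
            "region": row["region"],
            "amount": float(row["amount_eur"]),
        })
    return payments


def totals_by_region(payments):
    """Sum the payments for each region."""
    totals = defaultdict(float)
    for payment in payments:
        totals[payment["region"]] += payment["amount"]
    return dict(totals)


def largest_recipients(payments, n=3):
    """Return the n largest payments, biggest first."""
    return sorted(payments, key=lambda p: p["amount"], reverse=True)[:n]


payments = load_payments(SAMPLE_CSV)
region_totals = totals_by_region(payments)
top_region = max(region_totals, key=region_totals.get)
bottom_region = min(region_totals, key=region_totals.get)
top_recipients = [p["recipient"] for p in largest_recipients(payments, 2)]

print("Region with most subsidies:", top_region)
print("Region with least subsidies:", bottom_region)
print("Largest recipients:", top_recipients)
```

None of this is possible through a search box that returns one recipient at a time.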
It would be a trivial task for the government to publish the raw data file alongside its web platform, but only in a few cases have member states chosen to do this (e.g. the UK, Czech Republic and Belgium). The Commission’s own Financial Transparency System is guilty of the same sin of only providing a restricted web platform and not making the entire dataset available.
There are reasons why governments might remain keen to build a web platform. For instance, they may not think that there is sufficient civil society interest and capacity to do the job better, and they may wish to present the data in context, alongside explanations, and provide tools for interactivity and feedback from citizens. All well and good. But if a government goes down this path, it should be required to create an open system with public access to each layer of the website – data, analysis and presentation – as described very well by Richard Allan, Chair of the UK Government’s Power of Information Taskforce (see also the illustration of such an open system below).
Faced with an impenetrable web platform, the prospective citizen budget monitor’s only option is to ‘screen-scrape’ the government website. Screen-scraping involves generating an automated routine that queries every single record in the database and records the output into a data file. It’s a highly skilled activity and the domain of a small handful of computer programmers. Nils Mulvad and Simon Roe of farmsubsidy.org are both proficient at screen-scraping. So too are Julian Todd, whose work includes tracking votes in the UK Parliament and administering a healthy dose of transparency to the United Nations, and Richard Pope, who runs PlanningAlerts.com – a fantastic site that helps people find out about applications for new buildings near where they live. But their labours would not be necessary if governments took the simple and costless step of publishing the raw datasets that lie behind their web platforms in a simple, machine-readable data format (such as XML or CSV).
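For readers who have never seen one, the core of a screen-scraper can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the HTML below is an invented results page, not the markup of any real government site, and the class and function names are hypothetical. A real scraper would also have to fetch every results page over the network, cope with pagination and survive changes to the site’s layout, which is where most of the skill goes.

```python
import csv
import io
from html.parser import HTMLParser

# Invented markup standing in for one page of search results.
# Real sites differ, and each needs its own parsing rules.
SAMPLE_PAGE = """
<table id="results">
  <tr><th>Recipient</th><th>Municipality</th><th>Amount</th></tr>
  <tr><td>Farm A</td><td>Springfield</td><td>120000.00</td></tr>
  <tr><td>Farm B</td><td>Shelbyville</td><td>30000.00</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows contain only <th>, so they stay empty
                self.rows.append(self._row)
            self._row = None


def scrape_page(html):
    """Extract the data rows from one page of HTML."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Write the scraped rows out as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["recipient", "municipality", "amount"])
    writer.writerows(rows)
    return out.getvalue()


scraped = scrape_page(SAMPLE_PAGE)
csv_text = rows_to_csv(scraped)
print(csv_text)
```

The end product is exactly the kind of file the government could have published in the first place.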
As well as the barrier of restrictive web platforms, another big obstacle to accessing budget data in the EU is the way so many public authorities choose to publish data in formats that are not accessible for analysis and re-use. Governments in Spain are very fond of publishing their CAP payment data in PDF files that can run to thousands of pages and are next to impossible to convert back to a data file. Many governments are publishing details of expenditure under the Structural and Cohesion Funds and the European Fisheries Fund in the form of PDF files. It is very difficult – and sometimes close to impossible – to extract the underlying data from a PDF file. The cynical part of me thinks that perhaps this is an intentional decision on the part of governments who wish to comply with the letter of the laws on transparency but want to make sure that citizens are prevented from analysing the data for themselves.
The current edition of the Economist features an article praising budget transparency in the United States but also describing how it is possible for a politician to come unstuck if their transparency measures are not implemented properly by officials:
“On the campaign trail Sarah Palin sometimes bragged that she had, as the reforming governor of Alaska, put the state’s books online. She did sign the legislation, but the result is a clunky collection of spreadsheets and PDFs.”
So what can be done? Well, for a start, those in the Commission responsible for the European Transparency Initiative should set down some technical guidelines, starting with the requirement that all data should be provided in a raw form, fully machine-readable. What does that mean? Tom Steinberg of mySociety suggested to me the following, which I think is at the very least a good place to start:
“Data must be made available in electronic formats that separate each piece of information into discrete, appropriately categorised units which can be automatically imported by a computer into a database”
That means No to PDFs, No to clunky and inaccessible web platforms, No to Word documents. It means Yes to XML feeds and Yes to CSV files. Following these rules actually requires less work on the part of governments and should cost taxpayers less money. In this new era of budget pressure, that’s something that everyone can welcome.
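As a rough sketch of what “automatically imported by a computer into a database” looks like in practice, the Python below loads a small CSV file into an SQLite database and runs a query against it. The file contents, table name and column names are assumptions made up for this example, not a proposed standard.

```python
import csv
import io
import sqlite3

# Invented sample of a raw data file; names and figures are illustrative only.
RAW_CSV = """recipient,municipality,amount_eur
Farm A,Springfield,120000.00
Farm B,Shelbyville,30000.00
Farm C,Springfield,45000.50
"""


def import_csv(conn, text):
    """Create a payments table and load every CSV row into it."""
    conn.execute(
        "CREATE TABLE payments ("
        "recipient TEXT, municipality TEXT, amount_eur REAL)"
    )
    reader = csv.DictReader(io.StringIO(text))
    rows = [
        (r["recipient"], r["municipality"], float(r["amount_eur"]))
        for r in reader
    ]
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
imported = import_csv(conn, RAW_CSV)

# Once the data is in a database, any question can be asked of it.
by_municipality = conn.execute(
    "SELECT municipality, SUM(amount_eur) FROM payments "
    "GROUP BY municipality ORDER BY municipality"
).fetchall()

print(imported, "rows imported")
print(by_municipality)
```

Because each piece of information sits in its own named column, the import needs no guesswork. The same data in a PDF or Word document would first have to be picked apart by hand or by a fragile conversion script.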
The same principles should apply to civil society websites that reuse government data. Farmsubsidy.org is now three years old and currently being rebuilt from the ground up, with faster search performance, new tools for user-generated content and a fully-featured API that will allow anyone to make use of the underlying dataset for their own purposes. It’s our intention that the site will meet the very highest standards of accessibility, performance and openness.