Votes don’t belong in a relational database

· long story short

Patrick Nielsen Hayden of Electrolite has been hammering away at the Diebold voting machines issue recently. This seems to be another one of those slowly unfolding stealth scandals (like the profiteering by Halliburton in Iraq) that hasn’t yet come up with a hook that will make ordinary people see the danger. Who will tell the people?

In Hold it right there, a troubling new issue is brought to light, on top of the already troubling ones: the Diebold machines use proprietary (not open-source) technology, the software is unauditable, the machines are potentially vulnerable to hacking, there is no paper trail, and the results are anomalous (at wide variance with last-minute polling, a problem exacerbated by the folding-up of the media’s exit-polling consortium).

This new issue? Why is Diebold using Microsoft Access, a relational database, to store votes? I realize this gets into a technical area that most of us don’t feel qualified to comment on, but the simplest explanation of a relational database is that it is used to correlate information between two or more data tables. However, there’s no reason why the software company providing the voting infrastructure should need to correlate any information beyond which candidate got how many votes:

Commenting on the general bogglement over the revelation that Diebold’s e-voting systems rely on Microsoft Access, Jon Meltzer writes:

The real issue isn’t Diebold trying to maximize its profit by using cheap labor and software tools; it’s the very concept of an unauditable voting system. The problem would be no less severe if they were using a secure, unhackable implementation.

Erik Olson, who does this stuff for a living, asks what suddenly seems like a rather pertinent question: why on earth are they recording votes in a relational database at all?

There aren’t supposed to be any relations in voting. […] What other data are they creating relations to? This is even more contrary to the purpose of a voting machine than simple security.

At the end of a vote, the machine needs to produce the following data.

FOO xxxx votes
BAR xxxx votes
QUX xxxx votes
ALL yyyy votes

The precinct is a set field, determined by where the machine is set. Every other relation, other than “foo gets a vote,” is antithetical to the secret ballot process, and should never be collected. Not time, not date, not who, where, why, whatfor, nothing! Give me a camera in the polling place–not in the booths, mind you–and a very accurate clock on the voting machine and the camera, and save the time voted with the vote, and I can tell you how almost every person in that polling station voted. Save machine number with that vote as well, and that becomes every voter. Period.

The fact that they are using a RDBMS is a declaration that they intend to treat voting as a relational database.
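As an aside, the tally Erik describes needs nothing more than a handful of counters. Here is a minimal sketch in Python, assuming a fixed list of candidates per machine. The class name TallyBox and its methods are hypothetical, invented for illustration; this is not how Diebold's software (or any real voting system) works, only a picture of how little data the job requires.

```python
class TallyBox:
    """Hypothetical vote counter that stores one integer per candidate and nothing else."""

    def __init__(self, candidates):
        # One counter per candidate: no timestamps, no voter IDs, no machine IDs.
        self._counts = {name: 0 for name in candidates}

    def cast(self, candidate):
        """Record a single vote by incrementing that candidate's counter."""
        if candidate not in self._counts:
            raise ValueError(f"unknown candidate: {candidate}")
        self._counts[candidate] += 1

    def report(self):
        """Return the end-of-day report: one line per candidate plus a total."""
        lines = [f"{name} {count} votes" for name, count in self._counts.items()]
        lines.append(f"ALL {sum(self._counts.values())} votes")
        return lines


box = TallyBox(["FOO", "BAR", "QUX"])
for choice in ["FOO", "BAR", "FOO"]:
    box.cast(choice)
print("\n".join(box.report()))
```

Because cast() only increments a counter, there is no per-ballot record to correlate with anything afterwards.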

There’s more in Erik’s full comment, over in this thread. Meanwhile:

“Every other relation, other than ‘foo gets a vote,’ is antithetical to the secret ballot process, and should never be collected.”

Right. Whether Access is on the voting machine itself, or being used on the voting data somewhere else, why on earth is it in use at all? The “relations” involved in vote-recording are completely trivial. The only sensible reasons to use a relational database are if you’re planning to record data you shouldn’t record, and to do things with it that you shouldn’t do. [Electrolite]
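To see why a stored timestamp is so dangerous, here is a rough sketch in Python of the correlation Erik describes: match each vote's recorded time against a camera log of who walked up to the machines. Everything in it is an assumption made for illustration, including the function name, the 30-second tolerance, and the made-up sample data. It does not describe any actual system.

```python
def match_votes_to_voters(camera_log, vote_log, tolerance_s=30):
    """Pair each timestamped vote with the person the camera saw closest in time.

    camera_log: list of (seconds_since_opening, voter_name) tuples
    vote_log:   list of (seconds_since_opening, candidate) tuples
    Both logs and the tolerance are hypothetical.
    """
    matches = {}
    for vote_time, candidate in vote_log:
        # Find the camera sighting nearest in time to this vote.
        seen_time, voter = min(camera_log, key=lambda entry: abs(entry[0] - vote_time))
        if abs(seen_time - vote_time) <= tolerance_s:
            matches[voter] = candidate
    return matches


# Made-up sample data: two voters, two timestamped votes.
camera_log = [(100, "Alice"), (400, "Bob")]
vote_log = [(110, "FOO"), (395, "BAR")]
print(match_votes_to_voters(camera_log, vote_log))
```

The join is a few lines of code, and it is exactly the kind of relation a relational database makes easy. If the vote time is never stored, there is nothing to join on.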

This all worries me a great deal.