Building a Voter File Part 2: Appending Overview

Cross-posted from Overdetermined.net. Find the latest entries in the series there!

Once the data is (yes, is, prescriptivists--I went there) in a standardized format, we move from the realm of "interesting" into "faintly creepy".  The information from Secretaries of State or state parties is generally pretty innocuous--name, address, maybe phone number or age.  The appended consumer data, on the other hand, is more unsettling.  There's nothing in there that would do real damage if anyone knew it--no credit card numbers, nothing that people could use to steal your identity--but it can be kind of strange to think about who knows that you own two dogs and a cat.

Most of this consumer data is gathered by for-profit companies, which then sell it to both the state parties and the for-profit companies that are creating these files (InfoUSA, listed on our resources page, is one such vendor).  They get their information anywhere they can--state licensing agencies (think it might be worthwhile for the McCain campaign to know who has a gun license?), magazine subscription lists, grocery store value card memberships...basically, if you have to fill out a form for it, somebody wants it, and will get it unless prevented by law.

Moreover, based on this consumer information, it's possible to predict other characteristics (within limits, which I'll go into in a later post).  For example, the RNC might conduct a truly massive poll that measures all kinds of behavior--TV habits, income, type of location, and lots of other things besides.  Based on that poll, they might determine that there's a high correlation between a given cluster of characteristics and certain behaviors.  For instance: only a survey can tell you how much radio someone listens to.  But for everyone in the file, it's possible to know where they live, their age, and whether or not they own a boat.  If all the males 54-65 in the poll who have boat licenses listen to Rush Limbaugh, that cluster of traits can be a good predictor of who listens to him among people who were never polled.
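
To make the idea concrete, here is a minimal sketch of that kind of cluster-based prediction.  Everything in it is made up for illustration: the survey rows, the field names (sex, age_band, boat_license), and the voters are all hypothetical, and a real model would use far more respondents and more careful statistics than a simple per-cluster rate.

```python
from collections import defaultdict

# Hypothetical poll responses (invented for illustration):
# (sex, age_band, has_boat_license, listens_to_talk_radio)
SURVEY = [
    ("M", "54-65", True, True),
    ("M", "54-65", True, True),
    ("M", "54-65", True, True),
    ("M", "54-65", False, False),
    ("M", "54-65", False, True),
    ("F", "30-41", False, False),
    ("F", "30-41", False, False),
    ("F", "30-41", True, False),
]


def cluster_rates(survey):
    """Return the share of respondents in each cluster who listen."""
    counts = defaultdict(lambda: [0, 0])  # cluster -> [listeners, respondents]
    for sex, age_band, boat, listens in survey:
        tally = counts[(sex, age_band, boat)]
        tally[0] += int(listens)
        tally[1] += 1
    return {cluster: yes / total for cluster, (yes, total) in counts.items()}


def predict(voter, rates, default=None):
    """Look up the listening rate for a voter's cluster.

    Returns `default` when the poll had no respondents in that cluster,
    since there is nothing to base a prediction on.
    """
    key = (voter["sex"], voter["age_band"], voter["boat_license"])
    return rates.get(key, default)


RATES = cluster_rates(SURVEY)

# Hypothetical voter-file rows that already carry the appended consumer data.
# None of these people were polled.
VOTER_FILE = [
    {"name": "Voter A", "sex": "M", "age_band": "54-65", "boat_license": True},
    {"name": "Voter B", "sex": "F", "age_band": "30-41", "boat_license": False},
    {"name": "Voter C", "sex": "F", "age_band": "66+", "boat_license": True},
]

SCORES = {v["name"]: predict(v, RATES) for v in VOTER_FILE}

if __name__ == "__main__":
    for name, score in SCORES.items():
        print(name, score)
```

The point of the sketch is the split between the two data sets: the behavior (radio listening) is only known for the people who were polled, while the cluster traits are known for everyone, so the poll's per-cluster rate gets carried over to the whole file.  A voter whose cluster never showed up in the poll gets no score at all, which is one of the limits on this kind of prediction.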

This use of consumer data is at least a partial definition of the oft-abused term "microtargeting" (this WaPo article, although overwrought, is a good introduction).  Rest assured I'll have more to say on the topic in the future, but this is the overview.  Stick around; tomorrow, I'll go into how this data gets used.