Bill Lovett

Database Nation by Simson Garfinkel

Posted on January 1st, 2006


By now this book is a few years stale, but its outline of privacy lost is still plenty relevant. The outline is drawn from all the downsides of technology's benefits. For instance: the ability to save every email you've ever sent or received. Having all those messages at your fingertips is great and all, but it's also risky.

You can dodge the issue by insisting that a Gmail or Yahoo email account will never be breached, or that even if it were, there would be so much inanity in it that it wouldn't really matter. After reading Database Nation, the naivete of that kind of thinking really shines through.

Garfinkel isn't suggesting we go Luddite. His message is more along the lines of, "It's 10:00, do you know where your privacy is?" It's not something that's here today and gone tomorrow-- you lose a little here and a little there.

Garfinkel brings out some great terminology to help explain all this. For example, the notion of a "data shadow" on page 70:

Alan Westin coined the term data shadow in the late 1960s. Westin, a professor at Columbia University in New York, warned that credit records, bank records, insurance records, and other information that made up America's emerging digital infrastructure could be combined to create a detailed digital dossier. The metaphor, with its slightly sinister feeling, was uncannily accurate: just as few people are aware of where their shadows fall, few data subjects in the future, Westin conjectured, would be able to keep track of their digital dossiers.

In the three decades that have passed since then, the data shadow has grown from an academic conjecture to a concrete reality that affects us all.

We stand at the brink of an information crisis. Never before has so much information about so many people been collected in so many different places. Never before has so much information been made so easily available to so many institutions in so many different ways and for so many different purposes.

Unlike the email that's stored on my laptop, my data shadow is largely beyond my control. Scattered across the computers of a hundred different companies, my shadow stands at attention, shoulder-to-shoulder with an army of other data shadows inside the databanks of corporations and governments all over the world. These shadows are making routine the discovery of human secrets. They are forcing us to live up to a new standard of accountability. And because the information that makes up these shadows is occasionally incorrect, they leave us all vulnerable to punishment or retaliation for actions that we did not even commit.

It's probably impossible to feel out your data shadow in full detail because significant portions of it are the private domains of companies and government. On the other hand, you could put Google and Yahoo to work finding every page that mentioned you, either by name or by URL or by your company. I think there's been casual interest in that sort of thing so far-- as in, Hey look, I was mentioned on this site today, isn't that nice-- but it could be interesting to set up a definitive record of every mention you ever got. If you were a celebrity or public figure, you'd have a calendar of all your appearances. This would be like that, but more extensive and less predictable. Identity surveillance, rather than ego surfing.
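As a rough sketch of what such a record might look like, here is one way to keep it in Python. Everything in it is an assumption made for illustration: the Mention fields, the merge_mentions helper, and the example.com URLs are all made up, and the search results are assumed to have been gathered already, by hand or otherwise. It simply keeps one entry per URL, dated by the earliest time the page was noticed.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Mention:
    found_on: date  # the day the page was noticed
    url: str        # where the mention lives
    matched: str    # the search term that turned it up (name, URL, company)


def merge_mentions(record, new_results):
    """Fold new results into the record, keeping the earliest sighting per URL."""
    by_url = {m.url: m for m in record}
    for m in new_results:
        seen = by_url.get(m.url)
        if seen is None or m.found_on < seen.found_on:
            by_url[m.url] = m
    # Oldest first, so the record reads like a calendar of appearances.
    return sorted(by_url.values(), key=lambda m: (m.found_on, m.url))


# Hypothetical data: none of these pages exist.
record = [
    Mention(date(2005, 11, 2), "http://example.com/links", "my name"),
]
todays_results = [
    Mention(date(2006, 1, 1), "http://example.com/links", "my name"),
    Mention(date(2006, 1, 1), "http://example.org/review", "my site URL"),
]

record = merge_mentions(record, todays_results)
for m in record:
    print(m.found_on.isoformat(), m.url, "(matched on %s)" % m.matched)
```

Rerunning the merge with each day's results keeps the record from filling up with repeats of pages it already knows about.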

You could extend that surveillance beyond the Internet too by taking a closer look at your junk mail. At one point in the book, Garfinkel talks about using pseudonyms to track the movements of your identity from one company to the next. For example, at my last job I signed up for a trade magazine subscription and included my company name. Although I left that job two years ago, the company name still shows up on mailing labels now and then. I'm wondering whether it might even be worthwhile to catalog all the pieces of mail I get over a certain period of time, just as an experiment.
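A catalog like that could be very small. The following is a minimal sketch in Python, not anything taken from the book: the MailPiece fields, the trace_pseudonyms helper, and every name and company in the sample data are hypothetical. The assumption is that each name variant was given to exactly one company, so any other sender using that variant must have gotten it from that company, directly or indirectly.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MailPiece:
    received: date
    sender: str        # company named on the envelope
    addressed_to: str  # name exactly as printed on the mailing label


def trace_pseudonyms(pieces, pseudonyms):
    """Report, for each company given a pseudonym, who else now mails to it.

    `pseudonyms` maps a name variant to the one company it was given to.
    """
    leaks = defaultdict(set)
    for piece in pieces:
        source = pseudonyms.get(piece.addressed_to)
        if source is not None and piece.sender != source:
            leaks[source].add(piece.sender)
    return {source: sorted(senders) for source, senders in leaks.items()}


# Hypothetical data: every name and company here is made up.
pseudonyms = {
    "J. Q. Reader": "Trade Magazine Inc.",
    "J. X. Reader": "Gadget Catalog Co.",
}
pieces = [
    MailPiece(date(2006, 1, 3), "Trade Magazine Inc.", "J. Q. Reader"),
    MailPiece(date(2006, 1, 9), "Conference Promoters LLC", "J. Q. Reader"),
    MailPiece(date(2006, 1, 12), "Credit Card Offers Ltd.", "J. Q. Reader"),
    MailPiece(date(2006, 1, 15), "Gadget Catalog Co.", "J. X. Reader"),
]

leaks = trace_pseudonyms(pieces, pseudonyms)
print(leaks)
```

With this sample data, the trade magazine is the only company whose pseudonym shows up on mail from anyone else.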

I'd be the last person to use the term "blogosphere" with a straight face, but I do like "datasphere". Garfinkel defines it as "a body of information that describes the Earth and our actions upon it", but I think that's a bit too grandiose. A personal datasphere might be better thought of as all the information that describes your identity. Keeping track of it allows you to monitor your privacy. And that's more and more necessary, as Database Nation points out:

Building the world's datasphere is a three-step process-- one that we've been blindly following without considering its ramifications for the future of privacy. First, industrialized society creates new opportunities for data collection. Next, we dramatically increase the ease of automatically capturing information into a computer. The final step is to arrange this information into a large-scale database so it can be easily retrieved at a moment's notice. Once the day-to-day events of our lives are systematically captured in a machine-readable format, this information takes on a life of its own. It finds new uses. It becomes indispensable in business operations. And it often flows from computer to computer, from business to business, and between industry and government.

Other notable material from the book:

p. 114

In the novel Snow Crash, Neal Stephenson imagines people called gargoyles who walk about, record everything that they see, and upload the information into the Central Intelligence Corporation's massive databanks, hoping that somebody else will find the information useful and buy it.

Tell me that's not a perfect description of anyone who uses del.icio.us or another socially-motivated bookmark service.

p. 155

Clearly, both marketers and consumers want to stop the flow of junk mail and phone calls. The fight that's unfolding is about means, not ends. A growing number of consumers are fighting for restrictions on aggressive marketing practices. But the target marketing industry is taking a different approach. It's using vast reservoirs of personal information to hone its advertising campaigns. It employs deceptive practices to coax sensitive information out of parents and children so that each can be targeted from birth. And it's working to turn every surface and every moment into a marketing opportunity, so that consumers never miss a chance to be properly informed.

p. 247

One of the key mechanisms the technological vanguard has proposed for dealing with information overload is the intelligent agent. The idea of such a program is that it would know your interests and desires and use that information to filter the flood of data pouring into your life, so you only see what you want to see. Although different technologies have been proposed for creating these so-called agents, one of the first technologies to reach the market is called collaborative filtering.

p. 252

The Electronic Privacy Information Center's Marc Rotenberg predicts that next-generation agents will scan the world for personal information about an individual, then construct a predictive model for use by marketers and others. Rotenberg calls this the extraction of self.

The extraction of self is one of the greatest threats posed by computers to personal privacy and human identity. The profile could know every document you've ever read, every person you've ever known, every place you've ever been, and every word you've ever said that has been recorded. Your identity would no longer exist just inside of you, but in the model. "It would know more about you than you know about your self," Rotenberg says. "At such a point, we don't lose just individuality, we lose the individual."
