Archiving Email to a Database
Posted on January 2nd, 2006
I've been reading about the MyLifeBits Project recently and watching a couple of video interviews that show the software in action. Email is one of many things being archived here. What impresses me about this system is the level of recall it gives you: even the vaguest idea of what you're looking for is enough of a clue for the system to locate it, correlate it to other things, and present the results back to you in a useful fashion.
Each type of "thing" that goes into MyLifeBits can be thought of as a collection. Your email is a collection. All the books you own, the documents you've written, the photos you've taken, et cetera, are all collections. MyLifeBits says, "Give me everything and let me take care of the details." The user's only chores are putting stuff in and getting stuff out. The computer handles the goo in the middle.
During one of the interviews I was watching, the idea of "data silos that can't talk to each other" came up. For example, I can't tell you what music I was listening to during my last freelance project because iTunes' database is worlds apart from the database where I track my hours. Likewise with email. Everything is its own little world, serviced by an application that's tailored to a discrete set of needs. Cross-application communication isn't usually one of them.
I don't know if I could, or would want to, completely mimic the MyLifeBits effort. But I do like the idea of getting data silos talking to each other. Seeing as how it's the end of the year and all, and I've got a lot of email lying around that ought to get archived, getting that particular collection in shape seems like it would be a good start.
Unfortunately, the Maildir-based, IMAP-driven setup that I have doesn't really lend itself to cross-application communication. Finding old email is a real pain in the ass. Several months ago I started forwarding incoming mail to a Gmail account, thinking I could avoid the search problem by getting Google to take care of it for me. It does an admirable job, but after looking at MyLifeBits I see that the cost of that search capability is separation. I can't query my Gmail archive and do crazy correlations with other collections of personal data the way I could if I had everything "in house". And keeping things in house leaves me with less-than-perfect search options. For a while I had an indexing program that crawled through my email on a schedule and allowed fast searches, but I ultimately dumped it because it sometimes choked on messages unexpectedly and had a query syntax that I never bothered to learn. Way too much effort.
Why not funnel incoming email into a relational database, though? I looked around and didn't see much mention of people doing this. It would solve the search problem, because I could write ad-hoc queries. It would be accessible over the web, so I wouldn't be bound to a specific desktop running specific software. I like it.
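To make the idea a bit more concrete, here is a rough sketch of what "funnel a message into a table, then query it" could look like. Everything in it is an assumption for illustration: I haven't settled on a database, so SQLite stands in for whatever would actually run on the server, and the `messages` table, its columns, and the `open_archive`, `archive_message`, and `search_bodies` functions are all made-up names. It only handles the plain-text body of a message and ignores attachments.

```python
import sqlite3
from email import message_from_string, policy

# Hypothetical schema: one row per message, keyed on Message-ID to avoid duplicates.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id         INTEGER PRIMARY KEY,
    message_id TEXT UNIQUE,
    sender     TEXT,
    recipient  TEXT,
    subject    TEXT,
    sent_at    TEXT,
    body       TEXT
)
"""


def open_archive(path):
    """Open (or create) the archive database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def archive_message(conn, raw):
    """Parse one raw RFC 822 message and store it. Returns True if a row was added."""
    msg = message_from_string(raw, policy=policy.default)

    def header(name):
        value = msg[name]
        return str(value) if value is not None else None

    # With the default policy, the Date header exposes a parsed datetime.
    date_header = msg["Date"]
    sent_at = None
    if date_header is not None and date_header.datetime is not None:
        sent_at = date_header.datetime.isoformat()

    # Prefer the plain-text part; fall back to an empty body if there is none.
    part = msg.get_body(preferencelist=("plain",))
    body = part.get_content() if part is not None else ""

    cur = conn.execute(
        "INSERT OR IGNORE INTO messages "
        "(message_id, sender, recipient, subject, sent_at, body) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (header("Message-ID"), header("From"), header("To"),
         header("Subject"), sent_at, body),
    )
    conn.commit()
    return cur.rowcount == 1


def search_bodies(conn, term):
    """Ad-hoc search: messages whose body contains the given term."""
    return conn.execute(
        "SELECT sender, subject, sent_at FROM messages "
        "WHERE body LIKE ? ORDER BY sent_at",
        (f"%{term}%",),
    ).fetchall()
```

In a setup like this, the mail server would hand each incoming message to something like `archive_message`, and finding old mail becomes a matter of writing a query rather than learning an indexer's syntax. Because it's just a table, it could in principle be joined against other collections later.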