Trawling the mail archives

A cheap archive is only as good as your ability to get information back out of it.

We built a mail archiving solution using a spare VM, disk space, OpenBSD, Postfix, and Procmail. But it isn’t much use if all you’re going to do is put it on tape and tell everyone you have an archive.

How do you actually make use of the archives, trawl through them, and retrieve information when users have a bad mail day and need mail that you have hidden away on that tape?

The basis of our configuration is a separate machine (the archiving box).

Postfix receives mail

Postfix as the Mail Transport Agent (MTA) is configured to:

  • Forward all messages to their destination
  • Accept all mail from mail server(s)
  • BCC a copy of each message to a local account
  • Forward all mail to the next destination: oblivion

Procmail for archiving

For our local mail delivery, the local user account forwards processing to procmail.

Procmail stores the messages in a predefined folder/filename structure that meets our business archiving needs (e.g. year/month/day).
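To make the year/month/day layout concrete, here is a minimal sketch in plain shell. It is not our procmail recipe, which does the equivalent inside its rc file. The function names, the archive root argument, and the msg.XXXXXX file naming are all assumptions made for illustration.

```sh
#!/bin/sh
# Illustrative only: the real archive does this from a procmail recipe.
# The function names and the msg.XXXXXX naming are hypothetical.

# Print the day directory for a date given as YYYY-MM-DD (default: today).
# usage: archive_dir ROOT [YYYY-MM-DD]
archive_dir() {
    root=$1
    day=${2:-$(date +%Y-%m-%d)}
    # Turn 2012-03-05 into 2012/03/05 and hang it off the archive root.
    printf '%s/%s\n' "$root" "$(printf '%s\n' "$day" | sed 's|-|/|g')"
}

# Read one message on stdin, save it under the day directory,
# and print the path of the file that was written.
# usage: store_message ROOT [YYYY-MM-DD] < message
store_message() {
    dir=$(archive_dir "$1" "${2:-}")
    mkdir -p "$dir" || return 1
    file=$(mktemp "$dir/msg.XXXXXX") || return 1
    cat > "$file"
    printf '%s\n' "$file"
}
```

With a layout like this, "all mail for the 5th of March" is simply one directory, which is what makes the later trawling cheap.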

Trawling the archives

And we finally get to the strategies for making use of those archives.

I’ve been vacillating (swinging indecisively from one course of action to another) on a search engine installation: first drooling over htdig, with various failed install attempts, and then eyeing Apache Solr, which is gaining traction and was recently documented for FreeBSD by BSDMag.

I still need to use a search engine at some point, because the archives are growing to 30GB+ a month, but in the meantime I got a job to extract mail for userX over 5 days.

I’ve documented how I did that in “trawling the archives”; it boils down to using procmail and formail (part of the procmail package) to wade through the messages and pull out the ones that met my criteria (addressed to userX).

We already had the messages separated by date, so it was just a matter of feeding those days of messages into my procmail recipe and getting the mail that our user wanted.
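The selection logic can be sketched in plain shell. This is a stand-in for the procmail/formail recipe, not the recipe itself: it assumes one message per file inside each day directory, and it only looks at the To: and Cc: headers, so recipients who were only BCC'd will be missed. The function names and the msg.N output naming are hypothetical.

```sh
#!/bin/sh
# Stand-in sketch for the procmail/formail recipe; names are hypothetical.
# Assumes one message per file inside each day directory.

# Succeed if the To: or Cc: header of message file $1 mentions address $2.
# Header continuation lines (starting with space or tab) are followed,
# and the match is case-insensitive. The message body is never searched.
addressed_to() {
    awk -v addr="$2" '
        BEGIN { addr = tolower(addr); found = 0; in_rcpt = 0 }
        /^$/ { exit }    # blank line: end of the headers
        /^[ \t]/ {       # continuation of the previous header
            if (in_rcpt && index(tolower($0), addr)) found = 1
            next
        }
        {
            in_rcpt = (tolower($0) ~ /^(to|cc):/)
            if (in_rcpt && index(tolower($0), addr)) found = 1
        }
        END { exit !found }
    ' "$1"
}

# Copy every matching message from the given day directories into OUTDIR
# and print how many were copied.
# usage: trawl ADDRESS OUTDIR DAYDIR...
trawl() {
    addr=$1
    out=$2
    shift 2
    mkdir -p "$out" || return 1
    n=0
    for day in "$@"; do
        for msg in "$day"/*; do
            [ -f "$msg" ] || continue
            if addressed_to "$msg" "$addr"; then
                n=$((n + 1))
                cp "$msg" "$out/msg.$n"
            fi
        done
    done
    echo "$n"
}
```

For a five-day request, a call might look like `trawl userX@example.com /tmp/userX /archive/2012/03/05 /archive/2012/03/06 ...`, with one day directory per argument; those paths are made up for the example.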

Once the recipes are built, the whole process is relatively fast and pain-free. We even have a workaround for getting that archive mail to our Outlook friends.

Our recipe is rather simple, but it highlights the flexibility you have to trawl the archives with your own dig(solr)ing into the procmail recipe book.