February 2004 Archives

My Terpstra encounter


Last Thursday, Christine and I attended a presentation at BYU given by John H. Terpstra — one of the core members of the Samba open source software development team.

Mr. Terpstra’s presentation was mainly about businesses using Samba to cut IT costs. Beyond that, he talked about his forthcoming book, Samba-3 By Example: Practical Exercises to Successful Deployment; why software like Samba and Apache is becoming more like a commodity product; Linux on the desktop; and the opportunity for entrepreneurial open-source advocates to form companies that provide outsourced IT services for businesses.

The slides (created with OpenOffice) are available at <http://samba.org/~jht/Presentations/FLOSS-UUGBUY-20040219.pdf>.

John also had two draft copies of his new book with him, which he planned to give away at the end of his presentation. He gave one to an active Samba administrator at BYU. Then he asked who else the audience thought should get a book, and Christine raised her hand as high as she could. John called on her and she said, “I think you should give it to my husband because he’s started a company to provide the services you described.”

So, I’ve got Mr. Terpstra’s book here. I’m looking forward to putting it to work.

Outlook?!


I wonder if there is anyone out there who can honestly say, “The more I use Microsoft software, the more I like it.”

I used to be a “Windows expert” and, to an extent, I still have some expertise in helping people solve their Windows-related problems, but by and large, I can’t stand Microsoft software anymore.

The more I use open source software like Fedora Core Linux, Apache, Sendmail, Perl, Mozilla, GAIM, vim, SpamAssassin, and all the other cool stuff I use, the more I love it. I can do so much with so little because I have access to the code, because the programs have such simple interfaces, and because there’s such a broad support base available for me to tap into.

Case in point: Here are three more reasons why Outlook is not good:

  1. Outlook can’t bounce or redirect e-mail
  2. Outlook’s IMAP support sucks big-time
  3. Outlook can’t save an e-mail message as plain text while preserving full headers

These reasons make Outlook a major pain in the neck for me. Iodynamics has several clients who use Outlook for one reason or another and who are getting tired (as we all are) of all the spam they get, so they’ve asked us to install our praiseworthy anti-spam solution (based on SpamAssassin) on their Linux-based mail server.

SpamAssassin works best when you train it to know what your spam looks like. It’s all part of that Bayesian filtering technology. In order to train SpamAssassin to learn what your spam looks like, you need to deliver your spam to SpamAssassin in its most raw form — full headers, just like the mail server itself would see it.
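To make “full headers” a little more concrete, here is a minimal Python sketch that reports which transport headers are missing from a saved message. The list of headers it checks is my own assumption for illustration, not something SpamAssassin requires, and the sample messages are made up.

```python
from email import message_from_string

# Headers a mail server normally adds or sees in transit. This particular
# list is an assumption for illustration, not a SpamAssassin requirement.
TRANSPORT_HEADERS = ("Received", "Message-ID", "Return-Path")


def missing_transport_headers(raw_text):
    """Return the transport headers that are absent from a raw message."""
    msg = message_from_string(raw_text)
    return [name for name in TRANSPORT_HEADERS if msg[name] is None]


# A made-up message as the mail server would see it.
RAW = (
    "Return-Path: <sender@example.com>\n"
    "Received: from mail.example.com by mx.example.org; "
    "Thu, 19 Feb 2004 10:00:00 -0700\n"
    "Message-ID: <1234@example.com>\n"
    "From: sender@example.com\n"
    "To: user@example.org\n"
    "Subject: Cheap pills\n"
    "\n"
    "Buy now!\n"
)

# The same message with only the headers a mail client typically displays.
ABBREVIATED = (
    "From: sender@example.com\n"
    "To: user@example.org\n"
    "Subject: Cheap pills\n"
    "\n"
    "Buy now!\n"
)

print(missing_transport_headers(RAW))          # []
print(missing_transport_headers(ABBREVIATED))  # ['Received', 'Message-ID', 'Return-Path']
```

A check like this could reject abbreviated messages before they ever reach the training step.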

One way to do this is to bounce the mail to an address on the mail server where a program processes incoming messages and uses them to train SpamAssassin. If you’re the unlucky soul using Outlook, however, you can’t do this because you can’t bounce. You can forward messages, sure, but forwarding ruins the format of the message and makes it useless for SpamAssassin training. Messages need to be unadulterated, and I guess you could say that forwarding them, with any e-mail client, “adulterates” the messages.
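The program behind such a training address could be sketched roughly like this in Python. It assumes sa-learn is on the PATH and reads a single message from standard input when no file name is given. The `runner` parameter is a hypothetical hook of mine so the example can run without SpamAssassin installed.

```python
import io
import subprocess


def train_from_stream(stream, kind="spam", runner=subprocess.run):
    """Feed one raw message read from `stream` to sa-learn.

    `kind` is "spam" or "ham". `runner` exists only so the real call can
    be swapped out where sa-learn is not installed.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    raw = stream.read()
    if not raw.strip():
        return None  # nothing to learn from
    # Assumption: sa-learn is on PATH and reads one message from stdin
    # when no file name is given.
    return runner(["sa-learn", "--" + kind], input=raw, check=True)


# Demonstration with a stand-in runner, so nothing is executed here.
calls = []


def fake_runner(cmd, **kwargs):
    calls.append((cmd, kwargs["input"]))


train_from_stream(io.BytesIO(b"Subject: test\n\nbody\n"), runner=fake_runner)
print(calls[0][0])
```

In real use, a small script would call `train_from_stream(sys.stdin.buffer)` and the mail server would pipe the training address to that script, for example through an aliases entry.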

Another way I’ve come across is to configure Outlook to retrieve mail from the server via IMAP instead of POP, which is what most people use.

The advantage of IMAP over POP is that it lets users create folders and mailboxes on the mail server. They can set aside a couple of these folders for untagged spam and incorrectly tagged messages. A program can then run periodically to process those messages and train SpamAssassin with them.
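A periodic job along those lines might look something like the following Python sketch. It assumes the IMAP folders are stored on the server as mbox-format files, and that sa-learn is on the PATH and supports its `--mbox` option. The folder name `missed-spam` and the `runner` hook are hypothetical, added only for illustration.

```python
import mailbox
import os
import subprocess
import tempfile


def sa_learn_command(mbox_path, kind):
    """Build the sa-learn invocation for one mbox-format folder."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, "--mbox", mbox_path]


def train_folder(mbox_path, kind, runner=subprocess.run):
    """Train on every message in a folder; return how many it held."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        count = len(box)
    finally:
        box.close()
    if count:
        # Assumption: sa-learn is on PATH. The folder is left untouched.
        runner(sa_learn_command(mbox_path, kind), check=True)
    return count


# Demonstration against a throwaway folder with a stand-in runner.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "missed-spam")  # hypothetical folder name
    box = mailbox.mbox(path)
    box.add("From: a@example.com\nSubject: hi\n\nbody\n")
    box.close()
    ran = []
    n = train_folder(path, "spam", runner=lambda cmd, **kw: ran.append(cmd))

print(n, ran)
```

A cron entry could run this once for the untagged-spam folder with `kind="spam"` and once for the incorrectly-tagged folder with `kind="ham"`.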

This works great... as long as you’re not using Outlook. Outlook works horribly with UW IMAP, which is the most common IMAP server on Unix and Linux systems throughout the world. Outlook users see all kinds of messages about the server closing the connection and garbage like that. Then there are messages users can’t retrieve from the server, or users aren’t allowed to check their mail at all because Outlook thinks they’re not “online” (even though they can browse websites just fine).

So, IMAP isn’t a graceful solution either.

In the past, we’ve just told our clients using Outlook, “just stop using that.” Often it works and we feel a great sense of accomplishment moving them to Eudora, Mozilla, and/or a web-based mail solution like OpenWebMail.

Now we’re faced with the prospect of setting up SpamAssassin for another client — an all-Outlook shop. I keep thinking there has got to be a graceful solution for matching Outlook and SpamAssassin up.

Today, Mike and I talked about it and I asked him whether Outlook could save messages to text files with full headers intact. He said he’d go play on his Windows box and let me know.

When Mike came back, he informed me Outlook could save messages in a couple of different formats. One was plain text, but it had abbreviated headers... an adulterated message. The other was a .msg file, which was much larger and seemed to be some kind of proprietary format containing the message information and a bunch of data structures.

So... Outlook is not making this easy.

The (semi-) good news is that I found a Perl script Matijs van Zuijlen wrote called msgconvert.pl that converts a .msg file into a plaintext mbox format (which SpamAssassin likes very, very much).

Now I’m investigating the feasibility of a user process that involves saving untagged spam and incorrectly tagged messages to a Samba share on the mail server where they’ll be used for SpamAssassin training after being converted from .msg files to mbox format.
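Here is a rough Python sketch of what the server side of that process could look like. It assumes a copy of msgconvert.pl that accepts `--mbox <file>` followed by .msg file names (check your version, since I have not confirmed that for every release), and that sa-learn is on the PATH. The `runner` hook and all file names are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path


def pending_msg_files(share_dir):
    """List the .msg files users have dropped into the share."""
    return sorted(
        p for p in Path(share_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".msg"
    )


def convert_and_train(share_dir, mbox_path, kind="spam", runner=subprocess.run):
    """Convert pending .msg files to one mbox, train on it, then clean up."""
    files = pending_msg_files(share_dir)
    if not files:
        return 0
    # Assumption: this msgconvert.pl supports "--mbox <file> <msg files...>".
    runner(["msgconvert.pl", "--mbox", str(mbox_path)]
           + [str(p) for p in files], check=True)
    # Assumption: sa-learn is on PATH.
    runner(["sa-learn", "--" + kind, "--mbox", str(mbox_path)], check=True)
    # Reached only if both commands succeeded (check=True raises otherwise).
    for p in files:
        p.unlink()
    return len(files)


# Demonstration in a throwaway directory with a stand-in runner.
with tempfile.TemporaryDirectory() as share:
    Path(share, "spam1.msg").write_bytes(b"placeholder")
    Path(share, "notes.txt").write_text("ignored")
    commands = []
    handled = convert_and_train(
        share, Path(share, "training.mbox"),
        runner=lambda cmd, **kw: commands.append(cmd),
    )
    leftover = sorted(p.name for p in Path(share).iterdir())

print(handled, leftover)
```

Deleting the .msg files only after both steps succeed means a failed conversion leaves the user’s files in place for the next run.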

About this Archive

This page is an archive of entries from February 2004 listed from newest to oldest.
