May 2006 Archives

I've been so busy lately, but tonight I took some time to get caught up on the Utah Open Source Planet and, I must say, there was lots of good stuff to read. Thanks to y'all for sharing your knowledge. You rock.

I thought I'd pick on one of my favorite UOSSP bloggers, Aaron Toponce, but not in a negative way. I read his semi-recent entry about using HTML entities to obfuscate web site data in an attempt to foil robots -- particularly robots intent on harvesting e-mail addresses and other information.

Some years ago, I implemented this technique on several sites, personal and professional. It seemed to make sense that the average spammer or data harvester was not going to implement the code necessary to de-entity-ize the content in search of e-mail addresses. In retrospect, however, I think that's a poor assumption.

See, spammers have money and they give their money to poor souls who will write code for money and, in many cases, have the smarts to pull it off. So, semi-smart coders tasked with maximizing the pool of e-mail addresses gleaned from a vast array of websites will very quickly implement ways to defeat the simplest data obfuscation techniques. Converting text to HTML entities has got to be one of the first obfuscation techniques they are faced with circumventing.

After that, they probably implement simple OCR techniques to glean data from sites that convert all their e-mail addresses into text rendered as image files.

That said, this HTML entity-based obfuscation technique is better than nothing, right? Because spammers like their pools of e-mail addresses to be fresh, it usually only takes a couple of weeks to see whether an anti-spam technique results in a significant reduction of incoming spam, so it's easy to verify your technique is working. When we implemented the HTML entity-based obfuscation technique, there was a decrease in the amount of spam, but there was still plenty of spam.

If you're interested in playing with ways of automating the process of converting text data to a string of HTML entities, check out the HTML::Entities Perl module -- part of the comprehensive HTML::Parser distribution of modules.

Once you have this installed, you can do something like this:

perl -MHTML::Entities -ne 'print encode_entities($_, "\040-\377")'

For the Perl head-scratchers, this is a one-liner that loads the HTML::Entities module, wraps a loop around reading from STDIN or a filename parameter, and prints the result of the encode_entities() function call for each line of input read. The second argument is the set of characters to encode; the escapes are octal, so "\040-\377" covers everything from the space character (decimal 32) through character 255. Hit Control+D to get out of it.

[foo] /home/fozz 19 % perl -MHTML::Entities \
-ne 'print encode_entities($_, "\040-\377")'
Aaron Toponce
&#65;&#97;&#114;&#111;&#110;&#32;&#84;&#111;&#112;&#111;&#110;&#99;&#101;
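
For a sense of how little work the other side has to do, here's a rough sketch in Python rather than Perl. It is not taken from any real harvester; the function names, the deliberately loose regular expression, and the fozz@example.com address are all made up for illustration. It encodes an address as decimal entities, then undoes the obfuscation with a single standard-library call (html.unescape) before searching for anything address-shaped.

```python
import html
import re


def encode_entities(text):
    """Replace every character with its decimal numeric HTML entity.

    Hypothetical stand-in for the Perl one-liner, not HTML::Entities itself.
    """
    return "".join(f"&#{ord(ch)};" for ch in text)


# Deliberately simple pattern (an assumption); a real harvester is
# presumably more thorough about what counts as an address.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def harvest(page_source):
    """Decode entities first, then pull out anything shaped like an address."""
    return EMAIL_RE.findall(html.unescape(page_source))


# Made-up address and page fragment, used only for this demonstration.
obfuscated = encode_entities("fozz@example.com")
page = f"<p>Contact: {obfuscated}</p>"

print(page)           # the raw source contains no "@" at all
print(harvest(page))  # ['fozz@example.com']
```

The whole "attack" is one extra function call in front of the regular expression, which is why I doubt this technique slows down anyone who is being paid to defeat it.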

When it was clear the HTML entity-based obfuscation simply did not have what it takes to win against increasingly smart harvesting bots, we deployed a CAPTCHA solution using the Authen::Captcha Perl module for our clients that really needed/wanted to publish e-mail addresses on their websites. This solution has worked out much better and, paired with educating users about the risks of leaving your e-mail address on websites, we've seen more significant decreases of incoming spam.

Froggy Blog

Wow. Some great responses on the Blue Security thing.

I've learned a thing or two about it and my animosity is somewhat dampened. I apologize for any offense I may have caused with my zeal and energy (it happens).

My still-looming concern about Blue Security's tactics deals with the spammer who doesn't provide an option to opt out, or who only pretends to support opt-out but, really, ignores all requests or uses those requests to validate the e-mail addresses. I believe these scenarios may result in network abuse.

I applaud Blue Security for thinking out of the box, but I don't think we are quite there yet.

If you don't follow the Utah Open Source Planet or Aaron Toponce's blog, this post may mean nothing to you.

Aaron's been spamming -- for lack of a better word -- the Utah Open Source Planet with post after post about something called "Blue Frog" from a company called Blue Security. I responded to his first post on his site, but my comment never showed up. I guess Aaron wants to keep all the feedback on his site positive and complimentary to his views. ;-)

Anyway, this Blue Frog business is really shady stuff: fighting spam with tactics that really add up to abuse of network resources. It's possible Aaron is too young to remember when network abuse was a far more serious topic -- when the Internet wasn't quite as robust as it is today and a concentration of traffic, malicious or not, could bring down networks for an entire educational institution or geographic region.

The flaw with Blue Security's tactic is that it will only work against spammers that are semi-legitimate -- who have their own mail servers, mail administrators, etc. Of course, these spammers may not be spammers at all. These organizations may be perfectly legitimate companies sending out targeted e-mail to interested parties. It's a grey area, but these organizations aren't the ones trying to get you to buy smallcap stocks, Viagra, or kiddie porn.

The spammers who, in my opinion, are the plague of the Internet won't be stopped by Blue Frog, Polkadot Frog, or Aaron The Frog because they operate covertly using free or compromised accounts, spambots, or compromised websites or e-mail servers. Targeting the source of these kinds of spam messages with many opt-out requests is useless. Not only this, but Blue Security forgets that bandwidth still costs money: the ISPs between Blue Frog and these spam sources are all on the line, providing bandwidth in an honor-type agreement with each other.

If more Blue Security-like tactics begin to appear, the trust agreements between Internet backbone providers will likely begin to disintegrate.

Iodynamics' clients don't really get much spam. Their mail servers use a combination of SpamAssassin, MIMEDefang, and a greylisting milter for Sendmail.

Greylisting is, perhaps, one of the most interesting ways of stopping spam from reaching its intended recipients and it works based on a principle that also makes Blue Security's tactics worthless: Spammers don't use real SMTP servers.

When greylisting is in effect, it postpones delivery of messages from upstream addresses it hasn't dealt with before. If the upstream server attempts to deliver the message again, the address is then whitelisted. Many spamming systems don't honor these postponement requests and, as a result, they simply don't attempt to redeliver the messages. For the same reason, they will be completely oblivious to opt-out requests.
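
The decision logic described above can be sketched in a few lines of Python. This is a toy model, not the greylisting milter we actually run: the class name, the five-minute minimum retry delay, and the injectable clock are all assumptions made for the example. It keys on the upstream address only, as in the description above; real implementations often key on the sending address together with the envelope sender and recipient.

```python
import time


class Greylist:
    """Toy greylisting policy keyed on the upstream server's address."""

    def __init__(self, min_delay=300, clock=time.time):
        self.min_delay = min_delay  # seconds before a retry counts (assumed value)
        self.clock = clock          # injectable so the demo can fake time
        self.first_seen = {}        # upstream address -> time of first attempt
        self.whitelist = set()

    def check(self, upstream):
        """Return an SMTP-style reply line for one delivery attempt."""
        if upstream in self.whitelist:
            return "250 OK"
        now = self.clock()
        first = self.first_seen.setdefault(upstream, now)
        if now - first >= self.min_delay:
            # The sender came back after the delay, like a real mail server.
            self.whitelist.add(upstream)
            return "250 OK"
        # Temporary failure: a well-behaved server queues the message and retries.
        return "451 Try again later"


# Demo with a fake clock; 192.0.2.10 is a documentation-only address.
fake_now = [0]
greylist = Greylist(clock=lambda: fake_now[0])

print(greylist.check("192.0.2.10"))  # first contact is deferred
fake_now[0] = 600                    # ten minutes later the server retries
print(greylist.check("192.0.2.10"))  # accepted and whitelisted from now on
```

A spambot that fires once and moves on never gets past the first branch, which is the whole trick.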

In closing, I think I may speak for the entire Utah Open Source Planet readership in saying that I hope this is the last time we have to read about Blue Security or Aaron's Frog issues.

About this Archive

This page is an archive of entries from May 2006 listed from newest to oldest.
