Spam: Issues, Information and Resolution

Overview

Problems

How to Help

Upcoming Events

Statistics

 

Overview

As an ISP, we have to filter spam without deleting legitimate email, which leaves us balancing filtering performance against reliable delivery of legitimate mail. To that end, we have adopted the following approach:

1) All emails passing through the system are scanned.

2) The scan assigns each email a score based on the sender, keywords and a number of other factors.

3) If the score is below a certain threshold (e.g. a score of 2.0), the email is simply delivered.

4) If the score is between that threshold and a cutoff score (e.g. a score of 8.0), the email is marked as suspected spam but still delivered.

5) If the score is above the cutoff, the email is not delivered.

Testing reveals that fewer than roughly 1 in 100,000 legitimate emails ever reach a score of 8.0. However, many will score between 2.0 and 8.0, depending on their content.

Anything between those two scores is "suspected spam" and marked as such. Sometimes that means we'll mark a legitimate email this way. While that may seem amusing - for example, our own billing emails have been marked as "suspected spam" a couple of times - it also means that the system is, in fact, scanning as aggressively as possible to reduce spam delivery as much as possible.
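The scoring rules above can be sketched in a few lines of Python. This is only an illustration, not our production filter: the names TAG_THRESHOLD, CUTOFF_THRESHOLD and classify are made up for the example, the values 2.0 and 8.0 are the example scores quoted above, and the handling of a score that lands exactly on a threshold is an assumption.

```python
# Illustrative values only: these are the example scores from the text,
# not necessarily the thresholds used in production.
TAG_THRESHOLD = 2.0     # below this, mail is delivered untouched
CUTOFF_THRESHOLD = 8.0  # above this, mail is not delivered


def classify(score: float) -> str:
    """Return the action taken for a scanned email with the given score.

    Assumption: a score exactly equal to either threshold is treated as
    "suspected spam" (delivered, but marked).
    """
    if score < TAG_THRESHOLD:
        return "deliver"
    if score <= CUTOFF_THRESHOLD:
        return "deliver_marked_as_suspected_spam"
    return "do_not_deliver"


if __name__ == "__main__":
    for example_score in (0.4, 5.1, 12.7):
        print(example_score, classify(example_score))
```

The middle band is deliberately wide: a legitimate email that scores a little high still arrives, just with a mark on it.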

 

Problems

The main problem with any spam filtering system is, how do you tell it what is and isn't spam?  Keep in mind, the spammers are actively doing everything they can to bypass the filters, so it is a constant battle.

One way is to "train" the system on what is and is not actually spam.  To this end, we set up two email addresses: one to report spam (spamdrop@imagen.ca), the other to request that senders be "whitelisted" (whitelist@imagen.ca) - meaning all email from them should be seen as valid.

However, users have been forwarding *spam* to the whitelist - meaning they have been training the system to *allow* spam to come in.

Yes, that's right.  People - our own customers - have been training the system to *accept* more and more and more spam.  As a result, the spam filters, as of about a week ago, took a massive nosedive in efficiency.

We are now working to fix that and clean up the mess.  See also what we're planning under "Upcoming Events".

 

How to Help

The best thing users can do to help, right now, is to feed us spam as fast as possible.  Whenever you have an email which is, in fact, spam - not simply unwanted, but actual spam - forward it, as an attachment, to our spamdrop@imagen.ca address.

There are two key points here.  First, send the email as an attachment.  The system completely ignores emails without attachments, and examines only the attachments themselves for the details of what is spam; so simply forwarding the email "inline", or copying it into a new email, etc, won't work.

So.  Forward spam, as attachments.
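To show why an inline forward gives the system nothing to work with, here is a rough Python sketch of how a drop box of this kind could pull out attached messages. It is an illustration under assumptions, not our actual code: the function name attached_messages is hypothetical, and it assumes your mail program attaches the forwarded email as a message/rfc822 part, which is what "forward as attachment" normally produces.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def attached_messages(raw_report: bytes) -> list:
    """Return the emails attached to a report, ignoring everything else.

    Only top-level parts of type message/rfc822 (a whole email attached
    to another email) are returned. The body text of the report itself,
    including anything pasted or forwarded inline, is never examined.
    """
    report = message_from_bytes(raw_report, policy=policy.default)
    found = []
    for part in report.iter_parts():
        if part.get_content_type() == "message/rfc822":
            # For message/rfc822 parts, get_content() returns the
            # embedded message with its original headers intact.
            found.append(part.get_content())
    return found


if __name__ == "__main__":
    # Build a sample report: a short note plus the spam as an attachment.
    spam = EmailMessage()
    spam["From"] = "spammer@example.com"
    spam["Subject"] = "Cheap pills"
    spam.set_content("Buy now")

    report = EmailMessage()
    report["To"] = "spamdrop@imagen.ca"
    report.set_content("Forwarded spam attached.")
    report.add_attachment(spam)

    for message in attached_messages(report.as_bytes()):
        print(message["From"], message["Subject"])
```

The attached copy keeps the spammer's original headers, which is the detail a filter needs; an inline forward replaces them with yours.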

Second... the whitelist@imagen.ca address.  The idea behind the whitelist is simple enough; if someone's emails are getting flagged as spam and shouldn't be, just send their email address to whitelist@imagen.ca.  Not a copy of the email.  And most certainly not spam or spammer email addresses.  The whitelist is for people we *want* to get email from.

So.  Send their address.  Only their address.  And only if they're people we want mail from.
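As a rough illustration of "only their address", the following Python sketch checks whether the body of a whitelist request is a single email address and nothing else. The pattern and the function name is_address_only are assumptions made for this example; the pattern is deliberately simple and is not a full address validator, nor is it the check our system actually runs.

```python
import re

# Deliberately simple pattern: a local part, one "@", and a dotted domain.
# It is an illustrative assumption, not a complete address validator.
ADDRESS_ONLY = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")


def is_address_only(body: str) -> bool:
    """Return True if the body is one email address and nothing else."""
    return ADDRESS_ONLY.fullmatch(body.strip()) is not None


if __name__ == "__main__":
    print(is_address_only("friend@example.com"))
    print(is_address_only("Please whitelist friend@example.com, thanks!"))
```

A request like the second one, or a forwarded copy of an email, would fail this check because it contains more than the bare address.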

 

Upcoming Events

Given the difficulties involved in all this, we are currently testing a more user-interactive spam management system ahead of deployment.

We're hoping to include a simple "spam/not spam" marking mechanism, personal whitelist and blacklist management, and the ability to retrieve quarantined emails (i.e. those determined to be actual spam and not delivered).

In short, an interface that ties into our webmail system and provides the ability to define and manage spam on a per-user basis, to per-user tastes.

We will keep you updated.

 

Statistics

Since going online with the new system, we have been getting up to 2,400 spam emails per minute.  That is some 40 per second.

This does not include emails which were rejected at point of entry (for example, mail from senders listed in the various real-time blackhole lists) or rejected because of invalid recipient addresses; the figure counts only the emails that actually reach the scanner and are judged to be either probably spam or actually spam.

That is up to about 3.4 million spam emails per day.

3.4 million spam sent by people actively trying every possible dirty trick to get past any and all filtering.

Despite that, we manage to detect and tag or eliminate well over 90% of it now, and - despite the hiccup caused by some people whitelisting actual spammers - the system has been steadily increasing in effectiveness.

With the new interface, which should be going in shortly, we expect another major leap ahead in performance and flexibility.
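For anyone who wants to check the arithmetic behind the peak figures in this section, here is the calculation in Python. The only input is the 2,400 per minute peak quoted above; the variable names are just for this example, and since this is a peak rate, the per-day result is an upper bound rather than a typical daily total.

```python
# Peak rate quoted in the statistics above.
PEAK_SPAM_PER_MINUTE = 2400

per_second = PEAK_SPAM_PER_MINUTE / 60    # 60 seconds in a minute
per_day = PEAK_SPAM_PER_MINUTE * 60 * 24  # 60 minutes x 24 hours

print(per_second)  # 40.0
print(per_day)     # 3456000
```

The exact product is 3,456,000, which the text above quotes as about 3.4 million.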