December 14, 2004  ·  Lessig

So the most significant change in my technology-related life in the last year is the elimination of spam without a white-list technology. I used to use Mailblocks for my main account, but Marc Perkel convinced me to try his own Bayesian spam filter.

I’m on record saying such systems could never work. I was wrong. Marc’s system is amazing. I get endless email. His system filters the mail into three boxes — my inbox, a low probability box, and a high probability box. I have never found a mistake in the high probability box, so I no longer look at it. I very rarely find a mistake in the low probability box, so I scan it about once a week (maybe 1% error). And it is almost fun to get an error in my inbox, reminding me that there still is this problem of spam out there.
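For readers who want to picture the three-box sorting, here is a minimal sketch. It assumes the filter produces a spam probability between 0 and 1 for each message; the function name and both cutoff values are hypothetical, chosen only for illustration, and are not taken from Marc's actual configuration.

```python
def route_message(spam_probability, low_cutoff=0.5, high_cutoff=0.99):
    """Pick one of three mailboxes from a spam probability in [0, 1].

    The cutoffs are illustrative assumptions, not values from any real service.
    """
    if not 0.0 <= spam_probability <= 1.0:
        raise ValueError("spam_probability must be between 0 and 1")
    if spam_probability >= high_cutoff:
        # Near-certain spam: the box that never needs checking.
        return "high-probability"
    if spam_probability >= low_cutoff:
        # Borderline mail: worth a scan now and then.
        return "low-probability"
    # Everything else is treated as legitimate mail.
    return "inbox"


if __name__ == "__main__":
    for p in (0.02, 0.70, 0.999):
        print(p, "->", route_message(p))
```

Raising `high_cutoff` keeps the high-probability box trustworthy at the cost of pushing more spam into the box that still needs an occasional look.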

Anyway, I’m giving Marc’s spam filter service to my family for Christmas (no, they don’t read my blog). And I’d recommend it to anyone else out there looking for a gift (note, I don’t have any financial interest in this). As Marc described to me:

I sell it as a service. I can do it several ways. If someone wants a single email address I can give them a something@marxmail.net account. $25/year. Or I can host their email domain for $95/year. Or I can be a front end spam filter where I clean it and pass it on to their existing email server $75/year.

You can reach him at this MarxMail address.

  • http://www.spamassassin.org d.

    Anyone curious as to how this is done should note that, based on his mailing list posts, Marc is probably just using SpamAssassin with the threshold turned down a bit. I get 900-2000 spams a day on average, and with the SpamAssassin threshold set to 3.1 and a small tweak to Bayesian scoring only 4-5 a month slip through. Since I don’t have the time or patience to sort through a “maybe” folder, I just auto-generate bounces instructing folks to resend with a “magic word” subject for borderline messages — which happens about twice a year.

  • http://www.lawhacker.com Andrew Greenberg

    I think you mean, “Bayesian,” not “Boolean.” The latter term refers to an algebra based upon binary logical values, named after George Boole. The former refers to a theorem of probability theory, dealing with conditional probability, referring to Thomas Bayes.

    Maybe something else is happening with Marc’s filter that makes it Boolean, but most probabilistic spam filtering mechanisms with which I am familiar work by reference to a pseudo-Bayesian analysis.
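To make the conditional-probability idea concrete, here is a rough sketch of the kind of "pseudo-Bayesian" scoring such filters use. It assumes equal prior odds of spam and legitimate mail and treats words as independent, which is the usual "naive" simplification. The function names, the clamping bounds, and the counts in the example are all hypothetical; this is not the code of Marc's filter or of any particular product.

```python
from math import prod


def word_spam_probability(spam_count, ham_count, total_spam, total_ham):
    """Estimate P(spam | word) with Bayes' theorem, assuming equal priors."""
    p_word_given_spam = spam_count / total_spam
    p_word_given_ham = ham_count / total_ham
    denominator = p_word_given_spam + p_word_given_ham
    if denominator == 0:
        return 0.5  # word never seen before: no evidence either way
    return p_word_given_spam / denominator


def combine(probabilities, floor=0.01, ceiling=0.99):
    """Combine per-word probabilities under the naive independence assumption."""
    # Clamp so that a single word can never force the result to exactly 0 or 1.
    clamped = [min(max(p, floor), ceiling) for p in probabilities]
    spam_evidence = prod(clamped)
    ham_evidence = prod(1.0 - p for p in clamped)
    return spam_evidence / (spam_evidence + ham_evidence)


if __name__ == "__main__":
    # Hypothetical training counts: word seen in 90 of 100 spams, 10 of 100 good mails.
    p = word_spam_probability(90, 10, 100, 100)
    print(p, combine([p, p, 0.2]))
```

Two words that each point 90% toward spam combine to a higher score than either alone, which is why a handful of telltale words is often enough to classify a message.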

  • http://tim.blog.kosmo.com Tim A

    Sounds like Marc’s service is very very similar to ours. We’ve been doing this for well over two years for both SMTP and POP3 email (we use the same back-end anti-spam engine for both our SMTP and POP3 service). We also classify to LOW, MEDIUM and HIGH. We also have additional features that make handling the email a bit easier like NOT delivering HIGH as we have never found an email classified HIGH incorrectly. So a lot of clients have determined they are willing to take the chance and not even receive that email.

    We get the very same reaction from people when they first try it out and wonder why they haven’t done something about their spam problem. I still see posts from people who are complaining about spam and I’ve offered trials for free (our POP3 service has always been a free two-week trial but SMTP is on a customer-by-customer basis). I’ve offered it as a free trial to these people that complain time and time again but never seem to do anything about it here – http://tim.blog.kosmo.com/blog/_archives/2004/9/26/149860.html.

    I won’t mention the service directly so as not to be labeled as putting out comment spam here but you can figure it out easily enough by visiting my blog here – http://tim.blog.kosmo.com.

    Just to comment on a previous poster’s comment: Marc openly admits to using SpamAssassin as his backend.

  • http://zgp.org/~dmarti/ Don Marti

    If the spam filter is so great, why the disposable address?

  • Peter O

    Paying money for a Bayesian filter seems extreme – the technology is quickly becoming free, stable, and high quality. At work I am forced to use Outlook, so I installed SpamBayes, which has worked without a single error for months. My mail volume is low (on the order of tens per day, total) but nonetheless.

    Paul Graham has an article that shows that a purely Bayesian approach is every bit as good as, and sometimes better than, a Bayesian + rule-based approach. Counterintuitive, maybe, but very encouraging.

  • http://www.sidney.com sidney

    Peter, the technology may be free and open source, but it is hard to imagine Lessig being able to do any setting up and maintenance of the free software for less than $95 worth of his time per year.

    I’m one of the nine active committers to the SpamAssassin project, I have six different installations of SpamAssassin in various operating systems for development and testing, yet all my mail goes through my ISP’s mail server and their SpamAssassin configuration. They get to decide when a new version of software is stable enough for an upgrade, they get to make backups, they get to install redundant hardware and connectivity and power, they get to set up alarms to page on call staff when something crashes at three in the morning. And I get to receive spam free mail even if I do something stupid to my laptop.

    Others among the SpamAssassin committers are more into “eating their own dog food”, running their own mail servers and filtering their own spam. I have more fun tweaking and breaking things and not having to be quite so careful until it is time to actually complete and commit a change to the sources.

    As for pure Bayes vs rules+Bayes, in practice just about all the spam filtering tools work so well that if you find one that you can use given your constraints of computer, mail client, mail server, technical expertise, time, and money, then you will be happy with it. For Lessig, it sounds like Marc’s service is perfect. For me, my ISP is, and for you SpamBayes is.

  • Andrew Boysen

    Does anyone know if Gmail or Apple’s mail program uses a variety of Bayesian spam filter? I’m always impressed with the accuracy of their filters, and it’s improved by your ability to teach them what is and isn’t spam, though I’m not sure if Gmail’s filter learns on a system-wide basis, or if it just figures out what each person considers spam.

    I’m also curious about the legal implications of spam filters. Can an ISP ever be held liable for filtering out important e-mails by accident? Does it matter if the customer has or hasn’t signed a waiver? What if an ISP puts a filter on outgoing mail? Will an ISP ever be required to filter outgoing mail?

    I know this is for comments – I hope questions are allowed as well.

  • Denisov

    Since Rod Smolla took the time to mention you in a Slate piece on the MGM Studios Inc. v. Grokster Ltd. and StreamCast Networks Inc. case, I was wondering if you had any reaction to the arguments made, or the piece itself?

    Link is:
    http://slate.com/id/2110982/

  • lessig

    Stupid of me. Right you are. Bayes.

  • Jer

    Bayesian filtering is pretty amazing – I’ve been using it on my server pretty successfully for a few years. What continued to get me though is the bandwidth – my email bandwidth was out of control. If you pay for your email bandwidth (don’t know if you have your own server or go through your ISP’s server), check into greylisting. It’s cut down on over 90% of spam and viruses for me, and it does it before it receives the message from the other server, so I’ve also cut my mail bandwidth down to about 10% of what it used to be.
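For anyone unfamiliar with greylisting, the idea can be sketched roughly as follows. The server temporarily refuses the first delivery attempt from any (client IP, envelope sender, envelope recipient) combination it has not seen before, and accepts only when the same combination retries after a minimum delay. Legitimate mail servers retry; much bulk-mailing software does not. Because the decision is made before the message body is transferred, the refused mail costs almost no bandwidth. The class name, the five-minute delay, and the "defer"/"accept" return values are assumptions for illustration, not the behavior of any specific greylisting package.

```python
class Greylist:
    """In-memory greylist keyed on (client IP, envelope sender, envelope recipient)."""

    def __init__(self, min_delay_seconds=300):
        # Hypothetical default: a retry must come at least five minutes later.
        self.min_delay_seconds = min_delay_seconds
        self.first_seen = {}

    def check(self, client_ip, sender, recipient, now):
        """Return "defer" or "accept" for a delivery attempt at time `now` (seconds)."""
        key = (client_ip, sender.lower(), recipient.lower())
        first = self.first_seen.get(key)
        if first is None:
            # Unknown triplet: remember it and ask the sending server to retry.
            # A real mail server would answer with a temporary 4xx SMTP reply here.
            self.first_seen[key] = now
            return "defer"
        if now - first < self.min_delay_seconds:
            # Retried too quickly: keep deferring.
            return "defer"
        # A patient, well-behaved sender: let the message through.
        return "accept"


if __name__ == "__main__":
    greylist = Greylist()
    attempt = ("192.0.2.1", "alice@example.com", "bob@example.org")
    print(greylist.check(*attempt, now=0))    # first attempt
    print(greylist.check(*attempt, now=900))  # retry fifteen minutes later
```

A production setup would also expire old entries and persist the table across restarts; both are left out here to keep the sketch short.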

  • http://www.ii.com Nancy McGough

    A couple comments: I think it is more accurate to refer to Mailblocks technology as “Challenge/Response technology” rather than “white-list technology.” There are systems that use “whitelists” but do not use C/R and what Mailblocks is known for (and sometimes disdained for) is C/R.

    Another comment: I maintain a list of IMAP Service Providers and when I get back home I’ll add the provider that you mention and also the providers that the commenters mention above. Before you or anyone chooses an IMAP provider, I recommend that you read my list of

    What to Look For in an IMAP Service Provider

    For example, make sure you get a provider that does backups and restores of all your mailboxes!

    Hope this is helpful,
    Nancy