Filtering UCE with Bogofilter
Table of Contents
- 1. Introduction
- 2. Why use Bogofilter?
- 3. Installation
- 4. Seeding the Filter
- 5. Wrapper Script
- 6. Procmail Modifications
- 7. Mailer Modifications
- 8. Global Installation
- 9. Upgrading Bogofilter
- 10. Resources
1. Introduction
Bogofilter is a nice UCE filtering tool that uses Bayesian statistics
to track messages and learn what to detect. Bogofilter was originally
written by Eric S. Raymond.
Some sites of interest:
- http://www.paulgraham.com/spam.html
- http://radio.weblogs.com/0101454/stories/2002/09/16/spamDetection.html
The sites above discuss the Bayesian statistics behind this kind of
filtering. Be warned: they contain math, which may or may not appeal
to you. If you prefer, just consider the Bayesian calculations
"magic". That’s what I do.
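For the curious, here is the combining rule from Paul Graham’s essay listed above, shown as a worked example of the idea rather than as a description of Bogofilter’s internals (Bogofilter’s exact calculation may differ, so check its documentation). Each token i in a message has a probability p_i that a message containing it is spam; treating the tokens as independent, the message’s overall spam probability is:

```latex
P(\text{spam}) = \frac{\prod_{i=1}^{n} p_i}{\prod_{i=1}^{n} p_i + \prod_{i=1}^{n} (1 - p_i)}
```

For example, two tokens with p_1 = 0.9 and p_2 = 0.8 give 0.72 / (0.72 + 0.02) ≈ 0.973, so the message looks like spam.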
2. Why use Bogofilter?
There are now several products available that do what Bogofilter
does; SpamAssassin and SpamBayes are two popular ones. So why choose
Bogofilter? Bogofilter is written in C, which makes it fast.
SpamAssassin is written in Perl and SpamBayes in Python; those
languages offer other advantages, but speed usually isn’t one of them.
I like Bogofilter simply because it’s small and doesn’t rely on a lot
of external support software.
3. Installation
Bogofilter is super easy to install. The project appears to be
offering RPM packages now. If that floats your boat, download the
package and you’ll be up and running.
If you suffer from the Not Compiled Here problem like me, grab the
source and compile and install it:
gzip -dc bogofilter-0.10.3.1.tar.gz | tar -xvf -
cd bogofilter-0.10.3.1
./configure --prefix=/usr/local
make
make install
If you like to compile things yourself but the steps above look
scary, grab the source RPM and let RPM build it for you.
4. Seeding the Filter
Bogofilter keeps what it learns in two databases: the good list and
the spam list. These are Berkeley DB files that grow as you use
bogofilter. You must seed bogofilter before it is useful. There are
several ways to do this. Seeding manually can be painful; getting a
database dump from another bogofilter user is handy. If you get your
hands on someone else’s database files, dump them to text first and
then load them on your system:
# On the source machine
bogoutil -d goodlist.db > goodlist.txt
bogoutil -d spamlist.db > spamlist.txt
# On your machine
bogoutil -l goodlist.db < goodlist.txt
bogoutil -l spamlist.db < spamlist.txt
Using someone else’s data as your seed may or may not be a good idea,
since their mail is not your mail. Be sure to think about this before
doing it. Ideally, you should seed your own bogofilter installation
with UCE that you have received. To seed bogofilter by hand, take your
mbox file (or collection of email files) and pipe each message through
bogofilter with the -s option if it is spam, or -n if it is not. The
formail(1) tool is handy for splitting an mbox into individual messages.
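A minimal sketch of seeding by hand with formail(1), assuming your spam and your good mail are already sorted into separate mbox files. The function name seed_mbox and the mailbox names in the usage line are hypothetical. BOGOFILTER should point at the bogofilter binary or at a wrapper like the one in the next section, so that seeding fills the same database your mail filter reads.

```shell
#!/bin/sh
# Sketch: seed bogofilter from sorted mailboxes.
# Assumption: BOGOFILTER names a single executable (the binary or a
# wrapper script). It is passed as one word, so don't put options in it.
BOGOFILTER=${BOGOFILTER:-/usr/local/bin/bogofilter}

# seed_mbox FLAG MBOX  (hypothetical helper)
#   FLAG is -s (register as spam) or -n (register as non-spam).
#   formail -s splits MBOX into individual messages and runs
#   "$BOGOFILTER FLAG" once per message, with the message on stdin.
seed_mbox() {
    flag=$1
    mbox=$2
    case $flag in
        -s|-n) ;;
        *) echo "seed_mbox: flag must be -s or -n" >&2; return 2 ;;
    esac
    if [ ! -r "$mbox" ]; then
        echo "seed_mbox: cannot read $mbox" >&2
        return 1
    fi
    formail -s "$BOGOFILTER" "$flag" < "$mbox"
}
```

Usage: run seed_mbox -s spam.mbox for the spam, then seed_mbox -n good.mbox for the good mail.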
5. Wrapper Script
I invoke Bogofilter through a wrapper script that currently just forces
the database directory (the -d option). At one point it also forced
some other settings. I still use it, and it is simply:
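#!/bin/sh
# Always use the shared database directory; pass all other arguments
# through unchanged ("$@" preserves quoting, $* does not).
/usr/local/bin/bogofilter -d /usr/local/etc/bogofilter/ "$@"
exit $?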
The script is owned by root:root with mode 0755.
6. Procmail Modifications
Bogofilter hooks into procmail with ease. The bogofilter man page
gives a good procmailrc example. Here’s what I do:
VERBOSE=yes
LOGDIR=$HOME/.procmail
LOGFILE=$LOGDIR/log
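# Scan for spam: -p passes the message through with an X-Bogosity
# header added, -e exits 0 for spam and non-spam alike (so only a real
# failure triggers the :0e recipe), and -u registers the message in the
# word lists according to how it was classified
:0fw
| /usr/local/bin/spamfilter -u -e -p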
# Return mail to queue on bogofilter failure
:0e
{ EXITCODE=75 HOST }
# Place in SPAM mbox if it's spam
:0:
* ^X-Bogosity: Yes, tests=bogofilter
SPAM
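To see which way the recipe above would file a particular saved message without going through procmail, a sketch like the following runs the filter in passthrough mode and applies the same X-Bogosity test. It assumes /usr/local/bin/spamfilter is the wrapper the recipe calls, and it deliberately leaves out -u so that checking a message does not also train the word lists. The function name classify_message is hypothetical.

```shell
#!/bin/sh
# Sketch: report where the procmail recipe would file one saved message.
# Assumption: SPAMFILTER is the wrapper used in the :0fw recipe.
SPAMFILTER=${SPAMFILTER:-/usr/local/bin/spamfilter}

# classify_message FILE  (hypothetical helper)
#   Prints SPAM if the filtered message carries the header the :0: recipe
#   matches, INBOX otherwise. -u is omitted on purpose so that checking a
#   message does not register it in the word lists.
classify_message() {
    filtered=$("$SPAMFILTER" -e -p < "$1") || {
        # Mirrors the :0e recipe: a filter failure is reported, not filed.
        echo "classify_message: filter failed on $1" >&2
        return 75
    }
    # Look only at the header (up to the first blank line), as procmail
    # conditions do by default.
    if printf '%s\n' "$filtered" | sed '/^$/q' |
        grep -q '^X-Bogosity: Yes, tests=bogofilter'; then
        echo SPAM
    else
        echo INBOX
    fi
}
```

Usage: classify_message saved-message.txt prints SPAM or INBOX; a return code of 75 means the filter itself failed.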
7. Mailer Modifications
The bogofilter man page provides some macros for Mutt that let you
handle UCE that bogofilter didn’t catch. I have Mutt configured so that
pressing Esc-Del forces the message through bogofilter flagged as spam,
while pressing just Del deletes the message.
8. Global Installation
Global installation can be done several ways. No special steps are
required beyond adding the hooks to the global procmailrc file. If you
want users to be able to train bogofilter with spam that wasn’t caught,
however, they need write access to the shared databases: either make
bogofilter setuid root, or create a user and/or group that bogofilter
runs as and give that user/group ownership of the database files.
I recommend that you install bogofilter under your own account rather
than globally.
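If you do install globally with a dedicated group, the permission changes might look like this sketch. The group name bogofilter is hypothetical (create it first, for example with groupadd), the paths follow the wrapper script above, and it must run as root. Making the binary set-group-ID to that group, rather than setuid root, means a bug in bogofilter exposes only that group’s files instead of the whole system.

```shell
#!/bin/sh
# Sketch: share one set of word lists through a dedicated group.
# Assumptions (hypothetical values): the group already exists, the
# databases live where the wrapper script points, and this runs as root.
BOGO_GROUP=${BOGO_GROUP:-bogofilter}
BOGO_DIR=${BOGO_DIR:-/usr/local/etc/bogofilter}
BOGO_BIN=${BOGO_BIN:-/usr/local/bin/bogofilter}

# share_db: give the group read/write access to the database directory and
# its files (X adds search permission to directories, but not execute
# permission to plain data files), then set the directory's set-group-ID
# bit so files created there later inherit the group.
share_db() {
    chgrp -R "$BOGO_GROUP" "$BOGO_DIR" &&
        chmod -R g+rwX "$BOGO_DIR" &&
        chmod g+s "$BOGO_DIR"
}

# setgid_binary: run bogofilter with the group's privileges so ordinary
# users can update the shared word lists, instead of making it setuid root.
setgid_binary() {
    chgrp "$BOGO_GROUP" "$BOGO_BIN" &&
        chmod g+s "$BOGO_BIN"
}
```

Run share_db and then setgid_binary once after installing. Because make install replaces the binary, you will probably need to run setgid_binary again after each upgrade.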
9. Upgrading Bogofilter
From time to time you will want to download and install a new version
of bogofilter. The authors make upgrades easy with the bogoupgrade
command, which upgrades your data files. You still need to compile and
install the new version yourself, but the authors provide this tool to
upgrade the Berkeley DB files. Be sure to check its man page for details.
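Before running bogoutil’s sibling bogoupgrade, it seems prudent to keep a plain-text copy of both word lists, using the same bogoutil -d dump shown in section 4, so your training data survives even if the upgrade goes wrong. A minimal sketch, assuming the goodlist.db and spamlist.db file names used in this document and the database directory from the wrapper script; the function name backup_lists and the backup directory are hypothetical.

```shell
#!/bin/sh
# Sketch: dump both word lists to text before an upgrade.
# Assumptions: databases in BOGO_DIR with the goodlist.db/spamlist.db
# names used above; bogoutil is on PATH.
BOGO_DIR=${BOGO_DIR:-/usr/local/etc/bogofilter}

# backup_lists DEST  (hypothetical helper)
#   Writes DEST/goodlist.txt and DEST/spamlist.txt, returning non-zero
#   at the first dump that fails.
backup_lists() {
    dest=$1
    mkdir -p "$dest" || return 1
    for list in goodlist spamlist; do
        if ! bogoutil -d "$BOGO_DIR/$list.db" > "$dest/$list.txt"; then
            echo "backup_lists: failed to dump $list" >&2
            return 1
        fi
    done
}
```

Usage: backup_lists "$HOME/bogofilter-backup.$(date +%Y%m%d)". The text files can be loaded again with bogoutil -l, as in section 4.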