Sed and Awk: Editing Streams for Fun and Profit

A common task, in shell programming and elsewhere, is to take a stream of
characters and somehow modify it or extract data from it. Two powerful tools
that UNIX offers for this purpose, both using the magic of regular expressions,
are sed and awk.

1. Regular Expressions

Regular expressions give the user the power to match any regular language while managing
to be completely unreadable and incomprehensible. To further complicate things,
there are two types of regular expressions defined by POSIX, basic and extended,
and no two tools, or even two implementations of the same tool, seem to be able
to agree on what the difference really is. For the most part, though, sed and
awk implementations are at least compatible with the POSIX definitions, even if
additional layers and features are added on top.

A regular expression is
used to match some portion of a string. At its most basic, a regex is just a
substring. So the string "Caution! Contents may be hot!" contains
matches for the regular expression

‘Caution’

as well as

‘may be’

or even

‘hot!’

Regular expressions are case sensitive, so the string contains no matches
for ‘caution’.

Simple, no? But not very powerful. Let’s
add a few extra characters. If a circumflex (‘^’) is the first character of a
regex, it will match the beginning of the string. Likewise, if a dollar sign
(‘$’) is the last character of a regular expression, it will match the end of
the string. So the regular expression

‘^Caution!’

would match the Caution! in the above example string, but not the one in the
string "Wet floor ahead, Caution!". Similarly, the expression

‘Caution!$’

would match the Caution! in the second string, but not in the first.

Now let’s say you want to match more
than one possibility for a character. Characters between brackets (‘[‘ and
‘]’) are treated as a list of possible characters to match. So
‘[abcd]’ would match a single character, and that character may
be ‘a’, ‘b’, ‘c’, or ‘d’. Bracket expressions can also use ranges, so the
previous example is equivalent to ‘[a-d]‘, though constructs
such as this are sometimes bad for internationalization. More on that later.

Bracket expressions may also by negated using the circumflex
(‘^’) as the first character. So ‘[^abcd]‘ would match a single
character that is not ‘a’, ‘b’, ‘c’, or ‘d’.

A few
exceptions are needed in order to match the characters ‘]’, ‘^’, or ‘-‘ in a
bracket expression. To match a ‘]’, make it the first character, after the
circumflex if one is used. So something like ‘[]abcd]‘ or
‘[^]abcd]’. To match a ‘-’, make it either the first or last
item in the list, after the circumflex. To match a ‘^’, just put it anywhere
except up front.
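These quoting rules can be checked from the shell; a small sketch, assuming GNU
grep for the -o flag (which prints each match on its own line):

```shell
# ']' is listed first and '-' is listed last, so both are taken literally
echo 'a-b]c' | grep -o '[]-]'
```

The two matches, ‘-’ and ‘]’, come out in the order they appear in the input.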

Another useful way to match multiple possibilities
for a character is the period (‘.’), which will match any character. So the
regular expression ‘Ca.tion’ would match both
"Caution" and "Caption".

Note that any of these
special characters can be escaped with a ‘\’ to remove their special
meaning, so the expression ‘\.‘ would match a period
character. ‘\\‘ matches a backslash character.

More than one character may be matched at a time using the asterisk (‘*’). An
asterisk following a character or a bracket expression will match zero or
more instances of that character or bracket expression. For example,
let’s say you’re programming in LISP for some reason, and want to match
every possible car and cdr expression. You could do this using
‘c[ad]*r’ which will match "car", "cdr",
"caadr", "cdaar", and everything else. However, it also
matches "cr", which probably isn’t something you want. You can
avoid that using ‘c[ad][ad]*r‘ which forces at least one
instance of [ad] to exist for a match, but there is a cleaner way that
we’ll look into later.
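A quick grep check of the two patterns above (grep uses BREs by default, so the
asterisk needs no escaping; the input list is made up for the example):

```shell
# 'cr' slips through the first pattern but not the second
printf 'car\ncdr\ncaadr\ncr\n' | grep 'c[ad]*r'
printf 'car\ncdr\ncaadr\ncr\n' | grep 'c[ad][ad]*r'
```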

Regular expressions match repetitions
greedily, meaning that they will match as long a string as they possibly can.
So the regular expression ‘.*power’ applied to the string
"My power supply is not powerful enough" would match "My
power supply is not power".

If an expression is enclosed in
escaped parenthesis (‘\(‘ and ‘\)’), the entire enclosed expression will
be treated as a single element. So the regex ‘\(bob\)*’
would match a string of zero or more bob’s. In addition to allowing
better groupings for repetitions, the text matched within a parenthesis
group may be used later in the expression with ‘\digit’, with the
first parenthesis group (ordered by the beginning of the grouping) being
\1, the second \2, and so on through \9. So
‘\([Bb][Oo][Bb]\)\1\1’ would match "BOBBOBBOB" and
"BoBBoBBoB" but not "BOBbobbob".
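Back-references are part of POSIX BREs, so this can be verified with grep
directly (synthetic input lines):

```shell
# only the lines whose three bob's are capitalized identically match
printf 'BOBBOBBOB\nBoBBoBBoB\nBOBbobbob\n' | grep '\([Bb][Oo][Bb]\)\1\1'
```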

A specific
number of repetitions can be specified by appending a number
enclosed in escaped curly braces (‘\{‘ and ‘\}’) to an expression. So
"BOBbobbob" could be matched using
‘\([Bb][Oo][Bb]\)\{3\}’. Ranges can also be given as
‘\{start,end\}’ to match between start and end repetitions
inclusive, or ‘\{start,\}‘ to match at least start
repetitions. POSIX does not specify behavior for ‘\{,end\}‘,
but pretty much everyone implements it.
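Interval expressions can be exercised with sed’s s command (covered below); a
small sketch:

```shell
# exactly three repetitions of 'ha' collapse to a single '!'
echo 'hahaha ha' | sed 's/\(ha\)\{3\}/!/'
```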

For Basic Regular
Expressions, that’s about it. Extended Regular Expressions treat
unescaped curly braces and parentheses as the special characters, and add
a few more special characters of their own.

If two expressions are
separated by a vertical bar (‘|’), then either expression will be matched.
So ‘(bob|jimmy)’ would match either bob or jimmy.

The addition symbol (‘+’) can be used to match one or more of an expression,
so ‘c[ad]+r’ would solve the problems of the car and cdr
example above. ‘expression+’ is equivalent to
‘expression{1,}’. A question mark (‘?’) following
an expression will match that expression zero or one times. So
‘expression?’ is equivalent to ‘expression{0,1}’.

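With grep -E (ERE mode), the car/cdr problem is solved without any escaping;
the extra colo(u)r lines are made up to show the question mark:

```shell
# '+' requires at least one [ad]; '?' makes the 'u' optional
printf 'car\ncr\ncolor\ncolour\n' | grep -E 'c[ad]+r|colou?r'
```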
Another nice little
feature not defined in POSIX but implemented by pretty much everyone is
that escaped angle brackets (‘\<’ and ‘\>’) can be used to match
the beginning or the end of a word. So the expression
‘\<Caution\>’ would match "Caution" and not "Cautionary".
I mentioned earlier that using ranges in a
bracket expression is bad, and this is because not all character sets are
created equal, or even contiguous. So while something like
‘[A-Za-z]’ may match all letters in ASCII, it wouldn’t match
accented letters, and who knows what it might do in something
like EBCDIC. To solve this problem, character classes were created, and
given an even more horrible and confusing syntax. If something like
‘[:alpha:]’ occurs in a bracket expression, it matches any character
that would return true for isalpha() in the current locale. Note that the
brackets around the character class are additional brackets, not the
ones already around the bracket expression. So, in ASCII,
‘[[:alpha:]]’ is equivalent to ‘[A-Za-z]’,
‘[[:lower:][:digit:]+=*]’ is equivalent to
‘[a-z0-9+=*]’, and so on.
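A character class in action, again assuming GNU grep for the -o flag
(synthetic input):

```shell
# [[:digit:]] is the locale-safe spelling of [0-9]
echo 'Room 101, floor 7' | grep -o '[[:digit:]][[:digit:]]*'
```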

Further complications are
introduced with collating elements, but that gets more into
internationalization than I care to cover in this article.

So, now
that you’re a regularly matching fool, what next?

2. Sed, the stream editor

When given a set of rules and some
input, sed will read a line of the input, modify it according to the
provided rules, output the modified form, and repeat until the input is
gone. The most common use of sed is to replace a regular expression with
some other string, like

s/foo/bar/

which will replace the first foo on each line of the input with bar. If you
want to replace every foo on each line, add a ‘g’ after the replacement:

s/foo/bar/g

The sed commands are often
provided along with the invocation of sed, as in

sed 's/foo/bar/g' filename

Sed uses basic regular expressions, so it requires
that the special characters be escaped with backslashes, otherwise they
are interpreted as the literal character. For example, to use parenthesis
to group elements, they must be used as ‘\(stuff\)‘. GNU sed
defines an "extended regular expression" mode which eliminates the need to
escape these characters, but at the cost of portability. GNU sed also
allows for ‘?’ and ‘+’ in regular expressions, though they must be escaped
(‘\?’ and ‘\+’) if -r is not being used.

The choice of ‘/’ as the
separating character above is arbitrary; any character could be used.
Another common choice is to use ‘%’ to avoid having to escape large
numbers of ‘/’ in the expression or in the replacement text. So
‘s/regex/replace/’ is equivalent to ‘s%regex%replace%’.

Another option that can be
appended to a replacement, like ‘g’, is ‘p’, which will print the line to
stdout if a replacement was made. This should only be used if sed is
invoked with the ‘-n’ flag, which will cause sed to print nothing unless
explicitly requested with a ‘p’. POSIX does not specify whether lines
printed with ‘p’ should be printed again, so depending on the sed
implementation, some lines may be printed twice.
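A minimal demonstration of the ‘p’ flag together with -n (synthetic input):

```shell
# only the lines where a substitution happened are printed
printf 'foo one\nbaz two\nfoo three\n' | sed -n 's/foo/bar/p'
```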

2.1. Line addresses

Addresses may be specified before the command to limit on
which lines the command will be executed, such as in
12s/foo/bar/‘ which will replace foo with bar, but only on
the 12th line of the input.

Addresses may be a line number (’12’),
a regular expression enclosed in slashes (‘/c[ad]*r/’) which will match
any line containing the expression, or the dollar sign (‘$’) which matches
the last line. A range may also be given as addr1,addr2. If regular
expressions are used in an address range, the first line that matches the
regular expressions will be used. If the first address in a range is a
regular expression, matches for the second address will be checked
beginning with the next line.

The choice of ‘/’ to delimit regular expression addresses is not
required, but if another
character is used, the first one must be prefixed by a backslash, since
otherwise it will be interpreted as a command. This character does not
affect the delimiting character in ‘s’ commands, so something like
\%c[ad]*r%s/r//‘ is valid.
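Addresses can be tried out the same way; this sketch marks every line between
the first ‘start’ line and the following ‘end’ line (made-up input):

```shell
# the range address limits the substitution to the start..end lines
printf '1\nstart\n2\nend\n3\n' | sed '/start/,/end/s/^/> /'
```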

2.2. Other commands

Other useful commands are ‘d’, which deletes
the line matching the address, and ‘p’, which prints out lines matching
the address, or every line if no address is given (again, this should only
be used in conjunction with -n, since the behavior otherwise is
undefined). These three commands will make up nearly all of your usage of sed.

The only (portable) command line options that sed accepts
besides -n are -f script-file, which reads in a script from the
given filename, and -e script, which adds the given sed command
to the script to be executed. If -f or -e are given, then a sed command
cannot be given as an operand without -e, since otherwise it will be
interpreted as a filename. If multiple -f or -e commands are given, they
are evaluated in order.

Multiple filenames may be given, and will
be concatenated in order and run through the sed program. stdin is only
used if no filenames are given.

3. How sed really works

Sed has two memory spaces, the hold space
and the pattern space. For each cycle, the pattern space is cleared, a
line of input is read into the pattern space, the program is run, and, if
the -n flag was not given, the final contents of the pattern space are
written to the output. This repeats until all input is read, or until
execution is terminated with the ‘q’ command. Nothing is ever
automatically placed in the hold space, but there are several commands to
manipulate it.
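The classic example of hold-space juggling is reversing a file (what GNU calls
tac); a sketch of the well-known one-liner:

```shell
# G appends the hold space to each new line; h saves the result; the
# accumulated, reversed text is printed only on the last line
printf 'a\nb\nc\n' | sed -n '1!G;h;$p'
```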

The ‘s’ command, in addition to being the most
useful for actual text processing, can also be used for conditional
branches. A branch point can be defined using ‘: LABEL’, and the
command ‘t LABEL’ will branch to this label if a successful
substitution has been made since the last branch or input read.
‘b LABEL’ is the unconditional counterpart. If no label is given to
either t or b, they will jump to the end of the script, which is
useful for starting a new cycle.

Using all of this, powerful, incomprehensible programs may
be written, like the implementation of the dc calculator shipped with the
GNU sed source, or the following very short text adventure:

# Should be runnable either with or without -n
# Only commands supported are directions, since I didn't want this to get
# three miles long
# Trying very hard to use only BREs
# Look text shamelessly stolen from Infocom's ZORK

# restore the current room name
# x exchanges hold and pattern spaces
# Each room must exchange back to read input
x
s/^room1$/room1/
t room1
s/^room2$/room2/
t room2
b room0

# North goes to room1, south goes back to room0, southeast goes to room2
: room0
x
# i\ outputs text up to first line without trailing '\'
# '{' and '}' commands are used to create groups matched by a
# single address
# expression matches line containing word "look" optionally surrounded by
# whitespace, and nothing else
/^[[:space:]]*[Ll][Oo][Oo][Kk][[:space:]]*$/{
i\
You are in a maze of twisty little passages, all alike
b end
}
# Matches optional leading "go" and word "n" or "north"
# directions work by putting the room name in the pattern space, and if a
# substitution was made, the room name is copied to the hold space and the
# pattern space is cleared
s/^[[:space:]]*\(go[[:space:]]*\)\{0,1\}[Nn]\([Oo][Rr][Tt][Hh]\)\{0,1\}[[:space:]]*$/room1/
t copyend
s/^[[:space:]]*\(go[[:space:]]*\)\{0,1\}[Ss]\([Oo][Uu][Tt][Hh]\)\{0,1\}[[:space:]]*$/room0/
t copyend
# No '|' in BREs, so need two expressions for 'se' and 'southeast'
s/^[[:space:]]*\(go[[:space:]]*\)\{0,1\}[Ss][Ee][[:space:]]*$/room2/
t copyend
s/^[[:space:]]*\(go[[:space:]]*\)\{0,1\}[Ss][Oo][Uu][Tt][Hh][Ee][Aa][Ss][Tt][[:space:]]*$/room2/
t copyend
b badend

# South goes back to room0, North goes to room2
: room1
x
# Matches any line that begins with the word "look"
/^[[:space:]]*[Ll][Oo][Oo][Kk]/{
i\
West of House\
You are standing in an open field west of a white house, with a boarded\
front door.\
There is a small mailbox here.
b end
}
s/^[[:space:]]*\(go[[:space:]]*\)\{0,1\}[Nn]\([Oo][Rr][Tt][Hh]\)\{0,1\}[[:space:]]*$/room2/
t copyend
s/^[[:space:]]*\(go[[:space:]]*\)\{0,1\}[Ss]\([Oo][Uu][Tt][Hh]\)\{0,1\}[[:space:]]*$/room0/
t copyend
b badend

# East wins and quits, West goes to room0, South goes to room1
: room2
x
/^[[:space:]]*[Ll][Oo][Oo][Kk][[:space:]]*$/{
i\
You are standing in front of a massive barrow of stone. In the east face is a\
huge stone door which is open. You cannot see into the dark of the tomb.
b end
}
# delete input so it is not printed when quitting
s/^[[:space:]]*\(go[[:space:]]*\)\{0,1\}[Ee]\([Aa][Ss][Tt]\)\{0,1\}[[:space:]]*$//
t win
s/^[[:space:]]*\(go[[:space:]]*\)\{0,1\}[Ww]\([Ee][Ss][Tt]\)\{0,1\}[[:space:]]*$/room0/
t copyend
s/^[[:space:]]*\(go[[:space:]]*\)\{0,1\}[Ss]\([Oo][Uu][Tt][Hh]\)\{0,1\}[[:space:]]*$/room1/
t copyend
b badend

: win
i\
You win!
# d starts a new cycle, so is not good for deleting the pattern space and
# quitting; there will be an extra newline printed out at the end if
# -n not used
q

: badend
# assumes all unknown commands are directions, for brevity
# strips off leading "go", prints out the message
# does nothing if there is no input
/./s/\(^[[:space:]]*go[[:space:]]*\)\{0,1\}\(.*\)/There is no exit to the \2/p
b end

: copyend
# h replaces the hold space with the contents of the pattern space
h

: end
# delete whatever is left in the pattern space so it is not printed twice
s/.*//

Interaction with this little script may look something like this,
assuming the script above is saved as adventure.sed (the short lines
are user input, the rest is the script's output):

$ sed -f adventure.sed
look
You are in a maze of twisty little passages, all alike
se
look
You are standing in front of a massive barrow of stone. In the east face is a
huge stone door which is open. You cannot see into the dark of the tomb.
east
You win!


Another command that is useful to know in sed is ‘N’,
which reads another line of input and appends it to the pattern space.
This can be used to match multi-line expressions. However,
considerations must be made for lines of data read in unusual positions:

# replace all 'one\ntwo' with 'three'
: begin
$!N
s/one\ntwo/three/
# if a substitution was made, the newly read line may begin another
# match, so loop to read the next line
t begin
# otherwise print the first line, delete it, and rerun the script on
# whatever is left in the pattern space
P
D

Several other commands exist in sed, and are described in the GNU
sed info and man pages, among other places.

4. Awk, that other shell utility thing

Awk is named after its
creators, Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan.
According to the gawk (GNU awk) info page, awk programs are
"refreshingly easy to read and write" compared to programs written in
traditional procedural languages.

Awk is described as
"data-driven", in that rather than a list of commands to perform on
data, awk programs are a description of data and actions to take based
on which descriptions are matched.

Awk programs consist of a set
of rules of the form ‘PATTERN { ACTION }‘, where the
pattern can be ‘BEGIN’, ‘END’, an extended regular expression enclosed
in /’s, an awk expression that is matched if it evaluates to a non-zero
value, nothing at all (which matches everything), or a range given as
‘PATTERN1, PATTERN2’. The pattern ranges are unlike those in sed in
that they may be repeated; after the end of a range is found, the
beginning may be matched again.

Most of your interaction with
awk will probably be with a small subset of its features. The most
commonly used awk command is ‘print’, usually used in conjunction with
awk’s field separation features. awk '{print $4}' would
print the fourth field of every line of input, so if you were to, for
example, run the output of ls -l through this tiny
program, awk would spit out a big list of group names.

Of course, awk is much more than a fancy cut.
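For instance, fields can be combined and transformed on the way out, which cut
cannot do (the two input records here are made up for the example):

```shell
# print the name from field 1 and double the count from field 2
printf 'alice 3\nbob 5\n' | awk '{print $1, $2 * 2}'
```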

4.1. Separation of Data

Awk views input as a sequence of
records, and it views records as a collection of fields. By default,
each line of input is a record, and each portion of a record separated
by whitespace is a field. The behavior for records can be changed by
setting the RS variable. RS is a single character (or no character, in
which case all input becomes one record), and by default is ‘\n’.
Field separation can similarly be modified by setting the FS variable.
An initial value can be given to FS on the command line, with the -F
flag. Depending on its contents, FS can be interpreted in three
different ways. By default, FS contains a single space (‘ ‘), which
means that leading and trailing whitespace is ignored, and fields are
separated by any number of spaces or tabs. If FS contains a single
character, the behavior is more like that of cut, in that each
occurrence of the FS character will start a new field, and if two FS
characters are adjacent, the text between them is interpreted as an
empty field. awk -F : '{print $3}' /etc/passwd would print out
the UID of every user, and is equivalent to cut -d : -f 3 /etc/passwd.

The third mode for FS is when it contains
more than one character, in which case it is interpreted as an extended
regular expression. Field separators are then matched starting from
the left, and using the longest possible non-empty string. The fields
are whatever is left in between.
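A sketch of this third mode, with FS set to a small ERE (made-up input):

```shell
# runs of ':' or ',' act as one separator, so the fields are a, b, c
echo 'a:b,,c' | awk 'BEGIN { FS = "[:,]+" } { print $3 }'
```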

Fields can be accessed using
the ‘$number’ variables, and the entire record can be accessed
using $0.

4.2. Examples

Rather than
cover every detail of awk syntax, which would be rather long and
boring, I’ll just go over a few examples. If you want to learn more
about awk, the gawk man and info pages have a complete description of
what awk can do.

Suppose you have a directory listing from ls
-l, and you want to know exactly how many bytes are being used by the
files. Awk can do this, simply by taking the sum of the 5th fields of
each record (which in the case of ls, would be the file length).

Recall that ls -l output looks something like this:

total 28
drwxr-xr-x   2 user  group   4096 2003-05-19 22:31 directory/
-rw-r--r--   1 user  group   9431 2003-05-19 22:31 file1
-rw-r--r--   1 user  group  11022 2003-05-19 22:31 files

Variables can be treated as either
strings or numbers, depending on the context, and conversions are made
automatically, so we can use $5 in a sum simply by adding to it. If
something like $4 were used in a sum instead, it would be converted to 0.

So we can take the sum of the lengths with the following:

BEGIN { total = 0 }
{ total = total + $5 }    # 'total += $5' would also work
END { print total }

So just assigning
to a variable will cause it to spring into being. The BEGIN statement
could be omitted, since the value for an empty numeric variable is zero
(this can also be seen as unassigned variables being equal to the empty
string, and the empty string being converted to 0 when used as a
number). For the first record (total 28), $5 is also equal to 0, since
the fifth field of this record is empty.

Also, note that
variables are referenced only by their name, instead of "$name" as in
Bourne shell and some other scripting languages. If $total were used
instead of total, awk would take the current value of total as a
number, and then try to interpret that as a field number.

The above program may still not be what you want, since directories are
included in the sum as well. Those can be easily eliminated through
pattern matching.

/^-/ { total += $5 }
END { print total }

This will only add the file’s length to the sum
if the record begins with ‘-‘, which would mean it is a regular
file. Similarly, a pattern of

!/^d/

would only omit directories.

Patterns such as this can become cumbersome if only a specific field
matters, so matches may be made based only on a particular field.
Suppose you want only the files owned by the user root.

$3 ~ /^root$/ { total += $5 }
END { print total }

The ~ operator will result in true if the awk expression
on the left matches the regular expression on the right. !~ can
be used for the opposite.
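The same program can be tried against a few fake ls -l style records (owner in
field 3, size in field 5; the values are invented for the example):

```shell
# sum field 5 only when field 3 (the owner) is exactly 'root'
printf 'rw- 1 root grp 10 f1\nrw- 1 alice grp 20 f2\nrw- 1 root grp 5 f3\n' |
  awk '$3 ~ /^root$/ { total += $5 } END { print total }'
```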

And for one last example, let’s
throw in some numeric and string tests. Same situation as before,
but now we only want to consider the length of the file if it is
greater than 1024, but not if the user’s name is more than 5
characters long.

($5 > 1024) && (length($3) <= 5) { total += $5 }
END { print total }
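Runnable against the same kind of fake records (owner in field 3, size in
field 5, all values made up):

```shell
# count a size only if it exceeds 1024 and the owner's name is short
printf 'rw- 1 bob grp 2048 f1\nrw- 1 alexander grp 4096 f2\nrw- 1 amy grp 512 f3\n' |
  awk '($5 > 1024) && (length($3) <= 5) { total += $5 } END { print total }'
```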

Filtering UCE with Bogofilter

1. Introduction

Bogofilter is a nice UCE filtering tool that uses Bayesian statistics
to track messages and learn what to detect. Bogofilter was originally
written by Eric S. Raymond.

Some sites of interest:

The above sites include the Bogofilter home page and a couple of sites
that discuss Bayesian statistics. Caution, the sites contain math,
which may or may not be desirable in your case. If you prefer, just
consider Bayesian statistical calculations "magic". That’s what I do.

2. Why use Bogofilter?

There are now several products available that do what Bogofilter does.
SpamAssassin and SpamBayes are two popular ones. So why choose
Bogofilter? Bogofilter is written in C, which makes it fast and light
at run time. Other filters are written in Perl and Python, which offer
other advantages, but speed usually isn’t one of them.

I like Bogofilter simply because it’s small and doesn’t rely on a lot
of external support software.

3. Installation

Bogofilter is super easy to install. The project appears to be
offering RPM packages now. If that floats your boat, download the
package and you’ll be up and running.

If you suffer from the Not Compiled Here problem like me, grab the
source and compile and install it:

gzip -dc bogofilter- | tar -xvf -
cd bogofilter-
./configure --prefix=/usr/local
make
make install

For those that like to compile things, but the above steps appear
scary, grab the source RPM and let RPM compile it for you.

4. Seeding the Filter

Bogofilter scores spam and stores the results in two databases: the
good list and the spam list. These are BerkDB files that grow as you
use bogofilter. You must seed bogofilter for it to be useful. There
are several ways to do this. Doing it manually can be painful. Getting a DB
dump from another bogofilter user is handy. If you get your hands on
other DB files, you need to dump them to text first and then load them
on your system:

# On the source machine
bogoutil -d goodlist.db > goodlist.txt
bogoutil -d spamlist.db > spamlist.txt

# On your machine
cat goodlist.txt | bogoutil -l goodlist.db
cat spamlist.txt | bogoutil -l spamlist.db

Using another set of data for your seed may or may not be a good idea.
Be sure to think about this before doing it. Ideally you should seed
your particular bogofilter installation with UCE that you have
received. To seed bogofilter by hand, take your mbox file (or
collection of email files) and pipe them through bogofilter with the
-s option if it is spam, -n if it is not spam. The formail(1) tool
is handy for doing this.

5. Wrapper Script

I use a wrapper script to invoke Bogofilter which currently just forces
the configuration path. At one point in time, it was forcing some
other settings. I still use it, and it is simply:


#!/bin/sh
/usr/local/bin/bogofilter -d /usr/local/etc/bogofilter/ $*
exit $?

The script is owned by root:root with mode 0755.

6. Procmail Modifications

Bogofilter hooks in to procmail with ease. The man page for bogofilter
gives a good procmailrc example. Here’s what I do:




# Scan for spam
:0fw
| /usr/local/bin/spamfilter -u -e -p

# Return mail to queue on bogofilter failure
:0e
{ EXITCODE=75 HOST }

# Place in SPAM mbox if it's spam
:0:
* ^X-Bogosity: Yes, tests=bogofilter
SPAM

7. Mailer Modifications

The man page provides some macros for Mutt that let you handle UCE that
bogofilter didn’t catch. I have Mutt configured so that if I hit
Esc-Del, the message is forced through bogofilter flagged as spam.
Pressing just Del will delete the message.

8. Global Installation

Global installation can be done several ways. No special steps are
required other than just installing hooks in the global procmailrc
file. If you want users to be able to train bogofilter with spam that
wasn’t caught, you will need to make bogofilter setuid root or create a
user and/or group that bogofilter runs as and change the database files
to that user/group.

I recommend that you install bogofilter under your account only rather
than globally.

9. Upgrading Bogofilter

From time to time you will want to download and install a new version
of bogofilter. The authors make upgrades easy with the bogoupgrade
command. This command upgrades your data files. You still need to
compile and install the new version, but they always provide a tool to
upgrade the BerkDB files. Be sure to check the man page for details.

Server Monitoring With Mon

1. Introduction

This is a quick and dirty introduction to Mon, a server monitoring tool that can
be used to monitor any number of services running on any number of servers. Mon
is useful for a system administrator who needs to be notified as soon as a
server or network resource goes down, so that he can respond immediately and
have as little downtime as possible.

Mon’s strategy is to be highly modular, thereby allowing you to write whatever
monitors (programs that check a service’s status) and alerts (programs that let
you know when a service has gone down or come up) tickle your fancy.

Don’t worry if you don’t want to or know how to write such scripts, though,
because the likelihood is that someone has already contributed the monitor or
alert script you’re looking for. For the purposes of this introduction, I will
assume we don’t need to write our own. Otherwise I’d need to spend more than 30
minutes writing this presentation! Besides, if you’re at that level already, you
can just read the mon manpage and figure it out for yourself.

Mon can also listen as a server daemon on a particular port, which allows other
computers running some form of the mon client to contact the mon server when
something running on it is down, so that mon can do whatever it has to do to let
you know. This feature is called event trapping (or "traps" for short). This is
also beyond the scope of this presentation, but is not too difficult to set up.

2. Getting Mon

The first thing you’ll need to do is download mon. You can find it at:

Once you have gotten the mon tarball, follow these steps:

  1. Untar it to /usr/local/lib/mon (trust me, it kind of assumes that you will be
    putting it there).
  2. Move the contents of the etc/ directory into /etc/mon
  3. mkdir /var/state/mon for the mon state information
  4. touch /var/state/mon/disabled

Now you’ll want to download all the user contributed programs. There are the
monitors, alerts, cgi-bin, and utils packages.

Download them all into /usr/local/lib/mon and untar them into the directories
they untar into. Easy enough. Anyway, once they’re untarred, and you’re still
in /usr/local/lib/mon, you should move the resulting files as follows:

# mv monitors/*/* mon.d
# mv alerts/*/* alert.d

3. Configuring Mon

Now you’re ready to copy the example configuration file into /etc/mon/ and
edit it. This file is laid out as follows:

  1. Global options
  2. Hostgroup definitions (assigning names to sets of hosts)
  3. Watch definitions (defining what will be monitored for each host group)

Each watch definition consists of any number of service definitions. A service
definition defines one service type that you will be checking on the current
host group.

Each service definition consists of:

  1. Various service options, including the frequency with which to check, the
    monitor program to use for the check, and a description string for the
  2. One or more period definitions that dictate how to behave if the monitors
    fail during various times of the day or week

Each of these period definitions consists of various options such as what alert
and upalert programs to use, and with what options, as well as options that
dictate how frequently to notify you if the service remains down, or how many
failures must occur before the alert is sent.

The cool thing is that instead of using the default configuration file, you can call it
/etc/mon/mon.m4 (and make sure to start mon with the "-c /etc/mon/mon.m4"
option), and mon processes the file with m4 before processing the mon directives
in the file. This can be useful for DEFINEs, so you can keep from having to
write the same email address, pager number, time interval over and over
throughout the file, but instead use the DEFINEd version of it. See the included
example.m4 for a good example of this.

4. User contributed scripts

There are so many user contributed scripts that I figured I’d list them here so
you could see all of them.

The monitors:

The alerts:


Additionally, there is a cgi program in the cgi-bin package to create a status
webpage that the average joe can grok (and even use it to modify mon’s
parameters while it’s running). There is also a GUI configuration utility in the
utils package; it uses Perl/Tk (yuck) and is not too good anyway, but is
worth a try.

Anyway, I have about -3 minutes left to write this, so here comes the example
configuration, and you can probably find whatever else you need in the manpage!

5. Example configuration

Enjoy this example configuration.

# Global Options
basedir = /usr/local/lib/mon
alertdir = alert.d
mondir = mon.d
cfbasedir = /usr/local/lib/mon/etc
dep_behavior = m
dep_recur_limit = 10
dtlogging = no
histlength = 100
logdir = /var/log/mon
maxprocs = 20
pidfile = /var/run
randstart = 60s

# Host Groups
hostgroup dns

hostgroup ftp ftp linux2 craq01 archer

hostgroup nntp news

hostgroup pop3 mail

hostgroup smtp mxqmail2 mxqmail1 mail craq01

watch dns
service dns
description Check DNS services
interval 1m
monitor dns.monitor -zone -master
period wd {Sun-Sat}
alert mail.alert
upalert mail.alert
alertevery 30m

watch ftp
service ftp
description Check FTP servers
interval 5m
monitor ftp.monitor -p 21 -t 20
period wd {Sun-Sat}
alert mail.alert
upalert mail.alert
alertafter 2

watch nntp
service nntp
description Check that news server is up
interval 1m
monitor nntp.monitor -p 119
period wd {Sun-Sat}
alert mail.alert
upalert mail.alert
alertevery 30m

watch pop3
service pop3
description Check that the pop3 server is working
interval 1m
monitor pop3.monitor -p 110 -t 20
period wd {Sun-Sat}
alert mail.alert
upalert mail.alert
alertevery 30m

watch smtp
service smtp
description Check mail sending
interval 1m
monitor smtp.monitor -p 25 -t 20
period wd {Sun-Sat}
alert mail.alert
upalert mail.alert
alertevery 30m
alertafter 2

watch routers
service ping
description Ping our routers
interval 1m
monitor fping.monitor
period wd {Sun-Sat}
alert mail.alert
upalert mail.alert
alertevery 15m
alertafter 2

Encrypting and Signing using GPG

1. Why encrypt?

As more and more communication moves into the digital realm, privacy tends to erode in favor of convenience, even though the expectation of privacy often remains. Just as one does not expect a physical letter to be read by every postal worker through whose hands it passes, an email isn’t expected to be read before it reaches its destination. Unfortunately, this often is not the case. In the United States efforts are continually being made to remove privacy from electronic mail in the name of crime prevention through devices such as Carnivore and laws like the Patriot Act. Even for those with nothing in particular to hide, privacy in conventional email is quickly dwindling. Encryption, even if privacy and security are not entirely necessary, helps privacy to seem normal again.

A form of encryption, known as signatures, can be useful even for times when the information itself is not private by confirming the sender of a message. Signatures are also often used in software distribution to verify the integrity of an archive.

2. A little history

PGP (Pretty Good Privacy) was written by Phil Zimmermann, with the original intent of aiding political activists and human rights organizations in keeping their communications away from the eyes of government organizations. In 1991, before PGP had even been released, Senate Bill 266 was drafted and included a measure that would require all telecommunication companies to provide the government access to any communication in unencrypted form, which would have effectively made software like PGP illegal. In order to subvert this measure before it became law, PGP was rushed out the door and quickly posted to several USENET groups and BBSs across the country; from there it spread to become the most widely used encryption tool in the world.

Since PGP used encryption that the government classified as ‘military strength’, and since it was obviously in use outside of the United States, Phil Zimmermann was charged with violating restrictions on the export of cryptography software. In order to protect the continued availability of PGP, the source was published in book form in 1995 (ISBN 0-262-24039-4). The case was dropped in 1996.

In 1997 the Free Software Foundation, liking things to be more free than the average free thing, released the GNU Privacy Guard (GnuPG), which was intended to be a completely free (GPL) implementation of the OpenPGP standard (RFC 2440) and use no patented algorithms.

3. Overview

GnuPG uses public key cryptography in the encryption of its messages. In public key cryptography, every key is actually a key-pair, the ‘public’ key, which is distributed to the world, and ‘private’ key, which is kept private. Messages are encrypted using the recipient’s public key, and can then only be decrypted using the private key. Since the public key does not need to be kept secure, the need for a secure channel to exchange keys is eliminated, but the need for some means to verify keys remains.

So, let’s say that Jim wants to send an email to his friend John, who lives down the hall, but he wants to encrypt it, so that his roommate, Bill, won’t be able to read it. After typing his email, he encrypts it using John’s public key, and then sends it on its merry way. John, upon receiving this email, decrypts it using his private key, and then reads whatever dumb thing Jim had to say.

In this example, Bill would be unable to read the email without walking down the hall and looking over John’s shoulder, but there is no way to verify that Jim was the one who sent the email. Since everyone should be able to get a copy of John’s public key, Bill could have written an email pretending to be Jim, encrypted it, and sent it to John. A solution to this problem is signatures. After Jim types his email, instead of encrypting it right away, he creates a one-way digest of the email, encrypts this digest with his private key, appends the encrypted block to the end of the email, encrypts the whole thing with John’s public key, and then hands it off to the carrier pigeons. The signature block can be decrypted with Jim’s public key, and then the digests can be compared to ensure that the signature matches the message. Since only Jim has a copy of his private key, only Jim can create this signature, so it ensures the identity of the sender.
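The one-way digest step at the heart of a signature can be sketched with ordinary shell tools (the filenames here are made up; in reality gpg computes the digest and encrypts it for you):

```shell
# Compute a one-way digest of the message; any change to the
# message changes the digest completely.
printf 'Hi John, lunch at noon?\n' > message.txt
sha256sum message.txt | awk '{print $1}' > message.digest

# John later recomputes the digest from the message he received
# and compares it with the digest recovered from the signature.
sha256sum message.txt | awk '{print $1}' > received.digest
cmp -s message.digest received.digest && echo "signature digest matches"
```

If Bill tampers with message.txt in transit, the recomputed digest no longer matches the signed one.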

In this example, key verification is simple, Jim and John both know each other and can meet in person to exchange keys. But suppose there’s another person at the far end of the hall, Bob, and neither Jim nor Bob wants to walk all the way down the hall to the other. However, they both know John, and have exchanged keys with John. Instead of leaving their computers and moving, they can use John to verify keys. After exchanging keys with both Jim and Bob, John signed them using his key. So now Bob can simply email a copy of his public key with John’s signature to Jim, and Jim can verify the key using the signature and John’s public key. If Bill were to intercept the email and send a fake key to Jim instead, then the signature would be missing.

So now Jim has a copy of Bob’s public key that he knows to be valid, even though he has never met Bob in person. Jim can now add his signature to Bob’s key, effectively telling all people who trust Jim that this is a valid copy of Bob’s public key. If Jim trusts Bob to have the sense to securely verify other keys, then Jim can also use Bob’s key to verify more public keys that Bob claims are valid. Using this method, a web of trust can be built so that a large number of people can exchange keys securely without going through the trouble of actually meeting each other.

4. How to get started

Note that gpg should be installed setuid root. Gpg uses the root capabilities to mlock the memory pages that contain unencrypted private keys, preventing them from ever being swapped to your hard drive in unencrypted form.

So you’ve installed a copy of GnuPG and you’re ready to get started. First you’ll need a key. In order for the key to be difficult to guess, gpg needs a big pile of random data to use in the key generation. In Linux, this data comes from /dev/random, which may run empty during key generation. This may be a good time for a kernel compile and a game of pysol.

To generate a keypair, run

gpg --gen-key

gpg will then ask you several questions about what sort of key you want. The default type and keysize are most likely what you want. Larger keys may be more secure, since they should be more difficult to guess, but are more often simply a waste, since after 2048 bits or so, the digest and encryption algorithms become the weaker links.

Once you’ve configured your name and email address to use in the key id, gpg will ask you for a passphrase. A passphrase is sort of like a password, but longer. Your private key will be encrypted using this passphrase, so you want it to be secure, but not so long that typing it would become a nuisance.

So now you have a keypair. An easy way to make it quickly available to the world is through the magic of keyservers. There are a wide variety of keyservers to select from, most of which mirror each other. To upload your key to a keyserver, run

gpg --keyserver <keyserver> --send-keys "your name"

If you prefer to send your key by hand, you can use gpg --armor --export instead to have your key dumped to stdout. The --armor, for ASCII armored output, is important. ‘Armor’ is a bit misleading, since it isn’t really any more secure; it’s just base64-encoded. Without it you’ll end up with a big wad of binary data dumped to your terminal.

Keyservers are a convenient way to send and receive keys, but still no substitute for verification. Keyservers do not authenticate or verify keys, leaving this task up to the users. They do save the trouble of carrying a whole key around in your wallet to trade with people, since now other people have something that they think is your key, and that can be verified using the fingerprint. The fingerprint is a 160-bit secure one-way digest of the key, and a bit more manageable than a 1024-bit random number.

5. Encrypting and signing your email

Many email clients, such as mutt, include support to encrypt, decrypt, sign, and verify messages using gpg. Mutt is not configured to use gpg by default, but comes with a gpg.rc example file which contains the needed options.

If properly configured, your email client will be able to encrypt and sign emails before sending (‘P’ in mutt) and decrypt and verify incoming email, asking for a passphrase as needed. Basically the only things to consider are how long, if at all, the client should cache a passphrase (if it is capable of doing so), and, if a copy of outgoing email is kept, whether it should be saved in encrypted form. With mutt, the fcc_clear option will save the email unencrypted and unsigned. Encrypted emails can be saved, but you will need to add yourself to the recipient list if you ever plan to read them again. This can be done either by adding --encrypt-to name to the gpg encrypt command, or by placing the same, without the dashes, in ~/.gnupg/options. Other long options can also be automatically used in this way.
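For instance, an ~/.gnupg/options file using this mechanism might look like the following (the key id is a placeholder; which options you want is up to you):

```
# ~/.gnupg/options
# Always add our own key to the recipient list:
encrypt-to 0x12345678
# Always produce ASCII armored output:
armor
```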

6. Encrypting and signing other random bits of your hard drive

Encryption can be done using

gpg --encrypt --recipient name file

Multiple recipients can be given. gpg will then spit out file.gpg, or file.asc if you requested ASCII output.
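The whole round trip can be sketched in a throwaway keyring, so your real ~/.gnupg is untouched (assumes GnuPG 2.x; all names and key parameters here are illustrative only):

```shell
# Work in a temporary keyring so the real one is untouched.
export GNUPGHOME="$(mktemp -d)"

# Generate an unprotected test key in batch mode.
cat > keyparams <<'EOF'
%no-protection
Key-Type: RSA
Key-Length: 2048
Name-Real: Test User
Name-Email: test@example.invalid
Expire-Date: 0
%commit
EOF
gpg --batch --quiet --gen-key keyparams 2>/dev/null

# Encrypt a file to the test key, then decrypt it again.
printf 'secret note\n' > note.txt
gpg --batch --quiet --encrypt --recipient test@example.invalid note.txt
gpg --batch --quiet --decrypt note.txt.gpg 2>/dev/null   # prints the original plaintext
```

With --armor added to the encrypt step you would get note.txt.asc instead of note.txt.gpg.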

Signatures can be created either as a separate file, or as part of the data itself. Detached signatures are usually more useful, since the original data is available without using gpg, and can be created with

gpg --detach-sign

Inline signatures can be created with --sign (the data and signature is saved to file.gpg or file.asc), and may be combined with --encrypt.

Decrypting and verifying usually requires no options, since gpg can figure out what to do from the file.

7. Key management

Most key management can be done from the interactive key edit menu, accessible through

gpg --edit-key name

Since it’s important to know how useful a key is in verifying other keys, gpg keeps a trust database that contains information on how much you trust other users to correctly sign keys (the "ownertrust" value), and from this calculates a key’s validity. You should only sign a key yourself if you are absolutely certain of the validity of the owner, so many of the keys that you consider valid for your own security needs may not be signed by you.

gpg uses four degrees of ownertrust: not at all, marginally, fully, and ultimately. Multiple marginally trusted signatures are usually needed to consider a key valid (the default is 3, tunable with --marginals-needed), and by default only one fully trusted signature is required (tunable with --completes-needed). Ultimately trusted keys are always considered valid signatures, and ultimate trust is usually used only for your own key.

If a key becomes compromised, you need to revoke it, using a revocation certificate. The certificate can be generated using

gpg --gen-revoke name

which creates a certificate signed with the private key. This certificate can then be applied to the public key and redistributed, telling everyone that the key is no longer valid.

You may want to generate this certificate after creating the key and keep it in a safe place, in case you lose your private key or forget your passphrase. Care should be taken in the security of the revocation certificate, since anyone who gets a copy of it can invalidate your public key.

8. Resources


Table of Contents

1. Introduction

Setting up and hosting your own domain name is easy! This document will
hopefully give you all the information you need to set one up using BIND 9.

2. The Domain Name System

If you already know how DNS works, you can skip this section. It is intended
as a brief introduction to the complex behavior of the DNS system in general.

When you perform a DNS query, you are attempting to find the/an IP address
associated with a particular domain name. You as an end user generally have
two DNS servers that you always use to perform lookups. These are generally
run by either you or your Internet access provider, and in most Unix-like
platforms, are listed in /etc/resolv.conf.
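A typical /etc/resolv.conf is just a list of those servers, for example (the addresses below are placeholders for your provider’s nameservers):

```
nameserver
nameserver
```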

When you send a query to your local nameserver, it first looks to see if it is
hosting the domain about which you’re requesting information. If it is, it
simply looks up the information locally and sends the response to you.

If it doesn’t host that domain, it looks to see if it has recently answered a
query for the same record as you are requesting. If it has, and the
Time-To-Live on the cached record data has not yet expired (meaning the record
is still current enough to be accurate), it then just returns the answer from
its cached data.

If it doesn’t have the data cached, and it is set up as a recursive nameserver
(one that will perform the subsequent requests on your behalf instead of
making you do it), it must fetch an authoritative answer. Since it is
impossible to know what the authoritative server is for a particular zone,
that information must be maintained centrally. The servers that store this
information are known as the "root" nameservers. There are currently 13 of
them, with DNS names of A.ROOT-SERVERS.NET through M.ROOT-SERVERS.NET. If
these servers were all to go down, the Internet would become practically
unusable.

Once your local DNS server has queried a root nameserver for the requested
domain’s authority information, it is given a (short) list of servers that
will be able to return authoritative data for the domain under which your
requested record resides.

It then sends a query to the authoritative servers in turn until it receives
an answer. It then relays the answer to you.

The highly distributed nature of DNS can cause many headaches when you need to
make any changes to a domain’s authoritative servers or any of its records.
Changes can take days to propagate, and knowing how to minimize your downtime
is very important.

3. DNS Hosting Requirements

If you wish to host your own domain name, there are some prerequisites that
must be satisfied.

First, your DNS servers must have completely static IP addresses. Any change
to your IP address will make your domain name completely useless.

Second, you must have one server to function as the primary nameserver, and at
least one separate machine to host the secondary nameserver(s). These are the
servers that will answer DNS queries for your domain in the event that your
primary DNS server is unavailable.

Usually, one of your friends will be willing to be the secondary nameserver
for your domain in return for the same from you. If you don’t have any
friends, there are services that will act as your backup name server
for free.

4. Setting it all up

The steps involved in setting up your own domain name are as follows:

4.1. Register your domain name with a domain name registrar.

You can use any registrar. The domain name will cost you between $8 and
$20 per year depending on where you register it.

Some features to look for with a registrar are the ability to make ALL
changes online, especially your administrative and technical contact
information and the name servers registered as authoritative for your
domain.
4.2. Register your DNS servers with your registrar, if necessary.

Many registrars require that you register your name servers’ IP addresses
with them before they allow you to use them for your domain. Follow the
instructions on their website to set these up.

You may need to have the DNS server already running on the IP address
before registering it with them, but in most cases, you can just do it
right now. If they request a name for the DNS server, you should use
something like ns1.mydomain.com and ns2.mydomain.com.

4.3. Register your DNS servers as authoritative for your domain.

Once you’ve registered the DNS servers, you will need to edit your domain
information and change the name servers listed there to ns1.mydomain.com
and ns2.mydomain.com (assuming that’s what you named them).

It will now take between 24 and 48 hours for this new domain to be
propagated to the root nameservers, so you will only be able to use it
from your own machine for a while, provided you have set your own
machine’s primary DNS server to your new server’s IP in /etc/resolv.conf.

4.4. Install BIND on your host(s) and perform the basic configuration.

I won’t go into the specifics of how to perform the basic installation
(it’s pretty much a straight ./configure && make && make install).
You can download the BIND 9 tarball from the ISC website (www.isc.org).

If you don’t like building from a tarball, you can probably find a package
of BIND made just for your Linux distribution.

4.5. Perform basic configuration

Once you have BIND installed, you will need to create /etc/named.conf.
There are some example configurations in the ARM (see Resources section).
I will go over the basic options that you’ll need to get a more-or-less
typical setup working.

The named.conf syntax is very rigid, but pretty straightforward. Almost
everything is either just a statement followed by a semicolon, e.g.:

file "root.hint";

or a block-style option, which requires a semicolon after the closing
brace, as well as after each sub-block’s closing brace, and after each
simple statement it contains:

allow-transfer {;; };

The most basic configuration, which will probably suit most needs just
fine, starts out with an options block like this:

options {
	/* Base dir; where to look when we see a relative pathname */
	directory "/var/named";

	/* Write a pidfile on startup so we know the pid of named */
	pid-file "/var/run/named.pid";

	/* Allow anyone to perform DNS queries to this server */
	allow-query { any; };

	/* Perform recursive queries for clients. This helps build a cache
	 * of DNS data, so that fewer external requests are generated in
	 * the future. */
	recursion yes;

	/* What interfaces for BIND to listen on */
	listen-on {;; };

	/* When a change is made to any of the zones for which we are the
	 * master server, notify the slaves of the update immediately */
	notify yes;
};

Now we need to specify the base zone "." which is used when we need to
make external requests.

zone "." in {
	type hint;
	file "root.hint";
};

The file root.hint is found in /var/named/root.hint since we specified
/var/named in the directory directive above. The "type hint;" directive
indicates that this zone file is a listing of the root nameservers, which
provide a "hint" to point you to the authoritative server for a domain.
The root.hint file is simply a list of the names and addresses of the
root servers, and you never need to modify it by hand; a current copy can
be fetched from ftp.internic.net/domain/named.root.

Now we need reverse resolution for the 127.* range of IP addresses. The
following zone will do it:

zone "0.0.127.in-addr.arpa" in {
	type master;
	file "named.local";
	allow-update { none; };
};

The backwards "in-addr.arpa" system is used to reverse resolve IP
addresses to domain names. It’s a long story why this is the case, but in
a nutshell, it is so that forward resolvers don’t have to be entirely
rewritten to perform reverse resolution as well.

The zone file is very simple, only listing a record for,
i.e., to resolve it to "localhost":

@	IN	SOA	localhost. root.localhost. (
			1997022700	; Serial
			28800		; Refresh
			14400		; Retry
			3600000		; Expire
			86400 )		; Minimum
	IN	NS	localhost.
1	IN	PTR	localhost.


I will go over the meaning of these records in more detail when we get to
the section on configuring our zone files.

Finally, let’s set up forward resolution for the "localhost" hostname:

zone "localhost" in {
	type master;
	file "local";
	allow-update { none; };
};

The "local" file is also very simple; it maps "localhost" to

$TTL 21600
@	IN	SOA	@ root.localhost. (
			1997022700	; Serial
			28800		; Refresh
			14400		; Retry
			3600000		; Expire
			86400 )		; Minimum
@	IN	NS	localhost.
@	IN	A

Finally, we must set up a shared key to allow the use of the "rndc" tool
to control the server. The reason it must use a key is that rndc can be
used from any host on the Internet, so we must be able to verify that the
client is authorized and that its commands are authentic.

Before we can add the shared key section to the named.conf, however, we
must run the rndc-confgen program that comes with BIND. This program
generates the content for /etc/rndc.conf for you automatically, with a
randomly generated key. Its output will look something like this:

# Start of rndc.conf
key "rndc-key" {
	algorithm hmac-md5;
	secret "OpxZPKpwc5vNOCsD/rz9sw==";
};

options {
	default-key "rndc-key";
	default-port 953;
};
# End of rndc.conf

# Use with the following in named.conf, adjusting the allow list as needed:
# key "rndc-key" {
# 	algorithm hmac-md5;
# 	secret "OpxZPKpwc5vNOCsD/rz9sw==";
# };
# controls {
# 	inet port 953
# 		allow {; } keys { "rndc-key"; };
# };
# End of named.conf

In the commented section of this file, it provides you with the exact
blocks you need to place in named.conf to allow rndc to connect to it
using this rndc.conf. Simply copy these lines onto the end of the
named.conf and save it and exit.

You can now verify the syntax and validity of your config file using the
named-checkconf program that comes with BIND. Make sure everything is
kosher, and you should be ready to progress to the next section! But
first, take a caffeine break 🙂

4.6. Add the new zone to your primary server’s BIND config file (named.conf).

You now need to tell BIND that you want to host the primary DNS for
mydomain.com. This is done by adding a new "zone" block to the named.conf,
as follows:

/* New zone called mydomain.com, of type "in" (Internet) */
zone "mydomain.com" in {

	/* We're the master server for this domain */
	type master;

	/* Store the data in /var/named/mydomain.com */
	file "mydomain.com";

	/* Who will we allow to transfer the entire domain? Normally, the
	 * only IP addresses listed here should be those of the secondary
	 * DNS servers for the domain */
	allow-transfer { /* your secondaries' IPs */ };
};

At this point, you should be done with the named.conf on the primary DNS
server. You don’t need to do anything else to it. Run named-checkconf to
make sure the file checks out, and you should then be good to go.

4.7. Create and populate the new zone file.

Now’s the fun part. This is where you get to specify all the information
about your domain. Assuming you pointed your named.conf at
/var/named/mydomain.com for the information pertaining to mydomain.com
(for which you are listed as the primary DNS server), you should now
create this file with your favorite editor, and proceed with me.

All BIND 9 compatible zone files must start out with a default
time-to-live that will be applied to all the resource records in your
domain. The global TTL specification looks like this:

$TTL 86400

This TTL specification says that for every record in your zone that does
not have a specific TTL set for it, the TTL should be set to 86400
seconds, or 1 day. BIND 8 did not require this statement, but BIND 9 does.

Now we begin with the resource records. Each record specifies something
about the zone, and follows this general format:

[resource name]	[protocol]	[TYPE]	[VALUE]

The resource name is usually a hostname in your domain, or the domain name
itself. The protocol is usually IN. This is short for Internet, and
pretty much all records are of type IN. In fact, if you don’t specify a
protocol, BIND will usually assume type IN. The TYPE is usually one of
the following (there are others, but they are rarely used):

Type Description
SOA Start of Authority, some global settings for the zone
NS Specifies a Name Server for the domain
MX Specifies a Mail eXchanger for the domain
A Specifies an IP Address for a particular host/domain
CNAME Specifies the canonical name for this "nickname" entry

The VALUE varies depending on the TYPE. You will begin to see how this all
works as we go over examples.

The first resource record (RR) that comes in every zone file is the SOA
record. It looks something like this:

mydomain.com.	IN	SOA	ns.mydomain.com. root.mydomain.com. (
			200209272	; serial
			28800		; refresh
			14400		; retry delay
			3600000		; expire
			21600 )		; default_ttl


The resource name is "mydomain.com." since we are specifying the SOA record
for this domain, not one of its subdomains. Please note that we must
place a period after the domain name so that BIND doesn’t automatically
append "mydomain.com" to the end of it, resulting in a record for
"mydomain.com.mydomain.com". This is a common beginner’s mistake. Any
fully qualified domain name (FQDN) that is specified _anywhere_ in a zone
file must be followed by a "." or BIND will append the "origin" domain
name to it.
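Since the missing trailing dot is such an easy mistake, a rough check can help. This is only an illustration, not a BIND tool; it assumes one record per line and flags every NS/MX/CNAME target without a final dot (including intentionally relative names) for review:

```shell
# Sample zone lines to scan (contents are illustrative).
cat > zone.sample <<'EOF'
@	IN	NS	ns.mydomain.com.
www	IN	CNAME	mybox
joe	IN	CNAME	www.joessite.com
EOF

# Flag NS/MX/CNAME records whose last field does not end in "." --
# BIND will append the origin domain to those names.
awk '$3 ~ /^(NS|MX|CNAME)$/ && $NF !~ /\.$/ { print NR ": " $0 }' zone.sample
```

Here line 2 is a deliberate relative name, but line 3 would silently become www.joessite.com.mydomain.com.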

Note that you can also replace any instance of the zone’s base domain
("mydomain.com") with an @ sign, so that a single zonefile could be used
for multiple zones that are copies of each other, and the @ will be
replaced with the proper domain name for each. So then the SOA record ends
up looking like this:

@	IN	SOA	ns.mydomain.com. root.mydomain.com. (
			200209272	; serial
			28800		; refresh
			14400		; retry delay
			3600000		; expire
			21600 )		; default_ttl


The two values after SOA, are, respectively, the primary DNS server name
for this domain (note the period after it), and the email address of the
domain administrator (with a "." instead of the "@").

The serial number is what keeps track of the file’s version. DNS servers
around the world attempt to cache your domain’s information, and will
retrieve updated information from your server only if the serial number
has incremented since last time it checked. Therefore, if you make
changes to your DNS zone, you *must* increment the serial number or it
will not take effect until the number of seconds in the "expire" field has
passed, or until $TTL seconds have passed, whichever comes later.

It is conventional to use some form of the date (e.g. YYYYMMDDNN, where NN
is the number of the revision for that day, so you could revise it up to
100 times in that day). Other people simply like to start their serial
number at 1, and just increment by 1 each time. It’s all a matter of
personal preference.
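The date-based convention is easy to script; a minimal sketch (the two-digit revision is hard-coded here, where a real setup would track it somewhere):

```shell
# Build a YYYYMMDDNN serial number: today's date plus a two-digit
# revision counter for the day.
revision=01
serial="$(date +%Y%m%d)${revision}"
echo "$serial"
```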

The refresh, retry delay, and expire are almost always good at their
defaults. The default_ttl is the amount of time that negative responses
for data from your zone should be cached. That is, if someone requests an
IP address for and it doesn’t exist, their DNS
server will cache that negative response for 21600 seconds. This means
that even if you add to your zone within 21600
seconds, they will not notice until that amount of time has passed on
their end.
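For a sense of scale, that 21600-second window works out to six hours:

```shell
# The 21600-second negative TTL expressed in hours.
echo $((21600 / 3600))   # prints 6
```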

Now that you’ve specified your SOA record, you need to list the
authoritative name servers for your domain. This will use the NS record
type, and it will look something like this:

@	IN	NS	ns.mydomain.com.
	IN	NS	ns2.mydomain.com.

As you can see, we didn’t need to specify the @ for each record, because
if the resource name is omitted, BIND assumes you are still referring to
the last one mentioned.

If you want to receive email at your domain, you must now specify a mail
exchanger. This is the host to which an MTA (e.g. sendmail, qmail) will
send mail when there is mail for someone@mydomain.com. The MX record
specification looks something like this:

@	IN	MX	10	mail.mydomain.com.
	IN	MX	20	mail2.mydomain.com.

The number between the MX and the mail server hostnames is the preference
number. The lower numbered servers are tried first, and if they fail, mail
is sent to a backup MX — one with a higher preference number. This is so
that if the main mail server is down, mail can be sent to another mail
server that will hold the mail until the primary server comes back up.

Now you should specify an A record for the main domain name so when people
go to http://mydomain.com or ping mydomain.com, they get a valid IP
address:

@	IN	A

At this point, you’ve taken care of the basic settings for the base
domain. Notice that we have used several hostnames we haven’t yet
defined, such as ns.mydomain.com and mail.mydomain.com. We’ll need to add
those records below.

First, however, let’s set our origin to mydomain.com so we can be lazy
and specify only hostnames instead of full domain names:

$ORIGIN mydomain.com.

Again, notice the "." after the domain name, as always.

Now we can proceed to specify our hostnames. One rule is that all
nameserver hostnames and mail exchanger hostnames must be defined by A
records, not with CNAME records. It just creates another level of
indirection for lookups, and is against standards.

mail	IN	A
ns	IN	A
mybox	IN	A
lappy	IN	A

"mybox" is a name we’re giving to the host at We want
mybox.mydomain.com to show up when you look up that address. The name
lappy will resolve to, so it is really only useful inside my
network.

Most domains have www and ftp names associated with them. Assuming we are
hosting our web and ftp service on mybox, we can just make these records
CNAMEs, so there will be less switching around of records if we ever
change IP addresses:

www	IN	CNAME	mybox
ftp	IN	CNAME	mybox

Okay, now say your friend Joe wants to have joe.mydomain.com point to the
same IP address as his website so he can set up some virtual hosting.
Let’s say that www.joessite.com is his normal website, and he wants us to
point joe.mydomain.com there. What we can do to avoid having to change
our record every time he changes his IP address is simply create a CNAME
to point to his domain:

joe	IN	CNAME	www.joessite.com.

Again, we put a period at the end this time, since it is not a CNAME for
a name inside mydomain.com.

Now assume that Jill decides she wants to have control of
jill.mydomain.com and all its subdomains; she’s going to run her own name
server for it. What we do, then, is delegate the zone to
her with an NS record:

jill	IN	NS	ns.jillsdomain.com.

That way, requests to jill.mydomain.com and *.jill.mydomain.com will all
be redirected to her name server.

All right! It looks like everything is set up and ready to go! You can use
the named-checkzone program distributed with BIND to check the syntax of
your zone, and make any corrections necessary, and then you’ll be ready to
progress to the next step.

4.8. Start named!

Assuming you (or the named installation) created a nonprivileged user
"named" under which to run named, you’re ready to start named as follows:

named -u named

Then we set up our secondary server…

4.9. Add the new zone to your backup server’s named.conf

Now all you’ve gotta do is tell the secondary server that it’s
authoritative as a slave server for Assuming they’re running
BIND 9, you can accomplish this by adding a block similar to the following
to their named.conf:

zone "mydomain.com" in {
	type slave;
	file "com/mydomain";
	masters {; };
};

On this server, the administrator likes to arrange the domain files in a
different structure, placing them in a hierarchy where each component of
the domain name gets its own subdirectory. So if their "directory" was
specified as /etc/named, then when their named is restarted, it will
transfer the zone from us and place it in /etc/named/com/mydomain.

Now reload the zones on the slave server (You can use "rndc reload") and
it should pull down the domain name from your main server.

Congratulations, you’re now hosting your own domain name! Now it’s time to
set up your mail and web servers… But I’m not going to tell you how to do
that here 🙂 That’s for another presentation!

5. Resources

Making Sendmail Work For You

Table of Contents

1. Introduction

sendmail is one of the most popular SMTP servers available. Other popular
ones include qmail and postfix. People stand behind their choice of mail
server like their choice of editor. I have used qmail and sendmail
extensively, but definitely understand sendmail to the greatest degree.
This presentation is aimed at someone who is interested in setting up
sendmail, but doesn’t understand how it works. I hope we can get some
volunteers to do presentations on other mail servers in the future.

I plan to explain in general how the UNIX mail system is designed to
work, show how you can get sendmail up and running, walk through the
configuration files, and lastly give some examples of sendmail
configurations. I am by no means a sendmail expert, but I understand
enough of it to get it working in several scenarios.

1.1. MTA, MUA, and MDA

The UNIX mail system follows the basic UNIX design principle, that is,
each program really only does one task. The basic components of the mail
system are:

  • MTA: mail transport agent (e.g., sendmail)
  • MUA: mail user agent (e.g., mutt)
  • MDA: mail delivery agent (e.g., procmail)

When you compose a message, you typically do so from within your MUA.
When you send the message, the MUA hands the message to the MTA. The MTA
reads the envelope and directs it to the appropriate system where it is
handled by the MDA for delivery. The MTA is a very important part of the
mail system, so this document is mostly about configuring a popular MTA:
sendmail.

2. sendmail

sendmail is one of the most popular MTA packages available. Other popular
ones include qmail and postfix. Great wars have been fought over which
one is really the best one to use, but in the end it doesn’t really
matter. This document covers sendmail, but I hope future presentations
cover the other MTA choices.

2.1. What and Where

sendmail was written by Eric Allman at UCB for the BSD UNIX operating
system. It has been ported to almost every platform in existence. Most
Linux distributions and commercial UNIX operating systems include sendmail
in one form or another. Linux distributions generally use the latest
releases from sendmail.org, while commercial distributions lag behind
since they like maintaining their own forks of the sendmail source tree.
Whatever system you’re using, if you plan on using sendmail, check to make
sure it is the latest available. If it isn’t, head over to sendmail.org
and download the latest version. Documentation is included for compiling
it on your system.

2.2. How sendmail should run

In the ideal environment, sendmail runs on each machine handling local
mail transport and delivery, as well as talking to a main SMTP node. A
lot of people like this setup, since it allows for mail composition from
almost anywhere. But, unless properly configured, your mail spool ends up
on several machines. The common way to overcome the multiple mail spool
problem is to keep your mail spool on an NFS mount. In my opinion, this
is totally unnecessary. Most people agree and end up with…

2.3. How most people run sendmail

Designate a system as your mail server. Configure sendmail to deliver
mail on this system. Mail composition and reading can be handled here.
If remote access is desired, allow POP3 or IMAP access to mail
spools. This configuration is much more desirable than the other approach.

2.4. Check your sendmail version

Sendmail is usually installed as /usr/sbin/sendmail or /usr/lib/sendmail.
Some bizarre systems may even install it as /usr/etc/sendmail. If
sendmail is running on your system, check the version by connecting to the
SMTP port:
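
A session might look something like this; the hostname, date, and exact
banner text below are invented for illustration, but the version number
appears in the 220 greeting the same way on a real server:

```
$ telnet localhost 25
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
220 mail.example.com ESMTP Sendmail 8.12.7/8.12.7; Sat, 1 Mar 2003 12:00:00 -0500
quit
221 2.0.0 mail.example.com closing connection
```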

You can hit the escape sequence to drop back to the shell. Or simply type
‘quit’ and the server will close the connection.

2.5. Installation

Sendmail is pretty easy to build and install by hand. Almost all Linux
distributions include it, except Debian. Check and see if you have it
already or if there is a package available. It will save time.

If you must compile and install it by hand, follow these steps:

  1. Download the source. I reference version 8.12.7 here, which was the
    latest version available at the time of writing. Be sure to use the
    latest version of sendmail available.

  2. Extract
    gzip -dc sendmail.8.12.7.tar.gz | tar -xvf -
  3. Compile

    cd sendmail-8.12.7
    # Follow the steps in the INSTALL file, which walk you through compiling
    # sendmail and setting up your configuration files.

2.6. Ties with procmail

The sendmail software can be combined with procmail which results in a
nice system for mail delivery on top of sendmail. Procmail is a topic for
an entirely different presentation. It’s worth noting here because if you
are installing sendmail from scratch, be sure to get procmail. You will
most certainly want it. Moshe can answer your procmail questions at all
hours of the night. That’s why he has a pager.

3. The configuration files

Sendmail reads several different configuration files to figure out what
it should be doing. These files are explained below. This list is not
complete, but covers the most common files you’re likely to encounter.

3.1. Files


3.1.1. sendmail.cf

The sendmail.cf file is your main sendmail configuration file.
Technically, the program reads sendmail.cf, not the mc file. Since
sendmail.cf is not modifiable by humans, we write the mc file and run it
through m4 to generate the cf file. The configuration elements
you put in your mc file are actually m4 macros that get expanded to the
real configuration elements for sendmail.

Why is it done this way? Well, sendmail was designed to be somewhat
like a machine. It doesn’t really know what to do except load the cf
file and "execute" it. So think of sendmail.cf as an embedded language
that drives sendmail. It may seem silly, but that’s because it is. Sendmail
came out of an era where computing resources were much more scarce, so
saving time and making the most of what you had was important. I don’t
care at all about my cf file. I only edit my mc file and have it
automatically generate the cf file. I think the mc file should be named
.cf, but it isn’t, so we get this layer of confusion.

3.1.2. aliases

Ever emailed a webmaster@something email address? It’s fairly common.
Try to make a user account with the name ‘webmaster’. It won’t happen;
usernames are traditionally limited to 8 characters. The way we get the
webmaster address is by using an alias. The aliases file maps email
address aliases to something, usually a real user account.

Sendmail doesn’t directly read the aliases file; it reads the
aliases.db file, which is a BerkDB version of aliases. Again, the
historical reasoning comes into play here. Each time you modify the
aliases file, you need to run newaliases to update the BerkDB file.
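
As a sketch, an aliases file can be as simple as this; the account names
here are invented:

```
# /etc/aliases
webmaster:      jdoe
postmaster:     root
root:           jdoe
```

After editing, run newaliases so that aliases.db picks up the change.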

3.1.3. access

This is a plain text file listing host access rights to the server. The
default policy of sendmail is to accept mail locally or for domains that
it is specifically configured for. However, if you get a steady stream of
spam from a specific host, consider listing that host in the access file.
For example, the entry:

  ERROR:"550 Korea is a gigantic spam house; go away"

denies all mail coming from servers on the listed domain. The error
message defined is returned by sendmail and typically written to the log
files on their end.
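
To make the format concrete, here is a sketch of an access file; every
host, network, and message below is invented, while OK, REJECT, RELAY,
and ERROR are the standard access database actions:

```
# /etc/mail/access
spammer.example         ERROR:"550 go away"
10.1.2                  REJECT
friendly.example        OK
192.168.1               RELAY
```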

3.1.4. genericstable

This provides the outbound name to virtual address mapping, that is, the
reverse of what the virtusertable does. For a proper virtual domain
configuration, you will need to configure this file as well as the
virtusertable (described below).

This is a file that must be compiled to BerkDB format before sendmail
can read it.

3.1.5. mailertable

This file contains custom domain routing information. You may wish to
specifically route all email to addresses on a particular domain through
a different SMTP server. This is the file where you define that.
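
A mailertable sketch, with invented domains and relay host:

```
# /etc/mail/mailertable
example.org       smtp:[mail.example.net]
.example.org      smtp:[mail.example.net]
```

The square brackets tell sendmail to skip MX lookups and connect to that
host directly; the leading dot form matches subdomains.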

This file must be compiled into a BerkDB file that sendmail actually
reads.
3.1.6. relay-domains

This is a plain text file that lists individual hosts or ranges of hosts
that are allowed to relay mail off your server. You’ll need this if you
want to be able to use your mail server as an SMTP server when configuring
a program like Evolution.
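
A relay-domains sketch; the network and hostname below are invented, and
each entry simply goes on its own line:

```
# /etc/mail/relay-domains
192.168.1
workstation.example.com
```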

3.1.7. virtusertable

This file maps usernames from one hostname to a real user or another
hostname. This file is used to set up virtual domains and virtual
addresses.

This is another file that must be converted to BerkDB format before
sendmail can read it.
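
A virtusertable sketch; the domains and accounts are invented, and the
bare @domain form is the catch-all for a domain:

```
# /etc/mail/virtusertable
info@virtual.example        jdoe
sales@virtual.example       mary
@virtual.example            catchall
```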

3.1.8. local-host-names

This file lists the domain names that you are delivering mail on. If you
own a domain and want to accept AND deliver mail for that domain, you
need to put that domain in the local-host-names file.

This file sometimes differs in name across various distributions. Red
Hat used to (or still does) use its own name for it. Some simply call it
locals. The format and purpose are the same, but the file name may
differ.
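
As a sketch, with an invented domain, the file is just a list of names:

```
# /etc/mail/local-host-names
example.com
mail.example.com
localhost
```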

3.2. Inconsistencies

You may have noticed that among the sendmail configuration files there
are several inconsistencies. Namely, some files are plain text, some are
special format files (BerkDB), and some are macro processed files. This
is one of the things that bothers me about sendmail, but it’s not the end
of the world.

3.3. Message Submission Program

If you wish to have users remotely connect to your SMTP server to send
outgoing email, you should consider using the message submission program
with sendmail. I won’t go into the details here; see the sendmail
documentation for more.

With MSP, you can configure sendmail to require user logins and
passwords to connect to the SMTP server.

4. Writing a sendmail configuration file

4.1. authoring

The format of the mc file is fairly simple. It is a list of m4 macros
and accompanying options. Generally there is one macro per line. Let’s
make a basic configuration file. We’ll start with the basic settings:

VERSIONID(`My very own')
OSTYPE(`linux')
DOMAIN(`generic')

We now have some basic settings, but we should add some features to the
mail server. First note the use of capital letters for VERSIONID, OSTYPE,
and DOMAIN. These are the m4 macro names. The values in the parentheses
are the options for that macro. Please take a note to ask me about m4
quoting if I haven’t already explained it.
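
The short version of m4 quoting: m4 opens a quote with a backquote (`)
and closes it with a straight quote ('), and quoting an argument keeps m4
from expanding it prematurely, which is why mc files write things like
VERSIONID(`My very own'). A tiny illustration, with an invented macro
name and value; the text after each line is annotation, not m4:

```
define(`MYHOST', `mail.example.com')dnl   define a quoted macro
MYHOST                                    expands to: mail.example.com
`MYHOST'                                  stays literal: MYHOST
dnl a `dnl' comment is discarded, along with its newline
```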

Features are added using the FEATURE macro. I’m going to add some
common features to my mc file:

FEATURE(`access_db', `hash -o -T<TMPF> /etc/mail/access')

We should define a couple of settings for the configuration file.

define(`confCW_FILE', `-o /etc/mail/local-host-names')

Lastly, we need to set some mailers for our system:

MAILER(local)
MAILER(smtp)
This configures sendmail for local mail operation and SMTP mail
operation. Now, there are a ton of other settings available for sendmail.
Rather than buying the Bat Book, I recommend you refer to the online
documentation.

The Sendmail Consortium provides a nice HTML browsable copy of the
documentation that ships with sendmail. This is a very good reference.

4.2. Generating

Generating the cf file for sendmail uses the m4 macro processor. Some
distributions provide a Makefile in /etc/mail that automatically generates
the configuration file. If you don’t have this, you’ll need to run the m4
command by hand. You run m4 and pass it the path to the sendmail macro
directory and the path to the main sendmail configuration file macro and
your new mc file. That’s a mouthful. Here’s what you do:

m4 -D_CF_DIR_=/usr/share/sendmail/cf/ \
/usr/share/sendmail/cf/m4/cf.m4 \
sendmail.mc > sendmail.cf
This assumes your sendmail m4 directory is in /usr/share/sendmail/cf.
It may be in a different location on your system. Sometimes you’ll find
it in /usr/src/sendmail. Once you run the above command, you’ll have a
ready-to-use cf file.

5. Starting and Stopping sendmail

5.1. The queue runner and sendmail

To start the server, I run these commands:

/usr/sbin/sendmail -L sm-mta -bd -q25m
/usr/sbin/sendmail -L sm-msp-queue -Ac -q25m

This starts the sendmail MTA as well as the queue runner. Your
distribution probably includes some form of a script that runs the above
two commands in a dozen lines or so. If a script is provided in
/etc/init.d, you should use that.

To stop sendmail, I issue this command:

/sbin/killall sendmail

Sendmail reacts to signals in a normal manner and when it is sent SIGTERM
it will shut itself down.

6. Examples

6.1. Acting as a primary mail server

Acting as a primary mail server means we want to accept and deliver mail
for a specific domain. Assuming you have the proper MX records configured
according to Moshe’s BIND presentation, you’re ready to set up sendmail as
your primary mail server.

In this case, all we need to do is add the domain to the
/etc/mail/local-host-names configuration file and restart
sendmail. Mail will be accepted for that domain and sendmail will deliver
it on the system.

6.2. Acting as a backup mail server for your best friend

As Moshe stated in his BIND presentation, sendmail is smart enough to know
if it is a backup MX server. If it determines it is the backup server and
the domain is not listed in local-host-names, sendmail will spool the mail
and try to pass it on to the primary MX at a later time.

You can run sendmail -qf to give sendmail a swift kick to
deliver any messages it has waiting for a primary MX.

6.3. Masquerading as another server

This idea of masquerading as another server is useful if you access the
Internet through a dialup connection. You can run sendmail locally and
compose messages locally, but sendmail will be configured to pass all
outbound messages to the server it is masquerading as. What you generally
don’t want is for sendmail to pass all mail, even local mail, to the other
server. The options below can be added to your file to enable

define(`SMART_HOST', `')
FEATURE(`genericstable', `hash -o /etc/mail/genericstable.db')

You should add localhost and the hostname you are
masquerading as to the local-host-names file.

The genericstable file can be used to map local user names to the
remote address name.
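
A genericstable sketch, mapping invented local accounts to invented
addresses at the masqueraded host:

```
# /etc/mail/genericstable
jdoe        john.doe@example.com
mary        mary@example.com
```

After editing, rebuild the database, for example with
makemap hash /etc/mail/genericstable < /etc/mail/genericstable.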

6.4. Running mail for a virtual domain

With a virtual domain configuration, you simply populate the virtusertable
and genericstable with the mappings for the virtual address to the real
user name. Be sure to set up the hostname and usernames first.

7. Resources

There is a wealth of great sendmail information on the Interweb and
probably on your own computer.

7.1. man pages

The commands and files associated with the sendmail system
ship with good man pages. Be sure to consult these when
you’re looking for an answer.

7.2. The Sendmail Consortium
The Sendmail Consortium (not to be confused with Sendmail, Inc.)
supports the open source releases of sendmail. They provide a
lot of documentation, FAQs, security notices, and links to
other resources.

7.3. procmail
Not specifically part of this presentation, but it’s worth noting
that procmail ties in very well with the sendmail system. If you
will be using sendmail, consider using it with procmail to get a
nice mail system. Moshe is available to answer all of your
procmail questions.

7.4. Don’t Blame Sendmail

This page talks about problems people encounter and blame sendmail for.
Worth a look if you plan on running your own server.

Using Postfix

Table of Contents

1. Overview

Postfix is a mailer daemon maintained by Wietse Venema. It is an open source
project, created as an alternative to Sendmail. It is a drop-in
replacement for Sendmail, and includes a ‘sendmail’ wrapper that
allows programs to function with it as if it were sendmail itself.
More information, including FAQs and downloads, is available online.

2. Why Postfix?

Postfix was designed from the ground up with speed in mind. Postfix is
running on systems that send over 1,000,000 unique messages a day. Postfix
is also designed differently from Sendmail: instead of one single, monolithic
binary, Postfix is split up into several smaller programs, which run at
lowered privileges, each of which handles a specific task. There are no
setuid binaries in Postfix, and only one setgid binary. What this means is
that there are fewer chances for a root exploit through Postfix.

Postfix is also much easier to configure than Sendmail. It uses a plaintext
configuration file (no m4 needed) and can be reloaded on the fly without
shutting down.

Postfix also supports aliases, virtual hosts, and other databases in several
formats, such as plain text (hash), Berkeley DB, or even MySQL. This provides
more options when setting up mail services for systems with many users and
multiple domains.

3. Installing Postfix

The postfix install is very simple. Most distributions provide packages
for postfix, and if they don’t, simply download the tarball (latest
stable version as of this writing is 2.0.4) and unpack it. then simply run

# make

to compile it. After compilation, make sure you back up your old sendmail:

# mv /usr/sbin/sendmail /usr/sbin/sendmail.OFF
# mv /usr/bin/newaliases /usr/bin/newaliases.OFF
# mv /usr/bin/mailq /usr/bin/mailq.OFF
# chmod 755 /usr/sbin/sendmail.OFF /usr/bin/newaliases.OFF \
    /usr/bin/mailq.OFF

Next, make sure you add a postfix user for postfix to run as.
Either add an entry for the postfix user to your /etc/passwd, or run

# useradd -d /no/where -s /no/shell postfix

Next, add a postdrop group. Note that no user should be a member of this
group, not even the postfix user. This is the group postfix will setgid to
when doing mail delivery.

# groupadd postdrop

Now run:

# make install

This will run an interactive install script, and prompt you where to install
things. The rest of this guide will assume you installed everything to the
default locations.

3.1. Running Postfix

To start the postfix daemon simply run:

# postfix start

Likewise, to stop it, simply:

# postfix stop

To regenerate the aliases and virtual maps databases:

# newaliases

To force reload of the config file:

# postfix reload

To flush all queued mail out of the queue and attempt delivery again:

# postfix flush

4. Configuration

Postfix keeps all its configuration in /etc/postfix (unless you specified
otherwise during install). The main Postfix config file is main.cf. A
default config file will be provided for you with comments. You only need
to edit a few things and you will be ready to run Postfix.

Files:
  main.cf       Main config file for postfix.
  master.cf     Config for the master postfix process; sets limits on its
                child processes and functions. This is safe to leave alone.
  aliases       This file contains the username aliases. Useful for having
                multiple email addresses point to the same user.
  virtualhosts  Similar to aliases, except this allows you to specify
                aliases per-domain instead of just for one domain. Useful
                when running multiple domains.

4.1. Configuration directive style

Directives in postfix are in the style key = value.
String values don’t need to be quoted. You can use
the value of one directive in another, prefixed by a $.
For example:

mydestination = $myhostname, mail.$mydomain

Lists of comma-separated strings can be extended to multiple lines,
simply by ending a line with a comma.

Boolean values are yes/no.

4.2. Common configuration directives

soft_bounce          When set to on, soft_bounce will not bounce any
                     emails, but rather keep them in the queue. This
                     feature is great for debugging, when you aren’t sure
                     if your config is correct. Make sure you turn this
                     off when you are done debugging, else your queue
                     will fill up.
queue_directory      The directory where postfix keeps its spools.
mail_owner           The user to run lowered-privilege postfix processes
                     as.
myhostname           The hostname of the mail server, given out in the
                     SMTP greeting. If not specified, gethostname() is
                     used to find your hostname.
mydomain             Your local internet domain; by default, myhostname
                     minus its first component.
myorigin             The domain that is appended to mail that originates
                     locally.
inet_interfaces      Set to all to listen on all interfaces; otherwise
                     set to the IP or hostname of an interface or
                     interfaces to bind to, comma separated.
mydestination        Domains to accept mail for, comma separated.
alias_maps           A file or list of files that map username aliases.
                     Types of files include dbm, hash (plain text), or
                     even mysql. For example:
                     alias_maps = hash:/etc/postfix/aliases,dbm:/etc/aliases.db
mail_spool_directory Where the mail spool is kept, usually
                     /var/spool/mail.
mailbox_command      If specified, you can choose another delivery agent,
                     i.e. procmail. To set procmail as your delivery
                     agent, use:
                     mailbox_command = /some/where/procmail
smtpd_banner         If specified, you can tell postfix to send an
                     alternate banner when you connect, i.e.:
                     smtpd_banner = $myhostname ESMTP $mail_name
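
Pulling several of these directives together, here is a minimal main.cf
sketch; the hostname and domain are invented, and the paths shown are
common defaults that may not match your build:

```
# /etc/postfix/main.cf (minimal sketch)
queue_directory = /var/spool/postfix
mail_owner = postfix
myhostname = mail.example.com
mydomain = example.com
myorigin = $mydomain
inet_interfaces = all
mydestination = $myhostname, localhost.$mydomain, $mydomain
alias_maps = hash:/etc/postfix/aliases
mail_spool_directory = /var/spool/mail
```

After editing, postfix reload picks up the changes without shutting the
daemon down.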

5. Spam, or how I learned to stop worrying and love the RBL

There are several ways to block spam. You can pipe your mail through a spam
filter, something like procmail or bogofilter, or use an RBL. An RBL is a
list of domains to deny mail from, usually maintained by a third party. These
parties usually list known open relays and spammers. Some charge you, but a
good number of them are free. Postfix has a smtpd_client_restrictions
directive where you can specify options for blocking. For example:

smtpd_client_restrictions = reject_maps_rbl, reject_unknown_client

The reject_unknown_client restriction rejects mail if the postfix server
cannot determine the client’s hostname. This will not block many spam mails,
but it’ll block mail from hosts without proper reverse DNS.
The reject_maps_rbl restriction tells postfix to use an RBL list when
deciding whether or not to block a domain. You then use the
maps_rbl_domains directive to specify which RBL lists to use, i.e.:

maps_rbl_domains =

An RBL list is simply a DNS server that you can query with the domain name
in question. If the domain name matches, then that domain is in the RBL
list, and postfix will deny mail from that domain. More info on RBLs can be
found by looking at the links at the bottom of this document.
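
Under the hood, the client’s IP address is reversed octet-by-octet and
prepended to the RBL zone name before the DNS query. A small sketch of
that name construction; rbl.example.org is a placeholder zone, not a real
RBL, and 192.0.2.1 is a documentation address:

```shell
# Build the reversed-octet lookup name used for an RBL query.
ip="192.0.2.1"
lookup=$(echo "$ip" | awk -F. '{ printf "%s.%s.%s.%s.rbl.example.org\n", $4, $3, $2, $1 }')
echo "$lookup"    # 1.2.0.192.rbl.example.org

# A real check would then be a DNS query such as:
#   host "$lookup"
# An A-record answer means the IP is listed; NXDOMAIN means it is not.
```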

Another thing you can do is specify a smtpd_sender_restrictions directive.
This directive allows you to set which sender addresses are blocked. This is
another way of blocking common spam. You can specify any number of the
following restrictions, as well as a map or maps.

The restrictions are: (taken from the postfix sample configurations)

  • permit_mynetworks: permit if the client address matches $mynetworks.
  • reject_unknown_client: reject the request if the client hostname is
    unknown.
  • reject_maps_rbl: reject if the client is listed under $maps_rbl_domains.
  • reject_invalid_hostname: reject HELO hostname with bad syntax.
  • reject_unknown_hostname: reject HELO hostname without DNS A or MX record.
  • reject_unknown_sender_domain: reject sender domain without A or MX record.
  • check_sender_access maptype:mapname: look up sender address, parent
    domain, or localpart@. Reject if result is REJECT or "[45]xx text";
    permit if result is OK or all numerical.

For example:

smtpd_sender_restrictions = reject_unknown_sender_domain,

6. Fun with lookup tables (maps)

Postfix supports a lot more than simply hash maps. You can do maps using
dbm, regular expressions (either PCRE or standard type), MySQL, or even
LDAP.

For example, you might want to use regular expressions in your access map.
You could tell smtpd_sender_restrictions to use
pcre:/etc/postfix/access-regexp as its map.

Then you create an access-regexp file like this:

### file start: /etc/postfix/access-regexp
# Protect your outgoing majordomo exploders
/^(?!owner-)(.*)-outgoing@(.*)/ 550 Use ${1}@${2} instead

/^friend@(?!my\.domain)/ 550 Stick this in your pipe $0

550 Asia is a big spam house, mail from .$1 is not allowed.
### file end

Sure, you could implement the pattern matching through procmail or some
other mail delivery agent, but the mail will still be accepted by the mail
daemon (which will prompt the spammer to keep sending mail at your box,
since the spammer thinks he got through). Implementing pattern matching at
the smtpd level means the mail is rejected before it is ever accepted.

7. Resources

Network Filesystems

Table of Contents

1. Overview of Network Filesystems

Network filesystems are used to allow a machine to access another machine’s
files remotely. There are a plethora of ways to do this in Linux. The most
common way today is NFS, however many people are shifting to different methods
because of either security or administration reasons.

2. NFS

2.2. Server Setup

To set up a server, one must have the portmap and nfs-utils installed
(see resources for links). To configure your exported filesystems,
edit the file /etc/exports. For example, if your fileserver BARNEY
wanted to share folders /mnt/mp3 and /mnt/work with system JOHN, and
also share /mnt/work with systems MARY and BETTY, with BETTY also
having read-only access to /mnt/mp3, your exports file would look like
this:

# Shares on BARNEY
/mnt/mp3 JOHN(async,rw) BETTY(async,ro)
/mnt/work JOHN(async,rw) MARY(async,rw) BETTY(async,rw)

As you can see, you merely specify the directory you want to share,
and then follow it with the hosts you want to be able to access it (IPs
are fine too) and the permissions for that host. Now you are ready to
share your files: start portmap, and then start the nfs service (rc
scripts are included with most distributions to do this).

2.4. Secure NFS over SSH

Okay, so that’s great, now what happens when John is on a business
trip in Paris and wants to mount his network share to grab some files
he forgot for a presentation he’s doing? It’s not very smart to set
up the NFS server to allow connections from outside IP’s, and even if
it was, it would be even stupider to actually access it, since all the
data goes in the clear, and can be inspected at any point by anyone
with half a brain. Well, John uses SSH to get into a secure shell, so
why not use it to forward NFS? Well, NFS uses UDP, and SSH can only
forward TCP; it does not know what to do with UDP datagrams. Enter
SNFS and sec_rpc. How does it work, you ask? sec_rpc basically
translates the UDP datagrams into something that SSH can forward,
and then translates them back on the other side. So you’re still
using NFS, but through a tunnel.

Here’s how you start out. On the server, you create an /etc/exports
file like so:


# SSH exports file
/mnt/mp3 localhost(async,rw)
/mnt/work localhost(async,rw)

Now that you’ve exported the filesystems to the local host, you need
to install sec_rpc on both the client and server machines. You do it
the standard autoconf way: ./configure; make; make install. So now on
BARNEY you need to be running nfsd like normal.

On John’s PC is where all the complex stuff happens. John runs this
at his PC:


Here MOUNTPROG is a six-digit number chosen between 200000 and
249999 such that both MOUNTPROG and NFSPROG=MOUNTPROG+50000 are
unassigned RPC program numbers (e.g. MOUNTPROG=201000,
NFSPROG=251000). REMOTE is the remote host name.
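
The number-picking step above can be sketched in shell; the value 201000
is just an example, and verifying that both numbers are really unassigned
RPC program numbers is left to the reader:

```shell
# Derive NFSPROG from a chosen MOUNTPROG in the 200000-249999 range.
MOUNTPROG=201000
NFSPROG=$((MOUNTPROG + 50000))

# Sanity-check the range before using the numbers.
if [ "$MOUNTPROG" -ge 200000 ] && [ "$MOUNTPROG" -le 249999 ]; then
    echo "MOUNTPROG=$MOUNTPROG NFSPROG=$NFSPROG"
fi
```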

That will create the config file needed for the RPC numbers. Now you
add a line to John’s fstab:


LOCAL.DOMAIN:/DIR /REMOTE/DIR nfs user,noauto,hard,intr,rsize=8192,wsize=8192,mountprog=MOUNTPROG,nfsprog=NFSPROG 0 0

(sorry if that wraps weird; it should go on one line)
In the line above, MOUNTPROG and NFSPROG are the numbers you figured
out before. LOCAL.DOMAIN is John’s fully qualified domain name.

Now that John is thoroughly confused, he creates the local mount
directory, starts portmap, and now has some more weird commands to
throw at the host:

# smkdirall
# rpc_psrv -r -d /usr/local/etc/snfs/REMOTE
(where REMOTE is the remote host name)

Now john types in the root password for BARNEY, and he should get the
message: "ID String correctly read: RPC Proxy Client". Now he is
ready to mount:

# mount /REMOTE/DIR
# df

Woo, you have a mount or something. And now you are thoroughly
confused. As you may have noticed, a major disadvantage of SNFS is
that you need to know the host’s root password. Also, you need to
have remote root SSH enabled. Not to mention, this is a messy setup.
But it works (sorta).

It is possible to set it up so that the remote
mount runs as non-root, with the correct setuid binaries, but that is
still messy. The people at SFS agree with you. The next section will
show you how to do it using SFS, a more elegant solution.

3. SFS

3.1. Background

NFS was originally developed by Sun for the purpose of mounting a disk
partition on a remote machine as if it were on a local hard drive.
This allowed for fast, seamless sharing of files across a network. NFS
is an RPC service, and works through the RPC portmapper daemon. Most
unix-like systems ship with an NFS daemon included, or an NFS daemon
can be obtained, which is why it is the de-facto standard for network
file sharing.

The problem with NFS is that it is inherently insecure. The current
NFS protocol does not support any authentication, and any validation
is only done by IP, which can be easily spoofed. There is no
user-level restriction, so if anyone can exploit a client machine on a
NFS network or in some cases just plug another machine into the
network, they can gain access to the NFS server’s shares.

Linux supports NFS protocols v. 2 and 3. The implementation of NFS in
the linux kernel is partly in user space and partly in kernel space,
in order to keep things running fast.

SFS was created because the insecurities of NFS made it easily spoofed
and not very secure. Not to mention, if a client was connecting
with a dynamic IP, the NFS server would have to change its exports
each time a client’s IP changed, or (very insecure) export to an
entire IP range.

SFS uses public-key authentication with 1024-bit keys by default
(higher-bit keys are allowed).

3.2. Server set-up

To set up an SFS server, you create a user sfs and a group sfs, and
then get the sfs-0.6 source and compile it. Then you set up your
/etc/exports to export filesystems to localhost, much as you would for
SNFS:


/var/sfs/root localhost(async,rw)
/mnt/mp3 localhost(async,rw)
/mnt/work localhost(async,rw)

Then you create the file /etc/sfs/sfsrwsd_config as follows:


Export /var/sfs/root /
Export /mnt/mp3 /mp3
Export /mnt/work /work

Note that sfs requires a ‘root’ for the server; the exports will be
under this root. You now create /var/sfs/root, /var/sfs/root/mp3, and
/var/sfs/root/work, then chown these to user sfs, group sfs. You
generate a host key for the server like so:

# sfskey gen -P /etc/sfs/sfs_host_key

After doing this, start your nfsd, and sfssd.

4. OpenAFS

4.1. Introduction

AFS was pioneered at Carnegie Mellon University, supported and
developed by TransArc corporation (now owned by IBM). AFS, or the
Andrew File System, is a distributed filesystem. In 2001, IBM branched
AFS out and created OpenAFS, an open source version of AFS. OpenAFS
is unique among this bunch in that it works on windows systems as well
as Linux.

Because OpenAFS is a distributed filesystem, it allows for high
availability. A cluster of servers can mirror the same AFS cell. Any
client will not know which server it is connected to, and when it
writes to a file, the changes are propagated to the other servers in
the cell. If any server goes down, the cell can continue to operate.

4.2. Server setup

Gentoo has a good guide on how to set up OpenAFS. I will put up my own
guide to OpenAFS setup soon.

4.3. Client setup

To be an NFS client, you don’t need to edit any config files. All you
need to do is start portmap, and then, provided you have the nfs tools
installed, simply mount the partitions. Let’s say JOHN from the
previous example wanted to mount the mp3 and work folders now. He
simply runs at the command line:

# mount -t nfs BARNEY:/mnt/mp3 /mnt/mp3
# mount -t nfs BARNEY:/mnt/work /mnt/work

It is very simple to mount NFS partitions. Because no further
authentication is needed, on a fixed network with fixed hosts, one
can simply set a bootup script to mount these on startup.

Now John wants to connect to SFS. Here’s what he does: on the server
BARNEY (not his local client box) John creates a key from the shell
like this: sfskey register. He can do this from an ssh shell if he
wants to; it’s a one-time process. Now what John does from Paris
is, he starts portmap and sfscd on his client box. Then, as non-root,
he runs:

# sfsagent

sfs will not read your /etc/hosts file; it will totally ignore it (to
prevent spoofing, only a globally-verifiable DNS name will be
accepted). Now John logs in with the passcode he put on his key, and
now he is into BARNEY. He then proceeds to access his share:

# cd /sfs/
# ls

All his files will be read over a 1024-bit secure encrypted channel.
SFS is probably the best way for sharing files over an insecure
network, because it uses the tried-and-true nfs protocol, and once a
key is made, john can log in from anywhere.

Check for the next version of this howto to learn about OpenAFS client
setup.

5. Coda

Coda is another distributed filesystem from Carnegie Mellon, based on
AFS. Coda was open source from the beginning, however, and it has been
available in the Linux kernel since 2.2. Coda has one important
feature which NFS and OpenAFS do not, and that is disconnected
operation. Disconnected operation allows a client to disconnect from
the network and continue to access and modify files. When the client
reconnects to the network, that user’s changes are propagated back to
the coda cell. This is excellent for laptops, because then a user can
take the laptop away, modify his own home directory files, and then
put it back on the network and have all the files be resynched with
the global fileserver. Coda does this by locally caching files.
Because of aggressive local caching, Coda is also faster than OpenAFS.

One downside of all these features is that, since Coda needs to keep
so much metadata on the filesystems, it needs its own raw partition
for data storage.

Check back for the next version of this Howto for more info on setting
up coda.

6. Intermezzo

Intermezzo is another distributed file system, loosely based on coda.
Like coda, it allows disconnected operation. Intermezzo is meant to
be lightweight. It also needs a dedicated partition, which can be
formatted as any of ext2/3, xfs, or reiserfs. The intermezzo protocol
is based on HTTP, and it can use apache or any cgi-supporting http
daemon for serving files.
Intermezzo is still in its early stages. However, Linus made sure it
was included in the 2.4 kernel starting with 2.4.14, just before he
went off to work on the 2.5 kernel. It is still in early
beta, but it shows a lot of promise.

In my tests with Intermezzo, I found it is a little flaky in its
setup, and the setup process is not very well documented. What
documentation exists is inconsistent with the most recent version.

7. Summation

There are many alternatives to NFS. NFS is a very good protocol, and
it has been tried and tested, but it is too insecure to be used over
wide area networks, unless encapsulated inside a VPN tunnel, or
another more secure protocol such as SNFS or SFS is used. Also, NFS
is limited in that a volume can only be exported by one server, so
that you cannot distribute the load across multiple servers or
implement failover. Coda, Intermezzo, and OpenAFS aim to address
these limitations by distributing the filesystem. As of now, Coda looks to be the
best choice for a distributed FS, although IBM is working hard to
improve OpenAFS. Intermezzo is a little too unstable to be considered
usable, and not enough documentation exists.

Which one you choose is dependent on how you want to do things. If
you have a home directory you want to access when on the road and not
connected, then a caching system like Coda is probably for you.
However, if you simply want to access files from two connected systems
over the net (over Resnet or LAWN for example), you probably want to
use SFS, as SNFS is a little tricky to set up, and SFS allows
connection from anywhere with the same public key using strong
encryption.

8. Resources

Linux Security: A Whirlwind Tour

Table of Contents

2. Security Rant

2.1. Common security paradigm

Broadly, computer security is often divided into two fields: host
security, and network security. Host security concerns the integrity
of an individual host, which is a single node on a network. Network
security focuses much more on the integrity of an entire network and
analysis from this point of view. This is a useful paradigm for
constructing a secure computing environment. Secure each host, then
secure the whole network.

For a Linux machine, the risks to both host and network security are
great. These risks stem from several root causes. One of the largest
factors is that since the source code for most software on a Linux system
is freely available, would-be attackers are free to analyze it and
locate security holes.

2.2. The Hackers

There are several types of people who carry out this type of analysis.
Some of them are willing to share their findings while many others are
not. Those who share their security related discoveries are often
called "white hat" hackers. Those who do not share their findings
then fall into two categories, the "grey hats" and the "black hats".
These metaphorical hats denote the ethical stance of the people in
question. Never assume that the "white hats" have found all of the
security holes.

2.3. Security == Risk Management

For the end administrator, it is important to know that there are a
huge number of possible ways to attack a system and you should never
assume that a given piece of software is completely safe. Measures
should always be taken to mitigate risk. Security is ultimately about
risk management, and that’s how it should be approached.

3. Advanced uses of SSH

3.2. Port Forwarding

SSH port forwarding is a very powerful tool for securely
connecting two hosts. This can be done in several ways.

Relevant excerpt from man page:

ssh [-L port:host:hostport] [-R port:host:hostport] [-D port]
hostname | user@hostname

1) Local port is forwarded to a remote port (static): use ‘-L’
2) Remote port is forwarded to a local port (static): use ‘-R’
3) Local port is forwarded to a remote port (dynamic, specific
   to the application): use ‘-D’ (this is new)

The first two are the most common. The third type of forward
is specific to an application protocol, and as of OpenSSH
3.1p1 it only works as a SOCKS4 proxy.

Note that in the syntax for the ‘-L’ and ‘-R’ options, the
"host" entry is a hostname or IP that is relative to the
machine you are actually connecting to. That is, you connect
to a machine where you have an account in order to set up the
forward, and the "host" is then contacted from that server.
This is probably the most confusing part of the port
forwarding scheme, and hopefully the examples below will
clear it up.


1) My computer is outside of Georgia Tech and I want to connect to the
news server using the NNTP Protocol. The NNTP protocol uses port
119 and the news server is The news server only
accepts connections from hosts inside the GT network, so I need a
port forward from some machine inside the network. I want to
connect to my local machine’s port 9999 and have it forward to

I have an account on, so I will use it to do the
forward to Note that I do not have an account on

Command Line:

ssh -L -N -C

The ‘-N’ option tells OpenSSH not to execute a remote command, so I
can just background ssh (ctrl-Z) after I authenticate. The ‘-C’
option tells OpenSSH to compress the data (very useful for X11
forwarding, too).

Alternatively, I could have used local port 119 (if I were root)
and then simply told my news-reader the server is "localhost" and
it would happily connect to localhost:119 but in reality connect to Kinda tricky, eh?

2) I have a local network of a few machines, and I am running a
webserver that only my internal network can see (it’s behind a
firewall). But, I want to be able to use that webserver when I’m
away from the home network in another office on a separate network.

So, here we set up a similar forward, but now we shall initiate the
forward from the server, instead. This time we will forward a
remote port (on another private network) to our local webserver
port. Our webserver machine is running a server on port 8080.

Say we want to forward a port on "" to our
internal server. In this case we must have an account on the
machine in question, "workstation1".

Command Line:

ssh -R 1234:localhost:8080 -N -C

So, when I’m logged into workstation1 at the other office, I point
my web browser to http://localhost:1234/ and I can access the
webserver in my home office just like I was there. Magic.

You might be wondering why I used "localhost" as the hostname in
the ssh command. This is because I was forwarding a port to the
local machine. I could have also forwarded the remote port to
another machine’s server in the network. Now that’s getting
complicated. 🙂

You can verify that the tunnels are in place using the netstat
command to examine which ports are open and what IP they are bound
to. By default, OpenSSH binds the forwarded port to the loopback
address, so no one on another host can abuse your tunnel.

4. Virtual Private Networks

4.1. Intro

Security is a monstrously huge field and so to impart knowledge, I
must divide and conquer. In this presentation I will cover advanced
usage of OpenSSH and the rudimentary basics of VPNs under Linux and a
small rant about security.

The Secure Shell protocol is a modern, advanced standard for securely
connecting multiple hosts. For the morbidly curious, it is defined in
an RFC (Request For Comments); use Google to find it. The protocol is
designed to do much more than simply substitute for telnet or rsh. It
is a highly layered, configurable communication system which can be
used to connect hosts with several different types of communication
channels.

Examples include:

  1. X11 forwarding
  2. SSH port forwarding (tunnels)

I must also point out that OpenSSH is itself open to several forms of
attack, and I encourage administrators to restrict which hosts are
allowed into their boxen. Other security professionals have
recommended the use of the commercial SSH package instead.

For further information on how to use SSH, there are two excellent
articles on the LUG web server (see the Links section below). SSH
keys are particularly useful to me, since I use many systems on a
daily basis.

What is a VPN? Broad definition (my own): a VPN is a way to
securely connect two or more physically separate networks over the
Internet.

This means that we have a "virtual" network comprised of the two
separate networks, and nobody in between the networks can
understand the VPN traffic.

Why is it useful?

VPNs are usually used in business settings where sensitive data
must be transmitted between different business locations that are
often physically very far apart. They offer a more general solution
than simply using SSH because they encrypt *all* IP traffic, and
they are totally transparent to the end hosts. The burden of
encryption is moved to the VPN gateways, which handle all of the
details of security.

4.2. VPN technologies

There are two major categories of VPN technologies in place today:

  • IPSec (part of IPv6)
  • non-standardized: CIPE, vpnd, etc.

IPSec is a standardized VPN technology for IP networks. It is a
part of IPv6, must be included in any IPv6 implementation, and is
optionally available for IPv4 implementations as well.

There are other methods of connecting networks using encryption
that can also be called VPNs, but I will stick to IPSec, as it is
the most common and a real standard, though rather complex.

It would be a waste of space to try and describe the IPSec
protocols in this document; you can read the relevant RFCs/books
for that. I can describe IPSec as "just another layer" of IP that
adds encryption and uses UDP for authentication. Like SSH, it is
very flexible, and there are many implementations that differ
widely in what they provide.

There are two important pieces to IPsec:

  • Authentication – proving the identities of the hosts
  • Encryption – agreement on the method, key exchange, etc.

VPNs are an extremely complex technology, which is why they are
probably mostly used in business settings where they are needed
most. They can be rather expensive (both money and time) to set
up and maintain. When they break, it is often a nightmare to
debug them. I’m just dripping with optimism, aren’t I? 🙂

4.3. Linux Free S/WAN

For Linux, the foremost software is called FreeS/WAN, which is
named after a commercial product (Secure WAN). It is a free IPv4
IPSec implementation for Linux kernel 2.4.x. It comes in one
package with two parts:

1) Kernel Level Support

  • requires patching
  • requires GMP (GNU MultiPrecision Arithmetic Library)
  • Creates IPSec networking modules
  • Creates new ipsecX interfaces that correspond to physical
    interfaces

2) Userspace tools

  • scripts that support SYSV style init
  • userspace tools to augment kernel modules
  • Can start/stop/reload IPsec
  • Supports various levels of logging
  • Pluto – Name of the authentication daemon – UDP port 500
  • /etc/ipsec.conf is the main config file
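Since /etc/ipsec.conf is the main config file, here is an illustrative fragment of the kind of conn section that goes in it. All names and addresses are invented; consult the FreeS/WAN documentation for the full option list:

```
# Illustrative /etc/ipsec.conf conn section (invented values)
conn office-to-home
        left=           # public IP of the "left" gateway
        leftsubnet=    # private network behind it
        right=          # public IP of the "right" gateway
        rightsubnet=   # private network behind it
        authby=secret               # shared-key authentication
        auto=start                  # bring the tunnel up at startup
```

The left/right naming is deliberate: the same file can be installed unchanged on both gateways, and each figures out which side it is.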

To install it, download the tarball from the site, configure a
Linux kernel source tree, and use the targets from the FreeS/WAN
makefile to build your kernel. The FreeS/WAN makefile will also
install the userspace tools for you. I’m not sure if any major
distributors have packaged it yet.

The configuration of FreeS/WAN is rather difficult; ask anyone
who has tried it. There are several examples in their online
documentation, but to understand what is going on, the
administrator needs a very good understanding of IP networking and
of the IPSec protocols.

Interoperability with other VPNs

  • Shared Key
  • RSA
  • X.509 certs
  • difficulties

Unfortunately, it is often very hard to get two different VPN
products which supposedly speak the same language (IPSec) to talk
to each other. Linux FreeS/WAN, for example, does not (by default)
support X.509 certificates, which are the most common method of
authentication for commercial products. There is a patch for X.509
support, but it is, again, tricky to get working.

IPSec does support a lowest-common-denominator form of
authentication called shared key, which is just that: a key shared
between the two VPN hosts. FreeS/WAN also supports RSA
authentication, though I haven’t seen any commercial products
which support that method.

4.4. Stability, bugs

In my experience with some of the earlier versions of FreeS/WAN,
I have encountered many bugs and problems in the code. That is not
to say that it does not work, it certainly does, but be aware that
it is still very much a work in progress.

Introduction to LDAP

Table of Contents

1. What is LDAP?

LDAP stands for "Lightweight Directory Access Protocol". It is a
TCP/IP implementation of the X.500 Directory Access Protocol (DAP).

Note: DAP is the directory access protocol of the older OSI X.500
standard.

A Directory is just a database that usually follows these properties:

  • designed for reading more than writing
  • offers a static view of the data
  • simple updates without transactions

A Directory Service adds a network protocol used to access the
directory, on top of the above. We’ve all used a directory service in
the past day: DNS!

LDAP is defined by RFC 1777. Some common points of the standard
are:

  • a network protocol for accessing information in the directory
  • an information model defining the form and character of the
    information
  • a namespace defining how information is referenced and organized
  • an emerging distributed operation model defining how data may be
    distributed and referenced
  • designed-in extensibility

2. What good is LDAP?

A Directory holds information. It doesn’t matter what type: text,
photos, urls, pointers to whatever, binary data, public key
certificates, etc. (Note here that the particular LDAP server you use
may have limitations.)

There are different contexts for a Directory (and Directory Service).

  • LOCAL – only for a subset of machines/users/etc.
  • GLOBAL – can be accessed by anyone

LDAP is a vendor-independent, platform-independent protocol…this means
interconnection is easy! (The Internet, for instance.) Also, for this
same reason, translating from LDAP to another protocol/system is
straightforward.
Currently existing gateways:

  • LDAP to X.500 (and vice versa)
  • HTTP to LDAP
  • WHOIS++ to LDAP
  • E-mail to LDAP
  • ODBC to LDAP
  • and more!!!

Concrete example:
Address books usually use LDAP to store the book on a centralized
server and then pull down the information when requested. Netscape
Communicator uses this model. (Microsoft Exchange/Outlook does
something similar, but Microsoft hacks the protocol some.)

When the user pulls up his/her address book, the request is sent to
the LDAP server. This server then returns each entry in the book in
a standard format, similar to using XML.
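Results are commonly rendered in LDIF, LDAP's plain-text interchange format of `attribute: value` lines (one entry per block). A toy parser over an invented entry shows the shape an address book would consume:

```python
# Parse one LDIF-style entry into a dict of attribute -> values.
# The record below is invented example data, not a real directory.
def parse_ldif_entry(text):
    entry = {}
    for line in text.strip().splitlines():
        attr, _, value = line.partition(": ")
        entry.setdefault(attr, []).append(value)  # attrs may repeat
    return entry

record = """dn: cn=Jane Doe,ou=People,o=Example
cn: Jane Doe
mail: jane@example.org
telephoneNumber: 555-0100"""

print(parse_ldif_entry(record)["mail"])  # ['jane@example.org']
```

A real client would also handle base64-encoded values and line continuations, which LDIF allows and this sketch skips.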

3. Schemas

The Directory is actually a distributed, tree-like structure. Every
entry in the directory has a distinguished name (DN) which uniquely
identifies that entry. The DN can be generated by concatenating the
relative distinguished names (RDNs) of entries higher up in the tree.
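That concatenation can be sketched in a few lines. Starting from the path of RDNs down the tree (entry names invented for illustration), the DN lists the entry's own RDN first, then its ancestors up to the root:

```python
# Build a DN by joining RDNs, most specific first. The RDN values
# here are invented example data.
def dn_from_path(rdns):
    # rdns is ordered from the root of the tree down to the entry
    return ",".join(reversed(rdns))

path = ["c=US", "o=Example Corp", "ou=Engineering", "cn=Jane Doe"]
print(dn_from_path(path))
# cn=Jane Doe,ou=Engineering,o=Example Corp,c=US
```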


If you notice, the RDNs are all of the form <parameter>=<value>. The
idea behind a schema is related in the following flow chart (read it
like a CFG):

root := root country
      | root locality
      | root organization
      | (epsilon).

country := locality
         | organization.

locality := organizational_unit.

organization := organizational_unit.

organizational_unit := organizational_unit container
                     | (epsilon).

[Here a container is the base object, holding extremely specific
data, like a person's name, a department's budget, etc.]

A really good reference for learning about schemas for use in LDAP can
be found at:

4. Using LDAP in your shtuff

There is no "way" to use LDAP. It’s more of a methodology:

Each language usually has its own hooks into LDAP.

C has a whole API suite.

Java uses the Java Naming and Directory Interface (JNDI).

A good step-by-step HOWTO can be found in Chapter 4 of IBM’s Redbook:
It uses the C API to walk through accessing an LDAP server.

5. OpenSource Projects

OpenLDAP is perhaps the best known, due in part to its name. The
project consists of a stand-alone LDAP server, a replication server,
and client application libraries. The latest version is 2.1.8 as of
this writing.

Installing and using OpenLDAP is fairly straightforward. There is a
great online/HTML HOW-TO available from OpenLDAP’s site:

6. Other OS Projects

7. Resources