May 21, 2003 / Kurt Nelson
A common task, in shell programming and elsewhere, is to take a stream of
characters and somehow modify it or extract data from it. Two powerful tools
that UNIX offers for this purpose, both using the magic of regular expressions,
are sed and awk.
1. Regular Expressions
Regular
expressions give the user the power to match any regular language while managing
to be completely unreadable and incomprehensible. To further complicate things,
there are two types of regular expressions defined by POSIX, basic and extended,
and no two tools, or even two implementations of the same tool, seem to be able
to agree on what the difference really is. For the most part, though, sed and
awk implementations are at least compatible with the POSIX definitions, even if
additional layers and features are added on top.
A regular expression is used to match some portion of a string. At its most
basic, a regex is just a substring. So the string "Caution! Contents may be
hot!" contains matches for the regular expression ‘Caution!’, as well as
‘may be’, or even ‘onten’. Regular expressions are case sensitive, so the
string contains no matches for ‘HOT’.
Simple, no? But not very powerful. Let’s add a few extra characters. If a
circumflex (‘^’) is the first character of a regex, it will match the beginning
of the string. Likewise, if a dollar sign (‘$’) is the last character of a
regular expression, it will match the end of the string. So the regular
expression ‘^Caution!’ would match the Caution! in the above example string,
but not in the string "Wet floor ahead, Caution!". Similarly, the expression
‘Caution!$’ would match the Caution! in the second string, but not in the
first.
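These anchors can be tried directly with grep, which uses the same basic regular expressions (a quick sketch; any POSIX grep should behave this way):

```shell
# '^Caution!' matches only when Caution! starts the line
printf 'Caution! Contents may be hot!\n' | grep '^Caution!'

# 'Caution!$' matches only when Caution! ends the line
printf 'Wet floor ahead, Caution!\n' | grep 'Caution!$'
```

Each command prints its input line back, because the pattern matched; swap the two patterns and neither line would be printed.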
Now let’s say you want to match more than one possibility for a character.
Characters between brackets (‘[’ and ‘]’) are treated as a list of possible
characters to match. So ‘[abcd]’ would match a single character, and that
character may be ‘a’, ‘b’, ‘c’, or ‘d’. Bracket expressions can also use
ranges, so the previous example is equivalent to ‘[a-d]’, though constructs
such as this are sometimes bad for internationalization. More on that later.
Bracket expressions may also be negated using the circumflex (‘^’) as the
first character. So ‘[^abcd]’ would match a single character that is not
‘a’, ‘b’, ‘c’, or ‘d’.
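A small sketch of both forms, filtering three made-up words through grep:

```shell
# '[ao]' allows either vowel, so 'cat' and 'cot' match but 'cut' does not
printf 'cat\ncot\ncut\n' | grep 'c[ao]t'

# the negated form matches only 'cut'
printf 'cat\ncot\ncut\n' | grep 'c[^ao]t'
```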
A few exceptions are needed in order to match the characters ‘]’, ‘^’, or ‘-’
in a bracket expression. To match a ‘]’, make it the first character, after the
circumflex if one is used. So something like ‘[]abcd]’ or ‘[^]abcd]’. To match
a ‘-’, make it either the first or last item in the list, after the circumflex.
To match a ‘^’, just put it anywhere except up front.
Another useful way to match multiple possibilities for a character is the
period (‘.’), which will match any character. So the regular expression
‘Ca.tion’ would match both "Caution" and "Caption".
Note that any of these special characters can be escaped with a ‘\’ to remove
their special meaning, so the expression ‘\.’ would match a period character.
‘\\’ matches a backslash character.
More than one character may be matched at a time using the asterisk (‘*’). An
asterisk following a character or a bracket expression will match zero or more
instances of that character or bracket expression. For example, let’s say
you’re programming in LISP for some reason, and want to match every possible
car and cdr expression. You could do this using ‘c[ad]*r’, which will match
"car", "cdr", "caadr", "cdaar", and everything else. However, it also matches
"cr", which probably isn’t something you want. You can avoid that using
‘c[ad][ad]*r’, which forces at least one instance of [ad] to exist for a
match, but there is a cleaner way that we’ll look into later.
Regular expressions match repetitions greedily, meaning they will match as
long a string as they possibly can. So the regular expression ‘.*power’
applied to the string "My power supply is not powerful enough" would match
"My power supply is not power".
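You can see the greediness with sed by replacing the match with a marker (a small sketch):

```shell
# '.*power' swallows everything up to the LAST 'power' on the line,
# so only 'ful enough' survives after the marker
echo 'My power supply is not powerful enough' | sed 's/.*power/[MATCH]/'
# [MATCH]ful enough
```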
If an expression is enclosed in escaped parentheses (‘\(’ and ‘\)’), the
entire enclosed expression will be treated as a single element. So the regex
‘\(bob\)*’ would match a string of zero or more bob’s. In addition to allowing
better groupings for repetitions, the text matched within a parenthesis group
may be used later in the expression with \digit, with the first parenthesis
group (ordered by the beginning of the grouping) being \1, the second \2, and
so on through \9. So ‘\([Bb][Oo][Bb]\)\1\1’ would match "BOBBOBBOB" and
"BoBBoBBoB" but not "BOBbobbob".
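grep understands the same backreference syntax, so the claim is easy to check (a sketch):

```shell
# only lines that repeat the exact same capitalization three times match;
# 'BOBbobbob' fails because \1 must repeat the captured text verbatim
printf 'BOBBOBBOB\nBoBBoBBoB\nBOBbobbob\n' | grep '^\([Bb][Oo][Bb]\)\1\1$'
```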
A specific number of repetitions can be specified by appending a number
enclosed in escaped curly braces (‘\{’ and ‘\}’) to an expression. So
"BOBbobbob" could be matched using ‘\([Bb][Oo][Bb]\)\{3\}’. Ranges can also
be given as ‘\{start,end\}’ to match between start and end repetitions
inclusive, or ‘\{start,\}’ to match at least start repetitions. POSIX does
not specify behavior for ‘\{,end\}’, but pretty much everyone implements it.
For Basic Regular Expressions, that’s about it. Extended Regular Expressions
treat unescaped curly braces and parentheses as the special characters, and
add a few more special characters of their own.
If two expressions are separated by a vertical bar (‘|’), then either
expression will be matched. So ‘(bob|jimmy)’ would match either bob or jimmy.
The addition symbol (‘+’) can be used to match one or more of an expression,
so ‘c[ad]+r’ would solve the problems of the car and cdr example above.
‘expression+’ is equivalent to ‘expression{1,}’. A question mark (‘?’)
following an expression will match that expression zero or one times. So
‘expression?’ is equivalent to ‘expression{0,1}’.
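With grep -E (extended regular expressions), the car/cdr example becomes (a sketch):

```shell
# '+' requires at least one 'a' or 'd', so the bogus 'cr' is rejected
printf 'car\ncdr\ncaadr\ncr\n' | grep -E '^c[ad]+r$'
```

This prints car, cdr, and caadr, but not cr.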
Another nice little feature not defined in POSIX but implemented by pretty
much everyone is that escaped angle brackets (‘\<’ and ‘\>’) can be used to
match the beginning or the end of a word. So the expression ‘Caution\>’ would
match "Caution" and not "Cautionary".
I mentioned earlier that using ranges in a bracket expression is bad, and this
is because not all character sets are created equal, or even contiguous. So
while something like ‘[A-Za-z]’ may match all letters in ASCII, it wouldn’t
match things like accented letters, and who knows what it might do in
something like EBCDIC. To solve this problem, character classes were created,
and given an even more horrible and confusing syntax. If something like
‘[:alpha:]’ occurs in a bracket expression, this matches any character that
would return true for isalpha() in the current locale. Note that the brackets
around the character class are additional brackets, not the ones already
around the bracket expression. So, in ASCII ‘[[:alpha:]]’ is equivalent to
‘[A-Za-z]’, ‘[[:lower:][:digit:]+=*]’ is equivalent to ‘[a-z0-9+=*]’, and
so on.
Further complications are introduced with collating elements, but that gets
more into internationalization than I care to cover in this article.
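For instance, a character class works nicely for stripping punctuation and digits (a sketch; behavior shown is for the C locale):

```shell
# delete every character that is not a letter in the current locale
echo 'R2-D2, beep!' | sed 's/[^[:alpha:]]//g'
# RDbeep
```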
So, now
that you’re a regularly matching fool, what next?
2. Sed, the stream editor
When given a set of rules and some input, sed will read a line of the input,
modify it according to the provided rules, output the modified form, and
repeat until the input is gone. The most common use of sed is to replace a
regular expression with some other string, like ‘s/foo/bar/’, which will
replace the first foo on each line of the input with bar. If you want to
replace every foo on each line, add a ‘g’ after the replacement string:
‘s/foo/bar/g’. The sed commands are often provided along with the invocation
of sed, as in sed 's/foo/bar/'.
Sed uses basic regular expressions, so it requires that the special characters
be escaped with backslashes; otherwise they are interpreted as the literal
character. For example, to use parentheses to group elements, they must be
written as ‘\(stuff\)’. GNU sed defines an "extended regular expression" mode
which eliminates the need to escape these characters, but at the cost of
portability. GNU sed also allows for ‘?’ and ‘+’ in regular expressions,
though they must be escaped (‘\?’ and ‘\+’) if -r is not being used.
The choice of ‘/’ as the separating character above is arbitrary; any
character could be used. Another common choice is to use ‘%’ to avoid having
to escape large numbers of ‘/’ in the expression or in the replacement text.
So ‘s/regex/replace/’ is equivalent to ‘s%regex%replace%’.
Another option that can be
appended to a replacement, like ‘g’, is ‘p’, which will print the line to
stdout if a replacement was made. This should only be used if sed is
invoked with the ‘-n’ flag, which will cause sed to print nothing unless
explicitly requested with a ‘p’. POSIX does not specify whether lines
printed with ‘p’ should be printed again, so depending on the sed
implementation, some lines may be printed twice.
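A sketch of the difference, using GNU sed (which does re-print such lines when -n is absent):

```shell
# without -n every line is printed, so the substituted line appears twice
printf 'foo\nbaz\n' | sed 's/foo/bar/p'

# with -n only the line where a substitution happened is printed
printf 'foo\nbaz\n' | sed -n 's/foo/bar/p'
```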
2.1. Line addresses
Addresses may be specified before the command to limit on which lines the
command will be executed, such as in ‘12s/foo/bar/’, which will replace foo
with bar, but only on the 12th line of the input.
Addresses may be a line number (‘12’), a regular expression enclosed in
slashes (‘/c[ad]*r/’) which will match any line containing the expression, or
the dollar sign (‘$’), which matches the last line. A range may also be given
as addr1,addr2. If regular expressions are used in an address range, the first
line that matches each expression will be used. If the first address in a
range is a regular expression, matches for the second address will be checked
beginning with the next line.
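Two small sketches of addressing:

```shell
# replace only on line 2
printf 'foo\nfoo\nfoo\n' | sed '2s/foo/bar/'

# print from the first line matching /two/ through the last line
printf 'one\ntwo\nthree\n' | sed -n '/two/,$p'
```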
The ‘/’ characters used to delimit regular expression addresses are likewise
not mandatory, but if another character is used, the first one must be
prefixed by a backslash, since otherwise it would be interpreted as a command.
This character does not affect the delimiting character in ‘s’ commands, so
something like ‘\%c[ad]*r%s/r//’ is valid.
2.2. Other commands
Other useful commands are ‘d’, which deletes
the line matching the address, and ‘p’, which prints out lines matching
the address, or every line if no address is given (again, this should only
be used in conjunction with -n, since the behavior otherwise is
undefined). These three commands will make up nearly all of your usage of
sed.
The only (portable) command line options that sed accepts
besides -n are -f script-file, which reads in a script from the
given filename, and -e script, which adds the given sed command
to the script to be executed. If -f or -e are given, then a sed command
cannot be given as an operand without -e, since otherwise it will be
interpreted as a filename. If multiple -f or -e commands are given, they
are evaluated in order.
Multiple filenames may be given, and will
be concatenated in order and run through the sed program. stdin is only
used if no filenames are given.
3. How sed really works
Sed has two memory spaces, the hold space
and the pattern space. For each cycle, the pattern space is cleared, a
line of input is read into the pattern space, the program is run, and, if
the -n flag was not given, the final contents of the pattern space are
written to the output. This repeats until all input is read, or until
execution is terminated with the ‘q’ command. Nothing is ever
automatically placed in the hold space, but there are several commands to
manipulate it.
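The classic demonstration of the hold space is reversing a file line by line, tac-style (a sketch):

```shell
# 1!G - on every line but the first, append the hold space to the pattern space
# h   - save the accumulated (reversed) text back into the hold space
# $p  - on the last line, print the result
printf 'one\ntwo\nthree\n' | sed -n '1!G;h;$p'
# three
# two
# one
```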
The ‘s’ command, in addition to being the most useful for actual text
processing, can also be used for conditional branches. A branch point can be
defined using ‘: LABEL’, and the command ‘t LABEL’ will branch to this label
if a successful substitution has been made since the last branch or input
read. ‘b LABEL’ is the unconditional counterpart. If no label is given to
either t or b, they will jump to the end of the script, which is useful for
starting a new cycle.
Using all of this, powerful, incomprehensible programs may
be written, like the implementation of the dc calculator shipped with the
GNU sed source, or the following very short text adventure:
# Should be runnable either with or without -n
# Only commands supported are directions, since I didn't want this to get
# three miles long
#
# Trying very hard to use only BREs
#
# Look text shamelessly stolen from Infocom's ZORK
# restore state
# x exchanges hold and pattern spaces
# Each room must exchange back to read input
x
s/room0/&/
t room0
s/room1/&/
t room1
s/room2/&/
t room2
# default
b room0
# North goes to room1, south goes back to room0, southeast goes to room2
: room0
x
# i\ outputs text up to first line without trailing '\'
# '{' and '}' commands are used to create groups matched by a
# single address
# expression matches line containing word "look" optionally surrounded by
# whitespace, and nothing else
/^[[:space:]]*look[[:space:]]*$/{
i\
Maze\
You are in a maze of twisty little passages, all alike
b end
}
# Matches optional leading "go" and word "n" or "north"
# directions work by putting room name in pattern space, and if substitution
# was made, the room name is copied to the hold space and the pattern space
# cleared
s/^[[:space:]]*\(go[[:space:]]\)\{0,1\}[[:space:]]*[Nn]\([Oo][Rr][Tt][Hh]\)\{0,1\}[[:space:]]*$/room1/
s/^[[:space:]]*\(go[[:space:]]\)\{0,1\}[[:space:]]*[Ss]\([Oo][Uu][Tt][Hh]\)\{0,1\}[[:space:]]*$/room0/
# No '|' in BREs, so need two expressions for 'se' and 'southeast'
s/^[[:space:]]*\(go[[:space:]]\)\{0,1\}[[:space:]]*[Ss][Ee][[:space:]]*$/room2/
s/^[[:space:]]*\(go[[:space:]]\)\{0,1\}[[:space:]]*[Ss][Oo][Uu][Tt][Hh][Ee][Aa][Ss][Tt][[:space:]]*$/room2/
t copyend
b badend
# South goes back to room0, North goes to room2
: room1
x
# Matches any line containing only the word "look"
/^[[:space:]]*look[[:space:]]*$/{
i\
West of House\
You are standing in an open field west of a white house, with a boarded\
front door.\
There is a small mailbox here.
b end
}
s/^[[:space:]]*\(go[[:space:]]\)\{0,1\}[[:space:]]*[Nn]\([Oo][Rr][Tt][Hh]\)\{0,1\}[[:space:]]*$/room2/
s/^[[:space:]]*\(go[[:space:]]\)\{0,1\}[[:space:]]*[Ss]\([Oo][Uu][Tt][Hh]\)\{0,1\}[[:space:]]*$/room0/
t copyend
b badend
# East wins and quits, West goes to room0, South goes to room1
: room2
x
/^[[:space:]]*look[[:space:]]*$/{
i\
Stone Barrow\
You are standing in front of a massive barrow of stone. In the east face is a\
huge stone door which is open. You cannot see into the dark of the tomb.
b end
}
# delete input so not printed when quitting
s/^[[:space:]]*\(go[[:space:]]\)\{0,1\}[[:space:]]*[Ee]\([Aa][Ss][Tt]\)\{0,1\}[[:space:]]*$//
t win
s/^[[:space:]]*\(go[[:space:]]\)\{0,1\}[[:space:]]*[Ww]\([Ee][Ss][Tt]\)\{0,1\}[[:space:]]*$/room0/
s/^[[:space:]]*\(go[[:space:]]\)\{0,1\}[[:space:]]*[Ss]\([Oo][Uu][Tt][Hh]\)\{0,1\}[[:space:]]*$/room1/
t copyend
b badend
: win
i\
You win!
# d starts a new cycle, so is not good for deleting pattern space and quitting
# there will be an extra newline printed out at the end if -n not used
q
: badend
# assumes all unknown commands are directions, for brevity
# strips off leading "go", prints out rest
# does nothing if there is no input
/./s/\(^[[:space:]]*go[[:space:]]*\)\{0,1\}\(.*\)/There is no exit to the \2/p
b end
: copyend
# h replaces the hold space with the contents of the pattern space
h
: end
# delete whatever is left in the pattern space so it is not printed
d
Interaction with this little script may look something like this (the lines
‘look’, ‘go southeast’, ‘look’, and ‘e’ are user input; the rest is the
script’s output):
$ sed -f adventure.sed
look
Maze
You are in a maze of twisty little passages, all alike
go southeast
look
Stone Barrow
You are standing in front of a massive barrow of stone. In the east face is a
huge stone door which is open. You cannot see into the dark of the tomb.
e
You win!
$
Another command that is useful to know in sed is ‘N’,
which reads another line of input and appends it to the pattern space.
This can be used to match multi-line expressions. However,
considerations must be made for lines of data read in unusual
contexts.
# replace all 'one\ntwo' with 'three'
: begin
N
s/one\ntwo/three/
t
# if line read has 'one\n', strip out second line so first can be
# output. Restore 'one\n' line from hold space and read next line
h
s/\(.*\n\).*one$/\1/
t again
b
: again
s/\n//
p
x
s/.*\n\(.*one\)$/\1/
b begin
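A more minimal illustration of ‘N’, keeping the input to an even number of lines to sidestep the end-of-input differences between sed implementations (a sketch):

```shell
# N joins 'one' and 'two' into a single pattern space, where the
# embedded newline can then be matched as \n
printf 'one\ntwo\n' | sed 'N;s/one\ntwo/three/'
# three
```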
Several other commands exist in sed, and are described in the GNU
sed info and man pages, among other places.
4. Awk, that other shell utility thing
Awk is named after its creators, Alfred V. Aho, Peter J. Weinberger, and
Brian W. Kernighan. According to the gawk (GNU awk) info page, awk programs
are "refreshingly easy to read and write" compared to programs written in
traditional procedural languages.
Awk is described as
"data-driven", in that rather than a list of commands to perform on
data, awk programs are a description of data and actions to take based
on which descriptions are matched.
Awk programs consist of a set of rules of the form ‘PATTERN { ACTION }’,
where the pattern can be ‘BEGIN’, ‘END’, an extended regular expression
enclosed in /’s, an awk expression that is matched if it evaluates to a
non-zero value, nothing at all (which matches everything), or a range given
as ‘PATTERN1, PATTERN2’. The pattern ranges are unlike those in sed in that
they may be repeated; after the end of a range is found, the beginning may be
matched again.
Most of your interaction with awk will probably be with a small subset of its
features. The most commonly used awk command is ‘print’, usually used in
conjunction with awk’s field separation features. awk '{print $4}' would
print the fourth field of every line of input, so if you were to, for
example, run the output of ls -l through this tiny program, awk would spit
out a big list of group names.
Of course, awk is much more than a fancy cut.
4.1. Separation of Data
Awk views input as a sequence of records, and it views records as a
collection of fields. By default, each line of input is a record, and each
portion of a record separated by whitespace is a field. The behavior for
records can be changed by setting the RS variable. RS is a single character
(or the empty string, in which case records are separated by blank lines),
and by default is ‘\n’.
Field separation can similarly be modified by setting the FS variable. An
initial value can be given to FS on the command line with the -F flag.
Depending on its contents, FS can be interpreted in three different ways. By
default, FS contains a single space (‘ ’), which means that leading and
trailing whitespace is ignored, and fields are separated by any number of
spaces or tabs. If FS contains a single character, the behavior is more like
that of cut, in that each occurrence of the FS character will start a new
field, and adjacent FS characters delimit an empty field.
awk -F : '{print $3}' /etc/passwd would print out the UID of every user, and
is equivalent to cut -d : -f 3 /etc/passwd.
The third mode for FS is when it contains more than one character, in which
case it is interpreted as an extended regular expression. Field separators
are then matched starting from the left, using the longest possible non-empty
string. The fields are whatever is left in between.
Fields can be accessed using the $1, $2, … variables, and the entire record
can be accessed using $0.
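A couple of quick sketches of field access and FS:

```shell
# default FS: any run of blanks separates fields, leading blanks are ignored
echo '  alpha   beta  gamma ' | awk '{print $2}'
# beta

# multi-character FS is treated as an ERE: here, split on runs of digits
echo 'a12b345c' | awk -F '[0-9]+' '{print $1 $2 $3}'
# abc
```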
4.2. Examples
Rather than
cover every detail of awk syntax, which would be rather long and
boring, I’ll just go over a few examples. If you want to learn more
about awk, the gawk man and info pages have a complete description of
what awk can do.
Suppose you have a directory listing from ls -l, and you want to know exactly
how many bytes are being used by the files. Awk can do this, simply by taking
the sum of the 5th fields of each record (which in the case of ls, would be
the file length). Recall that ls -l output looks something like this:
total 28
drwx------ 2 david users  4096 2003-05-19 22:31 directory/
-rw------- 1 david users    11 2003-05-19 22:31 file1
-rw------- 1 david users 17138 2003-05-19 22:31 files
Variables can be treated as either
strings or numbers, depending on the context, and conversions are made
automatically, so we can use $5 in a sum simply by adding to it. If
something like $4 were used in a sum instead, it would be converted to
0.
So we can take the sum of the lengths with the following:
BEGIN { total = 0 }
{ total = total + $5 }   # 'total += $5' would also work
END { print total }
So just assigning to a variable will cause it to spring into being. The BEGIN
statement could be omitted, since the value for an empty numeric variable is
zero (this can also be seen as unassigned variables being equal to the empty
string, and the empty string being converted to 0 when used as a number). For
the first record (total 28), $5 is also equal to 0, since the fifth field of
this record is empty.
Also, note that
variables are referenced only by their name, instead of "$name" as in
Bourne shell and some other scripting languages. If $total were used
instead of total, awk would take the current value of total as a
number, and then try to interpret that as a field number.
The above program may still not be what you want, since directories are
included in the sum as well. Those can be easily eliminated through pattern
matching.
/^-/ { total += $5 }
END { print total }
This will only add the file’s length to the sum if the record begins with
‘-’, which would mean it is a regular file. Similarly, a pattern of ‘!/^d/’
would only omit directories.
Patterns such as this can become cumbersome if only a specific field matters,
so matches may be made based only on a particular field. Suppose you want
only the files owned by the user root.
$3 ~ /^root$/ { total += $5 }
END { print total }
The ~ operator will result in true if the awk expression on the left matches
the regular expression on the right. !~ can be used for the opposite.
And for one last example, let’s throw in some numeric and string tests. Same
situation as before, but now we only want to consider the length of the file
if it is greater than 1024, but not if the user’s name is more than 5
characters long.
($5 > 1024) && (length($3) <= 5) { total += $5 }
END { print total }
5. Further Reading
May 14, 2003 / Kurt Nelson
1. Introduction
Bogofilter is a nice UCE filtering tool that uses Bayesian statistics
to track messages and learn what to detect. Bogofilter was originally
written by Eric S. Raymond.
Some sites of interest:
The above sites include the Bogofilter home page and a couple of sites
that discuss Bayesian statistics. Caution, the sites contain math,
which may or may not be desirable in your case. If you prefer, just
consider Bayesian statistical calculations "magic". That’s what I do.
2. Why use Bogofilter?
There are now several products available that do what Bogofilter does.
SpamAssassin and SpamBayes are two popular ones. So why choose
Bogofilter? Bogofilter is written in C, which means it is slightly
more robust when it comes to execution. Other filters are written in
Perl and Python, which offer other advantages, but speed usually isn’t
one of them.
I like Bogofilter simply because it’s small and doesn’t rely on a lot
of external support software.
3. Installation
Bogofilter is super easy to install. The project appears to be
offering RPM packages now. If that floats your boat, download the
package and you’ll be up and running.
If you suffer from the Not Compiled Here problem like me, grab the
source and compile and install it:
gzip -dc bogofilter-0.10.3.1.tar.gz | tar -xvf -
cd bogofilter-0.10.3.1
./configure --prefix=/usr/local
make
make install
For those that like to compile things, but the above steps appear
scary, grab the source RPM and let RPM compile it for you.
4. Seeding the Filter
Bogofilter scores spam and stores the results in two databases: the
good list and the spam list. These are BerkDB files that grow as you
use bogofilter. You must seed bogofilter for it to be useful. There
are several ways to do this. Manually can be painful. Getting a DB
dump from another bogofilter user is handy. If you get your hands on
other DB files, you need to dump them to text first and then load them
on your system:
# On the source machine
bogoutil -d goodlist.db > goodlist.txt
bogoutil -d spamlist.db > spamlist.txt
# On your machine
cat goodlist.txt | bogoutil -l goodlist.db
cat spamlist.txt | bogoutil -l spamlist.db
Using another set of data for your seed may or may not be a good idea.
Be sure to think about this before doing it. Ideally you should seed
your particular bogofilter installation with UCE that you have
received. To seed bogofilter by hand, take your mbox file (or
collection of email files) and pipe them through bogofilter with the
-s option if it is spam, -n if it is not spam. The formail(1) tool
is handy for doing this.
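For instance, something like the following would register an existing mbox of known spam (a sketch; it assumes formail and bogofilter are on your PATH, and spam.mbox is a hypothetical archive of messages you know to be spam):

```shell
# formail -s splits an mbox and runs the given command once per message;
# bogofilter -s registers each message as spam (-n would register non-spam)
formail -s bogofilter -s < spam.mbox
```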
5. Wrapper Script
I use a wrapper script to invoke Bogofilter which currently just forces
the configuration path. At one point in time, it was forcing some
other settings. I still use it, and it is simply:
#!/bin/sh
/usr/local/bin/bogofilter -d /usr/local/etc/bogofilter/ "$@"
exit $?
The script is root:root and 0755.
6. Procmail Modifications
Bogofilter hooks in to procmail with ease. The man page for bogofilter
gives a good procmailrc example. Here’s what I do:
VERBOSE=yes
LOGDIR=$HOME/.procmail
LOGFILE=$LOGDIR/log
# Scan for spam
:0fw
| /usr/local/bin/spamfilter -u -e -p
# Return mail to queue on bogofilter failure
:0e
{ EXITCODE=75 HOST }
# Place in SPAM mbox if it's spam
:0:
* ^X-Bogosity: Yes, tests=bogofilter
SPAM
7. Mailer Modifications
The man page provides some macros for Mutt that let you handle UCE that
bogofilter didn’t catch. I have Mutt configured so that if I hit Esc-Del, the
message is forced through bogofilter flagged as spam. Pressing just Del will
delete the message.
8. Global Installation
Global installation can be done several ways. No special steps are
required other than just installing hooks in the global procmailrc
file. If you want users to be able to train bogofilter with spam that
wasn’t caught, you will need to make bogofilter setuid root or create a
user and/or group that bogofilter runs as and change the database files
to that user/group.
I recommend that you install bogofilter under your account only rather
than globally.
9. Upgrading Bogofilter
From time to time you will want to download and install a new version
of bogofilter. The authors make upgrades easy with the bogoupgrade
command. This command upgrades your data files. You still need to
compile and install the new version, but they always provide a tool to
upgrade the BerkDB files. Be sure to check the man page for details.
10. Resources
May 8, 2003 / Kurt Nelson
1. Introduction
This is a quick and dirty introduction to Mon, a server monitoring tool that can
be used to monitor any number of services running on any number of servers. Mon
is useful for a system administrator who needs to be notified as soon as a
server or network resource goes down, so that he can respond immediately and
have as little downtime as possible.
Mon’s strategy is to be highly modular, thereby allowing you to write whatever
monitors (programs that check a service’s status) and alerts (programs that let
you know when a service has gone down or come up) tickle your fancy.
Don’t worry if you don’t want to or know how to write such scripts, though,
because the likelihood is that someone has already contributed the monitor or
alert script you’re looking for. For the purposes of this introduction, I will
assume we don’t need to write our own. Otherwise I’d need to spend more than 30
minutes writing this presentation! Besides, if you’re at that level already, you
can just read the mon manpage and figure it out for yourself.
Mon can also listen as a server daemon on a particular port, which allows other
computers running some form of the mon client to contact the mon server when
something running on it is down, so that mon can do whatever it has to do to let
you know. This feature is called event trapping (or "traps" for short). This is
also beyond the scope of this presentation, but is not too difficult to
implement.
2. Getting Mon
The first thing you’ll need to do is download mon. You can find it at:
http://www.kernel.org/software/mon/
Once you have gotten the mon tarball, follow these steps:
- untar it to /usr/local/lib/mon (trust me, it kind of assumes that you will be
putting it there).
- Move the contents of the etc/ directory into /etc/mon
- mkdir /var/state/mon for the mon state information
- touch /var/state/mon/disabled
Now you’ll want to download all the user-contributed programs. Download them
all into /usr/local/lib/mon and untar them into the directories they untar
into. Easy enough. Anyway, once they’re untarred, and you’re still in
/usr/local/lib/mon, you should move the resulting files as follows:
# mv monitors/*/* mon.d
# mv alerts/*/* alert.d
3. Configuring Mon
Now you’re ready to copy /etc/mon/example.cf to /etc/mon/mon.cf and edit
/etc/mon/mon.cf. This file is laid out as follows:
- Global options
- Hostgroup definitions (assigning names to sets of hosts)
- Watch definitions (defining what will be monitored for each host group)
Each watch definition consists of any number of service definitions. A service
definition defines one service type that you will be checking on the current
host group.
Each service definition consists of:
- Various service options, including the frequency with which to check, the
monitor program to use for the check, and a description string for the
service
- One or more period definitions that dictate how to behave if the monitors
fail during various times of the day or week
Each of these period definitions consists of various options such as what alert
and upalert programs to use, and with what options, as well as options that
dictate how frequently to notify you if the service remains down, or how many
failures must occur before the alert is sent.
The cool thing is that instead of using /etc/mon/mon.cf, you can call it
/etc/mon/mon.m4 (and make sure to start mon with the "-c /etc/mon/mon.m4"
option), and mon will process the file with m4 before processing the mon
directives in the file. This is useful for DEFINEs, so you can keep from
having to write the same email address, pager number, or time interval over
and over throughout the file, and instead use the DEFINEd version. See the
included example.m4 for a good example of this.
4. User contributed scripts
There are so many user-contributed scripts that I figured I’d list them here
so you could see all of them.
The monitors:
asyncreboot.monitor
bootp.monitor
cpqhealth.monitor
dialin.monitor
dir_file_age.monitor
dns.monitor
file_change.monitor
flexlm.monitor
foundry-chassis.monitor
fping.monitor
freespace.monitor
ftp.monitor
hpnp.monitor
http.monitor
http_integrity.monitor
http_t.monitor
http_tp.monitor
http_tpp.monitor
https.monitor
icecast.monitor
imap.monitor
informix.monitor
informixdbspace.monitor
ipsec.monitor
ldap.monitor
lwp-http-post.monitor
mailloop.monitor
mon.monitor
msql-mysql.monitor
na_quota.monitor
netappfree.monitor
netsnmp-exec.monitor
netsnmp-freespace.monitor
netsnmp-proc.monitor
nntp.monitor
ntp.monitor
ntservice.monitor
phttp.monitor
ping.monitor
pop3.monitor
postgresql.monitor
printmib.monitor
process-full-command-line.monitor
process.monitor
radius.monitor
rd.monitor
reboot.monitor
remote.monitor
rpc.monitor
rptr.monitor
samba.monitor
seq.monitor
silkworm.monitor
smtp.monitor
smtp3.monitor
smtp_rt.monitor
snmp_interface.monitor
sqlconn.monitor
ssh.monitor
startremote.monitor
tcp.monitor
tcpch.monitor
telnet.monitor
traceroute.monitor
umn_mon.monitor
up_rtt.monitor
xedia-ipsec-tunnel.monitor
The alerts:
bugzilla.alert
file.alert
gnats.alert
hpov.alert
mail.alert
netpage.alert
qpage.alert
remote.alert
simplepage.alert
sms.alert
snapdelete.alert
snpp.alert
test.alert
trap.alert
winpopup.alert
Additionally, there is a cgi program in the cgi-bin package to create a
status webpage that the average joe can grok (and even use to modify mon’s
parameters while it’s running). There is also a GUI configuration utility in
the utils package, but it uses Perl/Tk (yuck) and is not too good anyway,
though it is worth a try.
Anyway, I have about -3 minutes left to write this, so here comes the example
configuration, and you can probably find whatever else you need in the manpage!
5. Example configuration
Download
this example configuration.
#####################################
# Global Options
#
basedir         = /usr/local/lib/mon
alertdir        = alert.d
mondir          = mon.d
cfbasedir       = /usr/local/lib/mon/etc
dep_behavior    = m
dep_recur_limit = 10
dtlogging       = no
histlength      = 100
logdir          = /var/log/mon
maxprocs        = 20
pidfile         = /var/run/mon.pid
randstart       = 60s
#####################################
# Host Groups
#
hostgroup dns 66.20.234.14 66.20.234.15
hostgroup ftp ftp linux2 craq01 archer
hostgroup nntp news
hostgroup pop3 mail
hostgroup smtp mxqmail2 mxqmail1 mail craq01
watch dns
service dns
description Check DNS services
interval 1m
monitor dns.monitor -zone speedfactory.net -master ns.speedfactory.net
period wd {Sun-Sat}
alert mail.alert moshe@speedfactory.net
upalert mail.alert moshe@speedfactory.net
alertevery 30m
watch ftp
service ftp
description Check FTP servers
interval 5m
monitor ftp.monitor -p 21 -t 20
period wd {Sun-Sat}
alert mail.alert moshe@speedfactory.net
upalert mail.alert moshe@speedfactory.net
alertafter 2
watch nntp
service nntp
description Check that news server is up
interval 1m
monitor nntp.monitor -p 119
period wd {Sun-Sat}
alert mail.alert moshe@speedfactory.net
upalert mail.alert moshe@speedfactory.net
alertevery 30m
watch pop3
service pop3
description Check that the pop3 server is working
interval 1m
monitor pop3.monitor -p 110 -t 20
period wd {Sun-Sat}
alert mail.alert moshe@speedfactory.net
upalert mail.alert moshe@speedfactory.net
alertevery 30m
watch smtp
service smtp
description Check mail sending
interval 1m
monitor smtp.monitor -p 25 -t 20
period wd {Sun-Sat}
alert mail.alert moshe@speedfactory.net
upalert mail.alert moshe@speedfactory.net
alertevery 30m
alertafter 2
watch routers
service ping
description Ping our routers
interval 1m
monitor fping.monitor
period wd {Sun-Sat}
alert mail.alert moshe@speedfactory.net
upalert mail.alert moshe@speedfactory.net
alertevery 15m
alertafter 2
6. Resources
April 9, 2003 / Kurt Nelson / 0 Comments
Table of Contents
1. Why encrypt?
As more and more communication moves into the digital realm, privacy tends to erode in favor of convenience, even though the expectation of privacy often remains. Just as one does not expect a physical letter to be read by every postal worker through whose hands it passes, an email isn’t expected to be read before it reaches its destination. Unfortunately, this often is not the case. In the United States efforts are continually being made to remove privacy from electronic mail in the name of crime prevention through devices such as Carnivore and laws like the Patriot Act. Even for those with nothing in particular to hide, privacy in conventional email is quickly dwindling. Encryption, even if privacy and security are not entirely necessary, helps privacy to seem normal again.
A form of encryption, known as signatures, can be useful even for times when the information itself is not private by confirming the sender of a message. Signatures are also often used in software distribution to verify the integrity of an archive.
2. A little history
PGP (Pretty Good Privacy) was written by Phil Zimmermann, with the original intent of aiding political activists and human rights organizations in keeping their communications away from the eyes of government organizations. In 1991, before PGP had been released, Senate Bill 266 was drafted and included a measure that would require all telecommunication companies to provide the government access to any communication in unencrypted form, which would have effectively made software like PGP illegal. In order to subvert this measure before it became law, PGP was rushed out the door and quickly posted to several USENET groups and BBSs across the country, and from there it went on to become the most widely used encryption tool in the world.
Since PGP used encryption that the government classified as ‘military strength’, and since it was obviously in use outside of the United States, Phil Zimmermann was charged with violating restrictions on the export of cryptography software. In order to protect the continued availability of PGP, the source was published in book form in 1995 (ISBN 0-262-24039-4). The case was dropped in 1996.
In 1997 the Free Software Foundation, liking things to be more free than the average free thing, released the GNU Privacy Guard (GnuPG), which was intended to be a completely free (GPL) implementation of the OpenPGP standard (RFC 2440) and use no patented algorithms.
3. Overview
GnuPG uses public key cryptography in the encryption of its messages. In public key cryptography, every key is actually a key-pair: the ‘public’ key, which is distributed to the world, and the ‘private’ key, which is kept secret. Messages are encrypted using the recipient’s public key, and can then only be decrypted using the private key. Since the public key does not need to be kept secure, the need for a secure channel to exchange keys is eliminated, but the need for some means to verify keys remains.
So, let’s say that Jim wants to send an email to his friend John, who lives down the hall, but he wants to encrypt it, so that his roommate, Bill, won’t be able to read it. After typing his email, he encrypts it using John’s public key, and then sends it on its merry way. John, upon receiving this email, decrypts it using his private key, and then reads whatever dumb thing Jim had to say.
In this example, Bill would be unable to read the email without walking down the hall and looking over John’s shoulder, but there is no way to verify that Jim was the one who sent the email. Since everyone should be able to get a copy of John’s public key, Bill could have written an email pretending to be Jim, encrypted it, and sent it to John. A solution to this problem is signatures. After Jim types his email, instead of encrypting it right away, he creates a one-way digest of the email, encrypts this digest with his private key, appends the encrypted block to the end of the email, encrypts the whole thing with John’s public key, and then hands it off to the carrier pigeons. The signature block can be decrypted with Jim’s public key, and then the digests can be compared to ensure that the signature matches the message. Since only Jim has a copy of his private key, only Jim can create this signature, so it ensures the identity of the sender.
In this example, key verification is simple, Jim and John both know each other and can meet in person to exchange keys. But suppose there’s another person at the far end of the hall, Bob, and neither Jim nor Bob wants to walk all the way down the hall to the other. However, they both know John, and have exchanged keys with John. Instead of leaving their computers and moving, they can use John to verify keys. After exchanging keys with both Jim and Bob, John signed them using his key. So now Bob can simply email a copy of his public key with John’s signature to Jim, and Jim can verify the key using the signature and John’s public key. If Bill were to intercept the email and send a fake key to Jim instead, then the signature would be missing.
So now Jim has a copy of Bob’s public key that he knows to be valid, even though he has never met Bob in person. Jim can now add his signature to Bob’s key, effectively telling all people who trust Jim that this is a valid copy of Bob’s public key. If Jim trusts Bob to have the sense to securely verify other keys, then Jim can also use Bob’s key to verify more public keys that Bob claims are valid. Using this method, a web of trust can be built so that a large number of people can exchange keys securely without going through the trouble of actually meeting each other.
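In gpg terms, the exchange sketched above boils down to a few commands (the email addresses and file name here are placeholders):

```shell
# John signs Bob's key after verifying it in person
gpg --sign-key bob@example.com

# Bob exports his key, now carrying John's signature
gpg --armor --export bob@example.com > bob.asc

# Jim imports the key and inspects the signatures on it
gpg --import bob.asc
gpg --check-sigs bob@example.com
```

If John's signature checks out and Jim trusts John, gpg will consider Bob's key valid.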
4. How to get started
Note that gpg should be installed setuid root. Gpg uses the root capabilities to mlock pages that contain unencrypted private keys, preventing them from ever being written unencrypted to swap on your hard drive.
So you’ve installed a copy of GnuPG and you’re ready to get started. First you’ll need a key. In order to be difficult to guess, gpg needs a big pile of random data to use in the key generation. In Linux, this data comes from /dev/random, which may run empty during key generation. This may be a good time for a kernel compile and a game of pysol.
To generate a keypair, run
gpg --gen-key
gpg will then ask you several questions about what sort of key you want. The default type and keysize are most likely what you want. Larger keys may be more secure, since they should be more difficult to guess, but are more often simply a waste, since after 2048 bits or so, the digest and encryption algorithms become the weaker links.
Once you’ve configured your name and email address to use in the key id, gpg will ask you for a passphrase. A passphrase is sort of like a password, but longer. Your private key will be encrypted using this passphrase, so you want it to be secure, but not so long that typing it would become a nuisance.
So now you have a keypair. An easy way to make it quickly available to the world is through the magic of keyservers. There are a wide variety of keyservers to select from, most of which mirror each other. I use pgp.mit.edu. To upload your key to a keyserver, run
gpg --keyserver pgp.mit.edu --send-keys "your name"
If you prefer to send your key by hand, you can use gpg --armor --export
instead to have your key dumped to stdout. The --armor option, for ASCII-armored output, is important. ‘Armor’ is a bit misleading, since it isn’t really any more secure; it’s just radix-64 encoded. Without it you’ll end up with a big wad of binary data dumped to your terminal.
Keyservers are a convenient way to send and receive keys, but still no substitute for verification. Keyservers do not authenticate or verify keys, leaving this task up to the users. They do save you the trouble of carrying a whole key around in your wallet to trade with people, since other people now have something that they think is your key, and that can be verified using the fingerprint. The fingerprint is a 160-bit secure one-way digest of the key, and a bit more manageable than a 1024-bit random number.
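Fetching and verifying someone else's key might then look like this (the key ID is a placeholder):

```shell
# Fetch a key from the keyserver (key ID is hypothetical)
gpg --keyserver pgp.mit.edu --recv-keys 0xDEADBEEF

# Print the fingerprint, to be compared over the phone or in person
gpg --fingerprint 0xDEADBEEF
```

Only after the fingerprints match out-of-band should you consider the key verified.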
5. Encrypting and signing your email
Many email clients, such as mutt, include support to encrypt, decode, sign, and verify messages using gpg. Mutt is not configured to use gpg by default, but comes with a gpg.rc example file which contains the needed options.
If properly configured, your email client will be able to encrypt and sign emails before sending (‘P’ in mutt) and decrypt and verify incoming email, asking for a passphrase as needed. Basically the only things to consider are how long, or if at all, the client should cache a passphrase if it is capable of doing so, and, if a copy of outgoing email is kept, whether it should be saved in encrypted form. With mutt, the fcc_clear option will save the email unencrypted and unsigned. Encrypted emails can be saved, but you will need to add yourself to the recipient list if you ever plan to read them again. This can be done either by adding --encrypt-to <name> to the gpg encrypt command, or by placing the same, without dashes, in ~/.gnupg/options. Other long options can also be automatically used in this way.
6. Encrypting and signing other random bits of your hard drive
Encryption can be done using
gpg --encrypt --recipient name file
Multiple recipients can be given. gpg will then spit out file.gpg, or file.asc if you requested ASCII output.
Signatures can be created either as a separate file, or as part of the data itself. Detached signatures are usually more useful, since the original data is available without using gpg, and can be created with
gpg --detach-sign
Inline signatures can be created with --sign (the data and signature are saved to file.gpg or file.asc), and may be combined with --encrypt.
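Putting those together, a detached signature can be created and later verified like so (the file name is arbitrary):

```shell
# Create an ASCII-armored detached signature (writes file.txt.asc)
gpg --armor --detach-sign file.txt

# Verify the signature against the original data
gpg --verify file.txt.asc file.txt

# Or sign and encrypt in one pass
gpg --sign --encrypt --recipient john@example.com file.txt
```

The original file.txt remains usable by anyone; only checking the signature requires gpg.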
Decrypting and verifying usually requires no options, since gpg can figure out what to do from the file.
7. Key management
Most key management can be done from the interactive key edit menu, accessible through
gpg --edit-key name
Since it’s important to know how useful a key is in verifying other keys, gpg keeps a trust database that contains information on how much you trust other users to correctly sign keys (the "ownertrust" value), and from this calculates a key’s validity. You should only sign a key yourself if you are absolutely certain of the validity of the owner, so many of the keys that you consider valid for your own security needs may not be signed by you.
gpg uses four degrees of ownertrust: not at all, marginally, fully, and ultimately. Multiple marginally trusted signatures are usually needed to consider a key valid (the default is 3, tunable with --marginals-needed), and by default only one fully trusted signature is required (tunable with --completes-needed). Ultimately trusted keys are always considered valid signers, and ultimate trust is usually used only for your own key.
If a key becomes compromised, you need to revoke it, using a revocation certificate. The certificate can be generated using
gpg --gen-revoke name
which creates a certificate signed with the private key. This certificate can then be applied to the public key and redistributed, telling everyone that the key is no longer valid.
You may want to generate this certificate after creating the key and keep it in a safe place, in case you lose your private key or forget your passphrase. Care should be taken in the security of the revocation certificate, since anyone who gets a copy of it can invalidate your public key.
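A sketch of generating the certificate ahead of time, and applying it later if needed (the output file name is arbitrary):

```shell
# Generate a revocation certificate and stash it somewhere safe
gpg --output revoke.asc --gen-revoke "your name"

# Later, if the key is compromised or lost: apply it and redistribute
gpg --import revoke.asc
gpg --keyserver pgp.mit.edu --send-keys "your name"
```

Once the revoked key reaches the keyservers, anyone refreshing your key will see that it is no longer valid.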
8. Resources
February 23, 2003 / Kurt Nelson / 0 Comments
Table of Contents
1. Introduction
Setting up and hosting your own domain name is easy! This document will
hopefully give you all the information you need to set one up using BIND 9.
2. The Domain Name System
If you already know how DNS works, you can skip this section. It is intended
as a brief introduction to the complex behavior of the DNS system in general.
When you perform a DNS query, you are attempting to find an IP address
associated with a particular domain name. You as an end user generally have
two DNS servers that you always use to perform lookups. These are generally
run by either you or your Internet access provider, and in most Unix-like
platforms, are listed in /etc/resolv.conf.
When you send a query to your local nameserver, it first looks to see if it is
hosting the domain about which you’re requesting information. If it is, it
simply looks up the information locally and sends the response to you.
If it doesn’t host that domain, it looks to see if it has recently answered a
query for the same record as you are requesting. If it has, and the
Time-To-Live on the cached record data has not yet expired (meaning the record
is still current enough to be accurate), it then just returns the answer from
its cached data.
If it doesn’t have the data cached, and it is set up as a recursive nameserver
(one that will perform the subsequent requests on your behalf instead of
making you do it), it must fetch an authoritative answer. Since your
nameserver has no way of knowing on its own which server is authoritative
for a particular zone, that information must be maintained centrally. The
servers that store this
information are known as the "root" nameservers. There are currently 13 of
them, with DNS names of A.ROOT-SERVERS.NET through M.ROOT-SERVERS.NET. If
these servers were all to go down, the Internet would become practically
unusable.
Once your local DNS server has queried a root nameserver for the requested
domain’s authority information, it is given a (short) list of servers that
will be able to return authoritative data for the domain under which your
requested record resides.
It then sends a query to the authoritative servers in turn until it receives
an answer. It then relays the answer to you.
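You can watch this whole walk happen yourself with dig’s +trace option, which skips your local cache, starts at the root servers, and follows the referrals down (the hostname here is a placeholder):

```shell
# Follow delegations from the root down to the authoritative answer
dig +trace www.example.com A

# For comparison: ask your configured resolver, which answers from
# its cache when it can
dig www.example.com A
```

The +trace output shows each referral step: the root servers, then the TLD servers, then the domain's own authoritative servers.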
The highly distributed nature of DNS can cause many headaches when you need to
make any changes to a domain’s authoritative servers or any of its records.
Changes can take days to propagate, and knowing how to minimize your downtime
is very important.
3. DNS Hosting Requirements
If you wish to host your own domain name, there are some prerequisites that
must be satisfied.
First, your DNS servers must have completely static IP addresses. Any change
to your IP address will make your domain name completely useless.
Second, you must have one server to function as the primary nameserver, and at
least one separate machine to host the secondary nameserver(s). These are the
servers that will answer DNS queries for your domain in the event that your
primary DNS server is unavailable.
Usually, one of your friends will be willing to be the secondary nameserver
for your domain in return for the same from you. If you don’t have any
friends, you can use www.secondary.com which will be your backup name server
for free.
4. Setting it all up
The steps involved in setting up your own domain name are as follows:
4.1. Register your domain name with a domain name registrar.
You can use any registrar such as http://netsol.com/,
http://godaddy.com/,
http://gandi.net/, or http://joker.com/. The domain name will cost you
between $8 and $20 per year depending on where you register it.
Some features to look for with a registrar are the ability to make ALL
changes online, especially your administrative and technical contact
information and the name servers registered as authoritative for your
domain.
4.2. Register your DNS servers with your registrar, if necessary.
Many registrars require that you register your name servers’ IP addresses
with them before they allow you to use them for your domain. Follow the
instructions on their website to set these up.
You may need to have the DNS server already running on the IP address
before registering it with them, but in most cases, you can just do it
right now. If they request a name for the DNS server, you should use
something like ns1.yourdomain.com and ns2.yourdomain.com.
4.3. Register your DNS servers as authoritative for your domain.
Once you’ve registered the DNS servers, you will need to edit your domain
information and change the name servers listed there to ns1.yourdomain.com
and ns2.yourdomain.com (assuming that’s what you named them).
It will now take between 24 and 48 hours for this new domain to be
propagated to the root nameservers, so you will only be able to use it
from your own machine for a while, provided you have set your own
machine’s primary DNS server to 127.0.0.1 in /etc/resolv.conf.
4.4. Install BIND on your host(s) and perform the basic configuration.
I won’t go into the specifics of how to perform the basic installation
(it’s pretty much a straight ./configure && make && make install).
You can download the BIND 9 tarball here:
http://isc.org/products/BIND/bind9.html
If you don’t like building from a tarball, you can probably find a package
of BIND made just for your Linux distribution.
4.5. Perform basic configuration
Once you have BIND installed, you will need to create /etc/named.conf.
There are some example configurations in the ARM (see Resources section).
I will go over the basic options that you’ll need to get a more-or-less
typical setup working.
The named.conf syntax is very rigid, but pretty straightforward. Nearly
everything is either just a statement followed by a semicolon, e.g.:
file "example.com.db";
or a block-style option, which requires a semicolon after the closing
brace, as well as after each sub-block’s closing brace, and after each
simple statement it contains:
allow-transfer {
192.168.4.14;
192.168.5.53;
};
The most basic configuration, which will probably suit most needs just
fine, starts out with an options block like this:
options {
/* Base dir; where to look when we see a relative pathname */
directory "/var/named";
/* Write a pidfile on startup so we know the pid of named */
pid-file "/var/run/named.pid";
/* Allow anyone to perform DNS queries to this server */
allow-query { any; };
/* Perform recursive queries for clients. This helps build a cache
* of DNS data, so that fewer external requests are generated in
* the future. */
recursion yes;
/* What interfaces for BIND to listen on */
listen-on { 66.23.194.234; 127.0.0.1; };
/* When a change is made to any of the zones for which we are the
* master server, notify the slaves of the update immediately */
notify yes;
};
Now we need to specify the base zone "." which is used when we need to
make external requests.
zone "." in {
type hint;
file "root.hint";
};
The file root.hint is found in /var/named/root.hint, since we specified
/var/named in the directory directive above. The "type hint;" directive
indicates that this zone file is a listing of the root nameservers, which
provide a "hint" to point you to the authoritative server for a domain.
The root.hint file looks like this. You never need to modify it:
.                      3600000 IN NS  A.ROOT-SERVERS.NET.
A.ROOT-SERVERS.NET.    3600000    A   198.41.0.4
.                      3600000    NS  B.ROOT-SERVERS.NET.
B.ROOT-SERVERS.NET.    3600000    A   128.9.0.107
; continue in this fashion...
.                      3600000    NS  L.ROOT-SERVERS.NET.
L.ROOT-SERVERS.NET.    3600000    A   198.32.64.12
.                      3600000    NS  M.ROOT-SERVERS.NET.
M.ROOT-SERVERS.NET.    3600000    A   202.12.27.33
Now we need reverse resolution for the 127.* range of IP addresses. The
following zone will do it:
zone "0.0.127.in-addr.arpa" in {
type master;
file "0.0.127.in-addr.arpa";
allow-update { none; };
};
The backwards "in-addr.arpa" system is used to reverse resolve IP
addresses to domain names. It’s a long story why this is the case, but in
a nutshell, it is so that forward resolvers don’t have to be entirely
rewritten to perform reverse resolution as well.
The 0.0.127.in-addr.arpa zone file is very simple, only listing a record
for 1.0.0.127.in-addr.arpa, or 127.0.0.1 to resolve to "localhost":
$TTL 86400
@       IN SOA  localhost. root.localhost. (
                        1997022700 ; Serial
                        28800      ; Refresh
                        14400      ; Retry
                        3600000    ; Expire
                        86400 )    ; Minimum
        IN NS   localhost.
1       IN PTR  localhost.
I will go over the meaning of these records in more detail when we get to
the section on configuring our zone files.
Finally, let’s set up forward resolution for the "localhost" hostname:
zone "localhost" in {
type master;
file "local";
allow-update { none; };
};
The "local" file is also very simple; it maps "localhost" to 127.0.0.1:
$TTL 21600
@ IN SOA @ root.localhost. (
45
28800
14400
3600000
86400 )
@ IN NS localhost.
@ IN A 127.0.0.1
Finally, we must set up a shared key to allow the use of the "rndc" tool
to control the server. The reason it must use a key is that rndc can be
used from any host on the Internet, so we must be able to verify that it
is authorized, as well as be able to encrypt its data.
Before we can add the shared key section to the named.conf, however, we
must run the rndc-confgen program that comes with BIND. This program
generates the content for /etc/rndc.conf for you automatically, with a
randomly generated key. Its output will look something like this:
# Start of rndc.conf
key "rndc-key" {
algorithm hmac-md5;
secret "OpxZPKpwc5vNOCsD/rz9sw==";
};
options {
default-key "rndc-key";
default-server 127.0.0.1;
default-port 953;
};
# End of rndc.conf
# Use with the following in named.conf, adjusting the allow list as needed:
# key "rndc-key" {
#       algorithm hmac-md5;
#       secret "OpxZPKpwc5vNOCsD/rz9sw==";
# };
#
# controls {
#       inet 127.0.0.1 port 953
#               allow { 127.0.0.1; } keys { "rndc-key"; };
# };
# End of named.conf
In the commented section of this file, it provides you with the exact
blocks you need to place in named.conf to allow rndc to connect to it
using this rndc.conf. Simply copy these lines onto the end of the
named.conf, save it, and exit.
You can now verify the syntax and validity of your config file using the
named-checkconf program that comes with BIND. Make sure everything is
kosher, and you should be ready to progress to the next section! But
first, take a caffeine break 🙂
4.6. Add the new zone to your primary server’s BIND config file (named.conf).
You now need to tell BIND that you want to host the primary DNS for
mydomain.com. This is done by adding a new "zone" block to the named.conf,
as follows:
/* New zone called mydomain.com, of type "in" (Internet) */
zone "mydomain.com" in {
        /* We're the master server for this domain */
        type master;
        /* Store the data in /var/named/mydomain.com.db */
        file "mydomain.com.db";
        /* Who will we allow to transfer the entire domain? Normally, the
         * only IP addresses listed here should be those of the secondary
         * DNS servers for the domain */
        allow-transfer {
                128.61.15.251;   // ns.resnet.gatech.edu
                128.177.209.27;  // secondary.com
        };
};
At this point, you should be done with the named.conf
on the primary DNS
server. You don’t need to do anything else to it. Run named-checkconf to
make sure the file checks out, and you should then be good to go.
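For example (the path shown is the default location used in this article):

```shell
# Prints nothing and exits 0 if the syntax is OK;
# otherwise it reports the offending line
named-checkconf /etc/named.conf
```

A missing semicolon after a closing brace is by far the most common error it catches.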
4.7. Create and populate the new zone file.
Now’s the fun part. This is where you get to specify all the information
about your domain. Assuming you pointed your named.conf at
/var/named/mydomain.com.db for the information pertaining to mydomain.com
(for which you are listed as the primary DNS server), you should now
create this file with your favorite editor, and proceed with me.
All BIND 9 compatible zone files must start out with a default
time-to-live that will be applied to all the resource records in your
domain. The global TTL specification looks like this:
$TTL 86400
This TTL specification says that for every record in your zone that does
not have a specific TTL set for it, that the TTL should be set to 86400
seconds, or 1 day. BIND 8 did not require this statement, but BIND 9 does.
Now we begin with the resource records. Each record specifies something
about the zone, and follows this general format:
RESOURCE_NAME   PROTOCOL   TYPE   VALUE(S)
The resource name is usually a hostname in your domain, or the domain name
itself. The protocol is usually IN. This is short for Internet, and
pretty much all records are of type IN. In fact, if you don’t specify a
protocol, BIND will usually assume type IN. The TYPE is usually one of
the following (there are others, but they are rarely used):
Type  | Description
------+----------------------------------------------------------
SOA   | Start of Authority, some global settings for the zone
NS    | Specifies a Name Server for the domain
MX    | Specifies a Mail eXchanger for the domain
A     | Specifies an IP Address for a particular host/domain
CNAME | Specifies the canonical name for this "nickname" entry
The VALUE varies depending on the TYPE. You will begin to see how this all
works as we go over examples.
The first resource record (RR) that comes in every zone file is the SOA
record. It looks something like this:
mydomain.com.  IN SOA  ns.mydomain.com. hostmaster.mydomain.com. (
                        200209272  ; serial
                        28800      ; refresh
                        14400      ; retry delay
                        86400      ; expire
                        21600      ; default_ttl
                        )
The resource name is "mydomain.com." since we are specifying the SOA record
for this domain, not one of its subdomains. Please note that we must
place a period after the domain name so that BIND doesn’t automatically
append "mydomain.com" to the end of it, resulting in a record for
"mydomain.com.mydomain.com". This is a common beginner’s mistake. Any
fully qualified domain name (FQDN) that is specified _anywhere_ in a zone
file must be followed by a "." or BIND will append the "origin" domain
name to it.
Note that you can also replace any instance of the zone’s base domain
("mydomain.com.") with an @ sign, so that a single zonefile could be used
for multiple zones that are copies of each other, and the @ will be
replaced with the proper domain name for each. So then the SOA record ends
up looking like this:
@  IN SOA  ns.mydomain.com. hostmaster.mydomain.com. (
                        200209272  ; serial
                        28800      ; refresh
                        14400      ; retry delay
                        86400      ; expire
                        21600      ; default_ttl
                        )
The two values after SOA, are, respectively, the primary DNS server name
for this domain (note the period after it), and the email address of the
domain administrator (with a "." instead of the "@").
The serial number is what keeps track of the file’s version. DNS servers
around the world attempt to cache your domain’s information, and will
retrieve updated information from your server only if the serial number
has incremented since last time it checked. Therefore, if you make
changes to your DNS zone, you *must* increment the serial number or it
will not take effect until the number of seconds in the "expire" field has
passed, or until $TTL seconds have passed, whichever comes later.
It is conventional to use some form of the date (e.g. YYYYMMDDNN, where NN
is the number of the revision for that day, so you could revise it up to
100 times in that day). Other people simply like to start their serial
number at 1, and just increment by 1 each time. It’s all a matter of
personal preference.
The refresh, retry delay, and expire are almost always good at their
defaults. The default_ttl is the amount of time that negative responses
for data from your zone should be cached. That is, if someone requests an
IP address for nonexistent.mydomain.com and it doesn’t exist, their DNS
server will cache that negative response for 21600 seconds. This means
that even if you add nonexistent.mydomain.com to your zone within 21600
seconds, they will not notice until that amount of time has passed on
their end.
Now that you’ve specified your SOA record, you need to list the
authoritative name servers for your domain. This will use the NS record
type, and it will look something like this:
@       IN NS   ns.mydomain.com.
        IN NS   ns1.secondary.com.
        IN NS   ns2.secondary.com.
As you can see, we didn’t need to specify the @ for each record, because
if the resource name is omitted, BIND assumes you are still referring to
the last one mentioned.
If you want to receive email at your domain, you must now specify a mail
exchanger. This is the host to which an MTA (e.g. sendmail, qmail) will
send mail when there is mail for someone@mydomain.com. The MX record
specification looks something like this:
@       IN MX   10 mail.mydomain.com.
        IN MX   20 mail.backupmx.com.
The number between the MX and the mail server hostnames is the preference
number. The lower numbered servers are tried first, and if they fail, mail
is sent to a backup MX — one with a higher preference number. This is so
that if the main mail server is down, mail can be sent to another mail
server that will hold the mail until the primary server comes back up.
Now you should specify an A record for the main domain name so when people
go to http://mydomain.com or ping mydomain.com, they get a valid IP
address:
@       IN A    128.61.48.46
At this point, you’ve taken care of the basic settings for the base
domain. Notice that we have used several hostnames we haven’t yet
defined, such as ns.mydomain.com and mail.mydomain.com. We’ll need to add
those domains below.
First, however, let’s set our origin to mydomain.com. so we can be lazy
and specify only hostnames instead of full domain names:
$ORIGIN mydomain.com.
Again, notice the "." after the domain name, as always.
Now we can proceed to specify our hostnames. One rule is that all
nameserver hostnames and mail exchanger hostnames must be defined by A
records, not with CNAME records. It just creates another level of
indirection for lookups, and is against standards.
mail    IN A    128.61.48.46
ns      IN A    128.61.48.46
mybox   IN A    128.61.48.46
lappy   IN A    192.168.0.2
"mybox" is a name we’re giving to the host at 128.61.48.46. We want
128.61.48.46 to show up when you look up the name mybox.mydomain.com.
lappy.mydomain.com will resolve to 192.168.0.2, so it is really only
useful inside my network.
Most domains have www and ftp names associated with them. Assuming we are
hosting our web and ftp service on mybox, we can just make these records CNAMEs, so
there will be less switching around of records if we ever change IP
addresses:
www     CNAME   mybox
ftp     CNAME   mybox
Okay, now say your friend Joe wants to have joe.mydomain.com point to the
same IP address as his website so he can set up some virtual hosting.
Let’s say that www.eatatjoes.com is his normal website, and he wants us to
point joe.mydomain.com there. What we can do to avoid having to change
our record every time he changes his IP address is simply create a CNAME
to point to his domain:
joe     IN  CNAME   www.eatatjoes.com.
Again, we put a period at the end this time; without it, the name would be
interpreted as www.eatatjoes.com.mydomain.com.
Now assume that jill decides she wants to have control of
jill.mydomain.com and all its subdomains; she’s going to run her own name
server for jill.mydomain.com. What we do, then, is delegate the zone to
her with an NS record:
jill    IN  NS  ns.jillsdomain.com.
That way, requests to jill.mydomain.com and *.jill.mydomain.com will all
be redirected to her name server.
All right! It looks like everything is set up and ready to go! You can use
the named-checkzone program distributed with BIND to check the syntax of
your zone (e.g., named-checkzone mydomain.com followed by the path to your
zone file), make any corrections necessary, and then you’ll be ready to
progress to the next step.
4.8. Start named!
Assuming you (or the named installation) created a nonprivileged user
"named" under which to run named, you’re ready to start named as follows:
named -u named
Then we set up our secondary server…
4.9. Add the new zone to your backup server’s named.conf
Now all you’ve gotta do is tell the secondary server that it’s
authoritative as a slave server for mydomain.com. Assuming they’re running
BIND 9, you can accomplish this by adding a block similar to the following
to their named.conf:
zone "mydomain.com" in {
    type slave;
    file "com/mydomain";
    masters {
        128.61.48.46; // ns.mydomain.com
    };
};
On this server, the administrator likes to arrange the domain files in a
different structure, placing them in a hierarchy where each component of
the domain name gets its own subdirectory. So if their "directory" was
specified as /etc/named, then when their named is restarted, it will
transfer the zone from us and place it in /etc/named/com/mydomain.
Now reload the zones on the slave server (you can use "rndc reload") and
it should pull down the zone from your main server.
Congratulations, you’re now hosting your own domain name! Now it’s time to
set up your mail and web servers… But I’m not going to tell you how to do
that here 🙂 That’s for another presentation!
5. Resources
February 19, 2003 / Kurt Nelson / 0 Comments
Table of Contents
1. Introduction
sendmail is one of the most popular SMTP servers available. Other popular
ones include qmail and postfix. People stand behind their choice of mail
server like their choice of editor. I have used qmail and sendmail
extensively, but definitely understand sendmail to the greatest degree.
This presentation is aimed at someone who is interested in setting up
sendmail, but doesn’t understand how it works. I hope we can get some
volunteers to do presentations on other mail servers in the future.
I plan to explain in general how the UNIX mail system is designed to
work, how you can get sendmail up and running, a walk through the
configuration files, and lastly some examples of sendmail configurations.
I am by
no means a sendmail expert, but I understand enough of it to get it
working in several scenarios.
1.1. MTA, MUA, and MDA
The UNIX mail system follows the basic UNIX design principle, that is,
each program really only does one task. The basic components of the mail
system are:
- MTA: mail transport agent (e.g., sendmail)
- MUA: mail user agent (e.g., mutt)
- MDA: mail delivery agent (e.g., procmail)
When you compose a message, you typically do so from within your MUA.
When you send the message, the MUA hands the message to the MTA. The MTA
reads the envelope and directs it to the appropriate system where it is
handled by the MDA for delivery. The MTA is a very important part of the
mail system, so this document is mostly about configuring a popular MTA:
sendmail.
2. sendmail
sendmail is one of the most popular MTA packages available. Other popular
ones include qmail and postfix. Great wars have been fought over which
one is really the best one to use, but in the end it doesn’t really
matter. This document covers sendmail, but I hope future presentations
cover the other MTA choices.
2.1. What and Where
sendmail was written by Eric Allman at UCB for the BSD UNIX operating
system. It has been ported to almost every platform in existence. Most
Linux distributions and commercial UNIX operating systems include sendmail
in one form or another. Linux distributions generally use the latest
releases from sendmail.org, while commercial distributions lag behind
since they like maintaining their own forks of the sendmail source tree.
Whatever system you’re using, if you plan on using sendmail, check to make
sure it is the latest available. If it isn’t, head over to
www.sendmail.org and download the
latest version. Documentation is
included for compiling it on your system.
2.2. How sendmail should run
In the ideal environment, sendmail runs on each machine handling local
mail transport and delivery, as well as talking to a main SMTP node. A
lot of people like this setup, since it allows for mail composition from
almost anywhere. But, unless properly configured, your mail spool ends up
on several machines. The common way to overcome the multiple mail spool
problem is to keep your mail spool on an NFS mount. In my opinion, this
is totally unnecessary. Most people agree and end up with…
2.3. How most people run sendmail
Designate a system as your mail server and configure sendmail to deliver
mail on this system. Mail composition and reading can be handled here.
If remote access is desired, allow POP3 or IMAP access to the mail
spools. This configuration is much more desirable than the other
scenario.
2.4. Check your sendmail version
Sendmail is usually installed as /usr/sbin/sendmail or /usr/lib/sendmail.
Some bizarre systems may even install it as /usr/etc/sendmail. If
sendmail is running on your system, check the version by connecting to the
SMTP port, e.g. with telnet localhost 25; the greeting banner includes the
version string. You can hit telnet’s escape sequence to drop back to the
shell, or simply type ‘quit’ and the server will close the connection.
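If you would rather grab the banner programmatically, here is a minimal sketch using Python’s standard socket module. It assumes an SMTP server is listening on the given host and port; the hostname is illustrative:

```python
import socket

def read_smtp_banner(host, port=25, timeout=5):
    """Connect to an SMTP server and return its greeting line,
    which for sendmail includes the version string."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        banner = sock.makefile("rb").readline().decode("ascii", "replace")
        sock.sendall(b"QUIT\r\n")  # politely end the session
    return banner.strip()

# Example (assumes a mail server running on localhost):
# print(read_smtp_banner("localhost"))
```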
2.5. Installation
Sendmail is pretty easy to build and install by hand. Almost all Linux
distributions include it, except Debian. Check to see if you have it
already or if there is a package available; it will save time.
If you must compile and install it by hand, follow these steps:
- Download the source from ftp.sendmail.org. I reference version 8.12.7
here, which is the latest version available. Be sure to use the latest
version of sendmail available.
wget ftp://ftp.sendmail.org/pub/sendmail/sendmail.8.12.7.tar.gz
- Extract
gzip -dc sendmail.8.12.7.tar.gz | tar -xvf -
- Compile
cd sendmail-8.12.7
# Follow the steps in the INSTALL file, which walk you through compiling
# sendmail and setting up your configuration files.
2.6. Ties with procmail
The sendmail software can be combined with procmail, which results in a
nice system for mail delivery on top of sendmail. Procmail is a topic for
an entirely different presentation. It’s worth noting here because if you
are installing sendmail from scratch, be sure to get procmail. You will
most certainly want it. Moshe can answer your procmail questions at all
hours of the night. That’s why he has a pager.
3. The configuration files
Sendmail reads several different configuration files to figure out what
it should be doing. These files are explained below. This list is not
complete, but covers the most common files you’re likely to encounter.
3.1. Files
3.1.1. sendmail.mc
The sendmail.mc file is your main sendmail configuration file.
Technically the program reads sendmail.cf and not sendmail.mc. Since
sendmail.cf is not modifiable by humans, we write the mc file and run it
through m4 to generate the sendmail.cf file. The configuration elements
you put in your mc are actually m4 macros that get expanded to the real
configuration elements for sendmail.
Why is it done this way? Well, sendmail was designed to be somewhat
like a virtual machine: it doesn’t really know what to do except load
sendmail.cf and "execute" it. So think of sendmail.cf as an embedded
language that drives sendmail. It may seem silly, but that’s because it is. Sendmail
came out of an era where computing resources were much more scarce, so
saving time and making the most of what you had was important. I don’t
care at all about my sendmail.cf file. I only edit my mc file and have it
automatically generate the cf file. I think the mc file should be named
.cf, but it isn’t, so we get this layer of confusion.
3.1.2. aliases
Ever emailed a webmaster@something email address? It’s fairly common.
Try to make a user account with the name ‘webmaster’. It won’t happen;
usernames are traditionally limited to 8 characters. The way we get the
webmaster address is by using an alias. The aliases file maps email
address aliases to something, usually a real user account.
Sendmail doesn’t directly read the aliases file; it reads the
aliases.db file, a BerkDB rendition of the aliases. Again, the
historical reasoning comes into play here. Each time you modify the
aliases file, you need to run newaliases to update the BerkDB
version.
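The idea can be sketched with Python’s standard dbm module, a stand-in for the Berkeley DB format newaliases actually writes; the database file name and alias entries here are made up:

```python
import dbm

# Build a toy alias database, the way newaliases compiles /etc/aliases
# into a key/value database for fast lookups.
with dbm.open("aliases-demo", "n") as db:
    db["webmaster"] = "kurt"
    db["postmaster"] = "kurt"

# sendmail-style lookup: resolve an alias to a real account.
with dbm.open("aliases-demo", "r") as db:
    target = db["webmaster"].decode()

print(target)
```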
3.1.3. access
This is a plain text file listing host access rights to the server. The
default policy of sendmail is to accept mail locally or for domains that
it is specifically configured for. However, if you get a steady stream of
spam from a specific host, consider listing that host in the access file.
For example, the entry:
co.kr ERROR:"550 Korea is a gigantic spam house; go away"
denies all mail coming from servers in the co.kr domain. The error
message defined here is returned by sendmail and typically written to the
log files on the sender’s end.
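sendmail matches the connecting host against the access database by trying the full name and then progressively broader parent domains. Here is a simplified Python sketch of that lookup (sendmail’s real matching has more key forms; the map contents are from the example above):

```python
access = {"co.kr": 'ERROR:"550 Korea is a gigantic spam house; go away"'}

def access_lookup(host, table):
    """Try the full hostname, then each parent domain, roughly the way
    sendmail consults the access database."""
    labels = host.lower().split(".")
    for i in range(len(labels)):
        key = ".".join(labels[i:])
        if key in table:
            return table[key]
    return None  # no entry: fall through to the default policy

print(access_lookup("mail.spamhost.co.kr", access))
```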
3.1.4. genericstable
This provides the outbound name to virtual address mapping, that is, the
reverse of what the virtusertable does. For a proper virtual domain
configuration, you will need to configure this file as well as the
virtusertable (described below).
This is a file that must be compiled to BerkDB format before sendmail
can read it.
3.1.5. mailertable
This file contains custom domain routing information. You may wish to
route all email to addresses in the gatech.edu domain through a
different SMTP server; this is the file where you define that.
This file must be compiled into a BerkDB file that sendmail actually
reads.
3.1.6. relay-domains
This is a plain text file that lists individual hosts or ranges of hosts
that are allowed to relay mail off your server. You’ll need this if you
want to be able to use your mail server as an SMTP server when configuring
a program like Evolution.
3.1.7. virtusertable
This file maps usernames from one hostname to a real user or another
hostname. This file is used to set up virtual domains and virtual
addresses.
This is another file that must be converted to BerkDB format before
sendmail can read it.
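The virtusertable lookup can be sketched in Python: an exact address match is tried first, then a catch-all @domain entry. This is a simplification (sendmail’s real lookup tries additional key forms), and the sample mappings are hypothetical:

```python
virtusertable = {
    "info@mydomain.com": "kurt",
    "@eatatjoes.com": "joe",   # catch-all for a whole virtual domain
}

def virtual_lookup(address, table):
    """Resolve a virtual address: exact match first, then @domain."""
    if address in table:
        return table[address]
    domain = address.split("@", 1)[1]
    return table.get("@" + domain)

print(virtual_lookup("info@mydomain.com", virtusertable))
print(virtual_lookup("anything@eatatjoes.com", virtusertable))
```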
3.1.8. local-host-names
This file lists the domain names that you are delivering mail on. If you
own headnut.org and want to accept AND deliver mail for that domain, you
need to put headnut.org in the local-host-names file.
This file sometimes differs in name across various distributions. Red
Hat used to (or still does) call it sendmail.cw. Some simply call it
locals. The format and purpose are the same, but the name may be
different.
3.2. Inconsistencies
You may have noticed that among the sendmail configuration files there
are several inconsistencies. Namely, some files are plain text, some are
special format files (BerkDB), and some are macro processed files. This
is one of the things that bothers me about sendmail, but it’s not the end
of the world.
3.3. Message Submission Program
If you wish to have users remotely connect to your SMTP to send
outgoing email, you should consider using the message submission program
with sendmail. I won’t go into the details here, but here is a link:
http://www.sendmail.org/m4/msp.html
With MSP, you can configure sendmail to require user logins and
passwords to connect to the SMTP server.
4. Writing a sendmail configuration file
4.1. sendmail.mc authoring
The format of the mc file is fairly simple. It is a list of m4 macros
and accompanying options. Generally there is one macro per line. Let’s
make a basic configuration file. We’ll start with the basic settings:
divert(0)
VERSIONID(`My very own sendmail.mc')
OSTYPE(linux)
DOMAIN(generic)
We now have some basic settings, but we should add some features to the
mail server. First note the use of capital letters for VERSIONID, OSTYPE,
and DOMAIN. These are the m4 macro names. The values in the parentheses
are the options for that macro. Please take a note to ask me about m4
quoting if I haven’t already explained it.
Features are added using the FEATURE macro. I’m going to add some
common features to my mc file:
FEATURE(`access_db', `hash -T<TMPF> /etc/mail/access')
FEATURE(local_lmtp)
We should define a couple of settings for the configuration file.
Namely:
define(`confCW_FILE', `-o /etc/mail/local-host-names')
Lastly, we need to set some mailers for our system:
MAILER(local)
MAILER(smtp)
This configures sendmail for local mail operation and SMTP mail
operation. Now, there are a ton of other settings available for sendmail.
Rather than buying the Bat Book, I recommend you refer to the
documentation on sendmail.org:
http://www.sendmail.org/m4/readme.html
The Sendmail Consortium provides a nice HTML browsable copy of the
documentation that ships with sendmail. This is a very good reference.
4.2. Generating sendmail.cf
Generating the cf file for sendmail uses the m4 macro processor. Some
distributions provide a Makefile in /etc/mail that automatically generates
the configuration file. If you don’t have this, you’ll need to run the m4
command by hand. You run m4 and pass it the path to the sendmail macro
directory and the path to the main sendmail configuration file macro and
your new mc file. That’s a mouthful. Here’s what you do:
m4 -D_CF_DIR_=/usr/share/sendmail/cf/ \
/usr/share/sendmail/cf/m4/cf.m4 \
sendmail.mc > sendmail.cf
This assumes your sendmail m4 directory is in /usr/share/sendmail/cf.
It may be in a different location on your system. Sometimes you’ll find
it in /usr/src/sendmail. Once you run the above command, you’ll have a
ready-to-use sendmail.cf file.
5. Starting and Stopping sendmail
5.1. The queue runner and sendmail
To start the server, I run these commands:
/usr/sbin/sendmail -L sm-mta -bd -q25m
/usr/sbin/sendmail -L sm-msp-queue -Ac -q25m
This starts the sendmail MTA as well as the queue runner. Your
distribution probably includes some form of a script that runs the above
two commands in a dozen lines or so. If a script is provided in
/etc/init.d, you should use that.
To stop sendmail, I issue this command:
/sbin/killall sendmail
Sendmail reacts to signals in a normal manner and when it is sent SIGTERM
it will shut itself down.
6. Examples
6.1. Acting as a primary mail server
Acting as a primary mail server means we want to accept and deliver mail
for a specific domain. Assuming you have the proper MX records configured
according to Moshe’s BIND presentation, you’re ready to set up sendmail as
your primary mail server.
In this case, all we need to do is add the domain to the
/etc/mail/local-host-names
configuration file and restart
sendmail. Mail will be accepted for that domain and sendmail will deliver
it on the system.
6.2. Acting as a backup mail server for your best friend
As Moshe stated in his BIND presentation, sendmail is smart enough to know
if it is a backup MX server. If it determines it is the backup server and
the domain is not listed in local-host-names, sendmail will spool the mail
and try to pass it on to the primary MX at a later time.
You can run sendmail -qf
to give sendmail a swift kick to
deliver any messages it has waiting for a primary MX.
6.3. Masquerading as another server
This idea of masquerading as another server is useful if you access the
Internet through a dialup connection. You can run sendmail locally and
compose messages locally, but sendmail will be configured to pass all
outbound messages to the server it is masquerading as. What you generally
don’t want is for sendmail to pass all mail, even local mail, to the other
server. The options below can be added to your sendmail.mc file to enable
masquerading:
FEATURE(`remote_mode')
define(`SMART_HOST', `mail.example.com')
FEATURE(`masquerade_envelope')
FEATURE(`genericstable', `hash -o /etc/mail/genericstable.db')
GENERICS_DOMAIN(`localhost mybox.example.com')
You should add localhost and the hostname you are masquerading as to the
local-host-names file.
The genericstable file can be used to map local user names to the
remote address name.
6.4. Running mail for a virtual domain
With a virtual domain configuration, you simply populate the virtusertable
and genericstable with the mappings for the virtual address to the real
user name. Be sure to set up the hostname and usernames first.
7. Resources
There is a wealth of great sendmail information on the Interweb and
probably on your own computer.
7.1. man pages
The commands and files associated with the sendmail system
ship with good man pages. Be sure to consult these when
you’re looking for an answer.
7.2. http://www.sendmail.org/
The Sendmail Consortium (not to be confused with Sendmail, Inc.)
supports the open source releases of sendmail. They provide a
lot of documentation, FAQs, security notices, and links to
other resources.
7.3. http://www.procmail.org/
Not specifically part of this presentation, but it’s worth noting
that procmail ties in very well with the sendmail system. If you
will be using sendmail, consider using it with procmail to get a
nice mail system. Moshe is available to answer all of your
procmail questions.
7.4. Don’t Blame Sendmail
http://www.sendmail.org/tips/DontBlameSendmail.html
This page talks about problems people encounter and blame sendmail for.
Worth a look if you plan on running your own server.
February 16, 2003 / Kurt Nelson / 0 Comments
Table of Contents
1. Overview
Postfix is a mailer daemon maintained by Wietse Venema. It is an open source
project, and was created as an alternative to Sendmail. It is a drop-in
replacement for Sendmail, and includes a ‘sendmail’ wrapper that acts as
the Sendmail program, allowing other software to function with it as if it
were sendmail itself. More information, including FAQs and downloads, is
available at http://www.postfix.org/.
2. Why Postfix?
Postfix was designed from the ground up with speed in mind. Postfix is
running on systems that send over 1,000,000 unique messages a day. Postfix
is also designed differently from Sendmail. Instead of one single, monolithic
binary, Postfix is split up into several smaller programs, which run at
lowered privileges, each of which handles a specific task. There are no
setuid binaries in Postfix, and only one setgid binary. What this means is,
there are fewer chances of a root exploit through Postfix.
Postfix is also much easier to configure than Sendmail. It uses a plaintext
configuration file (no m4 needed) and can be reloaded on the fly without
shutting down.
Postfix also supports alias, virtualhost, and other databases in several
formats, such as plaintext (hash), Berkeley DB, or even MySQL. This provides
more options when setting up mail services for systems with many users and
multiple domains.
3. Installing Postfix
The postfix install is very simple. Most distributions provide packages
for postfix, and if they don’t, simply download the tarball (the latest
stable version as of this writing is 2.0.4) and unpack it. Then simply run
# make
to compile it. After compilation, make sure you back up your old sendmail:
# mv /usr/sbin/sendmail /usr/sbin/sendmail.OFF
# mv /usr/bin/newaliases /usr/bin/newaliases.OFF
# mv /usr/bin/mailq /usr/bin/mailq.OFF
# chmod 755 /usr/sbin/sendmail.OFF /usr/bin/newaliases.OFF \
/usr/bin/mailq.OFF
Next, make sure you add a postfix user for postfix to run as.
Either add the line
postfix:*:12345:12345:postfix:/no/where:/no/shell
to your /etc/passwd, or run
# useradd -d /no/where -s /no/shell postfix
Next, add a postdrop group. Note that no user should be a member of this
group, not even the postfix user. This is the group postfix will setgid to
when doing mail delivery.
# groupadd postdrop
Now run:
# make install
This will run an interactive install script and prompt you for where to
install things. The rest of this guide will assume you installed everything
to the default locations.
3.1. Running Postfix
To start the postfix daemon simply run:
# postfix start
Likewise, to stop it, simply:
# postfix stop
To regenerate the aliases and virtual maps databases:
# newaliases
To force reload of the config file:
# postfix reload
To flush all queued mail out of the queue and attempt to deliver it
again:
# postfix flush
4. Configuration
Postfix keeps all its configuration in /etc/postfix (unless you specified
otherwise during install). The main Postfix config file is main.cf. A default
config file will be provided for you with comments. You only need to edit a
few things and you will be ready to run Postfix.
Files:
- main.cf: the main config file for postfix.
- master.cf: config for the master postfix process; sets limits on its
  child processes and functions. This is safe to leave at the default.
- aliases: contains the username aliases. Useful for having multiple
  email addresses point to the same user.
- virtualhosts: similar to aliases, except this allows you to specify
  aliases per domain instead of just for one domain. Useful when
  running multiple domains.
4.1. main.cf configuration directive style
Directives in postfix are in the style key = value.
String values don’t need to be quoted. You can use
the value of one directive in another, prefixed by a $.
For example:
mydestination = $myhostname, mail.$mydomain
Long comma-separated lists can be extended across multiple lines by
starting each continuation line with whitespace.
Boolean values are yes/no.
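To make the directive rules concrete, here is a toy Python parser for this style: key = value lines, whitespace-indented continuation lines, and $variable references. This is an illustration only, not how postfix itself parses main.cf:

```python
import re

def parse_main_cf(text):
    """Parse key = value directives, folding continuation lines
    (lines that begin with whitespace) and expanding $variables."""
    config = {}
    logical = []
    for raw in text.splitlines():
        if raw[:1].isspace() and logical:            # continuation line
            logical[-1] += " " + raw.strip()
        elif raw.strip() and not raw.lstrip().startswith("#"):
            logical.append(raw.strip())
    for line in logical:
        key, _, value = line.partition("=")
        # Expand $variable references using earlier definitions.
        value = re.sub(r"\$(\w+)",
                       lambda m: config.get(m.group(1), ""),
                       value.strip())
        config[key.strip()] = value
    return config

sample = """\
mydomain = simplecodes.com
myhostname = mail.$mydomain
mydestination = $myhostname,
    localhost.$mydomain
"""
print(parse_main_cf(sample)["myhostname"])
```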
4.2. Common configuration directives
- soft_bounce: when set to yes, postfix will not bounce any emails, but
  will keep them in the queue instead. This feature is great for
  debugging, when you aren’t sure if your config is correct. Make sure
  you turn this off when you are done debugging, else your queue will
  fill up.
- queue_directory: the directory where postfix keeps its spools, usually
  /var/spool/postfix.
- mail_owner: the user that the lowered-privilege postfix processes run
  as, usually postfix.
- myhostname: the hostname of the mail server, announced in the SMTP
  greeting. If not specified, gethostname() is used to find your
  hostname.
- mydomain: your local Internet domain (e.g., if your myhostname is
  mail.simplecodes.com, mydomain should be simplecodes.com).
- myorigin: the domain that is appended to mail that originates locally.
- inet_interfaces: set to all to listen on all interfaces; otherwise set
  to the IPs or hostnames of the interfaces to bind to, comma separated.
- mydestination: domains to accept mail for, comma separated.
- alias_maps: a file or list of files that map username aliases. Types
  of files include dbm, hash (plain text), or even mysql. For example:
  alias_maps = hash:/etc/postfix/aliases,dbm:/etc/aliases.db
- mail_spool_directory: where the mail spool is kept, usually
  /var/spool/mail.
- mailbox_command: if specified, you can choose another delivery agent,
  e.g., procmail. To set procmail as your delivery agent, use:
  mailbox_command = /some/where/procmail
- smtpd_banner: if specified, you can tell postfix to send an alternate
  banner when you connect, e.g.:
  smtpd_banner = $myhostname ESMTP $mail_name
5. Spam, or how I learned to stop worrying and love the RBL
There are several ways to block spam. You can pipe your mail through a spam
filter, something like procmail or bogofilter, or use an RBL. An RBL is a
list of hosts to deny mail from, usually maintained by a third party. These
parties usually list known open relays and spammers. Some charge you, but a
good number of them are free. Postfix has a smtpd_client_restrictions
directive where you can specify options for blocking. For example:
smtpd_client_restrictions = reject_maps_rbl, reject_unknown_client
The reject_unknown_client restriction rejects mail if the postfix server
cannot determine the client’s hostname. This will not block much spam, but
it’ll block some.
The reject_maps_rbl restriction tells postfix to consult an RBL when
deciding whether or not to block a client. You then use maps_rbl_domains
to specify which RBL lists to use, e.g.:
maps_rbl_domains = sbl.spamhaus.org relays.ordb.org
An RBL is simply a DNS zone that you query with the client in question;
if the name matches, that client is on the RBL, and postfix will deny its
mail. More info on RBLs can be found by following the links at the bottom
of this document.
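For IP-based RBLs, the DNS name queried is the client’s address with its octets reversed, prepended to the list’s zone. A sketch of building that query name (the actual DNS lookup is omitted; the list names are from the example above):

```python
def rbl_query_name(client_ip, rbl_zone):
    """Build the DNS name queried against an IP-based RBL:
    reversed octets + the RBL zone. If the lookup returns a record,
    the client is on the list; NXDOMAIN means it is not."""
    reversed_octets = ".".join(reversed(client_ip.split(".")))
    return f"{reversed_octets}.{rbl_zone}"

print(rbl_query_name("128.61.48.46", "sbl.spamhaus.org"))
```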
Another thing you can do is specify a smtpd_sender_restrictions directive.
This directive allows you to set which sender addresses are blocked, and is
another way of blocking common spam. You can specify any number of
restrictions, as well as a map or maps; the available restrictions are
listed in the postfix sample configurations. For example:
smtpd_sender_restrictions = reject_unknown_sender_domain,
hash:/etc/postfix/access
6. Fun with lookup tables (maps)
Postfix supports a lot more than simply hash maps. You can do maps using dbm,
regular expressions (either PCRE or standard type), even mysql, or even ldap.
For example, you might want to use regular expressions in your access map.
you could tell smtpd_sender_restrictions to use
pcre:/etc/postfix/access-regexp
as its map.
Then you create an access-regexp file like this:
### file start: /etc/postfix/access-regexp
# Protect your outgoing majordomo exploders
/^(?!owner-)(.*)-outgoing@(.*)$/ 550 Use ${1}@${2} instead
/^friend@(?!my\.domain)/ 550 Stick this in your pipe $0
/^.*?(kr|jp|ch)$/i 550 Asia is a big spam house, mail from .$1 is not allowed.
### file end
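The negative-lookahead patterns in this file can be exercised with Python’s re module, whose syntax is close enough to PCRE for these cases. The addresses below are made-up examples:

```python
import re

# Matches foo-outgoing@... unless the address starts with owner-
outgoing = re.compile(r"^(?!owner-)(.*)-outgoing@")
assert outgoing.match("lug-outgoing@lists.example.com")
assert not outgoing.match("owner-lug-outgoing@lists.example.com")

# Matches friend@ anywhere except at my.domain
friend = re.compile(r"^friend@(?!my\.domain)")
assert friend.match("friend@elsewhere.org")
assert not friend.match("friend@my.domain")
```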
Sure, you could implement the pattern matching through procmail or some other
mail delivery agent, but the mail will still be accepted by the mail daemon
(which will prompt the spammer to keep sending mail at your box, since the
spammer thinks he got through). Implementing pattern matching at the smtpd
level means the mail is rejected before it is ever accepted.
7. Resources
January 7, 2003 / Kurt Nelson / 0 Comments
Table of Contents
1. Overview of Network Filesystems
Network filesystems are used to allow a machine to access another machine’s
files remotely. There are a plethora of ways to do this in Linux. The most
common way today is NFS; however, many people are shifting to different
methods for either security or administration reasons.
2. NFS
2.2. Server Setup
To set up a server, one must have portmap and nfs-utils installed
(see resources for links). To configure your exported filesystems,
edit the file /etc/exports. For example, if your fileserver BARNEY
wanted to share folders /mnt/mp3 and /mnt/work with system JOHN, and
also share /mnt/work with systems MARY and BETTY, with BETTY also
having read-only access to /mnt/mp3, your exports file would look like
this:
/etc/exports:
# Shares on BARNEY
/mnt/mp3 JOHN(async,rw) BETTY(async,ro)
/mnt/work JOHN(async,rw) MARY(async,rw) BETTY(async,rw)
As you can see, you merely specify the directory you want to share,
and then follow it by the hosts you want to be able to access it (IPs
are fine too) and the permissions for each host. Now you are ready to
share your files: start portmap, and then start the nfs service (rc
scripts are included with most distributions to do this).
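The exports format above is simple enough to parse mechanically; a hypothetical Python sketch, just to show the structure (path, then host entries with parenthesized options):

```python
import re

def parse_exports_line(line):
    """Split '/mnt/mp3 JOHN(async,rw) BETTY(async,ro)' into
    (path, {host: [options]})."""
    path, *entries = line.split()
    hosts = {}
    for entry in entries:
        m = re.fullmatch(r"([^(]+)\(([^)]*)\)", entry)
        hosts[m.group(1)] = m.group(2).split(",")
    return path, hosts

path, hosts = parse_exports_line("/mnt/mp3 JOHN(async,rw) BETTY(async,ro)")
print(path, hosts["BETTY"])
```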
2.4. Secure NFS over SSH
Okay, so that’s great, now what happens when John is on a business
trip in Paris and wants to mount his network share to grab some files
he forgot for a presentation he’s doing? It’s not very smart to set
up the NFS server to allow connections from outside IP’s, and even if
it was, it would be even stupider to actually access it, since all the
data goes in the clear, and can be inspected at any point by anyone
with half a brain. Well, John uses SSH to get into a secure shell, so
why not use it to forward nfs? Well, NFS uses UDP, and SSH can only
forward TCP, it does not know what to do with UDP datagrams. Enter
SNFS and sec_rpc. How does it work, you ask? sec_rpc basically
translates the UDP datagrams into something that SSH can forward,
and then translates them back on the other side. So you’re still
using NFS, but through a tunnel.
Here’s how you start out. On the server, you create an /etc/exports
file like so:
/etc/exports:
# SSH exports file
/mnt/mp3 localhost(async,rw)
/mnt/work localhost(async,rw)
Now that you’ve exported the filesystems to the local host, you need
to install sec_rpc on both the client and server machines. You do it
the standard autoconf way: ./configure; make; make install. Now on
BARNEY you need to be running nfsd like normal.
John’s PC is where all the complex stuff happens. At his PC, John
runs:
# snfshost REMOTE:MOUNTPROG
Here MOUNTPROG is a six-digit number chosen between 200000 and
249999 such that both MOUNTPROG and NFSPROG=MOUNTPROG+50000 are
unassigned RPC program numbers (e.g. MOUNTPROG=201000,
NFSPROG=251000). REMOTE is the remote host name.
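The arithmetic is trivial but easy to get wrong, so here is a quick sketch of the constraint on the two program numbers (checking that the numbers are actually unassigned RPC programs is left out):

```python
def rpc_numbers(mountprog):
    """Given a chosen MOUNTPROG in 200000-249999, derive NFSPROG.
    Both must also be unassigned RPC program numbers (not checked here)."""
    if not 200000 <= mountprog <= 249999:
        raise ValueError("MOUNTPROG must be between 200000 and 249999")
    return mountprog, mountprog + 50000

print(rpc_numbers(201000))
```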
That will create the config file needed for the RPC numbers. Now you
add a line to John’s fstab:
/etc/fstab:
LOCAL.DOMAIN:/DIR /REMOTE/DIR nfs user,noauto,hard,intr,rsize=8192,wsize=8192,mountprog=MOUNTPROG,nfsprog=NFSPROG 0 0
(this all goes on one line; note that the fstab options list must not contain spaces)
In the line above, MOUNTPROG and NFSPROG are the numbers you figured
out before, and LOCAL.DOMAIN is John’s fully qualified domain or
localhost.
Now that John is thoroughly confused, he creates the local mount
directory, starts portmap, and now has some more weird commands to
throw at the host:
# smkdirall
# rpc_psrv -r -d /usr/local/etc/snfs/REMOTE
(where REMOTE is the remote host name)
Now John types in the root password for BARNEY, and he should get the
message: "ID String correctly read: RPC Proxy Client". Now he is
ready to mount:
# mount /REMOTE/DIR
# df
Woo, you have a mount or something. And now you are thoroughly
confused. As you may have noticed, a major disadvantage of SNFS is
that you need to know the host’s root password. Also, you need to
have remote root SSH enabled. Not to mention, this is a messy setup,
but it works (sorta).
It is possible to set it up so that the remote
mount runs as non-root, with the correct setuid binaries, but that is
still messy. The people at SFS agree with you. The next section will
show you how to do it using SFS, a more elegant solution.
3. SFS
3.1. Background
NFS was originally developed by Sun for the purpose of mounting a disk
partition on a remote machine as if it were on a local hard drive.
This allowed for fast, seamless sharing of files across a network. NFS
is an RPC service, and works through the RPC portmapper daemon. Most
unix-like systems ship with an NFS daemon included, or an NFS daemon
can be obtained, which is why it is the de facto standard for network
file sharing.
The problem with NFS is that it is inherently insecure. The current
NFS protocol does not support any authentication, and any validation
is only done by IP, which can be easily spoofed. There is no
user-level restriction, so if anyone can exploit a client machine on a
NFS network or in some cases just plug another machine into the
network, they can gain access to the NFS server’s shares.
Linux supports NFS protocol versions 2 and 3. The implementation of
NFS in the Linux kernel is partly in user space and partly in kernel
space, in order to keep things running fast.
SFS was created because the insecurities of NFS made it easily spoofed
and not very secure. Not to mention, if a client was connecting
with a dynamic IP, the NFS server would have to change its exports
each time the client’s IP changed, or (very insecure) export to an
entire IP range.
SFS uses key-based authentication with 1024-bit keys by default
(higher bit keys are allowed).
3.2. Server set-up
To set up an SFS server, you create a user sfs and a group sfs, and
then get the sfs-0.6 source from www.fs.net and compile it. Then you
set up your /etc/exports to export filesystems to localhost, similar
to how you would for SNFS:
/etc/exports:
/var/sfs/root localhost(async,rw)
/mnt/mp3 localhost(async,rw)
/mnt/work localhost(async,rw)
Then you create the file /etc/sfs/sfsrwsd_config as follows:
/etc/sfs/sfsrwsd_config:
Export /var/sfs/root /
Export /mnt/mp3 /mp3
Export /mnt/work /work
Note that SFS requires a ‘root’ for the server; the exports will be
under this root. You now create /var/sfs/root, /var/sfs/root/mp3, and
/var/sfs/root/work, then chown these to user sfs, group sfs. You
generate a host key for the server like so:
# sfskey gen -P /etc/sfs/sfs_host_key
After doing this, start your nfsd and sfssd.
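The directory layout step above can be sketched in shell. Here it is
staged under a scratch prefix so the sketch can run unprivileged; on a
real server the prefix is simply /, and the chown and key generation
require root:

```shell
# Stage the SFS export tree described above under a scratch prefix
# so this sketch runs without root; on a real server the prefix is /.
prefix=$(mktemp -d)
mkdir -p "$prefix/var/sfs/root/mp3" "$prefix/var/sfs/root/work"
ls "$prefix/var/sfs/root"
# On the real system, as root:
#   chown -R sfs:sfs /var/sfs/root
#   sfskey gen -P /etc/sfs/sfs_host_key
```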
4. OpenAFS
4.1. Introduction
AFS, or the Andrew File System, is a distributed filesystem pioneered
at Carnegie Mellon University and supported and developed by the
Transarc Corporation (now owned by IBM). In 2001, IBM branched AFS
and created OpenAFS, an open source version of AFS. OpenAFS is unique
among this bunch in that it works on Windows systems as well as
Linux.
Because OpenAFS is a distributed filesystem, it allows for high
availability. A cluster of servers can mirror the same AFS cell.
Clients do not know which server they are connected to, and when they
write to a file, the changes are propagated to the other servers in
the cell. If any server goes down, the cell can continue to operate.
4.2. Server setup
Gentoo has a good guide on how to set up OpenAFS. I will put up my own
guide to OpenAFS setup soon.
4.3. Client setup
To be an NFS client, you don’t need to edit any config files. All you
need to do is start portmap, and then, provided you have the NFS
tools installed, simply mount the partitions. Let’s say JOHN from the
previous example wanted to mount the mp3 and work folders now. He
simply runs at the command line:
# mount -t nfs BARNEY:/mnt/mp3 /mnt/mp3
# mount -t nfs BARNEY:/mnt/work /mnt/work
It is very simple to mount NFS partitions. Because no further
authentication is needed, on a fixed network with fixed hosts, one
can simply set a bootup script to mount these on startup.
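For example, boot-time mounts are usually handled through /etc/fstab
rather than a script. A sketch for John’s two shares (hostnames from
the example above; the mount options here are typical but assumed,
adjust to taste):

```
# /etc/fstab entries (hypothetical) for John's NFS mounts
BARNEY:/mnt/mp3    /mnt/mp3    nfs    rw,hard,intr    0 0
BARNEY:/mnt/work   /mnt/work   nfs    rw,hard,intr    0 0
```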
Now John wants to connect to SFS. Here’s what he does: on the server
BARNEY (not his local client box), John creates a key from the shell
like this: sfskey register. He can do this from an SSH shell if he
wants to; it’s a one-time process. Now, from Paris, John starts
portmap and sfscd on his client box. Then, as non-root, he runs:
# sfsagent john@barney.office.domain.com
SFS will not read your /etc/hosts file; it will totally ignore it (to
prevent spoofing, only a globally verifiable DNS name will be
accepted). Now John logs in with the passcode he put on his key, and
he is into BARNEY. He now proceeds to access his share:
# cd /sfs/barney.office.domain.com/
# ls
work/
mp3/
All his files will be read over a 1024-bit secure encrypted channel.
SFS is probably the best way to share files over an insecure
network, because it uses the tried-and-true NFS protocol, and once a
key is made, John can log in from anywhere.
Check the next version of this HOWTO to learn about OpenAFS client
setup.
5. Coda
Coda is another distributed filesystem from Carnegie Mellon, based on
AFS. Coda was open source from the beginning, however, and it has
been available in the Linux kernel since 2.2. Coda has one important
feature which NFS and OpenAFS do not: disconnected operation.
Disconnected operation allows a client to disconnect from the network
and continue to access and modify files. When the client reconnects
to the network, that user’s changes are propagated back to the Coda
cell. This is excellent for laptops, because a user can take the
laptop away, modify his own home directory files, and then put it
back on the network and have all the files resynched with the global
fileserver. Coda does this by locally caching files. Because of this
aggressive local caching, Coda is also faster than OpenAFS.
One downside of all these features is that, since Coda needs to keep
so much metadata on the filesystems, it needs its own raw partition
for data storage.
Check back for the next version of this HOWTO for more info on
setting up Coda.
6. InterMezzo
InterMezzo is another distributed filesystem, loosely based on Coda.
Like Coda, it allows disconnected operation. InterMezzo is meant to
be lightweight. It also needs a dedicated partition, which can be
formatted as any of ext2/3, XFS, or ReiserFS. The InterMezzo protocol
is based on HTTP, and it can use Apache or any CGI-supporting HTTP
daemon for synchronization.
InterMezzo is still in its early stages. However, Linus made sure it
was included in the 2.4 kernel starting with 2.4.14, just before he
went off to work on the 2.5 kernel. InterMezzo is still in early
beta, but it shows a lot of promise.
In my tests with InterMezzo, I found it a little flaky to set up, and
the setup process is not very well documented. What documentation
exists is inconsistent with the most recent version.
7. Summation
There are many alternatives to NFS. NFS is a very good protocol, and
it has been tried and tested, but it is too insecure to be used over
wide area networks, unless encapsulated inside a VPN tunnel, or
another more secure protocol such as SNFS or SFS is used. Also, NFS
is limited in that a volume can only be exported by one server, so
you cannot distribute the load across multiple servers or implement
failover. Coda, InterMezzo, and OpenAFS address this by distributing
the filesystem. As of now, Coda looks to be the best choice for a
distributed FS, although IBM is working hard to improve OpenAFS.
InterMezzo is a little too unstable to be considered usable, and not
enough documentation exists.
Which one you choose is dependent on how you want to do things. If
you have a home directory you want to access when on the road and not
connected, then a caching system like Coda is probably for you.
However, if you simply want to access files from two connected
systems over the net (over ResNet or LAWN, for example), you probably
want to use SFS, as SNFS is a little tricky to set up, and SFS allows
connection from anywhere with the same public key using strong
encryption.
8. Resources
November 16, 2002 / Kurt Nelson / 0 Comments
Table of Contents
2. Security Rant
2.1. Common security paradigm
Broadly, computer security is often divided into two fields: host
security, and network security. Host security concerns the integrity
of an individual host, a single node on a network. Network security
focuses on the integrity of an entire network, analyzed from that
point of view. This is a useful paradigm for constructing a secure
computing environment: secure each host, then secure the whole
network.
For Linux machines, the risks to both host and network security are
great. These risks stem from several root causes. One of the largest
factors is that since the source code for most software on a Linux system
is freely available, would-be attackers are free to analyze it and
locate security holes.
2.2. The Hackers
There are several types of people who carry out this type of analysis.
Some of them are willing to share their findings while many others are
not. Those who share their security-related discoveries are often
called "white hat" hackers. Those who do not then fall into two
categories: the "grey hats" and the "black hats".
These metaphorical hats denote the ethical stance of the people in
question. Never assume that the "White hats" have found all of the
holes.
2.3. Security == Risk Management
For the end administrator, it is important to know that there are a
huge number of possible ways to attack a system and you should never
assume that a given piece of software is completely safe. Measures
should always be taken to mitigate risk. Security is ultimately about
risk management, and that’s how it should be approached.
3. Advanced uses of SSH
3.2. Port Forwarding
SSH port forwarding is a very powerful tool for securely
connecting two hosts. This can be done in several ways.
Relevant excerpt from man page:
ssh [-L port:host:hostport] [-R port:host:hostport] [-D port]
hostname | user@hostname
1) Local port is forwarded to remote port (static): use ‘-L’
2) Remote port is forwarded to local port (static): use ‘-R’
3) Local port is forwarded to remote port (dynamic, specific to
   an application): use ‘-D’ (this is new)
The first two are the most common. The third type of forward
is specific to an application protocol, and as of OpenSSH
3.1p1 it only works as a SOCKS4 proxy.
Note that in the syntax for the ‘-L’ and ‘-R’ options, the "host"
entry is a hostname or IP relative to the machine you are actually
connecting to. You connect to a machine where you have an account to
set up the forward, and the "host" is then contacted from that
server. This is probably the most confusing part of the port
forwarding scheme, and hopefully the examples below will clear it up.
Examples:
1) My computer is outside of Georgia Tech and I want to connect to the
news server using the NNTP Protocol. The NNTP protocol uses port
119 and the news server is news.gatech.edu. The news server only
accepts connections from hosts inside the GT network, so I need a
port forward from some machine inside the network. I want to connect
to my local machine’s port 9999 and have it forwarded to
news.gatech.edu port 119. I have an account on acme.gatech.edu, so I
will use it to do the forward. Note that I do not have an account on
news.gatech.edu.
Command Line:
ssh -L 9999:news.gatech.edu:119 gt1234a@acme.gatech.edu -N -C
The ‘-N’ option tells OpenSSH not to execute a remote command, so I
can just background ssh (Ctrl-Z) after I authenticate. The ‘-C’
option tells OpenSSH to compress the data (very useful for X11
forwarding, too).
Alternatively, I could have used local port 119 (if I were root)
and then simply told my news-reader the server is "localhost" and
it would happily connect to localhost:119 but in reality connect to
news.gatech.edu. Kinda tricky, eh?
2) I have a local network of a few machines, and I am running a
webserver that only my internal network can see (it’s behind a
firewall). But, I want to be able to use that webserver when I’m
away from the home network in another office on a separate network.
So, here we set up a similar forward, but now we shall initiate the
forward from the server, instead. This time we will forward a
remote port (on another private network) to our local webserver
port. Our webserver machine is running a server on port 8080.
Say we want to forward a port on "workstation1.company.net" to our
internal server. In this case we must have an account on the
machine in question, "workstation1".
Command Line:
ssh -R 1234:localhost:8080 tunnel@workstation1.company.net -N -C
So, when I’m logged into workstation1 at the other office, I point
my web browser to http://localhost:1234/ and I can access the
webserver in my home office just like I was there. Magic.
You might be wondering why I used "localhost" as the hostname in
the ssh command. This is because I was forwarding a port to the
local machine. I could have also forwarded the remote port to
another machine’s server in the network. Now that’s getting
complicated.
You can verify that the tunnels are in place using the netstat
command to examine which ports are open and what IP they are bound
to. By default, OpenSSH binds forwarded ports to 127.0.0.1, so no one
on another host can abuse your tunnel.
4. Virtual Private Networks
4.1. Intro
Security is a monstrously huge field and so to impart knowledge, I
must divide and conquer. In this presentation I will cover advanced
usage of OpenSSH and the rudimentary basics of VPNs under Linux and a
small rant about security.
The Secure Shell Protocol is a modern, advanced standard for securely
connecting multiple hosts. For the morbidly curious, it is defined in
an RFC (Request For Comments); use Google to find it. The protocol is
designed to do much more than simply substitute for telnet or rsh. It
is a highly layered, configurable communication system which can be
used to connect hosts with several different types of communication
channels.
Examples include:
- X11 forwarding
- SSH port forwarding (tunnels)
I must also point out that OpenSSH is itself open to several forms of
attack, and I encourage administrators to restrict the hosts that are
allowed into your boxen. Other security professionals have
recommended the use of the commercial SSH package instead.
For further information on how to use SSH, there are two excellent
articles on the LUG web server (see the Links section below). SSH
keys are particularly useful to me, since I use many systems on a
daily basis.
What is a VPN? Broad definition (my own): a VPN is a way to securely
connect two or more physically separate networks over the Internet.
This means that we have a "virtual" network comprising the two
separate networks, and nobody in between the networks can understand
the VPN traffic.
Why is it useful?
VPNs are usually used in business settings where sensitive data must
be transmitted between business locations that are often physically
very far apart. They offer a more general solution than simply using
SSH because they encrypt *all* IP traffic, and they are totally
transparent to the end hosts. The burden of encryption is moved to
the VPN gateways, which handle all of the details of security.
4.2. VPN technologies
There are two major categories of VPN technologies in place today:
- IPSec (part of IPv6)
- non-standardized: CIPE, vpnd, etc.
IPSec is a standardized VPN technology for IP networks. It is a
part of IPv6 and must be included in any IPv6 implementation and
it is optionally available for IPv4 implementations as well.
There are other methods of connecting networks using encryption that
can also be called VPNs, but I will stick to IPSec, as it is the most
common and a real standard, though rather complex.
It would be a waste of space to try to describe the IPSec protocols
in this document; you can read the relevant RFCs/books for that. I
can describe IPSec as "just another layer" of IP that adds encryption
and uses UDP for authentication. It is very flexible, like SSH, and
there are many implementations that differ greatly in what they
provide.
There are two important pieces to IPsec:
- Authentication – proving the identities of the hosts
- Encryption – agreement on the method, key exchange, etc.
VPNs are an extremely complex technology, which is why they are
probably mostly used in business settings where they are needed
most. They can be rather expensive (both money and time) to set
up and maintain. When they break, it is often a nightmare to
debug them. I’m just dripping with optimism, aren’t I? 🙂
4.3. Linux Free S/WAN
For Linux, the foremost software is called FreeS/WAN, which is named
after a commercial product (Secure WAN). It is a free IPv4 IPSec
implementation for the Linux 2.4.x kernel. It comes in one package
with two parts:
1) Kernel Level Support
- requires patching
- requires GMP (GNU MultiPrecision Arithmetic Library)
- Creates IPSec networking modules
- Creates new ipsecX interfaces that correspond to physical
interfaces
2) Its own user-space tools
- scripts that support SYSV style init
- userspace tools to augment kernel modules
- Can start/stop/reload IPsec
- Supports various levels of logging
- Pluto – Name of the authentication daemon – UDP port 500
- /etc/ipsec.conf is the main config file
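As a sketch of what /etc/ipsec.conf looks like (all names and
addresses below are made up; consult the FreeS/WAN documentation for
the real parameters and their meanings):

```
# /etc/ipsec.conf (hypothetical fragment)
config setup
        interfaces="ipsec0=eth0"

conn office-to-home
        left=192.0.2.1
        leftsubnet=10.0.1.0/24
        right=192.0.2.2
        rightsubnet=10.0.2.0/24
        authby=secret
        auto=start
```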
To install it, download the tarball from the site, configure a Linux
kernel source tree, and use the targets from the FreeS/WAN makefile
to build your kernel. The FreeS/WAN makefile will also install the
userspace tools for you. I’m not sure if any major distributors have
created packages of it yet.
The configuration of FreeS/WAN is rather difficult; ask anyone who
has tried it. There are several examples in their online
documentation. But to understand what is going on, the administrator
needs a very good understanding of IP networking and of the IPSec
protocols.
Interoperability with other VPNs
- Shared Key
- RSA
- X.509 certs
- difficulties
Unfortunately, it is often very hard to get two different VPN
products which supposedly speak the same language (IPSec) to talk to
each other. Linux FreeS/WAN, for example, does not (by default)
support X.509 certificates, which are the most common method of
authentication for commercial products. There is a patch for support,
but it is, again, tricky to get working.
IPSec does support a lowest-common-denominator form of authentication
called shared key, which is just that: a shared key between the two
VPN hosts. FreeS/WAN also supports RSA authentication, though I
haven’t seen any commercial products which support that method.
4.4. Stability, bugs
In my experience with some of the earlier versions of FreeS/WAN, I
have encountered many bugs and problems in the code. That is not to
say that it does not work; it certainly does, but be aware that it is
still very much a work in progress.
November 16, 2002 / Kurt Nelson / 0 Comments
Table of Contents
1. What is LDAP?
LDAP stands for "Lightweight Directory Access Protocol". It is a
TCP/IP implementation of the X.500 DAP, which originally ran over the
OSI protocol stack.
(Note: DAP is the original X.500 Directory Access Protocol.)
A Directory is just a database that usually follows these properties:
- designed for reading more than writing
- offers a static view of the data
- simple updates without transactions
A Directory Service adds a network protocol used to access the
directory, on top of the above. We’ve all used a directory service in
the past day: DNS!
LDAP is defined by RFC 1777 (http://www.ietf.org/rfc/rfc1777.txt). Some
common points of the standard are:
- a network protocol for accessing information in the directory
- an information model defining the form and character of the
information
- a namespace defining how information is referenced and organized
- an emerging distributed operation model defining how data may be
distributed and referenced
- designed-in extensibility
2. What good is LDAP?
A Directory holds information. It doesn’t matter what type: text,
photos, urls, pointers to whatever, binary data, public key
certificates, etc. (Note here that the particular LDAP server you use
may have limitations.)
There are different contexts for a Directory (and Directory Service).
- LOCAL – only for a subset of machines/users/etc.
- GLOBAL – can be accessed by anyone
LDAP is a vendor-independent, platform-independent protocol, which
means interconnection is easy! (The Internet, for instance.) For the
same reason, translating from LDAP to another protocol/system is
easy.
Currently existing gateways:
- LDAP to X.500 (and vice versa)
- HTTP to LDAP
- WHOIS++ to LDAP
- FINGER to LDAP
- E-mail to LDAP
- ODBC to LDAP
- and more!!!
Concrete example:
Address books usually use LDAP to store the book on a centralized
server and then pull down the information when requested. Netscape
Communicator uses this model. (Microsoft Exchange/Outlook does
something similar, but Microsoft hacks the protocol some.)
When the user pulls up his/her address book, the request is sent to
the LDAP server. The server then returns each entry in the book in a
standard attribute/value format, similar in spirit to XML.
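As a sketch, an address book entry like the ones above might come
back looking like this, in LDIF, the usual text representation of
LDAP entries (the names and values here are hypothetical):

```
dn: cn=John Smith,ou=people,o=example
cn: John Smith
mail: jsmith@example.org
telephoneNumber: +1 555 0100
objectClass: person
```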
3. Schemas
The Directory is actually a distributed, tree-like structure. Every
entry in the directory has a distinguished name (DN) which uniquely
identifies that entry. The DN can be generated by concatenating the
relative distinguished names (RDNs) of entries higher up in the tree.
                         ROOT
                           |
              +------------+------------+
              |                         |
             C=US                      C=GB
              |
       +------+------+
       |             |
     O=MIT         O=GT
                     |
             +-------+-------+
             |               |
         OU=Classes      OU=Clubs
                             |
                 +--------+--------+
                 |        |        |
              CN=LUG   CN=LAX  CN=Ultimate
If you notice, the RDNs are all of the form <parameter>=<value>. The
idea behind a schema is related in the following flow chart (read it
like a CFG):
root := root country
| root locality
| root organization
| (epsilon).
country := locality
| organization.
locality := organizational_unit.
organization := organizational_unit.
organizational_unit := organizational_unit container
| (epsilon).
[Here a container is the base object, holding extremely specific
data, like a person's name, a department's budget, etc.]
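For example, the DN of the LUG entry is built by joining its RDN to
those of its ancestors, most specific first. A quick shell sketch,
with the RDN values taken from the example tree above:

```shell
# Build a DN by joining an entry's RDN with those of its ancestors,
# most specific first (values from the example tree above)
cn="CN=LUG"; ou="OU=Clubs"; o="O=GT"; c="C=US"
dn="$cn,$ou,$o,$c"
echo "$dn"    # CN=LUG,OU=Clubs,O=GT,C=US
```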
A really good reference for learning about schemas for use in LDAP can
be found at:
http://homes.ukoln.ac.uk/~lisap/ccsap/Directory/Docs/prep.html.
4. Using LDAP in your shtuff
There is no "way" to use LDAP. It’s more of a methodology:
http://www.stanford.edu/~hodges/talks/mactivity.ldap.97/deplconsid1.html
Each language usually has its own hooks into LDAP.
C has a whole API suite.
Java uses the Java Naming and Directory Interface
(http://java.sun.com/products/jndi/).
A good step-by-step HOWTO can be found in Chapter 4 of IBM’s Redbook:
http://www.redbooks.ibm.com/pubs/pdfs/redbooks/sg244986.pdf. It uses
the C API to walk through accessing an LDAP server.
5. OpenSource Projects
OpenLDAP is perhaps the best known, due in part to its name. The
project consists of a stand-alone LDAP server, a replication server,
and client application libraries. The latest version is 2.1.8 as of
this writing.
Installing and using OpenLDAP is fairly straightforward. There is a
great online/HTML HOWTO available from OpenLDAP’s site:
http://www.openldap.org/doc/
6. Other OS Projects
7. Resources