Server Monitoring With Mon

This was written by Moshe Jacobson and given on Thu May 08 2003.

1. Introduction

This is a quick and dirty introduction to Mon, a server monitoring tool that can watch any number of services running on any number of servers. Mon is useful for a system administrator who needs to be notified as soon as a server or network resource goes down, so that they can respond immediately and keep downtime to a minimum.

Mon's strategy is to be highly modular, thereby allowing you to write whatever monitors (programs that check a service's status) and alerts (programs that let you know when a service has gone down or come up) tickle your fancy.
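
To give a feel for how small a monitor can be, here is a minimal sketch in shell. It relies on the monitor convention as I understand it from the mon manpage: mon appends the host names of the watched group to the monitor's command line, a non-zero exit status means failure, and the first line of output is a summary, by convention the list of failed hosts. The function name run_monitor and the probe command you pass to it are hypothetical, not part of mon.

```shell
# run_monitor PROBE HOST...
# PROBE is any command or shell function that takes one host name and
# returns 0 if the service on that host looks healthy (hypothetical helper).
run_monitor() {
    probe=$1
    shift
    failed=""
    for host in "$@"; do
        # Run the probe quietly; remember every host that fails.
        if ! "$probe" "$host" >/dev/null 2>&1; then
            failed="$failed $host"
        fi
    done
    if [ -n "$failed" ]; then
        # Summary line: the failed hosts, separated by spaces.
        echo "${failed# }"
        return 1
    fi
    return 0
}
```

A real monitor script would end with something like `run_monitor my_probe "$@"; exit $?`, where my_probe is whatever check makes sense for a single host.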

Don't worry if you don't want to write such scripts, or don't know how, because the likelihood is that someone has already contributed the monitor or alert script you're looking for. For the purposes of this introduction, I will assume we don't need to write our own. Otherwise I'd need to spend more than 30 minutes writing this presentation! Besides, if you're at that level already, you can just read the mon manpage and figure it out for yourself.

Mon can also listen as a server daemon on a particular port, which allows other computers running some form of the mon client to contact the mon server when something running on them goes down, so that mon can do whatever it has to do to let you know. This feature is called event trapping (or "traps" for short). It is also beyond the scope of this presentation, but is not too difficult to implement.

2. Getting Mon

The first thing you'll need to do is download mon. You can find it at:

http://www.kernel.org/software/mon/

Once you have gotten the mon tarball, follow these steps:

  1. Untar it into /usr/local/lib/mon (trust me, it more or less assumes that you will be putting it there).
  2. Move the contents of the etc/ directory into /etc/mon.
  3. mkdir /var/state/mon for the mon state information.
  4. touch /var/state/mon/disabled
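
The steps above can be sketched as a small shell function. A few things here are my assumptions rather than anything the mon distribution promises: the tarball is a gzipped tar that unpacks into a single top-level directory (hence --strip-components=1, which GNU tar and bsdtar both accept), and the optional ROOT argument is a hypothetical extra that lets you try the whole thing in a scratch directory first.

```shell
# install_mon TARBALL [ROOT]
# ROOT defaults to empty, meaning the real filesystem root.
install_mon() {
    tarball=$1
    root=${2:-}
    libdir=$root/usr/local/lib/mon

    mkdir -p "$libdir" "$root/etc/mon" "$root/var/state/mon" || return 1

    # Step 1: unpack into /usr/local/lib/mon, dropping the tarball's
    # top-level directory (assumed to exist).
    tar -xzf "$tarball" -C "$libdir" --strip-components=1 || return 1

    # Step 2: move the sample configuration files into /etc/mon.
    mv "$libdir"/etc/* "$root/etc/mon/" || return 1

    # Steps 3 and 4: the state directory was created above; add the
    # (empty) disabled file.
    touch "$root/var/state/mon/disabled"
}
```

Run as root, `install_mon /path/to/the-mon-tarball` would perform the real installation; pass a second argument such as /tmp/montest to rehearse it without touching the system.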

Now you'll want to download all the user contributed programs. There are the

Download them all into /usr/local/lib/mon and untar them there; each one unpacks into its own directory. Easy enough. Once they're untarred, and while you're still in /usr/local/lib/mon, move the resulting files as follows:

# mv monitors/*/* mon.d
# mv alerts/*/* alert.d

3. Configuring Mon

Now you're ready to copy /etc/mon/example.cf to /etc/mon/mon.cf and edit /etc/mon/mon.cf. This file is laid out as follows:

  1. Global options
  2. Hostgroup definitions (assigning names to sets of hosts)
  3. Watch definitions (defining what will be monitored for each host group)

Each watch definition consists of any number of service definitions. A service definition defines one service type that you will be checking on the current host group.

Each service definition consists of:

  1. Various service options, including the frequency with which to check, the monitor program to use for the check, and a description string for the service
  2. One or more period definitions that dictate how to behave if the monitors fail during various times of the day or week

Each of these period definitions consists of various options such as what alert and upalert programs to use, and with what options, as well as options that dictate how frequently to notify you if the service remains down, or how many failures must occur before the alert is sent.
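
The nesting described above (watch, then service, then period) is easier to see written out. Here is a small hypothetical shell helper, gen_watch, that prints one watch definition containing a single service and a single period. The directives are the ones used in the example configuration later in this article; the helper itself, the fixed one-minute interval, and the address admin@example.com are made up for illustration.

```shell
# gen_watch GROUP SERVICE "MONITOR AND ITS OPTIONS" EMAIL
# Prints a watch definition on stdout; append it to mon.cf if you like it.
gen_watch() {
    group=$1
    service=$2
    monitor=$3
    email=$4
    cat <<EOF
watch $group
    service $service
        description Check $service on host group $group
        interval 1m
        monitor $monitor
        period wd {Sun-Sat}
            alert mail.alert $email
            upalert mail.alert $email
            alertevery 30m
EOF
}
```

For example, `gen_watch pop3 pop3 "pop3.monitor -p 110 -t 20" admin@example.com` prints a stanza much like the pop3 one in the example configuration.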

The cool thing is that instead of using /etc/mon/mon.cf, you can call the file /etc/mon/mon.m4 (make sure to start mon with the "-c /etc/mon/mon.m4" option), and mon will run the file through m4 before processing the mon directives in it. This is useful for m4 defines: rather than writing the same email address, pager number, or time interval over and over throughout the file, you define it once and use the macro name everywhere else. See the included example.m4 for a good example of this.
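
As a rough illustration of the m4 idea, the sketch below writes a tiny sample file using two m4 macros. The macro names ADMIN_EMAIL and ALERT_INTERVAL, the function name write_sample_m4, and the address admin@example.com are all hypothetical; the example.m4 shipped with mon may be organized quite differently.

```shell
# write_sample_m4 FILE
# Writes a small mon configuration fragment that uses m4 macros.
# The quoted 'EOF' keeps the shell from touching m4's backquotes.
write_sample_m4() {
    cat > "$1" <<'EOF'
define(`ADMIN_EMAIL', `admin@example.com')dnl
define(`ALERT_INTERVAL', `30m')dnl
watch pop3
    service pop3
        interval 1m
        monitor pop3.monitor -p 110 -t 20
        period wd {Sun-Sat}
            alert mail.alert ADMIN_EMAIL
            upalert mail.alert ADMIN_EMAIL
            alertevery ALERT_INTERVAL
EOF
}
```

Running `m4` by hand on the resulting file is a handy way to preview what mon will actually see after macro expansion.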

4. User contributed scripts

There are so many user-contributed scripts that I figured I'd list them here so you could see all of them.

The monitors:

asyncreboot.monitor
bootp.monitor
cpqhealth.monitor
dialin.monitor
dir_file_age.monitor
dns.monitor
file_change.monitor
flexlm.monitor
foundry-chassis.monitor
fping.monitor
freespace.monitor
ftp.monitor
hpnp.monitor
http.monitor
http_integrity.monitor
http_t.monitor
http_tp.monitor
http_tpp.monitor
https.monitor
icecast.monitor
imap.monitor
informix.monitor
informixdbspace.monitor
ipsec.monitor
ldap.monitor
lwp-http-post.monitor
mailloop.monitor
mon.monitor
msql-mysql.monitor
na_quota.monitor
netappfree.monitor
netsnmp-exec.monitor
netsnmp-freespace.monitor
netsnmp-proc.monitor
nntp.monitor
ntp.monitor
ntservice.monitor
phttp.monitor
ping.monitor
pop3.monitor
postgresql.monitor
printmib.monitor
process-full-command-line.monitor
process.monitor
radius.monitor
rd.monitor
reboot.monitor
remote.monitor
rpc.monitor
rptr.monitor
samba.monitor
seq.monitor
silkworm.monitor
smtp.monitor
smtp3.monitor
smtp_rt.monitor
snmp_interface.monitor
sqlconn.monitor
ssh.monitor
startremote.monitor
tcp.monitor
tcpch.monitor
telnet.monitor
traceroute.monitor
umn_mon.monitor
up_rtt.monitor
xedia-ipsec-tunnel.monitor

The alerts:

bugzilla.alert
file.alert
gnats.alert
hpov.alert
mail.alert
netpage.alert
qpage.alert
remote.alert
simplepage.alert
sms.alert
snapdelete.alert
snpp.alert
test.alert
trap.alert
winpopup.alert
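
In case you're curious what an alert looks like on the inside, here is a minimal sketch in the spirit of file.alert that appends one line per alert to a log file. It assumes the calling convention as I remember it from the mon manpage: the service, group, hosts, and time arrive as -s, -g, -h, and -t options, -u marks an upalert, and the monitor's summary line is the first line on standard input. Treat those details as assumptions and check the manpage for your version; the function name log_alert and its log format are made up.

```shell
# log_alert LOGFILE [options passed by mon]
log_alert() {
    logfile=$1
    shift
    service=""
    group=""
    hosts=""
    kind="alert"
    OPTIND=1
    # The leading ':' keeps getopts quiet about options this sketch ignores.
    while getopts ":s:g:h:t:u" opt; do
        case $opt in
            s) service=$OPTARG ;;
            g) group=$OPTARG ;;
            h) hosts=$OPTARG ;;
            u) kind="upalert" ;;
            *) ;;  # -t and anything unrecognized are not used here
        esac
    done
    # First line of stdin is the summary; the detail that follows is skipped.
    IFS= read -r summary || true
    printf '%s %s/%s hosts=[%s] summary=%s\n' \
        "$kind" "$group" "$service" "$hosts" "$summary" >> "$logfile"
}
```

A real alert script would simply call `log_alert /some/log/file "$@"` as its last line.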

Additionally, there is a CGI program in the cgi-bin package that creates a status webpage the average joe can grok (and can even use to modify mon's parameters while it's running). There is also a GUI configuration utility in the utils package. It uses Perl/Tk (yuck) and is not too good anyway, but it is worth a try.

Anyway, I have about -3 minutes left to write this, so here comes the example configuration, and you can probably find whatever else you need in the manpage!

5. Example configuration

Download this example configuration.

#####################################
# Global Options
#
basedir    = /usr/local/lib/mon
alertdir   = alert.d
mondir     = mon.d
cfbasedir  = /etc/mon
dep_behavior = m
dep_recur_limit = 10
dtlogging  = no
histlength = 100
logdir     = /var/log/mon
maxprocs   = 20
pidfile    = /var/run/mon.pid
randstart  = 60s


#####################################
# Host Groups
#
hostgroup dns 66.20.234.14 66.20.234.15

hostgroup ftp ftp linux2 craq01 archer

hostgroup nntp news

hostgroup pop3 mail

hostgroup smtp mxqmail2 mxqmail1 mail craq01


#####################################
# Watch Definitions
#
watch dns
    service dns
        description Check DNS services
        interval 1m
        monitor dns.monitor -zone speedfactory.net -master ns.speedfactory.net
        period wd {Sun-Sat}
            alert mail.alert moshe@speedfactory.net
            upalert mail.alert moshe@speedfactory.net
            alertevery 30m

watch ftp
    service ftp
        description Check FTP servers
        interval 5m
        monitor ftp.monitor -p 21 -t 20
        period wd {Sun-Sat}
            alert mail.alert moshe@speedfactory.net
            upalert mail.alert moshe@speedfactory.net
            alertafter 2

watch nntp
    service nntp
        description Check that news server is up
        interval 1m
        monitor nntp.monitor -p 119
        period wd {Sun-Sat}
            alert mail.alert moshe@speedfactory.net
            upalert mail.alert moshe@speedfactory.net
            alertevery 30m

watch pop3
    service pop3
        description Check that the pop3 server is working
        interval 1m
        monitor pop3.monitor -p 110 -t 20
        period wd {Sun-Sat}
            alert mail.alert moshe@speedfactory.net
            upalert mail.alert moshe@speedfactory.net
            alertevery 30m

watch smtp
    service smtp
        description Check mail sending
        interval 1m
        monitor smtp.monitor -p 25 -t 20
        period wd {Sun-Sat}
            alert mail.alert moshe@speedfactory.net
            upalert mail.alert moshe@speedfactory.net
            alertevery 30m
            alertafter 2

watch routers
    service ping
        description Ping our routers
        interval 1m
        monitor fping.monitor
        period wd {Sun-Sat}
            alert mail.alert moshe@speedfactory.net
            upalert mail.alert moshe@speedfactory.net
            alertevery 15m
            alertafter 2
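
One quick sanity check for a hand-edited configuration is to list every watch whose name has no matching hostgroup line. In the configuration above, for instance, watch routers has no hostgroup routers. As far as I can tell from the mon manpage, mon then treats the watch name itself as the only host in the group, which is easy to do by accident, so check the behavior for your version. The function name undefined_watch_groups is hypothetical, and the awk script only looks at the first two words of each line.

```shell
# undefined_watch_groups CONFIG_FILE
# Prints, one per line, each watch name that has no hostgroup definition.
undefined_watch_groups() {
    awk '
        $1 == "hostgroup" { defined[$2] = 1 }      # remember defined groups
        $1 == "watch"     { watched[++n] = $2 }    # remember watches in order
        END {
            for (i = 1; i <= n; i++)
                if (!(watched[i] in defined))
                    print watched[i]
        }
    ' "$1"
}
```

Running `undefined_watch_groups /etc/mon/mon.cf` prints nothing when every watch refers to a defined host group.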
