Server Monitoring With Mon
Table of Contents
- 1. Introduction
- 2. Getting Mon
- 3. Configuring Mon
- 4. User contributed scripts
- 5. Example configuration
- 6. Resources
1. Introduction
This is a quick and dirty introduction to Mon, a server monitoring tool that can
be used to monitor any number of services running on any number of servers. Mon
is useful for a system administrator who needs to be notified as soon as a
server or network resource goes down, so that they can respond immediately and
keep downtime to a minimum.
Mon’s strategy is to be highly modular, thereby allowing you to write whatever
monitors (programs that check a service’s status) and alerts (programs that let
you know when a service has gone down or come up) tickle your fancy.
Don’t worry if you don’t want to or know how to write such scripts, though,
because the likelihood is that someone has already contributed the monitor or
alert script you’re looking for. For the purposes of this introduction, I will
assume we don’t need to write our own. Otherwise I’d need to spend more than 30
minutes writing this presentation! Besides, if you’re at that level already, you
can just read the mon manpage and figure it out for yourself.
Mon can also listen as a server daemon on a particular port, which allows other
computers running some form of the mon client to contact the mon server when
something running on them is down, so that mon can do whatever it has to do to
let you know. This feature is called event trapping (or "traps" for short). It
is also beyond the scope of this presentation, but is not too difficult to
implement.
2. Getting Mon
The first thing you’ll need to do is download mon. You can find it at:
http://www.kernel.org/software/mon/
Once you have gotten the mon tarball, follow these steps:
- Untar it to /usr/local/lib/mon (trust me, it kind of assumes that you will be
putting it there).
- Move the contents of the etc/ directory into /etc/mon
- mkdir /var/state/mon for the mon state information
- touch /var/state/mon/disabled
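The four steps above can be sketched as a small shell function. This is only a sketch under a few assumptions: the function name install_mon and the ROOT variable are made up for illustration, the tarball path is whatever you actually downloaded, and the archive is assumed to unpack into a single top-level directory (which is what --strip-components=1 removes).

```shell
# Sketch of the install steps above. The function name, the ROOT variable
# and the tarball argument are illustrative assumptions, not part of mon.
install_mon() {
    tarball=$1              # path to the mon tarball you downloaded
    root=${ROOT:-}          # leave empty for a real install as root

    mkdir -p "$root/usr/local/lib/mon" "$root/etc/mon" "$root/var/state/mon" || return 1

    # Assumes the tarball unpacks into one top-level directory,
    # which --strip-components=1 drops.
    tar -xzf "$tarball" -C "$root/usr/local/lib/mon" --strip-components=1 || return 1

    # Move the shipped configuration files into /etc/mon.
    mv "$root/usr/local/lib/mon/etc/"* "$root/etc/mon/" || return 1

    # Create the empty file named in the last step.
    touch "$root/var/state/mon/disabled"
}
```

Setting ROOT to a scratch directory before calling the function lets you rehearse the layout without touching the real /etc or /var.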
Now you’ll want to download all the user contributed programs. There are four
archives:
- Monitors: ftp://ftp.kernel.org/pub/software/admin/mon/contrib/all-monitors.tar.gz
- Alerts: ftp://ftp.kernel.org/pub/software/admin/mon/contrib/all-alerts.tar.gz
- CGIs: ftp://ftp.kernel.org/pub/software/admin/mon/contrib/all-cgi-bin.tar.gz
- Utils: ftp://ftp.kernel.org/pub/software/admin/mon/contrib/all-utils.tar.gz
Download them all into /usr/local/lib/mon and untar them there; each archive
unpacks into its own directory. Once they’re untarred, and you’re still in
/usr/local/lib/mon, move the resulting files as follows:
# mv monitors/*/* mon.d
# mv alerts/*/* alert.d
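The unpacking step that comes before these two moves can be sketched as a loop over the four contrib archives. The function name unpack_contrib is made up for illustration, and the sketch assumes the archives are already sitting in the current directory under their original file names.

```shell
# Unpack the four contrib archives found in the current directory.
# Assumes they were downloaded here under their original names;
# the function name is a hypothetical helper, not something mon ships.
unpack_contrib() {
    for name in all-monitors all-alerts all-cgi-bin all-utils; do
        if [ -f "$name.tar.gz" ]; then
            tar -xzf "$name.tar.gz" || return 1
        else
            # Report a missing archive but keep going with the rest.
            echo "missing $name.tar.gz, skipping" >&2
        fi
    done
}
```

Run it from /usr/local/lib/mon, then do the two mv commands shown above.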
3. Configuring Mon
Now you’re ready to copy /etc/mon/example.cf to /etc/mon/mon.cf and edit
/etc/mon/mon.cf. This file is laid out as follows:
- Global options
- Hostgroup definitions (assigning names to sets of hosts)
- Watch definitions (defining what will be monitored for each host group)
Each watch definition consists of any number of service definitions. A service
definition defines one service type that you will be checking on the current
host group.
Each service definition consists of:
- Various service options, including the frequency with which to check, the
monitor program to use for the check, and a description string for the
service
- One or more period definitions that dictate how to behave if the monitors
fail during various times of the day or week
Each of these period definitions consists of various options such as what alert
and upalert programs to use, and with what options, as well as options that
dictate how frequently to notify you if the service remains down, or how many
failures must occur before the alert is sent.
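To make the nesting concrete, here is a sketch of one watch with one service and one period, wrapped in a shell function that writes it to a file of your choosing. The function name write_sample_cf, the e-mail address, and the particular intervals are assumptions picked for illustration; the pop3 hostgroup is borrowed from the example configuration in section 5, and the directives are the ones described in the mon manpage.

```shell
# Write a small illustrative mon configuration to the path given as $1.
# The function name and the e-mail address are hypothetical.
write_sample_cf() {
    cat > "$1" <<'EOF'
hostgroup pop3 mail

watch pop3
    service pop3
        description POP3 service on the mail host
        interval 5m
        monitor pop3.monitor
        period wd {Mon-Fri} hr {7am-10pm}
            alert mail.alert admin@example.com
            upalert mail.alert admin@example.com
            alertevery 1h
            alertafter 2
EOF
}
```

Read from the inside out: during the weekday period, mail.alert fires only after two consecutive failures, repeats at most once an hour while the service stays down, and the upalert fires when it comes back.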
The cool thing is that instead of naming the file /etc/mon/mon.cf, you can call
it /etc/mon/mon.m4 (and make sure to start mon with the "-c /etc/mon/mon.m4"
option), and mon will run the file through m4 before processing the mon
directives in it. This is useful for DEFINEs: rather than writing the same email
address, pager number, or time interval over and over throughout the file, you
define it once and use the DEFINEd name everywhere else. See the included
example.m4 for a good example of this.
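As a rough illustration of the m4 approach, the function below writes a tiny mon.m4-style file in which an address and an interval are defined once and reused. The macro names ADMIN_EMAIL and CHECK_EVERY, the address, and the function name write_sample_m4 are all invented for this sketch; only the define() and dnl syntax comes from m4 itself.

```shell
# Write a small illustrative m4-based mon configuration to the path in $1.
# Macro names, the address and the function name are hypothetical.
write_sample_m4() {
    cat > "$1" <<'EOF'
define(`ADMIN_EMAIL', `admin@example.com')dnl
define(`CHECK_EVERY', `5m')dnl
hostgroup pop3 mail

watch pop3
    service pop3
        interval CHECK_EVERY
        monitor pop3.monitor
        period wd {Mon-Fri}
            alert mail.alert ADMIN_EMAIL
            upalert mail.alert ADMIN_EMAIL
EOF
}
```

Running m4 on the resulting file by hand prints the expanded text, which is a quick way to check your macros before starting mon with "-c /etc/mon/mon.m4".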
4. User contributed scripts
There are so many user contributed scripts that I figured I’d list them all
here.
The monitors:
asyncreboot.monitor bootp.monitor cpqhealth.monitor dialin.monitor
dir_file_age.monitor dns.monitor file_change.monitor flexlm.monitor
foundry-chassis.monitor fping.monitor freespace.monitor ftp.monitor
hpnp.monitor http.monitor http_integrity.monitor http_t.monitor
http_tp.monitor http_tpp.monitor https.monitor icecast.monitor
imap.monitor informix.monitor informixdbspace.monitor ipsec.monitor
ldap.monitor lwp-http-post.monitor mailloop.monitor mon.monitor
msql-mysql.monitor na_quota.monitor netappfree.monitor netsnmp-exec.monitor
netsnmp-freespace.monitor netsnmp-proc.monitor nntp.monitor ntp.monitor
ntservice.monitor phttp.monitor ping.monitor pop3.monitor
postgresql.monitor printmib.monitor process-full-command-line.monitor
process.monitor radius.monitor rd.monitor reboot.monitor remote.monitor
rpc.monitor rptr.monitor samba.monitor seq.monitor silkworm.monitor
smtp.monitor smtp3.monitor smtp_rt.monitor snmp_interface.monitor
sqlconn.monitor ssh.monitor startremote.monitor tcp.monitor tcpch.monitor
telnet.monitor traceroute.monitor umn_mon.monitor up_rtt.monitor
xedia-ipsec-tunnel.monitor
The alerts:
bugzilla.alert file.alert gnats.alert hpov.alert mail.alert
netpage.alert qpage.alert remote.alert simplepage.alert sms.alert
snapdelete.alert snpp.alert test.alert trap.alert winpopup.alert
Additionally, there is a CGI program in the cgi-bin package that creates a
status webpage the average joe can grok (and can even use to modify mon’s
parameters while it’s running). There is also a GUI configuration utility in the
utils package. It uses Perl/Tk (yuck) and is not too good anyway, but it is
worth a try.
Anyway, I have about -3 minutes left to write this, so here comes the example
configuration, and you can probably find whatever else you need in the manpage!
5. Example configuration
#####################################
# Global Options
#
basedir
alertdir
mondir
cfbasedir
dep_behavior = m
dep_recur_limit = 10
dtlogging
histlength = 100
logdir
maxprocs
pidfile
randstart
#####################################
# Host Groups
#
hostgroup dns 66.20.234.14 66.20.234.15
hostgroup ftp ftp linux2 craq01 archer
hostgroup nntp news
hostgroup pop3 mail
hostgroup smtp mxqmail2 mxqmail1 mail craq01
watch dns
watch ftp
watch nntp
watch pop3
watch smtp
watch routers