dmon - a distributed monitor
Usage: dmon [OPTIONS] [COMMANDS]
  OPTIONS  : [-s] [-q] [-v] [-d] [-h] [-i] [-t] [-c file]
  COMMANDS : stop|start|reload|state
  option s : be silent
  option q : be quiet ; only errors
  option v : be verbose ; show all actions
  option d : show debug info ; internals
  option T : trace ; show all debug info
  option h : show help ; exit
  option t : test config and work ; exit
  option i : interactive ; don't fork the daemon
  option c : use config <file> ; default [dmon.conf,/etc/dmon/conf]
Program dmon provides system monitoring on a collection of hosts.
Each host runs dmon as a daemon.
On each host, dmon is a client ; on one host dmon is also a server ; and one host is also the (web) pagemaker.
On each host, dmon monitors items ; things like cpu-load, disk-usage, the status of a couple of daemons etc.
Every minute, dmon computes a fresh value for each item, and stores these values in a local (sqlite) database.
Values of items have a fitness-level ; typically :
fine soso sick critical dead
The fitness of an item is determined by a configurable (per [host-]item) fitness-function.
Every 5 minutes dmon sends a report to the server ; this report contains current information about the items.
If/when the fitness of a host-item changes, dmon sends an event to the server.
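The relation between values, fitness-levels and events can be sketched in a few lines of Python. This is only an illustration, not dmon's own (Perl) code : the threshold table, the function names and the dictionary keys are assumptions ; the event attributes follow the list given for the server's event handling.

```python
def fitness(value, thresholds):
    """Map a value to a fitness-level.

    thresholds is a list of (lower_limit, level) pairs ; the level of the
    highest limit that does not exceed the value wins (an assumed scheme).
    """
    level = None
    for limit, name in sorted(thresholds):
        if value >= limit:
            level = name
    if level is None:
        raise ValueError("value %r is below every threshold" % (value,))
    return level


def fitness_event(host, item, old, new):
    """Return an event when the fitness-level changed, else None.

    old and new are (value, level) pairs from two consecutive states.
    """
    old_value, old_level = old
    new_value, new_level = new
    if old_level == new_level:
        return None
    return {
        "host": host,
        "item": item,
        "old_level": old_level,
        "new_level": new_level,
        "old_value": old_value,
        "new_value": new_value,
    }


# hypothetical thresholds for an item like cpu_load
CPU_LOAD = [(0, "fine"), (4, "soso"), (8, "sick"), (16, "critical")]
```

With these thresholds a load of 9 is sick, and a step from (1, "fine") to (9, "sick") yields one event.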
On receiving events, the server may send mails to selected users, depending on the event (host, item, current level, previous level).
In a few web-pages, dmon shows you :
all problems : host-items that aren't fine
recent events ; host-item fitness changes
the history of each host-item ; in a graph ; per hour, day, week, month or year
various overviews ; items by host, hosts by item ; for various intervals
What the dmon-system does, is determined by a work-config file. It contains (among other things) :
a list of hosts and host-groups
for each host and/or host-group, the items it must monitor
the designated pagemaker
fitness-levels and (per host and/or item) fitness-functions
a list of users and user-groups
for all fitness-changes, the users that must receive an alert
Program mk-dmon-work reads the work-config file and generates a work-file for each client, the server and the pagemaker.
The system is distributed in the sense that :
each client-dmon runs on its own ; there is no central (work)flow-control
each client-dmon keeps its own history in a local (sqlite) database
each client-dmon can talk to the server ; the server can talk to each client ; all connections are short-lived.
-s : be silent
-q : be quiet ; only errors
-v : be verbose ; show all actions
-d : show debug info ; internals
-T : trace ; show all debug info
-h : show help ; exit
-t : test config and work ; exit
-i : interactive ; don't fork the daemon
-c file : use config file ; default : dmon.conf, /etc/dmon/conf
The default locations of the config file are dmon.conf and /etc/dmon/conf.
A config file looks like this :
+--------------------------------------------------
|# lines that start with '#' are comment
|# blank lines are ignored too
|# tabs are replaced by a space
|
|# the config entries are 'key' and 'value' pairs
|# a 'key' begins in column 1
|# the 'value' is the rest of the line
|somekey part1 part2 part3 ...
|otherkey part1 part2 part3 ...
|
|# indented lines are glued
|# the next three lines mean 'somekey part1 part2 part3'
|somekey part1
|  part2
|  part3
+--------------------------------------------------
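The rules in the box above can be read as a small parser. The sketch below is Python, not dmon's own code ; the name parse_config is hypothetical, and it assumes that a '#' preceded only by blanks also starts a comment, which the rules do not say.

```python
def parse_config(text):
    """Return the config entries as a list of (key, value) pairs, in file order.

    A list is used instead of a dict because a key may occur more than once.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.replace("\t", " ")          # tabs are replaced by a space
        if not line.strip():
            continue                           # blank lines are ignored
        if line.lstrip().startswith("#"):
            continue                           # comment (assumption : also when indented)
        if line[0] == " ":
            # an indented line is glued to the previous entry
            if not entries:
                raise ValueError("continuation line without a key: %r" % raw)
            key, value = entries[-1]
            entries[-1] = (key, (value + " " + line.strip()).strip())
        else:
            # the key begins in column 1 ; the value is the rest of the line
            key, _, value = line.partition(" ")
            entries.append((key, value.strip()))
    return entries
```

Parsing the three-line 'somekey' example gives the single pair ('somekey', 'part1 part2 part3').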
Specify the (fully qualified) hostname of the server.
Specify a fully qualified hostname for the client. The default is :
hostname `hostname`
If `hostname` is not fully qualified (does not contain a dot), dmon attempts to append a domain-string.
If defined, dmon uses config-option domain ; otherwise file /etc/resolv.conf is searched for a search list and the first name found is used.
If $hostname resolves to a CNAME, the pointed-to name is used.
Using dmon -t -v tells you which (canonical) $hostname dmon uses.
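The qualification rule can be sketched as follows (Python, for illustration only ; the function names are hypothetical). The CNAME step needs a DNS lookup and is left out, and the resolv.conf text is passed in as a string so the sketch does not touch the file-system.

```python
def search_domain(resolv_conf_text):
    """Return the first name of the first 'search' line, or None."""
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "search":
            return fields[1]
    return None


def qualify(hostname, domain=None, resolv_conf_text=""):
    """Append a domain-string to a hostname that contains no dot."""
    if "." in hostname:
        return hostname                      # already fully qualified
    # config-option domain wins ; otherwise use the resolv.conf search list
    domain = domain or search_domain(resolv_conf_text)
    return hostname + "." + domain if domain else hostname
```

For example, qualify("web", domain="example.org") gives "web.example.org" ; without a domain or a search list the name is returned unchanged.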
Specify a number ; the default is :
PORT 22007
The server listens for connections on port $PORT. The client listens for connections on port $PORT+1.
Specify the interval after which the client will compute the next state. The default is (one minute) :
ival_make_state 1m
An interval-spec can be given in seconds (as in 22 or 22s), minutes [m], hours [h], days [d] and/or weeks [w].
The interval-specs can be combined in any order :
dw      # a day and a week
7d+24h  # same thing
w-0.5h  # a week minus half an hour
hm6     # 3666 seconds
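One way to read these interval-specs is as a sequence of terms, each with an optional sign, an optional number and an optional unit. The Python sketch below follows that reading ; it is an assumption about the grammar (a bare unit counts as 1, a bare number counts as seconds, a sign applies to the term it precedes), not a copy of dmon's parser.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}

# one term : optional sign, optional number, optional unit
TERM = re.compile(r"([+-]?)(\d+(?:\.\d+)?)?([smhdw]?)")


def parse_interval(spec):
    """Convert an interval-spec like '7d+24h' to a number of seconds."""
    spec = spec.strip()
    if not spec:
        raise ValueError("empty interval-spec")
    pos, total = 0, 0.0
    while pos < len(spec):
        match = TERM.match(spec, pos)
        sign, number, unit = match.groups()
        if number is None and not unit:
            raise ValueError("bad interval-spec %r at position %d" % (spec, pos))
        count = float(number) if number is not None else 1.0
        seconds = count * UNIT_SECONDS[unit or "s"]
        total += -seconds if sign == "-" else seconds
        pos = match.end()
    return total
```

Under these assumptions parse_interval("hm6") is 3600 + 60 + 6 seconds, and "dw" and "7d+24h" both come to eight days.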
Specify the interval after which the client will send the next report to the server. The default is :
ival_send_report 5m
Specify the interval after which the client will request a fresh work-file from the server. The default is :
ival_check_work 10m
Specify how long the server must store events. The default is four weeks :
ival_keep_events 4w
The server cleans up the event-history hourly.
Specify a log-level ; the default is
loglvl Terse
Script gen-dmon-page generates references to plotter plotter.php as $url ; the default is :
plot_url /plotter.php
A leading slash is interpreted as the DOCUMENTROOT of the vhost running gen-dmon-page.
Specify parameters for log rotation ; the default is
rotate 8 1d
On start-up, and after interval-spec seconds, dmon will rotate its logfile (/var/log/dmon/dmon.log), saving num files.
Specify the directory where the programs live. The default is :
bindir /usr/sbin
Option $bindir is used by the UPGRADE-facility.
The server sends (and the client installs in) $bindir/dmon.
Option $bindir is ignored by root.
Specify the directory where the logfiles live. The default is :
logdir /var/log/dmon
Specify the directory where dmon keeps its files. The default is :
vardir /var/dmon
Specify a run-dir. The default is :
rundir /var/run/dmon
This is where the daemon keeps its files.
Specify a lock-dir. The default is :
rundir /var/lock/subsys
There is no default. Specify the group (name or number) under which the httpd is running on the pagemaker.
Gid $httpdgid is used by program users-dmon to make the user-database writable for the http-daemon.
Specify a (https) url ; there is no default.
$page_sec is used on the pagemaker by cgi-script gen-dmon-page to refer to the secure dmon-web-site, where users can login.
By default, the client listens on port 22008 for connections. The client accepts connections from localhost and the server. It expects a command from the list below.
To send a command to the client, use netcat (nc) :
echo <command> | nc localhost 22008
echo <command> | nc <client-hostname> 22008
The client responds :
COMMAND PING
PONG from Client <hostname> dmon-0.05-p179
COMMAND DONE
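If netcat is not at hand, the same exchange can be done from a script. The Python sketch below assumes what the examples suggest : a command is one line of text, and a response starts with a 'COMMAND <name>' line and ends with a 'COMMAND DONE' line. The function names are hypothetical.

```python
import socket


def send_command(host, port, command, timeout=5.0):
    """Send one command line to a dmon daemon ; return the response text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode() + b"\n")
        buf = b""
        # read until the closing line arrives or the daemon closes the connection
        while not buf.rstrip().endswith(b"COMMAND DONE"):
            data = sock.recv(4096)
            if not data:
                break
            buf += data
    return buf.decode(errors="replace")


def response_body(text):
    """Strip the 'COMMAND <name>' and 'COMMAND DONE' framing lines."""
    lines = text.strip().splitlines()
    if (len(lines) < 2
            or not lines[0].startswith("COMMAND ")
            or lines[-1] != "COMMAND DONE"):
        raise ValueError("unexpected response framing")
    return lines[1:-1]
```

A call like response_body(send_command("localhost", 22008, "PING")) would then return the single PONG line, provided a client is listening there.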
The client responds with something like :
COMMAND STATE
-- version dmon-0.05-p184
-- logfile /var/log/dmon/dmon.log
-- loglevel Trace
-- listening on port 22008 as a Client
-- Client is processing a command-session << 127.0.0.1 port 22008
-- hostname science-bs32.science.uu.nl
-- server down.science.uu.nl
-- work Thu Jan 28 14:21:27 2016
-- Client state : { ... }
COMMAND DONE
Command STOP stops the client.
On startup, the daemon generates a secret and stores it (by default) in file /var/run/dmon/dmon.stp, mode 0600.
This secret must be supplied in the STOP command.
Since the secret is only available to the owner of the daemon, only the owner of the daemon can use the STOP command.
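The stop-secret scheme can be sketched like this (Python, for illustration ; the function names, the secret's length and its hex format are assumptions, the file mode 0600 is from the text above).

```python
import os
import secrets


def write_stop_secret(path):
    """Generate a fresh secret and store it in path, readable by the owner only."""
    secret = secrets.token_hex(16)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.fchmod(fd, 0o600)        # also fixes the mode of a pre-existing file
        os.write(fd, (secret + "\n").encode())
    finally:
        os.close(fd)
    return secret


def secret_matches(path, supplied):
    """Compare a supplied secret with the stored one, in constant time."""
    with open(path) as handle:
        stored = handle.read().strip()
    return secrets.compare_digest(stored.encode(), supplied.encode())
```

Only a process that can read the file can supply a matching secret, which is what restricts STOP to the owner of the daemon.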
where
ival represents an interval : Hour, Day, Week, Month or Year ; only the first letter is used ; default H
pnts is ignored ; default 100
host is a hostname ; default : the client's canonical hostname
name is an item-name ; default : all item-names for host
The client queries its history-database, grouping rows and averaging over groups.
+----------+------------+------+
| Interval | Group by   | rows |
|----------+------------+------|
| Hour     | 1 minute   |   60 |
| Day      | 10 minutes |  145 |
| Week     | 1 hour     |  168 |
| Month    | 4 hours    |  186 |
| Year     | 2 days     |  184 |
+----------+------------+------+
Note ; instead of these constants, we should be able to specify the approximate number of rows we want ($pnts). There is a relation with ZAP.
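Grouping and averaging as in the table can be sketched in Python. The group widths come from the table ; the row layout (a time in seconds followed by numeric values) and the function name are assumptions, and the real client does this inside its sqlite database.

```python
from collections import defaultdict

# group width in seconds, per interval (first letter only)
GROUP_SECONDS = {"H": 60, "D": 600, "W": 3600, "M": 4 * 3600, "Y": 2 * 86400}


def group_and_average(rows, ival="H"):
    """Average rows of (time, value, ...) per group of GROUP_SECONDS[ival]."""
    width = GROUP_SECONDS[ival[:1].upper()]
    buckets = defaultdict(list)
    for time, *values in rows:
        buckets[time - time % width].append(values)
    result = []
    for start in sorted(buckets):
        columns = zip(*buckets[start])       # one tuple per item
        result.append([start] + [sum(col) / len(col) for col in columns])
    return result
```

Each output row carries the start time of its group and the mean of every item over that group.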
The client responds with a json-encoded summary of the host-item history ; something like
COMMAND HIST
{ "resp" : "ok rows 145"
, "data" :
    { "cols" : [ "TIME", item1, ... ]
    , "rows" : [ [ time, value, ... ], ... ]
    }
}
COMMAND DONE
where cols is a list of (requested) item-names (ordered alphabetically), and rows is a list of [time,values] tuples.
The HIST command is used by the plotter to retrieve historical data from a client, so it can generate a plot.
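A consumer of the HIST response, such as a plotter, typically wants one series per item. The Python sketch below turns the cols/rows layout into such series ; it assumes the json part has already been separated from the COMMAND framing lines, and the function name is hypothetical.

```python
import json


def history_series(payload):
    """Turn a HIST json payload into {item-name: [(time, value), ...]}."""
    reply = json.loads(payload)
    if not reply["resp"].startswith("ok"):
        raise ValueError("HIST failed: %s" % reply["resp"])
    cols = reply["data"]["cols"]             # [ "TIME", item1, ... ]
    series = {name: [] for name in cols[1:]}
    for row in reply["data"]["rows"]:        # [ time, value, ... ]
        time = row[0]
        for name, value in zip(cols[1:], row[1:]):
            series[name].append((time, value))
    return series
```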
Command SEND instructs the client to send a report to the server.
The client re-schedules sending its next report to a randomised time in the near future ; this avoids congestion on the server if/when all clients receive a SEND command as a result of a server's ALLSEND command.
The client responds with :
COMMAND SEND
next_send down.science.uu.nl Sat Feb 13 18:45:39 2016
COMMAND DONE
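The randomised re-scheduling amounts to adding a random delay to the current time. A minimal Python sketch ; the size of the window (max_delay) is an assumption, the text only says 'the near future'.

```python
import random


def next_send_time(now, max_delay=60.0, rng=random):
    """Pick a send time uniformly in [now, now + max_delay] seconds."""
    return now + rng.uniform(0.0, max_delay)
```

When many clients receive SEND at the same moment, their reports are then spread over the window instead of arriving at the server all at once.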
The client rsyncs $server::$upgr_mod into $vardir/upgrade/ ; if successful, it runs cd $vardir/upgrade/ ; make upgrade.
The make tests the new program and, if ok, installs the program, sends a response, and schedules a re-exec.
The client responds with something like :
COMMAND UPGRADE
upgrade science-vs14.science.uu.nl ... ok dmon-0.05-p183 → dmon-0.05-p184
...
COMMAND DONE
Note: by default (or when running as root), the server sends (and the client installs in) /local/sbin/dmon.
Note: a client accepts an UPGRADE command only from the server.
The client connects to the server and issues command cmd. The client responds with the server's response.
This facility is used for testing.
The client responds with a json-dump of its current state.
This facility is used for testing.
The ZAP command instructs the client to zap its history database : the client reduces the number of rows in its history by replacing groups of rows by a single row containing the average values of the group.
This facility is used for testing ; clients zap their history just after startup, and subsequently every hour.
By default, the server listens on port 22007 for connections. The server accepts connections from localhost and the clients. It expects a command from the list below.
To send a command to the server, use netcat (nc) :
echo <command> | nc localhost 22007
echo <command> | nc <server-hostname> 22007
The server responds :
COMMAND PING
PONG from Server <hostname> <dmon-version>
COMMAND DONE
The server responds with something like :
COMMAND STATE
-- version dmon-0.05-p179
-- logfile /var/log/dmon/dmon.log
-- loglevel Trace
-- listening on port 22008 as a Client
-- listening on port 22007 as a Server
-- Server is processing a command-session << 127.0.0.1 port 22007
-- Server state : keeping state for 74 clients
COMMAND DONE
where prog == dmon-server | dmon-client | dmon-pmaker
The server responds with something like :
COMMAND WORK
{ "resp" : "ok work"
, "data" : contents of the work-file as a string
, "lm" : work-file's last-modified timestamp
}
COMMAND DONE
where (by default) the work-file is :
/var/dmon/works/<prog>/<hostname>.txt
On the server, dmon retrieves work-files from the file-system.
The clients and the pagemaker retrieve a work-file from the server ; except when a host's (config) server is equal to its (canonical) hostname, in which case the work-file is retrieved from the file-system.
Same as command WORK except that data is always null.
After startup, periodically (default every 10 minutes), a client issues a WORK_LM command to the server, which responds with the current last-modified timestamp of the client's work-file.
If the timestamp has changed, the client reloads itself.
The server expects a client report : a one-line json-hash describing the current state of the hosts the client monitors (a client can monitor other (dumb) hosts like upses, using SNMP).
The json-hash contains the state of one or more hosts :
{ host1 : { item1 : ... , ... } , host2 : ... }
where per item it has a value, fitness, probe-errors etc.
The server just timestamps and stores the hash ; it responds with something like :
COMMAND REPORT
{ "resp": "ok report from <client> [<ip>] for <host1, ...>"
, "work": <timestamp work-file>
}
COMMAND DONE
The work-element is the timestamp of the workfile of the requesting host.
The client caches this info, to avoid a WORK_LM request.
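The interplay between REPORT responses and WORK_LM checks can be sketched as a small helper (Python, hypothetical names ; how the real client stores and expires the cached timestamp is not described here, so using a cached value once is an assumption).

```python
class WorkTracker:
    """Decide whether the client must reload because its work-file changed."""

    def __init__(self, loaded_lm):
        self.loaded_lm = loaded_lm    # timestamp of the work-file in use
        self.cached_lm = None         # timestamp seen in the last REPORT response

    def note_report(self, response):
        """Remember the 'work' element of a decoded REPORT response."""
        self.cached_lm = response.get("work")

    def must_reload(self, ask_server):
        """Use the cached timestamp if there is one ; else ask (WORK_LM)."""
        if self.cached_lm is not None:
            current, self.cached_lm = self.cached_lm, None
        else:
            current = ask_server()
        return current != self.loaded_lm
```

Here ask_server stands for whatever function issues the WORK_LM command and returns the timestamp ; it is only called when no REPORT response has supplied one.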
The pagemaker retrieves the combined state of all the hosts with server-command CLIENTS ; see below.
The server expects a list of events. Each event describes a change of the fitness of an item on a host ; attributes : hostname, item-name, old fitness-level, new fitness-level, old value, new value.
The server stores the events in a database.
The pagemaker may retrieve events from the server using the server command CLIENTS ; see below.
The server responds with something like :
COMMAND CLIENTS
{ "resp" : "ok clients"
, "cdmp" : { host : state of host, ... }
, "events" : { recent events ; optionally selected by time or count }
}
COMMAND DONE
If HOST is specified, cdmp only contains the state of HOST.
If HOST and/or ITEM is specified, events only contains events pertaining to HOST and/or ITEM.
The default for the last argument is 0 (meaning all events) ; if the argument looks like a number, only the last arg events are sent ; otherwise only the events in the specified interval are sent.
The CLIENTS command is issued by the pagemaker when it generates an html report-page.
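The rule for the last argument of CLIENTS can be sketched as follows (Python). The event layout (a dict with a 'time' key, oldest first), the function name and the reduced interval syntax (one number plus one unit) are assumptions made to keep the example short.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def select_events(events, arg="0", now=0):
    """Select events by count (arg looks like a number) or by interval."""
    if re.fullmatch(r"\d+", arg):
        count = int(arg)
        # 0 means all events ; otherwise only the last 'count' events
        return list(events) if count == 0 else list(events[-count:])
    match = re.fullmatch(r"(\d+)([smhdw])", arg)
    if not match:
        raise ValueError("bad count or interval: %r" % arg)
    cutoff = now - int(match.group(1)) * UNIT_SECONDS[match.group(2)]
    return [event for event in events if event["time"] >= cutoff]
```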
The server connects with client hostname and issues command cmd. The server responds with the client's response.
This facility enables a client-host to talk with another client-host. It is used by the pagemaker to retrieve history information from clients, unless the pagemaker runs on the same host as the server (in which case it can talk directly to any client).
The server issues (in parallel) a PING command to each client.
The server's response is the concatenation of the client-responses.
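Issuing a command to all clients in parallel and concatenating the answers can be sketched with a thread pool (Python ; dmon itself is a Perl program and may do this differently). The ask argument is a hypothetical function that sends the command to one host and returns its response text.

```python
from concurrent.futures import ThreadPoolExecutor


def ask_all(clients, ask, max_workers=16):
    """Call ask(host) for every client in parallel ; concatenate the responses.

    The result keeps the order of the clients list ; a client that cannot
    be reached contributes a one-line error instead of a response.
    """
    def safe(host):
        try:
            return ask(host)
        except OSError as err:
            return "%s : no response (%s)\n" % (host, err)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return "".join(pool.map(safe, clients))
```

Because the requests run concurrently, one slow or dead client delays the combined response by at most its own time-out rather than adding to every other client's wait.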
The server issues (in parallel) a SEND command to each client.
The server's response is the concatenation of the client-responses ; something like :
COMMAND SEND
next_send down.science.uu.nl Sat Feb 13 18:09:11 2016
...
COMMAND DONE
On the server-host, the client sends an ALLSEND to its server-part, on start-up and after a reload ; so the server is quickly up-to-date.
The server issues an UPGRADE command to each client (or only clients hostname ...).
Sent with the command is a version-string, or undef if -f is specified.
The client compares its version with the version-string (if defined) ; if versions are equal, the client ignores the UPGRADE-command.
Otherwise, the client rsyncs new software from the server, and runs a make upgrade ; this tests and installs the new stuff and, if ok, the client schedules a re-exec.
The server's response is the concatenation of the client-responses ; something like :
COMMAND UPGRADE
upgrade science-vs14.science.uu.nl ... ok dmon-0.05-p183 → dmon-0.05-p184
...
COMMAND DONE
Note: by default (or when running as root), the server sends (and the client installs in) /local/sbin/dmon.
Note: the server accepts an UPGRADE command only from localhost ; clients accept an UPGRADE command only from the server.
Daemon dmon must run as root because many system diagnostic tools can only be used as root.
The server must be able to connect to all the clients on port 22008. The server listens on port 22007 ; it allows connections from localhost and the clients, rejecting others.
A client must be able to connect to the server on port 22007. A client listens on port 22008 ; it allows connections from localhost and the server, rejecting others.
Program dmon requires a bunch of perl modules, most of which are CORE (come with perl) ; the others are widely available from platform repositories.
A dmon system works best if the server runs on real hardware that is not dependent on other hosts you want to monitor. This gives you the best chance that monitoring is available when some calamity occurs.
Whatever your initial setup may be, it is easy to (later) move the dmon-server to another host ; idem for the required web-service.
To use dmon, one of the monitored hosts must be a web-server ; it is efficient (but not necessary) to have the server and the web-service on the same host.
Create some directory ; for instance /local/dmon/ ; then get dmon :
% rsync -avz archive.science.uu.nl::dmon-dev/ /local/dmon/
% cd /local/dmon/
% perl dmon -v -t
Perl will probably complain with something like :
Can't locate xxx.pm ...
... where xxx is a missing module.
Install the module with your favorite tool (yum, apt-get), or use program cpanm.
Installing modules with cpan(1) is usually horrible ; use cpanm instead ; see the INSTALL file for hints ; also CPAN's How to install CPAN modules.
If cpanm fails, view the cpanm-build-log ; some modules require gcc(1).
Install missing perl modules until dmon complains about a missing config file.
/etc/dmon/conf
Supply the server's hostname or (preferably) a CNAME for your server-host, so you can switch later.
server ...
loglvl Verbose
/etc/dmon/work
# define a host
host hostname-of-your-server myhost
host hostname-of-your-webserver www
pmaker www
# set fitness levels
fit_level fine
fit_level soso
fit_level sick
fit_level crit critical
fit_level dead
# on myhost, monitor some items
get myhost cpu_load root_avail root_usage uptime
... where hostname-of-your-server is a fully-qualified hostname, and myhost is just a unique tag, used to refer to the host.
/var/dmon/works/
Run program mk-dmon-work :
% mk-dmon-work -v -f
Start the daemon ; for now, don't fork (use -i) :
% dmon start -i # stop it with ^C
If all is well, you will see dmon start up, first as server, then as client.
After a while,
every minute : the daemon (as client) computes fresh item values ;
every five minutes :
the daemon (as client) sends a report to the server ;
the daemon (as server) accepts the report, responds to the client ;
the client receives the response from the server ; closes the connection.
dmon as a service
copy init.d :
% cp init.d /etc/init.d/dmon
Make sure dmon is started after a reboot ; use chkconfig :
% chkconfig --add dmon
or (on Ubuntu) :
% update-rc.d dmon defaults 50 50
Same as server-install.
Same as server-install.
/etc/dmon/conf
Supply the server's hostname or (preferably) a CNAME for your server-host, so you can switch later.
server ...
Configure hostname if necessary ; see config option hostname.
dmon as a service
Same as server-install.
The pagemaker must be a dmon-host and a web-server.
Make sure the webserver runs gen-dmon-page as a cgi-script.
Script gen-dmon-page generates a login-reference ; preferably to a secure site ; configure something like :
page_sec https://dmon.your.org/cgi-bin/gen-dmon-page
Just use http instead of https if you don't have a secure site (yet).
Make sure the webserver runs plotter.php as a php-script.
Script gen-dmon-page generates references to the plotter as /plotter.php.
Configure option plot_url to change that.
The plotter uses package jpgraph ; install the package and make sure the plotter can do :
require_once ( 'jpgraph/jpgraph.php' ) ;
require_once ( 'jpgraph/jpgraph_canvas.php' ) ;
require_once ( 'jpgraph/jpgraph_line.php' ) ;
require_once ( 'jpgraph/jpgraph_date.php' ) ;
That is, if your plotter lives in :
/path/to/plotter.php
install jpgraph in :
/path/to/jpgraph-<VERSION>/
and make a symlink :
% ln -s /path/to/jpgraph-<VERSION>/src/ /path/to/jpgraph/
Here are some tips for using dmon :
In general, monitoring works best if the server runs on stand-alone hardware, that doesn't use any other resources. Dmon is very light-weight, so some old box will probably work.
Dress up the server as a web-server and run the pagemaker (gen-dmon-page and plotter.php) on the server ; this is a little more efficient.
To make installing dmon on clients easier, dress up the server as a rsync-server ; then :
Create a module [dmon] containing the downloaded dmon distro.
In module [dmon] copy the Makefile that comes with dmon, to file makefile :
cp Makefile makefile
Tweak the makefile to meet your local needs.
Note : for make, the makefile takes precedence over Makefile.
On a client, rsync dmon from the rsync server :
% mkdir -p /local/dmon/
% rsync -avz dmon.your.org::dmon/ /local/dmon/
Then, on the client you can run a make :
% cd /local/dmon/
# repeat until all perl-modules are installed :
% ./dmon -t -v
% make install
# start dmon
% /etc/init.d/dmon start
# fix iptables ; allow tcp connections from dmon.your.org:22008
HUP  : dmon reloads ; dmon re-reads the config ; re-inits client and server
USR1 : dmon re-execs ; dmon stops with END { exec $PROGRAM_NAME, @ARGV }
/etc/dmon/conf            dmon-configuration
/etc/init.d/dmon          dmon start/stop script
/var/dmon/data.lite       database for item-history data
/var/dmon/probes          dmon probes ; installed by dmon on startup
/var/lock/subsys/dmon     touched on dmon-startup ; removed on stop
/var/log/dmon             logfiles
/var/run/dmon/dmon.lck    lock-file
/var/run/dmon/dmon.pid    pid-file
/var/run/dmon/dmon.stp    dmon stop-secret ; for client STOP-command
/etc/dmon/work            work-configuration
/var/dmon/works           per/client work-files
/var/dmon/works.tmp       staging directory for /var/dmon/works
/var/dmon/cgi-data        database for gen-dmon-page ; login/out-log
/var/dmon/cgi-secret      secret for cookie-verification ; should be 0600
mk-dmon-work, gen-dmon-page, users-dmon
You may distribute under the terms of either the GNU General Public License or the Artistic License, as specified in the Perl 5.10.0 README file.