Use Nagios to keep tabs on your network.
Since remote exploits can often crash the service that is being broken into or cause its CPU use to skyrocket, you should monitor the services that are running on your network. Just looking for an open port (such as by using Nmap [Hack #42]) isn’t enough. The machine may be able to respond to a TCP connect request, but the service may be unable to respond (or worse, could be replaced by a different program entirely!). One tool that can help you verify your services at a glance is Nagios (http://www.nagios.org).
Nagios is a network-monitoring application that monitors not only the services running on the hosts on your network, but also the resources on each host, such as CPU usage, disk space, memory usage, running processes, log files, and much more. In the event of a problem it can notify you through email, pager, or any other method that you define, and you can check the status of your network at a glance by using the web GUI. Nagios is also easily extensible through its plug-in API.
To install Nagios, download the source distribution from the Nagios web site. Then, unpack the source distribution and go into the directory it creates:
$ tar xfz nagios-1.1.tar.gz
$ cd nagios-1.1
Before running Nagios’s configure script, you should create a user and group for Nagios to run as (e.g., nagios). Then run the configure script with a command similar to this:
$ ./configure --with-nagios-user=nagios --with-nagios-grp=nagios
This configures Nagios to be installed in /usr/local/nagios. As usual, you can modify this behavior by using the --prefix switch. After the configure script finishes, compile Nagios by running make all. Then become root and run make install to install it. In addition, you can optionally install Nagios’s initialization scripts by running make install-init.
If you take a look into the /usr/local/nagios directory right now, you will see that there are four directories. The bin directory contains a single file, nagios, that is the core of the package. This application does the actual monitoring. The sbin directory contains the CGI scripts that will be used in the web-based interface. Inside the share directory, you’ll find the HTML files and documentation. Finally, the var directory is where Nagios will store its information once it starts running.
Before you can use Nagios, you will need a couple of configuration files. These files go into the etc directory, which will be created when you run make install-config. This command also creates a sample copy of each required configuration file and puts them into the etc directory.
At this point the Nagios installation is complete. However, it is not very useful in its current state, because it lacks the actual monitoring applications. These applications, which check whether a particular monitored service is functioning properly, are called plug-ins. Nagios comes with a default set of plug-ins, but they must be downloaded and installed separately.
Download the latest Nagios Plugins package and decompress it. You will need to run the provided configure script to prepare the package for compilation on your system. You will find that the plug-ins are installed in a fashion similar to the actual Nagios program.
To compile the plug-ins, run commands similar to these:
$ ./configure --prefix=/usr/local/nagios \
--with-nagios-user=nagios --with-nagios-grp=nagios
$ make all
You might get notifications about missing programs or Perl modules while the script is running. These are mostly fine, unless you specifically need the mentioned applications to monitor a service.
After compilation is finished, become root and run make install to install the plug-ins. The plug-ins will be installed in the libexec directory of your Nagios base directory (e.g., /usr/local/nagios/libexec).
There are a few rules that all Nagios plug-ins should implement, making them suitable for use by Nagios. All plug-ins provide a --help option that displays information about the plug-in and how it works. This feature is very helpful when you’re trying to monitor a new service using a plug-in you haven’t used before.
For instance, to learn how the check_ssh plug-in works, run the following command:
$ /usr/local/nagios/libexec/check_ssh --help
check_ssh (nagios-plugins 1.4.0alpha1) 1.13
The nagios plugins come with ABSOLUTELY NO WARRANTY. You may redistribute
copies of the plugins under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING.
Copyright (c) 1999 Remi Paulmier <remi@sinfomic.fr>
Copyright (c) 2000-2003 Nagios Plugin Development Team
<nagiosplug-devel@lists.sourceforge.net>
Try to connect to SSH server at specified server and port
Usage: check_ssh [-46] [-t <timeout>] [-p <port>] <host>
check_ssh (-h | --help) for detailed help
check_ssh (-V | --version) for version information
Options:
-h, --help
Print detailed help screen
-V, --version
Print version information
-H, --hostname=ADDRESS
Host name or IP Address
-p, --port=INTEGER
Port number (default: 22)
-4, --use-ipv4
Use IPv4 connection
-6, --use-ipv6
Use IPv6 connection
-t, --timeout=INTEGER
Seconds before connection times out (default: 10)
-v, --verbose
Show details for command-line debugging (Nagios may truncate output)
Send email to nagios-users@lists.sourceforge.net if you have questions
regarding use of this software. To submit patches or suggest improvements,
send email to nagiosplug-devel@lists.sourceforge.net
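Beyond the help option, plug-ins conventionally report their result as a one-line message plus an exit status: 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN. The following is a minimal sketch of that convention, not one of the shipped plug-ins; the function name check_threshold and the sample numbers are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of the plug-in exit-status convention:
#   0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
# check_threshold VALUE WARN CRIT compares a non-negative integer
# against a warning and a critical limit.

check_threshold() {
    if [ "$#" -ne 3 ]; then
        echo "UNKNOWN - usage: check_threshold VALUE WARN CRIT"
        return 3
    fi
    for n in "$1" "$2" "$3"; do
        case "$n" in
            ''|*[!0-9]*)
                echo "UNKNOWN - '$n' is not a non-negative integer"
                return 3
                ;;
        esac
    done
    if [ "$1" -ge "$3" ]; then
        echo "CRITICAL - value is $1 (limit $3)"
        return 2
    elif [ "$1" -ge "$2" ]; then
        echo "WARNING - value is $1 (limit $2)"
        return 1
    fi
    echo "OK - value is $1"
    return 0
}

# Example run with made-up numbers: 85 is above the warning limit (80)
# but below the critical limit (90).
check_threshold 85 80 90 || echo "exit status: $?"
```

A real plug-in would obtain the value by probing a service or resource and would end with exit instead of return, so that Nagios sees the status code of the process.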
Now that both Nagios and the plug-ins are installed, we are almost ready to begin monitoring our servers. However, Nagios will not even start before it’s configured properly.
The sample configuration files provide a good starting point:
$ cd /usr/local/nagios/etc
$ ls -1
cgi.cfg-sample
checkcommands.cfg-sample
contactgroups.cfg-sample
contacts.cfg-sample
dependencies.cfg-sample
escalations.cfg-sample
hostgroups.cfg-sample
hosts.cfg-sample
misccommands.cfg-sample
nagios.cfg-sample
resource.cfg-sample
services.cfg-sample
timeperiods.cfg-sample
Since these are sample files, the Nagios authors added a .cfg-sample suffix to each file. First, we need to copy or rename each one to end in .cfg, so that the software can use them properly. (If you don’t change the configuration filenames, Nagios will not be able to find them.)
You can either rename each file manually or use the following command to take care of them all at once. Type the following script on a single line:
# for i in *cfg-sample; do mv "$i" "$(echo "$i" | sed -e 's/cfg-sample$/cfg/')"; done
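If you would rather keep the untouched samples around for reference, a copy works as well as a rename. The sketch below is one possible variant of the one-liner above, assuming a POSIX shell; the function name activate_samples is hypothetical, and the default directory is the install prefix used in this hack.

```shell
#!/bin/sh
# Copy each *.cfg-sample file to *.cfg, leaving the samples in place.
# Existing .cfg files are never overwritten.

activate_samples() {
    dir=$1
    for sample in "$dir"/*.cfg-sample; do
        # If nothing matched, the pattern is left as a literal string.
        [ -e "$sample" ] || continue
        target=${sample%-sample}    # strip the trailing "-sample"
        if [ -e "$target" ]; then
            echo "skipping $target (already exists)"
        else
            cp -p "$sample" "$target"
        fi
    done
}

# Use the directory given on the command line, or the default prefix.
activate_samples "${1:-/usr/local/nagios/etc}"
```

Because existing files are skipped, the script can be run again later without clobbering configuration you have already edited.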
Start with the main configuration file, nagios.cfg. You can leave almost everything as is; the Nagios installation process makes sure the file paths used in the configuration file are correct. There’s one option, however, that you might want to change: check_external_commands, which is set to 0 by default. If you would like to be able to run commands directly through the web interface, set this to 1. Depending on your network environment, this may or may not be an acceptable security risk, because enabling the option permits the execution of scripts from the web interface. Additional options in cgi.cfg control which usernames are allowed to run external commands.
To get Nagios running, you must modify all but a few of the sample configuration files. Configuring Nagios to monitor your servers is not as difficult as it looks. To help you, the Nagios binary has a verification mode, enabled with the -v switch, that checks your configuration:
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
This command will go through the configuration files and report any errors. Start fixing the errors one by one, and run the command again to find the next error. For testing purposes, it is easiest to disable all hosts and services definitions in the sample configuration files and merely use the files as templates for your own hosts and services. You can keep most of the files as is, but remove the following, which will be created from scratch:
hosts.cfg
services.cfg
contacts.cfg
contactgroups.cfg
hostgroups.cfg
Start by configuring a host to monitor. We first need to add our host definition and configure some options for that host. You can add as many hosts as you like, but we will stick with one for the sake of simplicity.
Here are the contents of hosts.cfg:
# Generic host definition template
define host{
# The name of this host template - referenced in other host definitions,
# used for template recursion/resolution
name generic-host
# Host notifications are enabled
notifications_enabled 1
# Host event handler is enabled
event_handler_enabled 1
# Flap detection is enabled
flap_detection_enabled 1
# Process performance data
process_perf_data 1
# Retain status information across program restarts
retain_status_information 1
# Retain non-status information across program restarts
retain_nonstatus_information 1
# DONT REGISTER THIS DEFINITION – ITS NOT A REAL HOST,
# JUST A TEMPLATE!
register 0
}
# Host Definition
define host{
# Name of host template to use
use generic-host
host_name freelinuxcd.org
alias Free Linux CD Project Server
address www.freelinuxcd.org
check_command check-host-alive
max_check_attempts 10
notification_interval 120
notification_period 24x7
notification_options d,u,r
}
The first host defined is not a real host but a template from which other host definitions are derived. This mechanism can be seen in other configuration files and makes configuration based on a predefined set of defaults a breeze.
With this setup we are monitoring only one host, www.freelinuxcd.org, to see if it is alive. The host_name parameter is important because other configuration files will refer to this server by this name. Now the host needs to be added to a hostgroup, so that the application knows which contact group to send notifications to.
Here’s what hostgroups.cfg looks like:
define hostgroup{
hostgroup_name flcd-servers
alias The Free Linux CD Project Servers
contact_groups flcd-admins
members freelinuxcd.org
}
This defines a new hostgroup and associates the flcd-admins contact_group with it. Now you’ll need to define that contact group in contactgroups.cfg:
define contactgroup{
contactgroup_name flcd-admins
alias FreeLinuxCD.org Admins
members oktay, verty
}
Here the flcd-admins contact_group is defined with two members, oktay and verty. This configuration ensures that both users will be notified when something goes wrong with a server that flcd-admins is responsible for. The next step is to set the contact information and notification preferences for these users.
Here are the definitions for those two members in contacts.cfg:
define contact{
contact_name oktay
alias Oktay Altunergil
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
service_notification_commands notify-by-email,notify-by-epager
host_notification_commands host-notify-by-email,host-notify-by-epager
email oktay@freelinuxcd.org
pager dummypagenagios-admin@localhost.localdomain
}
define contact{
contact_name verty
alias David ‘Verty’ Ky
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
service_notification_commands notify-by-email,notify-by-epager
host_notification_commands host-notify-by-email
email verty@flcd.org
}
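The names listed under members in contactgroups.cfg have to correspond to contact_name entries in contacts.cfg. The following sketch lists group members that have no matching contact definition. It assumes that names are compared exactly, including case, and that each directive sits on its own line as in the listings here; the function name missing_contacts is hypothetical, and the check is a rough aid rather than a substitute for the Nagios configuration check.

```shell
#!/bin/sh
# Print every contact group member that has no contact_name definition.
# Usage: missing_contacts CONTACTGROUPS_FILE CONTACTS_FILE

missing_contacts() {
    groups_file=$1
    contacts_file=$2
    awk '
        # First file (contacts): remember every defined contact_name.
        NR == FNR {
            if ($1 == "contact_name") defined[$2] = 1
            next
        }
        # Second file (contact groups): check each comma-separated member.
        $1 == "members" {
            sub(/^[ \t]*members[ \t]*/, "")
            sub(/[ \t]+$/, "")
            n = split($0, names, /[ \t]*,[ \t]*/)
            for (i = 1; i <= n; i++)
                if (names[i] != "" && !(names[i] in defined))
                    print names[i]
        }
    ' "$contacts_file" "$groups_file"
}

# Run the check only if both files are present in the current directory.
if [ -r contactgroups.cfg ] && [ -r contacts.cfg ]; then
    missing_contacts contactgroups.cfg contacts.cfg
fi
```

No output means every member was found. The same idea could be applied to the host_name values that services.cfg and hostgroups.cfg refer to.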
In addition to providing contact details for a particular user, the contact_name in the contacts.cfg file is also used by the CGI scripts (i.e., the web interface) to determine whether a particular user is allowed to access a particular resource. Now that your hosts and contacts are configured, you can start to configure monitoring for individual services on your server.
This is done in services.cfg:
# Generic service definition template
define service{
# The ‘name’ of this service template, referenced in other service definitions
name generic-service
# Active service checks are enabled
active_checks_enabled 1
# Passive service checks are enabled/accepted
passive_checks_enabled 1
# Active service checks should be parallelized
# (disabling this can lead to major performance problems)
parallelize_check 1
# We should obsess over this service (if necessary)
obsess_over_service 1
# Default is to NOT check service ‘freshness’
check_freshness 0
# Service notifications are enabled
notifications_enabled 1
# Service event handler is enabled
event_handler_enabled 1
# Flap detection is enabled
flap_detection_enabled 1
# Process performance data
process_perf_data 1
# Retain status information across program restarts
retain_status_information 1
# Retain non-status information across program restarts
retain_nonstatus_information 1
# DONT REGISTER THIS DEFINITION – ITS NOT A REAL SERVICE, JUST A TEMPLATE!
register 0
}
# Service definition
define service{
# Name of service template to use
use generic-service
host_name freelinuxcd.org
service_description HTTP
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 5
retry_check_interval 1
contact_groups flcd-admins
notification_interval 120
notification_period 24x7
notification_options w,u,c,r
check_command check_http
}
# Service definition
define service{
# Name of service template to use
use generic-service
host_name freelinuxcd.org
service_description PING
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 5
retry_check_interval 1
contact_groups flcd-admins
notification_interval 120
notification_period 24x7
notification_options c,r
check_command check_ping!100.0,20%!500.0,60%
}
This setup configures monitoring for two services. The first service definition, which has been called HTTP, will monitor whether the web server is up and will notify you if there’s a problem. The second definition monitors the ping statistics from the server and notifies you if the response time or packet loss becomes too high. The commands used are check_http and check_ping, which were installed into the libexec directory during the plug-in installation. Take some time to familiarize yourself with the other available plug-ins and configure them along the lines of these example definitions.
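In a check_command value, the ! character separates the command name from its arguments, which Nagios makes available to the command definition as $ARG1$, $ARG2$, and so on. For check_ping, each argument has the form round-trip time in milliseconds, then a comma, then a packet-loss percentage. Assuming the check_ping command definition in your checkcommands.cfg passes the first argument as the warning threshold and the second as the critical threshold (verify this in your copy), the PING service above warns at 100 ms or 20% loss and goes critical at 500 ms or 60% loss. The sketch below only illustrates the splitting; explain_check_command is a hypothetical helper, not part of Nagios.

```shell
#!/bin/sh
# Split a check_command value on "!" to show the command name and the
# arguments that would become $ARG1$, $ARG2$, ... in the command definition.

explain_check_command() {
    old_ifs=$IFS
    IFS='!'
    set -f               # no filename expansion while splitting
    set -- $1            # unquoted on purpose: split on "!"
    set +f
    IFS=$old_ifs
    echo "command: $1"
    shift
    i=1
    for arg in "$@"; do
        echo "ARG$i: $arg"
        i=$((i + 1))
    done
}

explain_check_command 'check_ping!100.0,20%!500.0,60%'
# Prints:
#   command: check_ping
#   ARG1: 100.0,20%
#   ARG2: 500.0,60%
```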
Once you’re happy with your configuration, run Nagios with the -v switch one last time to make sure everything checks out. Then run it as a daemon by using the -d switch:
# /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
That’s all there is to it. Give Nagios a couple of minutes to generate some data, and then point your browser to the machine and look at the pretty service warning lights.