Wednesday, August 23, 2017

Monitoring My Pi Cluster With Nagios, Part II - Adding the Pis

In Part I, I compiled Nagios from source, installed it and got it running.

Here, I'm going to describe how I added my Pis.


It's hard to know where to start, because the config files are interrelated. I'll start where I actually started, even though that may be confusing, because my first attempt didn't work. Bear with me and I'll do my best to explain as I go.


The config files are located in /usr/local/nagios/etc. Whenever I refer to a config file, it is in that directory or one of its sub-directories. One of those sub-directories is called objects, and that is where I started: I copied localhost.cfg to new_server.cfg, changed the address and host_name values, and tried restarting Nagios.

Nagios ignored the file completely.

I had to tell Nagios that the file existed and that it should use it. I opened nagios.cfg, copied the cfg_file line for localhost.cfg, and substituted new_server.cfg for localhost.cfg. The new line looks like: cfg_file=/usr/local/nagios/etc/objects/new_server.cfg.
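For reference, that part of nagios.cfg would then look roughly like this. It's a sketch that assumes the default paths from a source install; the localhost.cfg line is the one the sample config already ships with:

```cfg
# Definitions for monitoring the local (Nagios) host - already in the sample nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg

# Added line: tells Nagios to read the copied file as well
cfg_file=/usr/local/nagios/etc/objects/new_server.cfg
```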

Nagios did not like that file and would not restart. If I remember correctly, the error was pretty generic - "bad config file" or something similar. The reason was that localhost.cfg defines a hostgroup, and my copy defined the same hostgroup (linux-servers) a second time. I fixed it by creating a new hostgroup in new_server.cfg. There were two other ways I could have solved it: add new_server to the members of the linux-servers hostgroup in localhost.cfg and drop the duplicate definition from new_server.cfg, or remove the hostgroup definition from new_server.cfg completely and leave the new server ungrouped.
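As a sketch of the "add it to the existing hostgroup" option: linux-servers stays defined once, in localhost.cfg, and lists both hosts, while new_server.cfg keeps only its host definition. The host name new_server, its alias, and the address are placeholders for illustration:

```cfg
# objects/localhost.cfg - the only place linux-servers is defined
define hostgroup{
        hostgroup_name  linux-servers
        alias           Linux Servers
        members         Pine64_Nagios_Server,new_server   ; both hosts, comma separated
        }

# objects/new_server.cfg - host only, no second "define hostgroup" block
define host{
        use             linux-server
        host_name       new_server        ; placeholder name
        alias           New Server
        address         192.168.1.50      ; placeholder address
        }
```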

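What you need to take away from that first error is this:
1) Definitions must be unique. An object such as a hostgroup can only be defined once across all of the config files.
2) All elements must be defined. Anything you refer to (a host in a members list, a template named in use) has to be defined somewhere.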

This is from the localhost.cfg file. I changed the host_name and alias, but notice that the hostgroup is defined in the same file as the host, and that this host is the hostgroup's only member. What I'm driving at is that the host_name in the host definition and the name in the members list match.

# Define a host for the local machine

define host{
        use                     linux-server            ; Name of host template to use
                                                        ; This host definition will inherit all variables that are defined
                                                        ; in (or inherited by) the linux-server host template definition.
        host_name               Pine64_Nagios_Server
        alias                   Pine64 Nagios Server
        address                 127.0.0.1
        }

# Define an optional hostgroup for Linux machines

define hostgroup{
        hostgroup_name  linux-servers ; The name of the hostgroup
        alias           Linux Servers ; Long name of the group
        members         Pine64_Nagios_Server     ; Comma separated list of hosts that belong to this group
        }

So - a little planning can go a long way. If you want to group your servers (nice for display purposes, and easier to wrap your head around), make sure that every one of them is listed in a hostgroup, and that the hostgroup is defined only once. You can create a separate file just for the hostgroup definition (e.g. pi_servers_hostgroup.cfg).
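A standalone hostgroup file could look something like the sketch below. The group name pi-servers and the host names pi1 to pi3 are made up for illustration; each name in members has to match the host_name of a host defined in some other file:

```cfg
# pi_servers_hostgroup.cfg - one hostgroup, defined exactly once
define hostgroup{
        hostgroup_name  pi-servers              ; hypothetical group name
        alias           Raspberry Pi Servers    ; long name shown in the web interface
        members         pi1,pi2,pi3             ; hypothetical host_name values
        }
```

Like any other object file, it only takes effect once nagios.cfg points at it.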

Then, by reading the comments in nagios.cfg, I found a better way. I created a directory called servers and uncommented the line in nagios.cfg that tells Nagios to read all of the config files in that directory: cfg_dir=/usr/local/nagios/etc/servers.
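In the sample nagios.cfg that line ships commented out, so the change is just removing the leading #. A sketch, assuming the default install prefix; note that Nagios only picks up files ending in .cfg from a cfg_dir directory:

```cfg
# Before (as shipped in the sample nagios.cfg):
#cfg_dir=/usr/local/nagios/etc/servers

# After: every *.cfg file under servers/ is read when Nagios starts
cfg_dir=/usr/local/nagios/etc/servers
```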

I mv'd new_server.cfg over to the servers directory and restarted Nagios. It worked as expected. (The cfg_file line added earlier for objects/new_server.cfg has to come out of nagios.cfg at this point, since the file no longer lives there.) I then created a config file for each of the servers that I wanted to monitor, and added all of the servers to the hostgroup, which is now in a separate file. For each server, I pretty much just copied the working config file, changed the IP address, replaced the host name in vim with ':%s/old_hostname/new_hostname/g', and saved it.
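Each per-server file ends up looking something like this sketch. The file name, host name, alias, and address are placeholders; the two service blocks are modeled on the PING and SSH checks in the sample localhost.cfg, and the linux-server and local-service templates are assumed to come from the stock templates.cfg:

```cfg
# servers/pi1.cfg - hypothetical file name
define host{
        use                     linux-server
        host_name               pi1               ; placeholder; must match the name in the hostgroup's members list
        alias                   Pi 1
        address                 192.168.1.101     ; placeholder address
        }

define service{
        use                     local-service
        host_name               pi1
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
        }

define service{
        use                     local-service
        host_name               pi1
        service_description     SSH
        check_command           check_ssh
        }
```

Both of these commands run on the Nagios host but are pointed at the address of the monitored host, so they give meaningful results for a remote machine.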

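Now all the servers have been added, but most of the checks don't work. Ping and ssh work, but all of the others are really only checking the Nagios host, because the checks copied from localhost.cfg run on the Nagios server itself no matter which host they are attached to. In the next post, I'll explain ssh_check and how to monitor remote servers.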
