Heartbeat configuration requires three files located in /etc/ha.d. The ha.cf file contains the main Heartbeat configuration, including the list of the nodes and the times used to identify failures. The haresources file contains the list of resources to be managed within the cluster. The authkeys file contains the security information for the cluster.
The contents of these files should be identical on each host within the Heartbeat cluster. Keep these files in sync across all the hosts: any change made on one host must be copied to all the others.
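A quick way to confirm that the files really are in sync is to compare checksums of each file across the hosts. The Python sketch below assumes the file contents have already been collected from every host (for example, copied over ssh); that collection step is not shown, and the helper name out_of_sync is hypothetical. It reports every file whose SHA-256 digest differs between hosts or that is missing on a host.

```python
import hashlib

# The Heartbeat configuration files that must be identical on every host.
HA_FILES = ("ha.cf", "haresources", "authkeys")


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def out_of_sync(hosts: dict[str, dict[str, bytes]]) -> list[str]:
    """Return the names of files whose contents differ between hosts.

    `hosts` maps a host name to {file name: file contents}. A file that is
    missing on some host is also reported as out of sync.
    """
    mismatched = []
    for name in HA_FILES:
        # None stands in for "file missing on this host".
        digests = {digest(files[name]) if name in files else None
                   for files in hosts.values()}
        if len(digests) > 1:
            mismatched.append(name)
    return mismatched


if __name__ == "__main__":
    cluster = {
        "drbd1": {"ha.cf": b"node drbd1\nnode drbd2\n",
                  "haresources": b"", "authkeys": b"key-one"},
        "drbd2": {"ha.cf": b"node drbd1\nnode drbd2\n",
                  "haresources": b"", "authkeys": b"key-two"},
    }
    print(out_of_sync(cluster))  # prints ['authkeys']
```

An empty list means every host holds byte-identical copies of all three files.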
An example of the ha.cf file is shown below:
    logfacility local0
    keepalive 500ms
    deadtime 10
    warntime 5
    initdead 30
    mcast bond0 225.0.0.1 694 2 0
    mcast bond1 225.0.0.2 694 1 0
    auto_failback off
    node drbd1
    node drbd2
The individual directives in the file are as follows:
logfacility: sets the syslog facility used for logging. In this example, Heartbeat messages are sent to syslog using the local0 facility.
keepalive: defines how frequently the heartbeat signal is sent to the other hosts; in this example, every 500 milliseconds.
deadtime: the delay, in seconds, without a heartbeat before other hosts in the cluster are considered 'dead' (failed).
warntime: the delay, in seconds, before a warning is written to the log that a node cannot be contacted.
initdead: the period, in seconds, to wait during system startup before the other host is considered to be down.
mcast: defines a method for sending the heartbeat signal. In the example above, a separate multicast group address is used on each of two bonded network devices (bond0 and bond1); the remaining arguments are the UDP port (694), the multicast TTL, and the loopback setting. If you have multiple clusters, the multicast address for each cluster should be unique on your network. Other choices for the heartbeat exchange exist, including a serial connection.
If you are using multiple network interfaces (for example, one interface for your server connectivity and a secondary and/or bonded interface for your DRBD data exchange), you should use both interfaces for your heartbeat connection. This decreases the chance that a transient failure on a single link is treated as a node failure.
auto_failback: sets whether the original (preferred) server should be enabled again when it becomes available. Switching this to on may cause problems if the preferred server goes offline and then comes back online again. If the DRBD device has not been synced properly, or if the problem with the original server happens again, you may end up with two different datasets on the two servers, or with a continually changing environment where the two servers flip-flop as the preferred server reboots and then starts again.
node: sets the nodes within the Heartbeat cluster group. There should be one node entry for each server, using the host name of that server as reported by uname -n.
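Several of the directives above follow rules that can be checked before the file is copied to each node. The Python sketch below parses ha.cf text into directives and checks the example configuration. Assumptions, not stated above: a time value without a suffix is in seconds and an ms suffix means milliseconds; the ordering keepalive < warntime < deadtime is our own sanity rule rather than a documented requirement; and the helper names (parse_ha_cf, seconds, mcast_groups, check) are hypothetical.

```python
import ipaddress

# The example ha.cf from above, with one directive per line.
EXAMPLE_HA_CF = """\
logfacility local0
keepalive 500ms
deadtime 10
warntime 5
initdead 30
mcast bond0 225.0.0.1 694 2 0
mcast bond1 225.0.0.2 694 1 0
auto_failback off
node drbd1
node drbd2
"""


def parse_ha_cf(text: str) -> list[tuple[str, list[str]]]:
    """Split ha.cf text into (directive, arguments) pairs, ignoring comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            key, *args = line.split()
            entries.append((key, args))
    return entries


def seconds(value: str) -> float:
    """Convert '10' (assumed seconds) or '500ms' (milliseconds) to seconds."""
    if value.endswith("ms"):
        return float(value[:-2]) / 1000
    return float(value)


def mcast_groups(entries) -> set[str]:
    """Return the multicast group addresses used by the mcast directives."""
    # mcast arguments: device, group, port, ttl, loop
    return {args[1] for key, args in entries if key == "mcast"}


def check(entries) -> list[str]:
    """Return human-readable problems found in one cluster's ha.cf."""
    problems = []
    times = {key: seconds(args[0]) for key, args in entries
             if key in ("keepalive", "warntime", "deadtime")}
    # Assumed sanity rule: heartbeats are sent more often than the warning
    # delay, and the warning is logged before a node is declared dead.
    if len(times) == 3 and not (
        times["keepalive"] < times["warntime"] < times["deadtime"]
    ):
        problems.append("expected keepalive < warntime < deadtime")
    for group in sorted(mcast_groups(entries)):
        if not ipaddress.ip_address(group).is_multicast:
            problems.append(f"{group} is not a multicast address")
    nodes = [args[0] for key, args in entries if key == "node"]
    if len(nodes) != len(set(nodes)):
        problems.append("a node is listed more than once")
    return problems


if __name__ == "__main__":
    cluster_a = parse_ha_cf(EXAMPLE_HA_CF)
    print(check(cluster_a))  # prints []
    # A second cluster on the same network should not reuse these groups.
    cluster_b = parse_ha_cf("mcast eth1 225.0.0.2 694 1 0\n")
    print(mcast_groups(cluster_a) & mcast_groups(cluster_b))  # prints {'225.0.0.2'}
```

The final intersection shows how a clash between two clusters' multicast groups would be detected; a non-empty result means one cluster should be moved to a different group.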
An optional additional set of directives configures a ping test that checks connectivity to another host. You should use this to ensure that your servers have connectivity on the public interface, so the ping target should be a reliable host such as a router or switch. The additional lines specify the destination machine for the ping, which should be given as an IP address rather than a host name; the command to run when a failure occurs; the authority under which that command runs; and the timeout before a lack of response triggers a failure. A sample configuration is shown below:
    ping 10.0.0.1
    respawn hacluster /usr/lib64/heartbeat/ipfail
    apiauth ipfail gid=haclient uid=hacluster
    deadping 5
In the example above, the ipfail command, which is part of the Heartbeat solution, is called on a failure and 'fakes' a fault on the currently active server. You need to configure the user and group ID under which the command is executed (using the apiauth directive). The failure is triggered after 5 seconds without a response (deadping 5).
The deadping value must be less than the deadtime value.
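The two rules for the ping test (an IP address rather than a host name as the target, and deadping less than deadtime) are easy to check automatically. The following Python sketch combines the sample ping directives with the deadtime value from the earlier ha.cf example. It assumes both timeouts are plain numbers of seconds and that the ping line may list one or more targets separated by spaces; the helper names directives and check_ping are hypothetical.

```python
import ipaddress

# The ping-related directives from the sample above, plus deadtime from
# the main example configuration.
EXAMPLE = """\
deadtime 10
ping 10.0.0.1
respawn hacluster /usr/lib64/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster
deadping 5
"""


def directives(text: str) -> dict[str, str]:
    """Map each directive name to its argument string (the last one wins)."""
    result = {}
    for raw in text.splitlines():
        parts = raw.split("#", 1)[0].split(None, 1)
        if parts:
            result[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return result


def check_ping(settings: dict[str, str]) -> list[str]:
    """Apply the two ping rules: IP-address targets and deadping < deadtime."""
    problems = []
    for target in settings["ping"].split():
        try:
            ipaddress.ip_address(target)
        except ValueError:
            problems.append("ping target should be an IP address, not a host name")
    # Both values are assumed to be plain numbers of seconds.
    if not float(settings["deadping"]) < float(settings["deadtime"]):
        problems.append("deadping must be less than deadtime")
    return problems


if __name__ == "__main__":
    print(check_ping(directives(EXAMPLE)))  # prints []
```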
The authkeys file holds the authorization information for the Heartbeat cluster. The authorization relies on a single unique 'key' that is used to verify the two machines in the Heartbeat cluster. The file is used only to confirm that the two machines are in the same cluster, and it allows multiple clusters to co-exist within the same network.
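The format of the authkeys file is not described here. The Python sketch below assumes the common Heartbeat layout, in which an auth 1 line selects key number 1 and a following 1 sha1 line defines that key, and assumes Heartbeat expects the file to be readable only by its owner (mode 0600). Check both points against the documentation for your Heartbeat version. The helper names authkeys_text and write_authkeys are hypothetical. Generate the file once, then copy the same file to every node so that the key is identical across the cluster.

```python
import os
import secrets
import stat
import tempfile


def authkeys_text(key: str) -> str:
    """Build authkeys contents: 'auth 1' selects key 1, defined on the next line."""
    return f"auth 1\n1 sha1 {key}\n"


def write_authkeys(path: str) -> str:
    """Write an authkeys file with a new random key and return the key."""
    key = secrets.token_hex(32)  # 32 random bytes, as 64 hex characters
    # Create the file with owner-only permissions from the start.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(authkeys_text(key))
    os.chmod(path, 0o600)  # also tighten the mode of a pre-existing file
    return key


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "authkeys")
        write_authkeys(path)
        print(oct(stat.S_IMODE(os.stat(path).st_mode)))  # prints 0o600
```

Creating the file with os.open and an explicit mode avoids a window in which the key is briefly readable by other users.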