How to change the post_fail_delay parameter in a RHEL 6 cluster without downtime?

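post_fail_delay is a crucial parameter in Red Hat cluster configuration. It is the number of seconds the fence daemon (fenced) waits before fencing a node that has failed. In RHEL 6 it can be changed online, without restarting the cluster.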

Most often this parameter is changed to give kdump enough time to capture the crash dump before the failed node is fenced.

Step 1: Check whether the parameter is currently set. In my case it was not set.

[root@Node1 ~]# cat /etc/cluster/cluster.conf | grep -i post
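The check in Step 1 can be wrapped in a small shell function that always prints a value. This is a minimal sketch, assuming the attribute is written double-quoted, as ccs does in Step 4; the function name get_post_fail_delay is hypothetical. Because fenced uses 0 when post_fail_delay is absent, the function prints 0 in that case.

```shell
# Print the post_fail_delay value configured in a cluster.conf file.
# Assumption: the attribute is double-quoted (post_fail_delay="30").
# A missing attribute means the fenced default of 0, so print 0 then.
get_post_fail_delay() {
    local conf="${1:-/etc/cluster/cluster.conf}"
    local value
    # Extract the first post_fail_delay="N" occurrence and keep only the digits.
    value=$(grep -o 'post_fail_delay="[0-9]*"' "$conf" | head -n 1 | tr -dc '0-9')
    echo "${value:-0}"
}
```

Usage: get_post_fail_delay /etc/cluster/cluster.conf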

Step 2: Set the parameter to 30 seconds using ccs.

[root@Node1 ~]# ccs -f /etc/cluster/cluster.conf --setfencedaemon post_fail_delay=30
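cman_tool version -r only accepts an edited cluster.conf whose config_version is higher than the version the cluster is currently running, so it does no harm to confirm this before Step 3. The sketch below compares the two numbers; the function names are hypothetical, and the running version is passed in by the caller.

```shell
# Print the config_version attribute from a cluster.conf file.
get_config_version() {
    local conf="${1:-/etc/cluster/cluster.conf}"
    grep -o 'config_version="[0-9]*"' "$conf" | head -n 1 | tr -dc '0-9'
}

# Succeed (exit 0) only if the on-disk config_version is newer than the running one.
# $1: path to cluster.conf, $2: config version the cluster is currently running.
is_newer_than_running() {
    local on_disk running="$2"
    on_disk=$(get_config_version "$1")
    [ -n "$on_disk" ] && [ "$on_disk" -gt "$running" ]
}
```

The running version can be read from cman_tool version. Assuming its RHEL 6 output has the form "6.2.0 config 12" (check this on your system), the call would be: is_newer_than_running /etc/cluster/cluster.conf "$(cman_tool version | awk '{print $3}')"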

Step 3: After making the change on one node, propagate it to all nodes of the cluster with the command below. My cluster has three nodes, so ricci asks for a password for each of them.

[root@Node1 ~]# cman_tool -r version
You have not authenticated to the ricci daemon on 192.168.111.150
Password:
You have not authenticated to the ricci daemon on 192.168.111.151
Password:
You have not authenticated to the ricci daemon on 192.168.111.152
Password:
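To confirm that every node received the new value, you can compare each node's copy of cluster.conf. This is a minimal sketch that only compares local files; how the copies are fetched (for example with scp) is up to you, and the function name check_same_post_fail_delay is hypothetical.

```shell
# Print post_fail_delay for each cluster.conf copy given as an argument and
# succeed only if all copies agree. A missing attribute counts as 0 (fenced default).
check_same_post_fail_delay() {
    local conf value first=""
    local status=0
    for conf in "$@"; do
        value=$(grep -o 'post_fail_delay="[0-9]*"' "$conf" | head -n 1 | tr -dc '0-9')
        value="${value:-0}"
        echo "$conf: post_fail_delay=$value"
        if [ -z "$first" ]; then
            first="$value"              # remember the first node's value
        elif [ "$value" != "$first" ]; then
            status=1                    # at least one node differs
        fi
    done
    return "$status"
}
```

For example, after copying each node's file with scp root@192.168.111.150:/etc/cluster/cluster.conf /tmp/cluster.conf.150 (and likewise for .151 and .152), run check_same_post_fail_delay /tmp/cluster.conf.*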

Step 4: Verify that the change has taken effect. You can use either of the two methods below, or both 🙂

[root@Node1 ~]# cat /etc/cluster/cluster.conf | grep -i post
<fence_daemon post_fail_delay="30"/>

[root@Node1 ~]# fence_tool dump | grep -a -ie post_fail
1412581969 delay post_fail_delay 0 quorate_from_last_update 0
1412581969 192.168.111.152 not a cluster member after 0 sec post_fail_delay
1412586938 delay post_fail_delay 0 quorate_from_last_update 0
1412586938 192.168.111.151 not a cluster member after 0 sec post_fail_delay
1412587601 delay post_fail_delay 0 quorate_from_last_update 0
1412587601 192.168.111.152 not a cluster member after 0 sec post_fail_delay
1412589100 delay post_fail_delay 0 quorate_from_last_update 0
1412589100 192.168.111.152 not a cluster member after 0 sec post_fail_delay
1412589702 delay post_fail_delay 0 quorate_from_last_update 0
1412589702 192.168.111.152 not a cluster member after 0 sec post_fail_delay
1412591120 delay post_fail_delay 0 quorate_from_last_update 0
1412591120 192.168.111.152 not a cluster member after 0 sec post_fail_delay
1412591626 /cluster/fence_daemon/@post_fail_delay is 30
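The last line of the dump above is the one that shows the value fenced has actually loaded. A minimal sketch that extracts just that number, assuming the "/cluster/fence_daemon/@post_fail_delay is N" line format shown above (other versions may log differently); the function name is hypothetical.

```shell
# Read `fence_tool dump` output on stdin and print the post_fail_delay value
# from the most recent "/cluster/fence_daemon/@post_fail_delay is N" line.
last_loaded_post_fail_delay() {
    grep -a '/cluster/fence_daemon/@post_fail_delay is' \
        | tail -n 1 \
        | awk '{print $NF}'
}
```

Usage: fence_tool dump | last_loaded_post_fail_delay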

I always keep an eye on /var/log/messages while making any change. Below is a snippet from one of the cluster nodes, logged while this change was being applied.

Oct  6 06:33:42 Node2 modcluster: Updating cluster.conf
Oct  6 06:33:47 Node2 corosync[1602]:   [QUORUM] Members[3]: 1 2 3
Oct  6 06:33:47 Node2 rgmanager[2320]: Reconfiguring
Oct  6 06:33:47 Node2 rgmanager[2320]: Loading Service Data
Oct  6 06:33:49 Node2 rgmanager[2320]: Stopping changed resources.
Oct  6 06:33:49 Node2 rgmanager[2320]: Restarting changed resources.
Oct  6 06:33:49 Node2 rgmanager[2320]: Starting changed resources.
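To see only the cluster-related lines like the ones above, the log can be filtered on the daemon names that appear in it (modcluster, corosync, rgmanager). A minimal sketch; the function name show_cluster_reload is hypothetical.

```shell
# Print only modcluster, corosync and rgmanager lines from a syslog file,
# e.g. the "Updating cluster.conf" -> "Reconfiguring" sequence.
show_cluster_reload() {
    local log="${1:-/var/log/messages}"
    grep -E 'modcluster|corosync|rgmanager' "$log"
}
```

To watch live while running Step 3, the same filter works on a stream: tail -f /var/log/messages | grep --line-buffered -E 'modcluster|corosync|rgmanager'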

Tip: How to fence a node with verbose output.

[root@Node1 ~]# fence_node -vv 192.168.111.152
fence 192.168.111.152 dev 0.0 agent fence_vmware_soap result: success
agent args: port=Red-Linux-3 ssl=on uuid=42132ed4-a929-c17a-ce5a-6a61e0df1b8a nodename=192.168.111.152 agent=fence_vmware_soap ipaddr=192.168.111.130 login=root passwd=root123
fence 192.168.111.152 success
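When fencing is scripted, the final "fence <node> success" line shown above can be checked automatically. A minimal sketch based only on that output format; the function name is hypothetical. Note that -vv also prints the agent arguments, including the fencing password, so be careful where this output is logged.

```shell
# Read `fence_node -vv` output on stdin and succeed only if it contains
# the exact line "fence <node> success" for the node given in $1.
fence_reported_success() {
    local node="$1"
    # -F: treat the node name literally (dots are not regex wildcards); -x: whole line.
    grep -qxF "fence $node success"
}
```

Usage: fence_node -vv 192.168.111.152 | fence_reported_success 192.168.111.152 && echo "fenced OK"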
