keepalived transition not happening as expected

Question

I am trying to implement keepalived-based failover for my service. My configurations for the master and backup nodes are below.

Master node:

vrrp_script chk_splunkd {
    script "pidof splunkd"
    interval 2
    fall 2
    rise 2
}

vrrp_instance VI_1 {
    interface eth0
    state MASTER
    advert_int 1
    virtual_router_id 51
    priority 200
    nopreempt
    smtp_alert
    authentication {
            auth_type PASS
            auth_pass passme
    }
    virtual_ipaddress {
            10.126.246.245
    }
    track_script {
            chk_splunkd
    }
    notify_master /etc/keepalived/scripts/master.sh
    notify_backup /etc/keepalived/scripts/stop_service.sh
    notify_fault /etc/keepalived/scripts/stop_service.sh
}

Backup node:

vrrp_script chk_splunkd {
    script "pidof splunkd"
    interval 2
    fall 2
    rise 2
}
vrrp_instance VI_1 {
    interface eth0
    state BACKUP
    advert_int 1
    virtual_router_id 51
    priority 100
    nopreempt
    smtp_alert
    authentication {
            auth_type PASS
            auth_pass passme
    }

    virtual_ipaddress {
           10.126.246.245
    }
    track_script {
            chk_splunkd
    }
    notify_master /etc/keepalived/scripts/master.sh
    notify_backup /etc/keepalived/scripts/stop_service.sh
    notify_fault /etc/keepalived/scripts/stop_service.sh
}

However, I find that even when one node goes into the fault state and stops sending VRRP advertisements, the other node doesn't automatically transition to the master state. When I monitor the VRRP advertisement packets with tcpdump -vv -i eth0 vrrp, I see that even after the advertisements from one node stop, the other node doesn't start sending advertisements of its own, which it would do if it had become the master.

Please help me find out what I'm missing.

Thanks,

Keerthana

Answer 1

The issue was that at startup, once one node became the master, the other node went into the fault state. The track script runs pidof splunkd, which returns 1 when no splunkd process is found, and my Splunk service is meant to run only on the master node. A node in the fault state does not take over as master, so no failover happened. Once I edited the notify scripts to write the current state to an external file and to read that state before taking action, things started working fine.
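The fix described above could look roughly like the sketch below. It is not the original poster's script: the state file path, the function names, and the idea of making the health check itself state-aware are all assumptions made for illustration. The notify scripts would call write_state with the new state, and the tracked check would require splunkd only while the node believes it is the master.

```shell
#!/bin/sh
# Illustrative sketch only; names and paths are hypothetical.
# STATE_FILE is an assumed location and can be overridden from the environment.
STATE_FILE="${STATE_FILE:-/var/run/keepalived_splunk.state}"

# Record the state keepalived has just moved to (MASTER, BACKUP or FAULT).
# Intended to be called from the notify_master/notify_backup/notify_fault scripts.
write_state() {
    printf '%s\n' "$1" > "$STATE_FILE"
}

# Print the last recorded state; default to BACKUP if nothing was recorded yet.
read_state() {
    if [ -r "$STATE_FILE" ]; then
        cat "$STATE_FILE"
    else
        echo BACKUP
    fi
}

# Health check for use as a vrrp_script (an assumption, see the lead-in):
# demand a running splunkd only on the node recorded as MASTER, so a healthy
# backup on which splunkd is deliberately stopped is not pushed into FAULT.
check_splunkd() {
    if [ "$(read_state)" = "MASTER" ]; then
        pidof splunkd >/dev/null
    else
        return 0
    fi
}
```

To wire this in, the script line of vrrp_script would point at a small wrapper that sources these functions and calls check_splunkd, and master.sh would call write_state MASTER only after it has started Splunk, so the check does not fail during startup.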

Answer 1: 0 | Accepted | 2017-06-05 11:39:47
