How to monitor Elasticsearch with Nagios

Question

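I want to monitor Elasticsearch with Nagios. Basically, I want to know whether Elasticsearch is up.

I think I can use the Elasticsearch Cluster Health API and act on the "status" it returns (green, yellow, or red), but I still don't know how to do that from Nagios (Nagios is on one server and Elasticsearch is on another).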

Is there another way to do this?

Edit: I just found check_http_json. I think I'll give it a try.

Answer 1

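After a while, I managed to monitor Elasticsearch using NRPE. I wanted to use the Elasticsearch Cluster Health API, but because of security restrictions I couldn't call it from another machine. So on the monitoring server I created a new service whose check_command is check_nrpe!check_elastic. Then, on the remote server where Elasticsearch runs, I edited the nrpe.cfg file to add the following: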

command[check_elastic]=/usr/local/nagios/libexec/check_http -H localhost -u /_cluster/health -p 9200 -w 2 -c 3 -s green

This is allowed because the command runs on the remote server itself (against localhost), so there is no security issue here.
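To make the flags in that command concrete: `-s green` tells check_http to look for the string `green` in the response content, and `-w 2 -c 3` are response-time thresholds in seconds. The Python sketch below approximates that decision logic. `evaluate_response` and its parameters are hypothetical names for illustration only, and the real plugin performs more checks than this.

```python
# Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def evaluate_response(body, elapsed_seconds, expect="green", warn=2.0, crit=3.0):
    """Approximate the outcome of `check_http -s green -w 2 -c 3`."""
    # A missing expected string is treated as a hard failure.
    if expect not in body:
        return CRITICAL, "string %r not found in response" % expect
    # Otherwise only the response time can degrade the result.
    if elapsed_seconds > crit:
        return CRITICAL, "response took %.1fs" % elapsed_seconds
    if elapsed_seconds > warn:
        return WARNING, "response took %.1fs" % elapsed_seconds
    return OK, "string %r found" % expect
```

One consequence of plain string matching is that a yellow cluster and a red cluster look the same to this check: both lack the word `green`, so both end up CRITICAL.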

It works! I'll still try the check_http_json command I mentioned in the question, but for now my solution is good enough.

Answer 2

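After trying the suggestions in this post, I wrote a simple check_elasticsearch script. It returns OK, WARNING, or CRITICAL according to the "status" parameter in the cluster health response ("green", "yellow", and "red" respectively).

It also takes all the other parameters from the health page and dumps them out in standard Nagios format.

Enjoy!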
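As a rough illustration of what "standard Nagios format" means here: plugin output is a status line, optionally followed by `|` and space-separated `label=value` performance data. The sketch below is not the check_elasticsearch script itself; `format_health` is a hypothetical helper that turns a cluster health response body into an exit code and such a line, keeping only the numeric fields as performance data.

```python
import json

# Cluster status -> (Nagios exit code, label).
EXIT_CODES = {"green": (0, "OK"), "yellow": (1, "WARNING"), "red": (2, "CRITICAL")}


def format_health(body):
    """Return (exit_code, output_line) for a /_cluster/health response body."""
    health = json.loads(body)
    status = str(health.get("status", "")).lower()
    code, label = EXIT_CODES.get(status, (3, "UNKNOWN"))
    # Only numeric fields make sense as performance data; skip booleans and strings.
    perfdata = " ".join(
        "%s=%s" % (key, value)
        for key, value in sorted(health.items())
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    )
    line = "%s - cluster status is %s" % (label, health.get("status"))
    if perfdata:
        line += " | " + perfdata
    return code, line
```

A wrapper script would print the returned line and exit with the returned code so that Nagios can pick up both.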

Answer 3

Shameless plug: https://github.com/jersten/check-es

You can use it with ZenOSS/Nagios to monitor cluster health, data indices, and per-node heap usage.

Answer 4

You can use this cool Python script to monitor your Elasticsearch cluster. The script checks your IP:port for the Elasticsearch status. It is one of several Python scripts available for monitoring Elasticsearch.

#!/usr/bin/python
from nagioscheck import NagiosCheck, UsageError
from nagioscheck import PerformanceMetric, Status
import urllib2
import optparse

try:
    import json
except ImportError:
    import simplejson as json


class ESClusterHealthCheck(NagiosCheck):

    def __init__(self):

        NagiosCheck.__init__(self)

        self.add_option('H', 'host', 'host', 'The cluster to check')
        self.add_option('P', 'port', 'port', 'The ES port - defaults to 9200')

    def check(self, opts, args):
        host = opts.host
        port = int(opts.port or '9200')

        try:
            response = urllib2.urlopen(r'http://%s:%d/_cluster/health'
                                       % (host, port))
        except urllib2.HTTPError, e:
            raise Status('unknown', ("API failure", None,
                         "API failure:\n\n%s" % str(e)))
        except urllib2.URLError, e:
            raise Status('critical', (e.reason))

        response_body = response.read()

        try:
            es_cluster_health = json.loads(response_body)
        except ValueError:
            raise Status('unknown', ("API returned nonsense",))

        cluster_status = es_cluster_health['status'].lower()

        if cluster_status == 'red':
            raise Status("CRITICAL", "Cluster status is currently reporting as "
                         "Red")
        elif cluster_status == 'yellow':
            raise Status("WARNING", "Cluster status is currently reporting as "
                         "Yellow")
        else:
            raise Status("OK",
                         "Cluster status is currently reporting as Green")

if __name__ == "__main__":
    ESClusterHealthCheck().run()
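The script above targets Python 2 (urllib2, `except X, e` syntax) and depends on the nagioscheck package. For readers on Python 3, the same flow might look roughly like the sketch below, using only the standard library. `fetch_health` and `check_cluster` are hypothetical names, not part of any plugin, and plain HTTP without authentication is assumed. Unlike the original, an unrecognized status is reported as UNKNOWN rather than Green.

```python
import json
import urllib.error
import urllib.request

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_MAP = {
    "green": (OK, "Green"),
    "yellow": (WARNING, "Yellow"),
    "red": (CRITICAL, "Red"),
}


def fetch_health(host, port=9200, timeout=10):
    """Return the raw body of GET /_cluster/health."""
    url = "http://%s:%d/_cluster/health" % (host, port)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def check_cluster(fetch):
    """Call fetch() and translate the outcome into (exit_code, message)."""
    try:
        body = fetch()
    except urllib.error.HTTPError as exc:  # subclass of URLError, so catch it first
        return UNKNOWN, "API failure: %s" % exc
    except urllib.error.URLError as exc:
        return CRITICAL, "Cannot reach Elasticsearch: %s" % exc.reason
    try:
        status = json.loads(body)["status"].lower()
    except (ValueError, KeyError, TypeError, AttributeError):
        return UNKNOWN, "API returned nonsense"
    if status not in STATUS_MAP:
        return UNKNOWN, "Unexpected cluster status: %s" % status
    code, name = STATUS_MAP[status]
    return code, "Cluster status is currently reporting as %s" % name
```

A wrapper could call `check_cluster(lambda: fetch_health("localhost"))`, print the message, and pass the code to `sys.exit`. Taking the fetch step as a callable keeps the classification logic testable without a running cluster.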

Answer 5

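I wrote this a million years ago, and it may still be useful: https://github.com/radu-gheorghe/check-es

But it really depends on what you want to monitor. The plugin above checks:

whether Elasticsearch responds over HTTP
whether the ingestion rate falls below a defined level
whether the total number of documents drops to a defined level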
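The ingestion-rate idea in that list can be sketched as a comparison between two document-count samples taken some time apart. This is only an illustration of the concept, not how check-es is implemented; `ingest_rate`, `check_ingest`, and the threshold parameters are hypothetical, and where the counts come from and how the previous sample is stored are left to the caller.

```python
def ingest_rate(prev_count, prev_time, cur_count, cur_time):
    """Documents indexed per second between two samples (times in seconds)."""
    elapsed = cur_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    return (cur_count - prev_count) / elapsed


def check_ingest(prev_count, prev_time, cur_count, cur_time, warn_rate, crit_rate):
    """Return (exit_code, message); alert when the rate falls below a threshold."""
    rate = ingest_rate(prev_count, prev_time, cur_count, cur_time)
    if rate < crit_rate:
        return 2, "CRITICAL - ingest rate %.1f docs/s" % rate
    if rate < warn_rate:
        return 1, "WARNING - ingest rate %.1f docs/s" % rate
    return 0, "OK - ingest rate %.1f docs/s" % rate
```

Note that the thresholds are lower bounds here, so the critical value is smaller than the warning value, the reverse of a typical load or latency check.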

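But of course there are more interesting things to watch, from query latency to JVM heap usage. We wrote a blog post about the most important ones here: https://sematext.com/blog/top-10-elasticsearch-metrics-to-watch/

Elasticsearch has APIs for all of these, so you can use the generic check_http_json to fetch the metrics you need. Alternatively, you may want to use something like Sematext Monitoring for Elasticsearch, which collects these metrics out of the box and can then forward threshold/anomaly alerts to Nagios. (Disclosure: I work for Sematext.)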
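The general pattern behind a JSON-based check is to pull one value out of an API response and compare it with warning and critical thresholds. The sketch below shows that pattern in Python. It does not reproduce check_http_json's actual options; `get_path` and `check_metric` are hypothetical helpers, and the node ID `abc` in the usage note is made up.

```python
import json


def get_path(document, dotted_path):
    """Walk nested dicts using a dotted path such as 'a.b.c'."""
    value = document
    for key in dotted_path.split("."):
        value = value[key]
    return value


def check_metric(body, dotted_path, warn, crit):
    """Return (exit_code, message) for a numeric value that should stay low."""
    try:
        value = float(get_path(json.loads(body), dotted_path))
    except (ValueError, KeyError, TypeError):
        return 3, "UNKNOWN - could not read %s" % dotted_path
    if value >= crit:
        return 2, "CRITICAL - %s=%g" % (dotted_path, value)
    if value >= warn:
        return 1, "WARNING - %s=%g" % (dotted_path, value)
    return 0, "OK - %s=%g" % (dotted_path, value)
```

For example, with a node stats response body, `check_metric(body, "nodes.abc.jvm.mem.heap_used_percent", 80, 90)` would flag a node whose JVM heap usage reaches 80% or 90%.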

5 answers

Answer 1 (14, accepted): 2012-05-09 12:28:28
Answer 2 (6): 2012-09-21 23:46:50
Answer 3 (2): 2014-11-03 23:09:06
Answer 4 (2): 2016-09-14 09:17:57
Answer 5 (0): 2020-03-05 11:03:17
