
How to retrieve unique count of a field using Kibana + Elastic Search

Is it possible to query for a distinct/unique count of a field using Kibana? I am using Elasticsearch as the backend for Kibana.

If so, what is the syntax of the query? Here's a link to the Kibana interface where I would like to run my query: http://demo.kibana.org/#/dashboard

I am parsing nginx access logs with logstash and storing the data in Elasticsearch. Then I use Kibana to run queries and visualize my data in charts. Specifically, I want to know the count of unique IP addresses for a specific time frame using Kibana.

For Kibana 4, see the Kibana 4 answer below.

This is easy to do with a terms panel:

[Image: adding a terms panel to Kibana]

To get the count of distinct IPs in your logs, set the field to clientip, set the length to a big enough number (otherwise different IPs will be lumped together in the same group), and set the style to table. After adding the panel, you will have a table listing each IP and the count for that IP:

[Image: table with IPs and counts]

Kibana 4 now allows you to use aggregations. Apart from building a panel like the one explained in the Kibana 3 answer, we can now see the number of unique IPs in different periods, which (IMO) is what the OP wanted in the first place.

To build a dashboard like this, go to Visualize -> select your index -> select a Vertical Bar chart, and then in the visualize panel:

  • In the Y axis we want the unique count of IPs (select the field where you stored the IP), and in the X axis we want a date histogram on our time field.

[Image: building the visualization]

  • After pressing the Apply button, we should have a graph that shows the unique count of IPs over time. We can change the time interval on the X axis to see the unique IPs hourly, daily, and so on.

[Image: the final plot]
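For reference, a chart like this corresponds roughly to a date histogram with a cardinality sub-aggregation in Elasticsearch. Below is a minimal Python sketch that only builds the request body. The field names `@timestamp` and `clientip`, the aggregation names, and the function name are assumptions based on a typical logstash setup, not something Kibana prescribes. The `interval` key is the older date_histogram parameter; recent Elasticsearch versions use `calendar_interval` or `fixed_interval` instead.

```python
import json


def unique_ips_over_time(time_field="@timestamp", ip_field="clientip", interval="1h"):
    """Build a search body with one bucket per interval and an
    approximate unique-IP count inside each bucket.

    All field and aggregation names here are illustrative assumptions.
    """
    return {
        "size": 0,  # we only want the aggregation results, not the hits
        "aggs": {
            "per_interval": {
                "date_histogram": {"field": time_field, "interval": interval},
                "aggs": {
                    # cardinality = approximate distinct count of the field
                    "unique_ips": {"cardinality": {"field": ip_field}}
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(unique_ips_over_time(interval="1d"), indent=2))
```

The printed JSON can be sent as the body of a `_search` request against the logstash index.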

Just take into account that the unique counts are approximate. For more information, see also this answer.

Be aware that Unique count uses the 'cardinality' metric, which does not always guarantee an exact unique count. :-)

The cardinality metric is an approximate algorithm. It is based on the HyperLogLog++ (HLL) algorithm. HLL works by hashing your input and using the bits from the hash to make probabilistic estimations on the cardinality.

Depending on the amount of data, I have seen Unique Count in Elastic come up 700+ entries short on a 300k dataset whose entries are all genuinely unique.

Read more here: https://www.elastic.co/guide/en/elasticsearch/guide/current/cardinality.html
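The accuracy of the cardinality metric can be traded against memory with its `precision_threshold` option: counts below the threshold are expected to be close to exact, the default is 3000, and the maximum supported value is 40000 (larger values behave like 40000). A minimal Python sketch of such a request body follows; the field name `clientip`, the aggregation name, and the function name are assumptions for illustration. Note that a dataset with around 300k unique values is above the maximum threshold, so its count stays approximate either way.

```python
import json

# Highest precision_threshold that Elasticsearch honours;
# larger values have the same effect as this one.
MAX_PRECISION_THRESHOLD = 40000


def unique_count_body(field="clientip", precision_threshold=MAX_PRECISION_THRESHOLD):
    """Build a cardinality aggregation body with an explicit precision_threshold.

    The field and aggregation names are illustrative assumptions.
    """
    threshold = min(precision_threshold, MAX_PRECISION_THRESHOLD)
    return {
        "size": 0,  # aggregation results only
        "aggs": {
            "unique_count": {
                "cardinality": {
                    "field": field,
                    "precision_threshold": threshold,
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(unique_count_body(), indent=2))
```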

Create a "topN" query on "clientip", then add a histogram with count on "clientip" and set the "topN" query as its source. You will then see the count of different IPs per time interval.

Unique counts of field values are achieved by using facets (note that facets were removed in Elasticsearch 2.0 in favor of aggregations). See the ES documentation for the full story, but the gist is that you create a query and then ask ES to prepare facets on the results, counting the values found in fields. It's up to you to customize the fields used and even describe how you want the values returned. The most basic facet type just groups by terms, such as the IP address above. You can get pretty complex with these, even requiring a query within your facet! In the example below, the facet name ip_addresses is arbitrary.

{
    "query": {
        "match_all": {}
    },
    "facets": {
        "ip_addresses": {
            "terms": {
                "field": "ip_address"
            }
        }
    }
}

Using aggs you can easily do that. Here is the query:

GET index/_search
{
  "size":0,
  "aggs": {
    "source": {
      "terms": {
        "field": "field",
        "size": 100000
      }
    }
  }
}

This returns the distinct values of field along with their doc counts.
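Since a terms aggregation returns one bucket per distinct value, the unique count is the number of buckets, provided `size` was large enough to return all of them. A minimal Python sketch of reading that from a response follows. The sample response is made up for illustration, and the check on `sum_other_doc_count` assumes an Elasticsearch version that includes that key in terms aggregation responses.

```python
def count_distinct(response, agg_name="source"):
    """Return (number_of_buckets, truncated) for a terms aggregation response.

    truncated is True when documents fell outside the returned buckets,
    meaning the bucket count is only a lower bound on the distinct count.
    """
    agg = response["aggregations"][agg_name]
    distinct = len(agg["buckets"])
    truncated = agg.get("sum_other_doc_count", 0) > 0
    return distinct, truncated


# Hypothetical response, trimmed to the parts used above.
sample_response = {
    "aggregations": {
        "source": {
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "10.0.0.1", "doc_count": 42},
                {"key": "10.0.0.2", "doc_count": 7},
                {"key": "10.0.0.3", "doc_count": 1},
            ],
        }
    }
}

if __name__ == "__main__":
    print(count_distinct(sample_response))
```

Unlike the cardinality metric, this count is exact when nothing is truncated, at the cost of returning every distinct value in the response.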
