Elasticsearch aggregation and range

We want to aggregate some values. For example, let's say we are indexing users who register in an organization.

We want to get the registered user count split like this:

  • registered from gmail: 900
  • registered via fb: 800
  • registered via yahoo: 700
  • registered via own application: 1500
  • registered via others: 1600

We then need to bucket these applications by their user count: 0 to 1000 users (gmail, fb, yahoo: 3 applications) and 1001 to 2000 users (own app, other apps: 2 applications).

How do we achieve this in Elasticsearch? Any suggestions?

Thanks

Let's say you are indexing user objects that look like this:

POST users/user
{
  "login":"user1",
  "organization":"fb"
}

You are trying to aggregate your users by their organization value. For this purpose, you have to use a terms aggregation.

Your query will look like this:

POST users/_search?search_type=count
{
  "aggs": {
    "by_organization": {
      "terms": {
        "field": "organization"
      }
    }
  }
}

Note: search_type=count is used here only to get a shorter response, since the result hits won't be returned. In Elasticsearch 2.0 and later, setting "size": 0 in the request body has the same effect, and search_type=count was removed in 5.0.

Your search response will be something like this:

{
   (...)
   "aggregations": {
      "by_organization": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "app",
               "doc_count": 4
            },
            {
               "key": "fb",
               "doc_count": 3
            },
            {
               "key": "gmail",
               "doc_count": 2
            }
         ]
      }
   }
}

You can see the buckets corresponding to each organization value.
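The terms aggregation gives one bucket per organization, but it does not group those buckets into the count ranges from the question (0 to 1000, 1001 to 2000). One possible way to get that second level is to post-process the response on the client. The following is only a minimal sketch under that assumption: the function name group_buckets_by_count is hypothetical, the sample counts are the ones from the question, and "app" and "others" are illustrative keys for the own application and the other applications.

```python
def group_buckets_by_count(buckets, ranges):
    """Group terms-aggregation buckets by the range their doc_count falls in.

    `ranges` is a list of (low, high) pairs, both bounds inclusive.
    Returns a dict mapping each range to the list of bucket keys inside it.
    """
    grouped = {r: [] for r in ranges}
    for bucket in buckets:
        for low, high in ranges:
            if low <= bucket["doc_count"] <= high:
                grouped[(low, high)].append(bucket["key"])
                break  # a bucket belongs to at most one range
    return grouped


# Shaped like the "buckets" array of the terms aggregation response,
# using the counts given in the question (illustrative data).
buckets = [
    {"key": "others", "doc_count": 1600},
    {"key": "app", "doc_count": 1500},
    {"key": "gmail", "doc_count": 900},
    {"key": "fb", "doc_count": 800},
    {"key": "yahoo", "doc_count": 700},
]
ranges = [(0, 1000), (1001, 2000)]

grouped = group_buckets_by_count(buckets, ranges)
for (low, high), keys in grouped.items():
    print(f"{low}-{high}: {len(keys)} applications {keys}")
```

With the counts from the question, this puts gmail, fb and yahoo in the 0 to 1000 range and the two remaining applications in the 1001 to 2000 range. Buckets whose count falls outside every range are simply left out.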

Be aware that:

  • Only the top 10 buckets are returned by default (see the size parameter of the terms aggregation)
  • This simple example works because the organization values are single words, but in real life you will have to set your organization field to not_analyzed in order to aggregate on the original value (and not on the terms obtained via analysis)
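Both points can be sketched as request bodies built in Python. This is an illustration, not a definitive setup: the helper terms_query is hypothetical, the size of 50 is an arbitrary value, and the mapping uses the "string" / "not_analyzed" syntax of the Elasticsearch 1.x and 2.x versions this answer targets (from 5.0 on, the equivalent is a field of type "keyword"). The mapping body is meant to be sent when the users index is created.

```python
import json


def terms_query(field, size=10):
    """Build a terms-aggregation request body.

    `size` caps the number of buckets returned (the default is 10).
    """
    return {
        "aggs": {
            "by_" + field: {
                "terms": {"field": field, "size": size}
            }
        }
    }


# Index-creation body for the "user" type: the organization field is stored
# as a single untokenized term, so the aggregation sees the original value.
# Elasticsearch 1.x/2.x syntax (assumption: that is the version in use).
user_mapping = {
    "mappings": {
        "user": {
            "properties": {
                "organization": {"type": "string", "index": "not_analyzed"}
            }
        }
    }
}

query = terms_query("organization", size=50)

print(json.dumps(user_mapping, indent=2))
print(json.dumps(query, indent=2))
```

Changing the mapping of an existing field generally requires reindexing, so it is easier to define it before the first documents are indexed.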

I strongly encourage you to read more about analysis and the terms aggregation documentation.
