
Elasticsearch: Which nested mapping should I use to store data after an aggregation result?

Let's say I have a lot of Elasticsearch documents like the following:

{
        "_index": "f2016-07-17",
        "_type": "trkvjadsreqpxl.gif",
        "_id": "AVX2N3dl5siG6SyfyIjb",
        "_score": 1,
        "_source": {
          "time": "1468714676424",
          "meta": {
            "cb_id": 25681,
            "mt_id": 649,
            "c_id": 1592,
            "revenue": 2.5,
            "mt_name": "GMS-INAPP-EN-2.5",
            "c_description": "COULL-INAPP-EN-2.5",
            "domain": "wv.inner-active.mobi",
            "master_domain": "649###wv.inner-active.mobi",
            "child_domain": "1592###wv.inner-active.mobi",
            "combo_domain": "25681###wv.inner-active.mobi",
            "ip": "52.42.87.73"
          }
        }....
      }

My goal is to run a simple date histogram aggregation with a terms sub-aggregation, and then insert the aggregated result into a new index/structure.

The aggregation is:

{
  "aggs": {
    "hour": {
      "date_histogram": {
        "field": "time",
        "interval": "hour"
      },
      "aggs": {
        "hourly_m_tag": {
          "terms": {
            "field": "meta.mt_id"
          }
        }
      }
    }
  }
}

The result is as expected:

"aggregations": {
    "hour": {
      "buckets": [
        {
          "key_as_string": "2016-07-17T00:00:00.000Z",
          "key": 1468713600000,
          "doc_count": 94411750,
          "hourly_m_tag": {
            "doc_count_error_upper_bound": 1485,
            "sum_other_doc_count": 30731646,
            "buckets": [
              {
                "key": 10,
                "doc_count": 10175501
              },
              {
                "key": 649,
                "doc_count": 200000
              }....
            ]
          }
        },
        {
          "key_as_string": "2016-07-17T01:00:00.000Z",
          "key": 1468717200000,
          "doc_count": 68738743,
          "hourly_m_tag": {
            "doc_count_error_upper_bound": 2115,
            "sum_other_doc_count": 22478590,
            "buckets": [
              {
                "key": 559,
                "doc_count": 8307018
              },
              {
                "key": 649,
                "doc_count": 100000
              }...

My Question

I want to parse the result (which is no problem) and store it in a new index.

Which nested mapping should I use on the new index so that I can fetch the aggregated data later?

Expected data structure:

{
  "hour": [
    {
      "time": "00:00",
      "child_tag": {
        "300": 100,
        "310": 200
      },
      "master_tag": {
        "1000": 300,
        "1001": 400,
        "1010": 400
      }
    },
    {
      "time": "01:00",
      "child_tag": {
        "300": 500,
        "310": 600
      },
      "master_tag": {
        "1000": 700,
        "1010": 800
      }
    }

  ]...
}

PS

The later aggregation should sum the values of each master_tag/child_tag key across a range of hours.

For instance, a query between 00:00 and 01:00 should return:

{
  "child_tag": {
    "300": 600, //100+500
    "310": 800 //200+600
  },
  "master_tag": {
    "1000": 1000, //300+700
    "1001": 400,
    "1010": 1200 //400+800
  }
}

Thanks a lot!

Based on your comment and edits, I suggest storing one document per hour in your new index, which makes it easier to query documents for specific hours.

The mapping I suggest is as follows:

PUT /agg_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "time": {
          "type": "date",
          "format": "HH:mm"
        },
        "child_tag": {
          "type": "nested"
        },
        "master_tag": {
          "type": "nested"
        }
      }
    }
  }
}

Then you can index your new documents like this:

PUT /agg_index/my_type/1
{
  "time": "00:00",
  "child_tag": {
    "300": 100,
    "310": 200
  },
  "master_tag": {
    "1000": 300,
    "1001": 400,
    "1010": 400
  }
}

PUT /agg_index/my_type/2
{
  "time": "01:00",
  "child_tag": {
    "300": 500,
    "310": 600
  },
  "master_tag": {
    "1000": 700,
    "1010": 800
  }
}
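
If it helps, here is a minimal Python sketch of the parsing step, i.e. turning the date_histogram response from the question into one document per hour in the shape shown above. The function name `buckets_to_hourly_docs` and the `sub_aggs` argument are made up for illustration, and the sample response is trimmed to the buckets visible in the question. Only `master_tag` is filled in, because the question's aggregation only has a terms sub-aggregation on `meta.mt_id`; a `child_tag` field would need a second terms sub-aggregation (presumably on `meta.c_id`, which is an assumption).

```python
from datetime import datetime, timezone


def buckets_to_hourly_docs(response, sub_aggs):
    """Build one document per date_histogram bucket.

    sub_aggs maps a terms sub-aggregation name (e.g. "hourly_m_tag")
    to the output field name (e.g. "master_tag"). Hypothetical helper.
    """
    docs = []
    for hour_bucket in response["aggregations"]["hour"]["buckets"]:
        # "key" is the bucket start in epoch milliseconds (UTC).
        start = datetime.fromtimestamp(hour_bucket["key"] / 1000, tz=timezone.utc)
        doc = {"time": start.strftime("%H:%M")}
        for agg_name, field_name in sub_aggs.items():
            term_buckets = hour_bucket.get(agg_name, {}).get("buckets", [])
            # Term keys become strings so they can be used as JSON field names.
            doc[field_name] = {str(b["key"]): b["doc_count"] for b in term_buckets}
        docs.append(doc)
    return docs


# Trimmed copy of the response shown in the question.
sample_response = {
    "aggregations": {
        "hour": {
            "buckets": [
                {
                    "key": 1468713600000,
                    "hourly_m_tag": {
                        "buckets": [
                            {"key": 10, "doc_count": 10175501},
                            {"key": 649, "doc_count": 200000},
                        ]
                    },
                },
                {
                    "key": 1468717200000,
                    "hourly_m_tag": {
                        "buckets": [
                            {"key": 559, "doc_count": 8307018},
                            {"key": 649, "doc_count": 100000},
                        ]
                    },
                },
            ]
        }
    }
}

hourly_docs = buckets_to_hourly_docs(sample_response, {"hourly_m_tag": "master_tag"})
```

Each element of `hourly_docs` can then be sent as the body of one index request like the ones above.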

You'll then be able to query documents and run aggregations on the nested child_tag and master_tag fields.
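
As a rough sketch of what such a query could look like, the Python snippet below builds a request body that filters on a time range and sums each tag key inside a nested aggregation. The helper name `build_sum_query` is hypothetical. It assumes the tag keys are known up front, since each key is its own field (for example `child_tag.300`), and that the `time` range is given in the same `HH:mm` format as the mapping. I have not run it against a cluster, so treat the body as a starting point.

```python
import json


def build_sum_query(start, end, tag_keys):
    """Build a search body that sums every listed tag key between two hours.

    tag_keys maps a nested path to the keys to sum, e.g.
    {"child_tag": ["300", "310"]}. Hypothetical helper for illustration.
    """
    aggs = {}
    for path, keys in tag_keys.items():
        aggs[path] = {
            # Step into the nested documents stored under this path.
            "nested": {"path": path},
            # One sum aggregation per key, named after the key itself.
            "aggs": {key: {"sum": {"field": f"{path}.{key}"}} for key in keys},
        }
    return {
        "size": 0,  # only the aggregations are needed, not the hits
        "query": {"range": {"time": {"gte": start, "lte": end}}},
        "aggs": aggs,
    }


body = build_sum_query(
    "00:00",
    "01:00",
    {"child_tag": ["300", "310"], "master_tag": ["1000", "1001", "1010"]},
)
print(json.dumps(body, indent=2))
```

The printed body would be sent to the search endpoint of agg_index. The need to list every key explicitly is a consequence of using tag IDs as field names.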
