
Elasticsearch: Which nested mapping should I use to store aggregation results?

Let's say I have a lot of Elasticsearch documents like the following:

{
        "_index": "f2016-07-17",
        "_type": "trkvjadsreqpxl.gif",
        "_id": "AVX2N3dl5siG6SyfyIjb",
        "_score": 1,
        "_source": {
          "time": "1468714676424",
          "meta": {
            "cb_id": 25681,
            "mt_id": 649,
            "c_id": 1592,
            "revenue": 2.5,
            "mt_name": "GMS-INAPP-EN-2.5",
            "c_description": "COULL-INAPP-EN-2.5",
            "domain": "wv.inner-active.mobi",
            "master_domain": "649###wv.inner-active.mobi",
            "child_domain": "1592###wv.inner-active.mobi",
            "combo_domain": "25681###wv.inner-active.mobi",
            "ip": "52.42.87.73"
          }
        }....
      }

My goal is to run a simple date histogram aggregation with a terms sub-aggregation, and to insert the aggregated result into a new index/structure.

The aggregation is:

{
  "aggs": {
    "hour":{
      "date_histogram": {
        "field": "time",
        "interval": "hour"
      },
      "aggs": {
        "hourly_m_tag": {
          "terms": {
            "field": "meta.mt_id"
          }
        }
      }
    }
  }
} 

The result is as expected:

"aggregations": {
    "hour": {
      "buckets": [
        {
          "key_as_string": "2016-07-17T00:00:00.000Z",
          "key": 1468713600000,
          "doc_count": 94411750,
          "hourly_m_tag": {
            "doc_count_error_upper_bound": 1485,
            "sum_other_doc_count": 30731646,
            "buckets": [
              {
                "key": 10,
                "doc_count": 10175501
              },
              {
                "key": 649,
                "doc_count": 200000
              }....
            ]
          }
        },
        {
          "key_as_string": "2016-07-17T01:00:00.000Z",
          "key": 1468717200000,
          "doc_count": 68738743,
          "hourly_m_tag": {
            "doc_count_error_upper_bound": 2115,
            "sum_other_doc_count": 22478590,
            "buckets": [
              {
                "key": 559,
                "doc_count": 8307018
              },
              {
                "key": 649,
                "doc_count": 100000
              }...

My Question

Parsing the result is no problem. I want to store it in a new index.

Which nested mapping should I use on the new index so that I can fetch and aggregate the stored data later?

Expected data structure:

{
  "hour": [
    {
      "time": "00:00",
      "child_tag": {
        "300": 100,
        "310": 200
      },
      "master_tag": {
        "1000": 300,
        "1001": 400,
        "1010": 400
      }
    },
    {
      "time": "01:00",
      "child_tag": {
        "300": 500,
        "310": 600
      },
      "master_tag": {
        "1000": 700,
        "1010": 800
      }
    }

  ]...
}

PS

The later aggregation should sum the values of each master_tag/child_tag key across hours.

For instance, a query for 00:00-01:00 should return:

{
  "child_tag": {
    "300": 600,  // 100 + 500
    "310": 800   // 200 + 600
  },
  "master_tag": {
    "1000": 1000,  // 300 + 700
    "1001": 400,
    "1010": 1200   // 400 + 800
  }
}

Thanks a lot!

According to your comment and edits, I suggest storing one document per hour in your new index, so it'll be easier to query documents based on specific hours.

The mapping I suggest is as follows:

PUT /agg_index
{
  "mappings": {
    "doc": {
      "properties": {
        "time": {
          "type": "date",
          "format": "HH:mm"
        },
        "child_tag": {
          "type": "nested"
        },
        "master_tag": {
          "type": "nested"
        }
      }
    }
  }
}

Then you can index your new documents like this:

PUT /agg_index/doc/1
{
  "time": "00:00",
  "child_tag": {
    "300": 100,
    "310": 200
  },
  "master_tag": {
    "1000": 300,
    "1001": 400,
    "1010": 400
  }
}

PUT /agg_index/doc/2
{
  "time": "01:00",
  "child_tag": {
    "300": 500,
    "310": 600
  },
  "master_tag": {
    "1000": 700,
    "1010": 800
  }
}
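
As a rough sketch of how such per-hour documents might be produced from the date_histogram response in the question, something like the following Python could work. The function name buckets_to_hourly_docs and its parameters are made up for illustration. It only fills master_tag, because the aggregation shown in the question only has a terms sub-aggregation on meta.mt_id; I'm assuming child_tag would come from an analogous terms sub-aggregation on meta.c_id.

```python
from datetime import datetime, timezone


def buckets_to_hourly_docs(response, terms_name="hourly_m_tag", target_field="master_tag"):
    """Turn date_histogram buckets into one flat document per hour.

    `terms_name` is the name of the terms sub-aggregation in the response,
    `target_field` is the (assumed) field name to store its counts under.
    """
    docs = []
    for hour_bucket in response["aggregations"]["hour"]["buckets"]:
        # "key" is the start of the hour as epoch milliseconds (UTC).
        start = datetime.fromtimestamp(hour_bucket["key"] / 1000, tz=timezone.utc)
        # Map each term key (e.g. an mt_id) to its document count.
        counts = {
            str(term["key"]): term["doc_count"]
            for term in hour_bucket[terms_name]["buckets"]
        }
        docs.append({"time": start.strftime("%H:%M"), target_field: counts})
    return docs


# A trimmed copy of the response from the question, used as sample input.
sample_response = {
    "aggregations": {
        "hour": {
            "buckets": [
                {
                    "key": 1468713600000,
                    "hourly_m_tag": {
                        "buckets": [
                            {"key": 10, "doc_count": 10175501},
                            {"key": 649, "doc_count": 200000},
                        ]
                    },
                },
                {
                    "key": 1468717200000,
                    "hourly_m_tag": {
                        "buckets": [
                            {"key": 559, "doc_count": 8307018},
                            {"key": 649, "doc_count": 100000},
                        ]
                    },
                },
            ]
        }
    }
}

hourly_docs = buckets_to_hourly_docs(sample_response)
```

Each element of hourly_docs can then be sent as the body of one index request like the two above.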

And you'll be able to query documents and run aggregations on the nested child_tag and master_tag elements.
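
To make the expected sums from the question concrete, here is a minimal client-side sketch in Python. It assumes the per-hour documents for the requested range have already been fetched from agg_index, and it is not an Elasticsearch aggregation; sum_tags is a hypothetical helper name.

```python
from collections import Counter


def sum_tags(docs, fields=("child_tag", "master_tag")):
    """Sum the per-key counts of each tag field across several hourly documents."""
    totals = {field: Counter() for field in fields}
    for doc in docs:
        for field in fields:
            # Counter.update adds the values of matching keys instead of replacing them.
            totals[field].update(doc.get(field, {}))
    return {field: dict(counter) for field, counter in totals.items()}


# The two hourly documents indexed above (00:00 and 01:00).
hourly_docs = [
    {
        "time": "00:00",
        "child_tag": {"300": 100, "310": 200},
        "master_tag": {"1000": 300, "1001": 400, "1010": 400},
    },
    {
        "time": "01:00",
        "child_tag": {"300": 500, "310": 600},
        "master_tag": {"1000": 700, "1010": 800},
    },
]

totals = sum_tags(hourly_docs)
```

For these two documents, totals matches the 00:00-01:00 example in the question.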
