
(Elasticsearch) How to get the last element of a nested field of all documents then perform sub-aggregations

I have an index called socialmedia and I am trying to write queries against a field called eng (some unrelated fields are omitted):

"id" : "1",
"eng":
[
{
  "soc_mm_score" : "3",
  "date_updated" : "1520969306"
},
{
  "soc_mm_score" : "1",
  "date_updated" : "1520972191"
},
{
  "soc_mm_score" : "4",
  "date_updated" : "1520937222"
}
]

Many documents in this index have an eng nested field, and each eng field contains many nested objects.

My main goal: what Elasticsearch query should I write to do the following with these nested objects?

STEP 1
For each document, get the nested object with the highest date_updated value

STEP 2
After getting those nested objects, perform a sum aggregation that adds up the soc_mm_score values of each document's "latest nested object"

I tried the query below, but it does not do what I need.

ATTEMPT # 1 (I'm using the elasticsearch-php API, so please trust that the query works in this format)

'aggs' => [
    'ENG' => [
        'nested' => [
            'path' => 'eng'
        ],
        'aggs' => [
            'FILTER' => [
                'filter' => [
                    'bool' => [
                        'must' => [
                            [
                                // I'm thinking of using max aggregation here
                            ]
                        ]
                    ]
                ]
            ],
            'LATEST' => [
                'top_hits' => [
                    'size' => 1,
                    'sort' => [
                        'eng.date_updated' => [
                            'order' => 'desc'
                        ]
                    ]
                ]
            ]
        ]
    ]
]

PRO/S: it returns the correct nested object
CON/S: I cannot perform further aggregations on it

Sample output
[Output 1]

Then I tried adding a sub-aggregation
[Output 2]

This is the resulting output
[Output 3]

Is there any other way to do this?

To review my ideal steps:

  1. Access my eng nested field
  2. Get the "latest" / most recent element for that eng nested field (indicated by the element with the highest value of date_updated field)
  3. After getting those "most recent" nested elements, run sub-aggregations on their sibling nested fields, for example the sum of soc_like_count or soc_share_count over the most recent element of every document's eng field

Formulated an answer!

"aggs":{
        "LATEST": {
            "scripted_metric": {
                "init_script" : """
                  state.te = []; 
                  state.g = 0;
                  state.d = 0;
                  state.a = 0;
                """, 
                "map_script" : """
                  if(state.d != doc['_id'].value){
                      state.d = doc['_id'].value;
                      state.te.add(state.a);
                      state.g = 0;
                      state.a = 0;
                  } 
                  if(state.g < doc['eng.date_updated'].value){ 
                    state.g = doc['eng.date_updated'].value; 
                    state.a = doc['eng.soc_te_score'].value;
                  }
                  """,
                "combine_script" : """
                    state.te.add(state.a);
                    double count = 0; 
                    for (t in state.te) { 
                      count += t 
                    }

                    return count
                  """,
                "reduce_script" : """
                    double count = 0; 
                    for (a in states) { 
                      count += a 
                    }

                    return count
                """
            }
        }
      }
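
To make the flow of the script above easier to follow, here is a rough Python sketch of what the four stages compute. It is only a simulation, and it rests on assumptions that the answer does not state: the script is called once per nested eng object, the nested objects of one parent document are visited one after another on a shard, and the values are numeric. The names run_shard and reduce_shards and the sample tuples are made up for illustration.

```python
def run_shard(nested_docs):
    """Mimic init_script, map_script and combine_script for one shard.

    nested_docs: list of (parent_id, date_updated, score) tuples, with all
    nested objects of the same parent adjacent to each other (assumption).
    """
    # init_script: te = banked scores, g = newest date seen for the current
    # parent, d = current parent id, a = score of the newest object so far
    state = {"te": [], "g": 0, "d": None, "a": 0}

    # map_script: runs once per nested object
    for parent_id, date_updated, score in nested_docs:
        if state["d"] != parent_id:
            # A new parent starts: bank the previous parent's latest score
            # (0 for the very first parent) and reset the trackers.
            state["d"] = parent_id
            state["te"].append(state["a"])
            state["g"] = 0
            state["a"] = 0
        if state["g"] < date_updated:
            state["g"] = date_updated
            state["a"] = score

    # combine_script: bank the last parent's score, return the shard total
    state["te"].append(state["a"])
    return float(sum(state["te"]))


def reduce_shards(shard_totals):
    """Mimic reduce_script: add up the per-shard totals."""
    return float(sum(shard_totals))


# Hypothetical data: parent "1" is the document from the question.
shard_1 = [
    ("1", 1520969306, 3), ("1", 1520972191, 1), ("1", 1520937222, 4),
    ("2", 1520000000, 5), ("2", 1520000100, 2),
]
shard_2 = [("3", 1520000500, 7)]

total = reduce_shards([run_shard(shard_1), run_shard(shard_2)])
print(total)  # 1 (doc 1) + 2 (doc 2) + 7 (doc 3) = 10.0
```

If the nested objects of one parent were not visited consecutively, the reset in the first branch would fire too often and the total would be wrong, so that ordering assumption is worth checking against your own data.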

Metric aggregations do not support sub-aggregations, and top_hits is a metric aggregation.

One solution is to do the summing on the client side, after you get the results back from Elasticsearch.
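
A minimal sketch of that client-side approach in Python, assuming the search returns each document's _source with the full eng array and that date_updated and the score are numeric strings, as in the question. The function name sum_latest_scores and the second sample document are hypothetical.

```python
def sum_latest_scores(response, field="soc_mm_score"):
    """Sum `field` over the most recent eng object of every returned hit."""
    total = 0.0
    for hit in response["hits"]["hits"]:
        eng = hit["_source"].get("eng") or []
        if not eng:
            continue  # document without nested objects contributes nothing
        # date_updated is stored as a string, so compare it numerically
        latest = max(eng, key=lambda e: int(e["date_updated"]))
        total += float(latest[field])
    return total


# Response trimmed to the parts used above; document "2" is made up.
sample_response = {"hits": {"hits": [
    {"_id": "1", "_source": {"id": "1", "eng": [
        {"soc_mm_score": "3", "date_updated": "1520969306"},
        {"soc_mm_score": "1", "date_updated": "1520972191"},
        {"soc_mm_score": "4", "date_updated": "1520937222"},
    ]}},
    {"_id": "2", "_source": {"id": "2", "eng": [
        {"soc_mm_score": "5", "date_updated": "1520000000"},
    ]}},
]}}

print(sum_latest_scores(sample_response))  # 1 + 5 = 6.0
```

This only covers the hits that are actually returned, so for a large index you would have to page through all matching documents before the sum is complete.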

I created something that might be helpful, but you will have to customize it to your needs.

Assuming your mapping is:

{
  "my_index": {
    "mappings": {
      "doc": {
        "properties": {
          "eng": {
            "type": "nested",
            "properties": {
              "date_updated": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "soc_like_count": {
                "type": "long"
              },
              "soc_mm_score": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          },
          "id": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

The query

GET my_index/_search
{
  "size": 0,
  "aggs": {
    "ENG": {
      "nested": {
        "path": "eng"
      },
      "aggs": {
        "sum_soc_top_hits_by_date": {
          "scripted_metric": {
            "init_script": "params._agg.map = new HashMap();params._agg.results = new HashMap();params._agg.size = 1;params._agg.date_arr = null",
            "map_script": "params._agg.map[doc['eng.date_updated.keyword'].value] = doc['eng.soc_like_count'].value;params._agg.date_arr = new ArrayList(params._agg.map.keySet());Collections.sort(params._agg.date_arr, Collections.reverseOrder())",
            "combine_script": "params._agg.size = params._agg.size > params._agg.date_arr.length - 1 ?  params._agg.date_arr.length : params._agg.size;double soc= 0; for (t in params._agg.date_arr.subList(0,params._agg.size)) { params._agg.results[t] = params._agg.map[t];soc += params._agg.map[t]}params._agg.results.total = soc; return params._agg.results",
            "reduce_script": "return params._aggs"
          }
        }
      }
    }
  }
}

Change params._agg.size = 1 in init_script to change the number of top hits that are summed.
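
For readers who find the one-line scripts hard to parse, this is roughly what they compute per shard, written as a Python sketch. As I read the script, it ranks all nested objects the shard sees by date_updated.keyword, not the objects of each parent document separately, and two objects with the same date overwrite each other in the map. The name top_n_sum and the sample values are hypothetical.

```python
def top_n_sum(nested_docs, size=1):
    """Mimic map_script and combine_script for one shard.

    nested_docs: list of (date_updated, soc_like_count) pairs, where
    date_updated is a keyword string as in the mapping above.
    """
    by_date = {}
    for date_updated, like_count in nested_docs:
        by_date[date_updated] = like_count  # same date: last one wins

    # Keyword values sort as strings; for epoch seconds of equal length
    # this matches numeric order.
    dates = sorted(by_date, reverse=True)
    size = min(size, len(dates))

    results = {d: by_date[d] for d in dates[:size]}
    results["total"] = float(sum(by_date[d] for d in dates[:size]))
    return results


docs = [("1520969306", 10), ("1520972191", 20), ("1520937222", 30)]
print(top_n_sum(docs, size=1))  # {'1520972191': 20, 'total': 20.0}
print(top_n_sum(docs, size=2))  # two newest dates, total 30.0
```

Because the ranking is per shard and across parents, this matches "latest element of every document" only in special cases, which is why the answer says it needs customizing.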
