
Sort elasticsearch aggregation buckets by text field

I'm trying to sort the result buckets of an elasticsearch aggregation. I have a large set of documents:

"mappings": {
    "properties": {
        "price": {
            "type": "double"
        },
        "product_name": {
            "type": "text"
        },
        "product_id": {
            "type": "keyword"
        },
        "timestamp": {
            "type": "date"
        }
    }
}

What I'm currently doing is getting the latest sale for each product_id using composite and top_hits aggregations:

{
    "query": {
        "bool": {
            "filter": [
                {
                    "range": {
                        "timestamp": {
                            "gte": "2019-10-25T00:00:00Z",
                            "lte": "2019-10-26T00:00:00Z"
                        }
                    }
                }
            ]
        }
    },
    "aggs": {
        "distinct_products": {
            "composite": {
                "sources": [
                    {
                        "distinct_ids": {
                            "terms": {
                                "field": "product_id"
                            }
                        }
                    }
                ],
                "size": 10000
            },
            "aggs": {
                "last_timestamp": {
                    "top_hits": {
                        "sort": {
                            "timestamp": {
                                "order": "desc"
                            }
                        },
                        "size": 1
                    }
                }
            }
        }
    }
}

Now I want to sort the resulting buckets by an arbitrary field. If I want to sort by price, I can use the solution in this question: add a max aggregation that extracts the product_price field from each bucket, then a bucket_sort aggregation at the end that sorts on the results of max:

{
    "query": {
        "bool": {
            "filter": [
                {
                    "range": {
                        "timestamp": {
                            "gte": "2019-10-25T00:00:00Z",
                            "lte": "2019-10-26T00:00:00Z"
                        }
                    }
                }
            ]
        }
    },
    "aggs": {
        "distinct_products": {
            "composite": {
                "sources": [
                    {
                        "distinct_ids": {
                            "terms": {
                                "field": "product_id"
                            }
                        }
                    }
                ],
                "size": 10000
            },
            "aggs": {
                "last_timestamp": {
                    "top_hits": {
                        "sort": {
                            "timestamp": {
                                "order": "desc"
                            }
                        },
                        "size": 1,
                        "_source": {
                            "excludes": []
                        }
                    }
                },
                "latest_sell": {
                    "max": {
                        "field": "product_price"
                    }
                },
                "latest_sell_secondary": {
                    "max": {
                        "field": "timestamp"
                    }
                },
                "sort_sells": {
                    "bucket_sort": {
                        "sort": {
                            "latest_sell": {
                                "order": "desc"
                            },
                            "latest_sell_secondary": {
                                "order": "desc"
                            }
                        },
                        "from": 0,
                        "size": 10000
                    }
                }
            }
        }
    }
}

If I want to sort alphabetically by product_name instead of product_price, I cannot use the max aggregation, since it only works on numeric fields.

What can I do to sort the last_timestamp buckets (each with only one document) by a text field?

The elasticsearch version I'm using is 7.2.0.

From the docs:

"Each bucket may be sorted based on its _key, _count or its sub-aggregations"

Instead of product_id, you can use product_name.keyword as the field of the terms aggregation and sort on the key:

"order": { "_key" : "asc" }
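A minimal sketch of how this suggestion could look in full, under a few assumptions. The mapping in the question declares product_name as text only, so a product_name.keyword sub-field is assumed to be added as a multi-field; documents indexed before the mapping change would need to be reindexed for the sub-field to be populated:

```json
{
    "mappings": {
        "properties": {
            "product_name": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword"
                    }
                }
            }
        }
    }
}
```

With that sub-field in place, the request might use a plain terms aggregation in place of the composite one. The aggregation name products_by_name is hypothetical, and the sketch assumes each product has a single, unique name, because buckets are now keyed by name rather than by product_id:

```json
{
    "query": {
        "bool": {
            "filter": [
                {
                    "range": {
                        "timestamp": {
                            "gte": "2019-10-25T00:00:00Z",
                            "lte": "2019-10-26T00:00:00Z"
                        }
                    }
                }
            ]
        }
    },
    "aggs": {
        "products_by_name": {
            "terms": {
                "field": "product_name.keyword",
                "size": 10000,
                "order": {
                    "_key": "asc"
                }
            },
            "aggs": {
                "last_timestamp": {
                    "top_hits": {
                        "sort": {
                            "timestamp": {
                                "order": "desc"
                            }
                        },
                        "size": 1
                    }
                }
            }
        }
    }
}
```

The "order": { "_key": "asc" } form belongs to the terms aggregation. If the composite aggregation is kept instead, its terms source takes a plain "order": "asc" or "order": "desc", and the buckets come back sorted by the source key.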
