How to quickly aggregate a large amount of data
I need to aggregate all the keywords that appear in news articles over a period of time, for example:

{
  "news_ID": "123456",
  "news_content": "Apple pencil",
  "keywords": [
    {
      "word": "Apple",
      "score": 0.0653220043
    },
    {
      "word": "pencil",
      "score": 0.7096893191
    }
  ],
  "publish_time": "2020-01-03"
}
I want to know how many times apple appeared between 2020-01 and 2020-02, but there are too many keywords to count by hand...
Could you suggest how I should approach this requirement according to best practices?
Indexing a sample document:
PUT tester/_doc/1
{
  "news_ID": "123456",
  "news_content": "Apple pencil",
  "keywords": [
    "apple",
    "pencil"
  ],
  "publish_time": "2020-01-03"
}
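For the aggregation counts below to be meaningful, you could index one or two more documents of the same shape (the ID and contents here are invented for illustration):

PUT tester/_doc/2
{
  "news_ID": "123457",
  "news_content": "Apple iPad",
  "keywords": [
    "apple",
    "ipad"
  ],
  "publish_time": "2020-01-15"
}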
Use a terms aggregation with a range filter at the top level:
GET tester/_search
{
  "size": 0,
  "query": {
    "range": {
      "publish_time": {
        "gte": "2020-01-01",
        "lt": "2020-02-01"
      }
    }
  },
  "aggs": {
    "by_keywords": {
      "terms": {
        "field": "keywords.keyword"
      }
    }
  }
}
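Since the question is specifically about how often apple occurs, the terms aggregation can also be restricted to just that value with its `include` parameter, which avoids returning buckets for every other keyword. A sketch, reusing the `tester` index from the example above:

GET tester/_search
{
  "size": 0,
  "query": {
    "range": {
      "publish_time": {
        "gte": "2020-01-01",
        "lt": "2020-02-01"
      }
    }
  },
  "aggs": {
    "apple_only": {
      "terms": {
        "field": "keywords.keyword",
        "include": ["apple"]
      }
    }
  }
}

The `doc_count` of the single returned bucket is the number of matching documents that contain the keyword in that date range.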
You can also aggregate over multiple monthly buckets using filter aggregations:
GET tester/_search
{
  "size": 0,
  "aggs": {
    "2020-01_2020-02": {
      "filter": {
        "range": {
          "publish_time": {
            "gte": "2020-01-01",
            "lt": "2020-02-01"
          }
        }
      },
      "aggs": {
        "by_keywords": {
          "terms": {
            "field": "keywords.keyword"
          }
        }
      }
    },
    "2020-02_2020-03": {
      "filter": {
        "range": {
          "publish_time": {
            "gte": "2020-02-01",
            "lt": "2020-03-01"
          }
        }
      },
      "aggs": {
        "by_keywords": {
          "terms": {
            "field": "keywords.keyword"
          }
        }
      }
    }
  }
}
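If you need many consecutive months, hand-writing one filter aggregation per month gets verbose. An alternative (not from the original answer) is a date_histogram aggregation with a monthly interval, which creates the month buckets automatically; `calendar_interval` is the Elasticsearch 7.x parameter name, while older versions use `interval`:

GET tester/_search
{
  "size": 0,
  "aggs": {
    "per_month": {
      "date_histogram": {
        "field": "publish_time",
        "calendar_interval": "month"
      },
      "aggs": {
        "by_keywords": {
          "terms": {
            "field": "keywords.keyword"
          }
        }
      }
    }
  }
}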