
How to quickly aggregate a large amount of data

I need to aggregate all the keywords that appear in news items over a period of time, for example:

{
  "news_ID":"123456",
  "news_content":"Apple pencil",
  "keywords": [
    {
      "word" : "Apple",
      "score" : 0.0653220043
    },
    {
      "word" : "pencil",
      "score" : 0.7096893191
    }
  ],
  "publish_time":"2020-01-03"
}

I want to know how many times "apple" appears between 2020-01 and 2020-02, but there are too many keywords...

Could you advise on how I should handle this requirement according to best practices?

Index a sample document:

PUT tester/_doc/1
{
  "news_ID":"123456",
  "news_content":"Apple pencil",
  "keywords":[
    "apple",
    "pencil"
  ],
  "publish_time":"2020-01-03"
}
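The sample document above stores the keywords as plain lowercase strings, while the document in the question stores objects with a word and a score. A minimal Python sketch of that conversion follows. It assumes the scores can be dropped for counting purposes, and the helper name `flatten_keywords` is hypothetical, not part of any Elasticsearch client.

```python
def flatten_keywords(doc):
    """Return a copy of doc whose "keywords" is a list of lowercase strings.

    Assumption: the per-keyword "score" is not needed for counting,
    so it is dropped. The input document is left unchanged.
    """
    flat = dict(doc)
    flat["keywords"] = [entry["word"].lower() for entry in doc["keywords"]]
    return flat


# The document from the question, with "keywords" as a list of objects.
original = {
    "news_ID": "123456",
    "news_content": "Apple pencil",
    "keywords": [
        {"word": "Apple", "score": 0.0653220043},
        {"word": "pencil", "score": 0.7096893191},
    ],
    "publish_time": "2020-01-03",
}

flat_doc = flatten_keywords(original)
```

Lowercasing at index time matters here because the aggregations in this answer run on `keywords.keyword`, which is matched exactly, so "Apple" and "apple" would otherwise land in separate buckets.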

Use a terms aggregation with a range filter at the top level:

GET tester/_search
{
  "size": 0,
  "query": {
    "range": {
      "publish_time": {
        "gte": "2020-01-01",
        "lt": "2020-02-01"
      }
    }
  },
  "aggs": {
    "by_keywords": {
      "terms": {
        "field": "keywords.keyword"
      }
    }
  }
}
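A terms aggregation returns only the top buckets (10 by default), so a rare keyword may not show up in the response. If only one keyword such as "apple" matters, one possible alternative is to filter on that keyword and read the hit count instead. The sketch below only builds the request body as a Python dict; the function name `keyword_count_query` is hypothetical, and the body has not been run against a cluster here.

```python
def keyword_count_query(keyword, start, end):
    """Build a search body that counts documents tagged with one keyword.

    Assumptions: keywords are indexed as in the sample document above
    (lowercase strings, queried through the keywords.keyword sub-field)
    and publish_time is mapped as a date.
    """
    return {
        "size": 0,                 # no hits needed, only the total
        "track_total_hits": True,  # ask for an exact total
        "query": {
            "bool": {
                "filter": [
                    {"term": {"keywords.keyword": keyword}},
                    {"range": {"publish_time": {"gte": start, "lt": end}}},
                ]
            }
        },
    }


body = keyword_count_query("apple", "2020-01-01", "2020-02-01")
```

Sent to `tester/_search`, this body should report the number of matching documents under `hits.total.value`. Like the `doc_count` of a terms bucket, that is a count of documents carrying the keyword, not a count of occurrences inside the text.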

You can also use filter aggregations to aggregate into multiple monthly buckets:

GET tester/_search
{
  "size": 0,
  "aggs": {
    "2020-01_2020-02": {
      "filter": {
        "range": {
          "publish_time": {
            "gte": "2020-01-01",
            "lt": "2020-02-01"
          }
        }
      },
      "aggs": {
        "by_keywords": {
          "terms": {
            "field": "keywords.keyword"
          }
        }
      }
    },
    "2020-02_2020-03": {
      "filter": {
        "range": {
          "publish_time": {
            "gte": "2020-02-01",
            "lt": "2020-03-01"
          }
        }
      },
      "aggs": {
        "by_keywords": {
          "terms": {
            "field": "keywords.keyword"
          }
        }
      }
    }
  }
}
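Writing one filter aggregation per month by hand gets repetitive for longer periods. The following Python sketch generates the same structure for any number of consecutive months. The helper names `month_start` and `monthly_keyword_aggs` are hypothetical, and the output is only the request body, to be sent with whichever client you use.

```python
from datetime import date


def month_start(year, month, offset):
    """Return the first day of the month `offset` months after year-month."""
    index = year * 12 + (month - 1) + offset
    return date(index // 12, index % 12 + 1, 1)


def monthly_keyword_aggs(year, month, n_months, field="keywords.keyword"):
    """Build one filter aggregation per month, each with a terms sub-aggregation."""
    aggs = {}
    for i in range(n_months):
        start = month_start(year, month, i)
        end = month_start(year, month, i + 1)
        name = f"{start:%Y-%m}_{end:%Y-%m}"  # e.g. "2020-01_2020-02"
        aggs[name] = {
            "filter": {
                "range": {
                    "publish_time": {
                        "gte": start.isoformat(),
                        "lt": end.isoformat(),  # exclusive upper bound
                    }
                }
            },
            "aggs": {"by_keywords": {"terms": {"field": field}}},
        }
    return aggs


# Two monthly buckets starting in January 2020, as in the request above.
body = {"size": 0, "aggs": monthly_keyword_aggs(2020, 1, 2)}
```

For many contiguous months, a `date_histogram` aggregation on `publish_time` with a terms sub-aggregation may be a simpler fit, since it creates the monthly buckets itself.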

