How to quickly aggregate a large amount of data
I need to aggregate all the keywords in the news for a period of time, for example:
{
"news_ID":"123456",
"news_content":"Apple pencil",
"keywords": {
[
{
"word" : "Apple",
"score" : 0.0653220043
},
{
"word" : "pencil",
"score" : 0.7096893191
}
]
},
"publish_time":"2020-01-03"
}
I want to know how many times "apple" appeared between 2020-01 and 2020-02, but there are too many keywords...
Could you please advise me on how I should approach this requirement as per best practices?
Indexing a sample doc:
PUT tester/_doc/1
{
"news_ID":"123456",
"news_content":"Apple pencil",
"keywords":[
"apple",
"pencil"
],
"publish_time":"2020-01-03"
}
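Note that this sample flattens keywords down to an array of plain strings. If you index it without an explicit mapping, Elasticsearch's default dynamic mapping creates keywords as a text field with a .keyword sub-field, which is what the aggregations below rely on. If you prefer to be explicit, you could create the index with a mapping up front (a sketch; the index name tester matches the sample):

PUT tester
{
  "mappings": {
    "properties": {
      "news_ID":      { "type": "keyword" },
      "news_content": { "type": "text" },
      "keywords":     { "type": "keyword" },
      "publish_time": { "type": "date" }
    }
  }
}

With this explicit mapping the terms aggregations would target keywords directly; the queries below assume the default dynamic mapping and therefore use keywords.keyword.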
Using a terms aggregation with a range filter at the top level:
GET tester/_search
{
"size": 0,
"query": {
"range": {
"publish_time": {
"gte": "2020-01-01",
"lt": "2020-02-01"
}
}
},
"aggs": {
"by_keywords": {
"terms": {
"field": "keywords.keyword"
}
}
}
}
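If you only care about a handful of specific words such as "apple", the terms aggregation also accepts an include parameter to restrict which buckets are returned, so you don't have to scan past all the other keywords:

GET tester/_search
{
  "size": 0,
  "query": {
    "range": {
      "publish_time": {
        "gte": "2020-01-01",
        "lt": "2020-02-01"
      }
    }
  },
  "aggs": {
    "by_keywords": {
      "terms": {
        "field": "keywords.keyword",
        "include": ["apple"]
      }
    }
  }
}

include takes either an array of exact values, as here, or a regular expression string.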
You can also use filter aggregations to aggregate over multiple monthly buckets:
GET tester/_search
{
"size": 0,
"aggs": {
"2020-01_2020-02": {
"filter": {
"range": {
"publish_time": {
"gte": "2020-01-01",
"lt": "2020-02-01"
}
}
},
"aggs": {
"by_keywords": {
"terms": {
"field": "keywords.keyword"
}
}
}
},
"2020-02_2020-03": {
"filter": {
"range": {
"publish_time": {
"gte": "2020-02-01",
"lt": "2020-03-01"
}
}
},
"aggs": {
"by_keywords": {
"terms": {
"field": "keywords.keyword"
}
}
}
}
}
}
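Instead of spelling out one filter aggregation per month, a date_histogram with a terms sub-aggregation produces the same monthly breakdown with less repetition (a sketch; calendar_interval is the Elasticsearch 7.x parameter name, older versions use interval):

GET tester/_search
{
  "size": 0,
  "aggs": {
    "per_month": {
      "date_histogram": {
        "field": "publish_time",
        "calendar_interval": "month"
      },
      "aggs": {
        "by_keywords": {
          "terms": {
            "field": "keywords.keyword"
          }
        }
      }
    }
  }
}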