
How to quickly aggregate a large amount of data

I need to aggregate all the keywords that appear in news articles over a period of time. A document looks like this:

{
  "news_ID":"123456",
  "news_content":"Apple pencil",
  "keywords": [
    {
      "word" : "Apple",
      "score" : 0.0653220043
    },
    {
      "word" : "pencil",
      "score" : 0.7096893191
    }
  ],
  "publish_time":"2020-01-03"
}

I want to know how many times "apple" appeared between 2020-01 and 2020-02, but there are too many keywords.

Could you advise me on how to approach this requirement according to best practices?

Indexing a sample doc, with the keywords flattened into an array of lowercased strings:

PUT tester/_doc/1
{
  "news_ID":"123456",
  "news_content":"Apple pencil",
  "keywords":[
    "apple",
    "pencil"
  ],
  "publish_time":"2020-01-03"
}
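
The queries in this answer aggregate on keywords.keyword, a sub-field that exists only if Elasticsearch's default dynamic mapping created it for the string array. A minimal sketch of an explicit mapping that provides the same sub-field, assuming Elasticsearch 7.x or later (typeless mappings) and that the index is created before the first document is indexed. Mapping news_ID as keyword is my own assumption, not something the original states:

```
PUT tester
{
  "mappings": {
    "properties": {
      "news_ID": { "type": "keyword" },
      "news_content": { "type": "text" },
      "keywords": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 }
        }
      },
      "publish_time": { "type": "date" }
    }
  }
}
```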

Using a terms aggregation with a range query at the top level:

GET tester/_search
{
  "size": 0,
  "query": {
    "range": {
      "publish_time": {
        "gte": "2020-01-01",
        "lt": "2020-02-01"
      }
    }
  },
  "aggs": {
    "by_keywords": {
      "terms": {
        "field": "keywords.keyword"
      }
    }
  }
}
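
A terms aggregation returns only the top 10 buckets by default, so with many distinct keywords a specific one such as apple may not show up in the response. If only one keyword matters, a sketch using the _count API with a bool filter avoids building buckets at all. It assumes the same keywords.keyword sub-field and the lowercased keyword values used in the sample doc:

```
GET tester/_count
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "keywords.keyword": "apple" } },
        {
          "range": {
            "publish_time": { "gte": "2020-01-01", "lt": "2020-02-01" }
          }
        }
      ]
    }
  }
}
```

The count field in the response is the number of matching documents, which is what doc_count measures in a terms bucket. If several keywords are needed, raising the size parameter of the terms aggregation is the other option.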

You can also use filter aggregations to get separate keyword counts for multiple monthly buckets:

GET tester/_search
{
  "size": 0,
  "aggs": {
    "2020-01_2020-02": {
      "filter": {
        "range": {
          "publish_time": {
            "gte": "2020-01-01",
            "lt": "2020-02-01"
          }
        }
      },
      "aggs": {
        "by_keywords": {
          "terms": {
            "field": "keywords.keyword"
          }
        }
      }
    },
    "2020-02_2020-03": {
      "filter": {
        "range": {
          "publish_time": {
            "gte": "2020-02-01",
            "lt": "2020-03-01"
          }
        }
      },
      "aggs": {
        "by_keywords": {
          "terms": {
            "field": "keywords.keyword"
          }
        }
      }
    }
  }
}
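
When there are many months, writing one filter aggregation per month gets repetitive. A sketch of the same idea with a date_histogram aggregation, assuming Elasticsearch 7.2 or later where calendar_interval is available (older versions use interval instead):

```
GET tester/_search
{
  "size": 0,
  "query": {
    "range": {
      "publish_time": { "gte": "2020-01-01", "lt": "2020-03-01" }
    }
  },
  "aggs": {
    "by_month": {
      "date_histogram": {
        "field": "publish_time",
        "calendar_interval": "month",
        "format": "yyyy-MM"
      },
      "aggs": {
        "by_keywords": {
          "terms": { "field": "keywords.keyword" }
        }
      }
    }
  }
}
```

Each by_month bucket then carries its own by_keywords sub-buckets, so the per-month counts come back in a single response.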
