

How to get the occurrence count of a specific field value in Elasticsearch from 650 M records

I have indexed Twitter data in ES. There are 110 M unique Twitter user profiles and 650 M tweets. They are stored in separate indices: profiles in (index: twitter-profiles, type: profiles) and tweets in (index: twitter-tweets, type: tweets).

Every tweet has the user_id_str of its profile attached.

I am having trouble getting the occurrence count for a specific user. I tried Facet/terms and Aggregation/Terms, but both throw a PartialShardFailureException because there is too much data to compute over. I used the following query:

{
  "aggs" : {
    "userCount" : {
      "terms" : { "field" : "user_id_str" }
    }
  }
}

Then I tried another approach.

The second method I used was Scan. Here I get the profile IDs from the profiles type and then search for each one in the tweets type. This gives me results, but each single result takes about 2 seconds to come back. With 110 M users, that means I would have to wait for days.

Please suggest any reasonable solution for this situation.

You can use a cardinality aggregation combined with a terms filter.
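A minimal sketch of what such a request body might look like, written in Python so that it can be built and inspected without a running cluster. The helper name `build_user_count_query`, the aggregation name `distinctUsers`, and the sample ID `"12345"` are all hypothetical, and wrapping the filter in `constant_score` is an assumption chosen because that query form should exist across Elasticsearch versions. Only the field `user_id_str` and the index name `twitter-tweets` come from the question.

```python
import json


def build_user_count_query(user_ids, field="user_id_str"):
    """Build a search body restricted to the given user IDs.

    Hypothetical helper, not taken from the original post.
    """
    ids = [str(u) for u in user_ids]
    if not ids:
        raise ValueError("user_ids must not be empty")
    return {
        # Return no documents, only the hit total and the aggregation.
        "size": 0,
        "query": {
            "constant_score": {
                # Terms filter: keep only tweets from the listed users.
                "filter": {"terms": {field: ids}}
            }
        },
        "aggs": {
            # Cardinality aggregation: approximate number of distinct
            # values of the field among the matching tweets.
            "distinctUsers": {"cardinality": {"field": field}}
        },
    }


if __name__ == "__main__":
    # "12345" is a placeholder user ID, not real data.
    print(json.dumps(build_user_count_query(["12345"]), indent=2))
```

The printed body would be sent to the `_search` endpoint of the `twitter-tweets` index. When a single user ID is passed, the hit total in the response should be the number of tweets carrying that `user_id_str`, which is the occurrence count the question asks for. Note that the cardinality result is an approximate distinct count, not a per-user tweet count.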
