How to get the occurrence count of a specific field value in Elasticsearch across 650 M documents
I have indexed Twitter data in ES. There are 110 M unique Twitter user profiles and 650 M tweets. They live in separate indices: profiles in (index: twitter-profiles, type: profiles) and tweets in (index: twitter-tweets, type: tweets).
Every tweet carries the user_id_str of its author's profile.
I am having trouble getting the occurrence count of a specific user. I tried both Facet/terms and Aggregation/Terms, but both throw a PartialShardFailureException, apparently because there is too much data to compute over. I used the following query:
{
  "aggs": {
    "userCount": {
      "terms": { "field": "user_id_str" }
    }
  }
}
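To make clear what this query asks the cluster to do, here is a minimal in-memory sketch of a terms aggregation: one bucket per distinct user_id_str value, each with a doc_count, sorted by count and cut to `size` (10 by default in Elasticsearch). The tweet documents and the helper names `terms_agg` and `count_for_user` are hypothetical, and this is not how Elasticsearch executes the aggregation across shards. It only shows that bucketing every user (110 M distinct values here) is a far larger job than counting the tweets of one specific user.

```python
from collections import Counter

# Hypothetical sample of tweet documents; only user_id_str matters here.
tweets = [
    {"user_id_str": "1001", "text": "hello"},
    {"user_id_str": "1002", "text": "hi"},
    {"user_id_str": "1001", "text": "again"},
    {"user_id_str": "1003", "text": "hey"},
    {"user_id_str": "1001", "text": "third"},
]

def terms_agg(docs, field, size=10):
    """Mimic a terms aggregation: one bucket per distinct value, top `size` by doc_count."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return [{"key": key, "doc_count": n} for key, n in counts.most_common(size)]

def count_for_user(docs, user_id):
    """Occurrence count for a single user: only that user's documents are counted."""
    return sum(1 for doc in docs if doc.get("user_id_str") == user_id)

print(terms_agg(tweets, "user_id_str"))
print(count_for_user(tweets, "1001"))
```

The aggregation has to track a counter for every distinct key before it can pick the top `size`, while the per-user count only needs the documents that match one value.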
Then I tried another approach.
My second attempt used Scan. I fetch profile ids from the profiles type and then search for each id in the tweets type. This does return results, but each single result takes about 2 seconds. With 110 M users, one 2-second lookup per user would take far longer than days to finish.
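A back-of-envelope estimate makes the cost of this approach concrete. It assumes the observed 2 seconds is the cost per user and that lookups run one after another. The `concurrency` parameter is a hypothetical knob that assumes perfectly linear scaling across parallel requests, which real clusters rarely achieve.

```python
# Rough runtime estimate for "scan profiles, then search tweets per user".
USERS = 110_000_000        # unique profiles (from the question)
SECONDS_PER_LOOKUP = 2.0   # observed time per single result (from the question)

def estimated_runtime_seconds(users, seconds_per_lookup, concurrency=1):
    """Total time for one search per user; concurrency assumes ideal linear speedup."""
    return users * seconds_per_lookup / concurrency

total = estimated_runtime_seconds(USERS, SECONDS_PER_LOOKUP)
days = total / 86_400
years = days / 365
print(f"sequential: {total:,.0f} s = {days:,.0f} days = {years:.1f} years")

# Even 100 parallel requests (hypothetical, ideal scaling) still take weeks.
parallel_days = estimated_runtime_seconds(USERS, SECONDS_PER_LOOKUP, 100) / 86_400
print(f"100 concurrent: {parallel_days:.1f} days")
```

Sequentially this comes to about 220 M seconds, roughly 7 years, so the per-user lookup cost, not the total data size, is what makes this method impractical.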
Please suggest a reasonable solution for this situation.