How to use aggregation in MongoDB with Java to find how many times a field value occurs?
I have a MongoDB collection, "text_failed", that holds every number I failed to send an SMS to, the time of the failure, and some other information.
A document in this collection looks like this:
{
    _id (ObjectId): xxxxxx2af8....
    failTime (String): 2015-05-15 01:15:48
    telNum (String): 95634xxxxx
    // some other information
}
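Because failTime is stored as a String, range operators such as $gt and $lt compare it as text, not as a date. That gives the right answer here only because the format is fixed-width, zero-padded, and ordered from year down to second; a value such as 2015-5-9 would sort after 2015-10-01. The following is a minimal sketch (the class and method names are illustrative, not from the question) that builds month bounds in the same format and checks the string ordering:

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;

public class FailTimeRange {
    // Same layout as the failTime strings: fixed width, zero-padded,
    // most significant field first, so String order equals chronological order.
    static final DateTimeFormatter FAIL_TIME = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Inclusive lower bound: midnight on the first day of the month. */
    static String monthStart(YearMonth month) {
        return month.atDay(1).atStartOfDay().format(FAIL_TIME);
    }

    /** Exclusive upper bound: midnight on the first day of the next month. */
    static String nextMonthStart(YearMonth month) {
        return month.plusMonths(1).atDay(1).atStartOfDay().format(FAIL_TIME);
    }

    /** True if failTime lies in [start, end) using plain string comparison. */
    static boolean inRange(String failTime, String start, String end) {
        return failTime.compareTo(start) >= 0 && failTime.compareTo(end) < 0;
    }

    public static void main(String[] args) {
        YearMonth may = YearMonth.of(2015, 5);
        String start = monthStart(may);   // "2015-05-01 00:00:00"
        String end = nextMonthStart(may); // "2015-06-01 00:00:00"
        System.out.println(inRange("2015-05-15 01:15:48", start, end)); // true
        System.out.println(inRange("2015-06-01 00:00:00", start, end)); // false
    }
}
```

Using an inclusive start and exclusive end ([start, end)) means a failure logged at exactly midnight is counted in exactly one month.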
I need to fetch the 500 numbers that failed most often within a one-month window. A number can appear any number of times in that month (e.g. one number failed 143 times, another 46 times, and so on).
The problem is that the number of failures in this period exceeds 7 million, and processing that much data with the following code, which doesn't use aggregation, is difficult:
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import java.util.HashMap;
import java.util.Map;

// mongoDB is an existing com.mongodb.DB handle
DBCollection collection = mongoDB.getCollection("text_failed");

// failTime between the two bounds (compared as strings)
BasicDBObject query = new BasicDBObject("failTime",
        new BasicDBObject("$gt", "2015-05-15 00:00:00").append("$lt", "2015-06-15 00:00:00"));
// Project only telNum to reduce the data sent to the client
BasicDBObject field = new BasicDBObject("telNum", 1);

// Count failures per number on the client
Map<String, Integer> hm = new HashMap<>();
DBCursor cursor = collection.find(query, field);
try {
    while (cursor.hasNext()) {
        DBObject object = cursor.next();
        String telNum = object.get("telNum").toString();
        hm.merge(telNum, 1, Integer::sum); // insert 1, or add 1 to the existing count
    }
} finally {
    cursor.close();
}
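If the counting has to stay on the client, the top-500 selection can at least avoid sorting every distinct number: a min-heap capped at k entries keeps only the current best k. This is a sketch only (TopFailures and topK are illustrative names); it still streams every matching document from the server, so it does not remove the 7-million-document transfer that server-side aggregation avoids.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class TopFailures {
    /**
     * Returns the k entries with the highest counts, highest first.
     * The heap never holds more than k + 1 entries during selection.
     */
    static List<Map.Entry<String, Integer>> topK(Map<String, Integer> counts, int k) {
        // Min-heap by count: the smallest count is at the head and is evicted first.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>(Map.Entry.<String, Integer>comparingByValue());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll(); // drop the current smallest count
            }
        }
        List<Map.Entry<String, Integer>> result = new ArrayList<>(heap);
        // Heap iteration order is arbitrary, so sort the k survivors descending.
        result.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> hm = new HashMap<>();
        String[] failed = {"111", "222", "111", "333", "111", "222"};
        for (String telNum : failed) {
            hm.merge(telNum, 1, Integer::sum); // per-number increment
        }
        for (Map.Entry<String, Integer> e : topK(hm, 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Numbers with equal counts may come back in any order; add a secondary comparison on the key if a stable order matters.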
This fetches more than 7 million documents, but I only need the top 500 numbers. The result should look something like this:
{
    telNum: xxxxx54654 // the number that failed
    count: 129         // number of times it failed
}
I tried aggregation myself but didn't get the desired results. Can this be done with aggregation, or is there a more efficient way to do it?
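You could try the following aggregation pipeline (written for the mongo shell). It keeps only the failures in the month, counts failures per number, sorts by that count in descending order, and keeps the first 500, so only 500 small documents are returned to the client: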
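db.getCollection("text_failed").aggregate([
    {
        // Failures in [2015-05-01, 2015-06-01), compared as strings
        "$match": {
            "failTime": { "$gte": "2015-05-01 00:00:00", "$lt": "2015-06-01 00:00:00" }
        }
    },
    {
        // One output document per number, with its failure count
        "$group": {
            "_id": "$telNum",
            "count": { "$sum": 1 }
        }
    },
    {
        "$sort": { "count": -1 }
    },
    {
        "$limit": 500
    }
], { "allowDiskUse": true }) // $group over millions of documents may exceed the per-stage memory limit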
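Since the question asks for Java, it may help to see what each stage computes before translating it to driver calls. The sketch below is not MongoDB driver code: it reproduces the same four stages with Java streams over a hypothetical in-memory list (Failure and topFailures are illustrative names), so the semantics can be checked without a database.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PipelineSketch {
    /** Hypothetical stand-in for a text_failed document (only the two fields used). */
    static final class Failure {
        final String failTime;
        final String telNum;

        Failure(String failTime, String telNum) {
            this.failTime = failTime;
            this.telNum = telNum;
        }
    }

    /** Mirrors $match -> $group -> $sort -> $limit on an in-memory list. */
    static List<Map.Entry<String, Long>> topFailures(List<Failure> docs, String from, String to, int limit) {
        Map<String, Long> counts = docs.stream()
                // $match: from <= failTime < to (string comparison)
                .filter(d -> d.failTime.compareTo(from) >= 0 && d.failTime.compareTo(to) < 0)
                // $group: _id = telNum, count = { $sum: 1 }
                .collect(Collectors.groupingBy(d -> d.telNum, Collectors.counting()));
        return counts.entrySet().stream()
                // $sort: { count: -1 }
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                // $limit
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Failure> docs = Arrays.asList(
                new Failure("2015-05-03 10:00:00", "111"),
                new Failure("2015-05-09 11:30:00", "111"),
                new Failure("2015-05-20 08:15:00", "222"),
                new Failure("2015-06-02 09:00:00", "222")); // outside May, filtered out
        for (Map.Entry<String, Long> e : topFailures(docs, "2015-05-01 00:00:00", "2015-06-01 00:00:00", 500)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Against the real collection, the MongoDB Java driver (3.x and later) has builders for the same stages: `Aggregates.match(Filters.and(Filters.gte("failTime", from), Filters.lt("failTime", to)))`, `Aggregates.group("$telNum", Accumulators.sum("count", 1))`, `Aggregates.sort(Sorts.descending("count"))` and `Aggregates.limit(500)`, passed as a list to `MongoCollection.aggregate(...)`. Calling `allowDiskUse(true)` on the returned `AggregateIterable` may be needed for a `$group` this large, and an index on failTime lets the leading `$match` avoid a full collection scan.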