
How to use aggregation in MongoDB with Java to find how many times a field value occurs?

I have a MongoDB collection, "text_failed", that stores every number to which I failed to send an SMS, the time of the failure, and some other information.

A document in this collection looks like this:

{
    _id(ObjectId): xxxxxx2af8....
    failTime(String): 2015-05-15 01:15:48
    telNum(String): 95634xxxxx
    //some other information    
}

I need to fetch the 500 numbers that failed most often over a one-month period. A number can occur any number of times during this month (e.g. one number failed 143 times, another 46, and so on).

The problem is that during this period the number of failures exceeded 7M. It is difficult to process this much data with the following code, which does not use aggregation:

    DBCollection collection = mongoDB.getCollection("text_failed");
    BasicDBObject query = new BasicDBObject();
    query.put("failTime", new BasicDBObject("$gt", "2015-05-15 00:00:00").append("$lt", "2015-06-15 00:00:00"));
    BasicDBObject field = new BasicDBObject();
    field.put("telNum", 1);

    DBCursor cursor = collection.find(query, field);
    // Count failures per number on the client side
    HashMap<String, Integer> hm = new HashMap<String, Integer>();

    while (cursor.hasNext()) {
        DBObject object = cursor.next();
        String telNum = object.get("telNum").toString();

        // merge() adds 1 to an existing count or starts a new one at 1
        hm.merge(telNum, 1, Integer::sum);
    }

This fetches 7M+ documents. I need only the top 500 numbers. The result should look something like this:

{
    telNum: xxxxx54654 //the number which failed
    count: 129 //number of times it failed    
}

I tried aggregation myself but did not get the desired results. Can this be accomplished with aggregation? Or is there a more efficient way to do this?

You could try the following aggregation pipeline:

db.getCollection("text_failed").aggregate([
    {
        "$match": {
            "failTime": { "$gt": "2015-05-01 00:00:00", "$lt": "2015-06-01 00:00:00" }
        }
    },
    {
        "$group": {
            "_id": "$telNum",
            "count": { "$sum": 1 }                
        }
    },
    {
        "$sort": { "count": -1 }
    },
    {
        "$limit": 500
    }
])
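
To make clear what each stage of the pipeline above computes, here is a minimal in-memory sketch in plain Java. It does not talk to MongoDB: the `TopFailedNumbers` class, the `Failure` class, the `topFailed` method and the sample rows are all hypothetical and exist only for illustration. With the legacy Java driver used in the question, the same four stages would instead be built as `BasicDBObject` instances and passed to `DBCollection.aggregate`, so that the counting happens on the server; that call is not shown here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TopFailedNumbers {

    /** Minimal stand-in for a text_failed document (hypothetical, not a driver class). */
    static final class Failure {
        final String failTime;
        final String telNum;

        Failure(String failTime, String telNum) {
            this.failTime = failTime;
            this.telNum = telNum;
        }
    }

    /** Mirrors $match, $group, $sort and $limit on an in-memory list. */
    static Map<String, Long> topFailed(List<Failure> failures, String from, String to, int limit) {
        return failures.stream()
                // $match: "yyyy-MM-dd HH:mm:ss" strings compare lexicographically in time order
                .filter(f -> f.failTime.compareTo(from) > 0 && f.failTime.compareTo(to) < 0)
                // $group: { _id: "$telNum", count: { $sum: 1 } }
                .collect(Collectors.groupingBy(f -> f.telNum, Collectors.counting()))
                .entrySet().stream()
                // $sort: { count: -1 }
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                // $limit
                .limit(limit)
                // LinkedHashMap preserves the sorted order of the entries
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        // Made-up sample rows; the numbers are the masked ones from the question
        List<Failure> sample = Arrays.asList(
                new Failure("2015-05-15 01:15:48", "95634xxxxx"),
                new Failure("2015-05-16 09:00:00", "95634xxxxx"),
                new Failure("2015-05-20 12:30:00", "xxxxx54654"),
                new Failure("2015-07-01 00:00:01", "xxxxx54654")); // outside the range
        // prints {95634xxxxx=2, xxxxx54654=1}
        System.out.println(topFailed(sample, "2015-05-01 00:00:00", "2015-06-01 00:00:00", 500));
    }
}
```

The point of running the real pipeline on the server is that only the 500 grouped results cross the network, rather than the 7M+ matching documents that the client-side loop in the question has to fetch.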
