
How to use aggregation in MongoDB with Java to count how many times a field value occurs?

I have a collection in MongoDB, "text_failed", which stores every number to which I failed to send an SMS, the time of the failure, and some other information.

A document in this collection looks like this:

{
    _id(ObjectId): xxxxxx2af8....
    failTime(String): 2015-05-15 01:15:48
    telNum(String): 95634xxxxx
    //some other information    
}

I need to fetch the top 500 numbers that failed most often over a one-month period. A number can occur any number of times during this month (e.g. one number failed 143 times, another 46 times, and so on).

The problem is that more than 7M failures were recorded during this period. It's difficult to process this much data with the following code, which doesn't use aggregation:

    DBCollection collection = mongoDB.getCollection("text_failed");
    BasicDBObject query = new BasicDBObject();
    query.put("failTime", new BasicDBObject("$gt", "2015-05-15 00:00:00").append("$lt", "2015-06-15 00:00:00"));
    BasicDBObject field = new BasicDBObject();
    field.put("telNum", 1);

    DBCursor cursor = collection.find(query, field);
    HashMap<String, Integer> hm = new HashMap<String, Integer>();

    // Count the failures per number on the client side
    while (cursor.hasNext()) {
        DBObject object = cursor.next();
        String telNum = object.get("telNum").toString();

        if (hm.containsKey(telNum)) {
            hm.put(telNum, hm.get(telNum) + 1);
        } else {
            hm.put(telNum, 1);
        }
    }

This fetches 7M+ documents for me. I need only the top 500 numbers. The result should look something like this:

{
    telNum: xxxxx54654 //the number which failed
    count: 129 //number of times it failed    
}

I tried aggregation myself but didn't get the desired results. Can this be accomplished with aggregation, or is there another, more efficient way to do it?

You could try the following aggregation pipeline:

db.getCollection("text_failed").aggregate([
    {
        "$match": {
            "failTime": { "$gt": "2015-05-01 00:00:00", "$lt": "2015-06-01 00:00:00" }
        }
    },
    {
        "$group": {
            "_id": "$telNum",
            "count": { "$sum": 1 }                
        }
    },
    {
        "$sort": { "count": -1 }
    },
    {
        "$limit": 500
    }
])
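
Since the question asks for Java, here is a minimal sketch of the same pipeline with the legacy `DBCollection` API used in the question. It assumes a 2.12+ or 3.x legacy driver (where `aggregate(List, AggregationOptions)` returns a `Cursor`) and a 2.6+ server. The class name `TopFailedNumbers` and the methods `buildPipeline` and `topFailed` are hypothetical names chosen for illustration. `allowDiskUse(true)` is an assumption on my part: with 7M+ documents the `$group` stage may exceed the in-memory limit, so it is enabled as a precaution.

```java
import com.mongodb.AggregationOptions;
import com.mongodb.BasicDBObject;
import com.mongodb.Cursor;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;

import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopFailedNumbers {

    // Builds the $match / $group / $sort / $limit stages of the shell pipeline above.
    static List<DBObject> buildPipeline(String from, String to, int limit) {
        DBObject match = new BasicDBObject("$match",
                new BasicDBObject("failTime",
                        new BasicDBObject("$gt", from).append("$lt", to)));
        DBObject group = new BasicDBObject("$group",
                new BasicDBObject("_id", "$telNum")
                        .append("count", new BasicDBObject("$sum", 1)));
        DBObject sort = new BasicDBObject("$sort", new BasicDBObject("count", -1));
        DBObject limitStage = new BasicDBObject("$limit", limit);
        return Arrays.asList(match, group, sort, limitStage);
    }

    // Runs the pipeline on the server and returns telNum -> count, highest count first.
    static Map<String, Integer> topFailed(DBCollection collection,
                                          String from, String to, int limit) {
        // Assumption: let the server spill to disk if $group outgrows its memory limit.
        AggregationOptions options = AggregationOptions.builder()
                .allowDiskUse(true)
                .build();

        // LinkedHashMap keeps the order in which the server returns the rows.
        Map<String, Integer> result = new LinkedHashMap<String, Integer>();
        Cursor cursor = collection.aggregate(buildPipeline(from, to, limit), options);
        try {
            while (cursor.hasNext()) {
                DBObject row = cursor.next();
                // After $group the phone number is in "_id", not "telNum".
                result.put(String.valueOf(row.get("_id")),
                        ((Number) row.get("count")).intValue());
            }
        } finally {
            cursor.close();
        }
        return result;
    }
}
```

It could be called as `TopFailedNumbers.topFailed(mongoDB.getCollection("text_failed"), "2015-05-15 00:00:00", "2015-06-15 00:00:00", 500)`. Only the 500 grouped rows travel to the client instead of 7M+ documents. An index on `failTime` would probably help the `$match` stage, though that depends on your data.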

