
MongoDB Java retrieval becomes very slow

I am running the Java program below:

MongoDBManager db = new MongoDBManager(dbName, "FreqUserLog");
List<Object> distinctUIDs = db.getDistinct("uid");

int userNum = 0;
LinkedList<DBObject> samples = new LinkedList<DBObject>();
for( Object uid_obj : distinctUIDs ) {
    System.out.format( "user %d%n", ++userNum );

    BasicDBObject filter = new BasicDBObject();
    filter.put( "uid", String.valueOf(uid_obj) );
    DBCursor cursor = db.findAll(filter);

    /////////////////////////////////////// 
    while( cursor.hasNext() ) {
        DBObject userlog = cursor.next();

        // do nothing temporarily   
    }
    ///////////////////////////////////////
}

BACKGROUND: The program first gets the distinct user IDs and then retrieves all logs for each user. The database contains 47,000 users. I set the JVM option "-Xms20480m".

PROBLEM: The program runs very fast initially (5 s for 1000 users), but after processing about 1000 users it becomes slow (1 s for 5 users). Sometimes the slowdown starts at 1300 or 1900 users instead. It seems that it will take one day to process all user logs. I also wrote the same program in Python with PyMongo and hit the same problem.

I also tried commenting out the block between the "/////////////////" lines, and the program finished very quickly. Each cursor returns about 200 logs. I do not know what the problem is.

EDIT: I have indexes on "uid" and "url". The log structure looks like this:

{"_id": *****
 "url": *****
 "Geo": *****
 "Log count": 3
 "Log0": {
     "event": *****
     "eventcode": *****
     "time": *****
     "ip": *****
 }
 "Log1": {
     "event": *****
     "eventcode": *****
     "time": *****
     "ip": *****
 }
 "Log3": {
     "event": *****
     "eventcode": *****
     "time": *****
     "ip": *****
 }
}

Is there any reason you're doing a new query for each distinct uid and not using a $in? You're likely doing thousands of queries to get all your data back out.
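
A minimal sketch of the batching this comment suggests, assuming the uid values are stored as strings (as the String.valueOf call in the question implies). UidBatcher, partition, inFilter and the batch size of 500 are hypothetical names and values chosen for illustration, not anything from the question's MongoDBManager. The filter is built as a plain Map so the sketch runs without the driver on the classpath; with the legacy Java driver it could be wrapped with new BasicDBObject(map) before being passed to something like db.findAll. This cuts the number of round trips, but the thread does not establish whether it removes the slowdown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups uids into batches and builds one
// {"uid": {"$in": [...]}} filter per batch instead of one query per uid.
class UidBatcher {

    // Split the distinct uids into consecutive batches of at most batchSize,
    // converting each value to a String as the original loop does.
    static List<List<String>> partition(List<?> uids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>(batchSize);
        for (Object uid : uids) {
            current.add(String.valueOf(uid));
            if (current.size() == batchSize) {
                batches.add(current);
                current = new ArrayList<>(batchSize);
            }
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    // Build the filter document {"uid": {"$in": batch}} as nested maps.
    static Map<String, Object> inFilter(List<String> batch) {
        Map<String, Object> in = new LinkedHashMap<>();
        in.put("$in", batch);
        Map<String, Object> filter = new LinkedHashMap<>();
        filter.put("uid", in);
        return filter;
    }

    public static void main(String[] args) {
        // Stand-in for db.getDistinct("uid"): 47,000 made-up user IDs.
        List<Object> distinctUIDs = new ArrayList<>();
        for (int i = 0; i < 47000; i++) {
            distinctUIDs.add("user" + i);
        }
        List<List<String>> batches = partition(distinctUIDs, 500);
        System.out.println("queries needed: " + batches.size());
        // Each filter would be sent as one query; the results then need to be
        // grouped by uid on the client if per-user processing is required.
        Map<String, Object> first = inFilter(batches.get(0));
        System.out.println("first filter keys: " + first.keySet());
    }
}
```

Because one cursor now returns logs for many users, the per-user loop body has to group documents by their "uid" field itself, for example in a map keyed by uid.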
