I am running the Java program below:
    MongoDBManager db = new MongoDBManager(dbName, "FreqUserLog");
    List<Object> distinctUIDs = db.getDistinct("uid");
    int userNum = 0;
    LinkedList<DBObject> samples = new LinkedList<DBObject>();
    for (Object uid_obj : distinctUIDs) {
        System.out.format("user %d%n", ++userNum);
        BasicDBObject filter = new BasicDBObject();
        filter.put("uid", String.valueOf(uid_obj));
        DBCursor cursor = db.findAll(filter);
        ///////////////////////////////////////
        while (cursor.hasNext()) {
            DBObject userlog = cursor.next();
            // do nothing temporarily
        }
        ///////////////////////////////////////
    }
BACKGROUND: The program first gets the distinct user IDs and then retrieves all logs for each user. The MongoDB collection contains 47,000 users. I set the JVM option "-Xms20480m".
PROBLEM: The program runs very fast initially (5s for 1000 users), but after processing about 1000 users it becomes slow (1s for 5 users). The point at which it slows down varies; sometimes it is 1300 or 1900 users. At that rate it would take about a day to process all user logs. I also wrote the same program in Python with PyMongo and ran into the same problem.
I also tried commenting out the block between the "/////////////////" lines. With that change the program finished very quickly. Each cursor returns about 200 logs. I do not know what the problem is.
EDIT: I have indexes on "uid" and "url". The log structure looks like this:
    {"_id": *****
     "url": *****
     "Geo": *****
     "Log count": 3
     "Log0": {
         "event": *****
         "eventcode": *****
         "time": *****
         "ip": *****
     }
     "Log1": {
         "event": *****
         "eventcode": *****
         "time": *****
         "ip": *****
     }
     "Log3": {
         "event": *****
         "eventcode": *****
         "time": *****
         "ip": *****
     }
    }
Is there any reason you're running a new query for each distinct uid instead of using $in? You're likely issuing thousands of queries to get all your data back out.
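To make the $in suggestion concrete, here is a minimal sketch of how the uid list could be split into batches, with one filter of the form { "uid": { "$in": [...] } } built per batch. It uses plain java.util maps so that it runs without the driver. The class name UidBatchFilters, the helper names partition and inFilter, and the batch size are made up for illustration and are not part of the original program. Whether this removes the slowdown is not established here; it only reduces the number of round trips.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original program.
public class UidBatchFilters {

    /** Splits the uid list into consecutive batches of at most batchSize elements. */
    static List<List<String>> partition(List<String> uids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < uids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, uids.size());
            // Copy the sublist so each batch is independent of the source list.
            batches.add(new ArrayList<>(uids.subList(start, end)));
        }
        return batches;
    }

    /** Builds the filter { "uid": { "$in": [ ...batch... ] } } as nested maps. */
    static Map<String, Object> inFilter(List<String> batch) {
        Map<String, Object> in = new LinkedHashMap<>();
        in.put("$in", batch);
        Map<String, Object> filter = new LinkedHashMap<>();
        filter.put("uid", in);
        return filter;
    }

    public static void main(String[] args) {
        List<String> uids = new ArrayList<>();
        for (int i = 1; i <= 7; i++) {
            uids.add("u" + i);
        }
        // 7 uids in batches of 3 give 3 filters, so 3 queries instead of 7.
        for (List<String> batch : partition(uids, 3)) {
            System.out.println(inFilter(batch));
        }
    }
}
```

With the legacy Java driver used in the question, each map could presumably be wrapped as new BasicDBObject(inFilter(batch)) and passed to the asker's own db.findAll(filter) wrapper, whose exact signature is not shown. Documents for different users then come back in the same cursor, so they would need to be grouped by "uid" on the client side.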