
MongoDB get total count as result using aggregate is very slow

I am using MongoDB 3.2.0 with an aggregate query to get the total number of distinct "userId" values for a given "itemId". My collection has more than 20 million documents. Documents in my collection look like this:

{
    itemId : ObjectId('59c0a50f6ca8a1545bf1d206'),
    regionId : ObjectId('59c11af56ca8a1545bb32665'),
    userId : ObjectId('59c3cd626ca8a12e70866b0c')
  },
  {
    itemId : ObjectId('59c0a50f6ca8a1545bf1d206'),
    regionId : ObjectId('59c11af56ca8a1545bb32665'),
    userId : ObjectId('59c3cd626ca8a12e70865678')
  } 

From this, using "itemId" as my selector, I am computing the total number of distinct "userId" values in the collection. I have the following indexes on the collection:

db.items.createIndex({"itemId" : 1})
db.items.createIndex({"userId" : 1})
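With only the two single-field indexes, the server must fetch each matching document to read its userId. A compound index on both fields (a suggestion to benchmark, not something from the original setup) can let the $match on itemId and the first $group on userId be served from index entries alone:

```javascript
// Hypothetical compound index: the $match on itemId and the $group on
// userId can then read index keys instead of fetching full documents.
db.items.createIndex({ itemId: 1, userId: 1 })
```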

My aggregate query is:

db.items.aggregate([
    { $match: { itemId: { $in: [ ObjectId('59c0a50f6ca8a1545bf1d206'),  ObjectId('59c0a50f6ca8a1545bf1d207')] } } },
    { $group: { _id: "$userId"}},
    { $group: { _id: null, count : {$sum : 1}}}
    ])
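To make the pipeline's intent explicit: the first $group deduplicates by userId, and the second $group counts the surviving groups. The same logic in plain JavaScript (a sketch for illustration, using a Set over an in-memory array) is:

```javascript
// Plain-JS analogue of the two $group stages:
// stage 1 dedupes by userId, stage 2 counts the distinct values.
function countDistinctUserIds(docs) {
  const seen = new Set();
  for (const doc of docs) {
    seen.add(String(doc.userId)); // stringify so equal ObjectIds compare equal
  }
  return seen.size;
}
```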

I have also set "allowDiskUse" to true.

The query takes more than 20 seconds to return the result. Is there any other way I can improve the execution speed?

I am executing this via the Node.js native MongoDB driver. Using the distinct query fails with an "exceeding 16 MB limit" error, so I preferred to go with the aggregate query.

There are about 600,000 unique userId values (as ObjectId) in the result. The total number of documents in the collection is 8,397,727.

You can try this to get the distinct userId values, filtered by itemId:

db.collectionName.distinct('userId', 
  {itemId: {$in: [ObjectId('59c0a50f6ca8a1545bf1d206'), ObjectId('59c0a50f6ca8a1545bf1d207')]}}
).length
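From the Node.js native driver, the same distinct call could be sketched as follows (the connection URL, database name, and the driver 3.x promise API are assumptions, not from the original post):

```javascript
// Sketch using the Node.js native MongoDB driver (3.x API assumed);
// the URL, database name, and 'items' collection are placeholders.
const { MongoClient, ObjectId } = require('mongodb');

async function countDistinctUsers() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  try {
    const userIds = await client.db('mydb').collection('items').distinct('userId', {
      itemId: { $in: [
        ObjectId('59c0a50f6ca8a1545bf1d206'),
        ObjectId('59c0a50f6ca8a1545bf1d207')
      ]}
    });
    return userIds.length; // number of distinct userIds
  } finally {
    await client.close();
  }
}
```

Note that distinct still returns the full array of values in a single reply, so the 16 MB limit mentioned in the question can still be hit for very large result sets.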

