I am using MongoDB 3.2.0 with an aggregate query to get the total number of distinct "userId" values for given "itemId" values. My collection has more than 20 million documents. The documents in my collection look like this:
{
itemId : ObjectId('59c0a50f6ca8a1545bf1d206'),
regionId : ObjectId('59c11af56ca8a1545bb32665'),
userId : ObjectId('59c3cd626ca8a12e70866b0c')
},
{
itemId : ObjectId('59c0a50f6ca8a1545bf1d206'),
regionId : ObjectId('59c11af56ca8a1545bb32665'),
userId : ObjectId('59c3cd626ca8a12e70865678')
}
Using "itemId" as my selector, I compute the total number of distinct "userId" values in the collection. These are the indexes I have created on the collection:
db.items.createIndex({"itemId" : 1})
db.items.createIndex({"userId" : 1})
My aggregate query is:
db.items.aggregate([
{ $match: { itemId: { $in: [ ObjectId('59c0a50f6ca8a1545bf1d206'), ObjectId('59c0a50f6ca8a1545bf1d207')] } } },
{ $group: { _id: "$userId"}},
{ $group: { _id: null, count : {$sum : 1}}}
])
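To make the semantics of the pipeline explicit, here is a hypothetical in-memory model in plain JavaScript (it is not MongoDB code and does not talk to a server): the filter plays the role of $match, the Set of userId values plays the role of the first $group, and its size plays the role of the second $group's count. ObjectIds are modelled as hex strings, and the third sample document is invented for illustration.

```javascript
// Hypothetical in-memory model of the aggregate pipeline, for illustration only.
// ObjectIds are represented as hex strings.
function countDistinctUsersInMemory(docs, itemIds) {
  const wanted = new Set(itemIds); // $match: { itemId: { $in: itemIds } }
  const userIds = new Set();       // $group: { _id: "$userId" } -> one entry per distinct userId
  for (const doc of docs) {
    if (wanted.has(doc.itemId)) {
      userIds.add(doc.userId);
    }
  }
  return userIds.size;             // $group: { _id: null, count: { $sum: 1 } }
}

const sampleDocs = [
  { itemId: '59c0a50f6ca8a1545bf1d206', regionId: '59c11af56ca8a1545bb32665', userId: '59c3cd626ca8a12e70866b0c' },
  { itemId: '59c0a50f6ca8a1545bf1d206', regionId: '59c11af56ca8a1545bb32665', userId: '59c3cd626ca8a12e70865678' },
  // Hypothetical extra document: same user as the first one, different item.
  { itemId: '59c0a50f6ca8a1545bf1d207', regionId: '59c11af56ca8a1545bb32665', userId: '59c3cd626ca8a12e70866b0c' },
];

console.log(countDistinctUsersInMemory(sampleDocs, ['59c0a50f6ca8a1545bf1d206', '59c0a50f6ca8a1545bf1d207'])); // 2
```

One difference from the real pipeline: when no document matches, MongoDB's second $group emits no document at all, whereas this model returns 0.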
I have also set "allowDiskUse" to true.
The query takes more than 20 seconds to return the result. Is there any other way I can improve its execution speed?
I am running it through the native MongoDB driver for Node.js. A distinct query fails with an error about exceeding the 16 MB limit, so I chose an aggregate query instead.
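For reference, a minimal sketch of how the same pipeline and the allowDiskUse option might be passed through the Node.js driver. It assumes driver 3.x or later, where collection.aggregate(pipeline, options) returns an AggregationCursor whose toArray() returns a Promise; the function names, URI, database and collection names are hypothetical.

```javascript
// Builds the question's pipeline for the given itemIds (ObjectIds in real use).
function buildDistinctUserCountPipeline(itemIds) {
  return [
    { $match: { itemId: { $in: itemIds } } },
    { $group: { _id: '$userId' } },
    { $group: { _id: null, count: { $sum: 1 } } },
  ];
}

// Hypothetical helper: runs the pipeline and returns the distinct userId count.
async function countDistinctUsers(collection, itemIds) {
  const cursor = collection.aggregate(buildDistinctUserCountPipeline(itemIds), {
    allowDiskUse: true, // lets $group spill to temporary files instead of failing at the 100 MB stage limit
  });
  const results = await cursor.toArray();
  // The pipeline emits no document when nothing matches, so default to 0.
  return results.length > 0 ? results[0].count : 0;
}

// Usage (needs a running server; names are placeholders):
// const { MongoClient, ObjectId } = require('mongodb');
// const client = await MongoClient.connect('mongodb://localhost:27017');
// const n = await countDistinctUsers(client.db('test').collection('items'),
//   [new ObjectId('59c0a50f6ca8a1545bf1d206'), new ObjectId('59c0a50f6ca8a1545bf1d207')]);
```

The option only affects whether large $group stages may use disk; it does not by itself make the query faster.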
The result is 600,000 unique userId values (ObjectIds). The collection contains 8,397,727 documents in total.
You can try this to get the distinct userId values filtered by itemId and count them:
db.collectionName.distinct('userId',
{itemId: {$in: [ObjectId('59c0a50f6ca8a1545bf1d206'), ObjectId('59c0a50f6ca8a1545bf1d207')]}}
).length
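Since the question uses the Node.js driver, here is a sketch of the same distinct call from Node.js. It assumes driver 3.x or later, where collection.distinct(key, filter) returns a Promise of an array when no callback is passed; the function name is hypothetical. Note that distinct returns its values in a single result document, so it remains subject to the 16 MB BSON document limit mentioned in the question.

```javascript
// Hypothetical helper: counts distinct userId values for the given itemIds
// using the collection's distinct() method.
async function countDistinctUsersViaDistinct(collection, itemIds) {
  const userIds = await collection.distinct('userId', { itemId: { $in: itemIds } });
  return userIds.length;
}
```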