
Find the same element in MongoDB Collection

I have a data structure like this :

myStructure = {
    1 : ['ab','bc','cd','gh'] , 
    2 : ['bc','cd','de'] , 
    3 : ['cd','de','ef12','xz','ygd']
}

I want to find the elements that are present in every array inside 'myStructure'. In this example that is 'cd'.

I'm going to input lots of data into MongoDB and I want to find patterns/duplicates like the example above...

Is there a way to do this with MongoDB? Are there better ways to do it without MongoDB?

Update 1 :

I noticed that my data structure is not ideal. I don't want to be limited to a fixed set of keys like '1, 2, 3', so I changed the structure to this:

    myStructure = [
        {key: 1, value: ['ab','bc','cd']} ,
        {key: 2, value: ['bc','cd','de']} ,
        {key: 3, value: ['cd','de','ef']},
        ...
    ]

Thanks for the answers up to now but I'd be thankful if you could answer the question according to the new structure... Thanks...

What you need is aggregation using the $setIntersection operator.

db.test.aggregate(
    [
        { $project: { "commonElement": { $setIntersection: [ "$1", "$2", "$3" ]}}}
    ]
)

If every document consistently contains all of the arrays, you can do this with $setIntersection and $redact:

db.collection.aggregate([
    { "$redact": {
        "$cond": {
           "if": { 
               "$gt": [
                   { "$size": { "$setIntersection": ["$1","$2", "$3"] } },
                   0
               ]
           },
           "then": "$$KEEP",
           "else": "$$PRUNE"
       }
    }},
    { "$project": {
        "intersection": { "$setIntersection": ["$1","$2","$3"] }
    }}
])

This first filters out any document whose arrays do not intersect, and then projects the intersection.

So with all arrays in the same document:

{ 
    "_id" : ObjectId("559a22f8369e4e157fe17338"), 
    "1" : [ "ab", "bc", "cd" ], 
    "2" : [ "bc", "cd", "de" ], 
    "3" : [ "cd", "de", "ef" ]
}
{ 
   "_id" : ObjectId("559a2ebc369e4e157fe17339"), 
   "1" : [ "bc", "ab" ], 
   "2" : [ "de", "ef" ], 
   "3" : [ "aj", "kl" ]
}

You get:

{ 
    "_id" : ObjectId("559a22f8369e4e157fe17338"),
    "intersection" : [ "cd" ]
}
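
On the question of doing this without MongoDB: for data that fits in memory, the same result can be computed in plain JavaScript. The following is only a sketch under that assumption. The names intersectAll and sampleDocs are hypothetical, and the short string _id values stand in for the ObjectIds shown above.

```javascript
// Plain JavaScript sketch (no MongoDB involved).
// intersectAll and sampleDocs are hypothetical names used for illustration.
function intersectAll(arrays) {
  if (arrays.length === 0) return [];
  // Start from the first array (deduplicated) and keep only the values
  // that also occur in every remaining array.
  return arrays.slice(1).reduce((common, arr) => {
    const seen = new Set(arr);
    return common.filter((v) => seen.has(v));
  }, [...new Set(arrays[0])]);
}

const sampleDocs = [
  { _id: "a", 1: ["ab", "bc", "cd"], 2: ["bc", "cd", "de"], 3: ["cd", "de", "ef"] },
  { _id: "b", 1: ["bc", "ab"], 2: ["de", "ef"], 3: ["aj", "kl"] },
];

// Mirror the pipeline: compute the intersection, then drop empty results.
const results = sampleDocs
  .map((doc) => ({
    _id: doc._id,
    intersection: intersectAll([doc[1], doc[2], doc[3]]),
  }))
  .filter((r) => r.intersection.length > 0);

console.log(results);
```

Only the first sample document survives the filter, with 'cd' as its intersection, which matches the aggregation output above.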

On the changed question

With individual documents like:

    { "key": 1, "value": ['ab','bc','cd']} ,
    { "key": 2, "value": ['bc','cd','de']},
    { "key": 3, "value": ['cd','de','ef']}

Then unwind the arrays, group by value, and keep the values that appear in more than one document:

db.collection.aggregate([
    { "$unwind": "$value" },
    { "$group": {
        "_id": "$value",
        "keys": { "$push": "$key" },
        "count": { "$sum": 1 }
    }},
    { "$match": { "count": { "$gt": 1 } } }
])
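
To make the stages above concrete, here is a rough in-memory imitation of $unwind, $group and $match in plain JavaScript. It is a sketch, not how MongoDB executes the pipeline, and groupValues and keyedDocs are hypothetical names. It also shows one possible way to narrow the result to values present in every document: compare the count with the number of documents. That comparison assumes no value is repeated within a single array.

```javascript
// In-memory imitation of the $unwind / $group / $match stages.
// groupValues and keyedDocs are hypothetical names used for illustration.
function groupValues(docs) {
  const groups = new Map(); // value -> { _id, keys, count }
  for (const doc of docs) {
    for (const v of doc.value) {          // like $unwind on "value"
      if (!groups.has(v)) groups.set(v, { _id: v, keys: [], count: 0 });
      const g = groups.get(v);
      g.keys.push(doc.key);               // like { $push: "$key" }
      g.count += 1;                       // like { $sum: 1 }
    }
  }
  return [...groups.values()];
}

const keyedDocs = [
  { key: 1, value: ["ab", "bc", "cd"] },
  { key: 2, value: ["bc", "cd", "de"] },
  { key: 3, value: ["cd", "de", "ef"] },
];

const grouped = groupValues(keyedDocs);

// Same condition as the $match stage: values seen more than once.
const shared = grouped.filter((g) => g.count > 1);

// Stricter variant (an assumption, not part of the pipeline above):
// values seen in every document, given no repeats inside one array.
const inAll = grouped.filter((g) => g.count === keyedDocs.length);

console.log(shared.map((g) => g._id)); // bc, cd, de
console.log(inAll.map((g) => g._id));  // cd
```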

To get the intersection of arrays within arrays in a single document:

{
    "id": 1,
    "someKey": "abc",
    "items": [
        { "key": 1, "value": ['ab','bc','cd']} ,
        { "key": 2, "value": ['bc','cd','de']},
        { "key": 3, "value": ['cd','de','ef']}
    ]
}

Then $unwind twice, group by document and value, and keep the values that occur in more than one inner array:

db.collection.aggregate([
   { "$unwind": "$items" },
   { "$unwind": "$items.value" },
   { "$group": {
       "_id": {
          "_id": "$_id",
          "value": "$items.value" 
       },
       "keys": { "$push": "$items.key" },
       "count": { "$sum": 1 }
   }},
   { "$match": { "count": { "$gt": 1 } } }
])
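
The grouping above reports values shared by at least two inner arrays. If the strict intersection across all inner arrays is wanted instead, one possible alternative is to fold $setIntersection over the nested arrays with $reduce. This is a sketch that assumes MongoDB 3.4 or later (where $reduce is available) and a non-empty "items" array; the variable name strictPipeline is hypothetical.

```javascript
// Sketch assuming MongoDB 3.4+ ($reduce) and a non-empty "items" array.
// strictPipeline is a hypothetical name; "$items.value" resolves to an
// array of the inner "value" arrays.
const strictPipeline = [
  { "$project": {
      "intersection": {
        "$reduce": {
          "input": "$items.value",
          // Seed with the first inner array, then intersect with each one.
          "initialValue": { "$arrayElemAt": ["$items.value", 0] },
          "in": { "$setIntersection": ["$$value", "$$this"] }
        }
      }
  }}
];

// In the mongo shell this would be run as:
// db.collection.aggregate(strictPipeline)
```

For the sample document above this should leave only 'cd' in "intersection", since it is the one value present in all three inner arrays.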
