
Integrate between two collections

I have two collections:

'DBVisit_DB':

{
    "_id" : ObjectId("582bc54958f2245b05b455c6"),
    "visitEnd" : NumberLong(1479252157766),
    "visitStart" : NumberLong(1479249815749),
    "fuseLocation" : {....    },
    "userId" : "A926D9E4853196A98D1E4AC6006DAF00@1927cc81cfcf7a467e9d4f4ac7a1534b",
    "modificationTimeInMillis" : NumberLong(1479263563107),
    "objectId" : "C4B4CE9B-3AF1-42BC-891C-C8ABB0F8DC40",
    "creationTime" : NumberLong(1479252167996),
    "lastUserInteractionTime" : NumberLong(1479252167996)
}

'device_data':

{
    "_id" : { "$binary" : "AN6GmE7Thi+Sd/dpLRjIilgsV/4AAAg=", "$type" : "00" },
    "auditVersion" : "1.0",
    "currentTime" : NumberLong(1479301118381),
    "data" : {
        "networkOperatorName" : "Cellcom",...
    },
    "timezone" : "Asia/Jerusalem",
    "collectionAlias" : "DEVICE_DATA",
    "shortDate" : 17121,
    "userId" : "00DE86984ED3862F9277F7692D18C88A@1927cc81cfcf7a467e9d4f4ac7a1534b"
}

In DBVisit_DB I need to show all visits that took more than 1 hour (visitEnd - visitStart > 1 hour), but only for Cellcom users, by matching the userId value in both collections. This is what I did so far:

// create an array of all the rows whose networkOperatorName is "Cellcom"
    var users = db.device_data.find({  "data.networkOperatorName": "Cellcom" },{ userId: 1, _id: 0}).toArray();

// create an array of all the rows whose visit time is more than one hour
        var time = db.DBVisit_DB.find( { $where: function() { 
          timePassed = new Date(this.visitEnd - this.visitStart).getHours();
          return timePassed > 1}},
          { userId: 1, _id: 0, "visitEnd" : 1, "visitStart":1} ).toArray();

//merge between the two arrays
        var result = [];
        var i, j;
        for (i = 0; i < time; i++) {
            for (j = 0; j < users; j++) {
                if (time[i].userId == users[j].userId) {
                    result.push(time[i]);
                }
            }
        }

        for (var i = 0; i < result.length; i++) {
           print(result[i].userId);
        }

But it doesn't print anything, although I know for sure that there are ids that can be found in both of the arrays I created. Note: I'm not 100% sure that I calculated the visit time correctly. By the way, I'm new to both JavaScript and MongoDB.

                          ********update********

In "device_data" there are different rows with the same "userId" field. "device_data" also has the "data.networkOperatorName" field, which contains the names of different cellular companies. I've been asked to show all "Cellcom" users that, according to the 'DBVisit_DB' collection, were connected for more than an hour. In other words, based on the fields "visitEnd" and "visitStart", I need to know whether ("visitEnd" - "visitStart" > 1 hour).

{  "userId" : "457A7A0097F83074DA5E05F7E05BEA1D@1927cc81cfcf7a467e9d4f4ac7a1534b" }
{  "userId" : "E0F5C56AC227972CFAFC9124E039F0DE@1927cc81cfcf7a467e9d4f4ac7a1534b" }
{  "userId" : "309FA12926EC3EB49EB9AE40B6078109@1927cc81cfcf7a467e9d4f4ac7a1534b" }
{  "userId" : "B10420C71798F1E8768ACCF3B5E378D0@1927cc81cfcf7a467e9d4f4ac7a1534b" }
{  "userId" : "EE5C11AD6BFBC9644AF3C742097C531C@1927cc81cfcf7a467e9d4f4ac7a1534b" }
{  "userId" : "20EA1468672EFA6793A02149623DA2C4@1927cc81cfcf7a467e9d4f4ac7a1534b" }

Each array has this format after my queries. I need to merge them into one array that holds the intersection between them.

thanks a lot for all the help!

With the aggregation framework, you can achieve the desired result by using the $lookup operator, which performs a "left join" between collections in the same database, together with the $redact pipeline operator, which accepts arithmetic operators. These let you convert the difference between the two timestamps to minutes and filter on it.

As a simple example of how these operators work, you can run the following pipeline on the DBVisit_DB collection to see the actual time difference in minutes:

db.getCollection('DBVisit_DB').aggregate([
    {
        "$project": {
             "visitStart": { "$add": [ "$visitStart", new Date(0) ] },
             "visitEnd": { "$add": [ "$visitEnd", new Date(0) ] },
             "timeDiffInMinutes": { 
                "$divide": [
                    { "$subtract": ["$visitEnd", "$visitStart"] }, 
                    1000 * 60
                ] 
            },
            "isMoreThanHour": {
                "$gt": [
                    { 
                        "$divide": [
                            { "$subtract": ["$visitEnd", "$visitStart"] }, 
                            1000 * 60
                        ] 
                    }, 60
                ]
            }
        }
    }
])

Sample Output

{
    "_id" : ObjectId("582bc54958f2245b05b455c6"),
    "visitEnd" : ISODate("2016-11-15T23:22:37.766Z"),
    "visitStart" : ISODate("2016-11-15T22:43:35.749Z"),
    "timeDiffInMinutes" : 39.0336166666667,
    "isMoreThanHour" : false
}
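
The sample values above can be checked by hand with plain JavaScript arithmetic on the two timestamps from the question. This is only a sanity check of the numbers and does not need MongoDB; the variable names are illustrative:

```javascript
// Timestamps (milliseconds since the Unix epoch) from the sample document.
const visitStart = 1479249815749;
const visitEnd = 1479252157766;

const MS_PER_MINUTE = 1000 * 60;

// Same steps as the $subtract, $divide and $gt stages in the pipeline.
const diffMs = visitEnd - visitStart;
const timeDiffInMinutes = diffMs / MS_PER_MINUTE;
const isMoreThanHour = timeDiffInMinutes > 60;

console.log(diffMs, timeDiffInMinutes.toFixed(4), isMoreThanHour);
// prints: 2342017 39.0336 false
```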

Now that you understand how these operators work, you can apply them in the following pipeline. It uses device_data as the main collection, first filters the documents on the specified field using $match, and then joins to the DBVisit_DB collection using $lookup. $redact evaluates, within $cond, the condition that a visit is more than an hour long, and uses the special system variable $$KEEP to "keep" a document where the condition is true or $$PRUNE to "discard" a document where it is false.

The arithmetic operators $divide and $subtract allow you to calculate the difference between the two timestamp fields in minutes, and the $gt comparison operator then evaluates the condition:

db.device_data.aggregate([
    /* Filter input documents */
    { "$match": { "data.networkOperatorName": "Cellcom" } },

    /* Do a left-join to DBVisit_DB collection */
    {
        "$lookup": {
            "from": "DBVisit_DB",
            "localField": "userId",
            "foreignField": "userId",
            "as": "userVisits"
        }
    },

    /* Flatten resulting array */
    { "$unwind": "$userVisits" },

    /* Redact documents */
    {
        "$redact": {
            "$cond": [
                {
                    "$gt": [
                        { 
                            "$divide": [
                                { "$subtract": [
                                        "$userVisits.visitEnd", 
                                        "$userVisits.visitStart"
                                ] }, 
                                1000 * 60
                            ] 
                        },
                        60
                    ]
                },
                "$$KEEP",
                "$$PRUNE"
            ]
        }
    }
])
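
For readers new to the aggregation framework, the stages above can be mimicked on in-memory arrays. The following is only a rough sketch of what each stage does, written in plain JavaScript with made-up sample data (the userId values "u1" and "u2" and the operator name "OtherOperator" are hypothetical); it is not how MongoDB executes the pipeline:

```javascript
// Hypothetical sample data, shaped like the two collections in the question.
const deviceData = [
  { userId: "u1", data: { networkOperatorName: "Cellcom" } },
  { userId: "u2", data: { networkOperatorName: "OtherOperator" } },
];
const visits = [
  { userId: "u1", visitStart: 0, visitEnd: 2 * 60 * 60 * 1000 }, // 2 hours
  { userId: "u1", visitStart: 0, visitEnd: 30 * 60 * 1000 },     // 30 minutes
  { userId: "u2", visitStart: 0, visitEnd: 3 * 60 * 60 * 1000 }, // 3 hours, not Cellcom
];

const result = deviceData
  // $match: keep only the Cellcom device rows
  .filter((d) => d.data.networkOperatorName === "Cellcom")
  // $lookup + $unwind: one output document per (device row, matching visit) pair
  .flatMap((d) =>
    visits
      .filter((v) => v.userId === d.userId)
      .map((v) => ({ ...d, userVisits: v }))
  )
  // $redact with $$KEEP / $$PRUNE: keep pairs whose visit lasted more than 60 minutes
  .filter(
    (doc) => (doc.userVisits.visitEnd - doc.userVisits.visitStart) / (1000 * 60) > 60
  );

console.log(result.length); // prints: 1
```

Because device_data can hold several rows for the same userId (as the update in the question says), a join driven from device_data would be expected to repeat a visit once per matching device row, so the output may need de-duplicating.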

There are a couple of things incorrect in your JavaScript.

Replace the time and users conditions in the for loops with time.length and users.length.

Your timePassed calculation should be

var timePassed = this.visitEnd - this.visitStart;
return timePassed > 3600000;
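
To see why the original script printed nothing, here is a small sketch in plain JavaScript. The users array is a hypothetical stand-in for what find(...).toArray() returns, and getUTCHours() is used instead of getHours() only so that the result does not depend on the local time zone:

```javascript
// Hypothetical stand-in for the array returned by find(...).toArray().
const users = [{ userId: "u1" }, { userId: "u2" }];

// Comparing a number with an array of objects coerces the array to NaN,
// so the condition `i < users` is false on the very first check.
const loopWouldRun = 0 < users;             // false
const fixedLoopWouldRun = 0 < users.length; // true

// getHours()/getUTCHours() return the hour-of-day of a calendar date,
// not an elapsed duration, so the value wraps around every 24 hours.
const ninetyMinutesMs = 90 * 60 * 1000;
const twentyFiveHoursMs = 25 * 60 * 60 * 1000;
const hourFieldShort = new Date(ninetyMinutesMs).getUTCHours();  // 1
const hourFieldLong = new Date(twentyFiveHoursMs).getUTCHours(); // also 1

// Comparing the raw millisecond difference avoids both problems.
const shortIsOverAnHour = ninetyMinutesMs > 3600000;  // true
const longIsOverAnHour = twentyFiveHoursMs > 3600000; // true

console.log(loopWouldRun, fixedLoopWouldRun, hourFieldShort, hourFieldLong);
```

A 90-minute visit and a 25-hour visit both yield an hour field of 1, so the test `timePassed > 1` rejects both even though each is longer than an hour.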

You also have a couple of data-related issues.

The documents you posted in the question have no matching userId, and the difference between visitEnd and visitStart is less than an hour.

For a MongoDB-based query, you should check out the other answer.
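
If you prefer to keep the merge in client-side JavaScript, the two corrections can be combined as in the sketch below. It is one possible approach rather than the only one: the function name longCellcomVisits and the sample data are hypothetical, and it assumes the timestamps are plain numbers (values read as NumberLong in the shell may need converting first).

```javascript
const ONE_HOUR_MS = 60 * 60 * 1000;

// Returns the visits longer than one hour whose userId appears in cellcomUsers.
function longCellcomVisits(cellcomUsers, visits) {
  // A Set gives fast membership checks and collapses duplicate userIds,
  // which matters because device_data can hold several rows per user.
  const cellcomIds = new Set(cellcomUsers.map((u) => u.userId));
  return visits.filter(
    (v) => v.visitEnd - v.visitStart > ONE_HOUR_MS && cellcomIds.has(v.userId)
  );
}

// Hypothetical data standing in for the two find(...).toArray() results.
const cellcomUsers = [{ userId: "a" }, { userId: "a" }, { userId: "b" }];
const allVisits = [
  { userId: "a", visitStart: 0, visitEnd: 2 * ONE_HOUR_MS }, // kept
  { userId: "a", visitStart: 0, visitEnd: ONE_HOUR_MS },     // exactly 1 hour: dropped
  { userId: "c", visitStart: 0, visitEnd: 5 * ONE_HOUR_MS }, // not a Cellcom user: dropped
];

const matched = longCellcomVisits(cellcomUsers, allVisits);
matched.forEach((v) => console.log(v.userId)); // prints: a
```

With the nested loops from the question, the duplicate "a" row would push the same visit twice; the Set avoids that.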
