
Integrate between two collections

I have two collections:

'DBVisit_DB':

{
    "_id" : ObjectId("582bc54958f2245b05b455c6"),
    "visitEnd" : NumberLong(1479252157766),
    "visitStart" : NumberLong(1479249815749),
    "fuseLocation" : {....    },
    "userId" : "A926D9E4853196A98D1E4AC6006DAF00@1927cc81cfcf7a467e9d4f4ac7a1534b",
    "modificationTimeInMillis" : NumberLong(1479263563107),
    "objectId" : "C4B4CE9B-3AF1-42BC-891C-C8ABB0F8DC40",
    "creationTime" : NumberLong(1479252167996),
    "lastUserInteractionTime" : NumberLong(1479252167996)
}

'device_data':

{
    "_id" : { "$binary" : "AN6GmE7Thi+Sd/dpLRjIilgsV/4AAAg=", "$type" : "00" },
    "auditVersion" : "1.0",
    "currentTime" : NumberLong(1479301118381),
    "data" : {
        "networkOperatorName" : "Cellcom",...
    },
    "timezone" : "Asia/Jerusalem",
    "collectionAlias" : "DEVICE_DATA",
    "shortDate" : 17121,
    "userId" : "00DE86984ED3862F9277F7692D18C88A@1927cc81cfcf7a467e9d4f4ac7a1534b"
}

In DBVisit_DB I need to show all visits, for Cellcom users only, that took more than 1 hour (visitEnd - visitStart > 1 hour), by matching the userId value in both collections. This is what I did so far:

//create an array that contains all the rows whose networkOperatorName is "Cellcom"
var users = db.device_data.find({ "data.networkOperatorName": "Cellcom" }, { userId: 1, _id: 0 }).toArray();

//create an array that contains all the rows where the visit time is more than one hour
var time = db.DBVisit_DB.find( { $where: function() {
    timePassed = new Date(this.visitEnd - this.visitStart).getHours();
    return timePassed > 1}},
    { userId: 1, _id: 0, "visitEnd" : 1, "visitStart":1} ).toArray();

//merge between the two arrays
var result = [];
var i, j;
for (i = 0; i < time; i++) {
    for (j = 0; j < users; j++) {
        if (time[i].userId == users[j].userId) {
            result.push(time[i]);
        }
    }
}

for (var i = 0; i < result.length; i++) {
    print(result[i].userId);
}

But it doesn't print anything, although I know for sure that there are IDs that can be found in both of the arrays I created. For verification: I'm not 100% sure that I calculated the visit time correctly. By the way, I'm new to both JavaScript and MongoDB.

                          ********update********

In "device_data" there are different rows with the same "userId" field. "device_data" also has the "data.networkOperatorName" field, which contains different cellular companies. I've been asked to show all "Cellcom" users who, based on the 'DBVisit_DB' collection, were connected for more than an hour. That is, based on the fields "visitEnd" and "visitStart", I need to know whether ("visitEnd" - "visitStart" > 1 hour).

{  "userId" : "457A7A0097F83074DA5E05F7E05BEA1D@1927cc81cfcf7a467e9d4f4ac7a1534b" }
{  "userId" : "E0F5C56AC227972CFAFC9124E039F0DE@1927cc81cfcf7a467e9d4f4ac7a1534b" }
{  "userId" : "309FA12926EC3EB49EB9AE40B6078109@1927cc81cfcf7a467e9d4f4ac7a1534b" }
{  "userId" : "B10420C71798F1E8768ACCF3B5E378D0@1927cc81cfcf7a467e9d4f4ac7a1534b" }
{  "userId" : "EE5C11AD6BFBC9644AF3C742097C531C@1927cc81cfcf7a467e9d4f4ac7a1534b" }
{  "userId" : "20EA1468672EFA6793A02149623DA2C4@1927cc81cfcf7a467e9d4f4ac7a1534b" }

Each array has this format after my queries. I need to merge them into one, so that I get the intersection between them.

Thanks a lot for all the help!

With the aggregation framework, you can achieve the desired result using the $lookup stage, which performs a "left join" against another collection in the same database, together with the $redact pipeline stage. $redact accepts arithmetic operators, so you can subtract the two timestamps, convert the difference to minutes and filter on it.

As a simple example of how these operators work, you can run the following pipeline on the DBVisit_DB collection to see the actual time difference in minutes:

db.getCollection('DBVisit_DB').aggregate([
    {
        "$project": {
             "visitStart": { "$add": [ "$visitStart", new Date(0) ] },
             "visitEnd": { "$add": [ "$visitEnd", new Date(0) ] },
             "timeDiffInMinutes": { 
                "$divide": [
                    { "$subtract": ["$visitEnd", "$visitStart"] }, 
                    1000 * 60
                ] 
            },
            "isMoreThanHour": {
                "$gt": [
                    { 
                        "$divide": [
                            { "$subtract": ["$visitEnd", "$visitStart"] }, 
                            1000 * 60
                        ] 
                    }, 60
                ]
            }
        }
    }
])

Sample Output

{
    "_id" : ObjectId("582bc54958f2245b05b455c6"),
    "visitEnd" : ISODate("2016-11-15T23:22:37.766Z"),
    "visitStart" : ISODate("2016-11-15T22:43:35.749Z"),
    "timeDiffInMinutes" : 39.0336166666667,
    "isMoreThanHour" : false
}
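
As a sanity check on the sample output, the same arithmetic can be reproduced in plain JavaScript (for example in Node.js) without MongoDB. This is only a sketch: the two timestamps are copied from the document posted in the question, and the helper name `visitMinutes` is made up for illustration.

```javascript
// Hypothetical helper (not part of MongoDB): length of a visit in minutes,
// given two epoch-millisecond timestamps.
function visitMinutes(visitStart, visitEnd) {
    return (visitEnd - visitStart) / (1000 * 60);
}

// Timestamps copied from the DBVisit_DB document in the question.
var visitStart = 1479249815749;
var visitEnd = 1479252157766;

var timeDiffInMinutes = visitMinutes(visitStart, visitEnd);
var isMoreThanHour = timeDiffInMinutes > 60;

console.log(new Date(visitStart).toISOString()); // 2016-11-15T22:43:35.749Z
console.log(new Date(visitEnd).toISOString());   // 2016-11-15T23:22:37.766Z
console.log(timeDiffInMinutes.toFixed(4));       // 39.0336
console.log(isMoreThanHour);                     // false
```

The numbers agree with the pipeline output: the posted visit lasted about 39 minutes, so it would not pass a "more than one hour" filter.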

Now that you understand how these operators work, you can apply them in the following example. The pipeline uses the device_data collection as the main collection: it first filters the documents on the specified field using $match, then joins to the DBVisit_DB collection using $lookup. $redact evaluates, within $cond, the logical condition that a visit lasted more than an hour, and uses the special system variables $$KEEP to "keep" the document where the condition is true or $$PRUNE to "discard" the document where it is false.

The arithmetic operators $divide and $subtract calculate the difference between the two timestamp fields in minutes, and the $gt comparison operator then evaluates the condition:

db.device_data.aggregate([
    /* Filter input documents */
    { "$match": { "data.networkOperatorName": "Cellcom" } },

    /* Do a left-join to DBVisit_DB collection */
    {
        "$lookup": {
            "from": "DBVisit_DB",
            "localField": "userId",
            "foreignField": "userId",
            "as": "userVisits"
        }
    },

    /* Flatten resulting array */
    { "$unwind": "$userVisits" },

    /* Redact documents */
    {
        "$redact": {
            "$cond": [
                {
                    "$gt": [
                        { 
                            "$divide": [
                                { "$subtract": [
                                        "$userVisits.visitEnd", 
                                        "$userVisits.visitStart"
                                ] }, 
                                1000 * 60
                            ] 
                        },
                        60
                    ]
                },
                "$$KEEP",
                "$$PRUNE"
            ]
        }
    }
])
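
To make the data flow of this pipeline easier to follow, here is a rough in-memory imitation of its four stages in plain JavaScript (runnable in Node.js, no database needed). It is a sketch only: the arrays `deviceData` and `visits`, the short user IDs and the operator name "OtherOperator" are invented sample data, and array methods merely stand in for what the server does.

```javascript
// Hypothetical in-memory stand-ins for the two collections.
var deviceData = [
    { userId: "u1", data: { networkOperatorName: "Cellcom" } },
    { userId: "u2", data: { networkOperatorName: "OtherOperator" } },
    { userId: "u3", data: { networkOperatorName: "Cellcom" } }
];
var visits = [
    { userId: "u1", visitStart: 0, visitEnd: 120 * 60 * 1000 }, // 120 minutes
    { userId: "u1", visitStart: 0, visitEnd: 30 * 60 * 1000 },  // 30 minutes
    { userId: "u2", visitStart: 0, visitEnd: 180 * 60 * 1000 }, // long, but not Cellcom
    { userId: "u3", visitStart: 0, visitEnd: 60 * 60 * 1000 }   // exactly 60 minutes
];

var longCellcomVisits = deviceData
    // Like $match: keep only Cellcom device rows.
    .filter(function (d) {
        return d.data.networkOperatorName === "Cellcom";
    })
    // Like $lookup followed by $unwind: one output document per
    // (device row, matching visit) pair.
    .flatMap(function (d) {
        return visits
            .filter(function (v) { return v.userId === d.userId; })
            .map(function (v) { return { userId: d.userId, userVisits: v }; });
    })
    // Like $redact with $$KEEP / $$PRUNE: keep visits longer than 60 minutes.
    .filter(function (doc) {
        var minutes =
            (doc.userVisits.visitEnd - doc.userVisits.visitStart) / (1000 * 60);
        return minutes > 60;
    });

console.log(longCellcomVisits.length);    // 1
console.log(longCellcomVisits[0].userId); // u1
```

Note that the join runs once per device row. Since device_data can hold several rows with the same userId, a long visit would appear once for each of those rows, both in this sketch and in the pipeline.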

There are a couple of things incorrect in your JavaScript.

Replace the time and users conditions with time.length and users.length in the for loops.

Your timePassed calculation should be

timePassed = this.visitEnd - this.visitStart;
return timePassed > 3600000;
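
Both fixes can be tried without a database. The following is a minimal sketch in plain JavaScript (Node.js): the arrays `users` and `visits` are invented stand-ins for the results of the two find() calls, the constant `ONE_HOUR_MS` is a made-up name, and the `break` is an addition of this sketch (not part of the original script) to avoid duplicate results when a userId repeats in `users`.

```javascript
var ONE_HOUR_MS = 60 * 60 * 1000; // 3600000

// Hypothetical stand-ins for the arrays returned by the two find() calls.
var users = [{ userId: "u1" }, { userId: "u1" }, { userId: "u3" }];
var visits = [
    { userId: "u1", visitStart: 0, visitEnd: 90 * 60 * 1000 },  // 90 minutes
    { userId: "u2", visitStart: 0, visitEnd: 120 * 60 * 1000 }, // 120 minutes
    { userId: "u3", visitStart: 0, visitEnd: 39 * 60 * 1000 }   // 39 minutes
];

// Why the original loops never ran: comparing a number with an array is false.
console.log(0 < visits);        // false
console.log(0 < visits.length); // true

// Why an hour-of-day check is the wrong test: a 90 minute difference has an
// hour component of 1, so "> 1" rejects it. getUTCHours() is used here only
// to keep the output timezone-independent; the question used getHours().
console.log(new Date(90 * 60 * 1000).getUTCHours() > 1); // false

// Fix: compare the raw millisecond difference with one hour.
var time = visits.filter(function (v) {
    return v.visitEnd - v.visitStart > ONE_HOUR_MS;
});

// Fix: bound the loops with .length.
var result = [];
for (var i = 0; i < time.length; i++) {
    for (var j = 0; j < users.length; j++) {
        if (time[i].userId === users[j].userId) {
            result.push(time[i]);
            break; // assumption of this sketch: report each visit once
        }
    }
}

console.log(result.map(function (v) { return v.userId; })); // [ 'u1' ]
```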

You also have a couple of data-related issues.

You don't have a matching userId, and the difference between visitEnd and visitStart is less than an hour for the documents you posted in the question.

For a MongoDB-based query, you should check out the other answer.
