Integrate between two collections
I have two collections:
'DBVisit_DB':
{
    "_id" : ObjectId("582bc54958f2245b05b455c6"),
    "visitEnd" : NumberLong(1479252157766),
    "visitStart" : NumberLong(1479249815749),
    "fuseLocation" : {.... },
    "userId" : "A926D9E4853196A98D1E4AC6006DAF00@1927cc81cfcf7a467e9d4f4ac7a1534b",
    "modificationTimeInMillis" : NumberLong(1479263563107),
    "objectId" : "C4B4CE9B-3AF1-42BC-891C-C8ABB0F8DC40",
    "creationTime" : NumberLong(1479252167996),
    "lastUserInteractionTime" : NumberLong(1479252167996)
}
'device_data':
{
    "_id" : { "$binary" : "AN6GmE7Thi+Sd/dpLRjIilgsV/4AAAg=", "$type" : "00" },
    "auditVersion" : "1.0",
    "currentTime" : NumberLong(1479301118381),
    "data" : {
        "networkOperatorName" : "Cellcom",...
    },
    "timezone" : "Asia/Jerusalem",
    "collectionAlias" : "DEVICE_DATA",
    "shortDate" : 17121,
    "userId" : "00DE86984ED3862F9277F7692D18C88A@1927cc81cfcf7a467e9d4f4ac7a1534b"
}
In DBVisit_DB I need to show all visits, for Cellcom users only, that took more than 1 hour (visitEnd - visitStart > 1 hour), by matching the userId value in both collections.
This is what I did so far:
// create an array of all the rows whose networkOperatorName is "Cellcom"
var users = db.device_data.find({ "data.networkOperatorName": "Cellcom" },{ userId: 1, _id: 0}).toArray();
// create an array of all the rows whose visit time is more than one hour
var time = db.DBVisit_DB.find( { $where: function() {
timePassed = new Date(this.visitEnd - this.visitStart).getHours();
return timePassed > 1}},
{ userId: 1, _id: 0, "visitEnd" : 1, "visitStart":1} ).toArray();
//merge between the two arrays
var result = [];
var i, j;
for (i = 0; i < time; i++) {
for (j = 0; j < users; j++) {
if (time[i].userId == users[j].userId) {
result.push(time[i]);
}
}
}
for (var i = 0; i < result.length; i++) {
print(result[i].userId);
}
But it doesn't show anything, although I know for sure that there are ids that can be found in both of the arrays I created.
*For verification: I'm not 100% sure that I calculated the visit time correctly.
By the way, I'm new to both JavaScript and MongoDB.
********update********
In "device_data" there are different rows with the same "userId" field. "device_data" also has the "data.networkOperatorName" field, which contains different cellular companies.
I've been asked to show all "Cellcom" users who, based on the 'DBVisit_DB' collection, were connected for more than an hour. That is, based on the fields "visitEnd" and "visitStart", I need to know whether ("visitEnd" - "visitStart" > 1 hour).
{ "userId" : "457A7A0097F83074DA5E05F7E05BEA1D@1927cc81cfcf7a467e9d4f4ac7a1534b" }
{ "userId" : "E0F5C56AC227972CFAFC9124E039F0DE@1927cc81cfcf7a467e9d4f4ac7a1534b" }
{ "userId" : "309FA12926EC3EB49EB9AE40B6078109@1927cc81cfcf7a467e9d4f4ac7a1534b" }
{ "userId" : "B10420C71798F1E8768ACCF3B5E378D0@1927cc81cfcf7a467e9d4f4ac7a1534b" }
{ "userId" : "EE5C11AD6BFBC9644AF3C742097C531C@1927cc81cfcf7a467e9d4f4ac7a1534b" }
{ "userId" : "20EA1468672EFA6793A02149623DA2C4@1927cc81cfcf7a467e9d4f4ac7a1534b" }
Each array has this format after my queries. I need to merge them into one, so that I have the intersection between them.
Thanks a lot for all the help!
With the aggregation framework, you can achieve the desired result with the $lookup operator, which performs a "left-join" against another collection in the same database, together with the $redact pipeline operator, which accepts arithmetic expressions, so the timestamps can be manipulated and converted to minutes that you can query.
As a simple example of how the arithmetic works, you can run the following pipeline on the DBVisit_DB collection to see the actual time difference in minutes:
db.getCollection('DBVisit_DB').aggregate([
{
"$project": {
"visitStart": { "$add": [ "$visitStart", new Date(0) ] },
"visitEnd": { "$add": [ "$visitEnd", new Date(0) ] },
"timeDiffInMinutes": {
"$divide": [
{ "$subtract": ["$visitEnd", "$visitStart"] },
1000 * 60
]
},
"isMoreThanHour": {
"$gt": [
{
"$divide": [
{ "$subtract": ["$visitEnd", "$visitStart"] },
1000 * 60
]
}, 60
]
}
}
}
])
Sample Output
{
"_id" : ObjectId("582bc54958f2245b05b455c6"),
"visitEnd" : ISODate("2016-11-15T23:22:37.766Z"),
"visitStart" : ISODate("2016-11-15T22:43:35.749Z"),
"timeDiffInMinutes" : 39.0336166666667,
"isMoreThanHour" : false
}
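As a quick check of the arithmetic, the same calculation can be reproduced in plain JavaScript outside the database. This is only a sketch: the helper name `visitMinutes` is made up for illustration, and the timestamps are the ones from the sample document in the question.

```javascript
// Hypothetical helper: duration of a visit in minutes,
// given epoch-millisecond timestamps.
function visitMinutes(visit) {
  return (visit.visitEnd - visit.visitStart) / (1000 * 60);
}

// Timestamps copied from the sample DBVisit_DB document in the question.
const sampleVisit = { visitStart: 1479249815749, visitEnd: 1479252157766 };

const minutes = visitMinutes(sampleVisit);
const isMoreThanHour = minutes > 60;

console.log(minutes.toFixed(4), isMoreThanHour); // 39.0336 false
```

The result agrees with the timeDiffInMinutes and isMoreThanHour values in the sample output, so the posted visit is shorter than an hour.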
Now that you understand how the above operators work, you can apply them in the following example. The aggregation pipeline below uses the device_data collection as the main collection: it first filters the documents on the specified field using $match and then joins to the DBVisit_DB collection using $lookup.
$redact then evaluates, within $cond, the logical condition that a visit is more than an hour long, and uses the special system variable $$KEEP to "keep" a document where the condition is true or $$PRUNE to "discard" a document where the condition is false.
The arithmetic operators $divide and $subtract let you calculate the difference between the two timestamp fields in minutes, and the $gt comparison operator then evaluates the condition:
db.device_data.aggregate([
/* Filter input documents */
{ "$match": { "data.networkOperatorName": "Cellcom" } },
/* Do a left-join to DBVisit_DB collection */
{
"$lookup": {
"from": "DBVisit_DB",
"localField": "userId",
"foreignField": "userId",
"as": "userVisits"
}
},
/* Flatten resulting array */
{ "$unwind": "$userVisits" },
/* Redact documents */
{
"$redact": {
"$cond": [
{
"$gt": [
{
"$divide": [
{ "$subtract": [
"$userVisits.visitEnd",
"$userVisits.visitStart"
] },
1000 * 60
]
},
60
]
},
"$$KEEP",
"$$PRUNE"
]
}
}
])
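To make the stages concrete, here is a rough plain-JavaScript imitation of what the pipeline computes, run on in-memory arrays instead of collections. It is a sketch only, not how MongoDB executes the pipeline. The function name `cellcomLongVisits` and the sample documents (user ids `u1`, `u2`, `u3` and their timestamps) are invented for illustration.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Invented sample data standing in for the two collections.
const deviceData = [
  { userId: "u1", data: { networkOperatorName: "Cellcom" } },
  { userId: "u2", data: { networkOperatorName: "Other" } },
  { userId: "u3", data: { networkOperatorName: "Cellcom" } },
];
const visits = [
  { userId: "u1", visitStart: 0, visitEnd: 2 * HOUR_MS }, // 2 hours: kept
  { userId: "u1", visitStart: 0, visitEnd: HOUR_MS / 2 }, // 30 minutes: pruned
  { userId: "u2", visitStart: 0, visitEnd: 3 * HOUR_MS }, // not a Cellcom user
  { userId: "u3", visitStart: 0, visitEnd: HOUR_MS },     // exactly 1 hour: pruned
];

function cellcomLongVisits(devices, allVisits) {
  return devices
    // $match: keep only Cellcom device documents
    .filter((d) => d.data.networkOperatorName === "Cellcom")
    // $lookup + $unwind: one output document per (device, matching visit) pair
    .flatMap((d) =>
      allVisits
        .filter((v) => v.userId === d.userId)
        .map((v) => ({ ...d, userVisits: v }))
    )
    // $redact: keep only pairs whose visit lasted more than 60 minutes
    .filter(
      (doc) =>
        (doc.userVisits.visitEnd - doc.userVisits.visitStart) / (1000 * 60) > 60
    );
}

const longVisits = cellcomLongVisits(deviceData, visits);
console.log(longVisits.map((doc) => doc.userId)); // [ 'u1' ]
```

Note that if device_data holds several documents for the same userId, as the question's update says it does, each of them pairs with every visit of that user, so the same visit can appear more than once in the output. One possible way to collapse such repeats would be an extra $group stage keyed on the visit's _id, though that is an assumption and not part of the pipeline above.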
There are a couple of things incorrect in your JavaScript.
Replace the time and users conditions with time.length and users.length in the for loops.
Your timePassed calculation should be:
timePassed = this.visitEnd - this.visitStart
return timePassed > 3600000
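Both fixes can be tried out together in plain JavaScript, without a database. The following is a minimal sketch under stated assumptions: the arrays `users` and `candidateVisits` are invented stand-ins for what find(...).toArray() would return, and the helper names `isLongVisit` and `mergeByUserId` are hypothetical.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Invented stand-ins for the two arrays returned by find(...).toArray().
const users = [{ userId: "u1" }, { userId: "u1" }, { userId: "u3" }];
const candidateVisits = [
  { userId: "u1", visitStart: 0, visitEnd: 1.5 * HOUR_MS },
  { userId: "u2", visitStart: 0, visitEnd: 2 * HOUR_MS },
  { userId: "u3", visitStart: 0, visitEnd: 0.5 * HOUR_MS },
];

// Fix 2: compare the raw millisecond difference with one hour.
function isLongVisit(visit) {
  return visit.visitEnd - visit.visitStart > HOUR_MS;
}

// Fix 1: the loop bounds use .length. Comparing a number with an array of
// documents (i < time) evaluates to false, so the original loops never ran.
function mergeByUserId(visitList, userList) {
  const result = [];
  for (let i = 0; i < visitList.length; i++) {
    if (!isLongVisit(visitList[i])) continue;
    for (let j = 0; j < userList.length; j++) {
      if (visitList[i].userId === userList[j].userId) {
        result.push(visitList[i]);
        break; // first match is enough; duplicate user rows add the visit once
      }
    }
  }
  return result;
}

const result = mergeByUserId(candidateVisits, users);
console.log(result.map((v) => v.userId)); // [ 'u1' ]

// Why getHours() misleads: a 1.5-hour difference has an hour component of 1,
// so "> 1" rejects it. UTC is used here to keep the value timezone-independent.
const hourComponent = new Date(1.5 * HOUR_MS).getUTCHours();
console.log(hourComponent); // 1
```

The `break` is a design choice added for this sketch: since the same userId can occur in several device_data rows, it prevents one visit from being pushed repeatedly.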
You also have a couple of data-related issues.
The documents you posted in the question do not have a matching userId, and the difference between visitEnd and visitStart is less than an hour.
For a MongoDB-based query, you should check out the other answer.