MongoDB complex query between two collections

Question

I'm fairly new to Node and MongoDB, and I come from a relational database background.

I have been asked to write a report that calculates the conversion rate from leads (relating to vehicle workshop bookings) to invoices. A lead counts as a conversion if an invoice was produced within 60 days of the lead being generated.

Using MongoDB, Mongoose and Node.js, I have imported all of the data from flat files into two collections, leads and invoices. I have 1M leads and about 30M invoices covering a 5-year period, and the rates have to be reported month by month. The vehicle registration (reg) is common to all of the data.
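To make the report definition concrete, here is a small in-memory reference model in plain JavaScript. It is not meant for 1M/30M documents, only for checking a real query's output on a sample. It assumes hypothetical fields `reg` (vehicle registration) and `date` (a JS `Date`) on both kinds of record, an inclusive 60-day window starting on the lead date, and months bucketed by the lead's UTC date; the post does not settle any of these details.

```javascript
// In-memory reference model of the monthly report, for checking results on a
// small sample only. Field names `reg` and `date` are hypothetical.
const DAY_MS = 24 * 60 * 60 * 1000;

function monthlyConversionRates(leads, invoices, windowDays = 60) {
  // "Join" on vehicle registration: group invoice timestamps by reg.
  const invoicesByReg = new Map();
  for (const inv of invoices) {
    if (!invoicesByReg.has(inv.reg)) invoicesByReg.set(inv.reg, []);
    invoicesByReg.get(inv.reg).push(inv.date.getTime());
  }

  // Bucket leads by the UTC month they were generated in ("YYYY-MM").
  const buckets = new Map();
  for (const lead of leads) {
    const month = lead.date.toISOString().slice(0, 7);
    const start = lead.date.getTime();
    const end = start + windowDays * DAY_MS;
    // Converted if any invoice falls in [start, end]; inclusive bounds are an assumption.
    const times = invoicesByReg.get(lead.reg) || [];
    const converted = times.some(t => t >= start && t <= end);
    const bucket = buckets.get(month) || { leads: 0, converted: 0 };
    bucket.leads += 1;
    if (converted) bucket.converted += 1;
    buckets.set(month, bucket);
  }

  // One row per month, in chronological order, with the rate attached.
  return [...buckets.entries()]
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([month, b]) => ({ month, ...b, rate: b.converted / b.leads }));
}

// Made-up sample data.
const day = s => new Date(s + "T00:00:00Z");
const report = monthlyConversionRates(
  [{ reg: "AAA111", date: day("2018-01-10") }, { reg: "BBB222", date: day("2018-01-20") }],
  [{ reg: "AAA111", date: day("2018-02-15") }, { reg: "BBB222", date: day("2018-06-01") }]
);
console.log(JSON.stringify(report));
// [{"month":"2018-01","leads":2,"converted":1,"rate":0.5}]
```

Grouping invoices by `reg` in a `Map` is the in-memory equivalent of the join the question asks about: each lead then only looks at invoices for its own vehicle.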

So my problem is: how do I join the data together using Mongoose and Node.js?

So far, for each individual lead, I have tried finding any invoices within the 60-day period so that the lead qualifies as a conversion. This works, but my script stops after about 20 successful updates. At this point I think that making a separate invoice query per lead puts too heavy a load on MongoDB; millions of individual queries look like too much for it.
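A relational developer would normally reach for a join here. MongoDB's closest equivalent is the `$lookup` aggregation stage; the `let`/`pipeline` form (MongoDB 3.6+) can express the 60-day range condition, so the whole report runs as one server-side query instead of one query per lead. This is a technique swapped in for illustration, not something from the thread. The sketch only builds the pipeline as a plain object; the field names `reg` and `date` are hypothetical, and months are bucketed in UTC unless a `timezone` is passed to `$dateToString`.

```javascript
// Hypothetical field names: both collections store the vehicle registration
// in `reg` and the event timestamp (a BSON Date) in `date`.
const SIXTY_DAYS_MS = 60 * 24 * 60 * 60 * 1000;

// Builds an aggregation pipeline for db.leads that marks each lead as
// converted if a matching invoice exists in [lead.date, lead.date + 60 days],
// then groups the leads by the month the lead was generated.
function conversionPipeline() {
  return [
    {
      $lookup: {
        from: "invoices",
        let: { reg: "$reg", start: "$date" },
        pipeline: [
          {
            $match: {
              $expr: {
                $and: [
                  { $eq: ["$reg", "$$reg"] },
                  { $gte: ["$date", "$$start"] },
                  { $lte: ["$date", { $add: ["$$start", SIXTY_DAYS_MS] }] },
                ],
              },
            },
          },
          { $limit: 1 },            // one matching invoice is enough
          { $project: { _id: 1 } }, // keep the joined documents tiny
        ],
        as: "matches",
      },
    },
    {
      $group: {
        _id: { $dateToString: { format: "%Y-%m", date: "$date" } },
        leads: { $sum: 1 },
        converted: { $sum: { $cond: [{ $gt: [{ $size: "$matches" }, 0] }, 1, 0] } },
      },
    },
    { $project: { leads: 1, converted: 1, rate: { $divide: ["$converted", "$leads"] } } },
    { $sort: { _id: 1 } },
  ];
}

// In the mongo shell one would then run something like:
//   db.leads.aggregate(conversionPipeline())
console.log(conversionPipeline().length); // 4
```

Each lead still triggers an invoice lookup, but on the server and without a network round trip per lead. On recent servers (5.0+) range comparisons inside `$expr` in a `$lookup` sub-pipeline can use an index such as `{reg: 1, date: 1}` on `invoices`; on older versions it is worth confirming the plan with `explain()`.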

After hours of searching, I'm still not sure what I should be looking for.

Any help would be greatly appreciated.

Answer 1

Your approach should work without a problem. What helps me with large MongoDB instances and analysis on them, though, is running the queries directly in the Mongo shell rather than through Node. That way you avoid converting Mongo structures (e.g. cursors) into Node structures (e.g. arrays) and cut out a lot of overhead.

Also, make sure you have the right indexes set up. In a large database they can make a huge difference to performance.

What I would then do is something like this (treat it as a sketch rather than tested code):

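// mongo shell. Assumes both collections hold the vehicle registration in `reg`
// and a Date in `date` (hypothetical field names).
const SIXTY_DAYS_MS = 60 * 24 * 60 * 60 * 1000;
let converted = 0;
db.leads.find({}, {reg: 1, date: 1}).forEach(lead => {
    const end = new Date(lead.date.getTime() + SIXTY_DAYS_MS);
    // limit: 1 stops counting at the first invoice found in the window
    const hasInvoices = db.invoices.countDocuments(
        {reg: lead.reg, date: {$gte: lead.date, $lte: end}}, {limit: 1}) > 0;
    if (hasInvoices) converted++;
});
print(converted);

To speed things up, I'd use the following index for this case:

db.invoices.createIndex({reg: 1, date: 1});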
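If even one indexed query per lead is too slow, another relational-style option (not part of the answer above) is a merge join: read both collections once, each sorted by registration and then date, and advance two positions in step. The sketch below works on plain arrays so it can be tested without a database; with real data the inputs would be cursors such as `db.leads.find().sort({reg: 1, date: 1})`, ideally served by a `{reg: 1, date: 1}` index on each collection so the server does not have to sort 30M documents in memory. Field names are hypothetical, and it returns a single total.

```javascript
// Merge join over two inputs sorted by (reg, date) ascending. Field names are
// hypothetical; assumes ASCII registrations so JS string order matches the
// server's sort order.
const DAY_MS = 24 * 60 * 60 * 1000;

function countConversionsSorted(leads, invoices, windowDays = 60) {
  let j = 0; // position in invoices; only ever moves forward
  let converted = 0;
  for (const lead of leads) {
    const start = lead.date.getTime();
    // Skip invoices that sort before (lead.reg, lead.date). They cannot match
    // this lead or any later one, because leads are sorted the same way.
    while (
      j < invoices.length &&
      (invoices[j].reg < lead.reg ||
        (invoices[j].reg === lead.reg && invoices[j].date.getTime() < start))
    ) {
      j++;
    }
    // The first remaining invoice is the earliest candidate for this lead.
    const inv = invoices[j];
    if (inv && inv.reg === lead.reg && inv.date.getTime() <= start + windowDays * DAY_MS) {
      converted++;
    }
  }
  return converted;
}

// Made-up sample data, already sorted by (reg, date).
const day = s => new Date(s + "T00:00:00Z");
const sampleLeads = [
  { reg: "A", date: day("2018-01-10") },
  { reg: "A", date: day("2018-05-01") },
  { reg: "B", date: day("2018-01-20") },
];
const sampleInvoices = [
  { reg: "A", date: day("2018-01-05") },
  { reg: "A", date: day("2018-02-15") },
  { reg: "B", date: day("2018-06-01") },
];
console.log(countConversionsSorted(sampleLeads, sampleInvoices)); // 1
```

Because neither position ever moves backwards, the work is one pass over each sorted input rather than one query per lead.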

Answer 1 (accepted, score 0), 2018-06-28 22:10:32