
MongoDB complex query between two collections

I'm fairly new to Node and MongoDB; I'm a developer from a relational database background.

I have been asked to write a report that calculates the conversion rate from leads (for vehicle workshop bookings) to invoices. A lead counts as a conversion if an invoice was produced within 60 days of the lead being generated.
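The 60-day rule is easy to get subtly wrong when working with dates. Here is a minimal sketch of the rule as a hypothetical helper, `isConversion`. It assumes both dates are JavaScript `Date` objects and that the boundary is inclusive (exactly 60 days still counts). The question doesn't say which, so adjust if your business rule differs.

```javascript
// Hypothetical helper illustrating the conversion rule; the inclusive
// 60-day boundary is an assumption.
const DAY_MS = 24 * 60 * 60 * 1000;
const CONVERSION_WINDOW_DAYS = 60;

function isConversion(leadDate, invoiceDate) {
  // Work in milliseconds: adding 60 to a Date would not add 60 days.
  const elapsedMs = invoiceDate.getTime() - leadDate.getTime();
  // The invoice must not predate the lead and must fall within the window.
  return elapsedMs >= 0 && elapsedMs <= CONVERSION_WINDOW_DAYS * DAY_MS;
}

const lead = new Date("2020-01-01T00:00:00Z");
console.log(isConversion(lead, new Date("2020-03-01T00:00:00Z"))); // true (exactly 60 days)
console.log(isConversion(lead, new Date("2020-03-02T00:00:00Z"))); // false (61 days)
```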

Using MongoDB, Mongoose and Node.js, I have imported all of the data from flat files into two collections, leads and invoices. There are about 1M leads and about 30M invoices covering a five-year period, and the rates must be reported month by month. Both collections share the vehicle registration number as a common field.
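The monthly figure is the number of converted leads divided by all leads generated in that month. Below is a minimal in-memory sketch of that calculation. It assumes each lead and invoice is a plain object with a `reg` string and a `date` `Date`; these field names are hypothetical.

Instead of one query per lead, it groups invoice dates by registration once. It then binary-searches each lead's 60-day window. This is only a reference for checking results on a sample, since holding 30M invoices in Node memory is not realistic.

```javascript
// In-memory reference sketch (not a MongoDB query). Field names `reg` and
// `date` are assumed; months are taken from the lead date in UTC.
const WINDOW_MS = 60 * 24 * 60 * 60 * 1000; // 60 days, inclusive

// Index of the first timestamp in a sorted array that is >= target.
function lowerBound(sorted, target) {
  let lo = 0, hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sorted[mid] < target) lo = mid + 1; else hi = mid;
  }
  return lo;
}

function monthlyConversionRates(leads, invoices) {
  // Group invoice timestamps by vehicle registration, sorting each list once.
  const byReg = new Map();
  for (const inv of invoices) {
    if (!byReg.has(inv.reg)) byReg.set(inv.reg, []);
    byReg.get(inv.reg).push(inv.date.getTime());
  }
  for (const times of byReg.values()) times.sort((a, b) => a - b);

  // Tally leads and conversions per lead month ("YYYY-MM").
  const months = new Map();
  for (const lead of leads) {
    const start = lead.date.getTime();
    const times = byReg.get(lead.reg) || [];
    const i = lowerBound(times, start); // first invoice on/after the lead
    const converted = i < times.length && times[i] - start <= WINDOW_MS;
    const key = lead.date.toISOString().slice(0, 7);
    const m = months.get(key) || { leads: 0, converted: 0 };
    m.leads += 1;
    if (converted) m.converted += 1;
    months.set(key, m);
  }

  // Turn counts into rates, ordered by month.
  return [...months.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([month, m]) => ({ month, leads: m.leads, converted: m.converted, rate: m.converted / m.leads }));
}
```

Because only the first invoice on or after the lead date matters, one sorted list per vehicle is enough. Each lead then costs a binary search rather than a database round trip.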

My problem is: how do I join the two collections using Mongoose and Node.js?

So far, for each individual lead I query for any invoice within the following 60 days that would qualify the lead as a conversion. This works, but my script stops after about 20 successful updates. I suspect that issuing a separate invoice query for every lead puts too much load on MongoDB; millions of individual queries seems like too much.

After hours of searching, I'm still not sure what I should be looking for.

Any help would be greatly appreciated.

Your approach should work in principle. What helps me when analysing large MongoDB datasets is to run the queries directly in the mongo shell rather than through Node. That way you avoid converting MongoDB structures (e.g. cursors) into Node structures (e.g. arrays), which cuts out a lot of overhead.

Also, make sure you have the correct indexes set up. In large databases they can make a huge difference to performance.

What I would then do is something like the following, run in the mongo shell (treat it as a sketch):

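```javascript
// mongo shell (mongosh); assumes both collections store the vehicle registration as `reg` and `date` as a BSON Date
const SIXTY_DAYS_MS = 60 * 24 * 60 * 60 * 1000;
let converted = 0;
db.leads.find({}, { reg: 1, date: 1 }).forEach(lead => {
    const windowEnd = new Date(lead.date.getTime() + SIXTY_DAYS_MS);
    // One matching invoice is enough, so findOne rather than a full count.
    const invoice = db.invoices.findOne({ reg: lead.reg, date: { $gte: lead.date, $lte: windowEnd } });
    if (invoice) converted++;
});
print(converted);
```

To speed things up, I'd create a compound index on the join field followed by the date:

```javascript
db.invoices.createIndex({ reg: 1, date: -1 });
```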
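The loop above still issues one query per lead. An alternative is to let the server do the join and the monthly grouping in a single aggregation, using `$lookup` with a sub-pipeline (available since MongoDB 3.6). The sketch below only builds the pipeline array, so it can be inspected without a database.

It assumes both collections use `reg` and `date` fields, and the function name `buildConversionPipeline` is hypothetical. Whether the sub-pipeline's range conditions can use the `invoices` index depends on your MongoDB version, so check with `explain()`.

```javascript
// Sketch: builds an aggregation pipeline that joins leads to invoices on the
// (assumed) `reg` field and reports monthly conversion rates server-side.
const CONVERSION_WINDOW_MS = 60 * 24 * 60 * 60 * 1000; // 60 days in ms

function buildConversionPipeline(windowMs = CONVERSION_WINDOW_MS) {
  return [
    {
      // Attach at most one qualifying invoice to each lead.
      $lookup: {
        from: "invoices",
        let: { leadReg: "$reg", leadDate: "$date" },
        pipeline: [
          {
            $match: {
              $expr: {
                $and: [
                  { $eq: ["$reg", "$$leadReg"] },
                  { $gte: ["$date", "$$leadDate"] },
                  // $add on a Date plus a number adds milliseconds.
                  { $lte: ["$date", { $add: ["$$leadDate", windowMs] }] },
                ],
              },
            },
          },
          { $limit: 1 }, // one invoice is enough to count a conversion
          { $project: { _id: 1 } },
        ],
        as: "matched",
      },
    },
    {
      // Bucket leads by the month (UTC) in which they were generated.
      $group: {
        _id: { $dateToString: { format: "%Y-%m", date: "$date" } },
        leads: { $sum: 1 },
        converted: { $sum: { $cond: [{ $gt: [{ $size: "$matched" }, 0] }, 1, 0] } },
      },
    },
    { $project: { _id: 0, month: "$_id", leads: 1, converted: 1, rate: { $divide: ["$converted", "$leads"] } } },
    { $sort: { month: 1 } },
  ];
}
```

In mongosh this could be run as `db.leads.aggregate(buildConversionPipeline(), { allowDiskUse: true })`. With Mongoose, the same array can be passed to `Model.aggregate()` followed by `.allowDiskUse(true)`. Disk use is enabled because grouping 1M leads may exceed the in-memory limit for a stage.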

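Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.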

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM