mongodb complex query between two collections

Fairly new to node and mongo. I'm a developer from a relational db background.

I have been asked to write a report to calculate the conversion rate from leads relating to vehicle workshop bookings to invoices. A conversion is where an invoice was produced within 60 days of a lead being generated.

So I have managed with mongodb, mongoose and nodejs to import all of the data from flat files into two collections, leads and invoices. I have 1M leads and about 30M invoices over a 5 year period and the rates are to be produced on a month by month basis. All data has vehicle reg in common.

So my problem is how do I join the data together with mongoose and nodejs?

So far I have attempted, for any single lead, to find any invoices within a 60-day period in order for the lead to qualify as a conversion. This works, but my script stops after about 20 or so successful updates. I think my script, which makes an individual invoice query per lead, puts too heavy a load on mongodb, and making millions of individual queries is simply too much for it.

After hours of browsing, I'm not sure what I should be looking for!?

Any help would be greatly appreciated.

Your approach should work in principle. What helps me with large MongoDB instances and analysis on them is to run queries directly in Mongo, not through Node. That way you avoid converting Mongo structures (e.g. cursors) into Node structures (e.g. arrays), which removes a lot of overhead.

Also, make sure you have the correct indexes set up. That can make a HUGE difference in performance on big databases.

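What I would do then is something like this in the mongo shell (treat it as a sketch: it assumes date is stored as a BSON Date and that each invoice carries a leadId):

// 60 days expressed in milliseconds, since JavaScript Dates count in ms
const SIXTY_DAYS_MS = 60 * 24 * 60 * 60 * 1000;
let converted = 0;
db.leads.find({}, {id: 1, date: 1}).forEach(lead => {
    // End of the 60-day window that starts at the lead date
    const windowEnd = new Date(lead.date.getTime() + SIXTY_DAYS_MS);
    // Count invoices for this lead inside the window; one match is enough
    const invoiceCount = db.invoices.countDocuments(
        {leadId: lead.id, date: {$gte: lead.date, $lte: windowEnd}}, {limit: 1});
    // Only count the lead as converted if an invoice was found
    if (invoiceCount > 0) converted++;
});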

To speed things up, I'd use the following index for this case:

db.invoices.createIndex({leadId: 1, date: -1});
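
If the loop above is still too slow, the per-lead check can also be pushed into a single server-side aggregation, which also gives the month-by-month breakdown the question asks for. The following is only a sketch under some assumptions that are not in the original post: both collections are assumed to have a `reg` field (the vehicle registration) and a `date` field stored as a BSON Date, and the server is assumed to be MongoDB 3.6 or later, which is where `$lookup` with `let` and `pipeline` was introduced. The function name `buildConversionPipeline` is made up for illustration.

```javascript
// Builds an aggregation pipeline that, for every lead, looks for at least one
// invoice with the same vehicle reg dated within `windowDays` after the lead,
// then groups the leads by month and computes a conversion rate.
// ASSUMPTION: field names `reg` and `date` are hypothetical; adjust to your schema.
function buildConversionPipeline(windowDays) {
  // Adding a number to a Date with $add is interpreted as milliseconds
  const windowMs = windowDays * 24 * 60 * 60 * 1000;
  return [
    {
      $lookup: {
        from: "invoices",
        let: { reg: "$reg", leadDate: "$date" },
        pipeline: [
          {
            $match: {
              $expr: {
                $and: [
                  { $eq: ["$reg", "$$reg"] },
                  { $gte: ["$date", "$$leadDate"] },
                  { $lte: ["$date", { $add: ["$$leadDate", windowMs] }] }
                ]
              }
            }
          },
          { $limit: 1 },               // one matching invoice is enough
          { $project: { _id: 1 } }     // keep the joined documents small
        ],
        as: "matches"
      }
    },
    {
      $group: {
        _id: { $dateToString: { format: "%Y-%m", date: "$date" } },
        leads: { $sum: 1 },
        converted: {
          $sum: { $cond: [{ $gt: [{ $size: "$matches" }, 0] }, 1, 0] }
        }
      }
    },
    {
      $project: {
        _id: 0,
        month: "$_id",
        leads: 1,
        converted: 1,
        rate: { $divide: ["$converted", "$leads"] }
      }
    },
    { $sort: { month: 1 } }
  ];
}

// Usage in the mongo shell (not executed here):
//   db.leads.aggregate(buildConversionPipeline(60), { allowDiskUse: true });
```

With 30M invoices this will only be practical if the invoices collection has an index that covers the join, for example on the assumed `reg` and `date` fields. It is worth trying it on a single month of leads first (by putting a `$match` on `date` in front of the `$lookup`) before running it over the full five years.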
