
MongoDB: add fields from one collection to another based on a few conditions, for a large number of documents

I ran into the situation below, where I need to update a large number of documents very frequently.

I have collections like the ones below:

coll1
{
  "identification_id" : String,
  "name" : String,
  "mobile_number" : Number,
  "location" : String,
  "user_properties" : [Mixed types],
  "profile_url" : String
}

coll2
{
  "identification_id": String,
  "user_id" : String,
  "name" : String,
  "mobile_number" : Number,
  "location" : String,
  "user_properties" : String,
  "profile_url": String,
  "qualified_user" : String,
  "user_interest_stage" :Number,
  "source" : String,
  "fb_id" : String,
  "comments":String
}

updated coll1
{
  "identification_id": String,
  "name" : String,
  "mobile_number" : Number,
  "location" : String,
  "user_properties" : String,
  "profile_url": String,
  "qualified_user" : String,
  "user_interest_stage" :Number,
  "source" : String,
  "fb_id" : String,
  "comments":String
}

Given coll1 and coll2 above, documents are inserted in the following scenarios:

  1. If a user from coll1 qualifies, based on some scenarios in which he can show interest in products, I create a record in coll2.
  2. I can also manually create a new record in coll2 from API information.
  3. A coll1 record is identified in coll2 by user_id.
  4. There can be multiple records in coll2 for a single record in coll1.

Now, for various reasons, we are merging these collections into one collection, coll1. We have decided to update each qualified visitor based on the key 'qualified_user' and to update the corresponding user fields in coll1.

I have written a script, using Node JS and mongoose, which fetches documents from coll1, checks coll2 for a qualified_user, and updates coll1 based on the scenarios below.

  1. If there is no qualified user, update the document with the default values of an unqualified user.
  2. If there is one qualified user, copy the qualification fields from coll2 and update the document in coll1.
  3. If there are multiple qualified users, copy the first document and update coll1 with it. For the rest of the documents in coll2, create a new document in coll1.
  4. After processing all documents from coll1, process the coll2 documents that were qualified from APIs and create a new document in coll1 for each.
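Scenarios 1 to 3 can be sketched as a small pure function that turns one coll1 document and its qualified coll2 documents into bulkWrite-style operations. This is only an illustration, not the original script: the names `buildOpsForUser`, `pickQualification`, and `UNQUALIFIED_DEFAULTS` are hypothetical, the default values are placeholders (the post does not list them), and dropping `_id` and `user_id` from the extra coll2 documents is an assumption.

```javascript
// Fields that exist only in coll2 and are carried over into coll1.
const QUALIFICATION_FIELDS = [
  'qualified_user', 'user_interest_stage', 'source', 'fb_id', 'comments',
];

// Placeholder defaults for an unqualified user (assumed values).
const UNQUALIFIED_DEFAULTS = {
  qualified_user: 'no', user_interest_stage: 0, source: '', fb_id: '', comments: '',
};

// Copy only the qualification fields that are present on a coll2 document.
function pickQualification(doc) {
  const out = {};
  for (const field of QUALIFICATION_FIELDS) {
    if (doc[field] !== undefined) out[field] = doc[field];
  }
  return out;
}

// Build bulkWrite operations for one coll1 document.
// qualifiedDocs: the coll2 documents already matched to this user.
function buildOpsForUser(coll1Doc, qualifiedDocs) {
  const filter = { _id: coll1Doc._id };

  // Scenario 1: no qualified user, so apply the defaults.
  if (qualifiedDocs.length === 0) {
    return [{ updateOne: { filter, update: { $set: { ...UNQUALIFIED_DEFAULTS } } } }];
  }

  // Scenarios 2 and 3: the first qualified document updates the coll1 document.
  const [first, ...rest] = qualifiedDocs;
  const ops = [{ updateOne: { filter, update: { $set: pickQualification(first) } } }];

  // Scenario 3: every further qualified document becomes a new coll1 document.
  for (const extra of rest) {
    const { _id, user_id, ...document } = extra; // assumed: coll1 does not keep these
    ops.push({ insertOne: { document } });
  }
  return ops;
}
```

Keeping this step free of database calls makes it easy to test on its own, and the returned operations can be collected and sent to coll1 in batches.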

When I run this script, I get the error below.

<--- JS stacktrace --->

==== JS stack trace =========================================

There are 1L (1 lakh, i.e. 100,000) documents in coll1. I ran into this situation because of the large number of documents being processed. So I used skip and limit to process all the documents, but it took 1 hour to get through them.

Is there a better way to handle this type of db update for a large number of documents?

You're trying to hold too many documents in memory at once, and that makes the script run out of memory.

You have two easy options:

  1. Use Mongo's cursor to iterate over the results instead of fetching them all at once.
  2. Use the --max-old-space-size flag when running your script. With it you can manually set the amount of memory (in megabytes) the script has access to, like so: node --max-old-space-size=4096 script.js
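As a rough sketch of option 1, the loop below streams coll1 through a Mongoose query cursor and writes back in batches with `bulkWrite`, so only about one batch of operations sits in memory at a time. Several things here are assumptions rather than details from the question: `Coll1` and `Coll2` are Mongoose models passed in by the caller, `buildOps` is a hypothetical callback that returns the bulkWrite operations for one user, coll2's `user_id` is taken to hold the coll1 `_id` as a string, and `'yes'` is a placeholder for whatever value marks a qualified user.

```javascript
// Stream coll1 with a cursor and flush writes in fixed-size batches.
// Coll1, Coll2: Mongoose models (assumed). buildOps(doc, qualifiedDocs): hypothetical
// callback returning an array of bulkWrite operations for one coll1 document.
async function mergeWithCursor(Coll1, Coll2, buildOps, batchSize = 500) {
  let ops = [];
  let processed = 0;

  // .lean() yields plain objects; .cursor() streams documents one at a time
  // instead of loading the whole result set.
  const cursor = Coll1.find({}).lean().cursor();

  for await (const doc of cursor) {
    // Assumption: coll2.user_id stores the coll1 _id as a string,
    // and 'yes' marks a qualified user.
    const qualified = await Coll2.find({
      user_id: String(doc._id),
      qualified_user: 'yes',
    }).lean();

    ops.push(...buildOps(doc, qualified));
    processed += 1;

    // Flush once enough operations have accumulated.
    if (ops.length >= batchSize) {
      await Coll1.bulkWrite(ops, { ordered: false });
      ops = [];
    }
  }

  // Flush whatever is left after the cursor is exhausted.
  if (ops.length > 0) {
    await Coll1.bulkWrite(ops, { ordered: false });
  }
  return processed;
}
```

This still issues one coll2 query per coll1 document, so an index on coll2's `user_id` would likely matter for run time. The batch size of 500 is an arbitrary starting point, not a tuned value.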

With that said, neither of these is optimal, and assuming your scale keeps increasing, both will eventually stop working. I personally recommend rethinking the data structure. MongoDB, being a schemaless document database, does not handle data duplication well. This means you want to keep all the data in one collection and then just update certain fields under certain conditions.

