
MongoDB and Mongoose: Nested Array of Document Reference IDs

I have been studying MongoDB and came across an interesting pattern for storing relationships between documents. In this pattern, the parent document contains an array of ids that reference the child documents, as follows:

//Parent Schema
import * as mongoose from 'mongoose';

export interface Post extends mongoose.Document {
  content: string;
  dateCreated: string;
  comments: Comment[];
}

let postSchema = new mongoose.Schema({
  content: {
    type: String,
    required: true
  },
  dateCreated: {
    type: String,
    required: true
  },
  comments: [{ type: mongoose.Schema.Types.ObjectId, ref: 'Comment' }] //nested array of child reference ids
});

And the child being referenced:

//Child Schema
export interface Comment extends mongoose.Document {
  content: string;
  dateCreated: string;
}

let commentSchema = new mongoose.Schema({
  content: {
    type: String,
    required: true
  },
  dateCreated: {
    type: String,
    required: true
  }
});

This all seems fine until I send a request from the front end to create a new comment. The request has to contain the Post _id (to update the post) and the new Comment, which is the same payload one would send when using a relational database. The issue appears when it is time to write the new Comment to the database. Instead of one db write, as in a relational database, I have to do 2 writes AND 1 read: first a write to insert the new Comment and retrieve its _id, then a read to retrieve the Post by the Post _id sent with the request so I can push the new Comment _id onto the nested reference array, and finally a second write to save the updated Post back to the database.
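To make the cost concrete, the three steps described above can be traced with a small in-memory stand-in. This is only a sketch: the `commentsById` and `postsById` objects play the role of the two collections, ids are plain strings rather than ObjectIds, and the `ops` counter and the function name are hypothetical helpers, not Mongoose APIs.

```typescript
interface CommentDoc { _id: string; content: string; dateCreated: string; }
interface PostDoc { _id: string; content: string; dateCreated: string; comments: string[]; }

// Hypothetical in-memory "collections" keyed by _id.
const commentsById: Record<string, CommentDoc> = {};
const postsById: Record<string, PostDoc> = {
  p1: { _id: "p1", content: "Hello", dateCreated: "2020-01-01", comments: [] },
};

// Counts the simulated database round trips.
const ops = { reads: 0, writes: 0 };

function addCommentThreeSteps(postId: string, content: string): CommentDoc {
  // Write 1: insert the comment and obtain its _id.
  const id = "c" + (Object.keys(commentsById).length + 1);
  const comment: CommentDoc = { _id: id, content: content, dateCreated: new Date().toISOString() };
  commentsById[id] = comment;
  ops.writes++;

  // Read: load the post named in the request.
  const post = postsById[postId];
  ops.reads++;
  if (!post) throw new Error("post not found: " + postId);

  // Write 2: push the new id and save the post again.
  post.comments.push(comment._id);
  postsById[postId] = post;
  ops.writes++;

  return comment;
}

const created = addCommentThreeSteps("p1", "First!");
console.log(ops, postsById["p1"].comments);
```

The counter ends up with one read and two writes for a single new comment, which is the overhead the question is about.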

This seems extremely inefficient. My question is two-fold:

  1. Is there a better/more efficient way to handle this relationship pattern (parent containing an array of child reference ids)?

  2. If not, what would be the benefit of using this pattern as opposed to A) storing the parent _id in a property on the child similar to a traditional foreign key, or B) taking advantage of MongoDB documents and storing an array of the Comments as opposed to an array of reference ids to the Comments.
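For comparison, the three layouts mentioned above could be written down as plain TypeScript shapes. This is a rough sketch rather than schema code: ids are strings for brevity where Mongoose would use ObjectId, and the `postId` field name for variant A is an assumption.

```typescript
// Pattern from the question: the parent holds an array of child ids.
interface PostWithRefs { _id: string; content: string; comments: string[]; }
interface PlainComment { _id: string; content: string; }

// Variant A: the child holds the parent's id, like a foreign key.
// `postId` is a hypothetical field name.
interface PostA { _id: string; content: string; }
interface CommentA { _id: string; content: string; postId: string; }

// Variant B: the comments themselves are embedded in the parent document.
interface PostB { _id: string; content: string; comments: { content: string }[]; }

// The same post with one comment, expressed in each layout.
const refsPost: PostWithRefs = { _id: "p1", content: "Hello", comments: ["c1"] };
const refsComment: PlainComment = { _id: "c1", content: "First!" };

const variantAPost: PostA = { _id: "p1", content: "Hello" };
const variantAComment: CommentA = { _id: "c1", content: "First!", postId: "p1" };

const variantBPost: PostB = { _id: "p1", content: "Hello", comments: [{ content: "First!" }] };
```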

Thanks in advance for your insight!

Regarding your first question:

You specifically ask for a better way to work with child-ids that are stored in the parent. I'm pretty sure that there is no better way to deal with this, if it has to be this pattern.

But this problem also exists in relational databases. If you want to save your post in a relational database (using that pattern), you also have to first create the comment, get its ID and then update the post. Granted, you can send all these tasks in a single request, which is probably more efficient than using mongoose, but the type of work that needs to be done is the same.
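The create-then-update sequence described above might look roughly like the following. This is a sketch and not code from the original answer. It assumes the update is done with MongoDB's `$push` operator, which appends to the array on the server so the post does not have to be loaded first. The `CommentModel` and `PostModel` interfaces are hypothetical minimal stand-ins for the `create` and `updateOne` methods that Mongoose models provide, with ids typed as strings for brevity.

```typescript
// Hypothetical structural stand-ins so the sketch compiles without Mongoose.
interface CommentModel {
  create(doc: { content: string; dateCreated: string }): Promise<{ _id: string }>;
}
interface PostModel {
  updateOne(
    filter: { _id: string },
    update: { $push: { comments: string } }
  ): Promise<unknown>;
}

async function addCommentWithPush(
  Comment: CommentModel,
  Post: PostModel,
  postId: string,
  content: string
): Promise<string> {
  // Write 1: insert the comment and get its _id back.
  const comment = await Comment.create({
    content: content,
    dateCreated: new Date().toISOString(),
  });

  // Write 2: append the id to the post's array on the server side.
  // No separate read of the post is issued by the application.
  await Post.updateOne({ _id: postId }, { $push: { comments: comment._id } });

  return comment._id;
}
```

That still leaves two writes, as the answer says, but the explicit read in between may not be needed.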

Regarding your second question:

The benefit over variant A is that you can, for example, get the post and instantly know how many comments it has, without asking MongoDB to go through possibly hundreds of comment documents.
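As a small illustration of that difference, here is a sketch with hypothetical in-memory data (plain string ids, and a `postId` field assumed for variant A):

```typescript
// Reference-array pattern: the post carries the ids of its comments.
const samplePost = { _id: "p1", content: "Hello", comments: ["c1", "c2", "c3"] };

// Variant A: each comment points at its post instead.
const variantAComments = [
  { _id: "c1", postId: "p1" },
  { _id: "c2", postId: "p1" },
  { _id: "c3", postId: "p1" },
  { _id: "c4", postId: "p2" },
];

// With the reference array, the count is available as soon as the post is loaded.
const countFromPost = samplePost.comments.length;

// With variant A, counting needs a second pass over the comments.
const countFromScan = variantAComments.filter((c) => c.postId === samplePost._id).length;

console.log(countFromPost, countFromScan);
```

Both approaches give the same number; the difference is whether a second lookup against the comments is required to get it.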

The benefit over variant B is that you can store far more references to comments in a single document (a single post) than whole comments, because of MongoDB's 16MB document size limit.
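A back-of-the-envelope estimate shows the scale of that difference. The sketch below relies on the BSON layout, where an array element costs one type byte, the array index as a NUL-terminated string key, and 12 bytes for the ObjectId itself. It ignores the post's other fields, and the 1024-byte average comment size is purely an assumption for illustration.

```typescript
// MongoDB's maximum BSON document size (16 MiB).
const MAX_DOC_BYTES = 16 * 1024 * 1024;

// How many ObjectIds fit in a BSON array of at most `limit` bytes.
function maxObjectIdReferences(limit: number): number {
  let bytes = 4 + 1; // int32 length prefix + trailing terminator of the array
  let n = 0;         // number of elements that fit so far; also the next index
  while (true) {
    // type byte + index key digits + NUL terminator + 12-byte ObjectId
    const next = bytes + 1 + String(n).length + 1 + 12;
    if (next > limit) return n;
    bytes = next;
    n++;
  }
}

// Assumed average size of one embedded comment, in bytes (hypothetical).
const ASSUMED_COMMENT_BYTES = 1024;

const approxReferences = maxObjectIdReferences(MAX_DOC_BYTES);
const approxEmbedded = Math.floor(MAX_DOC_BYTES / ASSUMED_COMMENT_BYTES);

console.log({ approxReferences, approxEmbedded });
```

Under these assumptions the reference array holds several hundred thousand ids, while the embedded variant tops out in the low tens of thousands of comments.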


The downside, however, is the one you mentioned: it is inefficient to maintain that structure. I take it that this is only an example to showcase the scenario, so here is what I would do: I would decide on a case-by-case basis what to use.

  • If the document will be read a lot and not written to much, AND it is unlikely to grow larger than 16MB: embed the sub-document. This way you can get all the data in a single query.

  • If you need to reference the document from multiple other documents AND your data really must be consistent, then you have no choice but to reference it.

  • If you need to reference the document from multiple other documents BUT data consistency is not that important AND the restrictions from the first bullet point apply, then embed the sub-documents, and write code to keep your data consistent.

  • If you need to reference the document from multiple other documents, and they are written to a lot, but not read that often, you're probably better off referencing them, as this is easier to code, because you don't need to write code to sync duplicate data.
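The four rules of thumb above can be condensed into a small decision function. This is only one reading of the list: the trait names are hypothetical, and the fallback for a document that is neither shared nor read-heavy is an assumption, since the list does not cover that case.

```typescript
type Strategy = "embed" | "reference" | "embed-and-sync";

interface Traits {
  sharedByMultipleDocs: boolean; // referenced from several other documents
  readHeavy: boolean;            // read a lot, written to rarely
  mayExceed16MB: boolean;        // the parent could outgrow the document limit
  strictConsistency: boolean;    // the data really must be consistent
}

function chooseStrategy(t: Traits): Strategy {
  const embeddable = t.readHeavy && !t.mayExceed16MB;

  if (!t.sharedByMultipleDocs) {
    // Bullet 1: read-heavy and small enough, so embed.
    // Anything else falls back to referencing (assumption, not from the list).
    return embeddable ? "embed" : "reference";
  }
  // Bullet 2: shared and must stay consistent, so reference.
  if (t.strictConsistency) return "reference";
  // Bullet 3: shared, relaxed consistency, and embeddable, so embed and sync in code.
  if (embeddable) return "embed-and-sync";
  // Bullet 4: shared and written to a lot, so reference.
  return "reference";
}

// Example: comments that belong to exactly one post but may pile up without bound.
console.log(chooseStrategy({
  sharedByMultipleDocs: false,
  readHeavy: true,
  mayExceed16MB: true,
  strictConsistency: false,
}));
```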

In this specific case (post/comment), referencing the parent from the child (letting the child know the parent's _id) is probably a good idea, because it's easier to maintain than the other way around, and the post document might grow larger than 16MB if the comments were embedded directly. If I knew for sure that the document would NOT grow larger than 16MB, embedding them would be better, because it's faster to query the data that way.
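A minimal sketch of that child-references-parent layout, again with an in-memory array standing in for the comments collection. The `postId` field, the function names and the write counter are hypothetical; in a real collection, an index on the parent id field would probably be wanted so that the lookup does not scan every comment.

```typescript
interface CommentWithParent {
  _id: string;
  postId: string; // the parent's _id, similar to a foreign key (hypothetical field name)
  content: string;
  dateCreated: string;
}

// Hypothetical in-memory "comments collection" and write counter.
const commentStore: CommentWithParent[] = [];
let writeCount = 0;

function addCommentToPost(postId: string, content: string): CommentWithParent {
  // The only write: the post document itself is never touched.
  const comment: CommentWithParent = {
    _id: "c" + (commentStore.length + 1),
    postId: postId,
    content: content,
    dateCreated: new Date().toISOString(),
  };
  commentStore.push(comment);
  writeCount++;
  return comment;
}

function commentsForPost(postId: string): CommentWithParent[] {
  // Fetch all comments belonging to one post by matching the parent id.
  return commentStore.filter((c) => c.postId === postId);
}

addCommentToPost("p1", "First!");
addCommentToPost("p1", "Second!");
addCommentToPost("p2", "Elsewhere");
console.log(writeCount, commentsForPost("p1").length);
```

Each new comment costs a single write here, at the price of a second lookup whenever a post's comments are needed.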
