MongoDB data integrity

I'm new to MongoDB, and the hardest thing for me to understand is how to ensure data integrity.

I've got two collections Post -> Comment (One to many).

Is there a way to store the number of comments for each post without using a two-phase commit?

Post {
    _id,
    text,
    commentsNumber
}

Comment {
    _id,
    text,
    postId
}

When a comment is added or removed, commentsNumber has to be incremented or decremented, and this takes two requests to two different collections. Since a write operation in MongoDB is atomic only at the level of a single document, there is a chance that the comment will be added or removed but commentsNumber won't be updated, or vice versa.

What are the techniques to guarantee integrity?

  • Run a script periodically to update commentsNumber?
  • Not store commentsNumber at all?

I doubt there is anything that can guarantee data integrity apart from the two-phase commit you mentioned, at least not until the announced v4 is released.

There are a few things you can do to minimise the chance of getting wrong counts. Combine the insert and the update into a single bulk operation. Since it is a single request, this reduces the chance that one of the operations fails on the application side.

Then check that nInserted === 1 and nModified === 1. Otherwise, retry or enqueue a recalculation job for the given post id.
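The check-and-enqueue idea can be sketched roughly as follows. This is only an illustration under assumptions: `posts` and `comments` are taken to be PyMongo-style collection handles, `recalc_queue` is a hypothetical stand-in for whatever job queue you use, and the two writes are issued as separate calls rather than as one bulk request. In PyMongo the counterparts of nInserted and nModified are `inserted_id` and `modified_count`.

```python
def add_comment(posts, comments, recalc_queue, post_id, text):
    """Insert a comment and bump the counter; flag the post if the counter write is in doubt.

    posts / comments: assumed PyMongo-style collections (insert_one, update_one).
    recalc_queue: hypothetical list-like queue of post ids awaiting a recount.
    """
    # If the insert raises, nothing was written and the counter is left untouched.
    inserted = comments.insert_one({"postId": post_id, "text": text})

    try:
        updated = posts.update_one(
            {"_id": post_id},
            {"$inc": {"commentsNumber": 1}},
        )
        counter_ok = updated.modified_count == 1
    except Exception:
        # In real code, narrow this to pymongo.errors.PyMongoError.
        counter_ok = False

    if not counter_ok:
        # The comment exists but the counter may be stale: schedule a recount.
        recalc_queue.append(post_id)

    return inserted.inserted_id
```

Enqueueing a recount instead of blindly repeating the `$inc` avoids counting the same comment twice when the first attempt actually succeeded on the server.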

For retries it is essential to have retryable writes enabled, because you are going to use $inc on posts, which is far from an idempotent operation.

Another option is a transactionless approach, a kind of combination of "Run script every period of time to update commentsNumber" and "Not to store commentsNumber at all". You will need to keep the timestamp of the last recalculation job and count the new comments added since that time.
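A minimal sketch of that approach, assuming two fields that are not in the original schema: a hypothetical `createdAt` timestamp on each comment and a hypothetical `countedAt` timestamp on each post recording when commentsNumber was last recalculated. `posts` and `comments` are again assumed to be PyMongo-style collections.

```python
def recalculate(posts, comments, post_id, now):
    """Periodic job: recount all comments up to `now` and store the snapshot."""
    total = comments.count_documents(
        {"postId": post_id, "createdAt": {"$lte": now}}
    )
    # Single-document update, so the count and its timestamp change together.
    posts.update_one(
        {"_id": post_id},
        {"$set": {"commentsNumber": total, "countedAt": now}},
    )
    return total


def current_count(posts, comments, post_id):
    """Read path: stored snapshot plus comments added since the snapshot."""
    post = posts.find_one({"_id": post_id})
    recent = comments.count_documents(
        {"postId": post_id, "createdAt": {"$gt": post["countedAt"]}}
    )
    return post["commentsNumber"] + recent
```

Because the job recounts from scratch, comments removed since the previous run are corrected at the next run; between runs, the read path only accounts for additions.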

Since you mentioned that embedding comments within the Post is not a viable option for your use case, and you don't want to go with a two-phase commit, I can think of the options below:

  1. Create a secondary index on the postId attribute of the Comment collection, then use the count(...) function filtered by postId on the Comment collection.

  2. Have a map-reduce job that stores the commentCount and postId in a new collection every time a Comment document is added.

With either option you would not need to store commentsNumber in the Post document. One thing to note: since the count is not part of the Post document, reading it requires an additional query to MongoDB.
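The first option might look something like this. It is a sketch that assumes `comments` is a PyMongo-style collection; `create_index` and `count_documents` are the PyMongo calls, and the helper names themselves are only illustrative.

```python
def ensure_post_index(comments):
    """Create a secondary index on postId so per-post counts avoid a collection scan."""
    # create_index is a no-op if an identical index already exists.
    return comments.create_index("postId")


def count_comments(comments, post_id):
    """Count comments for one post on demand instead of storing commentsNumber."""
    return comments.count_documents({"postId": post_id})
```

The count is computed on every read, so it can never drift from the actual comments, at the cost of the extra query noted above.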
