
MongoDB update guarantee using w=0

I have a large collection with more than half a million documents, which I need to update continuously. My first approach was to use w=1 to get an acknowledged write result, but this causes a lot of delay.

collection.update(
    {'_id': _id},
    {'$set': data},
    w=1
)

So I decided to use w=0 in my update method, and performance became significantly faster.

Given my past bitter experience with MongoDB, I'm not sure whether all updates are guaranteed when w=0. My question: are updates guaranteed to be applied when using w=0?

Edit: Also, I would like to know how it works. Does it create an internal queue and perform the updates asynchronously, one by one? Using mongostat, I saw that some updates were still being processed even after the Python script had quit. Or is the update instant?

Edit 2: According to Sammaye's answer (link), any error can cause a silent failure. But what happens under a heavy load of updates? Do some updates fail then?

No. A write with w=0 can fail; it is only unacknowledged:

http://docs.mongodb.org/manual/core/write-concern/#unacknowledged

Unacknowledged is similar to errors ignored; however, drivers will attempt to receive and handle network errors when possible.

This means that the write can fail silently within MongoDB itself.

It is not reliable if you specifically need a guarantee. At the end of the day, if you want to touch the database and get an acknowledgment from it, you must wait: laws of physics.

Does w:0 guarantee an update?

As Sammaye has written: No, since there may be a window in which the update has been applied only to the in-memory data and has not yet been written to the journal. If an outage occurs during this window, your update may be lost. Depending on the configuration, the window is somewhere between 10 ms (with j:1 and the journal and the data files living on separate block devices) and 100 ms (the default).

Please keep in mind that illegal updates (such as changing the _id of a document) will also fail silently with w:0.
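
The difference between an unacknowledged and an acknowledged update can be sketched in PyMongo 3 or later, where the write concern is set on the collection handle instead of being passed as a `w=` keyword. This is only a minimal sketch, not the asker's code: `update_with_concern` and `demo` are hypothetical names, and the connection string, database name and collection name are placeholders.

```python
def update_with_concern(collection, _id, data, write_concern):
    """Run one $set update under the given write concern.

    Returns the number of matched documents for an acknowledged write,
    or None for an unacknowledged (w=0) write, where the server sends
    no reply and the outcome is unknown to the client.
    """
    coll = collection.with_options(write_concern=write_concern)
    result = coll.update_one({'_id': _id}, {'$set': data})
    if not result.acknowledged:
        return None  # w=0: nothing to inspect, failures stay silent
    return result.matched_count


def demo():
    # Assumption: PyMongo 3+ is installed and a mongod listens on localhost.
    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    # Placeholder database and collection names.
    collection = MongoClient('mongodb://localhost:27017')['test']['docs']
    fast = update_with_concern(collection, 1, {'x': 1}, WriteConcern(w=0))
    safe = update_with_concern(collection, 1, {'x': 1}, WriteConcern(w=1, j=True))
    print(fast, safe)
```

With an acknowledged write, a return value of 0 shows that no document matched; with w=0 there is no such signal at all.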

How does the update work with w:0 ?

Assuming there are no network errors, with w:0 the driver returns as soon as it has sent the operation to the mongod/mongos instance. But let's look a bit further to give you an idea of what happens under the hood.

Next, the update is processed by the query optimizer and applied to the in-memory data set. Once the operation has been applied successfully, a write with write concern w:1 would return at this point. The applied operations are synced to the journal every commitIntervalMs, an interval that is divided by 3 with write concern j:1. If you have a write concern of {j:1}, the driver returns after the operations have been stored in the journal successfully. Note that there are still edge cases in which data that made it to the journal won't be applied to replica set members if a very "well"-timed outage occurs at that moment.

By default, every syncPeriodSecs, the data from the journal is applied to the actual data files.
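
As a rough worked example of the journal window described above, the arithmetic can be written out. The base intervals used here (100 ms when the journal and data files share a block device, 30 ms when they are on separate devices) are an assumption based on the documented defaults of the older MMAPv1 storage engine, and `journal_window_ms` is a hypothetical helper, not part of any driver.

```python
def journal_window_ms(separate_journal_device=False, j=False, commit_interval_ms=None):
    """Approximate worst-case time (ms) an applied update waits for the journal."""
    if commit_interval_ms is None:
        # Assumed MMAPv1 defaults; override commit_interval_ms for your setup.
        commit_interval_ms = 30 if separate_journal_device else 100
    # A pending j:1 write divides the commit interval by 3.
    return commit_interval_ms / 3 if j else commit_interval_ms


print(journal_window_ms())                                       # 100
print(journal_window_ms(separate_journal_device=True, j=True))   # 10.0
```

These two results correspond to the 100 ms and 10 ms bounds mentioned in the answer.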

Regarding what you saw in mongostat: its granularity isn't very high, so you might well see operations that took place in the past. As discussed, the update to the in-memory data isn't instant, since the update first has to pass through the query optimizer.

Will heavy load make updates silently fail with w:0 ?

In general, it is safe to say "No." And here is why:

For each connection, a certain amount of RAM is allocated. If the load is so high that mongod can't allocate any further RAM, there will be a connection error, which is dealt with regardless of the write concern, except for unacknowledged writes.

Furthermore, applying updates to the in-memory data is extremely fast, most likely still faster than the updates come in if we are talking about load peaks. If mongod is totally overloaded (e.g. 150k updates a second on a standalone mongod with spinning disks), problems might occur, of course, though even that is usually mitigated from a durability point of view by the underlying OS.

However, updates may still silently disappear if the write concern is w:0,j:0 and an outage happens before the update has been synced to the journal.

Notes:

  1. The optimal balance between maximum performance and minimal guaranteed durability is a write concern of j:1. With a proper setup, you can reduce the latency to slightly over 10 ms.
  2. To further reduce the latency per update, it might be worth having a look at bulk write operations, if those apply to your use case. In my experience, they do more often than not. Please read and try before dismissing the idea.
  3. Doing write operations with w:0,j:0 is highly discouraged if you expect any guarantee of data durability. Use at your own risk. This write concern is only meant for "cheap" data, which is easy to reobtain or where the need for speed exceeds the need for durability. Collecting real-time weather data on a large scale would be an example: the system still works even if one or two data points are missing here and there. For most applications, durability is a concern. Conclusion: use at least w:1,j:1 for durable writes.
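
The bulk write idea from note 2 could look roughly like the following in PyMongo 3 or later. This is a sketch under assumptions: `chunked` and `bulk_set` are hypothetical helper names, the batch size of 1000 is an arbitrary starting point to tune, and the collection is assumed to already carry the write concern you want.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_set(collection, updates, batch_size=1000):
    """Apply (_id, data) pairs as batched $set updates.

    `collection` is assumed to be a PyMongo 3+ Collection whose write
    concern has already been chosen, e.g. via with_options().
    """
    from pymongo import UpdateOne  # imported lazily; requires PyMongo 3+

    for batch in chunked(updates, batch_size):
        requests = [UpdateOne({'_id': _id}, {'$set': data}) for _id, data in batch]
        # ordered=False lets the server continue past individual failures.
        collection.bulk_write(requests, ordered=False)
```

Each batch costs one round trip instead of one per document, so an acknowledged write concern becomes much cheaper per update than it is with single updates.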
