
CosmosDB - MongoDB IsUpsert not working for bulk updates

We have been using the MongoDB API with CosmosDB (Server v3.6) quite extensively over the last few months via .NET Core and the latest MongoDB.Driver NuGet package (2.11.0).

Bulk inserts and single inserts work fine, but unfortunately I cannot get bulk operations to work in IsUpsert=true mode.

Note:

  • We use Polly to manage rate limiting. As part of this, we handle MongoWriteException, MongoExecutionTimeoutException, MongoCommandException and MongoBulkWriteException.
  • This issue can be observed for both sharded and non-sharded collections.
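For readers without Polly at hand, the retry-on-exception idea in the first note can be sketched in plain C# (top-level statements, C# 9 or later). This is a hand-rolled stand-in, not Polly's API: the helper name RetryOnAsync, the doubling backoff and the parameter names are illustrative assumptions, and TimeoutException takes the place of the MongoDB exception types listed above, which in real code would be passed as typeof(MongoWriteException) and so on.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hand-rolled stand-in for a Polly retry policy (hypothetical helper, not Polly's API):
// retry the operation when it throws one of the listed exception types, doubling the
// delay each time, and let the exception propagate once maxRetries is exhausted.
static async Task<TResult> RetryOnAsync<TResult>(
    Func<Task<TResult>> operation, Type[] retryOn, int maxRetries, TimeSpan initialDelay)
{
    var delay = initialDelay;
    for (int attempt = 0; ; attempt++)
    {
        try
        {
            return await operation();
        }
        catch (Exception ex) when (attempt < maxRetries && retryOn.Any(t => t.IsInstanceOfType(ex)))
        {
            await Task.Delay(delay);
            delay += delay; // exponential backoff
        }
    }
}

// Simulated operation that is "rate limited" twice before it succeeds.
int calls = 0;
string result = await RetryOnAsync(async () =>
{
    calls++;
    await Task.Yield();
    if (calls < 3) throw new TimeoutException("simulated rate limiting");
    return "ok";
}, new[] { typeof(TimeoutException) }, maxRetries: 5, initialDelay: TimeSpan.FromMilliseconds(10));

Console.WriteLine($"{result} after {calls} calls"); // ok after 3 calls
```

The exception filter (`when`) means that any exception type not in the list, or a failure on the last permitted attempt, propagates unchanged to the caller.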

Specifically, given a list of non-sharded input documents, List<T> documents, the following works fine:

  1. Bulk Insert:

     await Collection.BulkWriteAsync(documents.Select(s => new InsertOneModel<T>(s)),...)
  2. Bulk Update:

     await Collection.BulkWriteAsync(documents.Select(s => new ReplaceOneModel<T>(Builders<T>.Filter.Eq("Id", s.Id), s) { IsUpsert = false }),...)

If some of the documents are new, we should be able to use the bulk update code above as is and simply set the IsUpsert flag to true. Unfortunately, this isn't working.

Specifically, given 50 existing documents and 50 new documents:

  • If the document's primary key Id is of type ObjectId, CosmosDB incorrectly inserts the first new document it processes with Id=ObjectId("000000000000000000000000"), and from that point on no further documents are inserted or updated. In this scenario:
    • BulkWriteResult is returned with MatchedCount=65, ModifiedCount=65, ProcessedRequests=100, RequestCount=100, Upserts=1, IsAcknowledged=true, IsModifiedCountAvailable=true, InsertedCount=0.
    • No exception is thrown.
    • Note: only 51 documents end up in the database, so BulkWriteResult cannot be relied on.
  • If the document's primary key Id is of type int, CosmosDB seems to:
    • give up processing documents at some random point. This looks more like a rate-limiting scenario, EXCEPT that no exception is thrown.
    • For example, it updates all 50 existing documents but inserts only 8 of the new ones. In this case, BulkWriteResult is returned with MatchedCount=50, ModifiedCount=50, ProcessedRequests=100, RequestCount=100, Upserts=8, IsAcknowledged=true, IsModifiedCountAvailable=true, InsertedCount=0.
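Because BulkWriteResult cannot be taken at face value here, a cheap sanity check can at least flag the silent shortfall. The sketch below assumes a batch that consists only of ReplaceOneModel<T> requests with IsUpsert = true: each such request should either match one existing document or upsert one new one, so MatchedCount + Upserts.Count should equal RequestCount. The helper ReplaceShortfall is hypothetical and takes plain numbers instead of MongoDB.Driver types so that it runs without the driver; in real code the arguments would come from the corresponding BulkWriteResult properties.

```csharp
using System;

// Hypothetical sanity check (not part of MongoDB.Driver). Assumes the batch contains
// only ReplaceOneModel<T> requests with IsUpsert = true, so each request should either
// match one existing document or upsert one new document.
// Returns how many requests are unaccounted for (0 = counts are consistent).
static long ReplaceShortfall(long requestCount, long matchedCount, long upsertCount) =>
    Math.Max(0, requestCount - (matchedCount + upsertCount));

// Figures reported in the question (RequestCount = 100 in both scenarios).
long objectIdCase = ReplaceShortfall(100, matchedCount: 65, upsertCount: 1);
long intCase = ReplaceShortfall(100, matchedCount: 50, upsertCount: 8);

Console.WriteLine($"ObjectId case: {objectIdCase} unaccounted, int case: {intCase} unaccounted");
// ObjectId case: 34 unaccounted, int case: 42 unaccounted
```

This only catches inconsistencies that show up in the counts themselves: in the ObjectId case the counts still add up to 66 even though only 51 documents reached the database, so the check flags the batch but cannot tell you which requests were lost.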

What am I missing? The ObjectId scenario seems totally broken; the other scenario could be coded around, but it doesn't seem right that no exception is raised.

For anyone else plagued by this issue: the workarounds were far from straightforward, but here's what I ended up doing.

  • Documents with a primary key of type ObjectId: this scenario works consistently in both CosmosDB and MongoDB as long as you use InsertOneModel<> when the identifier is ObjectId.Empty and ReplaceOneModel<> otherwise. However, you may still have to deal with the off-by-one error described below.
  • Documents with a primary key other than ObjectId: this is definitely a bug in CosmosDB, as I couldn't reproduce this scenario against the official MongoDB implementation. To fix it, I had to apply the following two workarounds:
    • Throw a custom exception, and update the existing Polly policies to retry the unprocessed requests, just as I already do for the other MongoDB rate-limiting exceptions that CosmosDB typically throws. Sample code:
     BulkWriteResult<T> bulkWriteResult = await Collection.BulkWriteAsync(
         remainingWork,
         new BulkWriteOptions { BypassDocumentValidation = true },
         token);

     long actuallyProcessed = bulkWriteResult.DeletedCount
                              + bulkWriteResult.InsertedCount
                              + bulkWriteResult.ModifiedCount
                              + (bulkWriteResult.Upserts?.Count ?? 0);

     if (actuallyProcessed < bulkWriteResult.ProcessedRequests.Count)
     {
         // Off-by-one error: OCCASIONALLY, the last request reported as processed
         // was not actually processed. There is no way to detect this,
         // unfortunately, hence the adjustment by 1.
         actuallyProcessed = actuallyProcessed > 1 ? actuallyProcessed - 1 : 0;

         var processed = bulkWriteResult.ProcessedRequests
             .Take((int)actuallyProcessed).ToList().AsReadOnly();
         var unprocessed = bulkWriteResult.ProcessedRequests
             .Skip((int)actuallyProcessed).ToList().AsReadOnly();

         throw new CosmosDbRateLimitingBugException<T>(unprocessed, processed, bulkWriteResult);
     }
    • Off-by-one error handling. I'm not sure whether this is needed with a pure MongoDB implementation, but, as above, you sometimes also have to adjust the processed record count by 1. Note: this issue applies regardless of whether IsUpsert=true is used. The code below is slightly simplified, as I use Polly.Context to keep track of exceptions and processed/unprocessed records (not shown). Here remainingWork holds the WriteModel<T> requests that have to be issued in the next BulkWriteAsync<> call.
if (exception is MongoBulkWriteException<T> mostRecentException)
{
    var unProcessedRequests =
        mostRecentException.UnprocessedRequests.ToList();
    if (mostRecentException.WriteErrors.Any())
    {
        // Get the processed request that failed (processed without success) and
        // put it in front of the unprocessed requests, so remainingWork keeps
        // the original request order
        var requestWithError = new[]
            {
                mostRecentException.Result.ProcessedRequests[
                    mostRecentException.WriteErrors[0].Index]
            };
        unProcessedRequests = requestWithError.Concat(unProcessedRequests).ToList();
    }

    remainingWork = unProcessedRequests.ToList();
}
else if (exception is CosmosDbRateLimitingBugException<T> cosmosDbBug)
{
    remainingWork = cosmosDbBug.UnprocessedRequests;
}
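The bookkeeping in the workarounds above (count what was confirmed, step back by one, and re-issue the rest as remainingWork) can be sketched without MongoDB.Driver as follows. This is a minimal sketch under stated assumptions: plain strings stand in for WriteModel<T> requests, reportedCount stands in for DeletedCount + InsertedCount + ModifiedCount + Upserts.Count, and the helpers SplitProcessed and RunWithRequeue are hypothetical names, not part of the driver or Polly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: split the requests the server reports as processed into
// "confirmed" and "to re-issue", applying the off-by-one workaround described above.
// reportedCount stands in for DeletedCount + InsertedCount + ModifiedCount + Upserts.Count.
static (List<TRequest> Processed, List<TRequest> Unprocessed) SplitProcessed<TRequest>(
    IReadOnlyList<TRequest> processedRequests, long reportedCount)
{
    long confirmed = processedRequests.Count;
    if (reportedCount < processedRequests.Count)
    {
        // Shortfall: the last request counted may not really have been applied,
        // so step back by one (never below zero) and re-issue everything after it.
        confirmed = Math.Max(0, reportedCount - 1);
    }

    int n = (int)confirmed;
    return (processedRequests.Take(n).ToList(), processedRequests.Skip(n).ToList());
}

// Hypothetical retry loop: "execute" stands in for one BulkWriteAsync call and returns
// the reported count for the batch it was given. Whatever is not confirmed becomes the
// next remainingWork, for at most maxAttempts rounds.
static List<TRequest> RunWithRequeue<TRequest>(
    List<TRequest> work, Func<IReadOnlyList<TRequest>, long> execute, int maxAttempts)
{
    var remainingWork = work;
    for (int attempt = 0; attempt < maxAttempts && remainingWork.Count > 0; attempt++)
    {
        long reported = execute(remainingWork);
        remainingWork = SplitProcessed(remainingWork, reported).Unprocessed;
    }
    return remainingWork; // empty once every request has been confirmed
}

// Simulated server that applies at most 3 requests per call and reports them all.
var requests = Enumerable.Range(1, 10).Select(i => $"req{i}").ToList();
var applied = new List<string>();
var left = RunWithRequeue(requests, batch =>
{
    var chunk = batch.Take(3).ToList();
    applied.AddRange(chunk);
    return chunk.Count;
}, maxAttempts: 20);

Console.WriteLine($"{applied.Count} sent, {applied.Distinct().Count()} distinct, {left.Count} left");
// 14 sent, 10 distinct, 0 left
```

Running it shows the cost of the off-by-one adjustment: each short round re-sends one request that had in fact been applied, so 14 requests are sent for 10 distinct ones. That is harmless for ReplaceOneModel<T>, which simply replaces the document with the same content again, but re-sending an InsertOneModel<T> for a document that was already inserted fails with a duplicate key error.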
