

How reliable is change stream support in Azure Cosmos DB's API for MongoDB?

Description

I am working on an ASP.NET Core 3.1 web application which needs to track and respond to changes made to a MongoDB database (version 3.6) hosted by Azure Cosmos DB. For this purpose I am using the change feed support.

The changes are pretty frequent: ~10 updates per second on a single entry in a collection.

In order to track changes made to the collection, I am dumping the affected entries to a file (this is just for testing purposes) with the following piece of code.

private async Task HandleChangeStreamAsync<T>(IMongoCollection<T> coll, StreamWriter file, CancellationToken cancellationToken = default)
{
    // Watch only inserts, updates and replaces, then project down to the
    // fields we actually need.
    var pipeline = new EmptyPipelineDefinition<ChangeStreamDocument<T>>()
            .Match(change => change.OperationType == ChangeStreamOperationType.Insert || 
                             change.OperationType == ChangeStreamOperationType.Update || 
                             change.OperationType == ChangeStreamOperationType.Replace)
            .AppendStage<ChangeStreamDocument<T>, ChangeStreamDocument<T>, ChangeStreamOutputWrapper<T>>(
                  "{ $project: { '_id': 1, 'fullDocument': 1, 'ns': 1, 'documentKey': 1 }}");

    // Ask the server to include the current state of the document for updates.
    var options = new ChangeStreamOptions
    {
        FullDocument = ChangeStreamFullDocumentOption.UpdateLookup
    };

    using (var cursor = await coll.WatchAsync(pipeline, options, cancellationToken))
    {
        await cursor.ForEachAsync(async change =>
        {
            var json = change.fullDocument.ToJson(new JsonWriterSettings { Indent = true });
            await file.WriteLineAsync(json);
        }, cancellationToken);
    }
}

Issue

While observing the output, I have noticed that the change feed was not triggered for every update made to the collection. I can confirm this by comparing against the output generated from the same workload on a database hosted by MongoDB Cloud.

Questions

  1. How reliable is change stream support in Azure Cosmos DB's API for MongoDB?

  2. Can the API guarantee that the most recent update will always be available?

  3. I was not able to process the 'oplog.rs' collection of the 'local' database on my own; does the API support this in any way? Is this even encouraged?

  4. Is the collection throughput (RU/s) in some way related to the change event frequency?

Final thoughts

My understanding is that frequent updates throttle the system and the change feed simply does not handle all of the events from the log (rather, it scans it periodically). However, I am wondering how safe it is to rely on such a mechanism and be sure not to miss any critical updates made to the database.

If change feed support cannot make any guarantees regarding event handling frequency and there is no way to process 'oplog.rs', the only option seems to be periodic polling of the database.

Correct me if I am wrong, but switching to polling would greatly affect the performance and would lead to a solution which is not scalable.

Answer

I suspect that the MongoDB change stream is built on the Cosmos DB Change Feed. My experience is entirely with the Cosmos DB change feed; I haven't used the MongoDB API at all. So this answer assumes that the MongoDB change stream internally uses the Cosmos DB Change Feed, which makes sense, but I could be wrong.

How reliable is change stream support in Azure Cosmos DB's API for MongoDB?

It's fully reliable, but has some limitations.

One of the change feed limitations is that it can "batch" updates. Internally, the change feed processor polls the change feed, and it will get all items that have changed. However, if an item changes multiple times between polls, it will only show up in the change feed once. This is the behavior of the Cosmos DB SQL API Change Feed, and I expect the same limitation applies to the MongoDB change stream, though I don't see it actually documented anywhere in the MongoDB docs.

Another limitation is that deletes are not observed.

Because of these limitations, the change feed / change stream is not an event sourcing solution. If you want event sourcing, then you'll need to model your data as events yourself; there's nothing built-in that will do that for you.

That said, within these limitations, it's fully reliable in the sense that your code will receive every changed document in the change feed. The limitations just mean that multiple updates may come across as a single changed document, and deleted documents do not come across at all.
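Because intermediate updates can be collapsed into a single change, a consumer should treat each change as a state snapshot and apply it idempotently, never as an incremental delta. A minimal sketch of that idea (my own illustration, reusing the `documentKey`/`fullDocument` fields projected in the question's pipeline; the projection store itself is hypothetical):

```csharp
using System.Collections.Concurrent;
using MongoDB.Bson;

// Hypothetical in-memory projection: documentKey JSON -> latest full document.
var latest = new ConcurrentDictionary<string, BsonDocument>();

void ApplySnapshot(BsonDocument documentKey, BsonDocument fullDocument)
{
    // Last-writer-wins upsert: replaying the same change is harmless, and
    // the intermediate states that the feed skipped are never needed.
    latest[documentKey.ToJson()] = fullDocument;
}
```

With this shape, it does not matter whether ten updates arrived as ten change events or as one; the projection always converges to the latest state.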

Can the API guarantee that the most recent update will always be available?

There's always the chance that the document has changed after your code retrieved it from the change feed, in which case the updated document will be re-published to the change feed and your code will see it again in a bit. There's no guarantee (of course) that the document your code just got from the change feed is the same as what's in the db, but it will be eventually consistent.
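Since a document can change again right after the handler reads it from the feed, one defensive pattern (my own sketch, not something Cosmos DB provides) is to carry a monotonically increasing version field on the document and skip snapshots older than one already seen:

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical guard: key -> highest document version applied so far.
var seenVersions = new ConcurrentDictionary<string, long>();

bool ShouldApply(string key, long version)
{
    // Returns true when this snapshot is at least as new as anything seen.
    // Combined with an idempotent apply step, re-seeing the same version
    // is harmless, and older (stale) snapshots are dropped.
    return seenVersions.AddOrUpdate(key, version,
        (_, seen) => Math.Max(seen, version)) == version;
}
```

This assumes the application itself increments a version (or timestamp) on every write; it is a client-side convention, not a change stream feature.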

I was not able to process the 'oplog.rs' collection of the 'local' database on my own; does the API support this in any way? Is this even encouraged?

¯\\_(ツ)_/¯

Is the collection throughput (RU/s) in some way related to the change event frequency?

Yes. The change feed itself is built in to Cosmos DB, but change feed processing has an RU cost. RUs are used by the change feed processor to poll the change feed, read documents from the change feed, and also update its "bookmark" to keep track of where in the change feed it is.

My understanding is that frequent updates throttle the system and the change feed simply does not handle all of the events from the log (rather, it scans it periodically).

That is correct.

However, I am wondering how safe it is to rely on such a mechanism and be sure not to miss any critical updates made to the database.

The code will always (eventually) receive the updated documents. However, if you need to see each change individually, then you will need to structure your data using something like event sourcing. If your app only cares about the final state of the documents, then the change feed is fine. But if, e.g., you need to know that someCriticalProperty was set to true and then back to false, then you'll need event sourcing.
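To make the event sourcing suggestion concrete: instead of updating a document in place, each change is recorded as a new, immutable event document. Every event then surfaces in the change stream exactly once, as an Insert, so no intermediate state can be collapsed away. A hypothetical sketch (the `CriticalPropertyChanged` type and collection are my own invention, not part of any API):

```csharp
using System;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

// Hypothetical event document: one insert per change, never an update.
public class CriticalPropertyChanged
{
    public ObjectId Id { get; set; }          // maps to _id
    public string EntityId { get; set; }
    public bool NewValue { get; set; }
    public DateTime OccurredAtUtc { get; set; }
}

public static class CriticalPropertyEvents
{
    public static Task RecordAsync(
        IMongoCollection<CriticalPropertyChanged> events,
        string entityId, bool newValue)
    {
        // Inserts are never batched away by the feed, so the true -> false
        // transition from the example above is preserved as two events.
        return events.InsertOneAsync(new CriticalPropertyChanged
        {
            Id = ObjectId.GenerateNewId(),
            EntityId = entityId,
            NewValue = newValue,
            OccurredAtUtc = DateTime.UtcNow
        });
    }
}
```

The trade-off is that the application must rebuild current state by replaying events (or maintaining a separate projection), and the events collection grows without bound unless pruned.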

switching to polling would greatly affect the performance and would lead to a solution which is not scalable.

Polling isn't necessarily bad. The change feed processor uses polling, as described above. It also has a neat mechanism to allow scale-out, where different processors watching the same collection can split up the documents between them (by partition key); I'm not sure if/how this would translate to the MongoDB world, but it's a pretty elegant solution for scaling SQL API change feed processors and works quite nicely with Azure Functions (unfortunately, there's no MongoDB change stream trigger for Azure Functions).
