What is the most efficient way to copy a CosmosDB collection and retain the order of items by physical partition?

I've tried many different combinations in Azure Data Factory to clone a CosmosDB collection while maintaining the order in which items were written to each partition, but unless I specify a write batch size of 1, the order is not preserved. Even triggering from the source's change feed in a mapping data flow does not preserve order. We have written a simple tool that copies one record at a time, but obviously that is slow.

We are using Cosmos as an event store, and the change feed processor feeds our projectors. It all works really well, but we would like to copy the events out to a different environment to test changes. This requires the original write order to be preserved.

Thanks in advance.

The change feed processor does read from each physical partition in _ts order.

Certainly I've been able to use this to successfully copy very large collections (> 1 TB) in a matter of a few hours.

For this I used a function app scaled out across multiple instances, made sure the leases collection had a high enough max RU setting not to become a bottleneck, and, when provisioning the target, scaled its RU up far enough to create the desired number of physical partitions up front rather than having the partitions split during the import.
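As a rough illustration of the sizing step above, here is a minimal sketch for estimating how many physical partitions the target needs and how much throughput to provision at creation time. It assumes the commonly documented limits of about 10,000 RU/s and 50 GB per physical partition; both constants and both function names are assumptions made for this example, so check the current Azure documentation before relying on the numbers. How the service actually lays out partitions is internal and may differ.

```python
import math

# Assumed per-physical-partition limits (verify against current Azure docs).
MAX_RU_PER_PARTITION = 10_000
MAX_GB_PER_PARTITION = 50


def min_physical_partitions(data_gb: float, peak_ru: int) -> int:
    """Estimate the partition count required by storage or by throughput."""
    by_storage = math.ceil(data_gb / MAX_GB_PER_PARTITION)
    by_throughput = math.ceil(peak_ru / MAX_RU_PER_PARTITION)
    return max(1, by_storage, by_throughput)


def ru_to_precreate(partitions: int) -> int:
    """RU/s to provision so the service needs at least `partitions` partitions."""
    return partitions * MAX_RU_PER_PARTITION


if __name__ == "__main__":
    # Hypothetical workload: 1 TB of events, 40,000 RU/s during the import.
    needed = min_physical_partitions(data_gb=1024, peak_ru=40_000)
    print(needed, ru_to_precreate(needed))
```

The throughput can be scaled back down once the import has finished; the physical partitions created up front remain.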

I have always used bulk insert though, so within each batch delivered by the change feed processor I guess the _ts order could become scrambled. This has never been important for me.
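If order does matter, one possible middle ground between bulk insert and copying one record at a time is to write the items of each partition key sequentially while letting different partition keys proceed in parallel. The sketch below is not tied to any SDK: `write_item` is a hypothetical async callable that you would back with your own client, and `streamId` is an assumed partition key field name. It assumes that preserving order per partition key value is enough for the projectors, which is not the same as a total order across a whole physical partition.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable, Iterable

Item = dict[str, Any]


def group_by_partition_key(batch: Iterable[Item], pk_field: str) -> dict[Any, list[Item]]:
    """Split a batch into per-key lists, keeping the batch order inside each list."""
    groups: dict[Any, list[Item]] = defaultdict(list)
    for item in batch:
        groups[item[pk_field]].append(item)
    return groups


async def copy_batch_in_order(
    batch: Iterable[Item],
    write_item: Callable[[Item], Awaitable[None]],  # hypothetical writer
    pk_field: str = "streamId",                     # assumed partition key field
    max_parallel_keys: int = 16,
) -> None:
    """Write one change feed batch: sequential per key, concurrent across keys."""
    semaphore = asyncio.Semaphore(max_parallel_keys)

    async def write_group(items: list[Item]) -> None:
        async with semaphore:
            for item in items:
                # Await each write before starting the next one for this key.
                await write_item(item)

    groups = group_by_partition_key(batch, pk_field)
    await asyncio.gather(*(write_group(items) for items in groups.values()))
```

Only return from the change feed handler after `copy_batch_in_order` has completed, so that the next batch for the same lease cannot overtake the current one.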

The most efficient way to copy the collection to a new one while preserving the _ts order would certainly be to restore a backup.

It also has the benefit that you do not have to write any code or provision any resources to do it. If you are not already using the continuous backup mode, you should consider switching to it, as this makes the restore self-service and lets you restore to a specified point in time.

Get a tool like Cerebrata. It will copy between collections etc. as you see fit. If you do a lot of Azure work, especially with CosmosDB, it is a very handy tool to use; I could not live without it these days.

Disclaimer: I do NOT work for Cerebrata, nor do I receive any benefit for recommending their tools; this is purely based on my own experience.


