
How to write only non-existing records to Cosmos DB using the Azure Cosmos DB Spark connector?

I am using Databricks to write data from a CSV file to Cosmos DB using the Spark connector. My Cosmos DB container already contains a few records, so when I run the Databricks notebook, it should write only the records that don't already exist in the DB. I tried SaveMode.Ignore, but it doesn't help.

df.write.mode(SaveMode.Ignore).cosmosDB(writeConfig)

Ideally, SaveMode.Ignore should skip over the existing records and write only the ones that don't exist in the DB, but that is not happening.
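
For context, here is a minimal, self-contained sketch of what such a write typically looks like with the azure-cosmosdb-spark connector; the endpoint, key, database, and container values are placeholders, and the exact writeConfig used in the question is an assumption. Note also that Spark's SaveMode applies to the save operation as a whole, not to individual rows, which is consistent with the behavior described above.

import org.apache.spark.sql.SaveMode
import com.microsoft.azure.cosmosdb.spark.schema._
import com.microsoft.azure.cosmosdb.spark.config.Config

// Placeholder values; substitute your own account, key, database, and container.
val writeConfig = Config(Map(
  "Endpoint"   -> "https://<account>.documents.azure.com:443/",
  "Masterkey"  -> "<primary-key>",
  "Database"   -> "<database>",
  "Collection" -> "<container>",
  "Upsert"     -> "false"   // plain inserts instead of upserts, so existing documents are not overwritten
))

// SaveMode controls the save operation as a whole; it does not skip duplicates row by row.
df.write.mode(SaveMode.Ignore).cosmosDB(writeConfig)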

It would be a great help if anyone has suggestions on how to achieve this.

Thanks.

Create a container with a unique key, using some unique field from the CSV file. After that, you cannot add duplicate unique key values to Cosmos DB.

More info: https://docs.microsoft.com/en-us/azure/cosmos-db/unique-keys
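
For illustration, here is a minimal sketch of creating such a container with a unique key policy, using the azure-cosmos Java SDK (v4) from Scala; the SDK choice is an assumption (the answer only links the docs), and the account values, partition key path, and the /recordId field are placeholders for whatever unique field your CSV file has.

import com.azure.cosmos.CosmosClientBuilder
import com.azure.cosmos.models.{CosmosContainerProperties, UniqueKey, UniqueKeyPolicy}
import scala.collection.JavaConverters._

val client = new CosmosClientBuilder()
  .endpoint("https://<account>.documents.azure.com:443/")   // placeholder
  .key("<primary-key>")                                      // placeholder
  .buildClient()

val database = client.getDatabase("<database>")

// "/recordId" is a hypothetical CSV column used as the unique key path;
// replace it with whichever field uniquely identifies your rows.
val uniqueKey       = new UniqueKey(List("/recordId").asJava)
val uniqueKeyPolicy = new UniqueKeyPolicy().setUniqueKeys(List(uniqueKey).asJava)

val containerProperties =
  new CosmosContainerProperties("<container>", "/partitionKey") // placeholder partition key path
containerProperties.setUniqueKeyPolicy(uniqueKeyPolicy)

database.createContainerIfNotExists(containerProperties)
client.close()

Keep in mind that a unique key policy can only be defined when the container is created, and writes that violate it fail with a 409 Conflict rather than being silently skipped, so the Spark job may still need to tolerate those conflict errors or filter out existing records before writing.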
