How to write only non-existing records to Cosmos DB using the Azure Cosmos DB Spark connector?

Question

I am using Databricks to write data from a CSV file to Cosmos DB with the Spark connector. My Cosmos DB container already contains a few records, so when I run the Databricks notebook, it should write only the records that don't exist in the database yet. I tried SaveMode.Ignore, but it doesn't help:

df.write.mode(SaveMode.Ignore).cosmosDB(writeConfig)

Ideally, SaveMode.Ignore should skip the existing records and write only the ones that don't exist in the database, but that is not what happens.

It would be a great help if anyone has suggestions on how to achieve this.

Thanks.

Answer 1

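Create the container with a unique key policy on a field that uniquely identifies each record in the CSV file. After that, Cosmos DB rejects any write whose unique key value already exists in the same logical partition, so duplicate records cannot be added. Note that a unique key policy can only be defined when the container is created, so a container that already holds data has to be recreated with the policy.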

More info: https://docs.microsoft.com/en-us/azure/cosmos-db/unique-keys
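
The effect of a unique key policy on a write like the one in the question can be illustrated with a small, self-contained Scala model. This is a hypothetical sketch: `Record`, `UniqueKeyContainer` and `UniqueKeyDemo` are made-up names, the code does not call the Cosmos DB SDK or the Spark connector, and it only mimics the rule that a unique key value may appear once per logical partition. Rows whose key already exists are rejected, and only the new ones are stored.

```scala
import scala.collection.mutable

// Hypothetical model, NOT the Cosmos DB SDK or the Spark connector: it only
// mimics how a container with a unique key policy treats incoming writes.
// partitionKey = value of the container's partition key path
// uniqueKey    = value of the field chosen as the unique key (e.g. a CSV column)
final case class Record(partitionKey: String, uniqueKey: String, payload: String)

final class UniqueKeyContainer {
  // Unique key values are tracked per logical partition, because Cosmos DB
  // enforces unique keys within a logical partition, not across the container.
  private val keysByPartition = mutable.Map.empty[String, mutable.Set[String]]
  private val items = mutable.ArrayBuffer.empty[Record]

  /** Stores the record and returns true, or returns false if its unique key
    * already exists in the same logical partition (the real service rejects
    * such a write with a 409 Conflict instead). */
  def insert(r: Record): Boolean = {
    val keys = keysByPartition.getOrElseUpdate(r.partitionKey, mutable.Set.empty[String])
    if (keys.contains(r.uniqueKey)) false
    else {
      keys += r.uniqueKey
      items += r
      true
    }
  }

  def all: List[Record] = items.toList
}

object UniqueKeyDemo {
  def main(args: Array[String]): Unit = {
    val container = new UniqueKeyContainer

    // Records already in the container before the notebook runs.
    container.insert(Record("p1", "A", "old"))
    container.insert(Record("p1", "B", "old"))

    // Rows read from the CSV file: p1/A already exists, p1/C is new, and
    // p2/A is new because it lives in a different logical partition.
    val csvRows = List(
      Record("p1", "A", "new"),
      Record("p1", "C", "new"),
      Record("p2", "A", "new")
    )

    // Keep only the rows the container accepted.
    val written = csvRows.filter(r => container.insert(r))

    println(written.map(r => s"${r.partitionKey}/${r.uniqueKey}").mkString(",")) // prints p1/C,p2/A
    println(container.all.size) // prints 4
  }
}
```

Because uniqueness is enforced per logical partition, the same unique key value is accepted again under a different partition key (p2/A in the model), so the partition key should be chosen so that rows which count as duplicates always land in the same logical partition.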

solution1
1 2020-07-10 02:49:54