
How to drop duplicates in a source dataset (JSON) and load the data into Azure SQL DB in Azure Data Factory

I have a table in Azure SQL DB with primary key fields. I am using a copy activity in Azure Data Factory with a source dataset (JSON).

We are writing this data into the sink dataset (SQL DB), but the pipeline is failing with the below error:

"message": "'Type=System.Data.SqlClient.SqlException,Message=Violation of 
 PRIMARY KEY constraint 'PK__field__399771B9251AD6D4'. Cannot 
 insert duplicate key in object 'dbo.crop_original_new'. The 
 duplicate key value is (9161, en).\r\nThe statement has been 
 terminated.,Source=.Net SqlClient Data Provider,SqlErrorNumber=2627,Class=14,ErrorCode=-2146232060,State=1,Errors= 
[{Class=14,Number=2627,State=1,Message=Violation of PRIMARY KEY 
constraint 'PK__field__399771B9251AD6D4'. Cannot insert 
duplicate key in object 'Table'. The duplicate key value is 
(9161, en).,},{Class=0,Number=3621,State=0,Message=The statement has 
been terminated.,},],'",

Well, the cleanest solution would be:

  • Create a staging table (e.g. stg_table) in your SQL database; this table should not enforce the primary key, so duplicate rows can land without failing the copy
  • Load the data from the JSON source into stg_table with the copy activity
  • Write a stored procedure that removes the duplicates and loads the cleaned rows into your destination table (see the T-SQL sketch below)
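
A minimal T-SQL sketch of the staging table and the stored procedure, assuming the destination is dbo.crop_original_new with a two-column key (crop_id, lang) inferred from the duplicate key value (9161, en) in the error; the staging table name, column names and the payload column are placeholders for your real schema:

    -- Staging table: same columns as the destination, but no PRIMARY KEY,
    -- so the copy activity can land duplicate rows without failing.
    CREATE TABLE dbo.stg_crop_original
    (
        crop_id  INT           NOT NULL,
        lang     NVARCHAR(10)  NOT NULL,
        payload  NVARCHAR(MAX) NULL
    );
    GO

    CREATE OR ALTER PROCEDURE dbo.usp_load_crop_original
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Keep one row per (crop_id, lang) and insert only keys
        -- that are not already present in the destination table.
        WITH dedup AS
        (
            SELECT crop_id, lang, payload,
                   ROW_NUMBER() OVER (PARTITION BY crop_id, lang
                                      ORDER BY (SELECT NULL)) AS rn
            FROM dbo.stg_crop_original
        )
        INSERT INTO dbo.crop_original_new (crop_id, lang, payload)
        SELECT d.crop_id, d.lang, d.payload
        FROM dedup AS d
        WHERE d.rn = 1
          AND NOT EXISTS (SELECT 1
                          FROM dbo.crop_original_new AS t
                          WHERE t.crop_id = d.crop_id
                            AND t.lang    = d.lang);

        -- Clear the staging table for the next pipeline run.
        TRUNCATE TABLE dbo.stg_crop_original;
    END;
    GO

In the pipeline, the copy activity writes to dbo.stg_crop_original, and a Stored Procedure activity runs dbo.usp_load_crop_original afterwards.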

Or, if you are familiar with Mapping Data Flows in ADF, you can check this article by Mark Kromer.

You can also use the fault tolerance setting provided in the copy activity to skip incompatible rows.

[Screenshot: fault tolerance setting in the copy activity]
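
If the pipeline is authored as JSON rather than through the UI, the same option is exposed through the copy activity's fault-tolerance properties. A minimal sketch, assuming a JSON source, an Azure SQL sink, and a blob linked service named AzureBlobStorageLS (a placeholder) for logging the skipped rows:

    {
        "name": "CopyJsonToSql",
        "type": "Copy",
        "typeProperties": {
            "source": { "type": "JsonSource" },
            "sink": { "type": "AzureSqlSink" },
            "enableSkipIncompatibleRow": true,
            "redirectIncompatibleRowSettings": {
                "linkedServiceName": {
                    "referenceName": "AzureBlobStorageLS",
                    "type": "LinkedServiceReference"
                },
                "path": "errorlogs/copyactivity"
            }
        }
    }

Note that skipped rows are dropped from the load (and only optionally logged to the blob path), so this approach is appropriate only if losing the duplicate rows is acceptable.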
