ADF Pipeline Adding Sequential Value in Copy Activity

Apologies if this has been asked and answered elsewhere. If it has, please point me to the URL in the comments or replies. Here is the situation:

I am making an API request and get an auth_token in the response, which I use in the Copy Activity as Authorization to retrieve data in JSON format and sink it to Azure SQL Database. I am able to map all the elements I receive in the JSON to the columns of the Azure SQL Database table. However, two columns (UploadId and RowId) still need to be populated.

  • UploadId is a GUID that is the same for the whole batch of rows (I've managed to solve this one).
  • RowId is a sequence running from 1 to the end of that batch; for the next batch (with a new GUID value) it resets back to 1.

The table should look something like this:

| APILoadTime |      UploadId     |    RowId    |
|  2020-02-01 | 29AD7-12345-22EwQ |      1      |
|  2020-02-01 | 29AD7-12345-22EwQ |      2      |
|  2020-02-01 | 29AD7-12345-22EwQ |      3      |
|  2020-02-01 | 29AD7-12345-22EwQ |      4      |
|  2020-02-01 | 29AD7-12345-22EwQ |      5      |
--------------------------------------------------> End of Batch One / Start of Batch Two
|  2020-02-01 | 30AD7-12345-22MLK |      1      |
|  2020-02-01 | 30AD7-12345-22MLK |      2      |
|  2020-02-01 | 30AD7-12345-22MLK |      3      |
|  2020-02-01 | 30AD7-12345-22MLK |      4      |
|  2020-02-01 | 30AD7-12345-22MLK |      5      |
--------------------------------------------------> End of Batch Two and so on ... 

Is there a way to achieve this RowId behavior in the ADF pipeline's Copy Activity, or alternatively within Azure SQL Database?

Apologies for the long description, and thank you in advance for any help! Regards

You need to use a window function to achieve this. ADF Data Flows have a Window transformation that is designed to do exactly this.

Otherwise, you could load the data into a staging table and then use Azure SQL to window the data as you select it out, like this:

SELECT
    APILoadTime
    ,UploadId
    ,ROW_NUMBER() OVER (PARTITION BY UploadId ORDER BY APILoadTime) AS RowId
FROM dbo.MyTable;
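
To make the staging route a bit more concrete, here is a rough sketch of how the pieces might fit together. The table names dbo.ApiStaging and dbo.ApiTarget are hypothetical (they are not from the question), and the column types are guesses based on the sample rows. The idea is that the Copy Activity sinks into the staging table, and the rows are then moved to the final table with RowId computed on the way.

```sql
-- Hypothetical staging table: the Copy Activity sinks each batch here first.
CREATE TABLE dbo.ApiStaging (
    APILoadTime DATE             NOT NULL,
    UploadId    UNIQUEIDENTIFIER NOT NULL
    -- ...plus the columns mapped from the JSON response
);

-- Hypothetical final table, with RowId as a plain INT filled in by the query below.
CREATE TABLE dbo.ApiTarget (
    APILoadTime DATE             NOT NULL,
    UploadId    UNIQUEIDENTIFIER NOT NULL,
    RowId       INT              NOT NULL
);

-- Move the staged rows across, numbering them from 1 within each UploadId.
INSERT INTO dbo.ApiTarget (APILoadTime, UploadId, RowId)
SELECT
    APILoadTime
    ,UploadId
    ,ROW_NUMBER() OVER (PARTITION BY UploadId ORDER BY APILoadTime) AS RowId
FROM dbo.ApiStaging;

-- Clear the staging table so the next batch starts empty.
TRUNCATE TABLE dbo.ApiStaging;
```

Because the numbering is partitioned by UploadId, it restarts at 1 for every new GUID without any reset step. One caveat: if every row in a batch has the same APILoadTime, the order in which rows get numbered within the batch is arbitrary, so order by a more specific column if the sequence needs to follow the source order. The INSERT and TRUNCATE could probably be wrapped in a stored procedure and run from a Stored Procedure activity placed after the Copy Activity.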

Thanks a lot @Leon Yue and @JeffRamos. I've managed to figure out a solution, so I'm posting it here for anyone else who runs into the same situation.

The solution I found was to use a Stored Procedure activity within the Azure Data Factory pipeline from which I call the Data Flow activity. This is the code I used to create the procedure that reseeds RowId:

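CREATE PROCEDURE resetRowId
AS
BEGIN
    -- DBCC CHECKIDENT expects a table name as its first argument, so 'myDatabase'
    -- here must name the table whose IDENTITY column (RowId) is being reseeded.
    DBCC CHECKIDENT ('myDatabase', RESEED, 0);
END
GO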

Once I had this stored procedure, all I did was something like this:

[Image: Azure Data Factory pipeline resetting RowId]

This does it for you. I reseed to 0 so that when new data comes in, RowId starts from 1 again.
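
For anyone trying to reproduce this: DBCC CHECKIDENT only works on a table that has an IDENTITY column, so the approach presumably relies on RowId being defined as one. The post does not show the table definition, so that part is an assumption. A minimal sketch, using a hypothetical table dbo.ApiData and NEWID() as a stand-in for however UploadId is actually generated:

```sql
-- Hypothetical target table; RowId is an IDENTITY column, so SQL Server fills it in
-- and the load must not supply a value for it.
CREATE TABLE dbo.ApiData (
    APILoadTime DATE               NOT NULL,
    UploadId    UNIQUEIDENTIFIER   NOT NULL,
    RowId       INT IDENTITY(1, 1) NOT NULL
);

-- Batch one: on a fresh table with seed 1, RowId is assigned 1, 2, 3.
DECLARE @batch1 UNIQUEIDENTIFIER = NEWID();
INSERT INTO dbo.ApiData (APILoadTime, UploadId)
VALUES ('2020-02-01', @batch1), ('2020-02-01', @batch1), ('2020-02-01', @batch1);

-- Reset before the next batch (this is what resetRowId does).
DBCC CHECKIDENT ('dbo.ApiData', RESEED, 0);

-- Batch two: the table already contains rows, so the next value is 0 + increment = 1.
DECLARE @batch2 UNIQUEIDENTIFIER = NEWID();
INSERT INTO dbo.ApiData (APILoadTime, UploadId)
VALUES ('2020-02-01', @batch2), ('2020-02-01', @batch2);

-- Inspect the result: each UploadId should have its own RowId sequence starting at 1.
SELECT APILoadTime, UploadId, RowId
FROM dbo.ApiData
ORDER BY UploadId, RowId;
```

Two things to watch for. First, reseeding to 0 gives 1 as the next value only when the table already holds rows (or was emptied with DELETE). On a brand-new or truncated table the next row gets the reseed value itself, so the first RowId would be 0. Second, IDENTITY values are not guaranteed to be gap-free, and two batches loading at the same time would share the counter, so this works best when batches run strictly one after another.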

Hope this helps others too.

Thank you to everyone who helped in some way.
