
Azure Data Factory get data for "For Each" component from query

The situation is as follows: I have a table in my database that receives about 3 million rows each day. We want to archive this table on a regular basis, so that only the 8 most recent weeks are in the table. The rest of the data can be archived to Azure Data Lake. I already found out how to do this for one day at a time. But now I want to run this pipeline each week for the first seven days in the table. I assume I should do this with the "For Each" component. It should iterate over the seven distinct dates that are present in the dataset I want to back up. This dataset is copied from the source table to an archive table beforehand. It's not difficult to get the distinct dates with a SQL query, but how do I get the result of this query into an array that can be used by the "For Each" component?

The issue is solved thanks to a co-worker. What we have to do is assign a parameter to the dataset of the sink. It does not matter what you name it, and you do not have to assign a value to it. Let's assume this parameter is called "Date". After that you can use this parameter in the file name of the sink (also in the dataset) by using "@dataset().Date". Then you go back to the Copy activity, and in the sink you assign the dataset property "Date" to @item().DateSelect. (DateSelect is the field name from the array that is passed to the For Each activity.)
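
For reference, here is a rough sketch of what that looks like in the ADF v2 JSON. The names (ArchiveSink, SourceTable, the linked service, folder path and the DateSelect column) are illustrative assumptions, not the actual pipeline. The sink dataset declares the "Date" parameter and uses it in the file name:

{
    "name": "ArchiveSink",
    "properties": {
        "type": "AzureDataLakeStoreFile",
        "linkedServiceName": {
            "referenceName": "AzureDataLakeStoreLinkedService",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "Date": { "type": "String" }
        },
        "typeProperties": {
            "folderPath": "archive/mytable",
            "fileName": {
                "value": "@concat('mytable_', dataset().Date, '.csv')",
                "type": "Expression"
            },
            "format": { "type": "TextFormat" }
        }
    }
}

The Copy activity inside the For Each then fills that parameter from the current iteration item:

{
    "name": "CopyToArchive",
    "type": "Copy",
    "inputs": [ { "referenceName": "SourceTable", "type": "DatasetReference" } ],
    "outputs": [
        {
            "referenceName": "ArchiveSink",
            "type": "DatasetReference",
            "parameters": { "Date": "@item().DateSelect" }
        }
    ],
    "typeProperties": {
        "source": { "type": "SqlSource" },
        "sink": { "type": "AzureDataLakeStoreSink" }
    }
}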

See also the answer from Bo Xioa below, which is part of the solution.

This way it works perfectly. It's just a shame that this is not well documented.

You can use a Lookup activity to fetch the column content, and the output will look like this:

{
    "count": "2",
    "value": [
        {
            "Id": "1",
            "TableName": "Table1"
        },
        {
            "Id": "2",
            "TableName": "Table2"
        }
    ]
}

Then you can pass the value array to the ForEach activity's items field by using the pattern @activity('MyLookupActivity').output.value

ref doc: Use the Lookup activity result in a subsequent activity
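
A sketch of how the Lookup and ForEach activities fit together, assuming the distinct dates come from a staging table (the query, table and column names are illustrative). Note that firstRowOnly must be false so the Lookup returns the whole value array:

{
    "name": "MyLookupActivity",
    "type": "Lookup",
    "typeProperties": {
        "source": {
            "type": "SqlSource",
            "sqlReaderQuery": "SELECT DISTINCT CAST(CreatedDate AS date) AS DateSelect FROM dbo.ArchiveStaging"
        },
        "dataset": { "referenceName": "ArchiveStagingTable", "type": "DatasetReference" },
        "firstRowOnly": false
    }
},
{
    "name": "ForEachDate",
    "type": "ForEach",
    "dependsOn": [ { "activity": "MyLookupActivity", "dependencyConditions": [ "Succeeded" ] } ],
    "typeProperties": {
        "items": {
            "value": "@activity('MyLookupActivity').output.value",
            "type": "Expression"
        },
        "activities": [ ]
    }
}

The parameterised Copy activity from the accepted answer goes inside the ForEach "activities" array.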

I post this as an answer, because the error does not fit into a comment :D

I have seen another option to accomplish this: executing a pipeline from another pipeline. That way I can define the dates that I should iterate over as a parameter in the second pipeline (docs.microsoft.com/en-us/azure/data-factory/…). But unfortunately this leads to the same result as just using the ForEach parameter, because in the file name of my data lake file I have to use @{item().columname}. I can see in the monitoring view that the right values are passed in the iteration steps, but I keep getting an error:

{ "errorCode": "2200", "message": "Failure happened on 'Sink' side. ErrorCode=UserErrorFailedFileOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The request to 'Unknown' failed and the status code is 'BadRequest', request id is ''. {\\"error\\":{\\"code\\":\\"BadRequest\\",\\"message\\":\\"A potentially dangerous Request.Path value was detected from the client (:). Trace: cf3b4c3f-1681-4073-b225-17e1c07ec76d Time: 2018-08-02T05:16:13.2141897-07:00\\"}} ,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Net.WebException,Message=The remote server returned an error: (400) Bad Request.,Source=System,'", "failureType": "UserError", "target": "CopyDancerDatatoADL" }
