We have a data lake container with three folders a, b, c. Each folder has three files: a1, a2, a3, b1, b2, b3, c1, c2, c3. We need to design a pipeline that dynamically does an incremental load from these folders to blob storage, keeping the same file names as the source. The incremental load is already implemented by me in a dataflow. We have other dataflow dependencies as well, so we can't use a Copy activity and must use the dataflow. I am unable to integrate the Get Metadata activity with the dataflow, which is where I am expecting some help.
I tried with parameters and variables, but I did not get the desired output. I used Get Metadata with Child items, then a ForEach loop. Inside that ForEach I tried another ForEach to get the files, and used an Append Variable activity to collect the data. I have already implemented the upsert logic for a single table in the dataflow. If I pass the second Get Metadata activity's output (inside the ForEach) to the dataflow, it is not accepted. The main problem I am facing is integrating the dataflow with the ForEach at the dataset level, because the dataflow's dataset depends on the Get Metadata output.
A nested ForEach is not possible in Azure Data Factory. The workaround is to use an Execute Pipeline activity inside the ForEach activity. To pass the output of the Get Metadata activity to the dataflow, create dataflow parameters and pass the values to those parameters. I tried to reproduce this scenario in my environment; below is the approach.
Outer Pipeline:
A Get Metadata activity is taken on the container dataset with Child items in the field list, so its output lists the folders a, b, c. A ForEach activity iterates over that output:
@activity('Get Metadata1').output.childItems
Inside the ForEach, an Execute Pipeline activity calls the child pipeline and passes
@item().name
to the pipeline parameter FolderName, to pass directory names as input to the child pipeline. A rough JSON sketch of this outer pipeline is given below.
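As a minimal sketch, the outer pipeline could be authored roughly as follows. The names OuterPipeline, DS_Container_Root, ForEach_folders, Execute_child and ChildPipeline are illustrative placeholders, not taken from the original environment:
{
  "name": "OuterPipeline",
  "properties": {
    "activities": [
      {
        "name": "Get Metadata1",
        "type": "GetMetadata",
        "typeProperties": {
          "dataset": {
            "referenceName": "DS_Container_Root",
            "type": "DatasetReference"
          },
          "fieldList": [ "childItems" ]
        }
      },
      {
        "name": "ForEach_folders",
        "type": "ForEach",
        "dependsOn": [
          { "activity": "Get Metadata1", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
          "items": {
            "value": "@activity('Get Metadata1').output.childItems",
            "type": "Expression"
          },
          "activities": [
            {
              "name": "Execute_child",
              "type": "ExecutePipeline",
              "typeProperties": {
                "pipeline": {
                  "referenceName": "ChildPipeline",
                  "type": "PipelineReference"
                },
                "waitOnCompletion": true,
                "parameters": {
                  "FolderName": "@item().name"
                }
              }
            }
          ]
        }
      }
    ]
  }
}
Each iteration of the ForEach hands one folder name (a, b or c) to the child pipeline through the FolderName parameter.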
Child Pipeline:
In the child pipeline, another Get Metadata activity is taken. In its dataset, the container name is given for the file path, a dataset parameter is created for the folder, and the value of the pipeline parameter FolderName is passed to it: @pipeline().parameters.FolderName
Child items is selected as an argument in the field list. This activity will give the list of files that are available in the directory.
Then a ForEach activity is added, and the output of the Get Metadata activity is given in its items: @activity('Get_Metadata_inner').output.childItems
Inside this ForEach, the dataflow activity is added. A rough JSON sketch of the child pipeline is given below.
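A minimal sketch of the child pipeline, assuming the inner Get Metadata dataset is called DS_Folder with a folderName parameter, the inner ForEach is called ForEach_files, and the dataflow is called IncrementalLoadDF (all placeholder names; the Execute Data Flow body is expanded further down):
{
  "name": "ChildPipeline",
  "properties": {
    "parameters": {
      "FolderName": { "type": "string" }
    },
    "activities": [
      {
        "name": "Get_Metadata_inner",
        "type": "GetMetadata",
        "typeProperties": {
          "dataset": {
            "referenceName": "DS_Folder",
            "type": "DatasetReference",
            "parameters": {
              "folderName": "@pipeline().parameters.FolderName"
            }
          },
          "fieldList": [ "childItems" ]
        }
      },
      {
        "name": "ForEach_files",
        "type": "ForEach",
        "dependsOn": [
          { "activity": "Get_Metadata_inner", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
          "items": {
            "value": "@activity('Get_Metadata_inner').output.childItems",
            "type": "Expression"
          },
          "activities": [
            {
              "name": "Incremental_dataflow",
              "type": "ExecuteDataFlow",
              "typeProperties": {
                "dataflow": {
                  "referenceName": "IncrementalLoadDF",
                  "type": "DataFlowReference"
                }
              }
            }
          ]
        }
      }
    ]
  }
}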
Dataflow
In the dataflow, a parameter called filename is created.
In the source dataset, dataset parameters are created for the file name and folder name, as fileName and folderName respectively.
Then all the other transformations are added in the data flow.
In the sink dataset of the sink transformation, a dataset parameter is created for the folder, and the file name is left blank in the dataset. A sketch of such a parameterized dataset is shown below.
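As a rough sketch (assuming a delimited-text source on ADLS Gen2; DS_Source_Files, LS_DataLake and mycontainer are placeholder names), the parameterized source dataset could look like this; the sink dataset would be similar, but with only folderName parameterized and fileName left blank:
{
  "name": "DS_Source_Files",
  "properties": {
    "linkedServiceName": {
      "referenceName": "LS_DataLake",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "folderName": { "type": "string" },
      "fileName": { "type": "string" }
    },
    "type": "DelimitedText",
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "fileSystem": "mycontainer",
        "folderPath": {
          "value": "@dataset().folderName",
          "type": "Expression"
        },
        "fileName": {
          "value": "@dataset().fileName",
          "type": "Expression"
        }
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}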
In the sink settings, the output file name is set using the dataflow parameter $filename, so each output file keeps the same name as its source file.
In the Execute Data Flow activity's Parameters tab, the dataflow parameter filename is given the value @item().name, and the dataset parameter folderName (for both source and sink) is given @pipeline().parameters.FolderName. A sketch of the resulting Execute Data Flow activity is given below.
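As an approximate sketch of how that Execute Data Flow activity could look once authored (the stream names source1 and sink1, the dataflow name IncrementalLoadDF, and the single-quote wrapping that turns the pipeline expression into a data flow string literal are assumptions based on how the Data Factory UI typically generates this JSON):
{
  "name": "Incremental_dataflow",
  "type": "ExecuteDataFlow",
  "typeProperties": {
    "dataflow": {
      "referenceName": "IncrementalLoadDF",
      "type": "DataFlowReference",
      "parameters": {
        "filename": {
          "value": "'@{item().name}'",
          "type": "Expression"
        }
      },
      "datasetParameters": {
        "source1": {
          "folderName": "@pipeline().parameters.FolderName",
          "fileName": "@item().name"
        },
        "sink1": {
          "folderName": "@pipeline().parameters.FolderName"
        }
      }
    },
    "compute": {
      "coreCount": 8,
      "computeType": "General"
    }
  }
}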
In this repro, a simple Select transformation is taken; this can be extended to any transformation in the data flow. In this way, we can pass the values to the dataflow.