
How to execute a pipeline just once no matter how many blobs are created? (Azure Data Factory)

I've created a pipeline that is executed by a trigger every time a blob is created. The problem is that there are scenarios where the process needs to upload multiple files at the same time; when that happens, the pipeline executes as many times as there are blobs, and the data ends up wrong. I tried configuring a Copy Data activity in the main pipeline to copy every blob that was created, but since that activity lives inside the triggered pipeline, it also executes many times.

The expected result is that the pipeline executes only once (no matter how many blobs are created) and that the Copy Data activity copies all the blobs in the folder.

Here is my Pipeline and trigger configuration:

[screenshot: pipeline and trigger configuration]

The Copy Data activity configuration is shown below:

[screenshot: Copy Data activity configuration]

I've been trying for months; here is another attempt I made:

Microsoft Answers

Could you tell me what I'm doing wrong?

What you can do is filter the copy activity source based on the Filter by last modified property, where you can specify a start time and an end time in UTC.
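As a minimal sketch of what that looks like in the copy activity JSON (assuming a binary dataset; the dataset names and the windowStart parameter are placeholders you would replace with your own), the filter maps to modifiedDatetimeStart / modifiedDatetimeEnd in the source store settings:

    {
        "name": "CopyNewBlobsOnly",
        "type": "Copy",
        "inputs": [ { "referenceName": "SourceBlobDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "SinkBlobDataset", "type": "DatasetReference" } ],
        "typeProperties": {
            "source": {
                "type": "BinarySource",
                "storeSettings": {
                    "type": "AzureBlobStorageReadSettings",
                    "recursive": true,
                    "wildcardFileName": "*",
                    "modifiedDatetimeStart": "@pipeline().parameters.windowStart",
                    "modifiedDatetimeEnd": "@utcnow()"
                }
            },
            "sink": { "type": "BinarySink" }
        }
    }

With this filter, a single run picks up every blob modified after windowStart, regardless of how many files were uploaded in the batch.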

You can also try this tutorial: Incrementally copy new and changed files based on LastModifiedDate by using the Copy Data tool.

OR...

In your scenario, you only need to set the start time.

  1. This start time is simply the last time a triggered pipeline run was executed. You can get the triggered pipeline run details with a REST API call: Trigger Runs - Query By Factory.
  2. You can then choose to query the runs executed in the last x hours, or, to be safe, in the last day, depending on how frequently files are created in storage.
  3. Next, from that result collect only triggerRunTimestamp and append it to an array variable.
  4. Find the max (latest) run time using functions. Set that time as the start time in UTC for the copy activity source filter explained at the start.

If this is feasible, I can spin an example pipeline.
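For reference, a rough sketch of that REST call as it might be issued from a Web activity (the subscription, resource group, factory, and trigger names are placeholders, and the one-day window is just an example):

    POST https://management.azure.com/subscriptions/<subscriptionId>/resourceGroups/<resourceGroup>/providers/Microsoft.DataFactory/factories/<factoryName>/queryTriggerRuns?api-version=2018-06-01

    {
        "lastUpdatedAfter": "@{adddays(utcnow(), -1)}",
        "lastUpdatedBefore": "@{utcnow()}",
        "filters": [
            { "operand": "TriggerName", "operator": "Equals", "values": [ "MyBlobEventTrigger" ] }
        ]
    }

Each item in the response's value array carries a triggerRunTimestamp; those are the values step 3 collects, and the latest of them becomes the start time for the modified-date filter above.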

Any reason why you are mapping your event trigger to the original source path where all the files are created and uploaded? Can't you create a dummy blob path with a dummy file written at the end, so a single trigger fires once all files are uploaded? That would overcome this issue.

Note: this is how we manage it :) but unfortunately a redundant file gets generated.
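As a rough illustration of that approach (container, folder, suffix, and pipeline names below are placeholders, not your actual setup), the event trigger is scoped to the sentinel file rather than to the data files:

    {
        "name": "TriggerOnSentinelFile",
        "properties": {
            "type": "BlobEventsTrigger",
            "typeProperties": {
                "blobPathBeginsWith": "/mycontainer/blobs/incoming/",
                "blobPathEndsWith": "_SUCCESS",
                "ignoreEmptyBlobs": false,
                "scope": "/subscriptions/<subscriptionId>/resourceGroups/<resourceGroup>/providers/Microsoft.Storage/storageAccounts/<storageAccount>",
                "events": [ "Microsoft.Storage.BlobCreated" ]
            },
            "pipelines": [
                { "pipelineReference": { "referenceName": "CopyAllBlobsPipeline", "type": "PipelineReference" } }
            ]
        }
    }

The uploader writes the sentinel file only after all data files have landed, so the pipeline fires exactly once per batch; the downside, as noted, is the extra file.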

