
Retry of batch operation in Azure Table

I am trying to insert data from Azure Data Lake Store into an Azure Table through Azure Data Factory. The data in the Azure Data Lake file has the same schema as the final Azure Table sink.

The ADF pipeline consists of a single copy activity that copies from Azure Data Lake Store to the Azure Table. The pipeline fails at times due to throttling, and I cannot afford to rerun the complete pipeline because it takes hours.

I want to retry only the failed batch, but I don't see that as an option provided for Azure Table.

I found SinkRetryCount and SinkRetryWait as two parameters of the AzureTableSink class, but I guess (since the documentation doesn't explain them properly) that they apply to the complete pipeline.

I have two questions:

  1. What do SinkRetryCount and SinkRetryWait actually mean?
  2. Is there a way to retry a batch if it fails, either by setting parameters or by building a different activity graph in the ADF pipeline?

Have you tried the following?

  • If your process establishes a clean state as its very first step (similar to the undo in the Command design pattern, but more naive), then the whole process can be re-executed.

    • With that in place, you can safely use “retry” on your pipeline activities, with sufficient time between retries.
    • This approach is compatible with both ADFv1 and ADFv2.

Reference: https://docs.microsoft.com/en-us/azure/data-factory/v1/data-factory-create-pipelines
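
The clean-state-then-retry idea above can be sketched in plain Python. This only illustrates the control flow and is not how ADF implements it: `run_with_clean_state`, `reset`, `step`, and the parameter names are all hypothetical, and in a real pipeline the activity's own retry setting would play this role.

```python
import time
from typing import Callable


def run_with_clean_state(
    reset: Callable[[], None],
    step: Callable[[], None],
    retries: int = 3,
    wait_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run reset() then step(); retry up to `retries` times after a failure.

    Returns the number of attempts used. Re-raises the last error if the
    initial attempt and every retry fail.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            reset()  # hypothetical cleanup: undo partial output of a previous attempt
            step()   # hypothetical unit of work, e.g. the copy itself
            return attempts
        except Exception:
            if attempts > retries:
                raise  # initial attempt plus all retries are used up
            sleep(wait_seconds)  # give a throttled sink time to recover
```

Because `reset` runs before every attempt, a retry never builds on half-written output, which is what makes a blind retry safe. The catch is that the whole step is repeated, so this does not by itself avoid redoing hours of work.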

  • If you are on ADFv2, you have more options and can build more complex logic to handle errors:

    • Wrap the failing activity in an until-success loop, and be sure to put a bound on the number of executions.

    • You can add more activities inside the loop to handle the failure: log it, send a notification, or resolve known failure conditions caused by external factors outside your control.

  • You can also communicate asynchronously with future executions by saving each success to a central store. A later execution then checks the store and, if the step has already succeeded, stops before running the activity again.

    • This is powerful for more generalized pipelines, since you can choose where to begin.
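
To make the last two ideas concrete, here is a minimal Python sketch that combines a bounded until-success loop per batch with a success store that later runs consult. Everything in it is an assumption made for illustration: the JSON checkpoint file stands in for the "central store", and `process_batches`, `copy_batch`, and `on_failure` are hypothetical names, not ADF or Azure Table APIs.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List


def load_done(checkpoint: Path) -> set:
    """Return the set of batch ids recorded as successful."""
    if checkpoint.exists():
        return set(json.loads(checkpoint.read_text()))
    return set()


def process_batches(
    batch_ids: Iterable[str],
    copy_batch: Callable[[str], None],
    checkpoint: Path,
    max_attempts: int = 5,
    on_failure: Callable[[str, int, Exception], None] = lambda b, n, e: None,
) -> List[str]:
    """Copy each batch not yet checkpointed; return the ids that still failed."""
    done = load_done(checkpoint)
    failed = []
    for batch_id in batch_ids:
        if batch_id in done:
            continue  # an earlier execution already succeeded, so skip it
        for attempt in range(1, max_attempts + 1):  # bounded until-success loop
            try:
                copy_batch(batch_id)
            except Exception as exc:
                on_failure(batch_id, attempt, exc)  # log, notify, or wait here
            else:
                done.add(batch_id)
                # Persist right away so a crash later does not lose this success.
                checkpoint.write_text(json.dumps(sorted(done)))
                break
        else:
            failed.append(batch_id)  # every attempt was used without success
    return failed
```

Rerunning with the same checkpoint file only touches batches that have not succeeded yet, which is the "retry only the failed batch" behaviour the question asks for. It assumes the source data can be split into independently copyable batches in the first place.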

See the retry setting described at https://docs.microsoft.com/en-us/azure/data-factory/data-factory-create-pipelines :

Retry: Number of retries before the data processing for the slice is marked as Failure. Activity execution for a data slice is retried up to the specified retry count. The retry is done as soon as possible after the failure.

Hope it helps.
