
Retry of batch operation in Azure Table

I am trying to insert data from Azure Data Lake Store into an Azure Table through Azure Data Factory. The data in the Azure Data Lake file has the same schema as the final Azure Table sink.

The ADF pipeline consists of a single copy activity that copies from Azure Data Lake Store to Azure Table. The pipeline fails at times due to throttling, and I cannot afford to rerun the complete pipeline because it takes hours.

I want to retry only the failed batch, but I don't see that as an option provided for Azure Table.

I found SinkRetryCount and SinkRetryWait as two parameters of the AzureTableSink class, but I guess (since the docs don't describe them properly) that they apply to the complete pipeline.

I have two questions:

  1. What do SinkRetryCount and SinkRetryWait actually mean?
  2. Is there a way to retry a batch if it fails, either by setting parameters or by building a different activity graph in the ADF pipeline?

Have you tried the following?

  • If your process ensures a clean state as its very first step, similar to the Command design pattern's undo (but more naive), then the process can be re-executed.

    • With #1 in place, you can safely use "retry" in your pipeline activities, along with sufficient time between retries.
    • This approach is compatible with both ADFv1 and ADFv2.

Reference: https://docs.microsoft.com/en-us/azure/data-factory/v1/data-factory-create-pipelines

  • If you are on ADFv2, you have more options and can build more complex logic to handle errors:

    • Wrap the failing activity in an until-success loop, and be sure to include a bound on the number of executions.

    • You can add more activities inside the loop to handle the failure and to log, notify, or resolve known failure conditions caused by externalities outside your control.

  • You can also communicate asynchronously with future process executions by saving success to a central store. A later execution can then check whether it was already successful and, if so, stop processing before the activity.

    • This is powerful for more generalized pipelines, since you can choose where to begin.
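The two ideas above (a bounded until-success loop and a central record of what already succeeded) can be sketched outside ADF as plain Python. This is only an illustration of the control flow, not ADF or Azure SDK code: `ThrottledError`, `run_with_bounded_retry`, `copy_batches`, and the `copy_batch` callback are hypothetical names, and the in-memory `completed` set stands in for whatever central store you would actually use.

```python
import time
from typing import Callable, Iterable, Set


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling failure reported by the sink."""


def run_with_bounded_retry(
    action: Callable[[], None],
    max_attempts: int = 5,
    base_wait_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `action` until it succeeds or `max_attempts` is reached.

    Returns the number of attempts used. When the bound is hit, the last
    ThrottledError is re-raised so the caller can log or notify.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            action()
            return attempt
        except ThrottledError:
            if attempt == max_attempts:
                raise
            # Exponential backoff: wait longer after each consecutive failure.
            sleep(base_wait_seconds * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


def copy_batches(
    batch_ids: Iterable[str],
    copy_batch: Callable[[str], None],
    completed: Set[str],
    **retry_options,
) -> None:
    """Copy every batch that is not already recorded in `completed`."""
    for batch_id in batch_ids:
        if batch_id in completed:
            continue  # an earlier run already succeeded, so skip it
        run_with_bounded_retry(lambda: copy_batch(batch_id), **retry_options)
        completed.add(batch_id)  # record success only after the copy returns
```

A rerun after a failure then calls `copy_batches` with the same `completed` store and resumes from the first unfinished batch. This is only safe if `copy_batch` is idempotent (for example an upsert rather than a plain insert), because a batch that failed partway through is copied again in full.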

For retries, see "retry" at https://docs.microsoft.com/en-us/azure/data-factory/data-factory-create-pipelines.

Retry: Number of retries before the data processing for the slice is marked as Failure. Activity execution for a data slice is retried up to the specified retry count. The retry is done as soon as possible after the failure.

Hope it helps.
