How to improve time to execute logging stored procedure in Azure Data Factory?
I have a metadata-driven ADF solution. It passes a connection string, plus the source and sink, as parameters. My concern is that I also have SQL logging steps within the pipelines and child pipelines, and now even a simple copy of an Azure SQL DB table into ADLS Parquet is bottlenecked by the logging steps and child pipelines. I noticed that each step (mainly the logging steps) takes around 3-6 seconds.
I have tried a number of things, but nothing seems to reduce the time it takes to run these audit steps.
The audit step is a stored procedure to which you pass a load of parameters. The proc runs in a split second in SSMS, so the proc itself isn't the issue.
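To illustrate, here is a minimal sketch of timing the same call from Python outside ADF; the proc name `dbo.usp_LogPipelineStep`, its parameters, and the connection details are hypothetical placeholders for my actual audit procedure:

```python
# Minimal sketch: time the audit proc outside ADF to isolate the proc's own cost.
# Proc name, parameters, and connection string are hypothetical placeholders.
import time

import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:<your-server>.database.windows.net,1433;"
    "Database=<your-db>;Uid=<user>;Pwd=<password>;Encrypt=yes;"
)
cur = conn.cursor()

start = time.perf_counter()
# The same call the ADF Stored Procedure activity makes on my behalf.
cur.execute(
    "EXEC dbo.usp_LogPipelineStep @PipelineName=?, @StepName=?, @Status=?",
    ("CopyToParquet", "PreCopyAudit", "Started"),
)
conn.commit()
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"Proc executed in {elapsed_ms:.1f} ms")  # single-digit ms here, 3-6 s in ADF
```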
Is there any way of reducing the time to execute the logging steps?
As per the Microsoft SLA for ADF, Microsoft guarantees that at least 99.9% of the time, all activity runs will initiate within 4 minutes of their scheduled execution times. As stated by the Product Team, any stored procedure activity that completes within 4 minutes has met the SLA within ADF; this SLA covers the overhead of ADF communicating with SQL Server. With that said, the performance you are currently seeing in ADF is normal.
See the supporting documentation:

https://azure.microsoft.com/en-gb/support/legal/sla/data-factory/v1_2/
https://docs.microsoft.com/en-us/answers/questions/36323/adf-performance-troubleshooting.html
I tend to think adding excessive logging into pipelines is a bit of an anti-pattern, as it leads to more complex pipelines with more components to maintain, and exactly this type of issue arises, particularly if you're running the logging inside a For Each activity, for example. Most of the information you need should be in the built-in logging, which you can harvest via API calls either at the end of your pipelines or elsewhere, as ably described here.
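As a rough illustration of harvesting that built-in logging, here is a minimal sketch against the Query Pipeline Runs REST endpoint; the subscription, resource group, and factory names are placeholders, and it assumes the `azure-identity` and `requests` packages:

```python
# Minimal sketch: pull the last 24 hours of pipeline run history from the
# ADF REST API instead of logging it yourself inside the pipeline.
from datetime import datetime, timedelta, timezone

import requests
from azure.identity import DefaultAzureCredential

SUB, RG, FACTORY = "<subscription-id>", "<resource-group>", "<factory-name>"
URL = (
    f"https://management.azure.com/subscriptions/{SUB}"
    f"/resourceGroups/{RG}/providers/Microsoft.DataFactory"
    f"/factories/{FACTORY}/queryPipelineRuns?api-version=2018-06-01"
)

token = DefaultAzureCredential().get_token("https://management.azure.com/.default")
now = datetime.now(timezone.utc)
resp = requests.post(
    URL,
    headers={"Authorization": f"Bearer {token.token}"},
    json={
        "lastUpdatedAfter": (now - timedelta(days=1)).isoformat(),
        "lastUpdatedBefore": now.isoformat(),
    },
)
resp.raise_for_status()
for run in resp.json()["value"]:
    print(run["pipelineName"], run["status"], run["durationInMs"])
```

Each run record already includes the pipeline name, status, start/end times, and duration, which covers most of what a custom audit proc typically captures.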
The other thing you could do is move any non-critical tasks to run in parallel. For example, a 'log start of activity' task does not necessarily need to run first, sequentially. Consider making some of these tasks parallel, as explained in the diagram below. Obviously this does not apply when you need to capture information from the activity in order to log it.
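To make the parallel idea concrete, here is a rough Python sketch of the same pattern, where `log_start` and `copy_table` are hypothetical stand-ins for the logging proc call and the copy activity; in ADF itself you get this effect by simply not drawing a dependency between the logging activity and the copy activity:

```python
# Rough sketch: fire the non-critical "log start" call in the background so
# its latency overlaps the real work instead of adding to it.
import time
from concurrent.futures import ThreadPoolExecutor

def log_start():
    time.sleep(4)          # stands in for the 3-6 s logging stored proc
    print("start logged")

def copy_table():
    time.sleep(10)         # stands in for the actual copy work
    print("copy finished")

with ThreadPoolExecutor(max_workers=2) as pool:
    log_future = pool.submit(log_start)   # kicked off, not waited on yet
    copy_table()                          # main work proceeds immediately
    log_future.result()                   # join before logging completion

# Total wall time is roughly max(4, 10) s rather than 4 + 10 s.
```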