How to improve time to execute logging stored procedure in Azure Data Factory?
I have a metadata-driven ADF solution. It passes a connection string, plus the source and sink, as parameters. My concern is that I also have SQL logging steps within the pipelines and child pipelines, and now even a simple copy of an Azure SQL DB table into ADLS Parquet is bottlenecked by the logging steps and child pipelines. I noticed that each step (mainly the logging steps) takes around 3-6 seconds.
I have tried a number of things, but nothing seems to reduce the time it takes to run these audit steps.
The audit step is a stored procedure to which you pass a load of parameters. The proc runs in a split second in SSMS, so the proc itself isn't the issue.
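To illustrate, here is a minimal sketch of timing the same call from Python outside ADF; the proc name `dbo.usp_LogPipelineStep`, its parameters, and the connection details are hypothetical placeholders for my actual audit procedure:

```python
# Minimal sketch: time the audit proc outside ADF to isolate the proc's own cost.
# Proc name, parameters, and connection string are hypothetical placeholders.
import time

import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:<your-server>.database.windows.net,1433;"
    "Database=<your-db>;Uid=<user>;Pwd=<password>;Encrypt=yes;"
)
cur = conn.cursor()

start = time.perf_counter()
# The same call the ADF Stored Procedure activity makes on my behalf.
cur.execute(
    "EXEC dbo.usp_LogPipelineStep @PipelineName=?, @StepName=?, @Status=?",
    ("CopyToParquet", "PreCopyAudit", "Started"),
)
conn.commit()
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"Proc executed in {elapsed_ms:.1f} ms")  # single-digit ms here, 3-6 s in ADF
```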
Is there any way of reducing the time to execute the logging steps?
As per the Microsoft SLA for ADF, Microsoft guarantees that at least 99.9% of the time, all activity runs will initiate within 4 minutes of their scheduled execution times. As stated by the Product Team, any stored procedure activity that completes within 4 minutes has met the SLA within ADF; this SLA covers the overhead of ADF communicating with SQL Server. With that said, the performance you are currently seeing in ADF is normal.
See the supporting documentation:

https://azure.microsoft.com/en-gb/support/legal/sla/data-factory/v1_2/
https://docs.microsoft.com/en-us/answers/questions/36323/adf-performance-troubleshooting.html
I tend to think adding excessive logging into pipelines is a bit of an anti-pattern, as it leads to more complex pipelines with more components to maintain, and exactly this type of issue arises, particularly if you're running the logging inside a For Each activity, for example. Most of the information you need should be in the built-in logging, which you can harvest via API calls either at the end of your pipelines or elsewhere, as ably described here.
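As a rough illustration of harvesting that built-in logging, here is a minimal sketch against the Query Pipeline Runs REST endpoint; the subscription, resource group, and factory names are placeholders, and it assumes the `azure-identity` and `requests` packages:

```python
# Minimal sketch: pull the last 24 hours of pipeline run history from the
# ADF REST API instead of logging it yourself inside the pipeline.
from datetime import datetime, timedelta, timezone

import requests
from azure.identity import DefaultAzureCredential

SUB, RG, FACTORY = "<subscription-id>", "<resource-group>", "<factory-name>"
URL = (
    f"https://management.azure.com/subscriptions/{SUB}"
    f"/resourceGroups/{RG}/providers/Microsoft.DataFactory"
    f"/factories/{FACTORY}/queryPipelineRuns?api-version=2018-06-01"
)

token = DefaultAzureCredential().get_token("https://management.azure.com/.default")
now = datetime.now(timezone.utc)
resp = requests.post(
    URL,
    headers={"Authorization": f"Bearer {token.token}"},
    json={
        "lastUpdatedAfter": (now - timedelta(days=1)).isoformat(),
        "lastUpdatedBefore": now.isoformat(),
    },
)
resp.raise_for_status()
for run in resp.json()["value"]:
    print(run["pipelineName"], run["status"], run["durationInMs"])
```

Each run record already includes the pipeline name, status, start/end times, and duration, which covers most of what a custom audit proc typically captures.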
The other thing you could do is move any non-critical tasks to run in parallel. For example, a 'log start of activity' task does not necessarily need to run first, sequentially. Consider making some of these tasks parallel, as explained in the diagram below. Obviously this does not apply when you need to capture information from the activity in order to log it.
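To make the parallel idea concrete, here is a rough Python sketch of the same pattern, where `log_start` and `copy_table` are hypothetical stand-ins for the logging proc call and the copy activity; in ADF itself you get this effect by simply not drawing a dependency between the logging activity and the copy activity:

```python
# Rough sketch: fire the non-critical "log start" call in the background so
# its latency overlaps the real work instead of adding to it.
import time
from concurrent.futures import ThreadPoolExecutor

def log_start():
    time.sleep(4)          # stands in for the 3-6 s logging stored proc
    print("start logged")

def copy_table():
    time.sleep(10)         # stands in for the actual copy work
    print("copy finished")

with ThreadPoolExecutor(max_workers=2) as pool:
    log_future = pool.submit(log_start)   # kicked off, not waited on yet
    copy_table()                          # main work proceeds immediately
    log_future.result()                   # join before logging completion

# Total wall time is roughly max(4, 10) s rather than 4 + 10 s.
```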