How to create a dependency between activities of a Pipeline for Azure Data Factory in Python
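In an Azure Data Factory pipeline, I am trying to make two CopyActivities run sequentially: the first copies data from a blob to a SQL table, and the second copies that SQL table to another database.
I tried the code below, but the resulting pipeline has no dependency between the activities (checked in the workflow diagram and the JSON in the Azure UI). When I run the pipeline, I get an error message like: "ErrorResponseException: The template validation failed: The 'runAfter' property of template action 'my second activity nameScope' at line '1' and column '22521' contains non-existent action. balababla…"
After adding the dependency manually in the Azure UI, I can run the pipeline successfully.
I would appreciate it if someone could point me to sample code (Python/C#/PowerShell) or documentation. My Python code: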
# Module-level imports used by the method below
from azure.mgmt.datafactory.models import (
    ActivityDependency, AzureBlobDataset, AzureSqlTableDataset, BlobSource,
    CopyActivity, DatasetReference, DependencyCondition, LinkedServiceReference,
    PipelineResource, SqlSink, SqlSource)

def createDataFactoryRectStage(self,
                               aPipelineName, aActivityStageName, aActivityAcquireName,
                               aRectFileName, aRectDSName,
                               aStageTableName, aStageDSName,
                               aAcquireTableName, aAcquireDSName):
    adf_client = self.__getAdfClient()

    # Source dataset: the delimited text file in blob storage
    ds_blob = AzureBlobDataset(
        linked_service_name=LinkedServiceReference(AZURE_DATAFACTORY_LS_BLOB_RECT),
        folder_path=PRJ_AZURE_BLOB_PATH_RECT,
        file_name=aRectFileName,
        format={"type": "TextFormat",
                "columnDelimiter": ",",
                "rowDelimiter": "",
                "nullValue": "\\N",
                "treatEmptyAsNull": "true",
                "firstRowAsHeader": "true",
                "quoteChar": "\"",})
    adf_client.datasets.create_or_update(AZURE_RESOURCE_GROUP, AZURE_DATAFACTORY, aRectDSName, ds_blob)

    # Staging table dataset
    ds_stage = AzureSqlTableDataset(
        linked_service_name=LinkedServiceReference(AZURE_DATAFACTORY_LS_SQLDB_STAGE),
        table_name='[dbo].[' + aStageTableName + ']')
    adf_client.datasets.create_or_update(AZURE_RESOURCE_GROUP, AZURE_DATAFACTORY, aStageDSName, ds_stage)

    # First activity: blob -> staging table
    ca_blob_to_stage = CopyActivity(
        aActivityStageName,
        inputs=[DatasetReference(aRectDSName)],
        outputs=[DatasetReference(aStageDSName)],
        source=BlobSource(),
        sink=SqlSink(write_batch_size=AZURE_SQL_WRITE_BATCH_SIZE))

    # Target table dataset in the second database
    ds_acquire = AzureSqlTableDataset(
        linked_service_name=LinkedServiceReference(AZURE_DATAFACTORY_LS_SQLDB_ACQUIRE),
        table_name='[dbo].[' + aAcquireTableName + ']')
    adf_client.datasets.create_or_update(AZURE_RESOURCE_GROUP, AZURE_DATAFACTORY, aAcquireDSName, ds_acquire)

    # Second activity: staging table -> target table, intended to run after the first
    dep = ActivityDependency(ca_blob_to_stage, dependency_conditions=[DependencyCondition('Succeeded')])
    ca_stage_to_acquire = CopyActivity(
        aActivityAcquireName,
        inputs=[DatasetReference(aStageDSName)],
        outputs=[DatasetReference(aAcquireDSName)],
        source=SqlSource(),
        sink=SqlSink(write_batch_size=AZURE_SQL_WRITE_BATCH_SIZE),
        depends_on=[dep])

    p_obj = PipelineResource(activities=[ca_blob_to_stage, ca_stage_to_acquire], parameters={})
    return adf_client.pipelines.create_or_update(AZURE_RESOURCE_GROUP, AZURE_DATAFACTORY, aPipelineName, p_obj)
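In case anyone else runs into the same problem as this old question: there is a subtle bug in the Python code.
Changing dep to use the activity name, rather than a reference to the activity object, made it work for me.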
dep = ActivityDependency(aActivityStageName, dependency_conditions =[DependencyCondition('Succeeded')])
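To see why the name matters, it can help to look at the shape of the pipeline JSON rather than the SDK objects. The following is a minimal sketch using plain dictionaries, not the azure-mgmt-datafactory SDK: the helper names (copy_activity, find_dangling_dependencies) and the activity and dataset names are hypothetical, and the activity JSON is trimmed (no typeProperties). It assumes the service rejects a pipeline when a dependsOn entry names an activity that does not exist, which is consistent with the "contains non-existent action" error in the question.

```python
def copy_activity(name, input_ds, output_ds, depends_on=()):
    """Build a dict shaped like a (trimmed) Copy activity in ADF V2 pipeline JSON."""
    return {
        "name": name,
        "type": "Copy",
        "inputs": [{"referenceName": input_ds, "type": "DatasetReference"}],
        "outputs": [{"referenceName": output_ds, "type": "DatasetReference"}],
        # Each dependency refers to the upstream activity by its *name* (a string).
        "dependsOn": [
            {"activity": dep, "dependencyConditions": ["Succeeded"]}
            for dep in depends_on
        ],
    }


def find_dangling_dependencies(activities):
    """Return (activity, missing_dependency) pairs whose target is not in the pipeline."""
    names = {a["name"] for a in activities}
    return [
        (a["name"], d["activity"])
        for a in activities
        for d in a.get("dependsOn", [])
        if d["activity"] not in names
    ]


# Hypothetical names, for illustration only.
stage = copy_activity("CopyBlobToStage", "RectBlobDS", "StageTableDS")
good = copy_activity("CopyStageToAcquire", "StageTableDS", "AcquireTableDS",
                     depends_on=["CopyBlobToStage"])
# Simulates a dependency whose value is not the upstream activity's name.
bad = copy_activity("CopyStageToAcquire", "StageTableDS", "AcquireTableDS",
                    depends_on=["<CopyActivity object>"])

print(find_dangling_dependencies([stage, good]))
print(find_dangling_dependencies([stage, bad]))
```

Running a check like this on the pipeline JSON before calling create_or_update is one way to catch a mismatched dependency name locally instead of at template validation time.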
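Here is an example in C# that chains activities so they run in sequence within a pipeline. Remember that in ADFV1 we had to configure the output of one activity as the input of another to chain them and make them depend on each other.
Pipeline code snippet (note the DependsOn property, which ensures the second activity runs only after the first one has succeeded):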
static PipelineResource PipelineDefinition(DataFactoryManagementClient client)
{
    Console.WriteLine("Creating pipeline " + pipelineName + "...");
    PipelineResource resource = new PipelineResource
    {
        Activities = new List<Activity>
        {
            // First activity: blob -> SQL
            new CopyActivity
            {
                Name = copyFromBlobToSQLActivity,
                Inputs = new List<DatasetReference>
                {
                    new DatasetReference { ReferenceName = blobSourceDatasetName }
                },
                Outputs = new List<DatasetReference>
                {
                    new DatasetReference { ReferenceName = sqlDatasetName }
                },
                Source = new BlobSource {},
                Sink = new SqlSink {}
            },
            // Second activity: SQL -> SQL, runs after the first succeeds
            new CopyActivity
            {
                Name = copyToSQLServerActivity,
                Inputs = new List<DatasetReference>
                {
                    new DatasetReference { ReferenceName = sqlDatasetName }
                },
                Outputs = new List<DatasetReference>
                {
                    new DatasetReference { ReferenceName = sqlDestinationDatasetName }
                },
                Source = new SqlSource {},
                Sink = new SqlSink {},
                DependsOn = new List<ActivityDependency>
                {
                    new ActivityDependency
                    {
                        // Refers to the upstream activity by name
                        Activity = copyFromBlobToSQLActivity,
                        DependencyConditions = new List<string> { "Succeeded" }
                    }
                }
            }
        }
    };
    Console.WriteLine(SafeJsonConvert.SerializeObject(resource, client.SerializationSettings));
    return resource;
}
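The ordering that DependsOn expresses can be illustrated in Python with the standard library. This is only a sketch of the ordering constraint, not of how the Data Factory service schedules activities; the run_order helper and the activity names are hypothetical, and the dictionaries mirror the dependsOn shape of the pipeline JSON. It requires Python 3.9+ for graphlib.

```python
from graphlib import TopologicalSorter


def run_order(activities):
    """Return activity names in an order that respects every dependsOn entry."""
    # Map each activity name to the set of names it must wait for.
    graph = {
        a["name"]: {d["activity"] for d in a.get("dependsOn", [])}
        for a in activities
    }
    # static_order() yields predecessors before the activities that depend on them.
    return list(TopologicalSorter(graph).static_order())


# Hypothetical two-activity pipeline, listed out of order on purpose.
activities = [
    {
        "name": "CopyToSQLServer",
        "dependsOn": [
            {"activity": "CopyFromBlobToSQL", "dependencyConditions": ["Succeeded"]}
        ],
    },
    {"name": "CopyFromBlobToSQL", "dependsOn": []},
]

print(run_order(activities))
```

The position of an activity in the Activities list does not determine when it runs; only the dependency entries do, which is why the list above can be in any order.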
Please check the ADFV2 tutorial for a comprehensive explanation and more scenarios.