
Dynamic Azure Data Factory v2 pipelines

We've got a factory with ~400 datasets and ~200 pipelines, and it's getting unwieldy. This question focuses on copying from a SQL source to a blob sink. Since we are copying to blob, the schema has no impact. I'd like to have one dataset for each source, one dataset for each blob account, and one pipeline for each source/blob account combination, dynamically feeding it the config from a Lookup.

We've successfully developed a pipeline that uses dummy datasets for source and sink. It works if you feed it a query, container name and folder name.

{
    "name": "pipeline1",
    "properties": {
        "activities": [
            {
                "name": "DynamicCopy",
                "type": "Copy",
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "typeProperties": {
                    "source": {
                        "type": "SqlSource",
                        "sqlReaderQuery": "select 1 a"
                    },
                    "sink": {
                        "type": "BlobSink"
                    },
                    "enableStaging": false,
                    "dataIntegrationUnits": 0
                },
                "inputs": [
                    {
                        "referenceName": "AzureSql",
                        "type": "DatasetReference"
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "AzureBlob",
                        "type": "DatasetReference",
                        "parameters": {
                            "container": "raw-test",
                            "folder": "test"
                        }
                    }
                ]
            }
        ]
    }
}
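
For completeness, "feeding it" a query, container name and folder name can be done through pipeline parameters. A minimal sketch (the parameter names query, container and folder are my own, not taken from the factory):

{
    "name": "pipeline1",
    "properties": {
        "parameters": {
            "query": { "type": "String" },
            "container": { "type": "String" },
            "folder": { "type": "String" }
        },
        "activities": [
            {
                "name": "DynamicCopy",
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "SqlSource",
                        "sqlReaderQuery": {
                            "value": "@pipeline().parameters.query",
                            "type": "Expression"
                        }
                    },
                    "sink": {
                        "type": "BlobSink"
                    },
                    "enableStaging": false
                },
                "inputs": [
                    {
                        "referenceName": "AzureSql",
                        "type": "DatasetReference"
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "AzureBlob",
                        "type": "DatasetReference",
                        "parameters": {
                            "container": {
                                "value": "@pipeline().parameters.container",
                                "type": "Expression"
                            },
                            "folder": {
                                "value": "@pipeline().parameters.folder",
                                "type": "Expression"
                            }
                        }
                    }
                ]
            }
        ]
    }
}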

When we put a Lookup before it and wrap the copy in a ForEach, it stops working, failing with the not-so-helpful:

 "errorCode": "400", "message": "Activity failed because an inner activity failed", "failureType": "UserError", "target": "ForEach" 

The Lookup stored procedure [dbo].[adfdynamic] is not actually referenced in the ForEach yet:

create proc adfdynamic as
-- the third column needs an alias so the ForEach can reference it
-- (the name 'query' is an assumption; the original left it unnamed)
select 'raw-test' container, 'test_a' folder, 'select 1 a, 2 b' query
UNION ALL
select 'raw-test' container, 'test_b' folder, 'select 3 c, 2 d' query
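
The ForEach items expression @activity('Lookup').output.value expects the Lookup to return an array of rows; with firstRowOnly set to false its output should look roughly like this (a sketch of the documented output shape, not captured from a real run, and using the assumed query alias):

{
    "count": 2,
    "value": [
        {
            "container": "raw-test",
            "folder": "test_a",
            "query": "select 1 a, 2 b"
        },
        {
            "container": "raw-test",
            "folder": "test_b",
            "query": "select 3 c, 2 d"
        }
    ]
}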

So the desired behaviour is:

  • one blob in raw-test@..myblob.../test_a/out.dsv with content {'a,b','1,2'}
  • one blob in raw-test@..myblob.../test_b/out.dsv with content {'c,d','3,2'}

sql dataset:

{
    "name": "AzureSql",
    "properties": {
        "linkedServiceName": {
            "referenceName": "Dest",
            "type": "LinkedServiceReference"
        },
        "type": "AzureSqlTable",
        "structure": [
            {
                "name": "CustomerKey",
                "type": "Int32"
            },
            {
                "name": "Name",
                "type": "String"
            }
        ],
        "typeProperties": {
            "tableName": "[dbo].[DimCustomer]"
        }
    }
}

blob dataset:

{
    "name": "AzureBlob",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureStorage1",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "container": {
                "type": "String"
            },
            "folder": {
                "type": "String"
            }
        },
        "type": "AzureBlob",
        "typeProperties": {
            "format": {
                "type": "TextFormat",
                "columnDelimiter": ",",
                "treatEmptyAsNull": false,
                "skipLineCount": 0,
                "firstRowAsHeader": false
            },
            "fileName": {
                "value": "@{dataset().folder}/out.dsv",
                "type": "Expression"
            },
            "folderPath": {
                "value": "@dataset().container",
                "type": "Expression"
            }
        }
    },
    "type": "Microsoft.DataFactory/factories/datasets"
}

and the non-working dynamic pipeline:

{
    "name": "Copy",
    "properties": {
        "activities": [
            {
                "name": "ForEach",
                "type": "ForEach",
                "dependsOn": [
                    {
                        "activity": "Lookup",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "typeProperties": {
                    "items": {
                        "value": "@activity('Lookup').output.value",
                        "type": "Expression"
                    },
                    "activities": [
                        {
                            "name": "Copy",
                            "type": "Copy",
                            "policy": {
                                "timeout": "7.00:00:00",
                                "retry": 0,
                                "retryIntervalInSeconds": 30,
                                "secureOutput": false,
                                "secureInput": false
                            },
                            "typeProperties": {
                                "source": {
                                    "type": "SqlSource",
                                    "sqlReaderQuery": {
                                        "value": "select 1 a, 2 b from dest",
                                        "type": "Expression"
                                    }
                                },
                                "sink": {
                                    "type": "BlobSink"
                                },
                                "enableStaging": false,
                                "dataIntegrationUnits": 0
                            },
                            "inputs": [
                                {
                                    "referenceName": "AzureSql",
                                    "type": "DatasetReference"
                                }
                            ],
                            "outputs": [
                                {
                                    "referenceName": "AzureBlob",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "container": {
                                            "value": "raw-test",
                                            "type": "Expression"
                                        },
                                        "folder": {
                                            "value": "folder",
                                            "type": "Expression"
                                        }
                                    }
                                }
                            ]
                        }
                    ]
                }
            },
            {
                "name": "Lookup",
                "type": "Lookup",
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                }
            }
        ]
    }
}
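
The intent is for the inner Copy to take its values from the current ForEach item rather than the literals above, roughly like this (relevant fragments only; this assumes the lookup columns are named container, folder and query):

"source": {
    "type": "SqlSource",
    "sqlReaderQuery": {
        "value": "@item().query",
        "type": "Expression"
    }
},
...
"parameters": {
    "container": {
        "value": "@item().container",
        "type": "Expression"
    },
    "folder": {
        "value": "@item().folder",
        "type": "Expression"
    }
}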

Apologies for the formatting; too much code in one post?

  1. In your Lookup activity, check the firstRowOnly property: is it set to false or true? By default it is true, which means the Lookup returns only a single object rather than the array of rows your ForEach expects (see the sketch after this list).
  2. In the UI, you could set a breakpoint to debug your Lookup activity. Then you could see whether the output is what you want.
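
For example, a Lookup calling your stored procedure with firstRowOnly disabled could look like this (a sketch only; it reuses the AzureSql dataset from your question):

{
    "name": "Lookup",
    "type": "Lookup",
    "typeProperties": {
        "source": {
            "type": "SqlSource",
            "sqlReaderStoredProcedureName": "[dbo].[adfdynamic]"
        },
        "dataset": {
            "referenceName": "AzureSql",
            "type": "DatasetReference"
        },
        "firstRowOnly": false
    }
}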


Not exactly an answer to your question, but something I did to make life simpler was to create a dataset called GenericBlob. It has two parameters, container and path. This may help simplify what you're doing. I too used to have 20 blob datasets; now I have one ... (this assumes the blobs are in the same storage account).
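
Something along these lines (a sketch only; the linked service reference is reused from the question, and the concat-based folderPath is an assumption about how the two parameters would be combined):

{
    "name": "GenericBlob",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureStorage1",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "container": { "type": "String" },
            "path": { "type": "String" }
        },
        "type": "AzureBlob",
        "typeProperties": {
            "folderPath": {
                "value": "@concat(dataset().container, '/', dataset().path)",
                "type": "Expression"
            }
        }
    }
}

Any pipeline copying to that storage account can then pass container and path at runtime instead of needing its own dataset.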
