Azure Data Factory ForEach Copy activity is not iterating through the files but instead pulling all files in the blob folder. Why?

I have a pipeline in DF2 that has to look at a folder in blob storage and process each of the 145 files sequentially into a database table. After each file has been loaded into the table, a stored procedure should be triggered that checks each record and either inserts it into a master table or updates the existing record there.

Looking online, I feel as though I have tried every suggested combination of "Get Metadata", "ForEach", "Lookup" and "Append Variable" activities, but for some reason my Copy Data activity STILL picks up all files at the same time and runs 145 times.

I recently found a blog post and followed it to build the file list with a variable, since that would be useful for multiple file locations, but it does not work for me. I need to read the files as CSVs into tables rather than as binary objects, so I think that may be my issue.

    {
        "name": "BulkLoadPipeline",
        "properties": {
            "activities": [
                {
                    "name": "GetFileNames",
                    "type": "GetMetadata",
                    "policy": {
                        "timeout": "7.00:00:00",
                        "retry": 0,
                        "retryIntervalInSeconds": 30,
                        "secureOutput": false,
                        "secureInput": false
                    },
                    "typeProperties": {
                        "dataset": {
                            "referenceName": "DelimitedText1",
                            "type": "DatasetReference",
                            "parameters": {
                                "fileName": "@item()"
                            }
                        },
                        "fieldList": [
                            "childItems"
                        ],
                        "storeSettings": {
                            "type": "AzureBlobStorageReadSetting"
                        },
                        "formatSettings": {
                            "type": "DelimitedTextReadSetting"
                        }
                    }
                },
                {
                    "name": "CopyDataRunDeltaCheck",
                    "type": "ForEach",
                    "dependsOn": [
                        {
                            "activity": "BuildList",
                            "dependencyConditions": [
                                "Succeeded"
                            ]
                        }
                    ],
                    "typeProperties": {
                        "items": {
                            "value": "@variables('fileList')",
                            "type": "Expression"
                        },
                        "isSequential": true,
                        "activities": [
                            {
                                "name": "WriteToTables",
                                "type": "Copy",
                                "policy": {
                                    "timeout": "7.00:00:00",
                                    "retry": 0,
                                    "retryIntervalInSeconds": 30,
                                    "secureOutput": false,
                                    "secureInput": false
                                },
                                "typeProperties": {
                                    "source": {
                                        "type": "DelimitedTextSource",
                                        "storeSettings": {
                                            "type": "AzureBlobStorageReadSetting",
                                            "wildcardFileName": "*.*"
                                        },
                                        "formatSettings": {
                                            "type": "DelimitedTextReadSetting"
                                        }
                                    },
                                    "sink": {
                                        "type": "AzureSqlSink"
                                    },
                                    "enableStaging": false,
                                    "translator": {
                                        "type": "TabularTranslator",
                                        "mappings": [
                                            {
                                                "source": {
                                                    "name": "myID",
                                                    "type": "String"
                                                },
                                                "sink": {
                                                    "name": "myID",
                                                    "type": "String"
                                                }
                                            },
                                            {
                                                "source": {
                                                    "name": "Col1",
                                                    "type": "String"
                                                },
                                                "sink": {
                                                    "name": "Col1",
                                                    "type": "String"
                                                }
                                            },
                                            {
                                                "source": {
                                                    "name": "Col2",
                                                    "type": "String"
                                                },
                                                "sink": {
                                                    "name": "Col2",
                                                    "type": "String"
                                                }
                                            },
                                            {
                                                "source": {
                                                    "name": "Col3",
                                                    "type": "String"
                                                },
                                                "sink": {
                                                    "name": "Col3",
                                                    "type": "String"
                                                }
                                            },
                                            {
                                                "source": {
                                                    "name": "Col4",
                                                    "type": "String"
                                                },
                                                "sink": {
                                                    "name": "Col4",
                                                    "type": "String"
                                                }
                                            },
                                            {
                                                "source": {
                                                    "name": "DW Date Created",
                                                    "type": "String"
                                                },
                                                "sink": {
                                                    "name": "DW_Date_Created",
                                                    "type": "String"
                                                }
                                            },
                                            {
                                                "source": {
                                                    "name": "DW Date Updated",
                                                    "type": "String"
                                                },
                                                "sink": {
                                                    "name": "DW_Date_Updated",
                                                    "type": "String"
                                                }
                                            }
                                        ]
                                    }
                                },
                                "inputs": [
                                    {
                                        "referenceName": "DelimitedText1",
                                        "type": "DatasetReference",
                                        "parameters": {
                                            "fileName": "@item()"
                                        }
                                    }
                                ],
                                "outputs": [
                                    {
                                        "referenceName": "myTable",
                                        "type": "DatasetReference"
                                    }
                                ]
                            },
                            {
                                "name": "CheckDeltas",
                                "type": "SqlServerStoredProcedure",
                                "dependsOn": [
                                    {
                                        "activity": "WriteToTables",
                                        "dependencyConditions": [
                                            "Succeeded"
                                        ]
                                    }
                                ],
                                "policy": {
                                    "timeout": "7.00:00:00",
                                    "retry": 0,
                                    "retryIntervalInSeconds": 30,
                                    "secureOutput": false,
                                    "secureInput": false
                                },
                                "typeProperties": {
                                    "storedProcedureName": "[TL].[uspMyCheck]"
                                },
                                "linkedServiceName": {
                                    "referenceName": "myService",
                                    "type": "LinkedServiceReference"
                                }
                            }
                        ]
                    }
                },
                {
                    "name": "BuildList",
                    "type": "ForEach",
                    "dependsOn": [
                        {
                            "activity": "GetFileNames",
                            "dependencyConditions": [
                                "Succeeded"
                            ]
                        }
                    ],
                    "typeProperties": {
                        "items": {
                            "value": "@activity('GetFileNames').output.childItems",
                            "type": "Expression"
                        },
                        "isSequential": true,
                        "activities": [
                            {
                                "name": "Create list from variables",
                                "type": "AppendVariable",
                                "typeProperties": {
                                    "variableName": "fileList",
                                    "value": "@item().name"
                                }
                            }
                        ]
                    }
                }
            ],
            "variables": {
                "fileList": {
                    "type": "Array"
                }
            }
        }
    }

The Details screen of the pipeline output shows that the pipeline loops once per item in the blob folder, but each time the Copy Data and Stored Procedure activities run for every file in the list at once instead of one file at a time.

I feel like I am close to the answer but missing one vital part. Any help or suggestions are GREATLY appreciated.

Your payload is not correct.

  1. The GetMetadata activity should not use the same dataset as the Copy activity.
  2. The GetMetadata activity should reference a dataset that points to a folder, and that folder should contain all the files you want to process. Your dataset has a 'fileName' parameter instead.
  3. Use the output of the GetMetadata activity (childItems) as the input of the ForEach activity.
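
A minimal sketch of how the three points above might translate into the pipeline JSON, reusing the activity and dataset names from the question. `SourceFolder` is a hypothetical name for a dataset that points at the blob folder and takes no `fileName` parameter. The sketch also drops `wildcardFileName` from the Copy source, on the assumption that a `*.*` wildcard matches every file in the folder on each iteration rather than the single file passed through `fileName`. The `policy`, `translator` and `CheckDeltas` parts are left out for brevity and would stay as they are in the question.

```json
{
    "activities": [
        {
            "name": "GetFileNames",
            "description": "Lists the folder; 'SourceFolder' is a hypothetical folder-level dataset without a fileName parameter.",
            "type": "GetMetadata",
            "typeProperties": {
                "dataset": {
                    "referenceName": "SourceFolder",
                    "type": "DatasetReference"
                },
                "fieldList": [
                    "childItems"
                ]
            }
        },
        {
            "name": "CopyDataRunDeltaCheck",
            "description": "Iterates over childItems directly, one file at a time.",
            "type": "ForEach",
            "dependsOn": [
                {
                    "activity": "GetFileNames",
                    "dependencyConditions": [
                        "Succeeded"
                    ]
                }
            ],
            "typeProperties": {
                "items": {
                    "value": "@activity('GetFileNames').output.childItems",
                    "type": "Expression"
                },
                "isSequential": true,
                "activities": [
                    {
                        "name": "WriteToTables",
                        "description": "No wildcardFileName here (assumption: the wildcard was matching all files); the file comes from the dataset parameter.",
                        "type": "Copy",
                        "typeProperties": {
                            "source": {
                                "type": "DelimitedTextSource",
                                "storeSettings": {
                                    "type": "AzureBlobStorageReadSetting"
                                },
                                "formatSettings": {
                                    "type": "DelimitedTextReadSetting"
                                }
                            },
                            "sink": {
                                "type": "AzureSqlSink"
                            }
                        },
                        "inputs": [
                            {
                                "referenceName": "DelimitedText1",
                                "type": "DatasetReference",
                                "parameters": {
                                    "fileName": "@item().name"
                                }
                            }
                        ],
                        "outputs": [
                            {
                                "referenceName": "myTable",
                                "type": "DatasetReference"
                            }
                        ]
                    }
                ]
            }
        }
    ]
}
```

With `childItems` fed straight into the ForEach, the `BuildList` loop and the `fileList` variable should no longer be needed. Each item is an object with `name` and `type` fields, which is why the Copy input passes `@item().name` rather than `@item()`.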
