
Matching the columns in a source file with the sink table columns using Azure Data Factory

I have an Azure Data Factory trigger that fires when a file is placed in blob storage. The trigger starts a pipeline run and passes the file name to the data flow activity. I would like to make sure that all the column names from the header row of the file exist in the sink table. There is an identity column in the sink table that should not be part of the comparison. I'm not sure how to tackle this task; I've read about the 'derived column' transformation. Is that the route I should take?

Strategy:

Use two ADF pipelines: one to get the list of all files, and another to process each file, copying its content to a specific SQL table.

Setup:

I've created 4 CSV files following the pattern you need, "[CustomerID]_[TableName]_[FileID].csv", and 4 SQL tables, one for each type of file.

  • A_inventory_0001.csv: inventory records for customer A, to be inserted into the SQL table “A_Inventory”.
  • A_sales_0003.csv: sales records for customer A, to be inserted into the SQL table “A_Sales”.
  • B_inventory_0002.csv: inventory records for customer B, to be inserted into the SQL table “B_Inventory”.
  • B_sales_0004.csv: sales records for customer B, to be inserted into the SQL table “B_Sales”.

[Screenshot: strategy overview]

Linked Services

In Azure Data Factory, the following linked services were created using Key Vault (Key Vault is optional).

[Screenshot: linked services]

Datasets

The following datasets were created. Note that we created parameters to allow the pipeline to specify the source file and the destination SQL table.

The dataset “AzureSQLTable” has a parameter to specify the name of the destination SQL table.

[Screenshot: dataset AzureSQLTable]

The dataset “DelimitedTextFile” has a parameter to specify the name of the source CSV file.

[Screenshot: dataset DelimitedTextFile]

The dataset “DelimitedTextFiles” has no parameter because it will be used to list all files from the source folder.

[Screenshot: dataset DelimitedTextFiles]
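
For reference, a parameterized dataset definition looks roughly like the sketch below. This is a minimal sketch of the standard ADF dataset JSON; the linked service name, schema, and the parameter name pTableName are assumptions for illustration, only the dataset name comes from the post.

    {
        "name": "AzureSQLTable",
        "properties": {
            "type": "AzureSqlTable",
            "linkedServiceName": { "referenceName": "AzureSqlDatabase", "type": "LinkedServiceReference" },
            "parameters": { "pTableName": { "type": "string" } },
            "typeProperties": {
                "schema": "dbo",
                "table": { "value": "@dataset().pTableName", "type": "Expression" }
            }
        }
    }

The “DelimitedTextFile” dataset would be parameterized the same way, with a file-name parameter referenced as @dataset().pFileName in the fileName of its location.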

Pipelines

The first pipeline, “Get Files”, gets the list of CSV files from the source folder (Get Metadata activity) and then, for each file, calls the second pipeline, passing the CSV file name as a parameter.

[Screenshot: pipeline Get Files (1)]

[Screenshot: pipeline Get Files (2)]

Inside the foreach loop, there is a call to the second pipeline “Process File” passing the file name as a parameter.

[Screenshot: pipeline Get Files (3)]
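
In JSON form, the skeleton of this pipeline is roughly as follows. This is a hedged sketch: the activity names are mine, and the Get Metadata field list and ForEach wiring follow the standard ADF pattern rather than the exact code in the linked repository.

    {
        "name": "Get Files",
        "properties": {
            "activities": [
                {
                    "name": "GetFileList",
                    "type": "GetMetadata",
                    "typeProperties": {
                        "dataset": { "referenceName": "DelimitedTextFiles", "type": "DatasetReference" },
                        "fieldList": [ "childItems" ]
                    }
                },
                {
                    "name": "ForEachFile",
                    "type": "ForEach",
                    "dependsOn": [ { "activity": "GetFileList", "dependencyConditions": [ "Succeeded" ] } ],
                    "typeProperties": {
                        "items": { "value": "@activity('GetFileList').output.childItems", "type": "Expression" },
                        "activities": [
                            {
                                "name": "CallProcessFile",
                                "type": "ExecutePipeline",
                                "typeProperties": {
                                    "pipeline": { "referenceName": "Process File", "type": "PipelineReference" },
                                    "parameters": { "pFileName": { "value": "@item().name", "type": "Expression" } }
                                }
                            }
                        ]
                    }
                }
            ]
        }
    }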

The second pipeline has a parameter “pFileName” to receive the name of the file to be processed and a variable to calculate the name of the destination table based on the file name.

[Screenshot: pipeline Process File (1)]

The first activity uses a split on the file name to extract the parts we need to compose the destination table name. In the expression below we split the file name on the “_” separator and then use the first and second parts to compose the destination table name: @concat(string(split(pipeline().parameters.pFileName, '_')[0]),'_',string(split(pipeline().parameters.pFileName, '_')[1]))

[Screenshot: pipeline Process File (2)]
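
To make the expression concrete: for the file “A_inventory_0001.csv”, split(..., '_') yields ['A', 'inventory', '0001.csv'], so parts [0] and [1] give “A_inventory”. In the pipeline this lives in a Set Variable activity, roughly as sketched below (the variable name vTableName comes from the post; the activity name is mine):

    {
        "name": "SetTableName",
        "type": "SetVariable",
        "typeProperties": {
            "variableName": "vTableName",
            "value": {
                "value": "@concat(string(split(pipeline().parameters.pFileName, '_')[0]), '_', string(split(pipeline().parameters.pFileName, '_')[1]))",
                "type": "Expression"
            }
        }
    }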

The second activity then copies the file from the source “pFileName” to the destination table “vTableName” using dynamic mapping, i.e. without adding specific column names, as these will be dynamic.

[Screenshot: pipeline Process File, Copy activity]
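
The Copy activity wiring, in sketch form (dataset parameter names are assumptions, as above; omitting the translator from typeProperties lets the copy map columns by name at runtime, which is what “dynamic mapping” means here):

    {
        "name": "CopyFileToTable",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "DelimitedTextFile",
                "type": "DatasetReference",
                "parameters": { "pFileName": { "value": "@pipeline().parameters.pFileName", "type": "Expression" } }
            }
        ],
        "outputs": [
            {
                "referenceName": "AzureSQLTable",
                "type": "DatasetReference",
                "parameters": { "pTableName": { "value": "@variables('vTableName')", "type": "Expression" } }
            }
        ],
        "typeProperties": {
            "source": { "type": "DelimitedTextSource" },
            "sink": { "type": "AzureSqlSink" }
        }
    }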

The files I used in this example and the ADF code are available here: https://github.com/diegoeick/stack-overflow/tree/main/69340699

I hope this will resolve your issue.

In case you still need to save the CustomerID and FileID in the database tables, you can use dynamic mapping: take the available parameters (the file name) and build a JSON object with the mapping in the Mapping tab of your Copy activity. You can find more details here: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-schema-and-type-mapping#parameterize-mapping
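
The mapping object passed in the Mapping tab is a TabularTranslator, as described in the linked page. A hedged sketch with hypothetical column names follows; since CustomerID and FileID are not in the CSV itself, one way to supply them is to add them first as “Additional columns” on the Copy activity source (deriving them from pFileName with the same split expressions) and then map them like ordinary source columns:

    {
        "type": "TabularTranslator",
        "mappings": [
            { "source": { "name": "ItemCode" },   "sink": { "name": "ItemCode" } },
            { "source": { "name": "Quantity" },   "sink": { "name": "Quantity" } },
            { "source": { "name": "CustomerID" }, "sink": { "name": "CustomerID" } },
            { "source": { "name": "FileID" },     "sink": { "name": "FileID" } }
        ]
    }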

You can select or filter which columns reach the sink dataset or table by using "Field mapping". You can optionally use a "Derived Column" transformation; however, the sink transformation gives you this by default, set to "Auto mapping". There you can add or remove the columns that are written to the sink.

In the example below, the column "id" can be taken as the equivalent of the identity column in your table, assuming all the files have the same columns:

[Screenshots: field mapping in the sink transformation, with the "id" column removed]
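
For reference, the sink's field mapping is persisted in the data flow script. A rough sketch of a sink that drops "id" from the mapping might look like the following; the column names, types, and stream names are assumptions for illustration, not taken from the screenshots:

    source1 sink(allowSchemaDrift: true,
        validateSchema: false,
        input(
            id as integer,
            name as string,
            amount as double
        ),
        mapColumn(
            name,
            amount
        ),
        skipDuplicateMapInputs: true,
        skipDuplicateMapOutputs: true) ~> sink1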

Once you have modified the mapping as per your need, you can confirm the result from the "Inspect" tab before the run.

[Screenshot: the Inspect tab]
