
Azure Data Factory DYNAMICALLY partition a csv/txt file based on rowcount

I am using an Azure Data Flow to transform delimited files (CSV/TXT) to JSON. But I want to split the files dynamically based on a maximum row count of 5,000, because I will not know the row count each time. So if I have a CSV file with 10,000 rows, the pipeline should output two equal JSON files, file1.json and file2.json. What is the best way to actually get the row count of my sources, and the correct number n of partitions based on that row count, within Azure Data Factory?

We can't split the CSV file at a specified row number. The closest workaround is to specify the partitioning of the Sink.

For example, I have a CSV file containing 700 rows of data, and I successfully copied it to two equal JSON files.

My source CSV data in Blob storage: [screenshot]

Sink settings: each partition outputs a new file, json1.json and json2.json: [screenshot]

Optimize:

  1. Partition operation: Set partition
  2. Partition type: Dynamic partition
  3. Number of partitions: 2 (splits the CSV data into 2 partitions)
  4. Stored ranges in columns: id (partition based on the id column)

Run the Data Flow and the CSV file will be split into two JSON files, each containing 350 rows of data.
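To see what this dynamic partitioning produces, here is a minimal Python sketch, not ADF code; the file names source.csv, json1.json/json2.json and the id column are taken from the example above, and the rows are assumed to be sorted by id:

```python
import csv
import json

# Read the 700-row source CSV (assumed to have an "id" column, as above).
with open("source.csv", newline="") as f:
    rows = list(csv.DictReader(f))

partitions = 2
size = len(rows) // partitions  # 700 rows -> 350 rows per partition

# Rows are ordered by id, so contiguous slices mimic range partitioning on id.
for i in range(partitions):
    chunk = rows[i * size:] if i == partitions - 1 else rows[i * size:(i + 1) * size]
    with open(f"json{i + 1}.json", "w") as out:
        json.dump(chunk, out, indent=2)
```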

For your situation, with a CSV file of 10,000 rows, the pipeline will output two equal JSON files (each containing 5,000 rows of data).
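Since the row count is unknown up front, the partition count itself has to be computed. A sketch of that arithmetic, assuming a 5,000-row cap; how you feed the result back into the Data Flow's "Number of partitions" setting (e.g. via a pipeline parameter) depends on your pipeline:

```python
import math

MAX_ROWS = 5000  # desired cap per output file

def partition_count(row_count: int) -> int:
    """Smallest number of equal partitions with at most MAX_ROWS rows each."""
    return max(1, math.ceil(row_count / MAX_ROWS))

print(partition_count(10_000))  # 2 -> two files of 5,000 rows
print(partition_count(12_345))  # 3 -> three files of ~4,115 rows
```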

One way to achieve this is to use the mod or % operator.

  1. To start with, add a surrogate key to the CSV source, or use any sequential key already in the data.
  2. Add an Aggregate step with a group-by clause of key % n, where n is the number of output files (for a 5,000-row cap, n = ceil(total row count / 5,000)).
  3. Set the aggregate function to collect().

Your output should now be an array of rows with the expected count in each group.
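As a rough Python sketch of that flow (the sequential key, the modulo bucketing, and the per-bucket collect are mirrored here; the file names and the 5,000-row cap are assumptions, not ADF syntax):

```python
import csv
import json
import math
from collections import defaultdict

MAX_ROWS = 5000

with open("source.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Step 1: the enumerate index plays the role of the surrogate key.
# Step 2: group by key % n, which deals rows round-robin into n buckets.
n = max(1, math.ceil(len(rows) / MAX_ROWS))
buckets = defaultdict(list)
for key, row in enumerate(rows):
    buckets[key % n].append(row)

# Step 3: each bucket is the collect()-ed array of rows for one output file.
for part, chunk in sorted(buckets.items()):
    with open(f"file{part + 1}.json", "w") as out:
        json.dump(chunk, out, indent=2)
```

Because the keys are dealt round-robin, each bucket receives at most ceil(total / n) rows, so no output file exceeds the 5,000-row cap.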
