ADF / Dataflow - Convert Multiple CSV to Parquet

In ADLS Gen2, the TextFiles folder has 3 CSV files. The column names are different in each file.

We need to convert all 3 CSV files to 3 Parquet files and put them in the ParquetFiles folder.

I tried a Copy activity, but it fails because the column names contain spaces, which Parquet does not allow.

To remove the spaces, I used a data flow: Source -> Select (replace spaces with underscores in the column names) -> Sink. This worked for a single file. When I tried it for all 3 files, it merged them and generated a single file with incorrect data.

How can I solve this, mainly removing the spaces from the column names in all files? What other options are there?
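For reference, the renaming rule the Select step applies can be sketched in plain Python. This only illustrates the intended transformation, it is not Data Factory code; the function name `sanitize_columns` and the sample column names are hypothetical.

```python
def sanitize_columns(names):
    """Return the column names with every space replaced by an underscore."""
    return [name.replace(" ", "_") for name in names]


# Hypothetical header from one of the CSV files.
print(sanitize_columns(["Order ID", "Customer Name", "Total"]))
```

Because the rule depends only on the name itself, it can be applied to each file's header independently, whatever columns that file happens to have.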

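Pipeline: ForEach activity (loop over the CSV files in the folder and pass the current iteration item to the data flow as a parameter) -> Data Flow activity whose source points to that folder (with the file name in the source path parameterized).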
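The point of the loop is that each iteration handles exactly one file, so nothing gets merged. A minimal Python sketch of that one-input-to-one-output mapping follows. It is not Data Factory code; the function name `plan_conversions` and the file names are hypothetical, and only the folder names come from the question.

```python
from pathlib import PurePosixPath


def plan_conversions(csv_files, source_folder="TextFiles", sink_folder="ParquetFiles"):
    """Pair each CSV file with its own Parquet output path (one per iteration)."""
    plan = []
    for file_name in csv_files:
        # Same base name, different folder and extension.
        stem = PurePosixPath(file_name).stem
        plan.append((f"{source_folder}/{file_name}", f"{sink_folder}/{stem}.parquet"))
    return plan


# Hypothetical file names standing in for the 3 CSV files.
for source, sink in plan_conversions(["sales.csv", "customers.csv", "products.csv"]):
    print(source, "->", sink)
```

In the pipeline, the file name from the current ForEach item plays the role of `file_name`, and the data flow runs once per pair.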

I created 2 datasets, one CSV with a wildcard and the other Parquet. I used a Copy activity with the CSV dataset as the source and the Parquet dataset as the sink, and set the copy behavior to Merge files.
