
How to split a CSV file based on column value in the Cloud Dataflow Python SDK

I would like to read a CSV file from GCS using ReadFromText and split it into multiple files based on column values.

See the sample data below:
Col1    Col2    Col3
Value1  data    date
value2  data    date_1
Value3  data    date_2
Value4  data    date_3
Value5  data    date

I want to create folders named after the Col3 values (date, date_1, date_2, date_3), prefix each file name with that value, and load the corresponding rows into each file.

Process each element to generate a KV (key-value pair), where the key carries the metadata about the location where you would like the value to land. Then look into using dynamic destinations to write out the files.
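A minimal sketch of the keying step, assuming the file is comma-delimited and that Col3 (index 2) is the column that picks the destination; `to_kv` and `KEY_COLUMN` are hypothetical names, not part of the Beam API. It is plain Python, so it can be handed to `beam.Map`:

```python
import csv

# Hypothetical: index of the column that decides where a row lands (Col3 in the sample).
KEY_COLUMN = 2


def to_kv(line, key_column=KEY_COLUMN):
    """Turn one CSV text line into a (key, value) pair.

    The key is the routing metadata (e.g. "date_1"); the value is the
    original line, kept unchanged so it can be written out as-is.
    """
    # csv.reader copes with quoted fields that contain commas, unlike str.split(",").
    fields = next(csv.reader([line]))
    return fields[key_column].strip(), line
```

In a pipeline it would sit right after the read, e.g. `p | beam.io.ReadFromText(path, skip_header_lines=1) | beam.Map(to_kv)`, where `skip_header_lines=1` drops the header row.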

An example of using the key with FileIO is in this answer on SO.
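A rough sketch of the write side with `apache_beam.io.fileio.WriteToFiles`, assuming comma-delimited input keyed on Col3. Instead of a separate KV step, the `destination` callable computes the key from each line, and a custom `file_naming` function turns a key such as `date_1` into `date_1/date_1-00000-of-00001.csv`, i.e. one folder per key with the key as the file-name prefix. `key_of`, `folder_naming` and `run` are hypothetical names, and the input/output paths are whatever you pass in. Beam is imported inside `run` so the two helpers can be used without it:

```python
import csv


def key_of(line):
    """Hypothetical helper: the routing key of a CSV line is its third column (Col3)."""
    return next(csv.reader([line]))[2].strip()


def folder_naming(window, pane, shard_index, total_shards, compression, destination):
    """File-naming callable for fileio.WriteToFiles.

    Beam passes the destination (here the Col3 value), so each value gets its
    own folder and a file name prefixed with that value, e.g.
    date_1/date_1-00000-of-00001.csv
    """
    return "{0}/{0}-{1:05d}-of-{2:05d}.csv".format(destination, shard_index, total_shards)


def run(input_path, output_dir, pipeline_args=None):
    # Imported lazily so key_of/folder_naming can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.io import fileio
    from apache_beam.options.pipeline_options import PipelineOptions

    # pipeline_args=None makes PipelineOptions parse sys.argv instead.
    with beam.Pipeline(options=PipelineOptions(pipeline_args)) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText(input_path, skip_header_lines=1)
            | "WritePerKey" >> fileio.WriteToFiles(
                path=output_dir,
                destination=key_of,  # routing key computed per line
                sink=lambda dest: fileio.TextSink(),  # writes each line plus "\n"
                file_naming=folder_naming,
            )
        )
```

On GCS the slash simply becomes part of the object name, which is what shows up as a folder; on a local filesystem the subfolders may have to exist beforehand. To run on Dataflow instead of the local DirectRunner, pass the usual runner, project, region and temp_location flags via `pipeline_args`; the `save_main_session` option may also be needed so `key_of` and its `csv` import are available on the workers.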
