Apache Beam/Dataflow: passing a file path to ReadFromText

I have a use case where I want to read the filename from a metadata.table. I have written a pipeline function that reads the metadata.table, but I am not sure how to pass this information to ReadFromText, since it only takes a string as input. Is it possible to pass this value to ReadFromText()? Please suggest some workarounds or ideas for achieving this. Thanks.

Code: pipeline | 'Read from a File' >> ReadFromText(<file path read from metadata goes here>, skip_header_lines=1)

Note: There will be various folders and files in storage, and the files are in CSV format. In my use case I can't pass the storage location or filename directly as the file path in ReadFromText; I want to read it from the metadata and pass that value. I hope that is clear. Thanks.

I don't understand why you need to read the metadata. If you want to read all the files inside a folder, you can just provide a glob pattern. This solution works in Python; I'm not sure about Java.

p | ReadFromText("./folder/*.csv")

"*" is the glob wildcard here, which lets the pipeline read every file matching the pattern *.csv. You can also put a fixed prefix in front of the wildcard to narrow the match.

What you want is textio.ReadAllFromText, which reads the file paths from a PCollection instead of taking a string directly.
