How to write files to a Google bucket using Apache Beam dynamically?
I'm trying to write a few files to a Google Cloud Storage bucket using Apache Beam, but the file location and folder names are generated from the first index in the file. How do I create this directory and write my files to it?
metadata = (
    data_from_test
    | 'CSVConversionMeta' >> beam.ParDo(WriteToCSVmeta())
    | 'Writing To File' >> beam.io.WriteToText(
        'gs://tester1212/CIK/YEAR/FILING/metadata.csv')
)
This is the code where I write the file, but I want YEAR to be fetched from the CSV and the folder to be created at runtime.
If your date is known before the pipeline starts, you can do this:
year = "2020"
metadata = (
    data_from_test
    | 'CSVConversionMeta' >> beam.ParDo(WriteToCSVmeta())
    | 'Writing To File' >> beam.io.WriteToText(
        'gs://tester1212/CIK/{}/FILING/metadata.csv'.format(year))
)
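If the year is only known at runtime, one option that may fit is `apache_beam.io.fileio.WriteToFiles`, which accepts a `destination` callable evaluated per element. The following is a rough sketch, not a tested solution for this pipeline. It assumes each element is a CSV line (a `str`) whose first field is the year; the helper names `year_of`, `destination_for` and `write_by_year` are hypothetical, and the output names will carry window and shard information rather than being exactly `metadata.csv`.

```python
import csv


def year_of(csv_line):
    """Return the first CSV field of a line (assumed here to hold the year)."""
    return next(csv.reader([csv_line]))[0].strip()


def destination_for(csv_line):
    """Build a per-record name prefix mirroring the CIK/YEAR/FILING layout."""
    return 'CIK/{}/FILING/metadata'.format(year_of(csv_line))


def write_by_year(csv_lines, base_path='gs://tester1212'):
    """Attach a dynamic-destination write to a PCollection of CSV lines."""
    # Imported here so the helpers above can be tried without Beam installed.
    from apache_beam.io import fileio

    return csv_lines | 'Writing To File' >> fileio.WriteToFiles(
        path=base_path,
        # Called for every element; decides which output it belongs to.
        destination=destination_for,
        # One plain-text sink per destination; elements must be strings.
        sink=lambda dest: fileio.TextSink(),
        # Uses the destination string as the file name prefix.
        file_naming=fileio.destination_prefix_naming(suffix='.csv'))
```

In the pipeline above this would be applied as `write_by_year(data_from_test | 'CSVConversionMeta' >> beam.ParDo(WriteToCSVmeta()))`. On GCS, folders are just object-name prefixes, so a destination containing slashes is expected to show up as nested folders; I have not verified how this behaves on other filesystems.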