
How to write files to google bucket using Apache Beam dynamically?

I'm trying to write a few files to a Google bucket using Apache Beam, but the file location and folder names are generated from the first index in the file, so how do I create this directory and write my files to it?

metadata = (
    data_from_test
    | 'CSVConversionMeta' >> beam.ParDo(WriteToCSVmeta())
    | 'Writing To File' >> beam.io.WriteToText('gs://tester1212/CIK/YEAR/FILING/metadata.csv'))

So, this is the code where I write the file, but I want the YEAR to be fetched from the CSV and the folder created at runtime.

If the year is known before the pipeline starts, you can do this:

year="2020"
metadata = (data_from_test |'CSVConversionMeta' >> beam.ParDo(WriteToCSVmeta())|'Writing To File' >> beam.io.WriteToText('gs://tester1212/CIK/{}/FILING/metadata.csv'.format(year)))
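If the year is only known at runtime (it has to be read out of the records themselves), WriteToText won't help because it takes a fixed path, but beam.io.fileio.WriteToFiles accepts a destination callable and a file_naming callable, so the object path can be built per element. A rough sketch, assuming each line produced by WriteToCSVmeta is a CSV string whose second field is the year (year_from_line is a hypothetical helper; adjust it to your real layout):

import apache_beam as beam
from apache_beam.io import fileio

def year_from_line(csv_line):
    # Assumption: the year sits in the second comma-separated field of each line.
    return csv_line.split(',')[1]

def year_file_naming(window, pane, shard_index, total_shards, compression, destination):
    # GCS treats '/' in object names as folders, so this effectively creates
    # gs://tester1212/CIK/<year>/FILING/ at runtime. The shard index is kept
    # in the name so parallel shards don't overwrite each other.
    return 'CIK/{}/FILING/metadata-{:05d}.csv'.format(destination, shard_index)

metadata = (
    data_from_test
    | 'CSVConversionMeta' >> beam.ParDo(WriteToCSVmeta())
    | 'Writing To File' >> fileio.WriteToFiles(
        path='gs://tester1212/',
        destination=year_from_line,
        sink=lambda dest: fileio.TextSink(),
        file_naming=year_file_naming))

WriteToFiles groups elements by the value returned from the destination callable, so every year present in the input ends up under its own folder in a single run.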


