Apache Beam: write each tagged output to a separate file
I tag the input elements based on one of their fields (`date`):
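```python
import apache_beam as beam
from apache_beam import pvalue

class TagElementsWithDate(beam.DoFn):
    def process(self, element):
        dt = element['date'].replace('-', '')[:6]  # 'YYYY-MM-DD' -> 'YYYYMM', used as the tag
        yield pvalue.TaggedOutput(dt, element)

# p is the beam.Pipeline; BigQuerySource is deprecated in newer SDKs in favour of beam.io.ReadFromBigQuery
input_data = p | 'Read Input' >> beam.io.Read(beam.io.BigQuerySource(query='select id, date from `project.dataset.tablename`', use_standard_sql=True))
tagged_data = input_data | 'tag data' >> beam.ParDo(TagElementsWithDate()).with_outputs()
```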
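The tag is computed from each row's data, so which tags exist depends entirely on the input. A minimal plain-Python sketch of the same string transformation, assuming `date` arrives as a `YYYY-MM-DD` string (as the `replace('-', '')` suggests); `month_tag` and the sample rows are hypothetical:

```python
def month_tag(element: dict) -> str:
    """Return the YYYYMM tag that TagElementsWithDate derives from element['date']."""
    # 'YYYY-MM-DD' -> 'YYYYMMDD' -> first six characters 'YYYYMM'
    return element['date'].replace('-', '')[:6]


rows = [
    {'id': 1, 'date': '2021-01-15'},
    {'id': 2, 'date': '2021-02-03'},
    {'id': 3, 'date': '2021-01-28'},
]

# The set of tags only exists once the rows have been seen.
tags = sorted({month_tag(r) for r in rows})
print(tags)  # ['202101', '202102']
```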
`tagged_data` is a `DoOutputsTuple`. I want to iterate over it and write each tagged output to a separate file.
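Because the tags are produced per element at run time, they are not known when the pipeline graph is built, so there is nothing to loop over on the `DoOutputsTuple` unless every tag is declared up front in `with_outputs(...)`. A common alternative is to key each element by its month and group, which in Beam would be a `beam.Map` emitting `(tag, element)` followed by `beam.GroupByKey()`. The plain-Python sketch below has no Beam dependency and only illustrates the `(key, elements)` pairs such a grouping yields; `group_by_month` is a hypothetical helper:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_month(rows: Iterable[dict]) -> List[Tuple[str, List[dict]]]:
    """Mimic Map(lambda e: (tag, e)) | GroupByKey() on an in-memory list."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        # Same YYYYMM key that TagElementsWithDate uses as its tag
        key = row['date'].replace('-', '')[:6]
        groups[key].append(row)
    # Sorted only to make the output deterministic; GroupByKey guarantees no order.
    return sorted(groups.items())


rows = [
    {'id': 1, 'date': '2021-01-15'},
    {'id': 2, 'date': '2021-02-03'},
    {'id': 3, 'date': '2021-01-28'},
]
for key, elements in group_by_month(rows):
    print(key, [e['id'] for e in elements])
# 202101 [1, 3]
# 202102 [2]
```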
You need to write your own DoFn, something like:
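```python
import json

import apache_beam as beam
# _TextSink is private to apache_beam.io.textio; its API may change between releases
from apache_beam.io.textio import _TextSink


class WriteEachKeyToText(beam.DoFn):
    def __init__(self, file_path_prefix: str):
        super().__init__()
        self.file_path_prefix = file_path_prefix

    def process(self, kv):
        # kv is a (key, elements) pair, e.g. the output of a GroupByKey keyed by month
        key, elements = kv
        sink = _TextSink(self.file_path_prefix, file_name_suffix=f"{key}.json")
        # open_writer(init_result, uid): both arguments feed into the output file name
        writer = sink.open_writer("prefix", self.file_path_prefix)
        for e in elements:
            writer.write(json.dumps(e, default=str))  # one JSON object per line
        writer.close()  # flush and close the file
```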
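To see what such a DoFn produces per key without relying on the private `_TextSink`, here is a stdlib-only sketch that writes each `(key, elements)` pair as newline-delimited JSON to a local file named `<prefix><key>.json`. The `write_groups` helper and the naming scheme are assumptions for illustration; inside a pipeline, the public `apache_beam.io.filesystems.FileSystems.create(path)` could take the place of `open()` so that `gs://` paths also work (it returns a handle that expects bytes).

```python
import json
import os
import tempfile
from typing import Iterable, List, Tuple


def write_groups(pairs: Iterable[Tuple[str, Iterable[dict]]], file_path_prefix: str) -> List[str]:
    """Write each (key, elements) pair to '<prefix><key>.json', one JSON object per line."""
    paths = []
    for key, elements in pairs:
        path = f"{file_path_prefix}{key}.json"
        with open(path, "w", encoding="utf-8") as fh:
            for e in elements:
                # default=str guards against non-JSON values such as dates
                fh.write(json.dumps(e, default=str) + "\n")
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    pairs = [("202101", [{"id": 1}, {"id": 3}]), ("202102", [{"id": 2}])]
    written = write_groups(pairs, os.path.join(tmp, "out-"))
    for p in written:
        with open(p, encoding="utf-8") as fh:
            print(os.path.basename(p), fh.read().splitlines())
```

Beam's own file sinks write to temporary files and rename them when the write is finalized; this sketch writes directly, so a retried bundle would simply overwrite the same file.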