
How to flatten multiple PCollections in Python Apache Beam

How do I implement the following logic, taken from https://beam.apache.org/documentation/pipelines/design-your-pipeline/

// merge the two PCollections with Flatten
PCollectionList<String> collectionList = PCollectionList.of(aCollection).and(bCollection);
PCollection<String> mergedCollectionWithFlatten = collectionList
    .apply(Flatten.<String>pCollections());

// continue with the new merged PCollection
mergedCollectionWithFlatten.apply(...);

In other words, can multiple PCollections be combined into a single PCollection with the Apache Beam Python API?

You can use the Flatten transform in Python too. For example:

data1 = ['one', 'two', 'three']
data2 = ['four','five']

input1 = p | 'Create PCollection1' >> beam.Create(data1)
input2 = p | 'Create PCollection2' >> beam.Create(data2)

merged = ((input1,input2) | 'Merge PCollections' >> beam.Flatten())

The merged PCollection will contain:

INFO:root:one
INFO:root:two
INFO:root:three
INFO:root:four
INFO:root:five

Full code:

import argparse, logging

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import SetupOptions


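class LogFn(beam.DoFn):
  """Logs each element and passes it through unchanged."""
  def process(self, element):
    logging.info(element)
    # Yield rather than return: process() must produce an iterable of
    # outputs, and a returned str would itself be treated as that iterable.
    yield element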


def run(argv=None):
  parser = argparse.ArgumentParser()
  known_args, pipeline_args = parser.parse_known_args(argv)

  pipeline_options = PipelineOptions(pipeline_args)
  pipeline_options.view_as(SetupOptions).save_main_session = True
  p = beam.Pipeline(options=pipeline_options)

  data1 = ['one', 'two', 'three']
  data2 = ['four','five']

  input1 = p | 'Create PCollection1' >> beam.Create(data1)
  input2 = p | 'Create PCollection2' >> beam.Create(data2)

  merged = ((input1,input2) | 'Merge PCollections' >> beam.Flatten())

  merged | 'Check Results' >> beam.ParDo(LogFn())

  result = p.run()
  result.wait_until_finish()

if __name__ == '__main__':
  logging.getLogger().setLevel(logging.INFO)
  run()
