How to get one element from a PCollection in Apache Beam

Consider a PCollection like this:

[{'id':'1','name':'Tom','country':'USA'},{'id':'2','name':'Oprah','country':'USA'}....]

I want to count the occurrences of every country. The result should be something like this:

{'USA':2, 'Tunisia':3, 'France':1}

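Check beam.combiners.ToDict, which combines a PCollection of key/value pairs into a single dict. Pair it with beam.combiners.Count.PerElement, which emits an (element, count) pair for each distinct element.

Example: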

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# The context manager runs the pipeline and waits for it to finish on exit.
with beam.Pipeline(options=PipelineOptions()) as p:
    (
        p
        | "create pcoll" >> beam.Create([
            {'id': '1', 'name': 'Tom', 'country': 'USA'},
            {'id': '2', 'name': 'Oprah', 'country': 'USA'},
            {'id': '2', 'name': 'Oprah', 'country': 'Italy'},
        ])
        # Keep only the country of each record.
        | "map" >> beam.Map(lambda x: x['country'])
        # Emit one (country, count) pair per distinct country.
        | "count" >> beam.combiners.Count.PerElement()
        # Collect all pairs into a single dict element.
        | "toDict" >> beam.combiners.ToDict()
        | "print" >> beam.Map(print)
    )

# Result: {'USA': 2, 'Italy': 1}

This is similar to the word count example. You can find a Python implementation here: https://beam.apache.org/get-started/wordcount-example/
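
To make the word count analogy concrete, here is a minimal plain-Python sketch of the same pattern: map each record to a (key, 1) pair, then sum the ones per key. It does not use Beam. The names records and count_by_country are made up for illustration. In a pipeline, the pairing step would typically be a beam.Map and the summing step something like beam.CombinePerKey(sum).

```python
from collections import defaultdict

# Sample input mirroring the records in the question (illustrative data).
records = [
    {'id': '1', 'name': 'Tom', 'country': 'USA'},
    {'id': '2', 'name': 'Oprah', 'country': 'USA'},
    {'id': '2', 'name': 'Oprah', 'country': 'Italy'},
]


def count_by_country(rows):
    """Hypothetical helper: word-count-style tally of the 'country' field."""
    # Step 1: map every record to a (country, 1) pair.
    pairs = [(row['country'], 1) for row in rows]
    # Step 2: sum the ones for each distinct country.
    totals = defaultdict(int)
    for country, one in pairs:
        totals[country] += one
    return dict(totals)


counts = count_by_country(records)
print(counts)
```

A function like this can also serve as the expected value when unit testing the pipeline on a small in-memory input.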
