How to get one element from a PCollection in Apache Beam
Consider a PCollection whose elements look like this:
[{'id':'1','name':'Tom','country':'USA'},{'id':'2','name':'Oprah','country':'USA'}....]
I want to count the occurrences of each country. The result should be something like this:
{'USA':2, 'Tunisia':3, 'France':1}
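Check beam.combiners.ToDict, which combines a PCollection of (key, value) pairs into a single dict. Placed after beam.combiners.Count.PerElement, which emits (element, count) pairs, it gives the per-country counts as one dict.
Example: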
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

p = beam.Pipeline(options=PipelineOptions())
(p
 | "create pcoll" >> beam.Create([{'id':'1','name':'Tom','country':'USA'},
                                  {'id':'2','name':'Oprah','country':'USA'},
                                  {'id':'2','name':'Oprah','country':'Italy'}])
 # Keep only the country of each record.
 | "map" >> beam.Map(lambda x: x['country'])
 # Emit one (country, count) pair per distinct country.
 | "count" >> beam.combiners.Count.PerElement()
 # Collect all pairs into a single dict element.
 | "toDict" >> beam.combiners.ToDict()
 | "print" >> beam.Map(print)
)
# Run the pipeline and block until it completes.
p.run().wait_until_finish()
# Result {'USA': 2, 'Italy': 1}
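To make it clearer what the "map", "count" and "toDict" steps compute, here is a minimal plain-Python sketch of the same logic, run locally on the sample records. It does not use Beam at all; count_countries is a hypothetical helper introduced only for illustration, and collections.Counter stands in for Count.PerElement.

```python
from collections import Counter

# Same sample records as in the pipeline above.
records = [
    {'id': '1', 'name': 'Tom', 'country': 'USA'},
    {'id': '2', 'name': 'Oprah', 'country': 'USA'},
    {'id': '2', 'name': 'Oprah', 'country': 'Italy'},
]

def count_countries(rows):
    """Hypothetical helper: local stand-in for Map -> Count.PerElement -> ToDict."""
    # "map" step: keep only the country of each record.
    countries = [row['country'] for row in rows]
    # "count" step: (country, count) pairs, analogous to Count.PerElement.
    pairs = Counter(countries).items()
    # "toDict" step: gather the pairs into one dict, analogous to ToDict.
    return dict(pairs)

print(count_countries(records))  # {'USA': 2, 'Italy': 1}
```

As in this sketch, ToDict gathers everything into one dict, so it is best suited to cases where the number of distinct keys is small enough to fit in memory; for a very large key space it is probably better to keep the (country, count) pairs as a PCollection.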
This is similar to the word count example. You can find a Python implementation here: https://beam.apache.org/get-started/wordcount-example/