Consider a PCollection of dicts like this:
[{'id':'1','name':'Tom','country':'USA'},{'id':'2','name':'Oprah','country':'USA'}....]
I want to count the occurrences of each country. The result should look something like this:
{'USA':2, 'Tunisia':3, 'France':1}
Use beam.combiners.Count.PerElement() to count each country, then beam.combiners.ToDict() to collect the resulting (country, count) pairs into a single dict.
Example:
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

p = beam.Pipeline(options=PipelineOptions())
(
    p
    | "create pcoll" >> beam.Create([
        {'id': '1', 'name': 'Tom', 'country': 'USA'},
        {'id': '2', 'name': 'Oprah', 'country': 'USA'},
        {'id': '2', 'name': 'Oprah', 'country': 'Italy'},
    ])
    # Keep only the country of each record.
    | "map" >> beam.Map(lambda x: x['country'])
    # Emit one (country, count) pair per distinct country.
    | "count" >> beam.combiners.Count.PerElement()
    # Collect all pairs into a single dict element.
    | "toDict" >> beam.combiners.ToDict()
    | "print" >> beam.Map(print)
)
# Run the pipeline and block until it finishes.
p.run().wait_until_finish()
# Result: {'USA': 2, 'Italy': 1}
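For a quick sanity check of the expected output outside Beam, the same counts can be computed in plain Python. This is only a local reference sketch for small in-memory data, not a replacement for the pipeline; the variable names are illustrative.

```python
from collections import Counter

# Same sample records as in the pipeline above.
records = [
    {'id': '1', 'name': 'Tom', 'country': 'USA'},
    {'id': '2', 'name': 'Oprah', 'country': 'USA'},
    {'id': '2', 'name': 'Oprah', 'country': 'Italy'},
]

# Count how many records mention each country.
country_counts = dict(Counter(r['country'] for r in records))
print(country_counts)  # {'USA': 2, 'Italy': 1}
```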
This is similar to the word count example: treat each country as a "word" and count how often it occurs. A Python implementation of word count is available at https://beam.apache.org/get-started/wordcount-example/