How to get one element from a PCollection in Apache Beam

Question

Consider a PCollection of dictionaries like this:

[{'id':'1','name':'Tom','country':'USA'},{'id':'2','name':'Oprah','country':'USA'}....]

I want to count the occurrences of each country. The result should look something like this:

{'USA':2, 'Tunisia':3, 'France':1}

Answer 1

Check beam.combiners.ToDict, which combines a PCollection of key-value pairs into a single dict.

Example:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# The context manager runs the pipeline and waits for it to finish on exit.
with beam.Pipeline(options=PipelineOptions()) as p:
    (p
     | "create pcoll" >> beam.Create([
         {'id': '1', 'name': 'Tom', 'country': 'USA'},
         {'id': '2', 'name': 'Oprah', 'country': 'USA'},
         {'id': '2', 'name': 'Oprah', 'country': 'Italy'}])
     # Keep only the country of each record.
     | "map" >> beam.Map(lambda x: x['country'])
     # Emit one (country, count) pair per distinct country.
     | "count" >> beam.combiners.Count.PerElement()
     # Combine all pairs into a single dict.
     | "toDict" >> beam.combiners.ToDict()
     | "print" >> beam.Map(print))

# Result: {'USA': 2, 'Italy': 1}
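
For a quick local check of what the pipeline should return, the same count can be computed in plain Python on a small in-memory list. This is only a sketch for sanity-checking the expected dict, not a replacement for the Beam pipeline; the helper name count_countries is made up for illustration.

```python
from collections import Counter

# Same sample records as in the pipeline above.
records = [
    {'id': '1', 'name': 'Tom', 'country': 'USA'},
    {'id': '2', 'name': 'Oprah', 'country': 'USA'},
    {'id': '2', 'name': 'Oprah', 'country': 'Italy'},
]


def count_countries(rows):
    """Hypothetical helper: count how many rows mention each country."""
    return dict(Counter(row['country'] for row in rows))


print(count_countries(records))  # {'USA': 2, 'Italy': 1}
```

This only works when the data fits in memory on one machine, which is the case Beam is meant to go beyond.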

Answer 2

This is similar to the word count example. A Python implementation is available at https://beam.apache.org/get-started/wordcount-example/

2 answers

solution1
1 2020-03-31 14:13:43

solution2
0 2020-03-31 13:53:20
