
How does Java differentiate two keys in a KV instance in Apache Beam?

The Apache Beam version is 2.15.0.

In this code, the class Airport is used as the key of a KV instance, and at the end a mean is calculated for each Airport instance.

c.output(KV.of(stats.airport, stats.timestamp));

But how does Apache Beam internally compare two keys and decide whether two instances are the same? Are two instances treated as the same if all of their class members have the same values? The documentation does not seem to mention how two keys are compared.

I would appreciate it if someone could help me understand this.

This is explained in the GroupByKey transform docs. GroupByKey is the operation performed under the hood for a Mean aggregation:

Two keys of type K are compared for equality not by regular Java Object.equals(java.lang.Object), but instead by first encoding each of the keys using the Coder of the keys of the input PCollection, and then comparing the encoded bytes. This admits efficient parallel evaluation. Note that this requires that the Coder of the keys be deterministic (see Coder.verifyDeterministic()). If the key Coder is not deterministic, an exception is thrown at pipeline construction time.
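To make the quoted rule concrete, here is a minimal plain-Java sketch of "encode, then compare bytes". It does not use Beam's Coder API. The Airport fields (name, location) and the encode and sameKey helpers are hypothetical stand-ins, since the question does not show the real class or which Coder is attached to it.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class KeyEncodingSketch {
    // Hypothetical stand-in for the question's Airport class (fields are assumed).
    // It deliberately does NOT override equals/hashCode.
    static final class Airport {
        final String name;
        final String location;

        Airport(String name, String location) {
            this.name = name;
            this.location = location;
        }
    }

    // Hypothetical deterministic encoding: always writes the same fields
    // in the same order, so equal field values produce equal bytes.
    static byte[] encode(Airport airport) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(airport.name);
            out.writeUTF(airport.location);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Two keys count as "the same key" when their encoded bytes match.
    static boolean sameKey(Airport a, Airport b) {
        return Arrays.equals(encode(a), encode(b));
    }

    public static void main(String[] args) {
        Airport a = new Airport("SFO", "San Francisco");
        Airport b = new Airport("SFO", "San Francisco");
        Airport c = new Airport("JFK", "New York");

        System.out.println(a.equals(b));   // false: Object.equals compares identity
        System.out.println(sameKey(a, b)); // true: identical encoded bytes
        System.out.println(sameKey(a, c)); // false: different encoded bytes
    }
}
```

Under this sketch, two instances are the same key exactly when every encoded field has the same value. In a real pipeline the answer depends on what the key's Coder writes: a field the Coder skips would not affect grouping.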

Note that Mean uses Combine.PerKey, which is a 'shorthand' for GroupByKey + Combine.GroupedValues.
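The two-step shape of that shorthand can be sketched in plain Java: first group values by the encoded bytes of their key, then reduce each group to its mean. This is only an illustration of the idea, not Beam code. The class MeanPerKeySketch, its methods, and the use of UTF-8 strings as keys are assumptions made for brevity (Java 9+ for List.of and Map.entry).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MeanPerKeySketch {
    // Step 1 (grouping): bucket values by the encoded bytes of their key.
    // ByteBuffer is used as the map key because its equals/hashCode
    // compare byte content, unlike a raw byte[].
    static Map<ByteBuffer, List<Long>> groupByEncodedKey(List<Map.Entry<String, Long>> input) {
        Map<ByteBuffer, List<Long>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, Long> kv : input) {
            ByteBuffer encoded = ByteBuffer.wrap(kv.getKey().getBytes(StandardCharsets.UTF_8));
            groups.computeIfAbsent(encoded, k -> new ArrayList<>()).add(kv.getValue());
        }
        return groups;
    }

    // Step 2 (combining): reduce each group's values to their mean.
    static Map<String, Double> meanPerKey(List<Map.Entry<String, Long>> input) {
        Map<String, Double> means = new LinkedHashMap<>();
        groupByEncodedKey(input).forEach((encoded, values) -> {
            // Decode a duplicate so the buffer stored in the map is not mutated.
            String key = StandardCharsets.UTF_8.decode(encoded.duplicate()).toString();
            double mean = values.stream().mapToLong(Long::longValue).average().orElse(Double.NaN);
            means.put(key, mean);
        });
        return means;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> input = List.of(
            Map.entry(new String("SFO"), 10L), // distinct String objects,
            Map.entry(new String("SFO"), 20L), // same encoded bytes
            Map.entry("JFK", 5L));
        System.out.println(meanPerKey(input)); // {SFO=15.0, JFK=5.0}
    }
}
```

The two "SFO" entries land in one group because their encoded bytes match, regardless of object identity.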

