How does Java differentiate two keys in a KV instance in Apache Beam?
The Apache Beam version is 2.15.0.
In this code, the Airport class is used as the key of a KV instance, and at the end the mean is calculated for each Airport instance:
c.output(KV.of(stats.airport, stats.timestamp));
But how does Apache Beam internally compare two keys and decide whether two instances are the same? Are two instances treated as the same if all class members have the same values? The documentation does not mention how two keys are compared.
I would appreciate it if someone could help me understand this.
This is actually explained in the GroupByKey transform docs; GroupByKey is the operation performed under the hood for a Mean aggregation:
Two keys of type K are compared for equality not by regular Java Object.equals(java.lang.Object), but instead by first encoding each of the keys using the Coder of the keys of the input PCollection, and then comparing the encoded bytes. This admits efficient parallel evaluation. Note that this requires that the Coder of the keys be deterministic (see Coder.verifyDeterministic()). If the key Coder is not deterministic, an exception is thrown at pipeline construction time.
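To make the quoted rule concrete, here is a minimal sketch in plain Java with no Beam dependency. It only imitates the idea of comparing encoded bytes; it is not Beam's Coder API. The Airport fields (code, name) and the encode and sameKey helpers are hypothetical, since the question does not show the class. Under these assumptions, two distinct instances with the same field values count as the same key even though Object.equals says they differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Plain-Java illustration only; this does not use or reproduce Beam's Coder API.
final class EncodedKeyDemo {

    // Hypothetical key class: the fields are assumptions, and equals/hashCode
    // are deliberately NOT overridden.
    static final class Airport {
        final String code;
        final String name;

        Airport(String code, String name) {
            this.code = code;
            this.name = name;
        }
    }

    // Stand-in for a deterministic coder: the same field values always
    // produce the same byte sequence.
    static byte[] encode(Airport airport) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(airport.code);
            out.writeUTF(airport.name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Two keys are treated as the same key when their encoded bytes match.
    static boolean sameKey(Airport a, Airport b) {
        return Arrays.equals(encode(a), encode(b));
    }

    public static void main(String[] args) {
        Airport first = new Airport("SFO", "San Francisco");
        Airport second = new Airport("SFO", "San Francisco");
        Airport other = new Airport("JFK", "New York");

        System.out.println(first.equals(second));   // false: default equals is identity
        System.out.println(sameKey(first, second)); // true: identical encoded bytes
        System.out.println(sameKey(first, other));  // false: different encoded bytes
    }
}
```

In a real pipeline the bytes come from whatever Coder is registered or inferred for the key type, so whether "same member values" means "same key" depends on that Coder encoding the members deterministically.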
Note that Mean uses Combine.PerKey, which is a 'shorthand' for GroupByKey + Combine.GroupedValues.