
How does Java differentiate two keys in a KV instance in Apache Beam?

The Apache Beam version is 2.15.0.

In this code, the class Airport is used as the key of a KV instance, and at the end the mean is calculated for each Airport instance.

c.output(KV.of(stats.airport, stats.timestamp));

But how does Apache Beam internally compare two keys and decide whether two instances are the same? Are two instances treated as the same if all the class members have the same values? The documentation does not mention how two keys are compared.

I would appreciate it if someone could help me understand this.

This is actually explained in the GroupByKey transform docs, which is the operation done under the hood for a Mean aggregation:

Two keys of type K are compared for equality not by regular Java Object.equals(java.lang.Object), but instead by first encoding each of the keys using the Coder of the keys of the input PCollection, and then comparing the encoded bytes. This admits efficient parallel evaluation. Note that this requires that the Coder of the keys be deterministic (see Coder.verifyDeterministic()). If the key Coder is not deterministic, an exception is thrown at pipeline construction time.
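To make the quoted rule concrete, here is a minimal plain-Java sketch of grouping by encoded bytes. It does not use Apache Beam at all: the Airport fields (code, city), the encode method standing in for a deterministic key Coder, and groupByEncodedKey are all hypothetical names made up for illustration. Under those assumptions, two distinct Airport objects with the same field values land in the same group even though Object.equals treats them as different.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation only; none of these names come from Apache Beam.
class EncodedKeyDemo {

    // Hypothetical key class that deliberately does not override equals()/hashCode().
    static final class Airport {
        final String code;
        final String city;

        Airport(String code, String city) {
            this.code = code;
            this.city = city;
        }
    }

    // Stand-in for a deterministic key Coder: equal field values always give equal bytes.
    static byte[] encode(Airport airport) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(airport.code);
            out.writeUTF(airport.city);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            // In-memory streams are not expected to fail.
            throw new IllegalStateException(e);
        }
    }

    // Groups values by the encoded bytes of their key, never by Airport.equals().
    static Map<ByteBuffer, List<Long>> groupByEncodedKey(List<Airport> keys, List<Long> values) {
        Map<ByteBuffer, List<Long>> groups = new LinkedHashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            // ByteBuffer.equals()/hashCode() compare the wrapped byte content.
            ByteBuffer encoded = ByteBuffer.wrap(encode(keys.get(i)));
            groups.computeIfAbsent(encoded, k -> new ArrayList<>()).add(values.get(i));
        }
        return groups;
    }

    public static void main(String[] args) {
        Airport a = new Airport("SFO", "San Francisco");
        Airport b = new Airport("SFO", "San Francisco"); // distinct object, same fields
        Airport c = new Airport("JFK", "New York");

        System.out.println("a.equals(b): " + a.equals(b));
        System.out.println("same encoded bytes: " + Arrays.equals(encode(a), encode(b)));

        Map<ByteBuffer, List<Long>> groups =
            groupByEncodedKey(Arrays.asList(a, b, c), Arrays.asList(10L, 20L, 30L));
        System.out.println("number of groups: " + groups.size());
    }
}
```

So whether two instances with identical members count as the same key depends on the key's Coder: if it encodes every member deterministically, as the stand-in above does, identical members give identical bytes and therefore the same key.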

Note that Mean uses Combine.PerKey, which is a 'shorthand' for GroupByKey + Combine.GroupedValues.
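As a rough illustration of that decomposition, the following plain-Java sketch first groups values by key and then combines each group into a mean. It is only a conceptual model, not Beam code: groupByKey and meanPerGroup are hypothetical helper names, the keys are simple strings rather than Airport objects, and a real runner is free to execute the combine differently.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual model only; these helpers are not part of Apache Beam.
class GroupThenCombineDemo {

    // Step 1 (GroupByKey-like): collect all values that share a key.
    static Map<String, List<Long>> groupByKey(List<Map.Entry<String, Long>> pairs) {
        Map<String, List<Long>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Step 2 (Combine.GroupedValues-like): reduce each group of values to its mean.
    static Map<String, Double> meanPerGroup(Map<String, List<Long>> grouped) {
        Map<String, Double> means = new LinkedHashMap<>();
        for (Map.Entry<String, List<Long>> group : grouped.entrySet()) {
            long sum = 0;
            for (long value : group.getValue()) {
                sum += value;
            }
            means.put(group.getKey(), (double) sum / group.getValue().size());
        }
        return means;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> pairs = Arrays.asList(
            new SimpleEntry<>("SFO", 10L),
            new SimpleEntry<>("SFO", 20L),
            new SimpleEntry<>("JFK", 30L));

        // Running both steps back to back models a per-key mean.
        System.out.println(meanPerGroup(groupByKey(pairs)));
    }
}
```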

