Apache Spark forEach on RDDs of Tuple2: returns one value for all the Tuple2s in my RDD

I have this code that has been giving me an unexpectedly wrong result that I couldn't fix:

// A method that calls the collectDataRDD(logValues, rowData) method :

// ....
// my collectDataRDD(Values, rowData) method : 

The problem is that when I call methods like getStatus() or getValidationDate() on the Data objects, which are the values of my Tuple2s, I get the same output for every object in my JavaRDD. That is wrong, because the JavaRDD contains multiple different objects. However, when I checked the keys of my Tuple2s, the results were correct.

I have tried everything and still couldn't figure it out. Can anyone please help me solve this? Thanks a lot in advance.

Verify whether

ticketsrdd.foreach((Tuple2<String, Data> rowData) -> {
    collectLogDataRDD(logValues, rowData);
});

is what you want to do. This function is called for each element, one by one, and the Tuple2 will hold only one entry in that case.

JavaRDD<Tuple2<String, Data>> ticketsrdd=TransformToRDD.transformToRDD(transformer.transform());
DataStore.setData(tickets);

This will be a kind of Map<String, Tuple2>, and each Tuple2 will have one String key and one Data value.

Now when you say Data ticket = rowData._2; you are getting one Data object from one tuple. So collectLogDataRDD is going to be called once for each tuple in ticketsrdd.

Let's say ticketsrdd has 100 elements; then collectLogDataRDD is going to be called 100 times, and each time ticket.getStatus(); will also be called.

This is what the code is doing. What different behavior do you expect?
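
To make the per-element behaviour described above concrete, here is a minimal plain-Java sketch that needs no Spark on the classpath. Everything in it is an assumption made for illustration: the Data class with a single status field, the collectLogData helper, and Map.Entry standing in for Tuple2 (getKey() and getValue() play the roles of _1 and _2). The real Data class and collectLogDataRDD body are not shown in the question, so this is not the asker's code.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java sketch; a List of Map.Entry stands in for JavaRDD<Tuple2<String, Data>>.
class ForeachSketch {

    // Hypothetical value type: the real Data class is not shown in the question.
    static final class Data {
        private final String status;
        Data(String status) { this.status = status; }
        String getStatus() { return status; }
    }

    // Hypothetical stand-in for collectLogDataRDD: each call receives exactly one pair.
    static void collectLogData(List<String> logValues, Map.Entry<String, Data> rowData) {
        Data ticket = rowData.getValue(); // plays the role of rowData._2
        logValues.add(rowData.getKey() + "=" + ticket.getStatus());
    }

    // Three distinct pairs standing in for the contents of ticketsrdd.
    static List<Map.Entry<String, Data>> sampleTickets() {
        List<Map.Entry<String, Data>> tickets = new ArrayList<>();
        tickets.add(new SimpleEntry<>("T1", new Data("OPEN")));
        tickets.add(new SimpleEntry<>("T2", new Data("CLOSED")));
        tickets.add(new SimpleEntry<>("T3", new Data("PENDING")));
        return tickets;
    }

    // Invokes the callback once per element and returns what was collected.
    static List<String> run(List<Map.Entry<String, Data>> tickets) {
        List<String> logValues = new ArrayList<>();
        tickets.forEach(rowData -> collectLogData(logValues, rowData));
        return logValues;
    }

    public static void main(String[] args) {
        // prints [T1=OPEN, T2=CLOSED, T3=PENDING]
        System.out.println(run(sampleTickets()));
    }
}
```

With three distinct Data objects the callback runs three times and sees three different statuses. Note that in a real Spark job the foreach closure runs on the executors, so a driver-side list mutated this way would not be reliably populated; the sketch only illustrates the one-pair-per-call behaviour.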
