Cannot union two CassandraJavaRDD<CassandraRow> in Spark
Because there is a limit on how much data I can query from Cassandra at once, I'm reading the data batch by batch with Spark and storing each batch in an RDD.
I then combine all the RDDs using the union function.
Here is my code:
private void getDataFromCassandra(JavaSparkContext sc) {
    CassandraJavaRDD<CassandraRow> cassandraRDD = null;
    CassandraJavaRDD<CassandraRow> cassandraRDD2 = null;
    while (/* some condition */) {
        // Read one batch; sb is built elsewhere and holds the pid values
        cassandraRDD = CassandraJavaUtil
                .javaFunctions(sc).cassandraTable("dmp", "table").select("abc", "xyz")
                .where("pid IN ('" + sb + "')");
        if (cassandraRDD2 == null) {
            cassandraRDD2 = cassandraRDD;
        } else {
            // The type mismatch is reported on this line
            cassandraRDD2 = cassandraRDD2.union(cassandraRDD);
        }
    }
}
But at the union call I get the following error:
Type mismatch: cannot convert from JavaRDD to CassandraJavaRDD
even though both RDDs are declared with the same type.
So should I 1) apply a cast, as in
cassandraRDD2 = (CassandraJavaRDD<CassandraRow>) cassandraRDD2.union(cassandraRDD);
2) or change the type of one of the RDDs to JavaRDD?
The problem happens because, according to the docs:
Method: union(JavaRDD other) Return the union of this RDD and another one.
Return Value: JavaRDD
Hence the mismatch: union() returns a JavaRDD, which cannot be assigned to a variable declared as CassandraJavaRDD.
And according to this:
public class CassandraJavaRDD<R> extends JavaRDD<R> {
...
}
The CassandraJavaRDD class extends JavaRDD, so you can declare both variables as:
JavaRDD<CassandraRow> cassandraRDD = null;
JavaRDD<CassandraRow> cassandraRDD2 = null;
The return value of the union() method will then match the type of the variable it is assigned to.
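The typing rule behind this can be tried without a Spark or Cassandra setup. The sketch below uses two hypothetical stand-in classes, BaseRdd (playing the role of JavaRDD) and SourceRdd (playing the role of CassandraJavaRDD); they are not part of Spark or the Cassandra connector, and the stand-in union() simply concatenates lists. It shows that accumulating into a base-typed variable compiles and works, and that the cast from option 1 compiles but fails at runtime when union() builds a plain base-type object. Whether the real connector behaves the same way at runtime is an assumption here, although the declared return type of JavaRDD suggests it does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UnionTypeSketch {

    // Hypothetical stand-in for JavaRDD<R>: union() is declared to return the base type.
    static class BaseRdd<R> {
        final List<R> rows;

        BaseRdd(List<R> rows) {
            this.rows = rows;
        }

        BaseRdd<R> union(BaseRdd<R> other) {
            List<R> merged = new ArrayList<>(rows);
            merged.addAll(other.rows);
            return new BaseRdd<>(merged); // always a plain BaseRdd instance
        }
    }

    // Hypothetical stand-in for CassandraJavaRDD<R>: inherits union() unchanged.
    static class SourceRdd<R> extends BaseRdd<R> {
        SourceRdd(List<R> rows) {
            super(rows);
        }
    }

    // Option 2: accumulate the batches in a variable of the base type.
    static BaseRdd<String> unionAll(List<SourceRdd<String>> batches) {
        BaseRdd<String> all = null;
        for (SourceRdd<String> batch : batches) {
            all = (all == null) ? batch : all.union(batch);
        }
        return all;
    }

    // Option 1: downcast the result of union(); compiles, but the object is a BaseRdd.
    static boolean castFails(SourceRdd<String> a, SourceRdd<String> b) {
        try {
            SourceRdd<String> ignored = (SourceRdd<String>) a.union(b);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        SourceRdd<String> first = new SourceRdd<>(Arrays.asList("a", "b"));
        SourceRdd<String> second = new SourceRdd<>(Arrays.asList("c"));
        System.out.println(unionAll(Arrays.asList(first, second)).rows); // prints [a, b, c]
        System.out.println(castFails(first, second));                    // prints true
    }
}
```

Declaring the accumulator with the base type keeps the compiler and the runtime in agreement, which is why option 2 is the safer of the two choices in the question.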