
value join is not a member of org.apache.spark.rdd.RDD

I am getting this error:

value join is not a member of 
    org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0])))
        forSome { type _0 <: (String, Double) }]

The only suggestion I have found is import org.apache.spark.SparkContext._, which I have already done.
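For context, here is a minimal spark-shell sketch of what that import normally enables (the RDD contents are made up for illustration; on Spark 1.3+ the implicit conversion is found even without this import):

scala> import org.apache.spark.SparkContext._  // brings rddToPairRDDFunctions into scope on older Spark
scala> val a = sc.parallelize(Seq((1L, "x"), (2L, "y")))
scala> val b = sc.parallelize(Seq((1L, 10), (2L, 20)))
scala> a.join(b).collect()  // resolves fine: each element type splits into a key and a value
res0: Array[(Long, (String, Int))] = Array((1,(x,10)), (2,(y,20)))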

What exactly am I doing wrong?

Edit: Changing the code to eliminate the forSome (i.e., so that the object has type org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[(String, Double)])))]) fixes the problem. Is this a bug in Spark?
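For comparison, a minimal sketch of that working variant, assuming a spark-shell session with sc available:

scala> val s3 = Seq[(Long, (Int, (Long, String, Array[(String, Double)])))]()
scala> val r3 = sc.parallelize(s3)
scala> r3.join(r3)  // compiles: K = Long, V = (Int, (Long, String, Array[(String, Double)]))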

join is a member of org.apache.spark.rdd.PairRDDFunctions. So why does the implicit class not trigger?

scala> val s = Seq[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }]()
scala> val r = sc.parallelize(s)
scala> r.join(r) // Gives your error message.
scala> val p = new org.apache.spark.rdd.PairRDDFunctions(r)
<console>:25: error: no type parameters for constructor PairRDDFunctions: (self: org.apache.spark.rdd.RDD[(K, V)])(implicit kt: scala.reflect.ClassTag[K], implicit vt: scala.reflect.ClassTag[V], implicit ord: Ordering[K])org.apache.spark.rdd.PairRDDFunctions[K,V] exist so that it can be applied to arguments (org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }])
 --- because ---
argument expression's type is not compatible with formal parameter type;
 found   : org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }]
 required: org.apache.spark.rdd.RDD[(?K, ?V)]
Note: (Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) } >: (?K, ?V), but class RDD is invariant in type T.
You may wish to define T as -T instead. (SLS 4.5)
       val p = new org.apache.spark.rdd.PairRDDFunctions(r)
               ^
<console>:25: error: type mismatch;
 found   : org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }]
 required: org.apache.spark.rdd.RDD[(K, V)]
       val p = new org.apache.spark.rdd.PairRDDFunctions(r)

I'm sure the error message is perfectly clear to everyone else, but just for my own slow self, let's try to make sense of it. PairRDDFunctions has two type parameters, K and V. Your forSome applies to the whole pair, so it cannot be split into separate K and V types. There exist no K and V such that RDD[(K, V)] would equal your RDD's type.
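The same failure can be reproduced without Spark at all. Here is a minimal sketch (all names are hypothetical), where the invariant Box plays the role of RDD and Pair plays the role of PairRDDFunctions:

class Box[T](val value: T)           // invariant in T, like RDD
class Pair[K, V](self: Box[(K, V)])  // needs the element type to split as (K, V)

val good = new Box((1L, "x"))
new Pair(good)                       // compiles: K = Long, V = String

val bad = new Box[(Long, Array[_0]) forSome { type _0 <: (String, Double) }]((1L, Array.empty[(String, Double)]))
// new Pair(bad)                     // does not compile: no K and V make (K, V)
//                                   // equal to the existential pair type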

However, you can apply the forSome to just the value instead of the whole pair. Now join works, because this type can be split into K and V:

scala> val s2 = Seq[(Long, (Int, (Long, String, Array[_0])) forSome { type _0 <: (String, Double) })]()
scala> val r2 = sc.parallelize(s2)
scala> r2.join(r2)
res0: org.apache.spark.rdd.RDD[(Long, ((Int, (Long, String, Array[_0])) forSome { type _0 <: (String, Double) }, (Int, (Long, String, Array[_0])) forSome { type _0 <: (String, Double) }))] = MapPartitionsRDD[5] at join at <console>:26

Consider joining two Spark RDDs together.

Say rdd1.first is of the form (Int, Int, Float) = (1,957,299.98) while rdd2.first is of the form (Int, Int) = (25876,1), and the join should happen on the first field of both RDDs.

scala> rdd1.join(rdd2) -- results in the error: **: error: value join is not a member of org.apache.spark.rdd.RDD[(Int, Int, Float)]

Reason


Both RDDs must be in the form of key-value pairs.

Here rdd1, being of the form (1,957,299.98), does not obey this rule, while rdd2, being of the form (25876,1), does.

Resolution


Convert the output of the first RDD from (1,957,299.98) into a key-value pair of the form (1,(957,299.98)) before joining it with rdd2, as shown below:

scala> val rdd1KV = rdd1.map{ case (k, v1, v2) => (k, (v1, v2)) } -- modified RDD, keyed on the first field

scala> rdd1KV.join(rdd2) -- join successful :)
res**: (Int, (Int, Float)) = (1,(957,299.98))

By the way, join is a member of org.apache.spark.rdd.PairRDDFunctions. So make sure you import it wherever you intend to run the code, whether in Eclipse or any other IDE.
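For instance, a minimal standalone sketch (the object name, master, and RDD contents are made up; on pre-1.3 Spark the SparkContext._ import supplies the rddToPairRDDFunctions implicit, while newer versions find it automatically):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // pair-RDD implicits on older Spark versions

object JoinDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("join-demo").setMaster("local[*]"))
    val rdd1KV = sc.parallelize(Seq((1, (957, 299.98f))))  // hypothetical key-value data
    val rdd2   = sc.parallelize(Seq((25876, 1), (1, 2)))
    rdd1KV.join(rdd2).collect().foreach(println)           // prints (1,((957,299.98),2))
    sc.stop()
  }
}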

This article is also on my blog:

https://tips-to-code.blogspot.com/2018/08/apache-spark-error-resolution-value.html

