
value join is not a member of org.apache.spark.rdd.RDD

I get this error:

value join is not a member of 
    org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0])))
        forSome { type _0 <: (String, Double) }]

The only suggestion I found is import org.apache.spark.SparkContext._, and I am already doing that.

What am I doing wrong?

EDIT: changing the code to eliminate forSome (i.e., when the object has type org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[(String, Double)])))]) solved the problem. Is this a bug in Spark?
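
(For illustration, a minimal REPL sketch of that fix, with made-up data: once the array element type is concrete, the element type splits into a key and a value, and join resolves.)

scala> val fixed = sc.parallelize(Seq[(Long, (Int, (Long, String, Array[(String, Double)])))]())
scala> fixed.join(fixed) // compiles: K = Long, V = (Int, (Long, String, Array[(String, Double)]))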

join is a member of org.apache.spark.rdd.PairRDDFunctions. So why does the implicit class not trigger?

scala> val s = Seq[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }]()
scala> val r = sc.parallelize(s)
scala> r.join(r) // Gives your error message.
scala> val p = new org.apache.spark.rdd.PairRDDFunctions(r)
<console>:25: error: no type parameters for constructor PairRDDFunctions: (self: org.apache.spark.rdd.RDD[(K, V)])(implicit kt: scala.reflect.ClassTag[K], implicit vt: scala.reflect.ClassTag[V], implicit ord: Ordering[K])org.apache.spark.rdd.PairRDDFunctions[K,V] exist so that it can be applied to arguments (org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }])
 --- because ---
argument expression's type is not compatible with formal parameter type;
 found   : org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }]
 required: org.apache.spark.rdd.RDD[(?K, ?V)]
Note: (Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) } >: (?K, ?V), but class RDD is invariant in type T.
You may wish to define T as -T instead. (SLS 4.5)
       val p = new org.apache.spark.rdd.PairRDDFunctions(r)
               ^
<console>:25: error: type mismatch;
 found   : org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }]
 required: org.apache.spark.rdd.RDD[(K, V)]
       val p = new org.apache.spark.rdd.PairRDDFunctions(r)

I'm sure that error message is clear to everyone else, but just for my own slow self, let's try to make sense of it. PairRDDFunctions has two type parameters, K and V. Your forSome covers the whole pair, so it cannot be split into separate K and V types. There are no K and V such that RDD[(K, V)] would equal your RDD type.
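
For contrast, here is a minimal sketch (made-up data, assumed REPL session) of a pair type that does split into K and V, so the implicit conversion to PairRDDFunctions applies:

scala> val ok = sc.parallelize(Seq((1L, "a"))) // RDD[(Long, String)]: K = Long, V = String
scala> ok.join(ok)                             // compiles; join comes in via the pair-RDD implicits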

However, you could have the forSome apply only to the value, instead of the whole pair. Now the join works, because this type can be separated into K and V.

scala> val s2 = Seq[(Long, (Int, (Long, String, Array[_0])) forSome { type _0 <: (String, Double) })]()
scala> val r2 = sc.parallelize(s2)
scala> r2.join(r2)
res0: org.apache.spark.rdd.RDD[(Long, ((Int, (Long, String, Array[_0])) forSome { type _0 <: (String, Double) }, (Int, (Long, String, Array[_0])) forSome { type _0 <: (String, Double) }))] = MapPartitionsRDD[5] at join at <console>:26

Consider two Spark RDDs to be joined together.

Say rdd1.first is of the form (Int, Int, Float) = (1,957,299.98), while rdd2.first is something like (Int, Int) = (25876,1), and the join is supposed to take place on the first field of both RDDs.

scala> rdd1.join(rdd2) -- results in an error:
:**: error: value join is not a member of org.apache.spark.rdd.RDD[(Int, Int, Float)]

REASON


Both RDDs should be in the form of key-value pairs.

Here, rdd1 -- being in the form of (1,957,299.98) -- does not obey this rule, while rdd2 -- which is in the form of (25876,1) -- does.
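
A minimal sketch of this rule (made-up data in an assumed REPL session):

scala> val pairs   = sc.parallelize(Seq((25876, 1)))        // RDD[(Int, Int)]: a key-value pair, so join is available
scala> val triples = sc.parallelize(Seq((1, 957, 299.98f))) // RDD[(Int, Int, Float)]: a 3-tuple, not a pair
scala> pairs.join(pairs)    // compiles
scala> // triples.join(pairs) -- would not compile: value join is not a member of RDD[(Int, Int, Float)]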

RESOLUTION


Convert the output of the first RDD from (1,957,299.98) into a key-value pair of the form (1,(957,299.98)) before joining it with rdd2, as shown below:

scala> // assuming rdd1 is an RDD[String] of comma-separated lines such as "1,957,299.98"
scala> val rdd1KV = rdd1.map(x => (x.split(",")(0).toInt, (x.split(",")(1).toInt, x.split(",")(2).toFloat))) -- modified RDD

scala> rdd1KV.join(rdd2) -- join successful :)
res**: (Int, (Int, Float)) = (1,(957,299.98))

By the way, join is a member of org.apache.spark.rdd.PairRDDFunctions. So make sure you import org.apache.spark.SparkContext._ in Eclipse or whichever IDE you run your code from.
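
For reference, a minimal standalone sketch of the whole flow (object name, app name, and data are made up; on Spark 1.3+ the pair-RDD implicits live on the RDD companion object, so the explicit import mainly matters on older versions):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // brings in the pair-RDD implicits on Spark < 1.3

object JoinExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("JoinExample").setMaster("local[*]"))
    val rdd1KV = sc.parallelize(Seq((1, (957, 299.98f)))) // RDD[(Int, (Int, Float))]
    val rdd2   = sc.parallelize(Seq((1, 25876)))          // RDD[(Int, Int)]
    rdd1KV.join(rdd2).collect().foreach(println)          // prints (1,((957,299.98),25876))
    sc.stop()
  }
}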

Article also on my blog:

https://tips-to-code.blogspot.com/2018/08/apache-spark-error-resolution-value.html
