spark: value histogram is not a member of org.apache.spark.rdd.RDD[Option[Any]]
I'm new to Spark and Scala, and I've run into a Scala compile error. Say we have an RDD whose elements are maps, built like this:
val rawData = someRDD.map {
  // some ops
  Map(
    "A" -> someInt_var1, // Int
    "B" -> someInt_var2, // Int
    "C" -> somelong_var  // Long
  )
}
Then I want to get histogram information for these variables. Here is my code:
rawData.map{row => row.get("A")}.histogram(10)
The compiler reports:
value histogram is not a member of org.apache.spark.rdd.RDD[Option[Any]]
I'm wondering why rawData.map{row => row.get("A")} has type org.apache.spark.rdd.RDD[Option[Any]], and how to transform it into an RDD[Int]. I tried this:
rawData.map{row => row.get("A")}.map{_.toInt}.histogram(10)
But it fails to compile:
value toInt is not a member of Option[Any]
I'm totally confused and would appreciate any help.
You get an Option because Map.get returns an Option: it returns None if the key doesn't exist in the Map. The Any in Option[Any] comes from the mixed value types in the Map: you have both Int and Long values, so the compiler falls back to a common supertype of the two. In my case it infers AnyVal instead of Any.
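Both effects can be reproduced without Spark. The following is a minimal sketch in plain Scala; the object name OptionTypeDemo and the sample values are made up for illustration, and the value type is annotated explicitly as AnyVal rather than left to inference.

```scala
object OptionTypeDemo {
  // Int and Long values share AnyVal as their common supertype,
  // so the map's value type is AnyVal (annotated explicitly here).
  val row: Map[String, AnyVal] = Map("A" -> 1, "B" -> 2, "C" -> 4L)

  // Map.get wraps the result in an Option: Some(value) or None.
  val present: Option[AnyVal] = row.get("A")
  val missing: Option[AnyVal] = row.get("Z")
}
```

Here present holds the value stored under "A" wrapped in Some, while missing is None because "Z" is not a key of the map. Neither is an Int, which is why numeric methods such as toInt or histogram are not available on them.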
One possible solution is to use getOrElse, which gets rid of the Option by supplying a default value for when the key doesn't exist. If you are sure that A's value is always an Int, you can then convert it from AnyVal to Int using asInstanceOf[Int].
A simplified example follows:
val rawData = sc.parallelize(Seq(Map("A" -> 1, "B" -> 2, "C" -> 4L)))
rawData.map(_.get("A"))
// res6: org.apache.spark.rdd.RDD[Option[AnyVal]] = MapPartitionsRDD[9] at map at <console>:27
rawData.map(_.getOrElse("A", 0).asInstanceOf[Int]).histogram(10)
// res7: (Array[Double], Array[Long]) = (Array(1.0, 1.0),Array(1))
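If the key may be missing, or a value may turn out not to be an Int, a variant without the cast is possible. The sketch below uses a plain Seq in place of the RDD so that it runs without Spark; SafeExtract, rows and intsFor are hypothetical names and the sample rows are invented. RDD also offers flatMap and a collect overload that takes a partial function, so the same chain should carry over to rawData, but that is an assumption not tested here.

```scala
object SafeExtract {
  // Stand-in for the RDD's contents: the second row has no "A" key
  // and the third stores "A" as a Long rather than an Int.
  val rows: Seq[Map[String, AnyVal]] = Seq(
    Map("A" -> 1, "B" -> 2, "C" -> 4L),
    Map("B" -> 5),
    Map("A" -> 7L)
  )

  // Keep only the values stored under `key` that really are Ints.
  def intsFor(key: String): Seq[Int] =
    rows
      .flatMap(_.get(key))          // drops rows where the key is missing
      .collect { case i: Int => i } // drops values that are not Int
}
```

Unlike getOrElse with a default of 0, this does not add artificial zeros to the data, and unlike asInstanceOf[Int] it skips a non-Int value instead of throwing a ClassCastException at runtime. Whether silently skipping such rows is acceptable depends on the data.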