
spark: value histogram is not a member of org.apache.spark.rdd.RDD[Option[Any]]

I'm new to Spark and Scala, and I've run into a compile error with Scala. Let's say we have an RDD whose elements are maps, like this:

val rawData = someRDD.map{
    //some ops
    Map(
    "A" -> someInt_var1, //Int
    "B" -> someInt_var2, //Int
    "C" -> somelong_var  //Long
    )
}

Then I want to get the histogram of these variables, so here is my code:

rawData.map{row => row.get("A")}.histogram(10)

And the compile error says:

value histogram is not a member of org.apache.spark.rdd.RDD[Option[Any]]

I'm wondering why rawData.map{row => row.get("A")} is an org.apache.spark.rdd.RDD[Option[Any]], and how can I transform it to an RDD[Int]? I have tried this:

rawData.map{row => row.get("A")}.map{_.toInt}.histogram(10)

But it fails to compile:

value toInt is not a member of Option[Any]

I'm totally confused and would appreciate some help.

You get an Option because Map.get returns an Option; Map.get returns None if the key doesn't exist in the Map. The Any in Option[Any] comes from the mixed value types of the Map: you have both Int and Long values, so the value type is widened (in my case it is inferred as AnyVal rather than Any).
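A minimal sketch (plain Scala, no Spark needed; the values are made up) showing why the value type widens and why get wraps the result in an Option:

val m = Map("A" -> 1, "B" -> 2, "C" -> 4L)  // inferred as Map[String, AnyVal]: Int and Long widen to AnyVal
val a = m.get("A")                          // Some(1): an Option[AnyVal], get always wraps the value
val z = m.get("Z")                          // None: the key doesn't exist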

A possible solution is to use getOrElse to get rid of the Option by providing a default value for when the key doesn't exist, and if you are sure A's value is always an Int, you can cast it from AnyVal to Int with asInstanceOf[Int].

A simplified example follows:

val rawData = sc.parallelize(Seq(Map("A" -> 1, "B" -> 2, "C" -> 4L)))

rawData.map(_.get("A"))
// res6: org.apache.spark.rdd.RDD[Option[AnyVal]] = MapPartitionsRDD[9] at map at <console>:27

rawData.map(_.getOrElse("A", 0).asInstanceOf[Int]).histogram(10)
// res7: (Array[Double], Array[Long]) = (Array(1.0, 1.0),Array(1))
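If you would rather drop the rows where "A" is missing instead of counting them as 0, a sketch of an alternative (same data, same idea, but note it changes the behavior for missing keys) is to flatMap over the Option before casting:

rawData.flatMap(_.get("A"))       // RDD[AnyVal]; None values are dropped by flatMap
       .map(_.asInstanceOf[Int])  // RDD[Int]
       .histogram(10)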
