Convert RDD[Any] = Array(List(String, ListBuffer[String])) to RDD[(String, Seq[String])]
I have an RDD of type Any, for example:
Array(List(Mathematical Sciences, ListBuffer(applications, asymptotic, largest, enable, stochastic)))
I want to convert it to an RDD of type RDD[(String, Seq[String])].
I tried:
val rdd = sc.makeRDD(strList)
case class X(titleId: String, terms: List[String])
val df = rdd.map { case Array(s0, s1) => X(s0, s1) }.toDF()
I spent a long time trying, without success.
You can use:
val result: RDD[(String, Seq[String])] =
rdd.map { case List(s0: String, s1: ListBuffer[String]) => (s0, s1) }
But note that any record in the input RDD[Any] that doesn't match these types (which can't be checked at compile time) would throw a scala.MatchError.
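If non-matching records should be skipped rather than fail the job with a scala.MatchError, one option is to use collect with a partial function instead of map. The sketch below is only an illustration: it uses a plain Scala Seq (the name records and its contents are made up here) so it runs without a Spark context. RDD also has a collect overload that takes a PartialFunction, so the same pattern should carry over to the RDD.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical sample data standing in for the RDD[Any]; the second and third
// records do not have the expected List(String, Seq) shape.
val records: Seq[Any] = Seq(
  List("Mathematical Sciences", ListBuffer("applications", "asymptotic")),
  "unexpected record",
  List("only one element")
)

// collect applies the partial function only where it is defined and silently
// drops every other record, so no MatchError is thrown.
// scala.collection.Seq is spelled out so that a ListBuffer matches on both
// older and newer Scala versions.
val pairs: Seq[(String, Seq[String])] = records.collect {
  case List(title: String, terms: scala.collection.Seq[_]) =>
    (title, terms.map(_.toString).toList)
}
```

Whether dropping bad records silently is acceptable depends on the data; if they should be counted or logged, they need separate handling.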
As mentioned in the question, if you have
val strList = Array(List("Mathematical Sciences", ListBuffer("applications", "asymptotic", "largest", "enable", "stochastic")))
val rdd = sc.makeRDD(strList)
which has the following type:
rdd: org.apache.spark.rdd.RDD[List[java.io.Serializable]]
You can convert it to your required type
res0: org.apache.spark.rdd.RDD[(String, Seq[String])]
by simply using map and converting the types as follows:
rdd.map(x => (x(0).toString, x(1).asInstanceOf[ListBuffer[String]].toSeq))
I hope the answer is helpful.
Finally, it worked. I get a warning, but it works:
val rdd = sc.makeRDD(strList)
val result = rdd.map { case List(s0: String, s1: Seq[String]) => (s0, s1) }
<console>:32: warning: non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure
       val result = rdd.map { case List(s0: String, s1: Seq[String]) => (s0, s1) }
                                                        ^
result: org.apache.spark.rdd.RDD[(String, Seq[String])] = MapPartitionsRDD[1051] at map at <console>:32
Thank you.
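A note on the erasure warning: the element type of a collection is not available at runtime, so the pattern s1: Seq[String] can only verify that s1 is a Seq, not that it contains strings. A Seq of integers would pass the match and fail later with a ClassCastException. The following is a minimal sketch of one way to avoid relying on the erased type argument; toPair is a hypothetical helper name, and plain Scala values are used so it runs without Spark.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical helper: returns Some(pair) only when the record has the
// expected shape AND every element of the inner sequence is a String.
def toPair(record: Any): Option[(String, Seq[String])] = record match {
  case List(title: String, terms: scala.collection.Seq[_]) =>
    // Check each element explicitly instead of trusting Seq[String].
    val strings = terms.collect { case s: String => s }
    if (strings.size == terms.size) Some((title, strings.toList)) else None
  case _ => None
}

val ok  = toPair(List("Mathematical Sciences", ListBuffer("applications", "asymptotic")))
val bad = toPair(List("Mathematical Sciences", ListBuffer(1, 2, 3)))
```

With an RDD, something like rdd.flatMap(r => toPair(r)) would probably give an RDD[(String, Seq[String])] without the warning, since the Option is treated as a collection of zero or one element.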