Spark RDD tuple transformation
I'm trying to transform an RDD of tuples of Strings of this format:
(("abc","xyz","123","2016-02-26T18:31:56"),"15")
TO
(("abc","xyz","123"),"2016-02-26T18:31:56","15")
Basically, separating out the timestamp string as a separate tuple element. I tried the following, but it's still not clean or correct.
val result = rdd.map(r => (r._1.toString.split(",").toVector.dropRight(1).toString, r._1.toString.split(",").toList.last.toString, r._2))
However, it results in
(Vector(("abc", "xyz", "123"),"2016-02-26T18:31:56"),"15")
The expected output I'm looking for is
(("abc", "xyz", "123"),"2016-02-26T18:31:56","15")
This way I can access the elements using r._1, r._2 (the timestamp string) and r._3 in a separate map operation.
Any hints/pointers will be greatly appreciated.
Vector.toString will include the string "Vector" in its result. Instead, use Vector.mkString(",").
Example:
scala> val xs = Vector(1,2,3)
xs: scala.collection.immutable.Vector[Int] = Vector(1, 2, 3)
scala> xs.toString
res25: String = Vector(1, 2, 3)
scala> xs.mkString
res26: String = 123
scala> xs.mkString(",")
res27: String = 1,2,3
However, if you want to access (abc,xyz,123) as a tuple rather than as a string, you can instead pattern-match on the nested tuple:
val res = rdd.map {
  case ((a: String, b: String, c: String, ts: String), d: String) => ((a, b, c), ts, d)
}
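For reference, here is a minimal sketch of the same pattern match on a plain Scala collection (no Spark needed, using a hypothetical `data` sequence with the sample row from the question); on an RDD the `map` call is identical:

```scala
// Sample row in the question's shape: ((a, b, c, timestamp), count)
val data = Seq((("abc", "xyz", "123", "2016-02-26T18:31:56"), "15"))

// Split the timestamp out into its own top-level tuple element
val res = data.map {
  case ((a, b, c, ts), d) => ((a, b, c), ts, d)
}

// res.head._1 == ("abc", "xyz", "123")
// res.head._2 == "2016-02-26T18:31:56"
// res.head._3 == "15"
```

Each result row now exposes the key triple, the timestamp, and the count as r._1, r._2 and r._3, as asked.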