How to convert RDD[Row] to RDD[String]
I have a DataFrame called source, a table from MySQL:
val source = sqlContext.read.jdbc(jdbcUrl, "source", connectionProperties)
I have converted it to an RDD with
val sourceRdd = source.rdd
but it is RDD[Row]; I need RDD[String] to do transformations like
source.map(rec => (rec.split(",")(0).toInt, rec)), .subtractByKey(), etc.
Thank you
You can use the Row.mkString(sep: String): String method in a map call, like this:
val sourceRdd = source.rdd.map(_.mkString(","))
You can change the "," parameter to whatever separator you want.
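From there, the transformations mentioned in the question can be applied directly. A minimal sketch (otherRdd is a hypothetical second RDD[(Int, String)] built the same way):

val sourceRdd = source.rdd.map(_.mkString(","))

// key each record by its first field, parsed as Int
val keyed = sourceRdd.map(rec => (rec.split(",")(0).toInt, rec))

// drop records whose keys also appear in otherRdd (hypothetical RDD[(Int, String)])
val remaining = keyed.subtractByKey(otherRdd)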
Hope this helps you. Best regards.
What is your schema?
If it's just a String, you can use:
import spark.implicits._
val sourceDS = source.as[String]
val sourceRdd = sourceDS.rdd // will give RDD[String]
Note: use sqlContext instead of spark in Spark 1.6. spark is a SparkSession, a new class introduced in Spark 2.0 as the new entry point to SQL functionality; it should be used instead of SQLContext in Spark 2.x.
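For reference, a minimal Spark 2.x sketch of the same flow (jdbcUrl and connectionProperties are the values from the question; the table is assumed to have a single string column so that .as[String] works):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("rdd-of-strings").getOrCreate()
import spark.implicits._

val source = spark.read.jdbc(jdbcUrl, "source", connectionProperties)
val sourceRdd = source.as[String].rdd   // RDD[String]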
You can also create your own case classes.
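For example, a sketch assuming a hypothetical two-column schema (id: Int, name: String) — adjust the fields to match your actual table:

case class SourceRecord(id: Int, name: String)

import spark.implicits._
val typedRdd = source.as[SourceRecord].rdd              // RDD[SourceRecord]
val stringRdd = typedRdd.map(r => s"${r.id},${r.name}") // back to RDD[String] if needed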
You can also map the rows directly. Here source is of type DataFrame, and we use a partial function in the map call:
import org.apache.spark.sql.Row
val sourceRdd = source.rdd.map { case x: Row => x(0).asInstanceOf[String] }.map(s => s.split(","))  // RDD[Array[String]]
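A slightly safer alternative to asInstanceOf is Row's typed accessors, e.g. getString by position or getAs by column name (here "value" is a hypothetical column name):

val sourceRdd = source.rdd
  .map(row => row.getAs[String]("value"))  // or row.getString(0) by position
  .map(s => s.split(","))                  // RDD[Array[String]], as above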