
How to convert RDD[Row] to RDD[String]

I have a DataFrame called source, a table from MySQL:

val source = sqlContext.read.jdbc(jdbcUrl, "source", connectionProperties)

I have converted it to an RDD with:

val sourceRdd = source.rdd

but it's RDD[Row], and I need RDD[String] to do transformations like:

source.map(rec => (rec.split(",")(0).toInt, rec)), subtractByKey(), etc.

Thank you

You can use the Row.mkString(sep: String): String method in a map call, like this:

val sourceRdd = source.rdd.map(_.mkString(","))

You can change the "," parameter to whatever you want.
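For example, here is a minimal sketch of the pipeline from the question built on top of this, assuming the first field of each record is an integer id; otherKeyed is a hypothetical RDD[(Int, String)] standing in for whatever you want to subtract:

val sourceRdd = source.rdd.map(_.mkString(","))

// key each record by its first field, as in the question
val keyed = sourceRdd.map(rec => (rec.split(",")(0).toInt, rec))

// otherKeyed is a placeholder RDD of records to subtract by key
val otherKeyed = sc.parallelize(Seq((1, "1,already,processed")))
val remaining = keyed.subtractByKey(otherKeyed)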

Hope this helps you. Best regards.

What is your schema?

If it's just a String, you can use:

import spark.implicits._
val sourceDS = source.as[String]
val sourceRdd = sourceDS.rdd // will give RDD[String]

Note: use sqlContext instead of spark in Spark 1.6. spark is a SparkSession, a new class in Spark 2.0 and the new entry point to SQL functionality; it should be used instead of SQLContext in Spark 2.x.

You can also create your own case classes.
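For instance, a minimal sketch of the case-class variant, assuming a hypothetical two-column schema (id: Int, name: String); adjust the fields to match your table:

case class SourceRecord(id: Int, name: String)

import spark.implicits._
val typedDS = source.as[SourceRecord]                      // Dataset[SourceRecord]
val stringRdd = typedDS.rdd.map(r => s"${r.id},${r.name}") // RDD[String]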

You can also map the rows. Here source is of type DataFrame, and we use a partial function in the map call:

val sourceRdd = source.rdd.map { case row: Row => row(0).asInstanceOf[String] } // RDD[String], first column only
  .map(s => s.split(","))                                                       // RDD[Array[String]]
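If the goal is the keyed form from the question, you can also build the pairs directly from the Row and skip the string splitting entirely. A minimal sketch, assuming (hypothetically) that column 0 is an Int:

val keyedRdd = source.rdd.map { row =>
  // getInt reads column 0 as an Int; mkString rebuilds the record as a String
  (row.getInt(0), row.mkString(","))
}
// keyedRdd is RDD[(Int, String)], ready for subtractByKey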
