
How to convert RDD[Row] to RDD[String]

I have a DataFrame called source, a table from MySQL:

val source = sqlContext.read.jdbc(jdbcUrl, "source", connectionProperties)

I have converted it to an RDD with:

val sourceRdd = source.rdd

but it's RDD[Row], and I need RDD[String] to do transformations like:

source.map(rec => (rec.split(",")(0).toInt, rec)), subtractByKey(), etc.

Thank you

You can use the Row.mkString(sep: String): String method in a map call, like this:

val sourceRdd = source.rdd.map(_.mkString(","))

You can change the "," parameter to whatever you want.
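For example, here is a minimal sketch of the pipeline from the question built on top of this, assuming the first field of each record is an integer id; otherKeyed is a hypothetical RDD[(Int, String)] standing in for whatever you want to subtract:

val sourceRdd = source.rdd.map(_.mkString(","))

// key each record by its first field, as in the question
val keyed = sourceRdd.map(rec => (rec.split(",")(0).toInt, rec))

// otherKeyed is a placeholder RDD of records to subtract by key
val otherKeyed = sc.parallelize(Seq((1, "1,already,processed")))
val remaining = keyed.subtractByKey(otherKeyed)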

Hope this helps you. Best regards.

What is your schema?

If it's just a String, you can use:

import spark.implicits._
val sourceDS = source.as[String]
val sourceRdd = sourceDS.rdd // will give RDD[String]

Note: use sqlContext instead of spark in Spark 1.6. spark is a SparkSession, a new class in Spark 2.0 and the new entry point to SQL functionality; it should be used instead of SQLContext in Spark 2.x.

You can also create your own case classes.
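For instance, a minimal sketch of the case-class variant, assuming a hypothetical two-column schema (id: Int, name: String); adjust the fields to match your table:

case class SourceRecord(id: Int, name: String)

import spark.implicits._
val typedDS = source.as[SourceRecord]                      // Dataset[SourceRecord]
val stringRdd = typedDS.rdd.map(r => s"${r.id},${r.name}") // RDD[String]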

You can also map the rows. Here source is of type DataFrame, and we use a partial function in the map call:

val sourceRdd = source.rdd.map { case row: Row => row(0).asInstanceOf[String] } // RDD[String], first column only
  .map(s => s.split(","))                                                       // RDD[Array[String]]
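If the goal is the keyed form from the question, you can also build the pairs directly from the Row and skip the string splitting entirely. A minimal sketch, assuming (hypothetically) that column 0 is an Int:

val keyedRdd = source.rdd.map { row =>
  // getInt reads column 0 as an Int; mkString rebuilds the record as a String
  (row.getInt(0), row.mkString(","))
}
// keyedRdd is RDD[(Int, String)], ready for subtractByKey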
