
How do I convert a Column DataType to String in Spark Scala?

I am trying to call the from_json method and want to fetch the schema of the JSON dynamically. The issue is with the third .withColumn line, as it doesn't seem to accept a Seq[Column].

val randomStringGen = udf((length: Int) => {
  scala.util.Random.alphanumeric.take(length).mkString
})

val randomKeyGen = udf((key: String, value: String) => {
  s"""{"${key}": "${value}"}"""
})

val resultDF = initDF
.withColumn("value", randomStringGen(lit(10)))
.withColumn("keyValue", randomKeyGen(lit("key"), col("value")))
.withColumn("key", from_json(col("keyValue"), spark.read.json(Seq(col("keyValue")).toDS).schema))

error: value toDS is not a member of Seq[org.apache.spark.sql.Column]
.withColumn("key", from_json(col("keyValue"), spark.read.json(Seq(col("keyValue")).toDS).schema))

I have a known workaround, which is simply to hard-code a sample JSON:

val jsData = """{"key": "value"}"""

and replace col("keyValue") with the hardcoded variable.

.withColumn("key", from_json(col("keyValue"), spark.read.json(Seq(jsData).toDS).schema))

This works and produces exactly what I want, but if I have a large JSON, this method can be quite cumbersome.

There are two small errors in what you're writing.

First, if you want to use the toDS method on a Seq, you'll need to import spark.implicits._, which is where this method is defined. Doing that should get rid of your first error.
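As a quick illustration (a sketch, assuming a live SparkSession named spark), the import brings toDS into scope for ordinary Scala collections of encodable values such as String:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("toDS-check").getOrCreate()
import spark.implicits._ // defines toDS (and toDF) on Scala collections

// Works: Seq[String] has an implicit Encoder, so toDS compiles
val ds = Seq("""{"key": "value"}""").toDS
ds.show(false)
```

Note that toDS operates on collections of plain values; a Seq[Column] like Seq(col("keyValue")) has no Encoder, which is why the compiler rejects it.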

Secondly, the from_json function you're trying to use has the following signature:

def from_json(e: Column, schema: StructType): Column

So that second argument, schema, should be a StructType. This is what the .schema method of a Dataset returns. So one of your parentheses is in the wrong position.
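For intuition, the StructType that spark.read.json infers from the sample {"key": "value"} is equivalent to the following hand-written schema (a hypothetical equivalent, not taken from the original post):

```scala
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hand-written equivalent of the schema Spark infers from {"key": "value"}
val schema = StructType(Seq(StructField("key", StringType, nullable = true)))
```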

Instead of

.withColumn("key", from_json(col("keyValue"), spark.read.json(Seq(col("keyValue")).toDS.schema)))

you should have

.withColumn("key", from_json(col("keyValue"), spark.read.json(Seq(col("keyValue")).toDS).schema))
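Putting the pieces together, one way to keep the schema inference dynamic is sketched below. It goes slightly beyond the answer: since toDS needs plain String values rather than Columns, this sketch first collects one sample value from the keyValue column and infers the schema from that (names like jsonDF and sample are illustrative, not from the original post):

```scala
import org.apache.spark.sql.functions.{col, from_json, lit}
import spark.implicits._ // required for toDS

val jsonDF = initDF
  .withColumn("value", randomStringGen(lit(10)))
  .withColumn("keyValue", randomKeyGen(lit("key"), col("value")))

// toDS is defined on collections of plain values, so take one sample
// String from the column and let Spark infer the schema from it.
val sample = jsonDF.select("keyValue").as[String].head()
val schema = spark.read.json(Seq(sample).toDS).schema

val resultDF = jsonDF.withColumn("key", from_json(col("keyValue"), schema))
```

This inspects the data once to infer the schema and then reuses that StructType for every row, which avoids hard-coding a large sample JSON.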
