
Spark Scala: Cast StructType to String

I read JSON as:

val df = spark.read.json(rdd)

I read messages from different topics, so I cannot specify an explicit schema. Some messages contain fields with nested JSON, and those fields are converted to StructType. For example:

{"name": "John", "son": {"name":"Tom"}}

How can I cast it to String? I need to read the "son" field as a String:

"{\"name\":\"Tom\"}"

Using the cast method or SQL expression fails:

df.selectExpr("cast(son as string)")

Error:

java.lang.String is not a valid external type for schema of struct<name:string>

You can easily get the string back by using to_json:

df.select(to_json(df("son")))
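As a minimal self-contained sketch of this answer (the session setup and sample input are assumptions for illustration; the question's df comes from spark.read.json(rdd)), to_json serializes the inferred struct column back into its JSON text:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.to_json

val spark = SparkSession.builder()
  .appName("to_json demo")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Spark infers "son" as struct<name:string> from this sample line.
val df = spark.read.json(Seq("""{"name": "John", "son": {"name":"Tom"}}""").toDS())

// to_json turns the struct column back into its JSON string representation.
val withSonAsString = df.select(df("name"), to_json(df("son")).alias("son"))
withSonAsString.printSchema()  // son: string (nullable = true)
withSonAsString.show(false)    // son = {"name":"Tom"}
```

Aliasing keeps the original column name; without it the column would be called to_json(son).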

Sorry, I misunderstood your question. I thought you had varying schemas: that sometimes the field was returned as a struct and sometimes as a string, and that you wanted to convert it to a string every time. I leave the answer below for information purposes.


I tried a small test case locally, and apparently if I let Spark infer the schema, it considers my "son" field a String. I don't know how you build your processing logic, but as a workaround you could try to specify a schema manually and type "son" as a String:

import java.io.File
import java.nio.charset.StandardCharsets

import org.apache.commons.io.FileUtils
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DataTypes, StructField, StructType}

val testDataset =
  """
    | {"name": "John", "son": {"name":"Tom"}}
    | {"name": "John", "son": "Tom"}
  """.stripMargin
val testJsonFile = new File("./test_json.json")
FileUtils.writeStringToFile(testJsonFile, testDataset, StandardCharsets.UTF_8)


val schema = StructType(
  Seq(StructField("name", DataTypes.StringType, true), StructField("son", DataTypes.StringType, true))
)
val sparkSession = SparkSession.builder()
    .appName("Test inconsistent field type").master("local[*]").getOrCreate()
val structuredJsonData = sparkSession.read.schema(schema).json(testJsonFile.getAbsolutePath)
import sparkSession.implicits._

val collectedDataset = structuredJsonData.map(row => row.getAs[String]("son")).collect()
println(s"got=${collectedDataset.mkString("---")}")
structuredJsonData.printSchema()

It prints:

got={"name":"Tom"}---Tom
root
 |-- name: string (nullable = true)
 |-- son: string (nullable = true)

You could still try to define a custom mapping function. However, I'm not sure it will work, because when I try to apply a schema with a StructType to JSON lines where the field is a plain String instead, the whole line is ignored (null values in both fields):

val testDataset =
  """
    | {"name": "John", "son": {"name":"Tom"}}
    | {"name": "John", "son": "Tom2"}
  """.stripMargin
val testJsonFile = new File("./test_json.json")
FileUtils.writeStringToFile(testJsonFile, testDataset)

val schema = StructType(
  Seq(StructField("name", DataTypes.StringType, true), StructField("son", StructType(Seq(StructField("name", DataTypes.StringType, true))))
  )
)
val sparkSession = SparkSession.builder()
    .appName("Test inconsistent field type").master("local[*]").getOrCreate()
val structuredJsonData = sparkSession.read.schema(schema).json(testJsonFile.getAbsolutePath)
println(s"got=${structuredJsonData.collect().mkString("---")}")
structuredJsonData.printSchema()

It prints:

got=[John,[Tom]]---[null,null]
root
 |-- name: string (nullable = true)
 |-- son: struct (nullable = true)
 |    |-- name: string (nullable = true)

