
Scala: Transform and replace values of a Spark DataFrame with a nested JSON structure

I have a nested JSON file that I read as a Spark DataFrame, and I want to replace certain values in it using my own transformation.

For now, let's assume it looks as follows (which follows this ):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

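// Convenience function for turning JSON strings into DataFrames.
// Assumes `spark` (a SparkSession, Spark 2.2+) is predefined, as in spark-shell or a notebook.
def jsonToDataFrame(json: String, schema: StructType = null): DataFrame = {
  import spark.implicits._
  val reader = spark.read
  // Use the explicit schema if one is given; otherwise Spark infers it.
  Option(schema).foreach(s => reader.schema(s))
  // Read from a Dataset[String]; the RDD[String] overload is deprecated since Spark 2.2.
  reader.json(Seq(json).toDS())
}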

val df = jsonToDataFrame("""
  {
    "A": {
      "B": "b",
      "C": "c",
      "D": {"E": "e"}
    }
  }
""")

display(df)  // display() is a Databricks notebook helper; elsewhere use df.show()
df.printSchema()

Suppose the following transformation (converting lowercase to uppercase) is to be applied to certain values in the above Spark DataFrame:

import org.apache.spark.sql.functions.udf
val upper: String => String = _.toUpperCase
val upperUDF = udf(upper)
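One caveat with this UDF: Spark passes `null` straight into a UDF whose parameter is a `String`, so `_.toUpperCase` throws a `NullPointerException` on a missing value. A minimal sketch, assuming Spark 2.2+ and a local `SparkSession` (the names `safeUpperUDF` and `sample` are illustrative only), compares a null-safe UDF with the built-in `upper` function, which needs no UDF at all:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.{functions => F}

val spark = SparkSession.builder().master("local[*]").appName("upper-sketch").getOrCreate()
import spark.implicits._

// Null-safe variant: Option(null) is None, so null input yields null instead of an NPE.
val safeUpperUDF = F.udf((s: String) => Option(s).map(_.toUpperCase).orNull)

// One regular value and one null value.
val sample = Seq("b", null).toDF("B")

val viaUdf = sample.select(safeUpperUDF(F.col("B")).as("B")).as[String].collect().toSeq
// The built-in function already returns null for null input.
val viaBuiltin = sample.select(F.upper(F.col("B")).as("B")).as[String].collect().toSeq
```

Using the built-in function also keeps the expression visible to Spark's optimizer, whereas a UDF is opaque to it.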

While this doesn't work at all:

df.withColumn("A.B", upperUDF('A.B)).show()

the following works:

val df1 = df.select("A.B")
df1.withColumn("B", upperUDF('B)).show()
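The difference comes from two things. `'A.B` does not compile, because `'A` is a Scala `Symbol` literal and `.B` is then looked up as a member of that symbol, whereas `'B` is turned into a `Column` by the implicits from `spark.implicits._`. And even with `col("A.B")`, `withColumn` treats its first argument as a literal top-level column name, so it appends a new column called `A.B` next to the untouched struct instead of replacing the nested field. A small sketch of this behaviour, assuming a local `SparkSession` and the built-in `upper` in place of `upperUDF`:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.{functions => F}

val spark = SparkSession.builder().master("local[*]").appName("dotted-name-sketch").getOrCreate()
import spark.implicits._

// Same document as above, read from a Dataset[String].
val nested = spark.read.json(Seq("""{"A": {"B": "b", "C": "c", "D": {"E": "e"}}}""").toDS())

// F.col("A.B") resolves the nested field, but withColumn takes "A.B" as a literal
// top-level name, so this appends a new column instead of updating A.B in place.
val attempted = nested.withColumn("A.B", F.upper(F.col("A.B")))

attempted.printSchema()
```

The new column can only be selected with backticks (`` col("`A.B`") ``), while `A.B` inside the struct still holds `"b"`.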

But in the end I want to keep my nested structure and only replace certain values according to my transformation.

How can one achieve that? How can one preserve the schema when using withColumn?
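On Spark 3.1 and later, `Column.withField` offers a direct route: it returns a copy of a struct column with one field replaced, and the field name may be a dot-separated path into nested structs. Writing the result back to `A` therefore keeps the rest of the schema. A sketch under those assumptions (Spark 3.1+, a local `SparkSession`, and the built-in `upper` standing in for `upperUDF`):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.{functions => F}

val spark = SparkSession.builder().master("local[*]").appName("withfield-sketch").getOrCreate()
import spark.implicits._

val nested = spark.read.json(Seq("""{"A": {"B": "b", "C": "c", "D": {"E": "e"}}}""").toDS())

// Replace only A.B; C and D are carried over unchanged inside struct A.
val upperB = nested.withColumn("A", F.col("A").withField("B", F.upper(F.col("A.B"))))

// A dot-separated path reaches deeper levels, here A.D.E.
val upperE = nested.withColumn("A", F.col("A").withField("D.E", F.upper(F.col("A.D.E"))))

upperB.printSchema()
upperE.show()
```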

Finally, I found this thread, which answers my question. The trick is to preserve the schema dynamically while transforming the columns. Using the mutate() function defined there, the following works well for me:

val df2 = mutate(df, c => if (c.toString == "A.B") upperUDF(c) else c)
val df3 = mutate(df, c => if (c.toString == "A.D.E") upperUDF(c) else c)

display(df2)
df2.printSchema

display(df3)
df3.printSchema
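The `mutate()` function from the linked thread is not reproduced here. As a rough, hypothetical illustration of how such a schema-preserving helper can work, `mutateSketch` below walks the schema recursively, rebuilds every struct with `struct(...)`, and passes each leaf to `f` as `col("<dotted.path>")`; in Spark 2.x/3.x the `toString` of such a column is the dotted path, which is what a predicate like `c.toString == "A.B"` relies on. The sketch assumes a local `SparkSession`, field names without dots, and no arrays or maps of structs.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.{functions => F}
import org.apache.spark.sql.types.{StructField, StructType}

val spark = SparkSession.builder().master("local[*]").appName("mutate-sketch").getOrCreate()
import spark.implicits._

// Hypothetical helper, not the linked thread's code: rebuilds the schema field by
// field and hands every leaf column to `f`, addressed by its dotted path.
def mutateSketch(df: DataFrame, f: Column => Column): DataFrame = {
  def rebuild(field: StructField, path: String): Column = {
    val rebuilt = field.dataType match {
      // Recurse into nested structs, keeping field names and order.
      case st: StructType =>
        F.struct(st.fields.map(child => rebuild(child, s"$path.${child.name}")): _*)
      // Leaf (non-struct) field: let the caller decide what to do with it.
      case _ =>
        f(F.col(path))
    }
    // Keep the original field name, whatever expression `f` returned.
    rebuilt.as(field.name)
  }
  df.select(df.schema.fields.map(fld => rebuild(fld, fld.name)): _*)
}

val nested = spark.read.json(Seq("""{"A": {"B": "b", "C": "c", "D": {"E": "e"}}}""").toDS())

// Same predicate style as in the answer, with the built-in upper instead of upperUDF.
val out = mutateSketch(nested, c => if (c.toString == "A.B") F.upper(c) else c)
out.printSchema()
```

One design consequence: `struct(...)` produces a non-nullable struct, so a row where `A` itself is `null` would come back as a struct of nulls. Wrapping each rebuilt struct in `when(col(path).isNull, lit(null)).otherwise(...)` is one possible adjustment if that distinction matters.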
