Say I have this original dataframe:
var df1 = Seq(("John","Jameson","TRUE","TRUE","FALSE"),("Kevin","Smith","TRUE","FALSE","TRUE"))
.toDF("First Name","Last Name","Married","Employed","Children")
and I want to convert it so that it fits a template with one row per person per criterion (columns "First Name", "Last Name", "Value", "Criteria").
I want to iterate over the columns "Married", "Employed", and "Children" using "when" conditions and populate that template.
Any help would truly be appreciated!
Have a great day.
You could pair up each of the selected column values/names into a Struct, group them into an Array, and flatten them via explode, as shown below:
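import org.apache.spark.sql.functions.{array, col, explode, lit, struct}
import spark.implicits._  // already in scope in spark-shell; needed for toDF and $"..."

val df = Seq(
  ("John", "Jameson", "TRUE", "TRUE", "FALSE"),
  ("Kevin", "Smith", "TRUE", "FALSE", "TRUE")
).toDF("First Name", "Last Name", "Married", "Employed", "Children")

// Every column that is not a name column gets unpivoted
val cols = df.columns.filterNot(_.endsWith("Name"))
// cols: Array[String] = Array(Married, Employed, Children)

df.
  withColumn("Temp", explode(array(cols.map(
    // one (Value, Criteria) struct per selected column
    c => struct(col(c).as("Value"), lit(c).as("Criteria"))): _*))
  ).
  select($"First Name", $"Last Name", $"Temp.*").
  show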
// +----------+---------+-----+--------+
// |First Name|Last Name|Value|Criteria|
// +----------+---------+-----+--------+
// |      John|  Jameson| TRUE| Married|
// |      John|  Jameson| TRUE|Employed|
// |      John|  Jameson|FALSE|Children|
// |     Kevin|    Smith| TRUE| Married|
// |     Kevin|    Smith|FALSE|Employed|
// |     Kevin|    Smith| TRUE|Children|
// +----------+---------+-----+--------+
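For intuition, the Struct / Array / explode steps above correspond roughly to a flatMap over ordinary Scala collections. The sketch below is only an analogy and does not use Spark: the `UnpivotSketch` object and its `unpivot` method are hypothetical names, and each row is modeled as a `Map[String, String]` from column name to value.

```scala
// Plain-Scala analogy of struct + array + explode (no Spark involved).
// UnpivotSketch and unpivot are illustrative names, not a library API.
object UnpivotSketch {
  type Row = Map[String, String]

  def unpivot(rows: Seq[Row], idCols: Seq[String], valueCols: Seq[String]): Seq[Row] =
    rows.flatMap { row =>                       // flatMap plays the role of explode
      val ids = idCols.map(c => c -> row(c))    // identifier columns are copied onto every output row
      valueCols.map { c =>                      // one (Value, Criteria) pair per column, like struct inside array
        (ids ++ Seq("Value" -> row(c), "Criteria" -> c)).toMap
      }
    }
}
```

Each input row yields as many output rows as there are value columns, which is why the two-row example above produces six rows.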
Another solution uses the stack() function:
val df = Seq(
("John", "Jameson", "TRUE", "TRUE", "FALSE"),
("Kevin", "Smith", "TRUE", "FALSE", "TRUE")
).toDF("First Name", "Last Name", "Married", "Employed", "Children")
df.show(false)
df.createOrReplaceTempView("df")
+----------+---------+-------+--------+--------+
|First Name|Last Name|Married|Employed|Children|
+----------+---------+-------+--------+--------+
|John      |Jameson  |TRUE   |TRUE    |FALSE   |
|Kevin     |Smith    |TRUE   |FALSE   |TRUE    |
+----------+---------+-------+--------+--------+
spark.sql("""
select `First Name`, `Last Name`, stack(3,Married,"Married",Employed,"Employed",Children,"Children") (Value,Criteria) from df
""").show(false)
+----------+---------+-----+--------+
|First Name|Last Name|Value|Criteria|
+----------+---------+-----+--------+
|John      |Jameson  |TRUE |Married |
|John      |Jameson  |TRUE |Employed|
|John      |Jameson  |FALSE|Children|
|Kevin     |Smith    |TRUE |Married |
|Kevin     |Smith    |FALSE|Employed|
|Kevin     |Smith    |TRUE |Children|
+----------+---------+-----+--------+
If you would rather stay in the DataFrame API, use selectExpr:
df.selectExpr( "`First Name`", "`Last Name`", """ stack(3,Married,"Married",Employed,"Employed",Children,"Children") (value,criteria) """ ).show(false)
+----------+---------+-----+--------+
|First Name|Last Name|value|criteria|
+----------+---------+-----+--------+
|John      |Jameson  |TRUE |Married |
|John      |Jameson  |TRUE |Employed|
|John      |Jameson  |FALSE|Children|
|Kevin     |Smith    |TRUE |Married |
|Kevin     |Smith    |FALSE|Employed|
|Kevin     |Smith    |TRUE |Children|
+----------+---------+-----+--------+
Or:
df.select( $"First Name", $"Last Name", expr(""" stack(3,Married,"Married",Employed,"Employed",Children,"Children") (value,criteria) """) ).show(false)
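The stack() calls above hard-code the three column names. If the list of columns changes, the expression string could be generated instead. The following is a minimal sketch: `StackExpr.build` is a hypothetical helper (not part of Spark), and it assumes the column names contain no backticks or single quotes.

```scala
// Hypothetical helper: builds a stack() expression that unpivots the
// given columns into (value, name) pairs. Pure string building, no Spark.
object StackExpr {
  def build(cols: Seq[String],
            valueName: String = "Value",
            nameName: String = "Criteria"): String = {
    // Each column contributes two arguments: its value (backtick-quoted
    // identifier) followed by its name as a single-quoted SQL string literal.
    val args = cols.map(c => "`" + c + "`, '" + c + "'").mkString(", ")
    "stack(" + cols.length + ", " + args + ") as (" + valueName + ", " + nameName + ")"
  }
}
```

With the sample data this would be used as df.selectExpr("`First Name`", "`Last Name`", StackExpr.build(Seq("Married", "Employed", "Children"))), which should give the same six rows as the hand-written expression.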