
How would I do this dataframe transformation in Spark Scala?

Suppose I have this original dataframe:

  import spark.implicits._ // for toDF (already in scope in spark-shell)
  val df1 = Seq(("John","Jameson","TRUE","TRUE","FALSE"),("Kevin","Smith","TRUE","FALSE","TRUE"))
    .toDF("First Name","Last Name","Married","Employed","Children")


I want to transform it to fit this template:

[image: template]

The output dataframe would look like this:

[image: expected output]

I want to iterate through the "Married", "Employed" and "Children" columns using "when" conditions, and then populate the template as in the screenshot above.

Any help would be greatly appreciated!

Have a nice day.

You can pair each selected column's value and name into a struct, collect the structs into an array, and flatten them with explode, as shown below:

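import org.apache.spark.sql.functions.{array, col, explode, lit, struct}
import spark.implicits._ // for toDF and $"..." (already in scope in spark-shell)

val df = Seq(
  ("John", "Jameson", "TRUE", "TRUE", "FALSE"),
  ("Kevin", "Smith", "TRUE", "FALSE", "TRUE")
).toDF("First Name", "Last Name", "Married", "Employed", "Children")

// every column except the name columns becomes a (Value, Criteria) pair;
// they must share one type, since they are collected into a single array
val cols = df.columns.filterNot(_.endsWith("Name"))
// cols: Array[String] = Array(Married, Employed, Children)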

df.
  withColumn("Temp", explode(array(cols.map(
    c => struct(col(c).as("Value"), lit(c).as("Criteria"))): _*))
  ).
  select($"First Name" :: $"Last Name" :: $"Temp.*" :: Nil: _*).
  show
// +----------+---------+-----+--------+
// |First Name|Last Name|Value|Criteria|
// +----------+---------+-----+--------+
// |      John|  Jameson| TRUE| Married|
// |      John|  Jameson| TRUE|Employed|
// |      John|  Jameson|FALSE|Children|
// |     Kevin|    Smith| TRUE| Married|
// |     Kevin|    Smith|FALSE|Employed|
// |     Kevin|    Smith| TRUE|Children|
// +----------+---------+-----+--------+
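Conceptually, exploding an array of (Value, Criteria) structs turns each input row into one output row per array element, much like flatMap on an ordinary Scala collection. The sketch below models that on plain case classes, with no Spark involved, so the row multiplication can be checked on its own. WideRow, LongRow and unpivot are hypothetical names used only for illustration.

```scala
// Hypothetical plain-Scala model of the struct/array/explode reshaping (no Spark involved).
object ExplodeModel {
  // One row of the original (wide) dataframe
  final case class WideRow(firstName: String, lastName: String, flags: Map[String, String])
  // One row of the reshaped (long) dataframe
  final case class LongRow(firstName: String, lastName: String, value: String, criteria: String)

  // For each wide row, emit one long row per criteria column, in column order,
  // mirroring explode(array(struct(col(c).as("Value"), lit(c).as("Criteria")), ...))
  def unpivot(rows: Seq[WideRow], criteriaCols: Seq[String]): Seq[LongRow] =
    rows.flatMap { r =>
      criteriaCols.map(c => LongRow(r.firstName, r.lastName, r.flags(c), c))
    }

  val cols: Seq[String] = Seq("Married", "Employed", "Children")
  val rows: Seq[WideRow] = Seq(
    WideRow("John", "Jameson", Map("Married" -> "TRUE", "Employed" -> "TRUE", "Children" -> "FALSE")),
    WideRow("Kevin", "Smith", Map("Married" -> "TRUE", "Employed" -> "FALSE", "Children" -> "TRUE"))
  )

  def main(args: Array[String]): Unit =
    unpivot(rows, cols).foreach(println)
}
```

Each of the 2 input rows yields 3 output rows, giving the same 6 rows, in the same order, as the show output above.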

Another solution, using the stack() function:

import spark.implicits._ // for toDF (already in scope in spark-shell)

val df = Seq(
  ("John", "Jameson", "TRUE", "TRUE", "FALSE"),
  ("Kevin", "Smith", "TRUE", "FALSE", "TRUE")
).toDF("First Name", "Last Name", "Married", "Employed", "Children")
df.show(false)
df.createOrReplaceTempView("df")

+----------+---------+-------+--------+--------+
|First Name|Last Name|Married|Employed|Children|
+----------+---------+-------+--------+--------+
|John      |Jameson  |TRUE   |TRUE    |FALSE   |
|Kevin     |Smith    |TRUE   |FALSE   |TRUE    |
+----------+---------+-------+--------+--------+

spark.sql("""
select `First Name`, `Last Name`, stack(3,Married,"Married",Employed,"Employed",Children,"Children") (Value,Criteria) from df
""").show(false)

+----------+---------+-----+--------+
|First Name|Last Name|Value|Criteria|
+----------+---------+-----+--------+
|John      |Jameson  |TRUE |Married |
|John      |Jameson  |TRUE |Employed|
|John      |Jameson  |FALSE|Children|
|Kevin     |Smith    |TRUE |Married |
|Kevin     |Smith    |FALSE|Employed|
|Kevin     |Smith    |TRUE |Children|
+----------+---------+-----+--------+
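stack(n, expr1, ..., exprk) splits its k arguments into n rows, filling each row left to right. That is why the value/name pairs above come out as (Value, Criteria) rows. The plain-Scala sketch below models only the evenly divisible case used here. StackModel is a hypothetical name, and Spark itself pads a short final row with nulls when k is not a multiple of n.

```scala
// Hypothetical plain-Scala model of Spark SQL's stack(n, expr1, ..., exprk).
object StackModel {
  // Splits args into n rows of args.length / n columns each, preserving order.
  // Only the evenly divisible case is handled in this sketch.
  def stack[A](n: Int, args: Seq[A]): Seq[Seq[A]] = {
    require(n > 0 && args.nonEmpty && args.length % n == 0, "args must split evenly into n rows")
    args.grouped(args.length / n).toList
  }

  def main(args: Array[String]): Unit = {
    // John's row: stack(3, Married, "Married", Employed, "Employed", Children, "Children")
    val johnArgs = Seq("TRUE", "Married", "TRUE", "Employed", "FALSE", "Children")
    stack(3, johnArgs).foreach(r => println(r.mkString(" | ")))
  }
}
```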

If you'd rather use DataFrame API calls instead of a SQL query:

df.selectExpr( "`First Name`", "`Last Name`",  """ stack(3,Married,"Married",Employed,"Employed",Children,"Children") (value,criteria) """ ).show(false)

+----------+---------+-----+--------+
|First Name|Last Name|value|criteria|
+----------+---------+-----+--------+
|John      |Jameson  |TRUE |Married |
|John      |Jameson  |TRUE |Employed|
|John      |Jameson  |FALSE|Children|
|Kevin     |Smith    |TRUE |Married |
|Kevin     |Smith    |FALSE|Employed|
|Kevin     |Smith    |TRUE |Children|
+----------+---------+-----+--------+

Or:

import org.apache.spark.sql.functions.expr

df.select( $"First Name", $"Last Name", expr(""" stack(3,Married,"Married",Employed,"Employed",Children,"Children") (value,criteria) """) ).show(false)
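Typing every column name twice in the stack(...) call becomes error-prone as the number of columns grows. A small helper can generate the expression string from a column list, such as the cols array from the first answer. StackExpr and build are hypothetical names, not part of Spark. The quoting assumes Spark SQL's default backtick-quoted identifiers and backslash-escaped string literals.

```scala
// Hypothetical helper (not part of Spark): builds the stack(...) expression string
// from column names, instead of writing each name twice by hand.
object StackExpr {
  // Backtick-quote a column reference so names with spaces remain valid SQL.
  private def quoteIdent(name: String): String = "`" + name.replace("`", "``") + "`"

  // Single-quote a string literal, escaping backslashes and embedded single quotes.
  private def quoteLiteral(s: String): String =
    "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'"

  def build(cols: Seq[String], valueName: String = "Value", criteriaName: String = "Criteria"): String = {
    require(cols.nonEmpty, "need at least one column to stack")
    // Each column contributes its value followed by its name as a literal
    val pairs = cols.map(c => s"${quoteIdent(c)}, ${quoteLiteral(c)}").mkString(", ")
    s"stack(${cols.length}, $pairs) as (${quoteIdent(valueName)}, ${quoteIdent(criteriaName)})"
  }

  def main(args: Array[String]): Unit =
    println(build(Seq("Married", "Employed", "Children")))
}
```

With df and cols as defined in the answers above, usage would be df.selectExpr("`First Name`", "`Last Name`", StackExpr.build(cols.toSeq)).show(false).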
