
Global condition on multiple withColumn + when instructions on a Spark dataframe

Consider this df:

+----+------+
|cond|chaine|
+----+------+
|   0|   TF1|
|   1|   TF1|
|   1|   TNT|
+----+------+

I want to apply these withColumn instructions only to rows where cond == 1:

df.withColumn("New", when($"chaine" === "TF1", "YES!"))
  .withColumn("New2", when($"chaine" === "TF1", "YES2!"))
  .withColumn("New3", when($"chaine" === "TF1", "YES3!"))
  .withColumn("New4", when($"chaine" === "TF1", "YES4!"))

I can't use .filter, because I still want the rows with cond =!= 1 in the output.

I can do this by adding my condition at every place in the code:

df.withColumn("New", when($"chaine" === "TF1" && $"cond" === 1, "YES!"))
  .withColumn("New2", when($"chaine" === "TF1" && $"cond" === 1, "YES2!"))
  .withColumn("New3", when($"chaine" === "TF1" && $"cond" === 1, "YES3!"))
  .withColumn("New4", when($"chaine" === "TF1" && $"cond" === 1, "YES4!"))

But the problem is that I have many new columns, and I would like a better solution (something like a global setting?).

Thank you.

Some simple syntactic sugar:

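import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.when
// $"..." needs `import spark.implicits._` (already in scope in spark-shell)

// Adds the shared guard `cond === n` to any condition
def whenCondIs(n: Int)(condition: Column, value: Any): Column =
  when(condition && $"cond" === n, value)

def whenOne(condition: Column, value: Any): Column =
  whenCondIs(1)(condition, value)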

Then:

df.withColumn("New", whenOne($"chaine" === "TF1", "YES!"))
  .withColumn("New2", whenOne($"chaine" === "TF1", "YES2!"))

You can build a list that maps each new column to its condition and value, and use foldLeft to add them to the dataframe. Something like this:

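import org.apache.spark.sql.functions.{col, expr, lit, when}

// (new column name, SQL condition, value)
val newCols = Seq(
  ("New", "chaine='TF1'", "YES!"),
  ("New2", "chaine='TF1'", "YES2!"),
  ("New3", "chaine='TF1'", "YES3!"),
  ("New4", "chaine='TF1'", "YES4!")
)

// The shared guard cond === 1 is written once, inside the fold
val df1 = newCols.foldLeft(df)((acc, x) =>
  acc.withColumn(x._1, when(expr(x._2) && col("cond")===1, lit(x._3)))
)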
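
If the list of new columns gets long, a variant worth considering is to build all the guarded columns first and add them with a single select, since the Spark documentation for withColumn warns that calling it many times in a loop can produce large plans. The following is only a sketch under that assumption; `SelectGuardedColumns` and `addGuardedColumns` are hypothetical names chosen for this example, and the list is shortened to two entries.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, expr, lit, when}

object SelectGuardedColumns {
  // (new column name, SQL condition, value)
  val newCols: Seq[(String, String, String)] = Seq(
    ("New", "chaine = 'TF1'", "YES!"),
    ("New2", "chaine = 'TF1'", "YES2!")
  )

  // Hypothetical helper: one projection adds every guarded column at once.
  def addGuardedColumns(df: DataFrame): DataFrame = {
    val guard = col("cond") === 1 // the shared condition, written once
    val added = newCols.map { case (name, condition, value) =>
      when(guard && expr(condition), lit(value)).as(name)
    }
    // Keep all existing columns and append the new ones.
    df.select(col("*") +: added: _*)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, only for trying the sketch out.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("select-guarded-columns")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((0, "TF1"), (1, "TF1"), (1, "TNT")).toDF("cond", "chaine")
    addGuardedColumns(df).show()
    spark.stop()
  }
}
```

The result has the same shape as the foldLeft version: all three rows are kept, and the new columns are null wherever the guard or the per-column condition does not hold.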

