PySpark: multiple conditions in when clause
I would like to modify the cell values of a dataframe column (Age) where it is currently blank, and I would only do it if another column (Survived) has the value 0 for the corresponding row. If Survived is 1 but Age is blank, then I will keep it as null.
I tried to use the && operator but it didn't work. Here is my code:
tdata.withColumn("Age", when((tdata.Age == "" && tdata.Survived == "0"), mean_age_0).otherwise(tdata.Age)).show()
Any suggestions how to handle that? Thanks.
Error message:
File "<ipython-input-33-3e691784411c>", line 1
    tdata.withColumn("Age", when((tdata.Age == "" && tdata.Survived == "0"), mean_age_0).otherwise(tdata.Age)).show()
                                                  ^
SyntaxError: invalid syntax
You get a SyntaxError exception because Python has no && operator. It has and and &, where the latter is the correct choice to create boolean expressions on Column (| for a logical disjunction and ~ for logical negation).
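The reason & works on Columns while the and keyword does not can be illustrated without Spark at all: classes can overload the bitwise operators (&, |, ~) to build expression trees, but and/or cannot be overloaded, since Python evaluates the operands' truthiness instead. The toy class below is purely illustrative (the name Expr is hypothetical, not part of PySpark), but it mimics what Column does under the hood:

```python
# Illustrative sketch (plain Python, no Spark needed): PySpark's Column
# overloads the bitwise operators to build SQL expression trees. The
# `Expr` class here is a hypothetical stand-in for Column.
class Expr:
    def __init__(self, text):
        self.text = text

    def __and__(self, other):          # invoked by `&`
        return Expr(f"({self.text} AND {other.text})")

    def __or__(self, other):           # invoked by `|`
        return Expr(f"({self.text} OR {other.text})")

    def __invert__(self):              # invoked by `~`
        return Expr(f"(NOT {self.text})")


a = Expr("Age = ''")
b = Expr("Survived = 0")

print((a & b).text)  # (Age = '' AND Survived = 0)
# `a and b` would NOT build an expression tree: Python calls bool() on
# the operands, which is exactly why PySpark rejects `and` on Columns.
```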
The condition you created is also invalid because it doesn't consider operator precedence: & in Python has a higher precedence than ==, so the expression has to be parenthesized.
(col("Age") == "") & (col("Survived") == "0")
## Column<b'((Age = ) AND (Survived = 0))'>
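The precedence trap can be demonstrated with plain Python integers, no Spark required: without parentheses, the interpreter parses the expression as a chained comparison around &, not as a conjunction of two comparisons.

```python
# `&` binds tighter than `==`, so an unparenthesized expression is
# parsed as a chained comparison, not as the intended conjunction.
unparenthesized = 1 == 1 & 2 == 2      # parsed as 1 == (1 & 2) == 2
parenthesized = (1 == 1) & (2 == 2)    # the intended conjunction

print(unparenthesized)  # False, since 1 & 2 == 0 and 1 == 0 fails
print(parenthesized)    # True
```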
On a side note, the when function is equivalent to a CASE expression, not a WHEN clause. Still, the same rules apply. Conjunction:
df.where((col("foo") > 0) & (col("bar") < 0))
Disjunction:
df.where((col("foo") > 0) | (col("bar") < 0))
You can of course define the conditions separately to avoid brackets:
cond1 = col("Age") == ""
cond2 = col("Survived") == "0"
cond1 & cond2
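Tying this back to the original question, the fill logic itself (substitute a precomputed mean when Age is blank and Survived is "0", otherwise keep Age) can be sketched in plain Python over a list of rows. This is only an illustration of the condition's semantics, not PySpark code, and the value of mean_age_0 below is a stand-in assumed to have been computed elsewhere:

```python
# Plain-Python sketch of the questioner's fill logic (not PySpark).
mean_age_0 = 30.0  # assumed precomputed mean; stand-in value

rows = [
    {"Age": "", "Survived": "0"},    # blank + did not survive -> filled
    {"Age": "", "Survived": "1"},    # blank + survived -> kept blank
    {"Age": "22", "Survived": "0"},  # already populated -> kept
]

def fill_age(row):
    # Mirrors when((Age == "") & (Survived == "0"), mean_age_0).otherwise(Age)
    if row["Age"] == "" and row["Survived"] == "0":
        return mean_age_0
    return row["Age"]

print([fill_age(r) for r in rows])  # [30.0, '', '22']
```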
This should work at least in PySpark 2.4:
tdata = tdata.withColumn("Age", when((tdata.Age == "") & (tdata.Survived == "0"), "NewValue").otherwise(tdata.Age))
In PySpark, multiple conditions inside when can be built using & (for and) and | (for or).
Note: In PySpark it is important to enclose every expression that combines to form the condition within parentheses ().
%pyspark
from pyspark.sql.functions import when, col

dataDF = spark.createDataFrame([(66, "a", "4"),
                                (67, "a", "0"),
                                (70, "b", "4"),
                                (71, "d", "4")],
                               ("id", "code", "amt"))
dataDF.withColumn("new_column",
                  when((col("code") == "a") | (col("code") == "d"), "A")
                  .when((col("code") == "b") & (col("amt") == "4"), "B")
                  .otherwise("A1")).show()
In Spark Scala code, ( && ) or ( || ) conditions can be used within the when function:
//scala
val dataDF = Seq(
  (66, "a", "4"), (67, "a", "0"), (70, "b", "4"), (71, "d", "4")
).toDF("id", "code", "amt")
dataDF.withColumn("new_column",
when(col("code") === "a" || col("code") === "d", "A")
.when(col("code") === "b" && col("amt") === "4", "B")
.otherwise("A1")).show()
Output:
+---+----+---+----------+
| id|code|amt|new_column|
+---+----+---+----------+
| 66| a| 4| A|
| 67| a| 0| A|
| 70| b| 4| B|
| 71| d| 4| A|
+---+----+---+----------+
This code snippet is copied from sparkbyexamples.com.
It should be:
when(((tdata.Age == "") & (tdata.Survived == "0")), mean_age_0)