Add new column to dataframe depending on intersection of existing columns with PySpark
I have a dataframe which consists of two columns:
+--------------+------------+
| A| B|
+--------------+------------+
| [b, c]| [a, b, c]|
| [a]| [c, d]|
| [a, c]| [b, c, e]|
| [b, c]| [a, b]|
| [a]| [a, d, e]|
| [a, c]| [b]|
+--------------+------------+
Schema:
|-- A: string (nullable = true)
|-- B: array (nullable = true)
| |-- element: string (containsNull = true)
I want to add a new column which must be 0 if the intersection of A and B is an empty list ([]) and 1 otherwise. I tried the code below, but it seems incorrect:
df.withColumn('Check', when (list((set(col('A'))&set(col('B')))) !=[] , 0).otherwise(1)).show()
Thank you for your help.
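(For reference, the per-row check the question describes can be sketched as a plain Python function. The attempt above fails because col('A') is a Column expression, not the row's list value, so Python's set() and & cannot be applied to it; the function name here is illustrative.)

```python
# The intended per-row logic, as plain Python (the kind a UDF would run):
# 0 if the intersection of the two lists is empty, 1 otherwise.
def check(a, b):
    return 1 if set(a) & set(b) else 0

print(check(["b", "c"], ["a", "b", "c"]))  # 1 (shared elements: b, c)
print(check(["a"], ["c", "d"]))            # 0 (no shared elements)
```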
I want to add a new column which must be 0 if the intersection of A and B is an empty list ([]) and 1 otherwise.
You can directly use array_intersect with size and when + otherwise:
import pyspark.sql.functions as F
df.withColumn("Check", (F.size(F.array_intersect("A", "B")) != 0).cast("integer")).show()
or:
df.withColumn("Check", F.when(F.size(F.array_intersect("A", "B")) == 0, 0).otherwise(1)).show()
+------+---------+-----+
| A| B|Check|
+------+---------+-----+
|[b, c]|[a, b, c]| 1|
| [a]| [c, d]| 0|
|[a, c]|[b, c, e]| 1|
|[b, c]| [a, b]| 1|
| [a]|[a, d, e]| 1|
|[a, c]| [b]| 0|
+------+---------+-----+
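The Check column above can be verified with a small plain-Python sketch of the same logic, no Spark session needed: size(array_intersect(A, B)) != 0 corresponds to a non-empty set intersection per row.

```python
# The sample rows from the question, as (A, B) pairs of lists.
rows = [
    (["b", "c"], ["a", "b", "c"]),
    (["a"], ["c", "d"]),
    (["a", "c"], ["b", "c", "e"]),
    (["b", "c"], ["a", "b"]),
    (["a"], ["a", "d", "e"]),
    (["a", "c"], ["b"]),
]

# 1 when the set intersection is non-empty, else 0 -- mirroring
# (size(array_intersect(A, B)) != 0).cast("integer").
check_column = [1 if set(a) & set(b) else 0 for a, b in rows]
print(check_column)  # [1, 0, 1, 1, 1, 0]
```

This matches the Check column in the output table above.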