Add new column to dataframe depending on intersection of existing columns with PySpark
I have a dataframe which consists of two columns:
+--------------+------------+
| A| B|
+--------------+------------+
| [b, c]| [a, b, c]|
| [a]| [c, d]|
| [a, c]| [b, c, e]|
| [b, c]| [a, b]|
| [a]| [a, d, e]|
| [a, c]| [b]|
+--------------+------------+
Schema:
|-- A: string (nullable = true)
|-- B: array (nullable = true)
| |-- element: string (containsNull = true)
I want to add a new column which must be 0 if the intersection of A and B is an empty list ([]) and 1 otherwise. I tried the code below, but it seems incorrect:
df.withColumn('Check', when (list((set(col('A'))&set(col('B')))) !=[] , 0).otherwise(1)).show()
Thank you for your help.
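(For reference, the per-row check the question describes can be sketched as a plain Python function. The attempt above fails because col('A') is a Column expression, not the row's list value, so Python's set() and & cannot be applied to it; the function name here is illustrative.)

```python
# The intended per-row logic, as plain Python (the kind a UDF would run):
# 0 if the intersection of the two lists is empty, 1 otherwise.
def check(a, b):
    return 1 if set(a) & set(b) else 0

print(check(["b", "c"], ["a", "b", "c"]))  # 1 (shared elements: b, c)
print(check(["a"], ["c", "d"]))            # 0 (no shared elements)
```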
I want to add a new column which must be 0 if the intersection of A and B is an empty list ([]) and 1 otherwise.
You can directly use array_intersect with size and when + otherwise:
import pyspark.sql.functions as F
df.withColumn("Check", (F.size(F.array_intersect("A", "B")) != 0).cast("integer")).show()
or:
df.withColumn("Check", F.when(F.size(F.array_intersect("A", "B")) == 0, 0).otherwise(1)).show()
+------+---------+-----+
| A| B|Check|
+------+---------+-----+
|[b, c]|[a, b, c]| 1|
| [a]| [c, d]| 0|
|[a, c]|[b, c, e]| 1|
|[b, c]| [a, b]| 1|
| [a]|[a, d, e]| 1|
|[a, c]| [b]| 0|
+------+---------+-----+
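The Check column above can be verified with a small plain-Python sketch of the same logic, no Spark session needed: size(array_intersect(A, B)) != 0 corresponds to a non-empty set intersection per row.

```python
# The sample rows from the question, as (A, B) pairs of lists.
rows = [
    (["b", "c"], ["a", "b", "c"]),
    (["a"], ["c", "d"]),
    (["a", "c"], ["b", "c", "e"]),
    (["b", "c"], ["a", "b"]),
    (["a"], ["a", "d", "e"]),
    (["a", "c"], ["b"]),
]

# 1 when the set intersection is non-empty, else 0 -- mirroring
# (size(array_intersect(A, B)) != 0).cast("integer").
check_column = [1 if set(a) & set(b) else 0 for a, b in rows]
print(check_column)  # [1, 0, 1, 1, 1, 0]
```

This matches the Check column in the output table above.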