pyspark - isin between columns

I am trying to use the isin function to check whether the value in one PySpark DataFrame column appears in another column on the same row.

+---+-------------+----+------------+--------+
| ID|         date| loc|   main_list|  GOAL_f|
+---+-------------+----+------------+--------+
|ID1|   2017-07-01|  L1|        [L1]|       1|
|ID1|   2017-07-02|  L1|        [L1]|       1|
|ID1|   2017-07-03|  L2|        [L1]|       0|
|ID1|   2017-07-04|  L2|     [L1,L2]|       1|
|ID1|   2017-07-05|  L1|     [L1,L2]|       1|
|ID1|   2017-07-06|  L3|     [L1,L2]|       0|
|ID1|   2017-07-07|  L3|  [L1,L2,L3]|       1|
+---+-------------+----+------------+--------+

But I am getting errors when trying to collect main_list for the comparison. Here is what I tried unsuccessfully:

df.withColumn('GOAL_f', F.col('loc').isin(F.col('main_list').collect()))

Consolidated code:

w = Window.partitionBy('ID').orderBy('date').rowsBetween(Window.unboundedPreceding, -1)
df = (
    df.withColumn('main_list', F.collect_set('loc').over(w))
      # this is the call that fails
      .withColumn('GOAL_f', F.col('loc').isin(F.col('main_list').collect()))
)

You could reverse the query: instead of asking whether the value is in something, ask whether that something contains the value.

Example:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F


if __name__ == "__main__":
    spark = SparkSession.builder.getOrCreate()
    data = [
        {"loc": "L1", "main_list": ["L1", "L2"]},
        {"loc": "L1", "main_list": ["L2"]},
    ]
    df = spark.createDataFrame(data=data)
    df = df.withColumn(
        "GOAL_f",
        # 1 when the array in main_list contains this row's loc, else 0
        F.when(F.array_contains(F.col("main_list"), F.col("loc")), 1).otherwise(0),
    )
    df.show(truncate=False)

Result:

+---+---------+------+
|loc|main_list|GOAL_f|
+---+---------+------+
|L1 |[L1, L2] |1     |
|L1 |[L2]     |0     |
+---+---------+------+
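
If you want to sanity-check what to expect once array_contains is combined with the window from the question, here is a rough plain-Python model of the same row-by-row logic. It is only a sketch and does not use Spark. The helper name flag_seen_before is made up for illustration, and it assumes main_list should hold the loc values from strictly earlier rows of the same ID, which is how I read rowsBetween(Window.unboundedPreceding, -1).

```python
# Plain-Python model of: collect_set('loc') over the preceding rows of each ID,
# followed by array_contains(main_list, loc). Not Spark code.
# flag_seen_before is a hypothetical helper name used only for this sketch.

def flag_seen_before(rows):
    seen_by_id = {}  # ID -> set of loc values seen on earlier rows
    result = []
    for row in sorted(rows, key=lambda r: (r["ID"], r["date"])):
        previous = seen_by_id.setdefault(row["ID"], set())
        result.append(
            {
                **row,
                "main_list": sorted(previous),             # preceding rows only
                "GOAL_f": int(row["loc"] in previous),     # the array_contains check
            }
        )
        previous.add(row["loc"])  # becomes visible to the next row
    return result


# Same ID/date/loc values as the table in the question.
locs = ["L1", "L1", "L2", "L2", "L1", "L3", "L3"]
rows = [
    {"ID": "ID1", "date": f"2017-07-0{day}", "loc": loc}
    for day, loc in enumerate(locs, start=1)
]

flagged = flag_seen_before(rows)
print([r["GOAL_f"] for r in flagged])  # → [0, 1, 0, 1, 1, 0, 1]
```

Note that under this reading the first row of each ID has nothing before it, so its main_list is empty and its flag is 0. The table in the question shows 1 there, so adjust the window if you need that behaviour.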
