
How to apply F.when condition separately for unique subsets of the data

I want to apply a condition over subsets of my data. In the example, I want to apply F.when to the "A" and "B" groups of col1 separately, and return a DataFrame that contains both the "A" and "B" rows with the condition applied.

I have tried using a group by to do this (a sketch of that attempt follows the setup code below), but I'm not interested in aggregating the data; I want to return the same number of rows before and after the condition is applied.

import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local").appName("test").getOrCreate()

df = spark.createDataFrame(pd.DataFrame({"col1": ["A", "A", "A", "B", "B"], "score": [1, 2, 3, 1, 2]}))

condition = F.when(F.col("score") > 2, 1).otherwise(0)
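
For reference, a grouped aggregation collapses each group to a single row, which is why it does not fit here; a minimal sketch of such an attempt (illustrative only, using F.max as a stand-in aggregate):

# A groupBy aggregation returns one row per group, not one row per input row
df.groupBy("col1").agg(F.max("score").alias("max_score")).show()
# Out (row order may vary):
# +----+---------+
# |col1|max_score|
# +----+---------+
# |   A|        3|
# |   B|        2|
# +----+---------+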

Does anyone have any advice on how to solve this problem? Below is my expected output, but it is crucial that the condition is applied over "A" and "B" separately, as my actual use case is a bit different from the toy example supplied.

[expected output shown as an image in the original post]

Try with:

df.select(df.col1, df.score, condition.alias("send")).show()
# Out:
# +----+-----+----+
# |col1|score|send|
# +----+-----+----+
# |   A|    1|   0|
# |   A|    2|   0|
# |   A|    3|   1|
# |   B|    1|   0|
# |   B|    2|   0|
# +----+-----+----+

(see: pyspark.sql.Column.when)
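
The same flag can equivalently be appended with withColumn instead of select; a minimal sketch reusing the condition defined in the question:

# Append the 0/1 flag as a new column; output matches the select above
df.withColumn("send", condition).show()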

To apply multiple conditions depending on the row values, use:

df.withColumn("send", F.when((F.col("col1") == "A") & (F.col("score") > 2), 1)
                       .when((F.col("col1") == "B") & (F.col("score") > 1), 1)
                       .otherwise(0)
              ).show()
# Out:
# +----+-----+----+
# |col1|score|send|
# +----+-----+----+
# |   A|    1|   0|
# |   A|    2|   0|
# |   A|    3|   1|
# |   B|    1|   0|
# |   B|    2|   1|
# +----+-----+----+

(pyspark.sql.functions.when)
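
If the real use case has many groups and each group's condition differs only by a threshold, one option (an assumption about the use case, not part of the original answer) is to keep the per-group thresholds in a small lookup DataFrame and join them in instead of chaining when calls:

# Hypothetical per-group thresholds; replace with the real values
thresholds = spark.createDataFrame([("A", 2), ("B", 1)], ["col1", "threshold"])

(df.join(thresholds, on="col1", how="left")
   .withColumn("send", F.when(F.col("score") > F.col("threshold"), 1).otherwise(0))
   .drop("threshold")
   .show())
# Produces the same "send" column as the chained when above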
