Conditional filter threshold on pandas dataframe column based on another column value

Let's say I have a dataframe with two columns, and I would like to filter the values of the second column based on different thresholds that are determined by the values of the first column. These thresholds are defined in a dictionary whose keys are the first column's values and whose values are the thresholds. There is also a default threshold for rows whose first-column value is not one of the specified keys.

So for example:

thresholds_dict = {"A": 5, "B": 2, "C": 4, "default": 0}

sample_dataframe = 
| Column1 | Column2 |
|   A     | 3       |
|   A     | 6       |
|   B     | 4       |
|   B     | 1       |
|   C     | 2       |
|   D     | 0       |

# Get threshold from dict based on value of Column1 on ...
result_dataframe = sample_dataframe[sample_dataframe["Column2"] >= ...]

result_dataframe =
| Column1 | Column2 |
|   A     | 6       |
|   B     | 4       |
|   D     | 0       |

What would be the best way to achieve this? (I'm not sure what to write in the ... part.)

PySpark version.

Your dataframe:

from pyspark.sql import functions as F

sample_dataframe = spark.createDataFrame(
    [("A", 3),
     ("A", 6),
     ("B", 4),
     ("B", 1),
     ("C", 2),
     ("D", 0)],
    ["Column1", "Column2"]
)
thresholds_dict = {"A": 5, "B": 2, "C": 4, "default": 0}

Script:

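# Start from an always-false branch so each real condition can be chained with .when()
comparison = F.when(F.lit(False), None)
for k, v in thresholds_dict.items():
    comparison = comparison.when(F.col("Column1") == k, v)
# Any Column1 value that is not a dict key falls back to the default threshold
comparison = comparison.otherwise(thresholds_dict["default"])

# Keep only rows whose Column2 meets the threshold chosen for their Column1 value
result_dataframe = sample_dataframe.filter(F.col("Column2") >= comparison)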

result_dataframe.show()
# +-------+-------+
# |Column1|Column2|
# +-------+-------+
# |      A|      6|
# |      B|      4|
# |      D|      0|
# +-------+-------+
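
Since the question title asks about pandas, here is a minimal pandas sketch of the same idea. It is not part of the answer above. It assumes the thresholds can be looked up with `Series.map`, which yields NaN for values that are not dict keys, and that those gaps can then be filled with the "default" entry. The intermediate name `thresholds` is only illustrative.

```python
import pandas as pd

thresholds_dict = {"A": 5, "B": 2, "C": 4, "default": 0}

sample_dataframe = pd.DataFrame(
    {
        "Column1": ["A", "A", "B", "B", "C", "D"],
        "Column2": [3, 6, 4, 1, 2, 0],
    }
)

# Look up each row's threshold from its Column1 value.
# Values that are not keys of the dict (here "D") come back as NaN.
thresholds = sample_dataframe["Column1"].map(thresholds_dict)

# Replace the NaN gaps with the default threshold.
thresholds = thresholds.fillna(thresholds_dict["default"])

# Row-wise comparison: keep rows where Column2 meets its own threshold.
result_dataframe = sample_dataframe[sample_dataframe["Column2"] >= thresholds]

print(result_dataframe)
```

With the sample data this keeps the rows (A, 6), (B, 4) and (D, 0), matching the expected result in the question. One caveat: if Column1 could contain the literal string "default", that row would also receive the default threshold via the dict lookup.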
