Pandas dataframe: select rows with string conditions on multiple columns

Question

I have a dataframe like this:

import pandas as pd

df = pd.DataFrame([{'year':2017, 'text':'yes it is', 'label_one':'POSITIVE', 'label_two':'positive'},
{'year':2017, 'text':'it could be', 'label_one':'POSITIVE', 'label_two':'negative'},
{'year':2017, 'text':'it may be', 'label_one':'NEGATIVE', 'label_two':'positive'},
{'year':2018, 'text':'it has to be done', 'label_one':'POSITIVE', 'label_two':'positive'},
{'year':2018, 'text':'no', 'label_one':'NEGATIVE', 'label_two':'negative'},
{'year':2019, 'text':'you should be afraid of it', 'label_one':'POSITIVE', 'label_two':'negative'},
{'year':2019, 'text':'he is right', 'label_one':'POSITIVE', 'label_two':'positive'},
{'year':2020, 'text':'do not mind, I wil fix it', 'label_one':'NEGATIVE', 'label_two':'positive'},
{'year':2020, 'text':'that is a trap', 'label_one':'NEGATIVE', 'label_two':'negative'},
{'year':2021, 'text':'I am on my way', 'label_one':'POSITIVE', 'label_two':'positive'}])

How can I filter it so that only the rows remain where the label_one and label_two string values are both POSITIVE/positive or both NEGATIVE/negative?

I tried the following, but it does not work:

ptp = df.loc[(df['label_one'].str.startswith('P') and df['label_two'].str.startswith('p')) & (df['label_one'].str.startswith('N') and df['label_two'].str.startswith('n'))]

Answer 1

What about:

df[df['label_one'].str.lower() == df['label_two'].str.lower()]

This assumes that label_one and label_two only hold negative, positive, NEGATIVE, or POSITIVE.
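As a minimal sketch of the comparison above, the following uses a reduced four-row dataframe (one row per label combination, not the asker's full data) and adds an optional guard that checks the stated assumption. The names `one`, `two`, `allowed`, and `matching` are illustrative only.

```python
import pandas as pd

# Reduced example: one row for each combination of the two labels.
df = pd.DataFrame({
    "label_one": ["POSITIVE", "POSITIVE", "NEGATIVE", "NEGATIVE"],
    "label_two": ["positive", "negative", "positive", "negative"],
})

# Normalise case once, then compare the two columns element-wise.
one = df["label_one"].str.lower()
two = df["label_two"].str.lower()

# Optional guard for the assumption: every label is one of the two expected values.
allowed = {"positive", "negative"}
assert one.isin(allowed).all() and two.isin(allowed).all()

# Boolean mask: True where both labels agree, ignoring case.
matching = df[one == two]
print(matching)
```

If the columns could contain other values or missing data, the guard fails instead of silently returning rows that only happen to match.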

Answer 2

This works. It follows your pattern, where both labels start with P/p or both start with N/n:

ptp = df.loc[((df['label_one'].str.startswith('P')) &
              (df['label_two'].str.startswith('p'))) |          
             ((df['label_one'].str.startswith('N')) &        
              (df['label_two'].str.startswith('n')))]

This gives:

PTP
        year    text                label_one   label_two
    0   2017    yes it is           POSITIVE    positive
    3   2018    it has to be done   POSITIVE    positive
    4   2018    no                  NEGATIVE    negative
    6   2019    he is right         POSITIVE    positive
    8   2020    that is a trap      NEGATIVE    negative
    9   2021    I am on my way      POSITIVE    positive
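For context on why the attempt in the question fails, here is a small sketch on a reduced four-row dataframe (not the asker's full data). It illustrates two separate problems: Python's `and` cannot combine two Series, and joining the two groups with `&` asks for rows that are positive and negative at the same time. The names `pos`, `neg`, `and_error`, `both`, and `either` are illustrative only.

```python
import pandas as pd

# Reduced example: one row for each combination of the two labels.
df = pd.DataFrame({
    "label_one": ["POSITIVE", "POSITIVE", "NEGATIVE", "NEGATIVE"],
    "label_two": ["positive", "negative", "positive", "negative"],
})

# Element-wise masks, built with & rather than `and`.
pos = df["label_one"].str.startswith("P") & df["label_two"].str.startswith("p")
neg = df["label_one"].str.startswith("N") & df["label_two"].str.startswith("n")

# Problem 1: `and` needs a single truth value for a whole Series,
# which pandas refuses to provide, so a ValueError is raised.
try:
    df["label_one"].str.startswith("P") and df["label_two"].str.startswith("p")
    and_error = None
except ValueError as exc:
    and_error = exc

# Problem 2: no row can satisfy both groups at once, so & selects nothing.
both = df[pos & neg]

# Combining the two groups with | keeps rows that match either one.
either = df[pos | neg]
print(either)
```

The parentheses around each comparison in the answer matter as well, because `&` and `|` bind more tightly than comparison operators in Python.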
