Pandas dataframe select rows with multiple columns' string conditions
I have a dataframe like this:
import pandas as pd

df = pd.DataFrame([{'year':2017, 'text':'yes it is', 'label_one':'POSITIVE', 'label_two':'positive'},
                   {'year':2017, 'text':'it could be', 'label_one':'POSITIVE', 'label_two':'negative'},
                   {'year':2017, 'text':'it may be', 'label_one':'NEGATIVE', 'label_two':'positive'},
                   {'year':2018, 'text':'it has to be done', 'label_one':'POSITIVE', 'label_two':'positive'},
                   {'year':2018, 'text':'no', 'label_one':'NEGATIVE', 'label_two':'negative'},
                   {'year':2019, 'text':'you should be afraid of it', 'label_one':'POSITIVE', 'label_two':'negative'},
                   {'year':2019, 'text':'he is right', 'label_one':'POSITIVE', 'label_two':'positive'},
                   {'year':2020, 'text':'do not mind, I wil fix it', 'label_one':'NEGATIVE', 'label_two':'positive'},
                   {'year':2020, 'text':'that is a trap', 'label_one':'NEGATIVE', 'label_two':'negative'},
                   {'year':2021, 'text':'I am on my way', 'label_one':'POSITIVE', 'label_two':'positive'}])
How can I filter it so that I keep only the rows where the label_one and label_two values are both POSITIVE/positive or both NEGATIVE/negative?
I tried the following, but it doesn't work:
ptp = df.loc[(df['label_one'].str.startswith('P') and df['label_two'].str.startswith('p')) & (df['label_one'].str.startswith('N') and df['label_two'].str.startswith('n'))]
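The attempt above fails for two reasons. Python's `and` calls `bool()` on the first Series, and pandas refuses to reduce a multi-element Series to a single truth value, so it raises `ValueError`. Even with `&` in place of `and`, joining the two groups with `&` asks for rows whose labels start with both P and N at the same time, which matches nothing; the groups have to be joined with `|`. A minimal sketch showing both points on a small three-row frame (an illustrative subset made up for this example, not the full data):

```python
import pandas as pd

# Illustrative three-row subset, not the question's full dataframe
df = pd.DataFrame({
    'label_one': ['POSITIVE', 'POSITIVE', 'NEGATIVE'],
    'label_two': ['positive', 'negative', 'negative'],
})

pos = df['label_one'].str.startswith('P') & df['label_two'].str.startswith('p')
neg = df['label_one'].str.startswith('N') & df['label_two'].str.startswith('n')

# 1) `and` forces bool() on a Series, which pandas rejects as ambiguous
try:
    df['label_one'].str.startswith('P') and df['label_two'].str.startswith('p')
except ValueError as exc:
    print('ValueError:', exc)

# 2) Joining the groups with & needs P and N at once, so no row survives
print(len(df[pos & neg]))  # 0

# Joining them with | keeps rows that match either group
print(df[pos | neg].index.tolist())  # [0, 2]
```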
What about:
df[df['label_one'].str.lower() == df['label_two'].str.lower()]
This assumes label_one and label_two only ever hold negative, positive, NEGATIVE or POSITIVE.
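The comparison works because `str.lower()` maps both columns to the same case, so POSITIVE/positive and NEGATIVE/negative compare equal row by row. If the columns could ever hold some other label (the `NEUTRAL` row below is hypothetical, not part of the question), equal lowercase strings alone would keep that row too. A hedged sketch that adds an explicit `isin` check on the two expected labels:

```python
import pandas as pd

# Hypothetical data: the last row uses a label the question never mentions
df = pd.DataFrame({
    'label_one': ['POSITIVE', 'POSITIVE', 'NEGATIVE', 'NEUTRAL'],
    'label_two': ['positive', 'negative', 'negative', 'neutral'],
})

one = df['label_one'].str.lower()
two = df['label_two'].str.lower()

# Same label in both columns, ignoring case...
same = one == two
# ...and that label is one of the two expected values
allowed = one.isin(['positive', 'negative'])

result = df[same & allowed]
print(result.index.tolist())  # [0, 2]
```

Without the `allowed` mask, row 3 (NEUTRAL/neutral) would also be kept.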
This works, following your pattern where both labels start with P/p or both start with N/n:
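# Keep rows where both labels start with P/p, or both start with N/n.
# & and | combine the boolean Series element-wise (Python's and/or cannot).
ptp = df.loc[((df['label_one'].str.startswith('P')) &
              (df['label_two'].str.startswith('p'))) |
             ((df['label_one'].str.startswith('N')) &
              (df['label_two'].str.startswith('n')))]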
This gives the following ptp:
   year               text label_one label_two
0  2017          yes it is  POSITIVE  positive
3  2018  it has to be done  POSITIVE  positive
4  2018                 no  NEGATIVE  negative
6  2019        he is right  POSITIVE  positive
8  2020     that is a trap  NEGATIVE  negative
9  2021     I am on my way  POSITIVE  positive
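On the question's data, the lowercase-equality filter and the startswith filter keep exactly the same rows. A quick check that rebuilds the frame and compares the two results; note this only confirms agreement on this particular data, since a label such as PENDING/positive would pass the startswith filter but not the lowercase one:

```python
import pandas as pd

# The question's data
df = pd.DataFrame([
    {'year': 2017, 'text': 'yes it is', 'label_one': 'POSITIVE', 'label_two': 'positive'},
    {'year': 2017, 'text': 'it could be', 'label_one': 'POSITIVE', 'label_two': 'negative'},
    {'year': 2017, 'text': 'it may be', 'label_one': 'NEGATIVE', 'label_two': 'positive'},
    {'year': 2018, 'text': 'it has to be done', 'label_one': 'POSITIVE', 'label_two': 'positive'},
    {'year': 2018, 'text': 'no', 'label_one': 'NEGATIVE', 'label_two': 'negative'},
    {'year': 2019, 'text': 'you should be afraid of it', 'label_one': 'POSITIVE', 'label_two': 'negative'},
    {'year': 2019, 'text': 'he is right', 'label_one': 'POSITIVE', 'label_two': 'positive'},
    {'year': 2020, 'text': 'do not mind, I wil fix it', 'label_one': 'NEGATIVE', 'label_two': 'positive'},
    {'year': 2020, 'text': 'that is a trap', 'label_one': 'NEGATIVE', 'label_two': 'negative'},
    {'year': 2021, 'text': 'I am on my way', 'label_one': 'POSITIVE', 'label_two': 'positive'},
])

# Filter 1: the two labels are equal ignoring case
by_lower = df[df['label_one'].str.lower() == df['label_two'].str.lower()]

# Filter 2: both start with P/p, or both start with N/n
ptp = df.loc[(df['label_one'].str.startswith('P') & df['label_two'].str.startswith('p')) |
             (df['label_one'].str.startswith('N') & df['label_two'].str.startswith('n'))]

print(by_lower.index.tolist())  # [0, 3, 4, 6, 8, 9]
print(by_lower.equals(ptp))     # True
```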