Making new column in pandas DataFrame based on filter
Given this DataFrame:
import pandas
df = pandas.DataFrame({"a": [1,10,20,3,10], "b": [50,60,55,0,0], "c": [1,30,1,0,0]})
What is the best way to make a new column, "filter", that has the value "pass" if the values in columns a and b are both greater than or equal to a threshold x (20 in the example below), and the value "fail" otherwise?
It can be done by iterating through rows but it's inefficient and inelegant:
c = []
for x, v in df.iterrows():
    if v["a"] >= 20 and v["b"] >= 20:
        c.append("pass")
    else:
        c.append("fail")
df["filter"] = c
One way would be to create a column of boolean values like this:
>>> df['filter'] = (df['a'] >= 20) & (df['b'] >= 20)
>>> df
    a   b   c  filter
0   1  50   1   False
1  10  60  30   False
2  20  55   1    True
3   3   0   0   False
4  10   0   0   False
You can then change the boolean values to 'pass' or 'fail' using replace (assign the result back to df['filter'] if you want to keep it):
>>> df['filter'].astype(object).replace({False: 'fail', True: 'pass'})
0 fail
1 fail
2 pass
3 fail
4 fail
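As an alternative sketch (not part of the answer above), the labels can also be produced in one step with numpy.where, which picks 'pass' where the mask is True and 'fail' elsewhere. The variable name mask is chosen here purely for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 10, 20, 3, 10],
                   "b": [50, 60, 55, 0, 0],
                   "c": [1, 30, 1, 0, 0]})

# Boolean mask: True where both a and b are at least 20
mask = (df["a"] >= 20) & (df["b"] >= 20)

# Map True -> "pass" and False -> "fail" without storing the booleans first
df["filter"] = np.where(mask, "pass", "fail")
print(df["filter"].tolist())
```

This skips the intermediate boolean column, which may be convenient when only the labels are needed.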
You can extend this to more columns using all. For example, to find the rows in which every listed column has an entry greater than 0:
>>> cols = ['a', 'b', 'c'] # a list of columns to test
>>> df[cols] > 0
      a      b      c
0  True   True   True
1  True   True   True
2  True   True   True
3  True  False  False
4  True  False  False
Using all across axis 1 of this DataFrame creates the new column:
>>> (df[cols] > 0).all(axis=1)
0 True
1 True
2 True
3 False
4 False
dtype: bool
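Putting the pieces together, a minimal sketch of the generalized filter might look like the following. The helper name label_rows and its parameters are hypothetical, and Series.map is used here instead of replace to turn the booleans into labels.

```python
import pandas as pd

def label_rows(df, cols, threshold):
    """Return 'pass' where every column in cols is >= threshold, else 'fail'.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Row-wise test: True only if all selected columns meet the threshold
    passed = (df[cols] >= threshold).all(axis=1)
    # Translate the boolean result into the two labels
    return passed.map({True: "pass", False: "fail"})

df = pd.DataFrame({"a": [1, 10, 20, 3, 10],
                   "b": [50, 60, 55, 0, 0],
                   "c": [1, 30, 1, 0, 0]})

# Same condition as the original question: a and b both at least 20
df["filter"] = label_rows(df, ["a", "b"], 20)
print(df)
```

Passing a different list of columns or a different threshold reuses the same logic without rewriting the condition by hand.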