Create a new column in a Pandas DataFrame based on a filter

Question

Given this DataFrame:

import pandas

df = pandas.DataFrame({"a": [1,10,20,3,10], "b": [50,60,55,0,0], "c": [1,30,1,0,0]})

What is the best way to make a new column, "filter", that has the value "pass" if the values in columns a and b are both greater than x, and the value "fail" otherwise?

It can be done by iterating through the rows, but that is inefficient and inelegant:

labels = []

# Walk the rows one at a time; the index is unused, so it is bound to _
for _, row in df.iterrows():
    if row["a"] >= 20 and row["b"] >= 20:
        labels.append("pass")
    else:
        labels.append("fail")

df["filter"] = labels

Answer 1

One way would be to create a column of boolean values like this:

>>> df['filter'] = (df['a'] >= 20) & (df['b'] >= 20)
>>> df
    a   b   c  filter
0   1  50   1   False
1  10  60  30   False
2  20  55   1    True
3   3   0   0   False
4  10   0   0   False

You can then change the boolean values to 'pass' or 'fail' using replace:

>>> df['filter'].astype(object).replace({False: 'fail', True: 'pass'})
0    fail
1    fail
2    pass
3    fail
4    fail
Name: filter, dtype: object
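Note that the replace call above returns a new Series and leaves df unchanged, so the result has to be assigned back to keep the labels. The following is a minimal, self-contained sketch of the whole flow. It assumes a threshold variable x (set to 20 here only to match the example, since the question leaves x open) and uses Series.map instead of replace, which is one alternative rather than the answer's own method.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 10, 20, 3, 10],
                   "b": [50, 60, 55, 0, 0],
                   "c": [1, 30, 1, 0, 0]})

x = 20  # assumed threshold; the question does not fix a value for x

# Boolean mask: True where both a and b reach the threshold.
# The parentheses are required because & binds tighter than >=.
mask = (df["a"] >= x) & (df["b"] >= x)

# Translate the booleans into labels and store them as the new column.
df["filter"] = mask.map({True: "pass", False: "fail"})

print(df["filter"].tolist())  # ['fail', 'fail', 'pass', 'fail', 'fail']
```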

You can extend this to more columns using all. For example, to find the rows in which every one of the listed columns has an entry greater than 0:

>>> cols = ['a', 'b', 'c'] # a list of columns to test
>>> df[cols] > 0 
      a      b      c
0  True   True   True
1  True   True   True
2  True   True   True
3  True  False  False
4  True  False  False

Calling all along axis 1 of this boolean DataFrame collapses each row to a single value, which gives the new column:

>>> (df[cols] > 0).all(axis=1)
0     True
1     True
2     True
3    False
4    False
dtype: bool
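Putting the two steps together for the original question might look like the sketch below. It assumes the same threshold x = 20 and a hypothetical list cols naming the columns to test. It also uses numpy.where to turn the row-wise mask into labels, which is a NumPy function swapped in for replace and not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 10, 20, 3, 10],
                   "b": [50, 60, 55, 0, 0],
                   "c": [1, 30, 1, 0, 0]})

cols = ["a", "b"]  # columns that must all pass the test (illustrative choice)
x = 20             # assumed threshold; not specified in the question

# Compare every selected column against x, then require all of them per row.
passed = (df[cols] >= x).all(axis=1)

# numpy.where picks "pass" where the mask is True and "fail" elsewhere.
df["filter"] = np.where(passed, "pass", "fail")

print(df)
```

Adding another column to the test only requires extending cols; the comparison and the all(axis=1) call stay the same.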

Answer 1
6 Accepted 2014-11-09 17:14:39
