使用DataFrame.apply在Pandas中使用特定列创建新列

Question

I have a data frame like this 我有一个这样的数据框

ID    8-Jan 15-Jan  22-Jan  29-Jan  5-Feb   12-Feb  LowerBound   UpperBound  Problem
001    618    720    645     573     503     447     401.329      662.670     False
002    62     80      67      94      81      65     45.710       126.289     False   
003    32     10      23      26      26      31     12.314       58.114      True 
004    22     13       1      28      19      25     16.438       41.418      True
005    9       7       9      6        8       4     1.182        20.102      False

I want to create a new column which would be a Boolean column such that I want to iterate through all the weeks for each ID and if any value lies outside the upper and lower bound column I set it equal to True else False. 我想创建一个新的列，该列将是一个布尔列，这样我想遍历每个ID的所有星期，并且如果任何值位于上下限列之外，则将其设置为True，否则设置为False。 The upper and lower bound values in this case are dummy so the data will not return these values. 在这种情况下，上下限值是虚拟的，因此数据将不会返回这些值。 The resulting column should be like the Problem column 结果列应类似于“ Problem列

I know the hard way of doing this which is absolutely inefficient 我知道很难做到这一点，这绝对是低效的

import  pandas as pd

def Problem(df):
    r = []
    for i in range(len(df)):
        res = []
        x = [df['Week1'][i], df['Week2'][i], df['Week3'][i], df['Week4'][i], df['Week5'][i]]
        for j in range (len(x)):
            if (df['LowerBound'][i] <= x[j] <= df['UpperBound'][i]): res.append(True)
            else: res.append(False)
        if (False in res): r.append(True)
        else: r.append(False)
    return r

df['Problem'] = Problem(df)

This will work but it is long, hard and inefficient way. 这将起作用，但是它是漫长，艰苦和低效的方式。 I know there is df.apply which can do this for me but I don't understand how to convert my specific function into that. 我知道有df.apply可以为我做到这一点，但我不知道如何将我的特定函数转换为df.apply。 Can someone help ? 有人可以帮忙吗？ Thanks 谢谢

Answer 1

You can do this more succinctly using apply and calling between to test if each row's values are within range, invert the result using ~ and calling any to test if there are any positive values: 您可以使用apply和调用between更简洁地执行此操作between以测试每行的值是否在范围内，使用~反转结果，然后调用any以测试是否有正值：

In [24]:
df['Problem'] = df.apply(lambda x: ~x.loc['8-Jan':'12-Feb'].between(x['LowerBound'], x['UpperBound']), axis=1).any(axis=1)
df

Out[24]:
   ID  8-Jan  15-Jan  22-Jan  29-Jan  5-Feb  12-Feb  LowerBound  UpperBound  \
0   1    618     720     645     573    503     447     401.329     662.670   
1   2     62      80      67      94     81      65      45.710     126.289   
2   3     32      10      23      26     26      31      12.314      58.114   
3   4     22      13       1      28     19      25      16.438      41.418   
4   5      9       7       9       6      8       4       1.182      20.102   

  Problem  
0    True  
1   False  
2    True  
3    True  
4   False

We can see the individual steps here: 我们可以在此处查看各个步骤：

In [25]:
df.apply(lambda x: x.loc['8-Jan':'12-Feb'].between(x['LowerBound'], x['UpperBound']), axis=1)

Out[25]:
  8-Jan 15-Jan 22-Jan 29-Jan 5-Feb 12-Feb
0  True  False   True   True  True   True
1  True   True   True   True  True   True
2  True  False   True   True  True   True
3  True  False  False   True  True   True
4  True   True   True   True  True   True

invert the mask using ~ : 使用~反转掩码：

In [26]:
df.apply(lambda x: ~x.loc['8-Jan':'12-Feb'].between(x['LowerBound'], x['UpperBound']), axis=1)

Out[26]:
   8-Jan 15-Jan 22-Jan 29-Jan  5-Feb 12-Feb
0  False   True  False  False  False  False
1  False  False  False  False  False  False
2  False   True  False  False  False  False
3  False   True   True  False  False  False
4  False  False  False  False  False  False

now test if any row values are positive using any : 现在使用any测试任何行值是否为正：

In [27]:
df.apply(lambda x: ~x.loc['8-Jan':'12-Feb'].between(x['LowerBound'], x['UpperBound']), axis=1).any(axis=1)

Out[27]:
0     True
1    False
2     True
3     True
4    False
dtype: bool

使用DataFrame.apply在Pandas中使用特定列创建新列

问题描述

1 个解决方案

解决方案1
2 已采纳 2017-04-13 09:37:24

使用DataFrame.apply在Pandas中使用特定列创建新列

问题描述

1 个解决方案

解决方案1 2 已采纳 2017-04-13 09:37:24

解决方案1
2 已采纳 2017-04-13 09:37:24