Create a new column from specific columns in pandas using DataFrame.apply
I have a DataFrame like this:
ID 8-Jan 15-Jan 22-Jan 29-Jan 5-Feb 12-Feb LowerBound UpperBound Problem
001 618 720 645 573 503 447 401.329 662.670 False
002 62 80 67 94 81 65 45.710 126.289 False
003 32 10 23 26 26 31 12.314 58.114 True
004 22 13 1 28 19 25 16.438 41.418 True
005 9 7 9 6 8 4 1.182 20.102 False
I want to create a new Boolean column: for each ID, iterate through all the weeks, and if any value lies outside the LowerBound/UpperBound range, set the column to True, otherwise False. The upper and lower bound values here are dummy values, so the data will not reproduce the Problem values shown. The resulting column should look like the Problem column.
I know the hard way of doing this, which is very inefficient:
import pandas as pd

week_cols = ['8-Jan', '15-Jan', '22-Jan', '29-Jan', '5-Feb', '12-Feb']

def Problem(df):
    r = []
    for i in range(len(df)):
        res = []
        # this row's weekly values
        x = [df[col].iloc[i] for col in week_cols]
        for value in x:
            # True if the value lies within [LowerBound, UpperBound]
            res.append(df['LowerBound'].iloc[i] <= value <= df['UpperBound'].iloc[i])
        # flag the row if any weekly value was out of bounds
        r.append(False in res)
    return r

df['Problem'] = Problem(df)
This works, but it is long, cumbersome and inefficient. I know df.apply can do this for me, but I don't understand how to convert my specific function to use it. Can someone help? Thanks
You can do this more succinctly using apply: call between to test whether each row's values are within range, invert the result using ~, and then call any to test whether any values in the row are True:
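A self-contained sketch that rebuilds the sample data and runs the one-liner below, so it can be tried end to end. The ID column is stored as integers here, matching the answer's output (the leading zeros from the question are not preserved); keeping every column numeric means each row passed to the lambda is a plain float Series.

```python
import pandas as pd

# Sample data from the question (ID as integers, as in the answer's output)
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    '8-Jan': [618, 62, 32, 22, 9],
    '15-Jan': [720, 80, 10, 13, 7],
    '22-Jan': [645, 67, 23, 1, 9],
    '29-Jan': [573, 94, 26, 28, 6],
    '5-Feb': [503, 81, 26, 19, 8],
    '12-Feb': [447, 65, 31, 25, 4],
    'LowerBound': [401.329, 45.710, 12.314, 16.438, 1.182],
    'UpperBound': [662.670, 126.289, 58.114, 41.418, 20.102],
})

# For each row: True where a weekly value is outside [LowerBound, UpperBound],
# then collapse across the week columns with any()
df['Problem'] = df.apply(
    lambda x: ~x.loc['8-Jan':'12-Feb'].between(x['LowerBound'], x['UpperBound']),
    axis=1,
).any(axis=1)

print(df['Problem'].tolist())  # [True, False, True, True, False]
```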
In [24]:
df['Problem'] = df.apply(lambda x: ~x.loc['8-Jan':'12-Feb'].between(x['LowerBound'], x['UpperBound']), axis=1).any(axis=1)
df
Out[24]:
ID 8-Jan 15-Jan 22-Jan 29-Jan 5-Feb 12-Feb LowerBound UpperBound \
0 1 618 720 645 573 503 447 401.329 662.670
1 2 62 80 67 94 81 65 45.710 126.289
2 3 32 10 23 26 26 31 12.314 58.114
3 4 22 13 1 28 19 25 16.438 41.418
4 5 9 7 9 6 8 4 1.182 20.102
Problem
0 True
1 False
2 True
3 True
4 False
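The question asks how to turn its own loop into something df.apply can call. A minimal sketch, assuming the column names from the table: write a function that receives one row (a Series) and returns a single bool, then pass it with axis=1. The helper name has_problem and the WEEK_COLS list are hypothetical, introduced only for illustration.

```python
import pandas as pd

# Hypothetical list of the week columns, taken from the table headers
WEEK_COLS = ['8-Jan', '15-Jan', '22-Jan', '29-Jan', '5-Feb', '12-Feb']

def has_problem(row):
    """Hypothetical helper: True if any week lies outside [LowerBound, UpperBound]."""
    weeks = row[WEEK_COLS]
    outside = (weeks < row['LowerBound']) | (weeks > row['UpperBound'])
    return bool(outside.any())

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    '8-Jan': [618, 62, 32, 22, 9],
    '15-Jan': [720, 80, 10, 13, 7],
    '22-Jan': [645, 67, 23, 1, 9],
    '29-Jan': [573, 94, 26, 28, 6],
    '5-Feb': [503, 81, 26, 19, 8],
    '12-Feb': [447, 65, 31, 25, 4],
    'LowerBound': [401.329, 45.710, 12.314, 16.438, 1.182],
    'UpperBound': [662.670, 126.289, 58.114, 41.418, 20.102],
})

# axis=1 passes each row to has_problem; one bool per row comes back
df['Problem'] = df.apply(has_problem, axis=1)
```

Because the function returns a scalar, apply yields a Series that can be assigned straight to the new column, with no trailing any(axis=1) needed.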
We can see the individual steps here: 我们可以在此处查看各个步骤:
In [25]:
df.apply(lambda x: x.loc['8-Jan':'12-Feb'].between(x['LowerBound'], x['UpperBound']), axis=1)
Out[25]:
8-Jan 15-Jan 22-Jan 29-Jan 5-Feb 12-Feb
0 True False True True True True
1 True True True True True True
2 True False True True True True
3 True False False True True True
4 True True True True True True
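To see what between does on a single row, here is a small sketch using the first row's weekly values and bounds copied from the table. By default between includes both endpoints, so it is equivalent to the explicit comparison shown alongside it.

```python
import pandas as pd

# Weekly values of the first row (ID 1) and its bounds, copied from the table
row = pd.Series([618, 720, 645, 573, 503, 447],
                index=['8-Jan', '15-Jan', '22-Jan', '29-Jan', '5-Feb', '12-Feb'])
lower, upper = 401.329, 662.670

# between() keeps values with lower <= value <= upper (both ends inclusive by default)
in_range = row.between(lower, upper)

# The same test written out explicitly
manual = (row >= lower) & (row <= upper)

print(in_range.tolist())  # [True, False, True, True, True, True]
```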
Invert the mask using ~:
In [26]:
df.apply(lambda x: ~x.loc['8-Jan':'12-Feb'].between(x['LowerBound'], x['UpperBound']), axis=1)
Out[26]:
8-Jan 15-Jan 22-Jan 29-Jan 5-Feb 12-Feb
0 False True False False False False
1 False False False False False False
2 False True False False False False
3 False True True False False False
4 False False False False False False
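~ is element-wise NOT on a boolean Series; Python's `not` cannot be used here because it tries to reduce the whole Series to a single truth value and raises an error. The sketch below uses the fourth row (ID 4) from the table and shows that, by De Morgan's law, the inverted mask matches testing "below the lower bound or above the upper bound" directly.

```python
import pandas as pd

# Weekly values of the fourth row (ID 4) and its bounds, copied from the table
row = pd.Series([22, 13, 1, 28, 19, 25],
                index=['8-Jan', '15-Jan', '22-Jan', '29-Jan', '5-Feb', '12-Feb'])
lower, upper = 16.438, 41.418

# ~ flips each element of the boolean mask
outside = ~row.between(lower, upper)

# Equivalent mask built directly from the two bounds
outside_direct = (row < lower) | (row > upper)

print(outside.tolist())  # [False, True, True, False, False, False]
```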
Now test whether any value in each row is True using any:
In [27]:
df.apply(lambda x: ~x.loc['8-Jan':'12-Feb'].between(x['LowerBound'], x['UpperBound']), axis=1).any(axis=1)
Out[27]:
0 True
1 False
2 True
3 True
4 False
dtype: bool
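apply with axis=1 still calls a Python function once per row. As an alternative that is not part of the answer above, the whole check can be vectorised with DataFrame comparison methods: passing axis=0 to lt/gt aligns each bound Series with the row index, so every week column is compared against that row's bounds. This sketch assumes the week columns sit contiguously between '8-Jan' and '12-Feb', as in the table.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    '8-Jan': [618, 62, 32, 22, 9],
    '15-Jan': [720, 80, 10, 13, 7],
    '22-Jan': [645, 67, 23, 1, 9],
    '29-Jan': [573, 94, 26, 28, 6],
    '5-Feb': [503, 81, 26, 19, 8],
    '12-Feb': [447, 65, 31, 25, 4],
    'LowerBound': [401.329, 45.710, 12.314, 16.438, 1.182],
    'UpperBound': [662.670, 126.289, 58.114, 41.418, 20.102],
})

# Week columns selected by label range
weeks = df.loc[:, '8-Jan':'12-Feb']

# Compare every week column with the per-row bounds (axis=0 aligns on the row index)
out_of_range = weeks.lt(df['LowerBound'], axis=0) | weeks.gt(df['UpperBound'], axis=0)

# A row is a problem if any of its weeks is out of range
df['Problem'] = out_of_range.any(axis=1)
```

This gives the same True/False column as the apply version while keeping the comparison in pandas' vectorised operations.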