Pandas return value from multiple columns if equal to value in another column
I have a Pandas dataframe like this:
         A        B          C         D
0    month  month+1  quarter+1  season+1
1   season  month+5  quarter+3  season+2
2      day  month+1  quarter+2  season+1
3     year  month+3  quarter+4  season+2
4  quarter  month+2  quarter+1  season+1
5    month  month+4  quarter+1  season+2
I would like to insert a new column called 'E' based on several IF conditions. If column 'A' equals 'month', return the value in 'B'; if column 'A' equals 'quarter', return the value in 'C'; if column 'A' equals 'season', return the value in 'D'; otherwise, return the value in column 'A'. The expected result:
         A        B          C         D          E
0    month  month+1  quarter+1  season+1    month+1
1   season  month+5  quarter+3  season+2   season+2
2      day  month+1  quarter+2  season+1        day
3     year  month+3  quarter+4  season+2       year
4  quarter  month+2  quarter+1  season+1  quarter+1
5    month  month+4  quarter+1  season+2    month+4
I am having trouble doing this. I tried writing a function, but it did not work. See my attempt:
def f(row):
    if row['A'] == 'month':
        val = ['B']
    elif row['A'] == 'quarter':
        val = ['C']
    elif row['A'] == 'season':
        val = ['D']
    else:
        val = ['A']
    return val

df['E'] = df.apply(f, axis=1)
EDITED: changed the last else to return column 'A'.
First, I recommend you read: "When should I want to use apply() in my code".
I would use Series.replace:
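df['E'] = df['A'].replace(['month', 'quarter', 'season'],
                          [df['B'], df['C'], df['D']])
or numpy.select:
import numpy as np

# One boolean mask per key, paired positionally with the column to read from.
cond = [df['A'].eq('month'), df['A'].eq('quarter'), df['A'].eq('season')]
values = [df['B'], df['C'], df['D']]
# Rows matching no condition fall back to the value already in 'A'.
df['E'] = np.select(cond, values, default=df['A'])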
         A        B          C         D          E
0    month  month+1  quarter+1  season+1    month+1
1   season  month+5  quarter+3  season+2   season+2
2      day  month+1  quarter+2  season+1        day
3     year  month+3  quarter+4  season+2       year
4  quarter  month+2  quarter+1  season+1  quarter+1
5    month  month+4  quarter+1  season+2    month+4
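For anyone who wants to run the np.select approach end to end, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the table in the question (how the asker actually created it is not shown, so the construction is an assumption).

```python
import numpy as np
import pandas as pd

# Sample data typed in from the question; the construction itself is assumed.
df = pd.DataFrame({
    'A': ['month', 'season', 'day', 'year', 'quarter', 'month'],
    'B': ['month+1', 'month+5', 'month+1', 'month+3', 'month+2', 'month+4'],
    'C': ['quarter+1', 'quarter+3', 'quarter+2', 'quarter+4', 'quarter+1', 'quarter+1'],
    'D': ['season+1', 'season+2', 'season+1', 'season+2', 'season+1', 'season+2'],
})

# The first condition that is True for a row decides which column supplies the value.
cond = [df['A'].eq('month'), df['A'].eq('quarter'), df['A'].eq('season')]
values = [df['B'], df['C'], df['D']]

# Rows where no condition holds keep the value from 'A'.
df['E'] = np.select(cond, values, default=df['A'])
print(df)
```

Conditions are checked in list order, so if two of them could ever be true for the same row, the earlier one wins.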
Just use np.select:
import numpy as np

c1 = df['A'] == 'month'
c2 = df['A'] == 'quarter'
c3 = df['A'] == 'season'
# The last positional argument is the default used when no condition matches.
df['E'] = np.select([c1, c2, c3], [df['B'], df['C'], df['D']], df['A'])
Out[271]:
         A        B          C         D          E
0    month  month+1  quarter+1  season+1    month+1
1   season  month+5  quarter+3  season+2   season+2
2      day  month+1  quarter+2  season+1        day
3     year  month+3  quarter+4  season+2       year
4  quarter  month+2  quarter+1  season+1  quarter+1
5    month  month+4  quarter+1  season+2    month+4
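If the number of keys grows, the conditions and choices do not have to be written out one by one. The sketch below builds them from a dict; the name `col_for` is hypothetical and not part of any answer here, and the sample frame is again typed in from the question.

```python
import numpy as np
import pandas as pd

# Sample data typed in from the question; the construction itself is assumed.
df = pd.DataFrame({
    'A': ['month', 'season', 'day', 'year', 'quarter', 'month'],
    'B': ['month+1', 'month+5', 'month+1', 'month+3', 'month+2', 'month+4'],
    'C': ['quarter+1', 'quarter+3', 'quarter+2', 'quarter+4', 'quarter+1', 'quarter+1'],
    'D': ['season+1', 'season+2', 'season+1', 'season+2', 'season+1', 'season+2'],
})

# Hypothetical mapping: value found in 'A' -> column whose value should be used.
col_for = {'month': 'B', 'quarter': 'C', 'season': 'D'}

# Dicts preserve insertion order, so conditions and choices stay aligned.
conditions = [df['A'].eq(key) for key in col_for]
choices = [df[col] for col in col_for.values()]

df['E'] = np.select(conditions, choices, default=df['A'])
print(df['E'].tolist())
```

Adding another rule then only means adding one entry to the dict.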
You probably need to fix your code like this:
def f(row):
    if row['A'] == 'month':
        val = row['B']
    elif row['A'] == 'quarter':
        val = row['C']
    elif row['A'] == 'season':
        val = row['D']
    else:
        # Fall back to column 'A', as in the edited question.
        val = row['A']
    return val

df['E'] = df.apply(f, axis=1)
Note: you forgot to index into row, so the function returned a list containing the column name instead of the value in that column:
val = ['B'] # before
val = row['B'] # after
Edit: This only points out the problem in the original code. For better approaches, see the other answers that use numpy.select.
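To make the before/after difference concrete, the following sketch evaluates both expressions on the first row and then checks that the row-wise function with the lookups in place agrees with np.select. The sample frame is typed in from the question, and the names `via_apply` and `via_select` are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data typed in from the question; the construction itself is assumed.
df = pd.DataFrame({
    'A': ['month', 'season', 'day', 'year', 'quarter', 'month'],
    'B': ['month+1', 'month+5', 'month+1', 'month+3', 'month+2', 'month+4'],
    'C': ['quarter+1', 'quarter+3', 'quarter+2', 'quarter+4', 'quarter+1', 'quarter+1'],
    'D': ['season+1', 'season+2', 'season+1', 'season+2', 'season+1', 'season+2'],
})

row = df.iloc[0]
before = ['B']       # a list literal holding the string 'B'
after = row['B']     # the value stored in column 'B' for this row
print(before, after)  # ['B'] month+1

def f(row):
    # Row-wise version with the lookups in place; falls back to column 'A'.
    if row['A'] == 'month':
        return row['B']
    elif row['A'] == 'quarter':
        return row['C']
    elif row['A'] == 'season':
        return row['D']
    return row['A']

via_apply = df.apply(f, axis=1)
via_select = np.select(
    [df['A'].eq('month'), df['A'].eq('quarter'), df['A'].eq('season')],
    [df['B'], df['C'], df['D']],
    default=df['A'],
)

# Both approaches should produce the same values row by row.
print(via_apply.tolist() == list(via_select))
```

Both give the same column; the np.select version avoids calling a Python function once per row, which is the main reason the other answers prefer it.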