Python dataframe assign new column using lambda function with 2 variables and if else statement
Set up the dataframe:
import pandas as pd
import numpy as np
np.random.seed(99)
rows = 10
df = pd.DataFrame ({'A' : np.random.choice(range(0, 2), rows, replace = True),
'B' : np.random.choice(range(0, 2), rows, replace = True)})
df
A B
0 1 1
1 1 1
2 1 0
3 0 1
4 1 1
5 0 1
6 0 1
7 0 0
8 1 1
9 0 1
I would like to add a column 'C' with the value 'X' if df.A and df.B are both 0, and the value 'Y' otherwise.
I tried:
df.assign(C = lambda row: 'X' if row.A + row.B == 0 else 'Y')
but that does not work...
I found other ways to get my results but would like to use .assign with a lambda function in this situation.
Any suggestions on how to get assign with lambda working?
No, don't use lambda.
You can do this vectorised:
import numpy as np
df['C'] = np.where(df['A'] + df['B'] == 0, 'X', 'Y')
The lambda solution has no benefit here, but if you want it...
df = df.assign(C=np.where(df.pipe(lambda x: x['A'] + x['B'] == 0), 'X', 'Y'))
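For what it's worth, the attempt in the question fails because the callable passed to assign is called once with the whole DataFrame, not once per row. So row.A + row.B == 0 is a boolean Series, and the if ... else cannot reduce it to a single True/False (pandas raises a ValueError). Below is a minimal sketch of assign with a lambda that stays vectorised. The small hand-written frame and the names result and error_message are assumptions for illustration, not the question's random data.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption, not the random frame from the question).
df = pd.DataFrame({'A': [1, 0, 0, 1], 'B': [1, 1, 0, 0]})

# The callable given to assign receives the whole DataFrame, so the comparison
# yields a boolean Series and np.where picks 'X' or 'Y' element-wise.
result = df.assign(C=lambda d: np.where(d['A'] + d['B'] == 0, 'X', 'Y'))

# The question's lambda fails for the same reason: `if` needs a single bool,
# but it is handed a Series.
try:
    df.assign(C=lambda row: 'X' if row.A + row.B == 0 else 'Y')
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(result)
print(error_message)
```

This keeps the .assign plus lambda shape the question asks for, and it also works mid-chain because the lambda refers to the frame it is handed rather than to the outer df variable.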
The bad way to use assign + lambda:
df = df.assign(C=df.apply(lambda x: 'X' if x.A + x.B == 0 else 'Y', axis=1))
What's wrong with the bad way is that you are iterating over rows in a Python-level loop. It's often worse than a regular Python for loop.
The first two solutions perform vectorised operations on contiguous memory blocks, and are processed more efficiently as a result.
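If you want to check the difference on your own machine, a rough sketch along these lines compares the two approaches. The frame size, the seed, and the helper names vectorised and row_wise are arbitrary choices made here, and the timings will vary with hardware and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark frame; size and seed are arbitrary.
rng = np.random.default_rng(0)
big = pd.DataFrame({'A': rng.integers(0, 2, 10_000),
                    'B': rng.integers(0, 2, 10_000)})

def vectorised(frame):
    # One element-wise comparison over whole columns.
    return np.where(frame['A'] + frame['B'] == 0, 'X', 'Y')

def row_wise(frame):
    # Calls the Python lambda once per row.
    return frame.apply(lambda x: 'X' if x.A + x.B == 0 else 'Y', axis=1)

# Both approaches should label every row identically.
same = vectorised(big).tolist() == row_wise(big).tolist()

t_vec = timeit.timeit(lambda: vectorised(big), number=3)
t_row = timeit.timeit(lambda: row_wise(big), number=3)
print(f"same result: {same}")
print(f"vectorised: {t_vec:.4f}s, apply(axis=1): {t_row:.4f}s")
```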
You were almost there...
df['C'] = df.apply(lambda row: 'X' if row.A + row.B == 0 else 'Y', axis = 1)
Make a simple condition and apply it to the rows:
df['C'] = df.apply(lambda row: 'Y' if (row.A or row.B) else 'X', axis = 1)