Using conditional if/else logic with pandas dataframe columns
My dataframe, called pw2, looks something like this. It has two columns, pw1 and pw2, which are win probabilities. I'd like to use some conditional logic to create another column called WINNER based on pw1 and pw2.
+-------------------------+-------------+-----------+-------------+
| Name1 | pw1 | Name2 | pw2 |
+-------------------------+-------------+-----------+-------------+
| Seaking | 0.517184213 | Lickitung | 0.189236181 |
| Ferrothorn | 0.172510623 | Quagsire | 0.260884258 |
| Thundurus Therian Forme | 0.772536272 | Hitmonlee | 0.694069408 |
| Flaaffy | 0.28681284 | NaN | NaN |
+-------------------------+-------------+-----------+-------------+
I want to do this conditionally in a function, but I'm having some trouble.
If pw1 > pw2, populate with Name1
If pw2 > pw1, populate with Name2
If pw1 is populated but pw2 isn't, populate with Name1
If pw2 is populated but pw1 isn't, populate with Name2
But my function isn't working. For some reason, checking whether a value is null doesn't work.
def final_winner(df):
    # If PW1 is missing and PW2 is populated, Pokemon 1 wins
    if df['pw1'] = None and df['pw2'] != None:
        return df['Number1']
    # If it's the same thing but the other way around, Pokemon 2 wins
    elif df['pw2'] = None and df['pw1'] != None:
        return df['Number2']
    # If pw2 is greater than pw1, then Pokemon 2 wins
    elif df['pw2'] > df['pw1']:
        return df['Number2']
    else
        return df['Number1']

pw2['Winner'] = pw2.apply(final_winner, axis=1)
Do not use apply, which is very slow. Use np.where:
import numpy as np

# A missing pw2 should always lose, so treat it as negative infinity
pw2 = df.pw2.fillna(-np.inf)
df['winner'] = np.where(df.pw1 > pw2, df.Name1, df.Name2)
Since NaNs always lose, you can just fillna() pw2 with -np.inf to get the same logic. A NaN in pw1 makes the comparison False, so Name2 is picked in that case as well.
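As a rough, self-contained sketch of the approach above, here is the same idea run against the sample rows from the question. The frame is rebuilt by hand, and the name pw2_filled is only an illustrative choice, not something from the original post.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "Name1": ["Seaking", "Ferrothorn", "Thundurus Therian Forme", "Flaaffy"],
    "pw1": [0.517184213, 0.172510623, 0.772536272, 0.28681284],
    "Name2": ["Lickitung", "Quagsire", "Hitmonlee", np.nan],
    "pw2": [0.189236181, 0.260884258, 0.694069408, np.nan],
})

# Replace a missing pw2 with -inf so that any populated pw1 beats it
pw2_filled = df["pw2"].fillna(-np.inf)

# Vectorized choice: Name1 where pw1 is strictly greater, otherwise Name2
df["winner"] = np.where(df["pw1"] > pw2_filled, df["Name1"], df["Name2"])

print(df[["Name1", "Name2", "winner"]])
```

Note that with this comparison a tie (pw1 equal to pw2) goes to Name2, and a row where both probabilities are missing gets whatever is in Name2.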
Looking at your code, we can point out several problems. First, you are comparing with df['pw1'] = None, which is invalid Python syntax for a comparison. You usually want to compare things using the == operator. However, for None, it is recommended to use is, as in if variable is None: (...). Then again, you are in a pandas/numpy environment, where there are actually several representations of null values (None, NaN, NaT, etc.).
So it is preferable to check for null values using pd.isnull() or df.isnull().
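A small sketch of why the null check matters: pd.isnull() recognizes each of the null representations mentioned above, while a plain comparison against None does not catch NaN. The variable names here are made up for illustration.

```python
import numpy as np
import pandas as pd

# The different "null" values you may meet in a pandas/numpy environment
missing_values = [None, np.nan, pd.NaT]

# pd.isnull() treats all of them as missing
flags = [bool(pd.isnull(v)) for v in missing_values]
print(flags)

# A float NaN is not None, and it is not even equal to itself,
# so comparisons against None cannot detect it
nan = float("nan")
nan_is_none = nan is None
nan_equals_itself = nan == nan
print(nan_is_none, nan_equals_itself)

# The same check works element-wise on a whole column
probs = pd.Series([0.28681284, np.nan])
print(probs.isnull().tolist())
```

This is why the != None checks in the question never fire for the missing pw2 value: pandas stores it as NaN, not None.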
Just to illustrate, this is what your code should look like:
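import pandas as pd

def final_winner(df):
    # With axis=1, df is a single row of the dataframe
    # pw1 is missing but pw2 is populated: Name2 wins
    if pd.isnull(df['pw1']) and not pd.isnull(df['pw2']):
        return df['Name2']
    # pw2 is missing but pw1 is populated: Name1 wins
    elif pd.isnull(df['pw2']) and not pd.isnull(df['pw1']):
        return df['Name1']
    # Both populated: the higher probability wins
    elif df['pw2'] > df['pw1']:
        return df['Name2']
    else:
        return df['Name1']

df['winner'] = df.apply(final_winner, axis=1)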
But again, definitely use np.where.