
Updating cell values with if conditions in a pandas DataFrame

I ran into some issues when I used a for-loop and if conditions to update a dataframe. This should be very basic Python logic, but I couldn't find an explanation online, so I'd like to ask here.

For illustration purposes, let's look at a simple dataframe df:

   1  2
0  1  0
1  0  1
2  1  0
3  0  0
4  1  1

I wanted a third column based on the values of the first two columns.

Initially I wrote:

for i in range(len(df)):
    if df.loc[i,'1']==1 & df.loc[i,'2']==0:
        df.loc[i,'3']=1
    else:
        df.loc[i,'3']=0

But I got this:

   1  2    3
0  1  0  0.0
1  0  1  0.0
2  1  0  0.0
3  0  0  1.0
4  1  1  0.0

Then I found that it worked when I added parentheses around my conditions. So instead of if df.loc[i,'1']==1 & df.loc[i,'2']==0: I used if (df.loc[i,'1']==1) & (df.loc[i,'2']==0):

So why is this the case?

Besides, I was testing whether I would always need the parentheses, even when I only have one condition:

for i in range(len(df)):
    if df.loc[1,'2']==1:
        df.loc[1,'4']=0
    else:
        df.loc[1,'4']=1
   

Another problem occurred: I got missing values, and only the cell df.loc[1,'4'] was updated:

    1   2   3   4
0   1   0   1.0 NaN
1   0   1   0.0 0.0
2   1   0   1.0 NaN
3   0   0   0.0 NaN
4   1   1   0.0 NaN
     
 

I'm really baffled, and this time adding the parentheses doesn't change anything. Why is it like this?

In addition to these two problems, is my method of updating cell values wrong, generally speaking?

A vectorized solution is to chain the two masks with & (bitwise AND) and convert the result to integers, which maps True, False to 1, 0:

df['3'] = ((df['1'] == 1) & (df['2'] == 0)).astype(int)
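To try the one-liner above end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and assumes the column labels are the strings '1' and '2'; the intermediate name mask is only for illustration.

```python
import pandas as pd

# Sample frame from the question (string column labels are an assumption).
df = pd.DataFrame({'1': [1, 0, 1, 0, 1],
                   '2': [0, 1, 0, 0, 1]})

# Each comparison yields a boolean Series; & combines them element-wise.
mask = (df['1'] == 1) & (df['2'] == 0)

# Casting the boolean mask to int maps True -> 1 and False -> 0.
df['3'] = mask.astype(int)

print(df)
```

Because the whole column is computed at once, the new column keeps an integer dtype instead of the floats produced by filling it cell by cell.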

Your solution works with scalars, so use and instead of &, which is meant for arrays (looping is not recommended):

for i in range(len(df)):
    if df.loc[i,'1']==1 and df.loc[i,'2']==0:
        df.loc[i,'3']=1
    else:
        df.loc[i,'3']=0


print (df)
   1  2    3
0  1  0  1.0
1  0  1  0.0
2  1  0  1.0
3  0  0  0.0
4  1  1  0.0
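The second loop in the question can be treated along the same lines. Judging only from the code shown there, it reads and writes the literal row 1 on every pass instead of the loop variable i, so only df.loc[1,'4'] is ever assigned and the other rows stay NaN, with or without parentheses. A minimal sketch, assuming the intent was to test every row:

```python
import pandas as pd

# Sample frame from the question (string column labels are an assumption).
df = pd.DataFrame({'1': [1, 0, 1, 0, 1],
                   '2': [0, 1, 0, 0, 1]})

for i in range(len(df)):
    # Index with the loop variable i, not the constant 1,
    # so each pass looks at (and writes to) a different row.
    if df.loc[i, '2'] == 1:
        df.loc[i, '4'] = 0
    else:
        df.loc[i, '4'] = 1

print(df)
```

A single comparison needs no parentheses, since there is no second operator to compete with it.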

Don't use a loop, this is an anti-pattern in pandas. Use:

df['3'] = (df['1'].eq(1) & df['2'].eq(0)).astype(int)

df['4'] = df['2'].ne(1).astype(int)
# or, if only 0/1
# df['4'] = 1 - df['2']

Also, using eq in place of == avoids the need to wrap each equality in parentheses to respect operator precedence.

Output:

   1  2  3  4
0  1  0  1  1
1  0  1  0  0
2  1  0  1  1
3  0  0  0  1
4  1  1  0  0
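The operator precedence point can be checked with plain Python integers. In Python, & binds more tightly than ==, so a == 1 & b == 0 is read as the chained comparison a == (1 & b) == 0. The sketch below uses the row values from the question's sample frame; the variable names are only illustrative.

```python
# (column 1, column 2) pairs for the five rows of the sample frame.
rows = [(1, 0), (0, 1), (1, 0), (0, 0), (1, 1)]

# No parentheses: evaluated as (a == (1 & b)) and ((1 & b) == 0).
without_parens = [int(a == 1 & b == 0) for a, b in rows]

# Parentheses force each comparison to be evaluated before &.
with_parens = [int((a == 1) & (b == 0)) for a, b in rows]

print(without_parens)
print(with_parens)
```

The first list matches the unexpected column from the question, where only the row with two zeros is flagged; the second matches the intended result.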

You are better off using np.where:

import numpy as np
# Parentheses are required here: & binds more tightly than ==
df['3'] = np.where((df['1'] == 1) & (df['2'] == 0), 1, 0)
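As a self-contained sketch of the np.where approach, with each comparison wrapped in parentheses so that & combines two boolean Series, the following rebuilds the sample frame (string column labels assumed) and fills both new columns from the question:

```python
import numpy as np
import pandas as pd

# Sample frame from the question (string column labels are an assumption).
df = pd.DataFrame({'1': [1, 0, 1, 0, 1],
                   '2': [0, 1, 0, 0, 1]})

# np.where(condition, value_if_true, value_if_false), evaluated per row.
df['3'] = np.where((df['1'] == 1) & (df['2'] == 0), 1, 0)

# A single condition needs no extra parentheses.
df['4'] = np.where(df['2'] == 1, 0, 1)

print(df)
```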

If column 1 is equal to 1 and column 2 is equal to 0, then put the value 1 in column 3.

df.loc[(df["1"] == 1)&(df["2"] == 0), "3"] = 1 

If column 1 is not equal to 1 or column 2 is not equal to 0, then put the value 0 in column 3.

df.loc[(df["1"] != 1)|(df["2"] != 0), "3"] = 0
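Since the second condition is the logical negation of the first, one possible variation is to give the column a default value and then overwrite only the matching rows. This is a sketch of that idea on the question's sample frame (string column labels assumed), not a replacement for the two statements above:

```python
import pandas as pd

# Sample frame from the question (string column labels are an assumption).
df = pd.DataFrame({'1': [1, 0, 1, 0, 1],
                   '2': [0, 1, 0, 0, 1]})

# Start with the default value in every row.
df['3'] = 0

# Overwrite only the rows where the boolean mask is True.
df.loc[(df['1'] == 1) & (df['2'] == 0), '3'] = 1

print(df)
```

Creating the column with a default first means no row is left as NaN, so the column stays integer-typed.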
