
Updating cell values with if conditions in a pandas DataFrame

I ran into some issues when using a for-loop and if conditions to update a dataframe. This should be very basic Python logic, but I couldn't find an explanation online, so I'd like to ask here.

For illustration purposes, let's look at a simple dataframe df:

   1  2
0  1  0
1  0  1
2  1  0
3  0  0
4  1  1

I wanted a third column based on values of the first two columns:

Initially I wrote:

for i in range(len(df)):
    if df.loc[i,'1']==1 & df.loc[i,'2']==0:
        df.loc[i,'3']=1
    else:
        df.loc[i,'3']=0

But I got this:

   1  2    3
0  1  0  0.0
1  0  1  0.0
2  1  0  0.0
3  0  0  1.0
4  1  1  0.0

Then I found that it worked once I wrapped each condition in parentheses. Instead of

if df.loc[i,'1']==1 & df.loc[i,'2']==0:

I used

if (df.loc[i,'1']==1) & (df.loc[i,'2']==0):

So why is this the case?

I also tested whether the parentheses are needed when there is only one condition:

for i in range(len(df)):
    if df.loc[1,'2']==1:
        df.loc[1,'4']=0
    else:
        df.loc[1,'4']=1
   

This produced another problem: column 4 is mostly missing values, and only the cell df.loc[1,'4'] was updated:

    1   2   3   4
0   1   0   1.0 NaN
1   0   1   0.0 0.0
2   1   0   1.0 NaN
3   0   0   0.0 NaN
4   1   1   0.0 NaN
     
 

I'm really baffled and this time adding the bracket doesn't change anything. Why is it like this?

In addition to these two problems, is my method of updating cell values wrong generally speaking?

A vectorized solution: chain the two masks with & (bitwise AND), then convert the boolean result to integers so that True and False map to 1 and 0:

df['3'] = ((df['1'] == 1) & (df['2'] == 0)).astype(int)

Your loop works with scalars, so use and instead of & (which is meant for arrays). Looping like this is not recommended:

for i in range(len(df)):
    if df.loc[i,'1']==1 and df.loc[i,'2']==0:
        df.loc[i,'3']=1
    else:
        df.loc[i,'3']=0


print (df)
   1  2    3
0  1  0  1.0
1  0  1  0.0
2  1  0  1.0
3  0  0  0.0
4  1  1  0.0
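
As for why the original condition misbehaved: in Python, & binds more tightly than ==, so a == 1 & b == 0 is read as the chained comparison a == (1 & b) == 0. A minimal sketch in plain Python (no pandas), using the five rows from the question as (column 1, column 2) tuples; the variable names are only for illustration:

```python
# (column '1', column '2') pairs copied from the question's dataframe
rows = [(1, 0), (0, 1), (1, 0), (0, 0), (1, 1)]

# Without parentheses this parses as: a == (1 & b) == 0,
# i.e. (a == (1 & b)) and ((1 & b) == 0)
unparenthesized = [int(a == 1 & b == 0) for a, b in rows]

# What was intended: compare first, then combine the two results
intended = [int((a == 1) and (b == 0)) for a, b in rows]

print(unparenthesized)  # [0, 0, 0, 1, 0]
print(intended)         # [1, 0, 1, 0, 0]
```

The first list matches the unexpected column 3 in the question, and the second matches the result obtained once parentheses were added.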

Don't use a loop; it is an anti-pattern in pandas. Use:

df['3'] = (df['1'].eq(1) & df['2'].eq(0)).astype(int)

df['4'] = df['2'].ne(1).astype(int)
# or, if only 0/1
# df['4'] = 1 - df['2']

Also, using eq in place of == avoids the need to wrap each comparison in parentheses to respect operator precedence.

Output:

   1  2  3  4
0  1  0  1  1
1  0  1  0  0
2  1  0  1  1
3  0  0  0  1
4  1  1  0  0
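
On the second question (only df.loc[1,'4'] being filled): the loop in the question indexes with the literal 1 rather than the loop variable i, so every iteration rewrites the same cell and the other rows stay NaN. A sketch of the loop with i instead, checked against the vectorized form above; the frame is rebuilt here from the question's data, which is an assumption about how df was created:

```python
import pandas as pd

# Rebuilt from the table in the question (assumed construction)
df = pd.DataFrame({'1': [1, 0, 1, 0, 1], '2': [0, 1, 0, 0, 1]})

for i in range(len(df)):
    # Use the loop variable i, not the constant 1
    if df.loc[i, '2'] == 1:
        df.loc[i, '4'] = 0
    else:
        df.loc[i, '4'] = 1

looped = df['4'].astype(int).tolist()
vectorized = df['2'].ne(1).astype(int).tolist()

print(looped)      # [1, 0, 1, 1, 0]
print(vectorized)  # [1, 0, 1, 1, 0]
```

The loop is shown only to explain the NaN values; the vectorized line remains the better choice.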

np.where is a better fit here:

import numpy as np
# Parenthesize each comparison, because & binds more tightly than ==
df['3'] = np.where((df['1'] == 1) & (df['2'] == 0), 1, 0)

If column 1 equals 1 and column 2 equals 0, put 1 in column 3:

df.loc[(df["1"] == 1)&(df["2"] == 0), "3"] = 1 

If column 1 is not equal to 1 or column 2 is not equal to 0, put 0 in column 3:

df.loc[(df["1"] != 1)|(df["2"] != 0), "3"] = 0
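
The two masks above are complements of each other, so together they fill every row. A small sketch that checks this and compares the result with np.where on a parenthesized condition; the frame is rebuilt from the question's data and the name mask is only for illustration:

```python
import numpy as np
import pandas as pd

# Rebuilt from the table in the question (assumed construction)
df = pd.DataFrame({'1': [1, 0, 1, 0, 1], '2': [0, 1, 0, 0, 1]})

mask = (df['1'] == 1) & (df['2'] == 0)

# The "not equal ... or not equal" mask is exactly the negation of mask
same_as_negation = (~mask).equals((df['1'] != 1) | (df['2'] != 0))

df.loc[mask, '3'] = 1
df.loc[~mask, '3'] = 0

via_loc = df['3'].astype(int).tolist()
via_where = np.where(mask, 1, 0).tolist()

print(same_as_negation)  # True
print(via_loc)           # [1, 0, 1, 0, 0]
print(via_where)         # [1, 0, 1, 0, 0]
```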

