简体   繁体   中英

Replace contents of cell with another cell if condition on a separate cell is met

I have to following data frame

A = [1,2,5,4,3,1]
B = ["yes","No","hello","yes","no", 'why']
C = [1,0,1,1,0,0]
D = ['y','n','y','y','n','n']
test_df = pd.DataFrame({'A': A, 'B': B, 'C': C, 'D':D})

we can see 4 columns A,B,C,D the intended outcome is to replace the contents of B with the contents of D, if a condition on C is met, for this example the condition is of C = 1

the intended output is

A = [1,2,5,4,3,1]
B = ["y","No","y","y","no", 'why']
C = [1,0,1,1,0,0]
D = ['y','n','y','y','n','n']
output_df = pd.DataFrame({'A': A, 'B': B, 'C': C, 'D':D})
output_df.drop('D', axis = 1)

What is the best way to apply this logic to a data frame?

There are many ways to solve, here is another one:

test_df['B'] = test_df['B'].mask(test_df['C'] == 1, test_df['D'])

Output:

   A    B  C  D
0  1    y  1  y
1  2   No  0  n
2  5    y  1  y
3  4    y  1  y
4  3   no  0  n
5  1  why  0  n

This can be done with np.where :

test_df['B'] = np.where(test_df['C']==1, test_df['D'], test_df['B'])

Output:

   A    B  C  D
0  1    y  1  y
1  2   No  0  n
2  5    y  1  y
3  4    y  1  y
4  3   no  0  n
5  1  why  0  n

The desired output is achieved using .loc with column 'C' as the mask.

test_df.loc[test_df['C']==1,'B'] = test_df.loc[test_df['C']==1,'D']

UPDATE: Just found out a similar answer is posted by @QuangHoang. This answer is slightly different in that it does not require numpy

I don't know if inverse is the right word here, but I noticed recently that mask and where are "inverses" of each other. If you pass a ~ to the condition of a .where statement, then you get the same result as mask :

A = [1,2,5,4,3,1]
B = ["yes","No","hello","yes","no", 'why']
C = [1,0,1,1,0,0]
D = ['y','n','y','y','n','n']
test_df = pd.DataFrame({'A': A, 'B': B, 'C': C, 'D':D})
test_df['B'] = test_df['B'].where(~(test_df['C'] == 1), test_df['D'])
# test_df['B'] = test_df['B'].mask(test_df['C'] == 1, test_df['D']) - Scott Boston's answer
test_df
Out[1]: 
   A    B  C  D
0  1    y  1  y
1  2   No  0  n
2  5    y  1  y
3  4    y  1  y
4  3   no  0  n
5  1  why  0  n

You can also use df.where :

test_df['B'] = test_df['D'].where(test_df.C.eq(1), test_df.B)

Output:

In [875]: test_df
Out[875]: 
   A    B  C  D
0  1    y  1  y
1  2   No  0  n
2  5    y  1  y
3  4    y  1  y
4  3   no  0  n
5  1  why  0  n

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM