Pandas turn last N columns into NA based on another dataframe

I have the following dataframes:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(data={'col1': ['a', 'd', 'g', 'j'], 
                        'col2': ['b', 'c', 'i', np.nan], 
                        'col3': ['c', 'f', 'i', np.nan],
                        'col4': ['x', np.nan, np.nan, np.nan]},
                index=pd.Series(['ind1', 'ind2', 'ind3', 'ind4'], name='index'))
      col1 col2 col3 col4
index                    
ind1     a    b    c    x
ind2     d    c    f  NaN
ind3     g    i    i  NaN
ind4     j  NaN  NaN  NaN
df2 = pd.Series(data=[True, False, True, False],
                index=pd.Series(['ind1', 'ind2', 'ind3', 'ind4']))
ind1     True
ind2    False
ind3     True
ind4    False

How do I set the last 2 values of each row in df1 to NA, based on the boolean values in df2?

In this case, since ind1 and ind3 are True, only the rows with those index labels in df1 should be affected.

      col1 col2 col3 col4
index                    
ind1     a    b  NaN  NaN
ind2     d    c    f  NaN
ind3     g    i  NaN  NaN
ind4     j  NaN  NaN  NaN

A possible solution, based on pandas.DataFrame.mask, which replaces values with NaN wherever the condition is True:

df1[['col3', 'col4']] = df1[['col3', 'col4']].mask(df2)

Output:

      col1 col2 col3 col4
index                    
ind1     a    b  NaN  NaN
ind2     d    c    f  NaN
ind3     g    i  NaN  NaN
ind4     j  NaN  NaN  NaN
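
For reference, here is a self-contained sketch of the same mask-based idea, generalized to the last N columns instead of hard-coded column names. The names `ser`, `N`, `last_cols`, and `cond` are illustrative and not from the original answer. The sketch also builds an explicit DataFrame-shaped condition, on the assumption that this is safer than relying on how a given pandas version broadcasts a Series passed to `mask`.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    data={'col1': ['a', 'd', 'g', 'j'],
          'col2': ['b', 'c', 'i', np.nan],
          'col3': ['c', 'f', 'i', np.nan],
          'col4': ['x', np.nan, np.nan, np.nan]},
    index=pd.Series(['ind1', 'ind2', 'ind3', 'ind4'], name='index'))

# The question's "df2", renamed because it is a Series (illustrative name).
ser = pd.Series([True, False, True, False],
                index=['ind1', 'ind2', 'ind3', 'ind4'])

N = 2
last_cols = df1.columns[-N:]  # labels of the last N columns

# One boolean column per target column, so the condition has the same
# shape (and the same row labels) as the block being masked.
cond = pd.DataFrame({c: ser for c in last_cols})

# mask() puts NaN wherever cond is True and keeps the other values.
df1[last_cols] = df1[last_cols].mask(cond)
```

Because `mask` aligns the condition on index and column labels, the row order of `ser` does not have to match that of `df1`.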

You can use boolean indexing:

N = 2
df1.iloc[df2, -N:] = np.nan

NB. What you call df2 is actually a Series; s or ser might be a more appropriate name.

Output:

      col1 col2 col3 col4
index                    
ind1     a    b  NaN  NaN
ind2     d    c    f  NaN
ind3     g    i  NaN  NaN
ind4     j  NaN  NaN  NaN
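
A label-based variant of the same boolean indexing may be worth considering. `.iloc` is purely positional and can be strict about boolean Series (reading with `df.iloc[bool_series]` raises an error), whereas `.loc` accepts a boolean Series and aligns it on the index. The sketch below is an alternative, not the answer's exact code; `ser` is an illustrative name for the question's `df2`, and it assumes the column labels are unique.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    data={'col1': ['a', 'd', 'g', 'j'],
          'col2': ['b', 'c', 'i', np.nan],
          'col3': ['c', 'f', 'i', np.nan],
          'col4': ['x', np.nan, np.nan, np.nan]},
    index=pd.Series(['ind1', 'ind2', 'ind3', 'ind4'], name='index'))

# The question's "df2", renamed because it is a Series (illustrative name).
ser = pd.Series([True, False, True, False],
                index=['ind1', 'ind2', 'ind3', 'ind4'])

N = 2
# Rows: boolean Series, aligned on index labels by .loc.
# Columns: labels of the last N columns, picked by position.
df1.loc[ser, df1.columns[-N:]] = np.nan
```

If `ser` is guaranteed to be in the same row order as `df1`, passing `ser.to_numpy()` to `.iloc` is another option, at the cost of losing the index alignment.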
