
Replace notnull values in pandas dataframe with values from a list / how to get the index of a notnull value / implementation of boolean indexing

I apologise for the rather clumsy title; I went from my specific problem to the more generic one, which I see as the core of the problem. Simply put, I have a dataframe and a list whose length equals the number of columns (or rows). I would like to replace the non-null values in the dataframe with the corresponding values from the list.

Here is an example. Input dataframe:

          a         b         c         d
a  0.547825       NaN       NaN  0.950158
b       NaN  0.663845  0.529115       NaN
c       NaN       NaN       NaN  0.685002
d       NaN  0.791249  0.574452  0.734804

input list: [1, 2, 3, 4]

desired output:

     a    b    c    d
a    1  NaN  NaN    4
b  NaN    2    3  NaN
c  NaN  NaN  NaN    4
d  NaN    2    3    4

This is currently my code:

frame = pd.DataFrame(np.random.rand(4,4),index=['a','b','c','d'], columns=['a','b','c','d'])
frame = np.asarray(frame)
frame[frame<0.5] = np.nan
frame = pd.DataFrame(frame,index=['a','b','c','d'], columns=['a','b','c','d'])

result = np.zeros((4,4))
result = pd.DataFrame(result, index=['A','B','C','D'], columns=['A','B','C','D'])
Somenums = [1,2,3,4]

for i, col in enumerate(frame.columns.values):
    print frame[col]
    print np.isfinite(frame[col])
    mask = frame.ix[np.isfinite(frame[col]),col]
    print mask
    print Somenums[mask]
    result.iloc[:,i] = Somenums[mask]
print result

But I receive:

TypeError                                 Traceback (most recent call last)
<ipython-input-34-c95f4f5ee05b> in <module>()
     24     mask = frame.ix[np.isfinite(frame[col]),col]
     25     print mask
---> 26     print Somenums[mask]
     27     result.iloc[:,i] = Somenums[mask]
     28 print result

TypeError: list indices must be integers, not Series

How can I index it properly/apply the mask correctly?

It seems the error occurs because 'mask' is a Series holding the selected values, not an integer index or a boolean mask, and a plain Python list cannot be indexed with a Series. One way I can think of is to replace the for loop with:

idx = frame.notnull()
result = idx * Somenums
result[~idx] = None
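If you would rather keep the loop, a minimal sketch along these lines should also work. It assumes the example frame from the question; the names somenums and is_set are only illustrative. The point is that the list is indexed with the integer i, while the boolean Series is used only to choose rows through .loc:

```python
import numpy as np
import pandas as pd

# Example input from the question (values copied by hand).
frame = pd.DataFrame(
    [[0.547825, np.nan, np.nan, 0.950158],
     [np.nan, 0.663845, 0.529115, np.nan],
     [np.nan, np.nan, np.nan, 0.685002],
     [np.nan, 0.791249, 0.574452, 0.734804]],
    index=list("abcd"), columns=list("abcd"),
)
somenums = [1, 2, 3, 4]  # one value per column (illustrative name)

# Start from an all-NaN frame with the same labels as the input.
result = pd.DataFrame(np.nan, index=frame.index, columns=frame.columns)

for i, col in enumerate(frame.columns):
    # Boolean Series: True where the column holds a value.
    is_set = frame[col].notnull()
    # The list is indexed by the integer i; the mask only selects rows.
    result.loc[is_set, col] = somenums[i]

print(result)
```

Starting from an all-NaN frame means nothing has to be set back to NaN afterwards.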

If you don't mind zeros in place of the NaNs in the output, you can simply do:

result = frame.notnull() * Somenums

You can use mask. The list first has to be converted to a Series whose index matches the column names of df:

Somenums = [1, 2, 3, 4]

df = df.mask(df.notnull(), pd.Series(Somenums, index=df.columns), axis=1)
print (df)
     a    b    c    d
a  1.0  NaN  NaN  4.0
b  NaN  2.0  3.0  NaN
c  NaN  NaN  NaN  4.0
d  NaN  2.0  3.0  4.0
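The question also mentions a list matching the number of rows. A sketch of that variant, assuming the same example data, would index the Series by df.index and pass axis=0 so the values are aligned with the rows instead of the columns. The names row_values and by_row are made up for illustration:

```python
import numpy as np
import pandas as pd

# Example input from the question (values copied by hand).
df = pd.DataFrame(
    [[0.547825, np.nan, np.nan, 0.950158],
     [np.nan, 0.663845, 0.529115, np.nan],
     [np.nan, np.nan, np.nan, 0.685002],
     [np.nan, 0.791249, 0.574452, 0.734804]],
    index=list("abcd"), columns=list("abcd"),
)
Somenums = [1, 2, 3, 4]

# One replacement value per row label, so the Series is indexed by df.index.
row_values = pd.Series(Somenums, index=df.index)

# axis=0 aligns row_values with the rows rather than the columns.
by_row = df.mask(df.notnull(), row_values, axis=0)
print(by_row)
```

Here every non-null value in row a becomes 1, in row b becomes 2, and so on, while the NaNs stay in place.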
