
Fill missing values in a data frame with the nearest row

I have the following data frame:

import pandas as pd
from numpy import nan

df = pd.DataFrame({'id': {3002: 10001,
  3003: 10002,
  3004: 10003,
  3005: 10004,
  3006: 10005,
  3007: 10006,
  3008: 10007,
  3009: 10008,
  3010: 10009,
  3011: 10010,
  3012: 10011,
  3013: 10012,
  3014: 10013,
  3015: 10014,
  3016: 10015,
  3017: 10016,
  3018: 10017,
  3019: 10018,
  3020: 10019,
  3021: 10020},
 'value': {3002: 1669.0,
  3003: 1264.0,
  3004: nan,
  3005: 1411.0,
  3006: 1224.0,
  3007: 1316.0,
  3008: 1736.0,
  3009: nan,
  3010: 1276.0,
  3011: nan,
  3012: nan,
  3013: nan,
  3014: nan,
  3015: 1790.0,
  3016: nan,
  3017: nan,
  3018: nan,
  3019: 1726.0,
  3020: nan,
  3021: nan}})

I want to fill each missing value with the value at the nearest id. If two values are at the same distance, I want to use their average.

For example:

id 10008 is NaN, so I want to fill the cell with the average of the values at 10007 and 10009: (1736.0 + 1276.0)/2

For id 10015, the nearest value is at 10014, so I'll use that value directly: 1790.0


How can I accomplish this efficiently?

df.value = df.value.interpolate(method='nearest')

This is a bit tricky, but you can chain two interpolate() calls on the Series:

df['value'] = df['value'].interpolate(method='slinear').interpolate(method='linear')

The second interpolation is only needed to fill the trailing NaNs at the end of the series, which the first call leaves untouched.
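Note that linear interpolation weights both neighbours across a gap, so it only matches the "average on ties" rule when the gap is a single row. If you need exactly the behaviour described in the question, here is a minimal sketch that compares the distance to the previous and next valid rows. The helper name fill_nearest_mean and the small demo frame are made up for illustration, and distance is assumed to be measured on the id column.

```python
import numpy as np
import pandas as pd


def fill_nearest_mean(s: pd.Series, pos: pd.Series) -> pd.Series:
    """Fill NaNs in `s` with the nearest valid value (by `pos`); average on ties.

    Hypothetical helper, not part of pandas.
    """
    # Positions of valid observations only (NaN where `s` is missing).
    valid_pos = pos.where(s.notna())
    # Candidate fill values from the previous and the next valid row.
    prev_val, next_val = s.ffill(), s.bfill()
    # Distance to each candidate; a missing neighbour counts as infinitely far.
    d_prev = (pos - valid_pos.ffill()).fillna(np.inf)
    d_next = (valid_pos.bfill() - pos).fillna(np.inf)
    filled = np.where(
        d_prev < d_next, prev_val,
        np.where(d_next < d_prev, next_val, (prev_val + next_val) / 2),
    )
    # Only the originally missing cells are replaced.
    return s.fillna(pd.Series(filled, index=s.index))


# Excerpt of the question's data (ids 10007 to 10020).
demo = pd.DataFrame({
    "id": range(10007, 10021),
    "value": [1736.0, np.nan, 1276.0, np.nan, np.nan, np.nan, np.nan,
              1790.0, np.nan, np.nan, np.nan, 1726.0, np.nan, np.nan],
})
demo["value"] = fill_nearest_mean(demo["value"], demo["id"])
```

On the full frame from the question this would be called as df['value'] = fill_nearest_mean(df['value'], df['id']). It uses only vectorised ffill/bfill operations, so it should stay reasonably fast on large frames, but it assumes the rows are already sorted by id.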
