How to calculate the average of two datetime columns and use it to fill NaN in another column in Python?

Input Data:

from scene time     departure time      time
12/12/2017 20:01    12/12/2017 20:20    NaN
12/12/2017 22:09    12/12/2017 22:09    NaN
12/12/2017 23:00    12/12/2017 23:30    NaN
12/12/2017 22:37    12/12/2017 22:37    NaN
12/13/17 18:25      12/13/17 20:20      12/13/17 20:20
  • I need to fill the NaN values in the time column with the average of the first two columns, i.e. from scene time and departure time.

  • The expected output is shown below. The values I entered in place of NaN are only examples.
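
To make the question reproducible, the input table could be built as below. This is only a sketch: the variable name `df` is an assumption, and the values are copied as strings from the table above, including the two-digit year in the last row. Depending on the pandas version, parsing such mixed date formats later may need extra handling.

```python
import numpy as np
import pandas as pd

# Values copied from the input table; dates are still plain strings here
df = pd.DataFrame({
    "from scene time": ["12/12/2017 20:01", "12/12/2017 22:09",
                        "12/12/2017 23:00", "12/12/2017 22:37",
                        "12/13/17 18:25"],
    "departure time": ["12/12/2017 20:20", "12/12/2017 22:09",
                       "12/12/2017 23:30", "12/12/2017 22:37",
                       "12/13/17 20:20"],
    # Only the last row has a value; the rest are missing
    "time": [np.nan, np.nan, np.nan, np.nan, "12/13/17 20:20"],
})

print(df)
```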

Expected output:

from scene time     departure time      handover time
12/12/2017 20:01    12/12/2017 20:20    12/12/2017 20:19
12/12/2017 22:09    12/12/2017 22:30    12/12/2017 22:16
12/12/2017 23:00    12/12/2017 23:30    12/12/2017 23:22
12/12/2017 22:37    12/12/2017 22:37    NaN
12/13/17 18:25      12/13/17 20:20      12/13/17 20:20

Use:

import numpy as np
import pandas as pd

# first convert the columns to datetimes
cols = ['from scene time', 'departure time', 'time']
df[cols] = df[cols].apply(pd.to_datetime)

# compute row means: view the datetimes as int64 nanoseconds, then average per row
a = df[['from scene time', 'departure time']].to_numpy().astype(np.int64).mean(axis=1)
avg = pd.Series(pd.to_datetime(a), index=df.index)

# replace only the missing values
df['time'] = df['time'].fillna(avg)
print(df)
      from scene time      departure time                time
0 2017-12-12 20:01:00 2017-12-12 20:20:00 2017-12-12 20:10:30
1 2017-12-12 22:09:00 2017-12-12 22:09:00 2017-12-12 22:09:00
2 2017-12-12 23:00:00 2017-12-12 23:30:00 2017-12-12 23:15:00
3 2017-12-12 22:37:00 2017-12-12 22:37:00 2017-12-12 22:37:00
4 2017-12-13 18:25:00 2017-12-13 20:20:00 2017-12-13 20:20:00
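
Why the integer conversion works: a pandas timestamp can be read as an integer count of nanoseconds since the Unix epoch, so the mean of two such integers is the midpoint between the two times. A minimal sketch for the first row only (the names `start`, `end`, and `midpoint` are illustrative, not from the answer):

```python
import pandas as pd

start = pd.Timestamp("2017-12-12 20:01")
end = pd.Timestamp("2017-12-12 20:20")

# Timestamp.value is the number of nanoseconds since the epoch
midpoint_ns = (start.value + end.value) // 2

# An integer passed to pd.Timestamp is interpreted as nanoseconds
midpoint = pd.Timestamp(midpoint_ns)
print(midpoint)  # 2017-12-12 20:10:30
```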

If only a few rows have missing values, you can improve performance by applying the solution only to the rows where the time column is missing:

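cols = ['from scene time', 'departure time', 'time']
df[cols] = df[cols].apply(pd.to_datetime)

# boolean mask of the rows where time is missing
mask = df['time'].isna()

# average only the masked rows
a = df.loc[mask, ['from scene time', 'departure time']].to_numpy().astype(np.int64).mean(axis=1)

# assign the averages back to the masked rows only
df.loc[mask, 'time'] = pd.to_datetime(a)
print(df)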
      from scene time      departure time                time
0 2017-12-12 20:01:00 2017-12-12 20:20:00 2017-12-12 20:10:30
1 2017-12-12 22:09:00 2017-12-12 22:09:00 2017-12-12 22:09:00
2 2017-12-12 23:00:00 2017-12-12 23:30:00 2017-12-12 23:15:00
3 2017-12-12 22:37:00 2017-12-12 22:37:00 2017-12-12 22:37:00
4 2017-12-13 18:25:00 2017-12-13 20:20:00 2017-12-13 20:20:00
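
As an alternative to the integer conversion (not part of the answer above), the midpoint can also be computed with timedelta arithmetic as `start + (end - start) / 2`, which stays in datetime types throughout. A small sketch on a two-row sample, assuming the columns are already parsed as datetimes; the sample frame and the names `start` and `end` are illustrative:

```python
import pandas as pd

# Two-row sample: the first row has a missing time, the second does not
df = pd.DataFrame({
    "from scene time": pd.to_datetime(["2017-12-12 20:01", "2017-12-13 18:25"]),
    "departure time": pd.to_datetime(["2017-12-12 20:20", "2017-12-13 20:20"]),
    "time": pd.to_datetime([None, "2017-12-13 20:20"]),
})

# Work only on the rows where time is missing
mask = df["time"].isna()
start = df.loc[mask, "from scene time"]
end = df.loc[mask, "departure time"]

# Midpoint = start plus half of the elapsed time
df.loc[mask, "time"] = start + (end - start) / 2
print(df)
```

Rows that already have a value in time are left untouched, the same as with the masked version above.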

 