简体   繁体   中英

Calculating new column value in dataframe based on next row's column value

My lack of experience in working with python in the past year has made me rather rusty, and I'm getting back into coding again.

I have a dataframe of events that has a 'start_time' column. What I need to do is create an 'end_time' column that has a time value that is 1 second less than the next row's start_time. This is an ask for doing event time calculations.

The desired output:

start_time  end_time
0   00:00:00  07:59:59
1   08:00:00  08:20:04
2   08:20:05  08:29:19
3   08:29:20  08:29:20
4   08:29:21  08:35:14
5   08:35:15  08:55:21
6   08:55:22  08:57:20
7   08:57:21  09:02:23
8   09:02:24  09:14:07
9   09:14:08  09:15:03

I currently have code that will accomplish this, but from anything I've read here, and from what I remember, I really shouldn't be iterating through a dataframe in a for loop.

for ndx, row in df.iterrows():
    if ndx != df[atnp_df.columns[0]].count() - 1:
        df.iloc[ndx, 9] = pd.to_datetime(df.iloc[ndx+1, 8]) - timedelta(seconds=1)

(Hey, it works, but it's slow ...)

How do I do this pythonically? I know the solution should be something like this:

df['end_time'] = pd.to_datetime(df['start_time']) - timedelta(seconds=1)

But, this subtracts 1 second from the start_time in the same row. I'm not quite sure how to access the next row's start time in this way.

Any and all help is greatly appreciated!

offsets

df.assign(end_time=pd.to_timedelta(df.start_time).shift(-1).sub(pd.offsets.Second(1)))

  start_time        end_time
0   00:00:00 0 days 07:59:59
1   08:00:00 0 days 08:20:04
2   08:20:05 0 days 08:29:19
3   08:29:20 0 days 08:29:20
4   08:29:21 0 days 08:35:14
5   08:35:15 0 days 08:55:21
6   08:55:22 0 days 08:57:20
7   08:57:21 0 days 09:02:23
8   09:02:24 0 days 09:14:07
9   09:14:08             NaT

A little cleaned up and returning formatted strings:

s = pd.to_timedelta(df.start_time).shift(-1).sub(pd.offsets.Second(1))

df.assign(end_time=s.add(pd.Timestamp('now').normalize()).dt.time.astype(str))

  start_time  end_time
0   00:00:00  07:59:59
1   08:00:00  08:20:04
2   08:20:05  08:29:19
3   08:29:20  08:29:20
4   08:29:21  08:35:14
5   08:35:15  08:55:21
6   08:55:22  08:57:20
7   08:57:21  09:02:23
8   09:02:24  09:14:07
9   09:14:08       NaT
​

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM