Calculate a new column value in a dataframe based on the next row's column value

Question

I've barely worked with Python in the past year, so I'm rather rusty, and I'm just getting back into coding.

I have a dataframe of events that has a 'start_time' column. What I need to do is create an 'end_time' column whose value is 1 second less than the next row's start_time. This is for event time calculations.

The desired output:

  start_time  end_time
0   00:00:00  07:59:59
1   08:00:00  08:20:04
2   08:20:05  08:29:19
3   08:29:20  08:29:20
4   08:29:21  08:35:14
5   08:35:15  08:55:21
6   08:55:22  08:57:20
7   08:57:21  09:02:23
8   09:02:24  09:14:07
9   09:14:08  09:15:03

I currently have code that accomplishes this, but from everything I've read here, and from what I remember, I really shouldn't be iterating through a dataframe in a for loop.

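# Assumes `import pandas as pd` and `from datetime import timedelta`.
for ndx, row in df.iterrows():
    # Skip the last row, which has no next row to read from.
    if ndx != df[df.columns[0]].count() - 1:
        df.iloc[ndx, 9] = pd.to_datetime(df.iloc[ndx+1, 8]) - timedelta(seconds=1)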

(Hey, it works, but it's slow ...)

How do I do this pythonically? I know the solution should be something like this:

df['end_time'] = pd.to_datetime(df['start_time']) - timedelta(seconds=1)

But this subtracts 1 second from the start_time in the same row. I'm not quite sure how to access the next row's start time in this way.

Any and all help is greatly appreciated!

Answer 1

`offsets`

df.assign(end_time=pd.to_timedelta(df.start_time).shift(-1).sub(pd.offsets.Second(1)))

  start_time        end_time
0   00:00:00 0 days 07:59:59
1   08:00:00 0 days 08:20:04
2   08:20:05 0 days 08:29:19
3   08:29:20 0 days 08:29:20
4   08:29:21 0 days 08:35:14
5   08:35:15 0 days 08:55:21
6   08:55:22 0 days 08:57:20
7   08:57:21 0 days 09:02:23
8   09:02:24 0 days 09:14:07
9   09:14:08             NaT
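
For anyone who wants to see what each step of that one-liner does, here is a minimal sketch that splits it up. The sample frame is just the first four rows from the question, and the intermediate names (`starts`, `next_starts`, `result`) are mine, not part of the original code. I also use `pd.Timedelta(seconds=1)` in place of `pd.offsets.Second(1)`; as far as I know the two behave the same in this subtraction.

```python
import pandas as pd

# Sample data: the first four start times from the question.
df = pd.DataFrame({"start_time": ["00:00:00", "08:00:00", "08:20:05", "08:29:20"]})

# Parse the HH:MM:SS strings as durations since midnight.
starts = pd.to_timedelta(df["start_time"])

# shift(-1) moves every value up one row, so row i now holds row i+1's
# start time. The last row has no successor and becomes NaT.
next_starts = starts.shift(-1)

# One second before the next event starts.
result = df.assign(end_time=next_starts - pd.Timedelta(seconds=1))
print(result)
```

The last row comes out as NaT because there is no following row to take a start time from, which matches the output above.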

A little cleaned up and returning formatted strings:

s = pd.to_timedelta(df.start_time).shift(-1).sub(pd.offsets.Second(1))

df.assign(end_time=s.add(pd.Timestamp('now').normalize()).dt.time.astype(str))

  start_time  end_time
0   00:00:00  07:59:59
1   08:00:00  08:20:04
2   08:20:05  08:29:19
3   08:29:20  08:29:20
4   08:29:21  08:35:14
5   08:35:15  08:55:21
6   08:55:22  08:57:20
7   08:57:21  09:02:23
8   09:02:24  09:14:07
9   09:14:08       NaT
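
In the snippet above, `pd.Timestamp('now').normalize()` is only there to turn the timedeltas into datetimes so that a time of day can be pulled out. If you would rather not depend on the current date, or want something other than the string 'NaT' in the last row, a variant along these lines might work. This is a sketch, not the original method: the anchor date `2000-01-01` and the empty-string placeholder are arbitrary choices of mine.

```python
import pandas as pd

# Sample data: the first four start times from the question.
df = pd.DataFrame({"start_time": ["00:00:00", "08:00:00", "08:20:05", "08:29:20"]})

# Anchor the times to an arbitrary fixed date (an assumption made only so
# the values can be handled as datetimes; the date is dropped again below).
starts = pd.to_datetime("2000-01-01 " + df["start_time"])

# Next row's start time minus one second; the last row becomes NaT.
ends = starts.shift(-1) - pd.Timedelta(seconds=1)

# Format back to HH:MM:SS. The missing last row comes out as a null value,
# so fillna picks the placeholder to show instead.
df["end_time"] = ends.dt.strftime("%H:%M:%S").fillna("")
print(df)
```

If the last event should end at some known time instead, pass that string to `fillna` in place of the empty string.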
