Calculating new column value in dataframe based on next row's column value
I haven't worked much with Python over the past year, so I'm rather rusty, and I'm just getting back into coding.
I have a dataframe of events that has a 'start_time' column. What I need to do is create an 'end_time' column whose value is 1 second less than the next row's start_time. This is needed for event time calculations.
The desired output:
start_time end_time
0 00:00:00 07:59:59
1 08:00:00 08:20:04
2 08:20:05 08:29:19
3 08:29:20 08:29:20
4 08:29:21 08:35:14
5 08:35:15 08:55:21
6 08:55:22 08:57:20
7 08:57:21 09:02:23
8 09:02:24 09:14:07
9 09:14:08 09:15:03
I currently have code that accomplishes this, but from everything I've read here, and from what I remember, I really shouldn't be iterating through a dataframe in a for loop.
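for ndx, row in df.iterrows():
    # Skip the last row: it has no next row to read a start time from.
    if ndx != df[df.columns[0]].count() - 1:
        # Column 8 is start_time and column 9 is end_time in my dataframe.
        df.iloc[ndx, 9] = pd.to_datetime(df.iloc[ndx+1, 8]) - timedelta(seconds=1)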
(Hey, it works, but it's slow ...)
How do I do this pythonically? I know the solution should be something like this:
df['end_time'] = pd.to_datetime(df['start_time']) - timedelta(seconds=1)
But this subtracts 1 second from the start_time in the same row. I'm not quite sure how to access the next row's start time in this way.
Any and all help is greatly appreciated!
offsets
df.assign(end_time=pd.to_timedelta(df.start_time).shift(-1).sub(pd.offsets.Second(1)))
start_time end_time
0 00:00:00 0 days 07:59:59
1 08:00:00 0 days 08:20:04
2 08:20:05 0 days 08:29:19
3 08:29:20 0 days 08:29:20
4 08:29:21 0 days 08:35:14
5 08:35:15 0 days 08:55:21
6 08:55:22 0 days 08:57:20
7 08:57:21 0 days 09:02:23
8 09:02:24 0 days 09:14:07
9 09:14:08 NaT
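For anyone who wants to run this end to end, here is a minimal step-by-step sketch of the same idea. It assumes start_time holds HH:MM:SS strings; the four-row sample frame and the names starts and next_starts are made up for illustration, and pd.Timedelta(seconds=1) is used in place of pd.offsets.Second(1).

```python
import pandas as pd

# Hypothetical sample data: the first few start times from the question.
df = pd.DataFrame({"start_time": ["00:00:00", "08:00:00", "08:20:05", "08:29:20"]})

# Parse the HH:MM:SS strings as durations since midnight.
starts = pd.to_timedelta(df["start_time"])

# shift(-1) moves every value up one row, so row i now holds row i+1's
# start time; the last row has no successor and becomes NaT.
next_starts = starts.shift(-1)

# Subtract one second to get each event's end time.
df["end_time"] = next_starts - pd.Timedelta(seconds=1)

print(df)
```

The shift(-1) call is what replaces the loop's ndx+1 lookup: the whole column is realigned at once instead of row by row.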
A little cleaned up and returning formatted strings:
s = pd.to_timedelta(df.start_time).shift(-1).sub(pd.offsets.Second(1))
df.assign(end_time=s.add(pd.Timestamp('now').normalize()).dt.time.astype(str))
start_time end_time
0 00:00:00 07:59:59
1 08:00:00 08:20:04
2 08:20:05 08:29:19
3 08:29:20 08:29:20
4 08:29:21 08:35:14
5 08:35:15 08:55:21
6 08:55:22 08:57:20
7 08:57:21 09:02:23
8 09:02:24 09:14:07
9 09:14:08 NaT
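Note that the last row above ends up as the literal string 'NaT'. If you would rather keep it as a real missing value, one possible variation is to format with dt.strftime instead. This is only a sketch: the anchor date is arbitrary (any date works, since only the time part is kept), and the sample frame is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: a few start times from the question.
df = pd.DataFrame({"start_time": ["00:00:00", "08:00:00", "08:20:05"]})

# End time = next row's start time minus one second (NaT for the last row).
s = pd.to_timedelta(df["start_time"]).shift(-1) - pd.Timedelta(seconds=1)

# Arbitrary fixed date, used only so the durations can be formatted as times.
anchor = pd.Timestamp("2000-01-01")

# strftime leaves missing values as NaN rather than the string 'NaT'.
end_time = (anchor + s).dt.strftime("%H:%M:%S")

df = df.assign(end_time=end_time)
print(df)
```

The final row has no next start_time to derive an end from, so its value has to come from somewhere else; with a real missing value you can fill it in afterwards with fillna.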