
Calculating new column value in dataframe based on next row's column value

I haven't worked much with Python over the past year, so I'm rather rusty as I get back into coding.

I have a dataframe of events that has a 'start_time' column. I need to create an 'end_time' column whose value is 1 second less than the next row's start_time. This is needed for event time calculations.

The desired output:

  start_time  end_time
0   00:00:00  07:59:59
1   08:00:00  08:20:04
2   08:20:05  08:29:19
3   08:29:20  08:29:20
4   08:29:21  08:35:14
5   08:35:15  08:55:21
6   08:55:22  08:57:20
7   08:57:21  09:02:23
8   09:02:24  09:14:07
9   09:14:08  09:15:03

I currently have code that accomplishes this, but from everything I've read here, and from what I remember, I really shouldn't be iterating through a dataframe in a for loop.

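import pandas as pd
from datetime import timedelta
for ndx, row in df.iterrows():
    # skip the last row, which has no next row to read from
    if ndx != df[df.columns[0]].count() - 1:
        df.iloc[ndx, 9] = pd.to_datetime(df.iloc[ndx+1, 8]) - timedelta(seconds=1)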

(Hey, it works, but it's slow ...)

How do I do this pythonically? I know the solution should be something like this:

df['end_time'] = pd.to_datetime(df['start_time']) - timedelta(seconds=1)

But this subtracts 1 second from the start_time in the same row. I'm not quite sure how to access the next row's start time in this way.

Any and all help is greatly appreciated!

offsets

df.assign(end_time=pd.to_timedelta(df.start_time).shift(-1).sub(pd.offsets.Second(1)))

  start_time        end_time
0   00:00:00 0 days 07:59:59
1   08:00:00 0 days 08:20:04
2   08:20:05 0 days 08:29:19
3   08:29:20 0 days 08:29:20
4   08:29:21 0 days 08:35:14
5   08:35:15 0 days 08:55:21
6   08:55:22 0 days 08:57:20
7   08:57:21 0 days 09:02:23
8   09:02:24 0 days 09:14:07
9   09:14:08             NaT
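
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The sample dataframe is rebuilt from the start_time values in the question, and `pd.Timedelta(seconds=1)` is used in place of `pd.offsets.Second(1)`; that swap is my own choice for illustration, not part of the answer above.

```python
import pandas as pd

# Sample data reconstructed from the question's desired output
df = pd.DataFrame({
    "start_time": [
        "00:00:00", "08:00:00", "08:20:05", "08:29:20", "08:29:21",
        "08:35:15", "08:55:22", "08:57:21", "09:02:24", "09:14:08",
    ]
})

# Parse the "HH:MM:SS" strings as durations since midnight
starts = pd.to_timedelta(df["start_time"])

# shift(-1) moves every value up one row, so row i sees row i+1's start;
# the last row has no successor and becomes NaT
end_time = starts.shift(-1) - pd.Timedelta(seconds=1)

result = df.assign(end_time=end_time)
print(result)
```

The key step is `shift(-1)`, which aligns each row with the next row's value without any explicit loop.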

A little cleaned up and returning formatted strings:

s = pd.to_timedelta(df.start_time).shift(-1).sub(pd.offsets.Second(1))

df.assign(end_time=s.add(pd.Timestamp('now').normalize()).dt.time.astype(str))

  start_time  end_time
0   00:00:00  07:59:59
1   08:00:00  08:20:04
2   08:20:05  08:29:19
3   08:29:20  08:29:20
4   08:29:21  08:35:14
5   08:35:15  08:55:21
6   08:55:22  08:57:20
7   08:57:21  09:02:23
8   09:02:24  09:14:07
9   09:14:08       NaT
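
If adding today's date just to get at the time component feels roundabout, one possible alternative (an assumption on my part, not what the answer does) is to parse the strings as datetimes with an explicit format and format the result back with `dt.strftime`. The three-row dataframe below is a shortened, hypothetical version of the question's data.

```python
import pandas as pd

# Shortened sample, taken from the first rows of the question's data
df = pd.DataFrame({"start_time": ["00:00:00", "08:00:00", "08:20:05"]})

# Parse as datetimes; the date part is a placeholder and is discarded below
parsed = pd.to_datetime(df["start_time"], format="%H:%M:%S")

# Next row's start minus one second, formatted back to "HH:MM:SS"
end_time = (parsed.shift(-1) - pd.Timedelta(seconds=1)).dt.strftime("%H:%M:%S")

result = df.assign(end_time=end_time)
print(result)
```

One difference to be aware of: with this approach the last row should come out as a missing value rather than the string 'NaT', so `isna()` and `fillna()` keep working on the new column.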

