Pandas flatten a time-series dataframe on same activity but different timestamps

I'm looking to flatten certain processes, basically by collapsing duplicates that come right after each other. Let's say I have a dataframe:

import pandas as pd

d = {'time': ['12-08-2020', '13-08-2020', '14-08-2020', '15-08-2020', '16-08-2020'], 'state': ['off', 'on', 'on', 'on', 'off']}
df = pd.DataFrame(data=d)

Then I would use shift() on the time column to create a "time_end" column, which is basically the next row's time. Result:

         time state    time_end
0  12-08-2020   off  13-08-2020
1  13-08-2020    on  14-08-2020
2  14-08-2020    on  15-08-2020
3  15-08-2020    on  16-08-2020
4  16-08-2020   off         NaN
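
For reference, the time_end column above can be built roughly as follows. This is a minimal sketch that assumes the dates are stored as plain strings and that "the next row's time" means a shift by -1:

```python
import pandas as pd

df = pd.DataFrame({
    'time': ['12-08-2020', '13-08-2020', '14-08-2020', '15-08-2020',
             '16-08-2020'],
    'state': ['off', 'on', 'on', 'on', 'off'],
})

# shift(-1) moves every value up one row, so each row sees the next row's time.
# The last row has no successor and is left as a missing value.
df['time_end'] = df['time'].shift(-1)

print(df)
```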

My question now is: how do I flatten it so that it actually becomes 3 rows, like this:

         time state    time_end
0  12-08-2020   off  13-08-2020
1  13-08-2020    on  16-08-2020
4  16-08-2020   off         NaN

For my code I don't need repeated "on" rows if they are followed by another "on". Any help would be appreciated.

We can label each run of consecutive identical state values with .shift() + .ne() + .cumsum().

Then, for each group (a run of consecutive identical state values), we take the first entry of time and the last entry of time_end using .groupby() + .agg(), as follows:

df['state_group'] = df['state'].ne(df['state'].shift()).cumsum()

df_out = df.groupby('state_group').agg({'time': 'first', 'state': 'first', 'time_end': 'last'}).reset_index(drop=True)

Result:

print(df_out)

         time state    time_end
0  12-08-2020   off  13-08-2020
1  13-08-2020    on  16-08-2020
2  16-08-2020   off        None

Just for information, the following interim dataframe, with its grouping of consecutive identical state values, is what the first line of code above produces. We aggregate on this grouping to get the desired flattened result.

         time state    time_end  state_group
0  12-08-2020   off  13-08-2020            1
1  13-08-2020    on  14-08-2020            2
2  14-08-2020    on  15-08-2020            2
3  15-08-2020    on  16-08-2020            2
4  16-08-2020   off         NaN            3
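
To see how the state_group labels come about, the chain can be split into its three steps. This is only an illustrative sketch; the intermediate names prev, changed and group are introduced here for clarity and are not part of the answer's code:

```python
import pandas as pd

state = pd.Series(['off', 'on', 'on', 'on', 'off'], name='state')

prev = state.shift()        # previous row's state; the first row has none (NaN)
changed = state.ne(prev)    # True wherever the state differs from the row before
group = changed.cumsum()    # running count of changes, used as the run label

print(pd.DataFrame({
    'state': state,
    'prev': prev,
    'changed': changed,
    'state_group': group,
}))
```

Because the first row is compared with a missing value, it always counts as a change, so the labels start at 1.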

We can filter the DataFrame to the rows where the current row's state value does not equal the previous row's state value, then create the time_end column by shifting the filtered time column back by one:

import pandas as pd

df = pd.DataFrame(data={
    'time': ['12-08-2020', '13-08-2020', '14-08-2020', '15-08-2020',
             '16-08-2020'],
    'state': ['off', 'on', 'on', 'on', 'off']
})

new_df = df[df['state'].ne(df['state'].shift())].reset_index(drop=True)
new_df['time_end'] = new_df['time'].shift(-1)

new_df:

         time state    time_end
0  12-08-2020   off  13-08-2020
1  13-08-2020    on  16-08-2020
2  16-08-2020   off         NaN
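
If this is needed in more than one place, the filter approach could be wrapped in a small function. The sketch below is one possible way to do that: flatten_runs is a hypothetical helper name (not a pandas function), and parsing the dates with a day-month-year format is an assumption about the data.

```python
import pandas as pd


def flatten_runs(df, state_col='state', time_col='time'):
    """Keep the first row of each run of identical consecutive states.

    Hypothetical helper for illustration; not part of pandas.
    """
    # True on every row whose state differs from the previous row's state.
    starts = df[state_col].ne(df[state_col].shift())
    out = df.loc[starts].reset_index(drop=True)
    # A run ends when the next run starts; the final run has no end yet.
    out['time_end'] = out[time_col].shift(-1)
    return out


df = pd.DataFrame({
    # Assumption: the strings are day-month-year dates.
    'time': pd.to_datetime(
        ['12-08-2020', '13-08-2020', '14-08-2020', '15-08-2020',
         '16-08-2020'],
        format='%d-%m-%Y'),
    'state': ['off', 'on', 'on', 'on', 'off'],
})

flat = flatten_runs(df)
print(flat)
```

With real datetimes, the open-ended last row holds NaT rather than NaN, and durations can be computed directly as flat['time_end'] - flat['time'].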
