Pandas: get date ranges from a time-series column

Question

I have a dataframe which looks something like this:

id  ts  factor
A   2020-01-01  1
A   2020-01-02  1
A   2020-01-03  1
A   2020-01-04  1
A   2020-01-05  1
A   2020-01-06  10
A   2020-01-07  10
A   2020-01-08  10
A   2020-01-09  10
A   2020-01-10  10
A   2020-01-11  10
A   2020-01-12  10
A   2020-01-13  10
A   2020-01-14  10
A   2020-01-15  10
A   2020-01-16  10
A   2020-01-17  10
A   2020-01-18  1
A   2020-01-19  1
A   2020-01-20  1

My desired output is:

id  start_ts    end_ts  factor
A   2020-01-01  2020-01-05  1
A   2020-01-06  2020-01-17  10
A   2020-01-18  2020-01-20  1

So far I can only think of grouping by factor and then taking the min and max, but that doesn't work for factor 1, because its two separate runs get merged into one group:

import numpy as np
df.groupby(["factor"]).agg({'ts': [np.min, np.max]})
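To see why grouping on factor alone fails, here is a minimal sketch that rebuilds the sample frame. It assumes ts holds real datetimes (built here with pd.date_range); ISO date strings would sort the same way.

```python
import pandas as pd

# Rebuild the sample frame from the question: factor 1 for Jan 1-5,
# 10 for Jan 6-17, then 1 again for Jan 18-20.
df = pd.DataFrame({
    "id": "A",
    "ts": pd.date_range("2020-01-01", "2020-01-20", freq="D"),
    "factor": [1] * 5 + [10] * 12 + [1] * 3,
})

# Grouping on the factor value alone merges both runs of factor 1
# into a single group, so its min/max spans the whole period.
naive = df.groupby("factor")["ts"].agg(["min", "max"])
print(naive)
```

The factor 1 row comes out as 2020-01-01 to 2020-01-20, covering the factor 10 period in between, which is why an extra key that separates consecutive runs is needed.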

How can I achieve this output?

Answer 1

Use cumsum on a comparison of factor with its shift to find the factor blocks, then add the blocks to the groupby:

blocks = df['factor'].ne(df['factor'].shift()).cumsum()
df.groupby(['id','factor',blocks], sort=False)['ts'].agg(['min','max'])

Output:

                         min         max
id factor factor                        
A  1      1       2020-01-01  2020-01-05
   10     2       2020-01-06  2020-01-17
   1      3       2020-01-18  2020-01-20
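The blocks key is easier to follow when its intermediate values are printed. Below is a minimal, self-contained sketch on a reconstruction of the question's data (ts is assumed to be datetimes via pd.date_range). The key is renamed to a hypothetical "block" to avoid the duplicate factor level name in the output above, which appears because the comparison inherits the name of df['factor'].

```python
import pandas as pd

df = pd.DataFrame({
    "id": "A",
    "ts": pd.date_range("2020-01-01", "2020-01-20", freq="D"),
    "factor": [1] * 5 + [10] * 12 + [1] * 3,
})

# True wherever factor differs from the previous row. The first row is
# compared against NaN, so it is always True and starts block 1.
changed = df["factor"].ne(df["factor"].shift())

# The running total of the True flags gives each consecutive run its own
# integer label: 1 for Jan 1-5, 2 for Jan 6-17, 3 for Jan 18-20.
blocks = changed.cumsum().rename("block")

# sort=False keeps the groups in order of first appearance.
ranges = df.groupby(["id", "factor", blocks], sort=False)["ts"].agg(["min", "max"])
print(blocks.tolist())
print(ranges)
```

Because the two factor 1 runs now carry different block labels (1 and 3), they end up in separate groups.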

Answer 2

A slightly updated variant of @Quang Hoang's answer, with a named grouping key:

blocks = df['factor'].ne(df['factor'].shift()).cumsum()
blocks = blocks.rename("group")

df2 = df.groupby(['id', blocks,'factor']).agg(
    start_ts=('ts', 'min'),
    end_ts=('ts', 'max'))\
    .reset_index()\
    .drop("group", axis=1)

Output:

print(df2)
  id  factor    start_ts      end_ts
0  A       1  2020-01-01  2020-01-05
1  A      10  2020-01-06  2020-01-17
2  A       1  2020-01-18  2020-01-20
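Both answers compare each row with the row directly above it, so they assume the rows are already in time order within each id. A minimal sketch that sorts first is shown below. The function name factor_ranges and the two-id sample rows are hypothetical, made up for illustration.

```python
import pandas as pd

# Hypothetical data: two ids whose rows arrive interleaved and out of order.
df = pd.DataFrame({
    "id": ["B", "A", "A", "B", "A", "B", "A"],
    "ts": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-01",
                          "2020-01-01", "2020-01-02", "2020-01-03",
                          "2020-01-04"]),
    "factor": [1, 10, 1, 1, 1, 5, 10],
})


def factor_ranges(frame: pd.DataFrame) -> pd.DataFrame:
    """Collapse consecutive equal-factor rows into start/end ranges per id (hypothetical helper)."""
    # Run detection compares each row with the one above it, so put the
    # rows in time order within each id first.
    frame = frame.sort_values(["id", "ts"]).reset_index(drop=True)
    group = frame["factor"].ne(frame["factor"].shift()).cumsum().rename("group")
    return (
        frame.groupby(["id", group, "factor"])
        .agg(start_ts=("ts", "min"), end_ts=("ts", "max"))
        .reset_index()
        .drop(columns="group")
    )


print(factor_ranges(df))
```

Grouping by id as well keeps a run from continuing across an id boundary even when two neighbouring ids share a factor value.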

Answer 1: 2 votes, accepted, 2022-05-24 18:00:23

Answer 2: 1 vote, 2022-05-24 18:35:51
