Pandas get date range from timeseries column

I have a DataFrame that looks something like this:

id  ts  factor
A   2020-01-01  1
A   2020-01-02  1
A   2020-01-03  1
A   2020-01-04  1
A   2020-01-05  1
A   2020-01-06  10
A   2020-01-07  10
A   2020-01-08  10
A   2020-01-09  10
A   2020-01-10  10
A   2020-01-11  10
A   2020-01-12  10
A   2020-01-13  10
A   2020-01-14  10
A   2020-01-15  10
A   2020-01-16  10
A   2020-01-17  10
A   2020-01-18  1
A   2020-01-19  1
A   2020-01-20  1

My desired output is:

id  start_ts    end_ts  factor
A   2020-01-01  2020-01-05  1
A   2020-01-06  2020-01-17  10
A   2020-01-18  2020-01-20  1

So far I can only think of grouping by factor and then taking the min and max, but that doesn't work for factor 1, because its two separate runs end up in the same group:

df.groupby(["factor"]).agg({'ts': ['min', 'max']})

How can I achieve this output?
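
To make the problem concrete, here is a small sketch of the attempt above. The way the frame is built is an assumption reconstructed from the printed sample; only the printed rows come from the question. Grouping on factor alone collapses both runs of factor 1 into a single row.

```python
import pandas as pd

# Sample frame rebuilt from the printed data in the question (assumed construction).
df = pd.DataFrame({
    "id": "A",
    "ts": pd.date_range("2020-01-01", "2020-01-20", freq="D"),
    "factor": [1] * 5 + [10] * 12 + [1] * 3,
})

# Grouping on factor alone: every row with factor 1 lands in the same group,
# so the two separate runs cannot be told apart.
naive = df.groupby("factor")["ts"].agg(["min", "max"])
print(naive)
```

Here the row for factor 1 spans the whole date range, 2020-01-01 to 2020-01-20, instead of the two ranges in the desired output.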

Compare factor with its shifted values and take the cumsum of the result to label each block of consecutive equal factor values, then add those labels to the groupby:

blocks = df['factor'].ne(df['factor'].shift()).cumsum()
df.groupby(['id','factor',blocks], sort=False)['ts'].agg(['min','max'])

Output:

                         min         max
id factor factor                        
A  1      1       2020-01-01  2020-01-05
   10     2       2020-01-06  2020-01-17
   1      3       2020-01-18  2020-01-20
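
To see where the block labels come from, here is a self-contained sketch that prints the intermediate values. The frame construction is an assumption based on the printed sample, and the rename to "block" is added here only so that the result does not carry two index levels named factor.

```python
import pandas as pd

# Sample frame rebuilt from the printed data in the question (assumed construction).
df = pd.DataFrame({
    "id": "A",
    "ts": pd.date_range("2020-01-01", "2020-01-20", freq="D"),
    "factor": [1] * 5 + [10] * 12 + [1] * 3,
})

# True wherever factor differs from the previous row; the first row is
# compared against NaN and is therefore True as well.
changed = df["factor"].ne(df["factor"].shift())

# The running sum of those flags gives every consecutive run its own label.
blocks = changed.cumsum().rename("block")

result = df.groupby(["id", "factor", blocks], sort=False)["ts"].agg(["min", "max"])
print(blocks.tolist())
print(result)
```

The labels are 1 for the first five rows, 2 for the twelve rows with factor 10, and 3 for the last three rows, which is what keeps the two runs of factor 1 apart.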

A slightly updated variant of @Quang Hoang's answer, with a named grouping:

blocks = df['factor'].ne(df['factor'].shift()).cumsum()
blocks = blocks.rename("group")

df2 = df.groupby(['id', blocks,'factor']).agg(
    start_ts=('ts', 'min'),
    end_ts=('ts', 'max'))\
    .reset_index()\
    .drop("group", axis=1)

Output:

print(df2)
  id  factor    start_ts      end_ts
0  A       1  2020-01-01  2020-01-05
1  A      10  2020-01-06  2020-01-17
2  A       1  2020-01-18  2020-01-20
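
Both answers compare each row with the one directly above it, so they rely on the rows being ordered by ts within each id. The sketch below assumes the frame might arrive unordered; the shuffling and the frame construction are illustrative assumptions, not part of the question. It sorts first, then applies the same block labelling and puts the columns in the order the question asks for.

```python
import pandas as pd

# Sample frame rebuilt from the printed data in the question (assumed construction).
df = pd.DataFrame({
    "id": "A",
    "ts": pd.date_range("2020-01-01", "2020-01-20", freq="D"),
    "factor": [1] * 5 + [10] * 12 + [1] * 3,
})

# Shuffle the rows to simulate unordered input (illustration only).
shuffled = df.sample(frac=1, random_state=0)

# Restore the order first: shift() compares physically adjacent rows.
ordered = shuffled.sort_values(["id", "ts"]).reset_index(drop=True)

# Label each run of consecutive equal factor values.
blocks = ordered["factor"].ne(ordered["factor"].shift()).cumsum().rename("group")

out = (
    ordered.groupby(["id", blocks, "factor"])
    .agg(start_ts=("ts", "min"), end_ts=("ts", "max"))
    .reset_index()
    .drop(columns="group")[["id", "start_ts", "end_ts", "factor"]]
)
print(out)
```

Skipping the sort on shuffled data would start a new block at almost every row, because neighbouring rows would no longer belong to the same run.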
