Pandas get date range from timeseries column
I have a dataframe which looks something like this:
id ts factor
A 2020-01-01 1
A 2020-01-02 1
A 2020-01-03 1
A 2020-01-04 1
A 2020-01-05 1
A 2020-01-06 10
A 2020-01-07 10
A 2020-01-08 10
A 2020-01-09 10
A 2020-01-10 10
A 2020-01-11 10
A 2020-01-12 10
A 2020-01-13 10
A 2020-01-14 10
A 2020-01-15 10
A 2020-01-16 10
A 2020-01-17 10
A 2020-01-18 1
A 2020-01-19 1
A 2020-01-20 1
My desired output is:
id start_ts end_ts factor
A 2020-01-01 2020-01-05 1
A 2020-01-06 2020-01-17 10
A 2020-01-18 2020-01-20 1
So far I can only think of a groupby on factor followed by min and max, but that doesn't work for factor 1, which appears in two separate runs:
df.groupby(["factor"]).agg({'ts': ['min', 'max']})
How can I achieve this output?
Use cumsum on the comparison of factor with its shift to find the factor blocks, then add the result to the groupby:
blocks = df['factor'].ne(df['factor'].shift()).cumsum()
df.groupby(['id','factor',blocks], sort=False)['ts'].agg(['min','max'])
Output:
                        min         max
id factor factor
A  1      1      2020-01-01  2020-01-05
   10     2      2020-01-06  2020-01-17
   1      3      2020-01-18  2020-01-20
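To make the block trick easier to follow, here is a minimal sketch that rebuilds the sample frame from the question and exposes the intermediate values. The names `changed` and `steps` are only illustrative, and the frame is reconstructed under the assumption that `ts` holds daily datetimes.

```python
import pandas as pd

# Rebuild the sample frame from the question (one id, 20 daily timestamps).
factors = [1] * 5 + [10] * 12 + [1] * 3
df = pd.DataFrame({
    "id": "A",
    "ts": pd.date_range("2020-01-01", periods=len(factors), freq="D"),
    "factor": factors,
})

# True wherever factor differs from the previous row.
# The first row is compared against NaN, so it also counts as a change.
changed = df["factor"].ne(df["factor"].shift())

# A running count of the changes gives every consecutive run its own label.
blocks = changed.cumsum()

# Show the first and last row of each run next to its flag and block label.
steps = df.assign(changed=changed, block=blocks)
print(steps.iloc[[0, 4, 5, 16, 17, 19]])
```

Because the two runs of factor 1 receive different block labels, grouping on the label keeps them apart, which a plain groupby on factor cannot do.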
A slightly updated variant of @Quang Hoang's answer, with named grouping:
blocks = df['factor'].ne(df['factor'].shift()).cumsum()
blocks = blocks.rename("group")
df2 = (
    df.groupby(['id', blocks, 'factor'])
      .agg(start_ts=('ts', 'min'), end_ts=('ts', 'max'))
      .reset_index()
      .drop("group", axis=1)
)
Output:
print(df2)
  id  factor    start_ts      end_ts
0  A       1  2020-01-01  2020-01-05
1  A      10  2020-01-06  2020-01-17
2  A       1  2020-01-18  2020-01-20
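If the column order from the question (id, start_ts, end_ts, factor) matters, the same idea can be wrapped in a small helper. This is only a sketch: `collapse_runs` is a hypothetical name, and sorting by id and ts first is an assumption about how the data should be ordered, not something the question states.

```python
import pandas as pd

def collapse_runs(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse consecutive rows sharing id and factor into one date range."""
    # Assumption: runs are defined per id in chronological order.
    df = df.sort_values(["id", "ts"])

    # A new run starts whenever id or factor differs from the previous row.
    new_run = df["id"].ne(df["id"].shift()) | df["factor"].ne(df["factor"].shift())
    run = new_run.cumsum().rename("run")

    # Named aggregation fixes both the column names and their order.
    return (
        df.groupby(run)
          .agg(
              id=("id", "first"),
              start_ts=("ts", "min"),
              end_ts=("ts", "max"),
              factor=("factor", "first"),
          )
          .reset_index(drop=True)
    )

# Sample frame rebuilt from the question.
factors = [1] * 5 + [10] * 12 + [1] * 3
df = pd.DataFrame({
    "id": "A",
    "ts": pd.date_range("2020-01-01", periods=len(factors), freq="D"),
    "factor": factors,
})

result = collapse_runs(df)
print(result)
```

Including id in the change check means a run also ends where one id stops and the next begins, even if both happen to carry the same factor.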