Create custom sized bins of datetime Series in Pandas
I have multiple Pandas Series of datetime64 values that I want to bin into groups using arbitrary bin sizes.
I've found the Series.to_period() function, which does exactly what I want, except that I need more control over the bin size. to_period allows me to bin by full years, months, days, etc., but I also want to bin by 5 years, 6 hours or 15 minutes. A syntax like 5Y, 6H or 15min works in other corners of Pandas, but apparently not here.
import pandas as pd
s = pd.Series(["2020-02-01", "2020-02-02", "2020-02-03", "2020-02-04"], dtype="datetime64[ns]")
# Output as expected
s.dt.to_period("M").value_counts()
2020-02 4
Freq: M, dtype: int64
# Output as expected
s.dt.to_period("W").value_counts()
2020-01-27/2020-02-02 2
2020-02-03/2020-02-09 2
Freq: W-SUN, dtype: int64
# Output as expected
s.dt.to_period("D").value_counts()
2020-02-01 1
2020-02-02 1
2020-02-03 1
2020-02-04 1
Freq: D, dtype: int64
# Output unexpected (and wrong?)
s.dt.to_period("2D").value_counts()
2020-02-01 1
2020-02-02 1
2020-02-03 1
2020-02-04 1
Freq: 2D, dtype: int64
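A minimal sketch of what seems to be happening, assuming a recent pandas: with a multiplied frequency such as 2D, to_period turns each timestamp into a period that starts at that timestamp, so the four dates end up in four different periods. Series.dt.floor, by contrast, snaps each timestamp down onto a grid of the given fixed frequency counted from the Unix epoch (1970-01-01), so neighbouring dates can share a bin. floor only accepts fixed frequencies (e.g. 2D, 6h, 15min), not calendar ones like 5Y.

```python
import pandas as pd

# The four dates from the question.
s = pd.Series(pd.to_datetime(["2020-02-01", "2020-02-02", "2020-02-03", "2020-02-04"]))

# to_period("2D"): each Period starts at its own timestamp, so no date shares one.
periods = s.dt.to_period("2D")
print(periods.nunique())  # 4

# dt.floor("2D"): snap down onto a 2-day grid counted from 1970-01-01.
floored = s.dt.floor("2D")
counts = floored.value_counts().sort_index()
print(counts)
```

Because the 2-day grid is counted from the epoch, 2020-02-01 falls into the bin starting 2020-01-31, so floor gives fixed-size bins but not necessarily bins aligned to the first date in the data.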
I believe that pd.Grouper is what you're looking for.
https://pandas.pydata.org/docs/reference/api/pandas.Grouper.html
Besides the standard frequencies, it accepts multiples of them (for example 17min, 6H or 2D); the available aliases are listed here: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
From the documentation:
>>> import numpy as np
>>> import pandas as pd
>>> start, end = '2000-10-01 23:30:00', '2000-10-02 00:30:00'
>>> rng = pd.date_range(start, end, freq='7min')
>>> ts = pd.Series(np.arange(len(rng)) * 3, index=rng)
>>> ts
2000-10-01 23:30:00 0
2000-10-01 23:37:00 3
2000-10-01 23:44:00 6
2000-10-01 23:51:00 9
2000-10-01 23:58:00 12
2000-10-02 00:05:00 15
2000-10-02 00:12:00 18
2000-10-02 00:19:00 21
2000-10-02 00:26:00 24
Freq: 7T, dtype: int64
>>> ts.groupby(pd.Grouper(freq='17min')).sum()
2000-10-01 23:14:00 0
2000-10-01 23:31:00 9
2000-10-01 23:48:00 21
2000-10-02 00:05:00 54
2000-10-02 00:22:00 24
Freq: 17T, dtype: int64
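pd.Grouper also takes an origin argument (available since pandas 1.1) that controls where the grid of bins starts. With the default origin='start_day', the 17-minute bins are counted from midnight of the first day, which is why the first bin above is labelled 23:14; origin='start' anchors the grid at the first timestamp instead. A sketch reusing the documentation's data:

```python
import numpy as np
import pandas as pd

# Same data as the documentation example: one value every 7 minutes.
start, end = "2000-10-01 23:30:00", "2000-10-02 00:30:00"
rng = pd.date_range(start, end, freq="7min")
ts = pd.Series(np.arange(len(rng)) * 3, index=rng)

# Default: 17-minute bins counted from midnight of the first day.
default_bins = ts.groupby(pd.Grouper(freq="17min")).sum()

# origin="start": 17-minute bins counted from the first timestamp (23:30).
start_bins = ts.groupby(pd.Grouper(freq="17min", origin="start")).sum()
print(start_bins)
```

Only the bin edges move; the bin width stays 17 minutes, so the totals are regrouped into four bins starting at 23:30 instead of five starting at 23:14.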
NOTE: If you'd like to .groupby a certain column, use the following syntax: df.groupby(pd.Grouper(key="my_col", freq="3M"))