
Create custom sized bins of datetime Series in Pandas

I have multiple Pandas Series of datetime64 values that I want to bin into groups using arbitrary bin sizes.

I've found the Series.dt.to_period() function, which does exactly what I want except that I need more control over the chosen bin size. to_period allows me to bin by full years, months, days, etc., but I also want to bin by 5 years, 6 hours or 15 minutes. Using a syntax like 5Y, 6H or 15min works in other corners of Pandas but apparently not here.

import pandas as pd

s = pd.Series(["2020-02-01", "2020-02-02", "2020-02-03", "2020-02-04"], dtype="datetime64[ns]")

# Output as expected
s.dt.to_period("M").value_counts()
2020-02    4
Freq: M, dtype: int64

# Output as expected
s.dt.to_period("W").value_counts()
2020-01-27/2020-02-02    2
2020-02-03/2020-02-09    2
Freq: W-SUN, dtype: int64

# Output as expected
s.dt.to_period("D").value_counts()
2020-02-01    1
2020-02-02    1
2020-02-03    1
2020-02-04    1
Freq: D, dtype: int64

# Output unexpected (and wrong?)
s.dt.to_period("2D").value_counts()
2020-02-01    1
2020-02-02    1
2020-02-03    1
2020-02-04    1
Freq: 2D, dtype: int64

I believe that pd.Grouper is what you're looking for.

https://pandas.pydata.org/docs/reference/api/pandas.Grouper.html

It lets you use multiples of a frequency (for example 17min) in addition to the standard ones: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases

From the documentation:

>>> import numpy as np
>>> import pandas as pd
>>> start, end = '2000-10-01 23:30:00', '2000-10-02 00:30:00'
>>> rng = pd.date_range(start, end, freq='7min')
>>> ts = pd.Series(np.arange(len(rng)) * 3, index=rng)
>>> ts
2000-10-01 23:30:00     0
2000-10-01 23:37:00     3
2000-10-01 23:44:00     6
2000-10-01 23:51:00     9
2000-10-01 23:58:00    12
2000-10-02 00:05:00    15
2000-10-02 00:12:00    18
2000-10-02 00:19:00    21
2000-10-02 00:26:00    24
Freq: 7T, dtype: int64

>>> ts.groupby(pd.Grouper(freq='17min')).sum()
2000-10-01 23:14:00     0
2000-10-01 23:31:00     9
2000-10-01 23:48:00    21
2000-10-02 00:05:00    54
2000-10-02 00:22:00    24
Freq: 17T, dtype: int64
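
The first label, 23:14, can look surprising given that the data starts at 23:30. As far as I understand it, with the default origin="start_day" the bin edges are counted in 17-minute steps from midnight of the first day, and 23:14 is 82 x 17 minutes after midnight. Here is a rough sketch that reproduces the same sums by hand; the names width, origin, steps, labels, manual and grouped are my own, not from the documentation:

```python
import numpy as np
import pandas as pd

# Same data as in the documentation example above.
rng = pd.date_range("2000-10-01 23:30:00", "2000-10-02 00:30:00", freq="7min")
ts = pd.Series(np.arange(len(rng)) * 3, index=rng)

width = pd.Timedelta(minutes=17)
# Assumption: bins are anchored at midnight of the first day (origin="start_day").
origin = ts.index[0].normalize()

# Number of whole 17-minute steps between the origin and each timestamp.
steps = (ts.index - origin) // width
# Left edge of the bin that each timestamp falls into.
labels = origin + pd.to_timedelta(steps * 17, unit="min")

manual = ts.groupby(labels).sum()
grouped = ts.groupby(pd.Grouper(freq="17min")).sum()
print(manual)
```

If the assumption about the origin holds, manual and grouped should contain the same values.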

NOTE: If you'd like to .groupby a certain column rather than the index, use the following syntax: df.groupby(pd.Grouper(key="my_col", freq="3M"))
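
Applied to the Series from the question, that could look roughly like the sketch below. The column name "ts" and the variables df and counts are placeholders I made up, and I'm assuming the datetimes are the values of the Series (not its index), so it is first turned into a one-column DataFrame:

```python
import pandas as pd

s = pd.Series(
    ["2020-02-01", "2020-02-02", "2020-02-03", "2020-02-04"],
    dtype="datetime64[ns]",
)

# pd.Grouper(key=...) needs a named column, so wrap the Series in a DataFrame.
df = s.to_frame(name="ts")

# Count how many timestamps fall into each 2-day bin.
counts = df.groupby(pd.Grouper(key="ts", freq="2D")).size()
print(counts)
```

Unlike to_period("2D"), this puts two consecutive days into each bin. The same pattern should work with other multiples such as freq="15min".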
