
How to select data from the last four complete quarters of a timeseries with Pandas?

Say I've got a dataframe with a datetime index which covers the last financial year and one day in the current financial year (the financial year starts on April 1):

           Units
date
2016-01-01   8734   
2016-06-30   6120
2016-09-30   7346
2016-12-31   5925
2016-03-31   7542
2016-06-30   9916
2016-09-30   9547
2016-12-31   8063
2017-01-01   7000
2017-03-31   5672
2017-04-01   7856

I'd like to be able to select the data for the last four complete quarters, in this case ignoring the first and last rows.

I know I can do this with slicing, thus:

df["2016-04-01":"2017-03-31"]

What's the most elegant, pythonic solution to filter the data according to the last four complete quarters programmatically?

You should first define your quarters. You can use pd.period_range for that with the correct freq, for example:

quarters = pd.period_range('2016Q1', '2017Q1', freq='Q-MAR')

This gives you a PeriodIndex, whose frequency you can change with asfreq to get the dates you want:

quarters.asfreq('D', 'E')

That gives you a PeriodIndex of quarter-end days that you can use to slice your index.
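The period_range and asfreq steps above can be combined into a small helper that works out the window from a reference date. This is only a sketch under stated assumptions: the function name last_four_complete_quarters and its today parameter are hypothetical, the quarter containing today is always treated as incomplete, and the sample frame is a sorted subset of the rows in the question.

```python
import pandas as pd

def last_four_complete_quarters(df, today, freq="Q-MAR"):
    """Rows of df dated within the four complete quarters before `today`.

    Hypothetical helper: the quarter containing `today` is always treated
    as incomplete, and df is assumed to have a DatetimeIndex.
    """
    current = pd.Period(pd.Timestamp(today), freq=freq)
    # Four consecutive quarters ending with the one just before `current`;
    # the frequency is inferred from the Period passed as `end`.
    quarters = pd.period_range(end=current - 1, periods=4)
    start, end = quarters[0].start_time, quarters[-1].end_time
    # A boolean mask avoids relying on the index being sorted.
    return df[(df.index >= start) & (df.index <= end)]

# A sorted subset of the sample rows from the question.
df = pd.DataFrame(
    {"Units": [7542, 9916, 9547, 8063, 7000, 5672, 7856]},
    index=pd.to_datetime([
        "2016-03-31", "2016-06-30", "2016-09-30", "2016-12-31",
        "2017-01-01", "2017-03-31", "2017-04-01",
    ]),
)
df.index.name = "date"

result = last_four_complete_quarters(df, today="2017-04-01")
print(result)
```

With today set to 2017-04-01 the window runs from 2016-04-01 to 2017-03-31, so the 2016-03-31 and 2017-04-01 rows are dropped. In real use you would pass pd.Timestamp.now() instead of a fixed date.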

There are more examples in the documentation.

pandas.DatetimeIndex.quarter might also be useful.

And then you can use groupby to aggregate easily.
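As a rough illustration of the groupby idea, the sketch below sums Units per quarter. Note that DatetimeIndex.quarter returns the calendar quarter (1 to 4) with no year attached, so this example groups on to_period("Q-MAR") instead, which keeps quarters from different years apart. The sample frame is again a sorted subset of the question's rows, and the variable names are assumptions made for the example.

```python
import pandas as pd

# A sorted subset of the sample rows from the question.
df = pd.DataFrame(
    {"Units": [7542, 9916, 9547, 8063, 7000, 5672, 7856]},
    index=pd.to_datetime([
        "2016-03-31", "2016-06-30", "2016-09-30", "2016-12-31",
        "2017-01-01", "2017-03-31", "2017-04-01",
    ]),
)

# Calendar quarter number of each row (no year information).
calendar_quarters = list(df.index.quarter)

# Quarter periods for a financial year ending in March, one label per row.
fiscal_quarters = df.index.to_period("Q-MAR")

# Total units per quarter.
totals = df.groupby(fiscal_quarters)["Units"].sum()
print(totals)
```

The two rows dated 2017-01-01 and 2017-03-31 fall in the same quarter and are summed together; every other row forms its own group.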

Using Alex's pointer to the DateOffset functionality in Pandas, together with the datetime module, I found a partial solution:

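import datetime
from pandas.tseries.offsets import BQuarterEnd, MonthBegin

now = datetime.datetime.now()
# Last business day of the most recent complete quarter.
end_year = now - BQuarterEnd(n=1)
# Roll back twelve month starts to reach the first day of the four-quarter window.
start_year = end_year - 12 * MonthBegin()
# Both bounds are Timestamps, so strftime can be called on them directly.
df[start_year.strftime("%Y-%m-%d"):end_year.strftime("%Y-%m-%d")]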
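One reason this may only be a partial solution: BQuarterEnd lands on the last business day of the quarter, which is not always the last calendar day. The sketch below uses a fixed, hypothetical reference date (2018-04-01, chosen because 2018-03-31 was a Saturday) to compare it with QuarterEnd. Swapping in QuarterEnd is a suggestion made here, not part of the answer above.

```python
import pandas as pd
from pandas.tseries.offsets import BQuarterEnd, MonthBegin, QuarterEnd

# Hypothetical "today"; in real use this would be pd.Timestamp.now().normalize().
reference = pd.Timestamp("2018-04-01")

# Last business day of the previous quarter (a Friday here).
b_end = reference - BQuarterEnd(n=1)

# Last calendar day of the previous quarter, for quarters ending Mar/Jun/Sep/Dec.
q_end = reference - QuarterEnd(n=1, startingMonth=3)

# First day of the four-quarter window, twelve month starts before the end.
start = q_end - 12 * MonthBegin()

print(b_end.date(), q_end.date(), start.date())
```

If the data can contain rows dated on a weekend, slicing up to q_end rather than b_end keeps a row such as 2018-03-31 inside the window.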
