How to iterate over time periods in pandas
Suppose I have a pandas Series with a DatetimeIndex at daily frequency. I want to iterate over this Series with an arbitrary frequency and an arbitrary lookback window. For example: iterate half-yearly with a lookback window of one year.
Something like this would be ideal:
for df_year in df.timegroup(freq='6m', lookback='1y'):
    # df_year will span one year of daily prices and be generated every 6 months
I know about TimeGrouper but haven't figured out how it could do this. In any case, I could just code this manually, but I was hoping for a clever pandas one-liner.
Edit: This is getting a bit closer:
pd.rolling_apply(df, 252, lambda s: s.sum(), freq=pd.datetools.BMonthEnd())
This doesn't quite work, because it applies a lookback window of 252*BMonthEnd(), whereas I'd like the two to be independent: a lookback window of 252 days, evaluated at the end of every month.
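For reference, the intent of the edit (a 252-row lookback, sampled once per month) can be sketched with the method-style rolling API. This is only a sketch under assumptions: the `prices` series is made-up illustration data, and "end of the month" is taken to mean the last row present in each calendar month.

```python
import numpy as np
import pandas as pd

# Made-up business-day data, for illustration only.
idx = pd.date_range("2011-01-03", "2014-12-31", freq="B")
prices = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# Rolling sum over the previous 252 rows (the lookback window).
rolled = prices.rolling(window=252).sum()

# Keep only the last available row of each calendar month, so the
# sampling frequency stays independent of the lookback length.
month_end = rolled.groupby([rolled.index.year, rolled.index.month]).tail(1)

# Drop the early months where fewer than 252 rows were available.
month_end = month_end.dropna()
```

Here the window length is set by `rolling(window=252)` and the sampling step by the group keys, so either one can be changed without affecting the other.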
I think this is what you are looking for.
Construct a series at some frequency. The values are all 1 here for clarity.
In [77]: i = pd.date_range('20110101','20150101',freq='B')
In [78]: s = pd.Series(1,index=i)
In [79]: s
Out[79]:
2011-01-03 1
2011-01-04 1
2011-01-05 1
2011-01-06 1
2011-01-07 1
..
2014-12-26 1
2014-12-29 1
2014-12-30 1
2014-12-31 1
2015-01-01 1
Freq: B, dtype: int64
In [80]: len(s)
Out[80]: 1044
Conform the index to another frequency. Here, every index element becomes the end of its month.
In [81]: s.index = s.index.to_period('M').to_timestamp('M')
In [82]: s
Out[82]:
2011-01-31 1
2011-01-31 1
2011-01-31 1
2011-01-31 1
2011-01-31 1
..
2014-12-31 1
2014-12-31 1
2014-12-31 1
2014-12-31 1
2015-01-31 1
dtype: int64
Then it's straightforward to resample to another frequency. In this case, this gives you the number of business days in each period.
In [83]: s.resample('3M',how='sum')
Out[83]:
2011-01-31 21
2011-04-30 64
2011-07-31 65
2011-10-31 66
2012-01-31 66
..
2014-01-31 66
2014-04-30 63
2014-07-31 66
2014-10-31 66
2015-01-31 44
Freq: 3M, dtype: int64
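Newer pandas releases no longer accept the `how=` keyword of `resample`, so the same three steps might be written as in the sketch below. It assumes offset objects (`pd.offsets.MonthEnd`) in place of the string aliases used in the session, and the exact bin labels may differ from the output shown above.

```python
import pandas as pd

# Business-day series of ones, as in the session above.
i = pd.date_range("2011-01-01", "2015-01-01", freq="B")
s = pd.Series(1, index=i)

# Roll every timestamp forward to the end of its month
# (dates already on a month end are left unchanged).
s.index = s.index + pd.offsets.MonthEnd(0)

# Aggregate into 3-month bins by calling the aggregation as a method
# rather than passing how='sum'.
counts = s.resample(pd.offsets.MonthEnd(3)).sum()
```

Each value in `counts` is the number of business days that fell into that bin, so the values sum to the length of the series.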
This solution provides a one-liner using a list comprehension. Starting from the left of the time series and iterating forward (backward iteration could also be done), each iteration returns a slice of the index whose length equals the lookback window, and the start advances by a step equal to the frequency. Note that the very last window is likely a stub shorter than the lookback window.
This method uses day counts rather than month or week offsets.
freq = 30 # Days
lookback = 60 # Days
idx = pd.date_range('2010-01-01', '2015-01-01')
[idx[(freq * n):(lookback + freq * n)] for n in range(int(len(idx) / freq))]
Out[86]:
[<class 'pandas.tseries.index.DatetimeIndex'>
[2010-01-01, ..., 2010-03-01]
Length: 60, Freq: D, Timezone: None,
<class 'pandas.tseries.index.DatetimeIndex'>
[2010-01-31, ..., 2010-03-31]
Length: 60, Freq: D, Timezone: None,
...
Length: 60, Freq: D, Timezone: None,
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-11-06, ..., 2015-01-01]
Length: 57, Freq: D, Timezone: None]
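If calendar offsets are preferred over fixed day counts, the same idea can be sketched with `pd.DateOffset`. The helper name `iter_lookback_windows` and its parameters are hypothetical (it is not a pandas API), and the half-open window convention `(start, end]` is an assumption made for this illustration.

```python
import pandas as pd


def iter_lookback_windows(series, step, lookback):
    """Yield (end, window) pairs; hypothetical helper, not part of pandas.

    `step` and `lookback` are pd.DateOffset objects. Each window covers
    the half-open interval (end - lookback, end].
    """
    last = series.index[-1]
    # The first window ends once a full lookback period is available.
    end = series.index[0] + lookback
    while end <= last:
        start = end - lookback
        mask = (series.index > start) & (series.index <= end)
        yield end, series[mask]
        end = end + step


idx = pd.date_range("2010-01-01", "2015-01-01")
s = pd.Series(1, index=idx)

# Half-yearly steps with a one-year lookback, as in the question.
windows = list(
    iter_lookback_windows(s, pd.DateOffset(months=6), pd.DateOffset(years=1))
)
```

Unlike the day-count version, every window here spans a full calendar year, so no stub is produced at the end; windows containing a leap day are one row longer than the others.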