Pandas weekly resampling

I have a DataFrame of daily market data (OHLCV) that I am resampling to weekly.

My specific requirement is that each index label in the weekly DataFrame must be the date of the first day of that week for which data is present in the daily DataFrame.

For example, in July 2022, the trading week beginning 4 July (for US stocks) should be labelled 5 July: 4 July was a holiday, so it does not appear in the daily DataFrame, and the first date in that week that does appear is 5 July.

The usual weekly resampling offset aliases and anchored offsets do not seem to have such an option.
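To make the limitation concrete, here is a minimal sketch with made-up data (the `close` column and its values are assumptions, not part of the original data). Resampling with an anchored weekly offset labels each bin by the calendar anchor, so the holiday week should still come out labelled 4 July even though the daily data has no row for that date.

```python
import pandas as pd

# Business days around the 4 July 2022 holiday, with the holiday itself removed
idx = pd.bdate_range("2022-06-27", "2022-07-08")
idx = idx[idx != pd.Timestamp("2022-07-04")]
daily = pd.DataFrame({"close": range(len(idx))}, index=idx)

# Weekly bins running Monday to Sunday, labelled with the Monday that opens the bin
weekly = daily["close"].resample("W-MON", closed="left", label="left").last()

weekly_labels = list(weekly.index.strftime("%Y-%m-%d"))
holiday_in_daily = pd.Timestamp("2022-07-04") in daily.index
print(weekly_labels, holiday_in_daily)
```

The second label is the Monday anchor, not the first date actually present in that week.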

I can achieve this specifically for US stocks by importing USFederalHolidayCalendar from pandas.tseries.holiday and then using

import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
bday_us = pd.offsets.CustomBusinessDay(calendar=USFederalHolidayCalendar())
dfw.index = dfw.index.map(lambda idx: bday_us.rollforward(idx))

where dfw is the weekly DataFrame, already resampled with the W-MON option.
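As a quick check of what this workaround does, the sketch below applies the same rollforward to a few hand-written Monday labels (the `monday_labels` index is a stand-in for `dfw.index`, which is not shown here). Only the label that falls on a US federal holiday should move.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# Business-day offset that skips weekends and US federal holidays
bday_us = pd.offsets.CustomBusinessDay(calendar=USFederalHolidayCalendar())

# Hypothetical Monday labels, standing in for the index of the weekly frame
monday_labels = pd.DatetimeIndex(["2022-06-27", "2022-07-04", "2022-07-11"])

# Roll each label forward to the next valid business day (unchanged if already valid)
rolled = pd.DatetimeIndex(monday_labels.map(bday_us.rollforward))
rolled_labels = list(rolled.strftime("%Y-%m-%d"))
print(rolled_labels)
```

The result depends entirely on the calendar passed to CustomBusinessDay, not on which dates are present in the daily data.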

However, this means I would need a different trading calendar for each exchange or market, which I would very much like to avoid.

Any pointers on a simple way to make each index label in the weekly DataFrame the date of the first day of that week available in the daily DataFrame would be much appreciated.

You want to group all days by calendar week (Mon-Sun), aggregate the data, and use the first observed date as the index, correct?

If so, W-MON is not applicable, because it groups dates from Tuesday through Monday. With W-SUN instead, you group by calendar week, and the index label is the Sunday that ends the week. You can then apply the first method to the date column to obtain the first observed date in each week, and replace the index with that result.
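The difference between the two anchors can be seen on a handful of dates. This is a small sketch with a made-up series of ones, so that the weekly sum simply counts how many dates land in each bin under the default right-closed, right-labelled weekly resampling.

```python
import pandas as pd

# Three days of one trading week plus the following Monday
days = pd.to_datetime(["2022-07-11", "2022-07-12", "2022-07-15", "2022-07-18"])
s = pd.Series(1, index=days)

# W-MON bins end on Monday, so Monday 11 July is split from the rest of its week
counts_w_mon = s.resample("W-MON").sum()

# W-SUN bins end on Sunday, so Monday to Friday of one week stay together
counts_w_sun = s.resample("W-SUN").sum()

print(counts_w_mon)
print(counts_w_sun)
```

With W-MON, 11 July sits alone in the bin labelled 2022-07-11 and the other three dates fall in the bin labelled 2022-07-18. With W-SUN, the first three dates share the bin labelled 2022-07-17.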

This is possible with either groupby or resample:

import numpy as np
import pandas as pd

# simulate daily data, drop a monday
date_range = pd.bdate_range(start='2022-06-06',end='2022-07-31')
date_range = date_range[~(date_range=='2022-07-04')]

# simulate data
df = pd.DataFrame(data = {
    'date': date_range,
    'return': np.random.random(size=len(date_range))
})

# resample with groupby
g = df.groupby([pd.Grouper(key='date', freq='W-SUN')])
result_groupby = g[['return']].mean() # example aggregation method
result_groupby['date_first_observed'] = g['date'].first()
result_groupby['date_last_observed'] = g['date'].last()
result_groupby.set_index('date_first_observed', inplace=True)

# resample with resample
df.index = df['date']
g = df.resample('W-SUN')
result_resample = g[['return']].mean() # example aggregation method
result_resample['date_first_observed'] = g['date'].first()
result_resample['date_last_observed'] = g['date'].last()
result_resample.set_index('date_first_observed', inplace=True)

This gives

>>> result_groupby
                       return date_last_observed
date_first_observed                             
2022-06-06           0.704949         2022-06-10
2022-06-13           0.460946         2022-06-17
2022-06-20           0.578682         2022-06-24
2022-06-27           0.361004         2022-07-01
2022-07-05           0.692309         2022-07-08
2022-07-11           0.569810         2022-07-15
2022-07-18           0.435222         2022-07-22
2022-07-25           0.454765         2022-07-29
>>> result_resample
                       return date_last_observed
date_first_observed                             
2022-06-06           0.704949         2022-06-10
2022-06-13           0.460946         2022-06-17
2022-06-20           0.578682         2022-06-24
2022-06-27           0.361004         2022-07-01
2022-07-05           0.692309         2022-07-08
2022-07-11           0.569810         2022-07-15
2022-07-18           0.435222         2022-07-22
2022-07-25           0.454765         2022-07-29

The row for the holiday week is labelled 2022-07-05 (Tuesday) instead of 2022-07-04 (Monday).
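The same idea carries over to OHLCV data. The following is a sketch under assumptions: the column names `open`, `high`, `low`, `close` and `volume`, the helper column `first_date`, and all values are made up for illustration. The date is copied into a column so that it can be aggregated with first alongside the usual OHLCV rules, and no trading calendar is involved.

```python
import numpy as np
import pandas as pd

# Simulated daily OHLCV data (values are made up); 4 July 2022 is missing
dates = pd.bdate_range("2022-06-27", "2022-07-15")
dates = dates[dates != pd.Timestamp("2022-07-04")]
rng = np.random.default_rng(0)
close = 100 + rng.normal(size=len(dates)).cumsum()
daily = pd.DataFrame(
    {
        "open": close - 0.5,
        "high": close + 1.0,
        "low": close - 1.0,
        "close": close,
        "volume": rng.integers(1_000, 5_000, size=len(dates)),
    },
    index=dates,
)

# Copy the date into a column so it can be aggregated like the prices
weekly = (
    daily.assign(first_date=daily.index)
    .resample("W-SUN")
    .agg(
        {
            "first_date": "first",  # first date observed in the week
            "open": "first",
            "high": "max",
            "low": "min",
            "close": "last",
            "volume": "sum",
        }
    )
    .dropna(subset=["first_date"])  # drop calendar weeks with no rows at all
    .set_index("first_date")
)
weekly.index.name = "date"
print(weekly)
```

The dropna step only matters when the daily data has a whole calendar week missing, in which case resample would otherwise emit an empty row for it.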
