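How to resample AAII weekly data to daily?
I want to import the file below, which contains weekly data (Thursdays only), and convert it to a daily file in which each value is carried from its Thursday through the following Wednesday, skipping Saturdays and Sundays.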
https://www.aaii.com/files/surveys/sentiment.xls
I can import it:
df = pd.read_excel("C:\\Users\\Public\\Portfolio\\exports\\sentiment.xls", sheet_name = "SENTIMENT", skiprows=3, parse_dates=['Date'], date_format='%m-%d-%y')
The result is as follows:
However, as far as I can tell, even the simplest resample fails:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
I tried df['Date'] = pd.to_datetime(df['Date'])
and other approaches, but none of them worked.
Any ideas on how to do this?
You can try this:
df = pd.read_excel("sentiment.xls", sheet_name = "SENTIMENT", skiprows=3, parse_dates=['Date'], date_format='%m-%d-%y')
Your Date column contains NaN values, so a plain conversion to datetime does not go through.
>>> df['Date']
0 NaN
1 1987-06-26 00:00:00
2 1987-07-17 00:00:00
3 1987-07-24 00:00:00
4 1987-07-31 00:00:00
So you need to pass errors='coerce' to pd.to_datetime for the conversion to work; values that cannot be parsed become NaT.
>>> df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
Now your dates are parsed:
>>> df['Date']
0 NaT
1 1987-06-26
2 1987-07-17
3 1987-07-24
4 1987-07-31
5 1987-08-07
6 1987-08-14
7 1987-08-21
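As a minimal sketch of what errors='coerce' does, here is a small self-contained example. The sample values are hypothetical stand-ins for the real column, and the explicit format string is an assumption made only to keep the toy data unambiguous:

```python
import pandas as pd

# Hypothetical stand-in for a raw Date column with one junk entry and one blank.
raw = pd.Series(["1987-07-24", "not a date", None, "1987-07-31"])

# errors="coerce" turns anything that cannot be parsed into NaT instead of raising.
dates = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

# Rows without a real timestamp can then be dropped before indexing on the dates.
clean = dates.dropna()

print(dates)
print(clean)
```

Dropping the NaT rows before calling set_index avoids carrying an NaT label into the index.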
Now set the index to the Date column, then resample as described in the comments:
>>> df.set_index('Date', inplace=True)
>>> df.head()
Bullish Neutral Bearish Total Mov Avg Spread Average +St. Dev. - St. Dev. High Low Close
Date
NaT NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1987-06-26 NaN NaN NaN NaN NaN NaN 0.382642 0.484295 0.280989 NaN NaN NaN
1987-07-17 NaN NaN NaN NaN NaN NaN 0.382642 0.484295 0.280989 314.59 307.63 314.59
1987-07-24 0.36 0.50 0.14 1.0 NaN 0.22 0.382642 0.484295 0.280989 311.39 307.81 309.27
1987-07-31 0.26 0.48 0.26 1.0 NaN 0.00 0.382642 0.484295 0.280989 318.66 310.65 318.66
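The TypeError from the question comes from calling resample on a frame that still has the default RangeIndex. The sketch below reproduces that on toy data and shows the same call working once Date is the index. The two dates and the Bullish values are made up for illustration and are not taken from the AAII file:

```python
import pandas as pd

# Toy weekly data: two consecutive Thursdays (illustrative values only).
weekly = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-04", "2024-01-11"]),
    "Bullish": [0.36, 0.26],
})

# With the default RangeIndex, resample has no dates to work with.
try:
    weekly.resample("D").ffill()
    failed = False
except TypeError:
    failed = True

# After moving Date into the index, the same call succeeds.
daily = weekly.set_index("Date").resample("D").ffill()

print(failed)
print(daily)
```

Each weekly value is carried forward until the next observation, so the days between the two Thursdays take the first Thursday's value.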
I think this is the correct answer. It converts the data to daily and removes non-trading days and Saturdays/Sundays.
import pandas as pd
from pandas.tseries.offsets import BDay
# read the Excel file, use the SENTIMENT sheet, skip the first three rows, parse dates to datetime, index on Date
df = pd.read_excel("C:\\Users\\Public\\Portfolio\\exports\\sentiment.xls", sheet_name = "SENTIMENT", skiprows=3, parse_dates=['Date'], date_format='%m-%d-%y', index_col ='Date')
df = df[3:].asfreq('D', method='ffill') # drop the first 3 rows, then expand to daily and fill forward
df = df[df.index.map(BDay().is_on_offset)] # keep business days only (BDay excludes weekends, not market holidays); onOffset is the deprecated spelling
df = df[df.index.dayofweek < 5] # strip Saturdays and Sundays (redundant after the BDay filter, kept as a safeguard)
print(df.head(250))
There may be a more elegant way, but this gets the job done.
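One possibly more compact variant is to ask pandas for the business-day frequency 'B' directly, which expands and skips weekends in a single step. This is only a sketch on made-up data, not the original poster's method, and like BDay it knows nothing about exchange holidays:

```python
import pandas as pd

# Toy weekly series on three consecutive Thursdays (illustrative values only).
weekly = pd.DataFrame(
    {"Bullish": [0.36, 0.26, 0.31]},
    index=pd.DatetimeIndex(["2024-01-04", "2024-01-11", "2024-01-18"], name="Date"),
)

# 'B' generates Monday-to-Friday dates only; ffill carries each weekly
# value forward until the next observation.
business_daily = weekly.asfreq("B", method="ffill")

print(business_daily)
```

Note that the last Thursday produces only one row here, because asfreq stops at the final date in the index. Carrying it through the following Wednesday would need the index extended first.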