I get the Fama-French factors from Ken French's data library using pandas.io.data, but I can't figure out how to convert the integer year-month date index (e.g., 200105) to a datetime index so that I can take advantage of more pandas features.
The following code runs, but my index attempt in the last un-commented line drops all data in DataFrame ff. I also tried .reindex(), but this doesn't change the index to range. What is the pandas way? Thanks!
import pandas as pd
from pandas.io.data import DataReader
import datetime as dt
ff = pd.DataFrame(DataReader("F-F_Research_Data_Factors", "famafrench")[0])
ff.columns = ['Mkt_rf', 'SMB', 'HML', 'rf']
start = ff.index[0]
start = dt.datetime(year=start//100, month=start%100, day=1)
end = ff.index[-1]
end = dt.datetime(year=end//100, month=end%100, day=1)
range = pd.DateRange(start, end, offset=pd.datetools.MonthEnd())
ff = pd.DataFrame(ff, index=range)
#ff.reindex(range)
reindex realigns the existing data to the given index rather than relabeling it, so any new label that is not in the old index comes back as missing. You can just do ff.index = range if you've made sure the lengths and the alignment match.
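To make the difference concrete, here is a minimal sketch on a toy frame. The names df, new_index, realigned, and relabeled are made up for illustration and stand in for ff and range above; it assumes a reasonably recent pandas.

```python
import pandas as pd

# Toy frame with an integer year-month index, standing in for ff.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=[200101, 200102, 200103])
new_index = pd.to_datetime(["2001-01-31", "2001-02-28", "2001-03-31"])

# reindex looks up the new labels in the old index; none of the
# datetime labels exist there, so every row comes back as NaN.
realigned = df.reindex(new_index)

# Assigning to .index relabels the rows positionally and keeps the data.
relabeled = df.copy()
relabeled.index = new_index

print(realigned)
print(relabeled)
```

The assignment only checks that the lengths agree, so it is up to you to confirm that the new labels are in the same order as the rows.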
Parsing each original index value is much safer. The easy approach is to do this by converting to a string:
In [132]: ints
Out[132]: Int64Index([201201, 201201, 201201, ..., 203905, 203905, 203905])
In [133]: conv = lambda x: datetime.strptime(str(x), '%Y%m')
In [134]: dates = [conv(x) for x in ints]
In [135]: %timeit [conv(x) for x in ints]
1 loops, best of 3: 222 ms per loop
This is kind of slow, so if you have a lot of observations you might want to use an optimized Cython function in pandas:
In [144]: years = (ints // 100).astype(object)
In [145]: months = (ints % 100).astype(object)
In [146]: days = np.ones(len(years), dtype=object)
In [147]: import pandas.lib as lib
In [148]: %timeit Index(lib.try_parse_year_month_day(years, months, days))
100 loops, best of 3: 5.47 ms per loop
Here ints has 10000 entries.
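On more recent pandas versions, where pandas.lib is no longer importable, a vectorized alternative is pd.to_datetime with an explicit format. This is a sketch of that swapped-in approach, not the function timed above; the sample values in ints and the names dates and month_ends are made up for illustration.

```python
import pandas as pd

# A few integer year-month labels like those in the Fama-French index.
ints = pd.Index([200105, 200106, 201112])

# Parse the YYYYMM strings in one vectorized call; each date lands
# on the first day of its month.
dates = pd.to_datetime(ints.astype(str), format="%Y%m")

# Optionally roll each date forward to the end of its month.
month_ends = dates + pd.offsets.MonthEnd(0)

print(dates)
print(month_ends)
```

With a DataFrame such as ff, the result could then be assigned directly with ff.index = dates.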
Try this list comprehension; it works for me:
ff = pd.DataFrame(DataReader("F-F_Research_Data_Factors", "famafrench")[0])
ff.columns = ['Mkt_rf', 'SMB', 'HML', 'rf']
ff.index = [dt.datetime(d // 100, d % 100, 1) for d in ff.index]  # // keeps the year an integer on Python 3