Converting the integer index of the Fama-French factors to a datetime index in pandas

Question

I get the Fama-French factors from Ken French's data library using pandas.io.data, but I can't figure out how to convert the integer year-month date index (e.g., 200105) to a datetime index so that I can take advantage of more pandas features.

The following code runs, but my indexing attempt in the last uncommented line drops all the data in DataFrame ff. I also tried .reindex(), but this doesn't change the index to range. What is the pandas way? Thanks!

import pandas as pd
from pandas.io.data import DataReader
import datetime as dt

ff = pd.DataFrame(DataReader("F-F_Research_Data_Factors", "famafrench")[0])
ff.columns = ['Mkt_rf', 'SMB', 'HML', 'rf']

start = ff.index[0]
start = dt.datetime(year=start//100, month=start%100, day=1)
end = ff.index[-1]
end = dt.datetime(year=end//100, month=end%100, day=1)
range = pd.DateRange(start, end, offset=pd.datetools.MonthEnd())
ff = pd.DataFrame(ff, index=range)
#ff.reindex(range)

Answer 1

reindex realigns the existing index to the given index rather than changing the index. You can just do ff.index = range if you've made sure that the lengths and the alignment match.
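The difference between realigning and replacing an index can be seen in a small sketch. The three-row frame and the name new_index below are hypothetical stand-ins, not the real Fama-French data, and the sketch assumes a reasonably recent pandas:

```python
import pandas as pd

# Hypothetical stand-in for ff: integer YYYYMM labels on the index.
ff = pd.DataFrame({"Mkt_rf": [1.0, 2.0, 3.0]},
                  index=[200101, 200102, 200103])

# Three monthly timestamps, one per row of ff.
new_index = pd.date_range("2001-01-01", periods=3, freq="MS")

# reindex looks up the new labels in the old index; none of the
# timestamps match the integer labels, so every row comes back as NaN.
reindexed = ff.reindex(new_index)

# Assigning to .index relabels the rows in place and keeps the values.
assigned = ff.copy()
assigned.index = new_index
```

This is also why building a new DataFrame from ff with a date index, as in the question, appears to drop the data: the constructor aligns on labels the same way reindex does.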

Parsing each original index value is much safer. The easy approach is to do this by converting to a string:

In [132]: ints
Out[132]: Int64Index([201201, 201201, 201201, ..., 203905, 203905, 203905])

In [133]: conv = lambda x: datetime.strptime(str(x), '%Y%m')

In [134]: dates = [conv(x) for x in ints]

In [135]: %timeit [conv(x) for x in ints]
1 loops, best of 3: 222 ms per loop

This is kind of slow, so if you have a lot of observations you might want to use an optimized Cython function in pandas:

In [144]: years = (ints // 100).astype(object)

In [145]: months = (ints % 100).astype(object)

In [146]: days = np.ones(len(years), dtype=object)

In [147]: import pandas.lib as lib

In [148]: %timeit Index(lib.try_parse_year_month_day(years, months, days))
100 loops, best of 3: 5.47 ms per loop

Here ints has 10000 entries.
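In more recent pandas versions the same string-parsing idea can probably be written without a Python-level loop by using pd.to_datetime with an explicit format. This is only a sketch under that assumption; ints here is a small hypothetical index, not the 10000-entry one timed above:

```python
import pandas as pd

# Hypothetical integer YYYYMM index, standing in for ff.index.
ints = pd.Index([200105, 200106, 200107])

# Convert to strings and parse as year + month; the day defaults to 1.
dates = pd.to_datetime(ints.astype(str), format="%Y%m")

# Optionally roll each date forward to the last day of its month,
# which matches the month-end convention used in the question.
month_end = dates + pd.offsets.MonthEnd(0)
```

The result can then be assigned directly, for example ff.index = dates, since it has one entry per original row in the same order.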

Answer 2

Try this list comprehension; it works for me:

ff = pd.DataFrame(DataReader("F-F_Research_Data_Factors", "famafrench")[0])
ff.columns = ['Mkt_rf', 'SMB', 'HML', 'rf']
# Use floor division so the year stays an integer (plain / yields a float on Python 3)
ff.index = [dt.datetime(d // 100, d % 100, 1) for d in ff.index]
