
In Python/Pandas, how do I convert century-months to a DatetimeIndex?

I am working with a dataset that encodes dates as the integer number of months since December 1899, so month 1 is January 1900 and month 1165 is January 1997. I would like to convert these values to a pandas DatetimeIndex. So far the best I've come up with is:

month0 = np.datetime64('1899-12-15')
# timedelta64 needs an integer count, so 10.5 hours is written as 630 minutes
one_month = np.timedelta64(30, 'D') + np.timedelta64(630, 'm')
birthdates = pandas.DatetimeIndex(month0 + one_month * resp.cmbirth)

The start date is the 15th of the month, and the timedelta is 30 days 10.5 hours, the average length of a calendar month. As a result, the date within the month drifts by a day or two.

This seems a little hacky, and I wondered if there's a better way.

You can use built-in pandas date-time functionality.

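import pandas as pd
import numpy as np

# Sample century-month codes in [0, 1165]; randint's upper bound is exclusive
indexed_months = np.random.randint(0, 1166, size=100)
# Month 0 is December 1899, so month 1 lands on 1900-01-01
month0 = pd.to_datetime('1899-12-01')
# DateOffset does calendar-aware month arithmetic, so the day does not drift
date_list = [month0 + pd.DateOffset(months=int(mnt)) for mnt in indexed_months]
birthdates = pd.DatetimeIndex(date_list)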

I've assumed that your resp.cmbirth object looks like an array of integers between 0 and 1165.
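If the list comprehension turns out to be slow on a large column, one possible vectorized sketch is to do the arithmetic at NumPy's month resolution and hand the result to pandas. The `cmbirth` array below is made-up sample data standing in for resp.cmbirth, and it is assumed to contain plain integers with no missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical century-month codes standing in for resp.cmbirth
cmbirth = np.array([1, 12, 1165])

# Month 0 is December 1899, stored at month resolution
month0 = np.datetime64('1899-12', 'M')

# Casting the integers to month-unit timedeltas keeps the arithmetic
# calendar-aware: every result is the first day of its month
months = month0 + cmbirth.astype('timedelta64[M]')

birthdates = pd.DatetimeIndex(months)
print(birthdates)
```

With these sample codes the index should contain 1900-01-01, 1900-12-01 and 1997-01-01, matching the encoding described in the question.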

I'm not quite clear on why you want the bin edges of the indices to be offset from the start or end of the month, but this can be done:

shifted_birthdates = birthdates.shift(15, freq='D')

and similarly for hours if you want. There is also useful info in the answers to this SO question and the related pandas GitHub issue.
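As a rough illustration of the day and hour offsets, here is a small sketch using `pd.Timedelta`. The `month_starts` index is hypothetical sample data, not the real birthdates. Note that a 15-day shift from the 1st lands on the 16th, so this sketch adds 14 days, on the assumption that the goal is the 15th of each month.

```python
import pandas as pd

# Hypothetical month-start dates standing in for the converted birthdates
month_starts = pd.DatetimeIndex(['1900-01-01', '1997-01-01'])

# Adding 14 days to the 1st gives the 15th of the same month
mid_month = month_starts + pd.Timedelta(days=14)

# Hours can be added the same way
mid_month_noon = mid_month + pd.Timedelta(hours=12)

print(mid_month)
print(mid_month_noon)
```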
