简体   繁体   English

使用时区感知索引的pandas to_Datetime转换

[英]pandas to_Datetime conversion with timezone aware index

I have a dataframe with timezone aware index 我有一个带有时区感知索引的数据帧

>>> dfn.index
Out[1]: 
DatetimeIndex(['2004-01-02 01:00:00+11:00', '2004-01-02 02:00:00+11:00',
               '2004-01-02 03:00:00+11:00', '2004-01-02 04:00:00+11:00',
               '2004-01-02 21:00:00+11:00', '2004-01-02 22:00:00+11:00'],
              dtype='datetime64[ns]', freq='H', tz='Australia/Sydney')

I save it in csv, then read it as follows: 我将它保存在csv中,然后按如下方式读取:

>>> dfn.to_csv('temp.csv')
>>> df= pd.read_csv('temp.csv', index_col=0 ,header=None )
>>> df.head()
Out[1]: 
                                1
0                                
NaN                        0.0000
2004-01-02 01:00:00+11:00  0.7519
2004-01-02 02:00:00+11:00  0.7520
2004-01-02 03:00:00+11:00  0.7515
2004-01-02 04:00:00+11:00  0.7502

The index is read as a string 索引读取为字符串

>>> df.index[1]
Out[3]: '2004-01-02 01:00:00+11:00'

On converting to_datetime, it changes the time as it adds +11 to hours 在转换为to_datetime时,它会将时间加上+11到小时

>>> df.index = pd.to_datetime(df.index)
>>> df.index[1]
Out[6]: Timestamp('2004-01-01 14:00:00')

I can now subtract 11 hours from the index to fix it, but is there a better way to handle this? 我现在可以从索引中减去11个小时来修复它,但有没有更好的方法来处理它?

I tried using the solution in answer here , but that slows down the code a lot. 我尝试在这里使用解决方案,但这会减慢代码的速度。

I think here is problem you need write and read header of file same way. 我认为这里是你需要以同样的方式写和读文件头的问题。 And for parse dates need parameter parse_dates . 对于解析日期需要参数parse_dates

#write to file header
dfn.to_csv('temp.csv')
#no read header
df= pd.read_csv('temp.csv', index_col=0 ,header=None)

Solution1: 解决方法1:

#no write header
dfn.to_csv('temp.csv', header=None)
#no read header
df= pd.read_csv('temp.csv', index_col=0 ,header=None, parse_dates=[0])

Solution2: 溶液2:

#write header
dfn.to_csv('temp.csv')
#read header
df= pd.read_csv('temp.csv', index_col=0, parse_dates=[0])

Unfortunately parse_date convert dates to UTC , so is necessary add timezones later: 不幸的是, parse_date将日期转换为UTC ,因此必须在以后添加时区:

df.index = df.index.tz_localize('UTC').tz_convert('Australia/Sydney')
print (df.index)
DatetimeIndex(['2004-01-02 01:00:00+11:00', '2004-01-02 02:00:00+11:00',
               '2004-01-02 03:00:00+11:00', '2004-01-02 04:00:00+11:00',
               '2004-01-02 05:00:00+11:00', '2004-01-02 06:00:00+11:00',
               '2004-01-02 07:00:00+11:00', '2004-01-02 08:00:00+11:00',
               '2004-01-02 09:00:00+11:00', '2004-01-02 10:00:00+11:00'],
              dtype='datetime64[ns, Australia/Sydney]', name=0, freq=None)

Sample for test: 测试样品:

idx = pd.date_range('2004-01-02 01:00:00', periods=10, freq='H', tz='Australia/Sydney')
dfn = pd.DataFrame({'col':range(len(idx))}, index=idx)
print (dfn)
                           col
2004-01-02 01:00:00+11:00    0
2004-01-02 02:00:00+11:00    1
2004-01-02 03:00:00+11:00    2
2004-01-02 04:00:00+11:00    3
2004-01-02 05:00:00+11:00    4
2004-01-02 06:00:00+11:00    5
2004-01-02 07:00:00+11:00    6
2004-01-02 08:00:00+11:00    7
2004-01-02 09:00:00+11:00    8
2004-01-02 10:00:00+11:00    9

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM