Pandas - how to convert RangeIndex into DatetimeIndex
I have the following dataframe. It is OHLC one-minute data. Obviously I need the T column to become an index in order to use time-series functionality.
C H L O T V
13712 6873.0 6873.0 6873.0 6873.0 2018-01-13T17:17:00 799.448421
13713 6878.0 6878.0 6875.0 6875.0 2018-01-13T17:18:00 1707.578666
13714 6880.0 6880.0 6825.0 6825.0 2018-01-13T17:21:00 481.245707
13715 6876.0 6876.0 6876.0 6876.0 2018-01-13T17:22:00 839.177283
13716 6870.0 6878.0 6830.0 6878.0 2018-01-13T17:23:00 4336.830277
I used:
df['T'] = pd.to_datetime(df['T'])
So far so good! The T column is now recognised as a date.
Check:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 13717 entries, 1970-01-01 00:00:00 to 1970-01-01 00:00:00.000013716
Data columns (total 7 columns):
BV 13717 non-null float64
C 13717 non-null float64
H 13717 non-null float64
L 13717 non-null float64
O 13717 non-null float64
T 13717 non-null datetime64[ns]
V 13717 non-null float64
dtypes: datetime64[ns](1), float64(6)
memory usage: 857.3 KB
And now comes the fun and unexplainable part:
df.set_index(df['T'])
C H L O T V
T
2018-01-03 17:27:00 5710.0 5710.0 5663.0 5667.0 2018-01-03 17:27:00 3863.030204
2018-01-03 17:28:00 5704.0 5710.0 5663.0 5710.0 2018-01-03 17:28:00 1208.627542
2018-01-03 17:29:00 5699.0 5699.0 5663.0 5663.0 2018-01-03 17:29:00 1755.123688
Still looks good, but when I check the type of the index I get:
RangeIndex(start=0, stop=13717, step=1)
And now if I try:
df.index = pd.to_datetime(df.index)
I end up with:
DatetimeIndex([ '1970-01-01 00:00:00',
'1970-01-01 00:00:00.000000001',
'1970-01-01 00:00:00.000000002',
'1970-01-01 00:00:00.000000003',
'1970-01-01 00:00:00.000000004' and so on...
which is evidently wrong.
The question is: why don't I get a normal DatetimeIndex if I am converting a date column to the index?
Thanks!
If the input data is a CSV file, the simplest approach is to use the parse_dates and index_col parameters of read_csv:
df = pd.read_csv(file, parse_dates=['T'], index_col=['T'])
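As a minimal sketch of the read_csv approach, the snippet below reads from an in-memory string instead of a file. The two sample rows and the names csv_text and df_csv are made up for illustration; only the column layout follows the question.

```python
import io

import pandas as pd

# Hypothetical two-row sample with the same columns as in the question.
csv_text = """C,H,L,O,T,V
6873.0,6873.0,6873.0,6873.0,2018-01-13T17:17:00,799.448421
6878.0,6878.0,6875.0,6875.0,2018-01-13T17:18:00,1707.578666
"""

# parse_dates converts T to datetimes, index_col moves it into the index.
df_csv = pd.read_csv(io.StringIO(csv_text), parse_dates=['T'], index_col=['T'])

print(type(df_csv.index).__name__)
print(list(df_csv.columns))
```

With this approach T no longer appears among the regular columns, so there is nothing to drop afterwards.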
If not, then use your solution, but don't forget to assign the output of set_index back to df. If you also want column T dropped once it becomes the DatetimeIndex, pass 'T' instead of df['T']:
df['T'] = pd.to_datetime(df['T'])
df = df.set_index('T')
#alternative solution
#df.set_index('T', inplace=True)
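The following sketch shows why the assignment matters: set_index returns a new DataFrame and leaves the original untouched unless the result is assigned back. The frame df_demo and its two rows are hypothetical and only mimic the question's data.

```python
import pandas as pd

# Hypothetical miniature of the question's frame.
df_demo = pd.DataFrame({
    'C': [6873.0, 6878.0],
    'T': ['2018-01-13T17:17:00', '2018-01-13T17:18:00'],
})
df_demo['T'] = pd.to_datetime(df_demo['T'])

# The result is discarded here, so df_demo keeps its default index.
df_demo.set_index('T')
index_before = type(df_demo.index).__name__

# Assigning the result back replaces the index and drops column T.
df_demo = df_demo.set_index('T')
index_after = type(df_demo.index).__name__

print(index_before, index_after)
```

Passing df['T'] (a Series) instead of the label 'T' would also set the index, but the T column would stay in the frame as a regular column.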
Why don't I get the normal DatetimeIndex if I am converting a date to index?
Because your index is still the default one (0, 1, 2, ...), df.index = pd.to_datetime(df.index) parses those integers as nanoseconds since the Unix epoch, which produces the weird datetimes.
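A small sketch of that effect, using a hypothetical three-element RangeIndex in place of the real one:

```python
import pandas as pd

# Stand-in for the default index of the question's frame.
idx = pd.RangeIndex(start=0, stop=3, step=1)

# Integers are treated as offsets from 1970-01-01 in nanoseconds by default.
converted = pd.to_datetime(idx)

print(converted)
```

Each integer n becomes the timestamp n nanoseconds after 1970-01-01 00:00:00, which matches the output shown in the question.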