
Parsing Dates from Float in Python

I have the following data (actually AirPassengers from http://vincentarelbundock.github.io/Rdatasets/datasets.html):

     time             AirPassengers
1   1949.000000            112
2   1949.083333            118
3   1949.166667            132
4   1949.250000            129
5   1949.333333            121
6   1949.416667            135

How do I parse the time column in Python as a date (time series) rather than a float? I need this as a basic step before I start time series forecasting.

Based on the comments: time is in years and is a float (1949.000 is Jan 1949 and 1949.0833 is Feb 1949).

I am using this to import the data; I don't know how to use the date parser within read_csv:

series = read_csv('http://vincentarelbundock.github.io/Rdatasets/csv/datasets/AirPassengers.csv', header=0, parse_dates=[0], index_col=0, squeeze=True, )

Update:

One possible solution: ignore the float value and create a datetime series from the start, end, and time interval.

series['dates']=pd.date_range('1949-01', '1961-01', freq='M')
series.head()

   time         AirPassengers  dates
1  1949.000000  112            1949-01-31
2  1949.083333  118            1949-02-28
3  1949.166667  132            1949-03-31
4  1949.250000  129            1949-04-30
5  1949.333333  121            1949-05-31

series.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 144 entries, 1 to 144
Data columns (total 3 columns):
time             144 non-null float64
AirPassengers    144 non-null int64
dates            144 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(1), int64(1)
memory usage: 4.5 KB

Note the new problem: this shows the last day of each month (not the first), and our original problem of turning float values into datetime values remains.

Python version
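On the month-end issue specifically, here is a minimal sketch, assuming the 144 monthly rows reported by series.info() above. The 'MS' (month start) frequency alias is used in place of 'M' (month end); the variable name dates is only illustrative.

```python
import pandas as pd

# 'MS' = month start; 144 periods cover Jan 1949 through Dec 1960
dates = pd.date_range(start="1949-01-01", periods=144, freq="MS")

print(dates[0].date(), dates[-1].date())
```

This still ignores the float column; it only changes which day of the month the generated index lands on.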

!pip install version_information
%load_ext version_information
%version_information


Software    Version
Python  3.5.2 64bit [MSC v.1900 64 bit (AMD64)]
IPython 5.1.0
OS  Windows 7 6.1.7600 SP0

It looks like your input data isn't very precise. Each value is just:

1949 + float(month)/12

You could just iterate over your line numbers:

import datetime
start_year = 1949
for line_number in range(20):
    # // is integer division, so the year advances every 12 lines
    print(datetime.date(start_year + line_number // 12, line_number % 12 + 1, 1))

It outputs:

1949-01-01
1949-02-01
1949-03-01
1949-04-01
1949-05-01
1949-06-01
1949-07-01
1949-08-01
1949-09-01
1949-10-01
1949-11-01
1949-12-01
1950-01-01
1950-02-01
1950-03-01
1950-04-01
1950-05-01
1950-06-01
1950-07-01
1950-08-01

If you really want to parse the strings, you could try:

import datetime

year_str = "1949.166667"
year_float = float(year_str)
year = int(year_float)
year_start = datetime.date(year,1,1)
delta = datetime.timedelta(days = int((year_float-year)*365) )

print(year_start + delta)
# 1949-03-02

This way, the steps between data points stay at 1/12 of a year (about 30.4 days, truncated to whole days) rather than following calendar months.
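The snippet above assumes every year has 365 days. A possible variant, sketched here under the assumption that the fraction should be scaled by the actual length of the year (so leap years are handled), could look like this. The function name year_fraction_to_datetime is hypothetical, and multiplying a timedelta by a float requires Python 3.

```python
import datetime

def year_fraction_to_datetime(value):
    """Map e.g. 1949.5 to the instant halfway through 1949."""
    year = int(value)
    start = datetime.datetime(year, 1, 1)
    end = datetime.datetime(year + 1, 1, 1)
    # (end - start) is 365 or 366 days depending on the year
    return start + (end - start) * (value - year)

print(year_fraction_to_datetime(1949.5))
```

Like the 365-day version, this treats the float as a continuous fraction of the year, so the results will generally not fall on the first day of a month.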

I suppose that

1949.000  = 1st Jan 1949

and

1949.9999... = 31st Dec 1949

Also, as Eric Duminil pointed out, your values seem to be rounded to the month. If that is true, then you can do something like this:

import datetime
from dateutil.relativedelta import relativedelta

def floatToDate(date_as_float):
    year = int(date_as_float)
    # Fractional part times 12 gives the zero-based month offset
    months_offset = round((date_as_float - float(year)) * 12.0, 0)
    # Literals like 01 are a syntax error in Python 3, so use plain 1
    new_date = datetime.datetime(year, 1, 1, 0, 0, 0, 0)
    new_date = new_date + relativedelta(months=int(months_offset))
    return new_date

converted = floatToDate(1949.083333) # datetime 01-feb-1949
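If the data is already in a pandas DataFrame, the same month-rounding idea can be applied to the whole column at once. This is only a sketch: the small frame below is typed in by hand from the first rows of the question, the column name date is made up for illustration, and it assumes every value really is year + month_index / 12.

```python
import pandas as pd

df = pd.DataFrame({
    "time": [1949.0, 1949.083333, 1949.166667, 1949.25],
    "AirPassengers": [112, 118, 132, 129],
})

# Months elapsed since year 0; rounding absorbs the float imprecision
total_months = (df["time"] * 12).round().astype(int)

# to_datetime can assemble dates from year/month/day columns
parts = pd.DataFrame({
    "year": total_months // 12,
    "month": total_months % 12 + 1,
    "day": 1,
})
df["date"] = pd.to_datetime(parts)

print(df)
```

Setting the result as the index (for example with df.set_index("date")) would then give a datetime-indexed series for forecasting.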
