Pandas: How to create a datetime object from Week and Year?
I have a DataFrame with two integer columns holding the year and the week of the year:
import pandas as pd
import numpy as np
L1 = [43,44,51,2,5,12]
L2 = [2016,2016,2016,2017,2017,2017]
df = pd.DataFrame({"Week":L1,"Year":L2})
df
Out[72]:
Week Year
0 43 2016
1 44 2016
2 51 2016
3 2 2017
4 5 2017
5 12 2017
I need to create a datetime object from these two numbers.
I tried this, but it throws an error:
df["DT"] = df.apply(lambda x: np.datetime64(x.Year,'Y') + np.timedelta64(x.Week,'W'),axis=1)
Then I tried this. It runs, but gives the wrong result: it completely ignores the week.
df["S"] = df.Week.astype(str)+'-'+df.Year.astype(str)
df["DT"] = df["S"].apply(lambda x: pd.to_datetime(x,format='%W-%Y'))
df
Out[74]:
Week Year S DT
0 43 2016 43-2016 2016-01-01
1 44 2016 44-2016 2016-01-01
2 51 2016 51-2016 2016-01-01
3 2 2017 2-2017 2017-01-01
4 5 2017 5-2017 2017-01-01
5 12 2017 12-2017 2017-01-01
I'm really lost between Python's datetime, NumPy's datetime64 and the pandas Timestamp. Could you show me how this is done correctly?
I'm using Python 3, if that matters.
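The week is dropped because of how strptime-style parsing treats `%W`: per the Python `datetime` documentation, `%U` and `%W` are only used in the calculation when the day of the week and the year are also given. A minimal sketch on the question's data showing that appending a weekday (`-1`, i.e. Monday, parsed with `%w`) makes the week number count. Note that with `%W`, week 1 starts on the first Monday of the calendar year, which is not ISO week numbering in general (it happens to coincide for 2016 and 2017). Recent pandas versions may reject `%W` without a weekday outright instead of ignoring it.

```python
import datetime

import pandas as pd

rows = [(43, 2016), (44, 2016), (51, 2016), (2, 2017), (5, 2017), (12, 2017)]

# Week and year only: strptime ignores %W and falls back to 1 January.
no_day = [datetime.datetime.strptime(f"{w}-{y}", "%W-%Y").date() for w, y in rows]

# Append "-1" and parse it with %w (0 = Sunday, 1 = Monday): the week now counts.
with_day = [datetime.datetime.strptime(f"{w}-{y}-1", "%W-%Y-%w").date() for w, y in rows]

# The same fix, vectorized in pandas on the question's DataFrame.
df = pd.DataFrame({"Week": [w for w, _ in rows], "Year": [y for _, y in rows]})
df["DT"] = pd.to_datetime(
    df.Week.astype(str) + "-" + df.Year.astype(str) + "-1", format="%W-%Y-%w"
)
print(no_day)
print(with_day)
print(df)
```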
Edit:
As of Python 3.8, the problem is easily solved with a newly introduced method on datetime.date objects: https://docs.python.org/3/library/datetime.html#datetime.date.fromisocalendar
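A minimal sketch of that approach on the question's DataFrame, assuming Python 3.8+ and assuming the wanted date is the Monday (ISO weekday 1) of each ISO week. It loops in plain Python rather than vectorizing, which is fine for modest row counts:

```python
import datetime

import pandas as pd

df = pd.DataFrame({"Week": [43, 44, 51, 2, 5, 12],
                   "Year": [2016, 2016, 2016, 2017, 2017, 2017]})

# date.fromisocalendar(year, week, day): day=1 gives the Monday of that ISO week.
df["DT"] = pd.to_datetime(
    [datetime.date.fromisocalendar(int(y), int(w), 1)
     for y, w in zip(df["Year"], df["Week"])]
)
print(df)
```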
Try this:
In [19]: pd.to_datetime(df.Year.astype(str), format='%Y') + \
pd.to_timedelta(df.Week.mul(7).astype(str) + ' days')
Out[19]:
0 2016-10-28
1 2016-11-04
2 2016-12-23
3 2017-01-15
4 2017-02-05
5 2017-03-26
dtype: datetime64[ns]
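The same arithmetic without the string round trip, as a sketch: `pd.to_timedelta` accepts a numeric Series together with `unit="D"`. Keep in mind that this simply adds Week × 7 days to 1 January, so the result does not land on a fixed weekday and does not follow ISO week numbering (compare 2016-10-28 here with 2016-10-24 from the ISO-based answers).

```python
import pandas as pd

df = pd.DataFrame({"Week": [43, 44, 51, 2, 5, 12],
                   "Year": [2016, 2016, 2016, 2017, 2017, 2017]})

# 1 January of each row's year
jan1 = pd.to_datetime(df.Year.astype(str), format="%Y")
# Week * 7 whole days, built from numbers instead of "N days" strings
df["DT"] = jan1 + pd.to_timedelta(df.Week * 7, unit="D")
print(df)
```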
Originally, I had the timestamps in s (seconds).
It is much easier to parse them from UNIX epoch timestamps:
df['Date'] = pd.to_datetime(df['UNIX_Time'], unit='s')
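A small sketch of `unit="s"` on three values taken from the timing table in this answer, plus a way back to epoch seconds that does not depend on the datetime resolution pandas picks (subtract the epoch, floor-divide by one second). The round trip is an addition for illustration, not part of the original answer.

```python
import pandas as pd

unix_ts = pd.Series([0, 60, 599999940])

# Seconds since 1970-01-01 00:00:00 -> datetime64 Series
dates = pd.to_datetime(unix_ts, unit="s")

# Back to whole seconds since the epoch
back = (dates - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")
print(dates)
print(back)
```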
Timings for a 10M-row DataFrame:
Setup:
In [26]: df = pd.DataFrame(pd.date_range('1970-01-01', freq='1T', periods=10**7), columns=['date'])
In [27]: df.shape
Out[27]: (10000000, 1)
In [28]: df['unix_ts'] = df['date'].astype(np.int64)//10**9
In [30]: df
Out[30]:
date unix_ts
0 1970-01-01 00:00:00 0
1 1970-01-01 00:01:00 60
2 1970-01-01 00:02:00 120
3 1970-01-01 00:03:00 180
4 1970-01-01 00:04:00 240
5 1970-01-01 00:05:00 300
6 1970-01-01 00:06:00 360
7 1970-01-01 00:07:00 420
8 1970-01-01 00:08:00 480
9 1970-01-01 00:09:00 540
... ... ...
9999990 1989-01-05 10:30:00 599999400
9999991 1989-01-05 10:31:00 599999460
9999992 1989-01-05 10:32:00 599999520
9999993 1989-01-05 10:33:00 599999580
9999994 1989-01-05 10:34:00 599999640
9999995 1989-01-05 10:35:00 599999700
9999996 1989-01-05 10:36:00 599999760
9999997 1989-01-05 10:37:00 599999820
9999998 1989-01-05 10:38:00 599999880
9999999 1989-01-05 10:39:00 599999940
[10000000 rows x 2 columns]
Check:
In [31]: pd.to_datetime(df.unix_ts, unit='s')
Out[31]:
0 1970-01-01 00:00:00
1 1970-01-01 00:01:00
2 1970-01-01 00:02:00
3 1970-01-01 00:03:00
4 1970-01-01 00:04:00
5 1970-01-01 00:05:00
6 1970-01-01 00:06:00
7 1970-01-01 00:07:00
8 1970-01-01 00:08:00
9 1970-01-01 00:09:00
...
9999990 1989-01-05 10:30:00
9999991 1989-01-05 10:31:00
9999992 1989-01-05 10:32:00
9999993 1989-01-05 10:33:00
9999994 1989-01-05 10:34:00
9999995 1989-01-05 10:35:00
9999996 1989-01-05 10:36:00
9999997 1989-01-05 10:37:00
9999998 1989-01-05 10:38:00
9999999 1989-01-05 10:39:00
Name: unix_ts, Length: 10000000, dtype: datetime64[ns]
Timing:
In [32]: %timeit pd.to_datetime(df.unix_ts, unit='s')
10 loops, best of 3: 156 ms per loop
Conclusion: I don't think 156 ms to convert 10,000,000 rows is slow.
As @Gianmario Spacagna mentioned, for dates later than 2018, use %V and %G:
L1 = [43,44,51,2,5,12,52,53,1,2,5,52]
L2 = [2016,2016,2016,2017,2017,2017,2018,2018,2019,2019,2019,2019]
df = pd.DataFrame({"Week":L1,"Year":L2})
df['new'] = pd.to_datetime(df.Week.astype(str)+
df.Year.astype(str).add('-1') ,format='%V%G-%u')
print (df)
Week Year new
0 43 2016 2016-10-24
1 44 2016 2016-10-31
2 51 2016 2016-12-19
3 2 2017 2017-01-09
4 5 2017 2017-01-30
5 12 2017 2017-03-20
6 52 2018 2018-12-24
7 53 2018 2018-12-31
8 1 2019 2018-12-31
9 2 2019 2019-01-07
10 5 2019 2019-01-28
11 52 2019 2019-12-23
The weeks from 2019 onward look somewhat suspicious. ISO 8601 assigns 31 December 2018 to week 1 of 2019. Other approaches based on:
pd.to_datetime(df.Week.astype(str)+
df.Year.astype(str).add('-2') ,format='%W%Y-%w')
give results that are shifted from 2019 onward.
To comply with ISO 8601, you have to do the following:
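To see the shift concretely, a sketch comparing the Monday of week n under `%W` (week 1 starts on the first Monday of the calendar year) with the ISO Monday from `date.fromisocalendar` (Python 3.8+). The helper names `monday_W` and `monday_iso` are illustrative only. In 2017, 1 January is a Sunday, so both conventions agree; in 2019, 1 January is a Tuesday, so ISO week 1 starts on 31 December 2018 and `%W` week n falls one week later than ISO week n.

```python
import datetime


def monday_W(year, week):
    # %W numbering: Monday-based weeks, week 1 begins on the year's first Monday
    return datetime.datetime.strptime(f"{year}-{week}-1", "%Y-%W-%w").date()


def monday_iso(year, week):
    # ISO 8601 numbering: week 1 is the week containing 4 January
    return datetime.date.fromisocalendar(year, week, 1)


for year, week in [(2017, 2), (2017, 12), (2019, 1), (2019, 2), (2019, 52)]:
    w, iso = monday_W(year, week), monday_iso(year, week)
    print(year, week, w, iso, (w - iso).days)
```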
import pandas as pd
import datetime
L1 = [52,53,1,2,5,52]
L2 = [2018,2018,2019,2019,2019,2019]
df = pd.DataFrame({"Week":L1,"Year":L2})
df['ISO'] = df['Year'].astype(str) + '-W' + df['Week'].astype(str) + '-1'
df['DT'] = df['ISO'].map(lambda x: datetime.datetime.strptime(x, "%G-W%V-%u"))
print(df)
It prints:
Week Year ISO DT
0 52 2018 2018-W52-1 2018-12-24
1 53 2018 2018-W53-1 2018-12-31
2 1 2019 2019-W1-1 2018-12-31
3 2 2019 2019-W2-1 2019-01-07
4 5 2019 2019-W5-1 2019-01-28
5 52 2019 2019-W52-1 2019-12-23
Week 53 of 2018 is ignored and mapped to week 1 of 2019.
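Since 2018 has only 52 ISO weeks, 2018-W53 does not exist, and the strptime output above silently rolls it over. If you would rather reject such input, a sketch using `date.fromisocalendar` (Python 3.8+), which raises `ValueError` for a week that is out of range for that year; the helper name `iso_monday_strict` is hypothetical:

```python
import datetime


def iso_monday_strict(year, week):
    """Monday of the given ISO week, or None if that week does not exist."""
    try:
        return datetime.date.fromisocalendar(year, week, 1)
    except ValueError:
        return None


print(iso_monday_strict(2018, 53))  # None: 2018 has 52 ISO weeks
print(iso_monday_strict(2019, 1))
print(iso_monday_strict(2020, 53))  # 2020 does have a week 53
```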
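If you want to follow the ISO week date convention:
Weeks start on Monday. Each week's year is the Gregorian year in which its Thursday falls. Hence the first week of a year always contains 4 January. As a result, for some days close to 1 January, the ISO week-numbering year differs slightly from the Gregorian year.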
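A quick check of these rules with the standard library's `date.isocalendar()`, which returns (ISO year, ISO week, ISO weekday):

```python
import datetime

# 4 January always falls in ISO week 1 of its own year.
for year in range(2010, 2031):
    iso_year, iso_week, _ = datetime.date(year, 1, 4).isocalendar()
    assert (iso_year, iso_week) == (year, 1)

# Days near 1 January can belong to the neighbouring ISO year.
print(tuple(datetime.date(2018, 12, 31).isocalendar()))  # (2019, 1, 1)
print(tuple(datetime.date(2016, 1, 1).isocalendar()))    # (2015, 53, 5)
```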
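The following sample code generates a sequence of 60 dates starting on Sunday, 18 Dec 2016 and adds the corresponding columns: the week start date (Monday), the week-start year, the week number, and a week start regenerated from the week number and year.
Sample code: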
import pandas as pd

# Generate Some Dates
dft1 = pd.DataFrame(pd.date_range('2016-12-18', freq='D', periods=60))
dft1.columns = ['e_FullDate']
dft1['e_FullDateWeekDay'] = dft1.e_FullDate.dt.day_name().str.slice(0,3)
#Add a Week Start Date (Monday)
dft1['e_week_start'] = dft1['e_FullDate'] - pd.to_timedelta(dft1['e_FullDate'].dt.weekday,
unit='D')
dft1['e_week_startWeekDay'] = dft1.e_week_start.dt.day_name().str.slice(0,3)
#Add a Week Start Year
dft1['e_week_start_yr'] = dft1.e_week_start.dt.year
#Add a Week Number of Week Start Monday (ISO week; Series.dt.week was removed in pandas 2.0)
dft1['e_week_no'] = dft1['e_week_start'].dt.isocalendar().week
#Add a Week Start generate from Week Number and Year
dft1['e_week_start_from_week_no'] = pd.to_datetime(dft1.e_week_no.astype(str)+
dft1.e_week_start_yr.astype(str).add('-1') ,format='%W%Y-%w')
dft1['e_week_start_from_week_noWeekDay'] = dft1.e_week_start_from_week_no.dt.day_name().str.slice(0,3)
# print works everywhere; in Jupyter, display(dft1) renders an HTML table instead
with pd.option_context('display.max_rows', 999, 'display.max_columns', None, 'display.max_colwidth', 9999):
    print(dft1)
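One thing worth checking in the sample above: the week number it uses is the ISO week number, but it is paired with the calendar year of the Monday and parsed back with `%W`. That round trip happens to work for the 2016/2017 range used here, but, as noted for 2019 earlier, it can drift around the 2018/2019 boundary. A sketch comparing it with a rebuild that stays within ISO directives (`%G-%V-%u`, which pandas accepts, as the `%V%G-%u` answer shows), over three weeks starting Monday, 24 Dec 2018:

```python
import pandas as pd

days = pd.Series(pd.date_range("2018-12-24", freq="D", periods=21))
week_start = days - pd.to_timedelta(days.dt.weekday, unit="D")  # Monday of each week
iso = week_start.dt.isocalendar()  # DataFrame with columns year, week, day

# Rebuild with ISO year + ISO week + ISO weekday 1 (Monday)
rebuilt_iso = pd.to_datetime(
    iso["year"].astype(str) + "-" + iso["week"].astype(str) + "-1",
    format="%G-%V-%u",
)

# Rebuild as the sample does: ISO week number + calendar year, parsed with %W
rebuilt_W = pd.to_datetime(
    iso["week"].astype(str) + "-" + week_start.dt.year.astype(str) + "-1",
    format="%W-%Y-%w",
)

print((rebuilt_iso == week_start).all())
print((rebuilt_W == week_start).all())
```

For the Monday 31 Dec 2018 the `%W` rebuild yields 1 Jan 2018, because ISO week 1 of 2019 is paired with calendar year 2018.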
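Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.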