
How to efficiently convert a dataframe column of string type into datetime in Python?

I have a column of IDs, and each ID has a timestamp encoded in it. For example:

0    020160910223200_T1
1    020160910223200_T1
2    020160910223203_T1
3    020160910223203_T1
4    020160910223206_T1
5    020160910223206_T1
6    020160910223209_T1
7    020160910223209_T1
8    020160910223213_T1
9    020160910223213_T1

If we remove the first character and the last three characters, the first row gives 20160910223200, which should be converted to 2016-09-10 22:32:00.

My solution was to write a function that truncates the ID and converts the result to a datetime. Then I applied this function to my df column.

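from datetime import datetime

def MeasureIDtoTime(MeasureID):
    # Skip the leading character and keep the 14 timestamp digits (YYYYmmddHHMMSS).
    # The slice must end at 15: [1:14] keeps only 13 digits and cuts off the last digit of the seconds.
    MeasureID = str(MeasureID)[1:15]
    return datetime.strptime(MeasureID, '%Y%m%d%H%M%S')

# Convert row by row; this is the slow part for millions of rows.
df['Time'] = df['MeasureID'].apply(MeasureIDtoTime)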

This works, but it is too slow for my case. I have to process more than 20 million rows, so I need a faster solution. Any ideas for a more efficient approach?

Update

According to @MaxU, there is a better solution:

pd.to_datetime(df.ID.str[1:-3], format = '%Y%m%d%H%M%S')
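
For readers who want to try the one-liner above in isolation, here is a minimal self-contained sketch. The three IDs are copied from the question, and the column name `ID` follows the answer's usage (the question's own code calls the column `MeasureID`).

```python
import pandas as pd

# Sample IDs copied from the question.
df = pd.DataFrame({"ID": ["020160910223200_T1",
                          "020160910223203_T1",
                          "020160910223213_T1"]})

# Drop the leading character and the trailing "_T1",
# leaving the 14 timestamp digits (YYYYmmddHHMMSS).
stamps = df["ID"].str[1:-3]

# Passing an explicit format lets pandas skip format inference.
df["Time"] = pd.to_datetime(stamps, format="%Y%m%d%H%M%S")
print(df)
```

The negative slice bound `-3` removes the suffix without hard-coding the timestamp length, which is why it is less error-prone than a fixed end index.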

This does the job in 32 seconds for 7.2 million rows. However, in R, thanks to the lubridate::ymd_hms() function, I performed the same task in less than 2 seconds. So I am wondering whether a better solution for my problem exists in Python.
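
One idea that may help here, offered as a sketch and not as something from the original answer: in the sample data every timestamp occurs more than once, so each distinct string could be parsed a single time and the result mapped back onto all rows. The helper name `ids_to_datetime` is hypothetical, and no timings were measured for it; whether it pays off depends on how many duplicates the real data contains.

```python
import pandas as pd

def ids_to_datetime(ids: pd.Series) -> pd.Series:
    """Parse each distinct timestamp string once, then map results to all rows.

    Hypothetical helper for illustration; assumes every ID has one leading
    character, 14 timestamp digits, and a 3-character suffix.
    """
    stamps = ids.str[1:-3]
    unique_stamps = pd.Series(stamps.unique())
    parsed = pd.to_datetime(unique_stamps, format="%Y%m%d%H%M%S")
    # Lookup table: timestamp string -> parsed datetime.
    lookup = pd.Series(parsed.values, index=unique_stamps.values)
    return stamps.map(lookup)

ids = pd.Series(["020160910223200_T1", "020160910223200_T1", "020160910223203_T1"])
print(ids_to_datetime(ids))
```

Recent pandas versions expose a similar idea through the `cache` argument of `pd.to_datetime`, so the explicit lookup above may add little on top of the built-in behaviour; it is worth timing both on the real data.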

UPDATE: performance optimization...

Let's try to optimize it a little bit.

DF shape: 50,000 x 1

In [220]: df.head()
Out[220]:
                   ID
0  020160910223200_T1
1  020160910223200_T1
2  020160910223203_T1
3  020160910223203_T1
4  020160910223206_T1

In [221]: df.shape
Out[221]: (50000, 1)

In [222]: len(df)
Out[222]: 50000

Timing:

In [223]: %timeit df['ID'].apply(MeasureIDtoTime)
1 loop, best of 3: 929 ms per loop

In [224]: %timeit pd.to_datetime(df.ID.str[1:-3])
1 loop, best of 3: 5.68 s per loop

In [225]: %timeit pd.to_datetime(df.ID.str[1:-3], format='%Y%m%d%H%M%S')
1 loop, best of 3: 267 ms per loop    ### WINNER !

Conclusion: explicitly specifying the datetime format makes pd.to_datetime about 21 times faster (267 ms vs. 5.68 s), and about 3.5 times faster than the apply-based function (929 ms).
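
The comparison can be reproduced outside IPython with a rough harness like the one below. It is a sketch under assumptions: the 50,000 IDs are synthetic (generated in the question's layout, not the original data), the function names are made up for illustration, and absolute timings will differ by machine and pandas version.

```python
import timeit
from datetime import datetime, timedelta

import pandas as pd

# Synthetic IDs in the question's layout: one leading digit,
# 14 timestamp digits, then "_T1".
start = datetime(2016, 9, 10, 22, 32, 0)
ids = ["0" + (start + timedelta(seconds=i)).strftime("%Y%m%d%H%M%S") + "_T1"
       for i in range(50_000)]
df = pd.DataFrame({"ID": ids})

def parse_with_apply(col):
    # Row-by-row parsing with the standard library.
    return col.apply(lambda s: datetime.strptime(s[1:15], "%Y%m%d%H%M%S"))

def parse_vectorized(col):
    # Vectorized slicing plus an explicit format.
    return pd.to_datetime(col.str[1:-3], format="%Y%m%d%H%M%S")

for name, func in [("apply", parse_with_apply), ("to_datetime+format", parse_vectorized)]:
    # Best of 3 single runs, mirroring %timeit's "best of 3".
    best = min(timeit.repeat(lambda: func(df["ID"]), number=1, repeat=3))
    print(f"{name}: {best * 1000:.0f} ms")
```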

NOTE: passing an explicit format is possible only if all values share the same datetime format.

OLD answer:
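
If some rows might not follow the format, one option (not part of the original answer) is `errors="coerce"`, which turns unparseable values into `NaT` instead of raising. The malformed second ID below is invented for illustration.

```python
import pandas as pd

# The middle ID is deliberately malformed (hypothetical example value).
ids = pd.Series(["020160910223200_T1", "0BADTIMESTAMP00_T1", "020160910223206_T1"])

# Values that do not match the format become NaT instead of raising an error.
parsed = pd.to_datetime(ids.str[1:-3], format="%Y%m%d%H%M%S", errors="coerce")

# Collect the offending IDs for inspection.
bad_rows = ids[parsed.isna()]
print(parsed)
print(bad_rows)
```

Checking `bad_rows` before relying on the result avoids silently carrying missing timestamps into later steps.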

In [81]: pd.to_datetime(df.ID.str[1:-3])
Out[81]:
0   2016-09-10 22:32:00
1   2016-09-10 22:32:00
2   2016-09-10 22:32:03
3   2016-09-10 22:32:03
4   2016-09-10 22:32:06
5   2016-09-10 22:32:06
6   2016-09-10 22:32:09
7   2016-09-10 22:32:09
8   2016-09-10 22:32:13
9   2016-09-10 22:32:13
Name: ID, dtype: datetime64[ns]

where df is:

In [82]: df
Out[82]:
                   ID
0  020160910223200_T1
1  020160910223200_T1
2  020160910223203_T1
3  020160910223203_T1
4  020160910223206_T1
5  020160910223206_T1
6  020160910223209_T1
7  020160910223209_T1
8  020160910223213_T1
9  020160910223213_T1
