
Convert DataFrame column type from string to datetime

How can I convert a DataFrame column of strings (in dd/mm/yyyy format) to datetime dtype?

The easiest way is to use to_datetime:

df['col'] = pd.to_datetime(df['col'])

It also offers a dayfirst argument for European-style dates (but beware this isn't strict: it only prefers to parse the day first).

Here it is in action:

In [11]: pd.to_datetime(pd.Series(['05/23/2005']))
Out[11]:
0   2005-05-23 00:00:00
dtype: datetime64[ns]
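
For the dd/mm/yyyy strings in the question, dayfirst=True could be used roughly as follows. This is a minimal sketch; the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data in dd/mm/yyyy order
s = pd.Series(['05/03/2005', '23/05/2005'])

# dayfirst=True asks the parser to read the first number as the day
parsed = pd.to_datetime(s, dayfirst=True)
print(parsed)
```

Because dayfirst is only a preference, an explicit format is the safer choice when the layout of the column is known.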

You can pass a specific format:

In [12]: pd.to_datetime(pd.Series(['05/23/2005']), format="%m/%d/%Y")
Out[12]:
0   2005-05-23
dtype: datetime64[ns]
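
Applied to the dd/mm/yyyy layout from the question, an explicit format might look like the sketch below. The sample values are invented, and errors='coerce' is an addition not used in the answer above: it turns values that do not match the format into NaT instead of raising.

```python
import pandas as pd

# Hypothetical column with one value that does not match the format
s = pd.Series(['23/05/2005', '31/12/2019', 'not a date'])

# Values that fail to match '%d/%m/%Y' become NaT because of errors='coerce'
parsed = pd.to_datetime(s, format='%d/%m/%Y', errors='coerce')
print(parsed)
```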

If your date column is a string of the format '2017-01-01', you can use pandas astype to convert it to datetime.

df['date'] = df['date'].astype('datetime64[ns]')

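or use datetime64[D] if you want day precision and not nanoseconds (support for day precision depends on the pandas version; pandas 2.x only stores second, millisecond, microsecond, or nanosecond resolution).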

print(type(df['date'].iloc[0]))

yields

<class 'pandas._libs.tslib.Timestamp'>, the same as when you use pandas.to_datetime.

You can try it with formats other than '%Y-%m-%d', but at least this one works.
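
A small self-contained check of the astype route could look like this. It is a sketch with made-up sample data, and the exact module path printed for the Timestamp class may differ between pandas versions.

```python
import pandas as pd

# Hypothetical frame with ISO-formatted date strings
df = pd.DataFrame({'date': ['2017-01-01', '2017-02-15']})

# Cast the string column to datetime64[ns]
df['date'] = df['date'].astype('datetime64[ns]')

first = df['date'].iloc[0]
print(df['date'].dtype, type(first))
```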

You can use the following if you want to specify tricky formats:

df['date_col'] =  pd.to_datetime(df['date_col'], format='%d/%m/%Y')

More details on format here:

If you have a mixture of formats in your date column, don't forget to set infer_datetime_format=True to make life easier. (In pandas 2.0 and later this argument is deprecated; format='mixed' is available there for columns with mixed formats.)

df['date'] = pd.to_datetime(df['date'], infer_datetime_format=True)

Source: pd.to_datetime

or if you want a customized approach:

from datetime import datetime

def autoconvert_datetime(value):
    formats = ['%m/%d/%Y', '%m-%d-%y']  # input formats to try, in order
    result_format = '%d-%m-%Y'  # output format (the result is a string, not a datetime)
    for dt_format in formats:
        try:
            dt_obj = datetime.strptime(value, dt_format)
            return dt_obj.strftime(result_format)
        except (ValueError, TypeError):  # raised when the value doesn't match the format
            pass
    return value  # leave the value unchanged if no format matches

df['date'] = df['date'].apply(autoconvert_datetime)
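
The function above returns formatted strings rather than datetimes. If a datetime64 column is wanted instead, the same try-each-format idea can be sketched with to_datetime and errors='coerce'. The helper name parse_with_formats and the sample data are hypothetical.

```python
import pandas as pd

def parse_with_formats(s, formats):
    """Try each format in turn; values matching none of them end up as NaT."""
    result = pd.to_datetime(s, format=formats[0], errors='coerce')
    for fmt in formats[1:]:
        # Fill only the entries that earlier formats failed to parse
        result = result.fillna(pd.to_datetime(s, format=fmt, errors='coerce'))
    return result

# Hypothetical column mixing two layouts plus one unparseable value
s = pd.Series(['05/23/2005', '05-23-05', 'not a date'])
parsed = parse_with_formats(s, ['%m/%d/%Y', '%m-%d-%y'])
print(parsed)
```

Unlike the string-returning version, unmatched values become NaT here, so the column keeps a datetime dtype.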

Try this solution:

  • Change '2022-12-31 00:00:00' to '2022-12-31 00:00:01'
  • Then run this code: pandas.to_datetime(pandas.Series(['2022-12-31 00:00:01']))
  • Output: 2022-12-31 00:00:01

Multiple datetime columns

If you want to convert multiple string columns to datetime, then apply() is useful.

df[['date1', 'date2']] = df[['date1', 'date2']].apply(pd.to_datetime)

You can pass parameters to to_datetime as kwargs.

df[['start_date', 'end_date']] = df[['start_date', 'end_date']].apply(pd.to_datetime, format="%m/%d/%Y")
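
As a self-contained illustration of the call above, the following sketch builds a small frame (the data and the extra label column are invented) and checks the resulting dtypes.

```python
import pandas as pd

# Hypothetical frame: two date columns stored as strings, one unrelated column
df = pd.DataFrame({
    'start_date': ['01/15/2020', '02/20/2020'],
    'end_date': ['03/01/2020', '04/10/2020'],
    'label': ['a', 'b'],
})

cols = ['start_date', 'end_date']
# apply() calls pd.to_datetime once per column and forwards format= to it
df[cols] = df[cols].apply(pd.to_datetime, format='%m/%d/%Y')
print(df.dtypes)
```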

Use format= to speed up

If the column contains a time component and you know the format of the datetime/time, then passing the format explicitly can significantly speed up the conversion. There's barely any difference if the column is only a date, though. In my project, for a column with 5 million rows, the difference was huge: ~2.5 min vs 6 s.

It turns out that explicitly specifying the format is about 25x faster. The following runtime plot shows that there's a huge gap in performance depending on whether you pass format or not.

[Plot: timing]



The code used to produce the plot:

import pandas as pd
import perfplot
import random

mdYHM = range(1, 13), range(1, 29), range(2000, 2024), range(24), range(60)
perfplot.show(
    kernels=[lambda x: pd.to_datetime(x), lambda x: pd.to_datetime(x, format='%m/%d/%Y %H:%M')],
    labels=['pd.to_datetime(x)', "pd.to_datetime(x, format='%m/%d/%Y %H:%M')"],
    n_range=[2**k for k in range(19)],
    setup=lambda n: pd.Series([f"{m}/{d}/{Y} {H}:{M}" 
                               for m,d,Y,H,M in zip(*[random.choices(e, k=n) for e in mdYHM])]),
    equality_check=pd.Series.equals,
    xlabel='len(df)'
)
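
If perfplot is not installed, a rough comparison can be made with the standard-library timeit module. This is only a sketch: the sample size n is an arbitrary choice, and the measured gap will depend on the pandas version and the data.

```python
import random
import timeit

import pandas as pd

random.seed(0)  # make the synthetic data reproducible
n = 2000  # hypothetical sample size; increase it to see a clearer gap

# Build zero-padded "mm/dd/YYYY HH:MM" strings
strings = pd.Series([
    f"{random.randint(1, 12):02d}/{random.randint(1, 28):02d}/"
    f"{random.randint(2000, 2023)} {random.randint(0, 23):02d}:{random.randint(0, 59):02d}"
    for _ in range(n)
])

fmt = '%m/%d/%Y %H:%M'
# Time three conversions each, without and with an explicit format
t_infer = timeit.timeit(lambda: pd.to_datetime(strings), number=3)
t_format = timeit.timeit(lambda: pd.to_datetime(strings, format=fmt), number=3)
print(f"without format: {t_infer:.4f}s, with format: {t_format:.4f}s")

# Both calls should produce the same values
same = (pd.to_datetime(strings) == pd.to_datetime(strings, format=fmt)).all()
print(same)
```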
