
How to standardise different date formats in pandas?

I have a dataset in CSV format that contains dates in one column. I have imported this dataset into pandas, and the date column is shown with dtype object. I need to convert this column to datetime, but I have a problem: the column contains dates in two formats:

1. 11/7/2013 11:51
2. 13-07-2013 08:33:16

I need to convert one format to the other so that the column has a single standard date format for analysis. How can I do this?

There are many rows in both of these formats, so when I try to convert the second format to the first using the code below:

print(df['date'].apply(lambda x: pd.to_datetime(x, format='%d/%m/%Y %H:%M')))

I get the following error:

ValueError: time data '13-07-2013 08:33:16' does not match format '%d/%m/%Y %H:%M' (match)

So what would be the best method to standardise this column to one format?

Try removing the format argument and setting infer_datetime_format=True in the arguments passed to pd.to_datetime.
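When both formats are known in advance, an alternative that does not rely on inference (infer_datetime_format is deprecated in pandas 2.0 and later) is to parse each format explicitly and combine the results. The following is a minimal sketch, assuming the dates are day-first as the question's format string suggests; the variable names are illustrative and not from the original post.

```python
import pandas as pd

# Sample values in the two formats from the question.
raw = pd.Series(['11/7/2013 11:51', '13-07-2013 08:33:16'])

# Parse each known format separately; rows that do not match become NaT.
slash = pd.to_datetime(raw, format='%d/%m/%Y %H:%M', errors='coerce')
dash = pd.to_datetime(raw, format='%d-%m-%Y %H:%M:%S', errors='coerce')

# Fill the gaps of the first parse with the results of the second.
parsed = slash.fillna(dash)
print(parsed)
```

Any row that matches neither format stays NaT, which makes leftover bad values easy to find with parsed.isna().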

You can try the following:

import pandas as pd
import numpy as np


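n = 1000
# Two sample layouts that differ only in the separator
ch = ['13-07-2013 08:33:16', '13/07/2013 08:33:16']
df = pd.DataFrame({"date": np.random.choice(ch, n)})

# Unify the separator, then cast the strings to datetime
df["date"] = df["date"].str.replace("/", "-").astype("M8[us]")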
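The same normalise-then-parse idea can also be written with an explicit format, so the day/month order does not depend on how the strings are guessed. This is a sketch on a small fixed sample rather than the random one above; the format string is an assumption based on the two sample values.

```python
import pandas as pd

# Fixed sample in the two layouts used above.
df = pd.DataFrame({"date": ["13-07-2013 08:33:16", "13/07/2013 08:33:16"]})

# Unify the separator (plain string replacement, no regex),
# then parse with an explicit day-first format.
normalised = df["date"].str.replace("/", "-", regex=False)
df["date"] = pd.to_datetime(normalised, format="%d-%m-%Y %H:%M:%S")
print(df.dtypes)
```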

Update: I just realized that the format I'm using is not the one you asked for. I strongly suggest using a standard format such as YYYY-MM-DD and storing the column as a datetime type instead of as strings. There are many posts that explain why this uses less RAM and is therefore faster.
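The memory point can be checked directly. The sketch below compares the bytes used by 1000 date strings with the same values stored as datetimes; the sample value and variable names are illustrative.

```python
import pandas as pd

# 1000 identical date strings versus the same values as datetimes.
as_text = pd.Series(['2013-07-13 08:33:16'] * 1000)
as_datetime = pd.to_datetime(as_text, format='%Y-%m-%d %H:%M:%S')

# deep=True counts the string payloads, not just the pointers to them.
text_bytes = as_text.memory_usage(index=False, deep=True)
datetime_bytes = as_datetime.memory_usage(index=False, deep=True)
print(text_bytes, datetime_bytes)
```

Each datetime value is stored as a fixed 8-byte integer, while each string carries its characters plus per-object overhead.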

A small comparison for a df with just 1000 rows:

%%timeit
out = df["date"].str.replace("/","-").astype("M8[us]")

146 ms ± 5.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


%%timeit
out = df["date"].apply(lambda x: pd.to_datetime(x)\
                                   .strftime('%d/%m/%Y %H:%M'))

396 ms ± 22.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


%%timeit
out = df['date'].apply(lambda x: pd.to_datetime(x,
                       format='%d/%m/%Y %H:%M',
                       infer_datetime_format= True))

425 ms ± 4.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

It's better to use strftime():

df = pd.DataFrame({'Date': ['11/7/2013 11:51','13-07-2013 08:33:16']})
df['Clean_Date'] = df.Date.apply(lambda x: pd.to_datetime(x).strftime('%d/%m/%Y %H:%M'))
print(df)

Output:

                  Date        Clean_Date
0      11/7/2013 11:51  07/11/2013 11:51
1  13-07-2013 08:33:16  13/07/2013 08:33
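Note that in the output above, 11/7/2013 was read month-first (7 November), while the format string in the question ('%d/%m/%Y') treats the first number as the day. If the dates are in fact day-first, which is an assumption about the data, passing dayfirst=True to pd.to_datetime keeps both rows consistent. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['11/7/2013 11:51', '13-07-2013 08:33:16']})

# Parse each value with the day read first, then format uniformly.
df['Clean_Date'] = df['Date'].apply(
    lambda x: pd.to_datetime(x, dayfirst=True).strftime('%d/%m/%Y %H:%M')
)
print(df)
```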

In pandas v1, the to_datetime function is very robust and can handle most date formats. With your example dates, it is as easy as calling to_datetime on your series.

import pandas as pd

d = ['11/7/2013 11:51', '13-07-2013 08:33:16']
df = pd.DataFrame({'dates': d})
# Note: this rebinds df to the converted Series
df = pd.to_datetime(df['dates'])
df

Output:

0   2013-11-07 11:51:00
1   2013-07-13 08:33:16
Name: dates, dtype: datetime64[ns]

Just how robust is to_datetime?

Let's test it using a table of 25 different date styles from the IBM documentation page linked in the code below.

http = r'https://www.ibm.com/docs/en/cmofz/10.1.0?topic=SSQHWE_10.1.0/com.ibm.ondemand.mp.doc/arsa0257.htm'
table = pd.read_html(http)
df = table[0]
df

# Test which date styles pandas can convert
df['Example_clean'] = pd.to_datetime(df['Example'])
print(df.dtypes)
df
# Yes, it converted all 25 different date formats!
# Note: when given only a time, it automatically uses today's date.

Output: (shown as an image in the original post)
