How to standardise different date formats in pandas?
I have a dataset in CSV format that contains dates in one column. I have imported it into pandas, and the date column is shown with dtype object. I need to convert this column to datetime, but the column mixes two date formats:
1. 11/7/2013 11:51
2. 13-07-2013 08:33:16
I need to convert one format to the other so that the column has a single standard date format for analysis. How can I do this?
There are many rows in each of these formats, so when I try to convert the column using the code below
print(df['date'].apply(lambda x: pd.to_datetime(x, format='%d/%m/%Y %H:%M')))
I get the following error:
ValueError: time data '13-07-2013 08:33:16' does not match format '%d/%m/%Y %H:%M' (match)
So what would be the best method to standardise this column into one format?
Try removing the format argument and passing infer_datetime_format=True to pd.to_datetime.
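Note that infer_datetime_format is deprecated in pandas 2.0 and later, so the suggestion above may only produce a warning there. A version-independent alternative, sketched below, is to parse the column once per known format with errors="coerce" and keep whichever parse succeeded. It assumes the slash format is day-first (as the %d/%m format string in the question implies) and that only these two formats occur; the column name date_parsed is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"date": ["11/7/2013 11:51", "13-07-2013 08:33:16"]})

# Parse the whole column against each known format.
# Rows that do not match the given format become NaT instead of raising.
slash = pd.to_datetime(df["date"], format="%d/%m/%Y %H:%M", errors="coerce")
dash = pd.to_datetime(df["date"], format="%d-%m-%Y %H:%M:%S", errors="coerce")

# For each row keep the parse that succeeded.
df["date_parsed"] = slash.fillna(dash)
print(df)
```

Any row that matches neither format is left as NaT, so df["date_parsed"].isna().sum() shows how many rows still need attention.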
You can try the following:
import pandas as pd
import numpy as np
n=1000
ch = ['13-07-2013 08:33:16', '13/07/2013 08:33:16']
df = pd.DataFrame({"date": np.random.choice(ch,n)})
df["date"] = df["date"].str.replace("/","-").astype("M8[us]")
Update: I just realised that the format I'm using is not the one you asked for. I strongly suggest using a standard format such as YYYY-MM-DD, and a datetime dtype instead of strings. Many posts explain why this uses less RAM and is therefore faster.
A small comparison on a df with just 1000 rows:
%%timeit
out = df["date"].str.replace("/","-").astype("M8[us]")
146 ms ± 5.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
out = df["date"].apply(lambda x: pd.to_datetime(x)\
.strftime('%d/%m/%Y %H:%M'))
396 ms ± 22.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
out = df['date'].apply(lambda x: pd.to_datetime(x,
format='%d/%m/%Y %H:%M',
infer_datetime_format= True))
425 ms ± 4.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
It's better to use strftime():
df = pd.DataFrame({'Date': ['11/7/2013 11:51','13-07-2013 08:33:16']})
df['Clean_Date'] = df.Date.apply(lambda x: pd.to_datetime(x).strftime('%d/%m/%Y %H:%M'))
print(df)
output:
                  Date        Clean_Date
0      11/7/2013 11:51  07/11/2013 11:51
1  13-07-2013 08:33:16  13/07/2013 08:33
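One caveat with the strftime approach: the result is a column of strings, not datetimes, so date arithmetic and the .dt accessor no longer work on it. A possible pattern, sketched here under the assumption that the dates have already been parsed unambiguously, is to keep a real datetime column for analysis and only format it for display. The column names Parsed and Display are illustrative.

```python
import pandas as pd

# Assumed input: dates already normalised to an unambiguous format.
df = pd.DataFrame({"Date": ["2013-07-11 11:51:00", "2013-07-13 08:33:16"]})

# Keep a true datetime column for sorting, filtering and arithmetic.
df["Parsed"] = pd.to_datetime(df["Date"], format="%Y-%m-%d %H:%M:%S")

# Produce strings only where a specific display format is needed.
df["Display"] = df["Parsed"].dt.strftime("%d/%m/%Y %H:%M")
print(df)
```

Using the vectorised .dt.strftime on the whole column also avoids calling pd.to_datetime once per row inside apply.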
In pandas v1 the to_datetime function is very robust and can handle most date formats. With your example dates it is as easy as calling to_datetime on your series.
d = ['11/7/2013 11:51', '13-07-2013 08:33:16']
df = pd.DataFrame({'dates': d})
df = pd.to_datetime(df['dates'])
df
output:
0 2013-11-07 11:51:00
1 2013-07-13 08:33:16
Name: dates, dtype: datetime64[ns]
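Note that in the output above 11/7/2013 was read month-first (7 November), while 13-07-2013 can only be day-first (13 July), so the two rows were interpreted under different conventions. Also, as far as I know, pandas 2.0 and later infer one format from the first value and apply it to the whole Series unless format="mixed" is passed, so the one-liner may behave differently there. If both formats are meant to be day-first, one option is to parse each value separately with dateutil (which pandas already depends on) and dayfirst=True. This is only a sketch and is slower than a vectorised call; the helper name parse_dayfirst is made up.

```python
import pandas as pd
from dateutil import parser

d = ['11/7/2013 11:51', '13-07-2013 08:33:16']
df = pd.DataFrame({'dates': d})

def parse_dayfirst(text):
    # Hypothetical helper: parse one string, reading ambiguous dates as day/month.
    return parser.parse(text, dayfirst=True)

# Parse row by row, then make sure the result is a datetime Series.
parsed = pd.to_datetime(df['dates'].apply(parse_dayfirst))
print(parsed)
```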
Just how robust is to_datetime?
Let's test it using a dataset of 25 different date styles taken from the IBM documentation page below.
http = r'https://www.ibm.com/docs/en/cmofz/10.1.0?topic=SSQHWE_10.1.0/com.ibm.ondemand.mp.doc/arsa0257.htm'
table = pd.read_html(http)
df = table[0]
df
# test which datestyles pandas can convert
df['Example_clean'] = pd.to_datetime(df['Example'])
print(df.dtypes)
df
# yes it converted all 25 different date formats!!
# Note: when a value contains only a time, today's date is used automatically.
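Since to_datetime either raises or guesses when a value is unusual, it can be worth checking the result instead of trusting it. A minimal sketch, using made-up sample values and an explicit format: with errors="coerce" unparseable values become NaT, and the rows that failed can then be listed.

```python
import pandas as pd

# Made-up sample: one valid date, one impossible date, one non-date string.
raw = pd.Series(["13-07-2013 08:33:16", "31-02-2013 10:00:00", "not a date"])

# Values that cannot be parsed with the given format become NaT.
parsed = pd.to_datetime(raw, format="%d-%m-%Y %H:%M:%S", errors="coerce")

# Show the original strings that failed to convert.
failed = raw[parsed.isna()]
print(failed)
```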