Converting columns in pandas to_datetime with specific format
I have the following code:
import pandas as pd
import datetime
df = pd.read_html("https://en.wikipedia.org/wiki/List_of_presidents_of_the_United_States")[1]
df = df[:-1]
df.rename(columns={'Presidency[a].1':"Term"}, inplace = True)
df[['Start', 'End']] = df.Term.str.split("–", expand = True)
df['Start'] = pd.to_datetime(df['Start'].str.strip(), format = '%B %d, %Y', dayfirst = False)
When I run this code, I get the following error:
ValueError: unconverted data remains: [i]
Please advise.
When I checked the data, I found some noisy entries, such as:
'March 4, 1913',
'March 4, 1913',
'March 4, 1921',
'August 2, 1923[r]',
'August 2, 1923[r]',
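A minimal offline reproduction of the failure, assuming the scraped "Start" column holds strings like the ones listed above (the sample values are copied from that list instead of being fetched from Wikipedia). `try_parse` is a hypothetical helper written only for this illustration:

```python
import pandas as pd

# Sample values copied from the list above; the last one carries a
# footnote marker ("[r]") after the year.
start = pd.Series(["March 4, 1913", "March 4, 1921", "August 2, 1923[r]"])


def try_parse(values):
    """Hypothetical helper: return (parsed, error) instead of raising."""
    try:
        return pd.to_datetime(values.str.strip(), format="%B %d, %Y"), None
    except ValueError as exc:
        return None, exc


# The clean rows parse fine on their own...
ok, _ = try_parse(start.iloc[:2])
# ...but for the annotated row "%Y" consumes "1923" and "[r]" is left over.
parsed, error = try_parse(start)
print(error)
```

The exact wording of the message differs between pandas versions, but it is a `ValueError` mentioning the unconverted leftover text.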
You will have to strip the bracketed annotations by splitting each string on "[" and keeping the first part:
df.Start = pd.Series([i.split('[')[0] for i in df.Start.tolist()], index=df.index)
Then it should work fine. See the output:
In [28]: df.Start = pd.to_datetime(df['Start'].str.strip(), format = '%B %d, %Y', dayfirst = False)
In [29]: df.Start
Out[29]:
0 1789-04-30
1 1789-04-30
2 1797-03-04
Or just...
df.Start = df.Start.str.split("[", expand=True)[0]
before converting to datetime.
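With `expand=True`, `str.split` returns a DataFrame with one column per piece, so only column 0 should be assigned back to "Start". A minimal sketch, assuming illustrative values copied from the question; `n=1` (an assumption added here, not in the answer) caps the result at two columns:

```python
import pandas as pd

df = pd.DataFrame({"Start": ["March 4, 1921", "August 2, 1923[r]"]})

# expand=True gives a DataFrame: column 0 is the text before the first "[",
# column 1 is the rest (missing for rows that contain no "[").
pieces = df["Start"].str.split("[", n=1, expand=True)

# Only column 0 goes back into "Start" before parsing.
df["Start"] = pd.to_datetime(pieces[0].str.strip(), format="%B %d, %Y")
print(df["Start"])
```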
Several of the dates had annotations such as [i] at the end of the string.
The following uses pandas string replacement with a regular expression to remove the problematic annotations:
df['Start'] = pd.to_datetime(df['Start'].str.replace(r"\[[a-z]+\]", "", regex=True).str.strip(), format='%B %d, %Y')
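A self-contained sketch of the regex approach, assuming illustrative values: "[r]" appears in the question, while the "[i]" row is made up here only to show a second marker. It also contrasts the range `[a-z]` with the class `[az]`, which matches only the letters "a" and "z":

```python
import pandas as pd

# Illustrative values; the "[i]" row is hypothetical.
df = pd.DataFrame({"Start": ["April 30, 1789", "March 4, 1913[i]", "August 2, 1923[r]"]})

# r"\[[a-z]+\]": a literal "[", one or more lowercase letters, a literal "]".
pattern = r"\[[a-z]+\]"
cleaned = df["Start"].str.replace(pattern, "", regex=True).str.strip()
df["Start"] = pd.to_datetime(cleaned, format="%B %d, %Y")

# For comparison: the class "[az]" does not match "r", so "[r]" survives.
leftover = pd.Series(["August 2, 1923[r]"]).str.replace(r"\[[az]\]", "", regex=True)
print(df["Start"].tolist(), leftover.iloc[0])
```

Using a raw string (`r"..."`) avoids doubling every backslash, and `regex=True` makes the pattern explicit since newer pandas versions treat `str.replace` patterns as literal text by default.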