
Converting columns in pandas to_datetime with specific format

I have the following code:

import pandas as pd
import datetime
df = pd.read_html("https://en.wikipedia.org/wiki/List_of_presidents_of_the_United_States")[1]
df = df[:-1]
df.rename(columns={'Presidency[a].1':"Term"}, inplace = True)
df[['Start', 'End']] = df.Term.str.split("–", expand = True)
df['Start'] = pd.to_datetime(df['Start'].str.strip(), format = '%B %d, %Y', dayfirst = False)

When I run this code, I get the following error:

ValueError: unconverted data remains: [i]

Please advise.

When I checked the data, I found some noisy entries that carry a footnote marker, like:

 'March 4, 1913',
 'March 4, 1913',
 'March 4, 1921',
 'August 2, 1923[r]',
 'August 2, 1923[r]',

You will have to clean them by splitting each string on '[' and keeping only the first part: df.Start = pd.Series([i.split('[')[0] for i in df.Start.tolist()])

Then it should work fine. See the output:

In [28]: df.Start = pd.to_datetime(df['Start'].str.strip(), format = '%B %d, %Y', dayfirst = False)

In [29]: df.Start
Out[29]: 
0    1789-04-30
1    1789-04-30
2    1797-03-04
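
A minimal sketch of this cleanup on a small hand-made Series, so it runs without fetching the Wikipedia table. The sample values and the variable names (start, parsed, noisy, cleaned, result) are assumptions for illustration. It first uses errors="coerce" to list the rows that do not match the format, then removes everything from the first '[' onward and parses strictly.

```python
import pandas as pd

# Hypothetical sample mimicking the scraped 'Start' column.
start = pd.Series(["March 4, 1913", "March 4, 1921", "August 2, 1923[r]"])

# errors="coerce" yields NaT for strings that do not match the format
# instead of raising, which makes the noisy rows easy to list.
parsed = pd.to_datetime(start, format="%B %d, %Y", errors="coerce")
noisy = start[parsed.isna()]
print(noisy.tolist())

# Keep only the text before the first '[' and preserve the original index.
cleaned = pd.Series([s.split("[")[0] for s in start.tolist()], index=start.index)

# With the footnote markers gone, strict parsing no longer raises.
result = pd.to_datetime(cleaned.str.strip(), format="%B %d, %Y")
print(result)
```

Passing index=start.index keeps the cleaned values aligned with the original rows even when the frame's index is not a plain 0..n-1 range.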

Or just...

df.Start = df.Start.str.split("[", expand=True)[0]

before converting to datetime.
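
A short sketch of the split-based cleanup on made-up data (the sample strings and variable names are assumptions). Here the piece before '[' is selected with .str[0], since a plain split produces one list per row.

```python
import pandas as pd

# Hypothetical sample: one clean date and one with a footnote marker.
start = pd.Series(["April 30, 1789", "August 2, 1923[r]"])

# Split each string on the literal '[' and keep the first piece.
cleaned = start.str.split("[").str[0].str.strip()

# Parse with the explicit month-name format used in the question.
dates = pd.to_datetime(cleaned, format="%B %d, %Y")
print(dates)
```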

Several of the dates had annotations such as [i] at the end of the string.

The following uses pandas string replacement with a regular expression to remove the problematic annotations.

df['Start'] = pd.to_datetime(df['Start'].str.replace("\\[[a-z]\\]", "", regex=True))
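
A self-contained sketch of the regex approach, assuming every annotation is a single lowercase letter in square brackets, as in the [i] and [r] examples from this thread. The sample data and variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample with two annotated dates.
start = pd.Series(["April 30, 1789", "August 2, 1923[r]", "March 4, 1797[i]"])

# Remove a bracketed single lowercase letter such as [r] or [i].
cleaned = start.str.replace(r"\[[a-z]\]", "", regex=True).str.strip()

# An explicit format keeps the parsing strict and predictable.
dates = pd.to_datetime(cleaned, format="%B %d, %Y")
print(dates)
```

If the table ever used multi-letter markers, the character class would need a quantifier such as [a-z]+; that case is not covered by the data shown here.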
