pandas read_csv parse dates

I have written this date-parsing function:

def date_parser(string):
   try:
       date = pd.datetime.strptime(string, "%d/%m/%Y")
   except:
       date = pd.NaT
   return date

and I call it in pd.read_csv like this:

df = pd.read_csv(os.path.join(path, file),
                 sep=";",
                 encoding="latin-1",
                 keep_default_na=False,
                 na_values=na_values,
                 index_col=False,
                 usecols=keep,
                 dtype=dtype,
                 date_parser=date_parser,
                 parse_dates=dates)

The problem is that in one of my date columns, I end up with mixed data types:

df[data].apply(type).value_counts()
  • class 'datetime.datetime'
  • class 'pandas._libs.tslibs.timestamps.Timestamp'
  • class 'pandas._libs.tslibs.nattype.NaTType'

I should only have the last two, right?

I suggest changing your function to use to_datetime with errors='coerce', which returns NaT for any value that does not match the format %d/%m/%Y:

def date_parser(string):
   return pd.to_datetime(string, format="%d/%m/%Y", errors='coerce')
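
To see the effect in isolation, here is a minimal sketch that applies the same to_datetime call to a whole column after reading it as text. The sample data and the column names (id, date) are made up for illustration and are not taken from the question; only the separator, the format string and errors='coerce' come from the post.

```python
import io

import pandas as pd

# Hypothetical sample: one valid date, one malformed value, one empty field.
csv_text = "id;date\n1;25/12/2020\n2;not a date\n3;\n"

# Read the date column as plain strings, then convert the whole column at once.
df = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",
    dtype={"date": str},
    keep_default_na=False,
)
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y", errors="coerce")

# The column now has a single datetime64 dtype; values that did not match
# the format (including the empty string) have become NaT.
print(df["date"].dtype)
print(df["date"].apply(type).value_counts())
```

Under these assumptions the type check from the question should list only Timestamp and NaTType, because the column is stored as one datetime64 array rather than as individual Python objects.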
