
How to convert a column with Excel serial dates and regular dates to a pandas datetime?

I have a dataframe of birthdays in which regular dates are mixed with Excel serial dates, like this:

09/01/2020 12:00:00 AM
05/15/1985 12:00:00 AM
06/07/2013 12:00:00 AM
33233
26299
29428

I tried a solution from this answer, but all of the dates in Excel serial format are blanked out, while those in a normal date format are preserved.

This is my code:

import glob
import os
from datetime import datetime, timedelta

import pandas as pd

def from_excel_ordinal(ordinal, _epoch0=datetime(1899, 12, 31)):
    if ordinal >= 60:
        ordinal -= 1  # Excel leap year bug, 1900 is not a leap year!
    return (_epoch0 + timedelta(days=ordinal)).replace(microsecond=0)

path = 'C:\\Input'
os.chdir(path)
filelist = glob.glob('*BLAH*.xlsx')  
filename = os.fsdecode(filelist[0])
df = pd.read_excel(filename, sheet_name = 'Blah Blah') 
m = df['Birthday'].astype(str).str.isdigit()
df.loc[m, 'Birthday'] = df.loc[m, 'Birthday'].astype(int).apply(from_excel_ordinal)
df['Birthday'] = pd.to_datetime(df['Birthday'], errors = 'coerce')

I'm not sure where I'm going wrong, since the code shouldn't be blanking out the birthdays the way it does.
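Because the `isdigit` mask only matches values whose string form consists entirely of digits, it can help to check which Python types `read_excel` actually put in the column. Depending on how the cells are stored, an object column may hold `datetime` objects, ints, floats or strings side by side, and a float such as `33233.0` (string form `'33233.0'`) or a string with stray whitespace slips past the mask unconverted. A minimal diagnostic sketch; the `birthday` Series below is a hypothetical stand-in for `df['Birthday']`, not the asker's real data:

```python
import pandas as pd

# Hypothetical stand-in for df['Birthday']: an object column mixing the kinds
# of values read_excel may produce (the exact mix in the real file is unknown)
birthday = pd.Series(
    [pd.Timestamp("2020-09-01"), 33233, 26299.0, " 29428"], dtype="object"
)

# Count the Python types actually present in the column
print(birthday.map(type).value_counts())

# Apply the question's mask and list the values it does NOT flag as serials
mask = birthday.astype(str).str.isdigit()
print(birthday[~mask].tolist())
```

Values that turn out to be floats or to carry whitespace would need normalising (for example with `.str.strip()`, or by converting floats to int) before the serial conversion.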

  • Not all of the dates can be parsed the same way.
  • Load the dataframe.
  • Cast the dates column to str if it isn't already.
  • Use Boolean indexing to select the different date types:
    • Assume regular dates contain a /
    • Assume Excel serial dates do not contain a /
  • Convert each dataframe separately according to its date type.
  • Concatenate the dataframes back together.
import pandas as pd
from datetime import datetime

# load data
df = pd.DataFrame({'dates': ['09/01/2020', '05/15/1985', '06/07/2013', '33233', '26299', '29428']})

# display(df)

        dates
0  09/01/2020
1  05/15/1985
2  06/07/2013
3       33233
4       26299
5       29428

# set the column type as a str if it isn't already
df.dates = df.dates.astype('str')

# create a date mask based on the string containing a /
date_mask = df.dates.str.contains('/')

# split the dates out for excel
df_excel = df[~date_mask].copy()

# split the regular dates out
df_reg = df[date_mask].copy()

# convert reg dates to datetime
df_reg.dates = pd.to_datetime(df_reg.dates)

# convert excel dates to datetime; the column needs to be cast as ints.
# Excel's day 0 is effectively 1899-12-30, because Excel counts a non-existent 1900-02-29
df_excel.dates = pd.to_timedelta(df_excel.dates.astype(int), unit='D') + datetime(1899, 12, 30)

# combine the dataframes and restore the original row order
df = pd.concat([df_reg, df_excel]).sort_index()

display(df)

       dates
0 2020-09-01
1 1985-05-15
2 2013-06-07
3 1990-12-26
4 1972-01-01
5 1980-07-26
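The same split, convert and combine approach can also be applied to a single Series, with `sort_index` putting the rows back in their original order even when regular and serial dates are interleaved. A sketch, assuming the regular dates are strings in the `MM/DD/YYYY hh:mm:ss AM/PM` layout shown in the question, and taking 1899-12-30 as day 0 for the serials (the origin under which serial 25569 comes out as 1970-01-01, as it does in Excel). The helper name `parse_mixed_dates` and its `fmt` parameter are hypothetical:

```python
import pandas as pd


def parse_mixed_dates(col, fmt="%m/%d/%Y %I:%M:%S %p"):
    """Hypothetical helper: parse a column mixing date strings and Excel serials."""
    s = col.astype(str).str.strip()
    # Treat values made only of digits as Excel serial dates
    is_serial = s.str.fullmatch(r"\d+")

    # Regular dates such as '09/01/2020 12:00:00 AM'; unparseable values become NaT
    regular = pd.to_datetime(s[~is_serial], format=fmt, errors="coerce")
    # Excel serials: day 0 is effectively 1899-12-30 (Excel's 1900 leap-year bug)
    serial = pd.to_datetime(s[is_serial].astype("int64"), unit="D", origin="1899-12-30")

    # Recombine and restore the original row order
    return pd.concat([regular, serial]).sort_index()


# Sample values from the question, interleaved to show that row order survives
birthdays = pd.Series([
    "09/01/2020 12:00:00 AM", "33233",
    "05/15/1985 12:00:00 AM", "26299",
    "06/07/2013 12:00:00 AM", "29428",
])
print(parse_mixed_dates(birthdays).dt.strftime("%Y-%m-%d").tolist())
# ['2020-09-01', '1990-12-26', '1985-05-15', '1972-01-01', '2013-06-07', '1980-07-26']
```

If `read_excel` has already turned the regular dates into `datetime` objects, their string form will not match `fmt` and they would come back as NaT, so `fmt` (or the mask) would need adjusting to the real data.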

pd.to_timedelta(dates_in_excel_serial_format, unit='D') + pd.Timestamp('1899-12-30')

Demo:

> dates_in_excel_serial_format = [29428]
> pd.to_timedelta(dates_in_excel_serial_format, unit='D') + pd.Timestamp('1899-12-30')
< DatetimeIndex(['1980-07-26'], dtype='datetime64[ns]', freq=None)
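Which origin is added matters. Excel's 1900 date system counts 1900-01-01 as serial 1 and also counts a non-existent 1900-02-29 as serial 60, so for every serial from 61 onward the effective day 0 is 1899-12-30. A convenient anchor for checking this is that serial 25569 is 1970-01-01 in Excel. The sanity check below compares both candidate origins, and the question's own `from_excel_ordinal` helper, against that anchor using only standard pandas and datetime calls:

```python
from datetime import datetime, timedelta

import pandas as pd


def from_excel_ordinal(ordinal, _epoch0=datetime(1899, 12, 31)):
    # Helper copied from the question
    if ordinal >= 60:
        ordinal -= 1  # Excel leap year bug, 1900 is not a leap year!
    return (_epoch0 + timedelta(days=ordinal)).replace(microsecond=0)


# In Excel's 1900 date system, serial 25569 is 1970-01-01
anchor_serial, anchor_date = 25569, pd.Timestamp("1970-01-01")

for origin in ["1900-01-01", "1899-12-30"]:
    converted = pd.Timestamp(origin) + pd.to_timedelta(anchor_serial, unit="D")
    print(origin, "->", converted.date(), "off by", (converted - anchor_date).days, "day(s)")
# 1900-01-01 -> 1970-01-03 off by 2 day(s)
# 1899-12-30 -> 1970-01-01 off by 0 day(s)

print("from_excel_ordinal ->", from_excel_ordinal(anchor_serial).date())
# from_excel_ordinal -> 1970-01-01
```

An origin of 1900-01-01 shifts every converted date two days late, while 1899-12-30 agrees with both Excel and the question's helper.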
