
Identify invalid dates in pandas dataframe columns

Suppose we have a dataframe with the columns Name, Date1 and Date2. How can I create the fourth column 'Invalid dates', as shown below, using the first three columns of the dataframe?

  Name       Date1       Date2  Invalid dates
0    A  01-02-2022  03-04-2000           None
1    B          23  12-12-2012          Date1
2    C  18-04-1993         abc          Date2
3    D          45         qcf   Date1, Date2
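
For anyone who wants to run the answers below, the input frame can be rebuilt roughly like this. This is only a sketch typed in from the table above: it assumes every date cell is stored as a string, which the question does not state.

```python
import pandas as pd

# Example input copied from the question's table, without the target column.
# Assumption: all date cells are plain strings.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Date1": ["01-02-2022", "23", "18-04-1993", "45"],
    "Date2": ["03-04-2000", "12-12-2012", "abc", "qcf"],
})
print(df)
```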

You can select the date columns with filter (or any other method, including a manual list), convert them with to_datetime using errors='coerce' so that invalid dates become NaT, flag those values with isna, then stack the result into a Series, aggregate the column names per row, and join them to the original DataFrame:

import pandas as pd

s = (df
     .filter(like='Date')  # keep only the "Date" columns
     # convert to datetime; invalid dates become NaT
     .apply(lambda col: pd.to_datetime(col, format='%d-%m-%Y', errors='coerce'))
     # True where the value could not be parsed
     .isna()
     # reshape to long format (Series indexed by row label and column name)
     .stack()
    )

out = (df
       .join(s[s].reset_index(level=1)    # keep only invalid dates
            .groupby(level=0)['level_1']  # for all initial indices
            .agg(','.join)                # join the column names
            .rename('Invalid Dates')
           )
       )
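
To see what the two steps above operate on, it may help to print the intermediate objects. The sketch below rebuilds the example frame (assuming all date cells are strings) and uses the names mask and invalid, which are only illustrative and not part of the answer.

```python
import pandas as pd

# Example frame, assumed to hold the dates as strings.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Date1": ["01-02-2022", "23", "18-04-1993", "45"],
    "Date2": ["03-04-2000", "12-12-2012", "abc", "qcf"],
})

# Boolean mask: True where the value does not parse as day-month-year.
mask = (df.filter(like="Date")
          .apply(lambda col: pd.to_datetime(col, format="%d-%m-%Y", errors="coerce"))
          .isna())
print(mask)

# Long format: one entry per (row label, column name) pair.
s = mask.stack()

# Boolean indexing with the Series itself keeps only the invalid cells.
invalid = s[s]
print(invalid.index.tolist())
```

The remaining (row label, column name) pairs are what the groupby on level 0 then collapses into one comma-separated string per row.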

Alternative using melt to reshape the DataFrame:

cols = df.filter(like='Date').columns

out = df.merge(
    df.melt(id_vars='Name', value_vars=cols, var_name='Invalid Dates')
      .assign(value=lambda d: pd.to_datetime(d['value'], format='%d-%m-%Y',
                                             errors='coerce'))
      .loc[lambda d: d['value'].isna()]
      .groupby('Name')['Invalid Dates'].agg(','.join),
    left_on='Name', right_index=True, how='left'
)

Output:

  Name       Date1       Date2 Invalid Dates
0    A  01-02-2022  03-04-2000           NaN
1    B          23  12-12-2012         Date1
2    C  18-04-1993         abc         Date2
3    D          45         qcf   Date1,Date2
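
Note that this output joins the names with a bare comma and leaves NaN for fully valid rows, while the table in the question shows ", " and None. If that exact format matters, a plain row-wise variant might look like the sketch below. It is not the method used above: the helper invalid_names and the explicit date_cols list are hypothetical, and the frame is assumed to hold the dates as strings.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Date1": ["01-02-2022", "23", "18-04-1993", "45"],
    "Date2": ["03-04-2000", "12-12-2012", "abc", "qcf"],
})

date_cols = ["Date1", "Date2"]  # explicit list instead of filter(like='Date')

# True where a value does not parse as day-month-year.
mask = df[date_cols].apply(
    lambda col: pd.to_datetime(col, format="%d-%m-%Y", errors="coerce")
).isna()


def invalid_names(row):
    """Hypothetical helper: join the flagged column names, or return None."""
    names = [name for name, bad in row.items() if bad]
    return ", ".join(names) if names else None


df["Invalid dates"] = mask.apply(invalid_names, axis=1)
print(df)
```

A row-wise apply is slower than the vectorized reshaping on large frames, so this is mainly useful when readability or the exact output format matters more than speed.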

Use DataFrame.filter to select the columns whose names contain the substring Date, then convert all columns of df1 with to_datetime and errors='coerce', which produces missing values (NaT) wherever the format does not match. Test for these with DataFrame.isna, combined with df1.notna() so that values already missing in the original data are not flagged, and use DataFrame.dot to extract the matching column names separated by commas:

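import numpy as np
import pandas as pd

df1 = df.filter(like='Date')
# invalid = failed to parse, but was not missing in the original data
df['Invalid dates'] = ((df1.apply(lambda x: pd.to_datetime(x, format='%d-%m-%Y', errors='coerce'))
                           .isna() & df1.notna())
                          .dot(df1.columns + ',')   # concatenate the names of flagged columns
                          .str[:-1]                 # drop the trailing comma
                          .replace('', np.nan))     # rows without invalid dates get NaN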

print(df)
  Name       Date1       Date2 Invalid dates
0    A  01-02-2022  03-04-2000           NaN
1    B          23  12-12-2012         Date1
2    C  18-04-1993         abc         Date2
3    D          45         qcf   Date1,Date2
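
The & df1.notna() part only matters when the input already contains missing values, which the example data does not. A small sketch on a hypothetical two-row frame (not from the question) shows the difference; the names parsed, unguarded and guarded are illustrative.

```python
import pandas as pd

# Hypothetical frame: row 0 has a missing Date1, row 1 an unparseable one.
df = pd.DataFrame({
    "Name": ["A", "B"],
    "Date1": [None, "23"],
    "Date2": ["03-04-2000", "12-12-2012"],
})

df1 = df.filter(like="Date")
parsed = df1.apply(lambda x: pd.to_datetime(x, format="%d-%m-%Y", errors="coerce"))

# Without the guard, the missing cell is reported as invalid as well.
unguarded = parsed.isna()

# With the guard, only cells that held a value and failed to parse remain.
guarded = parsed.isna() & df1.notna()

print(unguarded)
print(guarded)
```

Whether a missing date should count as invalid depends on the use case; dropping the guard treats both cases the same way.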
