Identify invalid dates in pandas dataframe columns
Suppose we have the following dataframe.
How can I create the fourth column 'Invalid dates' as specified below using the first three columns in the dataframe?
  Name       Date1       Date2  Invalid dates
0    A  01-02-2022  03-04-2000           None
1    B          23  12-12-2012          Date1
2    C  18-04-1993         abc          Date2
3    D          45         qcf   Date1, Date2
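To reproduce the example, the input frame can be built roughly as follows. This is only a sketch: it assumes the date columns hold plain strings in day-month-year order, which the question does not state explicitly.

```python
import pandas as pd

# Assumed input: all date values stored as strings (DD-MM-YYYY when valid)
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Date1": ["01-02-2022", "23", "18-04-1993", "45"],
    "Date2": ["03-04-2000", "12-12-2012", "abc", "qcf"],
})
print(df)
```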
You can select the date columns with filter (or any other method, including a manual list), convert them with to_datetime using errors='coerce' so that unparseable values become NaT, flag those values with isna, then stack the result into a long-format Series and join the aggregated column names back to the original DataFrame:
import pandas as pd

s = (df
     .filter(like='Date')  # keep only "Date" columns
     # convert to datetime; invalid dates become NaT
     .apply(lambda col: pd.to_datetime(col, format='%d-%m-%Y', errors='coerce'))
     .isna()
     # reshape to long format (boolean Series with a MultiIndex)
     .stack()
    )

out = (df
       .join(s[s].reset_index(level=1)         # keep only invalid dates
                 .groupby(level=0)['level_1']  # for each original row index
                 .agg(','.join)                # join the column names
                 .rename('Invalid Dates')
            )
      )
Alternative using melt to reshape the DataFrame:
cols = df.filter(like='Date').columns

out = df.merge(
    df.melt(id_vars='Name', value_vars=cols, var_name='Invalid Dates')
      .assign(value=lambda d: pd.to_datetime(d['value'], format='%d-%m-%Y',
                                             errors='coerce'))
      .loc[lambda d: d['value'].isna()]
      .groupby('Name')['Invalid Dates'].agg(','.join),
    left_on='Name', right_index=True, how='left'
)
Output:
  Name       Date1       Date2 Invalid Dates
0    A  01-02-2022  03-04-2000           NaN
1    B          23  12-12-2012         Date1
2    C  18-04-1993         abc         Date2
3    D          45         qcf   Date1,Date2
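The approach above hinges on how to_datetime behaves with an explicit format and errors='coerce'. A minimal sketch of that behaviour on a standalone Series (the sample values are made up for illustration, including an impossible calendar date):

```python
import pandas as pd

# Hypothetical sample values: one valid date and three invalid ones
dates = pd.Series(["01-02-2022", "23", "abc", "31-02-2022"])

# Values that do not match DD-MM-YYYY, or are not real calendar dates,
# are coerced to NaT instead of raising an error
parsed = pd.to_datetime(dates, format="%d-%m-%Y", errors="coerce")

invalid = parsed.isna()
print(invalid.tolist())
```

Note that isna alone cannot tell a value that failed to parse from one that was already missing in the input, so rows with empty cells would also be reported as invalid unless they are excluded separately.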
Use DataFrame.filter to select the columns whose names contain the substring Date, then convert all columns of df1 to datetimes with to_datetime and errors='coerce', which yields missing values (NaT) where a value does not match the format. Test for these with DataFrame.isna, combined with df1.notna() so that values already missing in the input are not flagged, and use DataFrame.dot to extract the matching column names separated by ,:
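import numpy as np
import pandas as pd

df1 = df.filter(like='Date')

df['Invalid dates'] = ((df1.apply(lambda x: pd.to_datetime(x, format='%d-%m-%Y', errors='coerce'))
                           .isna() & df1.notna())
                       .dot(df1.columns + ',')  # concatenate names of flagged columns
                       .str[:-1]                # drop the trailing comma
                       .replace('', np.nan))    # rows without invalid dates -> NaN
print(df)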
  Name       Date1       Date2 Invalid dates
0    A  01-02-2022  03-04-2000           NaN
1    B          23  12-12-2012         Date1
2    C  18-04-1993         abc         Date2
3    D          45         qcf   Date1,Date2
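The dot step can look opaque, so here is a small sketch of what it does, split into intermediate steps. The frame below is a hypothetical variant of the example with one genuinely missing value (None), to show the effect of the notna() condition. The trick relies on True * 'Date1,' giving 'Date1,' and False * 'Date1,' giving an empty string, so behaviour may differ on pandas versions or string dtypes other than the plain object dtype assumed here.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row C has a missing Date1 rather than an invalid one
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Date1": ["01-02-2022", "23", None],
    "Date2": ["03-04-2000", "12-12-2012", "abc"],
})
df1 = df.filter(like="Date")
parsed = df1.apply(lambda x: pd.to_datetime(x, format="%d-%m-%Y", errors="coerce"))

# True only where a value is present but could not be parsed
mask = parsed.isna() & df1.notna()

# Boolean-by-string dot product: each True contributes its column label
labels = mask.dot(df1.columns + ",").str[:-1].replace("", np.nan)
print(labels)
```

Here row C is reported as Date2 only, because the missing Date1 is filtered out by notna(); dropping that condition would flag it as well.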