简体   繁体   中英

Validate Date strict to format - more than 4 character for year - pySpark

I am trying to validate date received in file against configured date format(using to_timestamp /to_date).

schema = StructType([ \
    StructField("date",StringType(),True), \
    StructField("active", StringType(), True)
    ])

input_data = [
    ("27/10/00",'Y'),
    ("27/10/01",'Y'),
    ("27/10/1",'Y'),
    ("27/10/16",'Y'),
    ("27/10",'Y'),
    ("27/10/99",'Y'),
    ("27/10/100",'Y'),
    ("27/10/992",'Y'),
    ("27/10/2000",'Y'),
    ("27/10/9999",'Y'),
    ("27/10/20015",'Y'), ]
date_format = "dd/MM/yyyy"

temp_df = spark.createDataFrame(data=input_data,schema=schema)

df = temp_df.select('*', 
                f.when(f.date_format(f.to_timestamp(f.col('date'), date_format), date_format).isNotNull(), True).otherwise(False).alias('Date_validation'),
                f.date_format(f.to_timestamp(f.col('date'), date_format), date_format).alias('converted_date'),
                )
df.show(truncate=False)

my expectation is the code should return False since YYYY is a four character date but the code returns True and prints the converted year as valid one.

在此处输入图像描述

is there anyway I could to a strict validation on date?

note: formats are configured by user which will be dynamic.

tried to_timestamp and to_date both are having same output

This is a bug since 2.4 and had been fixed from 3.0 forward. More information here . In short, there is a configuration called spark.sql.legacy.timeParserPolicy with 3 possible values:

  • LEGACY : Perform like 2.4 where you see the bug
  • EXCEPTION : Raise exceptions when has an invalid format
  • CORRECTED : Fix and return your expected result

This is the result with CORRECTED :

+-----------+------+---------------+--------------+
|date       |active|Date_validation|converted_date|
+-----------+------+---------------+--------------+
|27/10/00   |Y     |false          |null          |
|27/10/01   |Y     |false          |null          |
|27/10/1    |Y     |false          |null          |
|27/10/16   |Y     |false          |null          |
|27/10      |Y     |false          |null          |
|27/10/99   |Y     |false          |null          |
|27/10/100  |Y     |false          |null          |
|27/10/992  |Y     |false          |null          |
|27/10/2000 |Y     |true           |27/10/2000    |
|27/10/9999 |Y     |true           |27/10/9999    |
|27/10/20015|Y     |false          |null          |
+-----------+------+---------------+--------------+

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM