Validate columns in Spark SQL (Java)

Database entries:

id: 9
message: {"Start Date":"11-06-2020","End Date":"11-06-2020"}
Group: NULL

id: 10
message: {"Start Date":"11-06-2020","End Date":"11-06-2020"}
Group: NULL

How can I validate the message column in the database and check whether the start date is in the correct format?

My Spark Java code:

String sqlQuery = "select * from emp";
Dataset<Row> df = spark.read().format("jdbc")
        .option("url", "jdbc:mysql://localhost:3306/employee")
        .option("query", sqlQuery)
        .option("user", "root")
        .option("password", "root")
        .load();

You can define a schema for the "message" column and extract the start and end dates from it.

You can then create a custom UDF such as "isValidTimestamp" to validate the start and end dates.

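# Sample code (Python)
from datetime import datetime

def isValidTimestamp(inputdate, fmt='%Y-%m-%d %H:%M:%S'):
    # Return True if inputdate parses with fmt. Pass the format your data uses,
    # e.g. '%d-%m-%Y' if the dates in the question are day-first.
    try:
        datetime.strptime(inputdate, fmt)
        return True
    except (ValueError, TypeError):  # TypeError covers a None (NULL) input
        return False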
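Since the question uses Java, the same check could be sketched with the standard `java.time` API. This is only an illustration under assumptions: the class name `DateFormatCheck`, the method name `isValidDate`, and the day-first pattern `dd-MM-uuuu` are hypothetical choices, not something stated in the question; adjust the pattern if the dates are month-first.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

// Hypothetical helper class; the name is an assumption made for this sketch.
final class DateFormatCheck {

    // Returns true only if `value` is a real calendar date written in `pattern`.
    // With ResolverStyle.STRICT the year should be written as "uuuu" rather
    // than "yyyy", because "yyyy" (year-of-era) would also require an era.
    static boolean isValidDate(String value, String pattern) {
        if (value == null) {
            return false; // a NULL column value is treated as invalid here
        }
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern)
                .withResolverStyle(ResolverStyle.STRICT);
        try {
            LocalDate.parse(value, formatter);
            return true;
        } catch (DateTimeParseException e) {
            return false; // wrong layout or an impossible date such as 31 February
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidDate("11-06-2020", "dd-MM-uuuu")); // true
        System.out.println(isValidDate("31-02-2020", "dd-MM-uuuu")); // false
        System.out.println(isValidDate("2020-06-11", "dd-MM-uuuu")); // false
    }
}
```

In a Spark job, a method like this could presumably be wrapped in a `UDF1<String, Boolean>` and registered through `spark.udf().register(...)` with `DataTypes.BooleanType` as the return type, then applied to the start date once it has been extracted from the JSON in the message column.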

You can also refer to this document for more information on timestamp validation in Spark:

https://databricks.com/blog/2020/07/22/a-comprehensive-look-at-dates-and-timestamps-in-apache-spark-3-0.html
