
Scala Spark DataFrame phone number validation

I want to validate the phone number column shown below and then set the "Correct" column to "Y" when the number looks valid or "N" when it does not.


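import org.apache.spark.sql.functions.{regexp_replace, trim, when}
// Needed for toDF and $"..."; assumes a SparkSession named spark (already in scope in spark-shell)
import spark.implicits._

val df = Seq[(String)]("", "  ", null, "123456789a", "1111111111", "1.3-4567 80", " 1.23-4567 890 ", "1234567890").toDF("PhoneNumber")

// Strip surrounding spaces, then drop spaces, dots and hyphens inside the value
val trimmed = regexp_replace(trim($"PhoneNumber"), "[ .-]", "")
// Valid: contains at least 10 consecutive digits and is not a single digit repeated
val correct = trimmed.rlike(raw"\d{10,}") &&
              !(trimmed.rlike(raw"^(\d)\1*$$"))
// A null phone number makes the condition null, which falls through to "N"
val df2 = df.withColumn("Correct", when(correct, "Y").otherwise("N"))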
 
df2.show()
// +---------------+-------+
// |    PhoneNumber|Correct|
// +---------------+-------+
// |               |      N|
// |               |      N|
// |           null|      N|
// |     123456789a|      N|
// |     1111111111|      N|
// |    1.3-4567 80|      N|
// | 1.23-4567 890 |      Y|
// |     1234567890|      Y|
// +---------------+-------+

trim($"PhoneNumber") removes leading and trailing spaces.
regexp_replace(..., "[ .-]", "") removes spaces, dots and hyphens.
.rlike(raw"\d{10,}") checks for a run of 10 or more consecutive digits. rlike matches anywhere in the string, so extra characters around that run are not rejected.
!(....rlike(raw"^(\d)\1*$$")) rejects values made of a single repeated digit, such as 1111111111. Inside a raw"..." string, $$ is an escaped $, so the pattern is ^(\d)\1*$.
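
To try the two regular expressions without starting a Spark session, the same rules can be sketched in plain Scala. This is only an illustration: PhoneCheck, looksValid and looksValidStrict are hypothetical names that do not appear in the answer above, and String.trim strips all leading and trailing whitespace, whereas Spark's trim strips only space characters. The strict variant is an optional tightening, assuming a value should consist of digits only once the separators are removed.

```scala
object PhoneCheck {
  // Hypothetical helper: drop surrounding whitespace, then spaces, dots and hyphens.
  private def clean(s: String): String = s.trim.replaceAll("[ .-]", "")

  // Same rules as the DataFrame version: a run of at least 10 digits somewhere
  // in the cleaned value, and the value is not one digit repeated.
  def looksValid(raw: String): Boolean =
    Option(raw).exists { s =>
      val cleaned = clean(s)
      val hasTenDigits = raw"\d{10,}".r.findFirstIn(cleaned).isDefined // "find" semantics, like rlike
      val allSameDigit = cleaned.matches(raw"(\d)\1*")                  // matches() tests the whole string
      hasTenDigits && !allSameDigit
    }

  // Stricter variant (an assumption, not in the original): the whole cleaned
  // value must be digits, so trailing letters are rejected.
  def looksValidStrict(raw: String): Boolean =
    Option(raw).exists { s =>
      val cleaned = clean(s)
      cleaned.matches(raw"\d{10,}") && !cleaned.matches(raw"(\d)\1*")
    }
}
```

With these definitions, PhoneCheck.looksValid(" 1.23-4567 890 ") is true and PhoneCheck.looksValid("1111111111") is false, matching the table above. A value such as "1234567890abc" passes looksValid but fails looksValidStrict, which shows the effect of matching anywhere versus matching the whole string.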

