简体   繁体   中英

Scala spark data frame phone number validation

I want to perform some validation on column Phone number mentioned below and then update the "Correct" column with "Y" (when the number looks valid) or "N" (when invalid).

在此处输入图像描述

val df = Seq[(String)]("", "  ", null, "123456789a", "1111111111", "1.3-4567 80", " 1.23-4567 890 ", "1234567890").toDF("PhoneNumber")

val trimmed = regexp_replace(trim($"PhoneNumber"), "[ .-]", "")
val correct = trimmed.rlike(raw"\d{10,}") && 
              !(trimmed.rlike(raw"^(\d)\1*$$"))
val df2 = df.withColumn("Correct", when(correct, "Y").otherwise("N"))
 
df2.show()
// +---------------+-------+
// |    PhoneNumber|Correct|
// +---------------+-------+
// |               |      N|
// |               |      N|
// |           null|      N|
// |     123456789a|      N|
// |     1111111111|      N|
// |    1.3-4567 80|      N|
// | 1.23-4567 890 |      Y|
// |     1234567890|      Y|
// +---------------+-------+

trim($"PhoneNumber") removes leading and trailing spaces
regexp_replace(..., "[ .-]", "") removes spaces, dots and commas
.rlike(raw"\d{10,}") checks for 10 or more digits
!(....rlike(raw"^(\d)\1*$$")) checks for all the same digits

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM