Given structured data in a string format, how do I extract parts of the data effectively using pattern matching and regular expressions?
Example:
val input = Seq("name-12345", "inval1d-12345", "invalid-12here123", "hello-54321", "inval1d-1aa2")
case class Client(name: Option[String], clientID: Option[Int])
def parseClient(input: String): Option[Client] = {
  val clientRegex = """([a-zA-Z]+)-([0-9]+)""".r
  Option(input).flatMap(in => {
    in match {
      case clientRegex(name, clientID) => Some(Client(Some(name), Some(clientID.toInt)))
      case _ => None
    }
  })
}
input.map(parseClient)
The issue with this, however, is that if a single part of the structured data fails validation, none of it is parsed.
How could I define the regular expressions in a hierarchical manner, such as:
val nameRegex = """([a-zA-Z]+)""".r
val clientIDRegex = """([0-9]+)""".r
and then match them in combination within a single pattern?
The output from the example:
Seq(
Some(Client(Some("name"),Some(12345)))
,None
,None
,Some(Client(Some("hello"),Some(54321)))
,None
)
The required output:
Seq(
Some(Client(Some("name"),Some(12345)))
,Some(Client(None,Some(12345)))
,Some(Client(Some("invalid"),None))
,Some(Client(Some("hello"),Some(54321)))
,None
)
This should give the expected outcome:
val input = Seq("name-12345", "inval1d-12345", "invalid-12here123", "hello-54321", "inval1d-1aa2")
case class Client(name: Option[String], clientID: Option[Int])
def parseClient(input: String): Option[Client] = {
  val clientRegex = """(?:([a-zA-Z^-]+)|[^-]*)-(?:([0-9]+)|.*)""".r
  input match {
    // Neither part is valid: both capture groups are null
    case clientRegex(null, null) => None
    // At least one part is valid; Option(null) turns an unmatched group into None
    case clientRegex(name, id) => Some(Client(Option(name), Option(id).map(_.toInt)))
    // No dash at all, so the regex does not match
    case _ => None
  }
}
input.map(parseClient)
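I removed the flatMap construct since it was unnecessary. The interesting part here is the regex:
"""(?:([a-zA-Z^-]+)|[^-]*)-(?:([0-9]+)|.*)"""
I changed it so that each side of the dash either matches the valid form, which is captured in a group ( ([a-zA-Z^-]+) for the name and ([0-9]+) for the id ), or falls through to a catch-all alternative ( [^-]* or .* ) that captures nothing. Each pair of alternatives is wrapped in a non-capturing group (?:) so that the alternation is scoped correctly.
If a part does not have the expected form, its capture group will be null, which is handled in the match-case. Note that inside a character class a ^ that is not the first character and a trailing - are literals, so [a-zA-Z^-] also accepts those two characters; use [a-zA-Z] if the name must consist of letters only.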
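To make the null-group behaviour concrete, here is a small sketch that returns only the raw capture groups of the regex above. The helper name groups is made up for illustration and is not part of the answer's code.

```scala
val clientRegex = """(?:([a-zA-Z^-]+)|[^-]*)-(?:([0-9]+)|.*)""".r

// Hypothetical helper: exposes the two capture groups as Options.
// A group that did not take part in the match is null, and Option(null) is None.
def groups(s: String): Option[(Option[String], Option[String])] =
  s match {
    case clientRegex(name, id) => Some((Option(name), Option(id)))
    case _                     => None // no dash, so the pattern does not match at all
  }
```

For "inval1d-12345" the name side is matched by the catch-all alternative, so only the id group is set; for a string without a dash the whole pattern fails and the result is None.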
EDIT: Made a correction to the code so that it works for completely invalid input, and removed unnecessary if-statements.
EDIT 2: Adapted the code according to the OP's comment, taking advantage of Option(null) evaluating to None.
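If you would rather keep nameRegex and clientIDRegex as separate patterns, as asked in the question, one possible sketch is to split the input at the first dash and validate each part on its own. Splitting with split("-", 2) and the helper names parseName and parseClientID are assumptions made for this example, not something taken from the question.

```scala
import scala.util.Try

case class Client(name: Option[String], clientID: Option[Int])

val nameRegex     = """([a-zA-Z]+)""".r
val clientIDRegex = """([0-9]+)""".r

// Each part is validated independently; a Regex pattern must match the whole string.
def parseName(raw: String): Option[String] = raw match {
  case nameRegex(n) => Some(n)
  case _            => None
}

def parseClientID(raw: String): Option[Int] = raw match {
  case clientIDRegex(id) => Try(id.toInt).toOption // None if the digits overflow an Int
  case _                 => None
}

def parseClient(input: String): Option[Client] =
  // Option(input) guards against null; split at the first dash only (assumption).
  Option(input).map(_.split("-", 2)) match {
    case Some(Array(rawName, rawID)) =>
      (parseName(rawName), parseClientID(rawID)) match {
        case (None, None) => None // nothing valid at all
        case (name, id)   => Some(Client(name, id))
      }
    case _ => None // null input or no dash
  }
```

With the five inputs from the question this should produce the required output, and each field's rule lives in its own regex, so adding a field means adding one more small parser rather than growing a single pattern.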
You are probably looking for something like an applicative that you can chain. You can do something like this using the Validated
type from cats:
val houseNumber = parseClient("house_number").andThen { n =>
  if (isValid(n)) Validated.valid(n)
  else Validated.invalid(ParseError("house_number"))
}
I would also opt for atto: it has the ParseResult
type that keeps all the information on parsing the string.
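As a rough illustration of keeping the failure information per field without adding a dependency, here is a sketch that uses the standard-library Either as a stand-in for cats' Validated. It is not the cats API: Either stops at the first error when chained, whereas Validated can accumulate errors. The names FieldError, parseName, parseClientID and parseFields are hypothetical.

```scala
import scala.util.Try

// Hypothetical error type recording which field failed and what the raw text was.
final case class FieldError(field: String, raw: String)

val nameRegex     = """[a-zA-Z]+""".r
val clientIDRegex = """[0-9]+""".r

def parseName(raw: String): Either[FieldError, String] =
  if (nameRegex.pattern.matcher(raw).matches()) Right(raw)
  else Left(FieldError("name", raw))

def parseClientID(raw: String): Either[FieldError, Int] =
  if (clientIDRegex.pattern.matcher(raw).matches())
    Try(raw.toInt).toOption.toRight(FieldError("clientID", raw)) // Left on Int overflow
  else Left(FieldError("clientID", raw))

// Splits at the first dash (assumption) and reports each field's result separately.
def parseFields(input: String): Option[(Either[FieldError, String], Either[FieldError, Int])] =
  input.split("-", 2) match {
    case Array(rawName, rawID) => Some((parseName(rawName), parseClientID(rawID)))
    case _                     => None // no dash in the input
  }
```

Calling .toOption on each side recovers the Option-based Client from the question, while the Left values still say why a field was rejected.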