
Extract the repetitive parts of a String by Regex pattern matching in Scala

I have this code for extracting the repeated, colon-separated sections matched by a regex, but it does not give me the right output.

val pattern = """([a-zA-Z]+)(:([a-zA-Z]+))*""".r

for (p <- pattern findAllIn "it:is:very:great just:because:it is") p match {
  case pattern("it", pattern(is, pattern(very, great))) => println("it: " + is + very + great)
  case pattern(it, _, rest) => println(it + " : " + rest)
  case pattern(it, is, very, great) => println(it + " : " + is + " : " + very + " : " + great)
  case _ => println("match failure")
}

What am I doing wrong?

How can I write a case expression that extracts each colon-separated part matched by the regex?

What is the right syntax with which to solve this?

How can I match against an unknown number of parts extracted by a regex?

In this case, it should print:

it : is : very : great
just : because : it
is

You can't use a repeated capturing group like that: the group only keeps the value captured by its last iteration.
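To see this concretely, here is a minimal sketch that runs the question's pattern against the first sample token and prints the capture groups. The names `repeated` and `groups` are only illustrative, and the snippet assumes script or REPL style, as in the question.

```scala
// The question's pattern, which contains a repeated capturing group.
val repeated = """([a-zA-Z]+)(:([a-zA-Z]+))*""".r

// Capture groups 1..3 of the first match; the repeated groups hold
// only what their final iteration captured.
val groups: List[String] =
  repeated.findFirstMatchIn("it:is:very:great").map(_.subgroups).getOrElse(Nil)

println(groups)
// => List(it, :great, great)
```

The pattern always exposes exactly three groups, so a case such as `pattern(it, is, very, great)` with four sub-patterns can never match, and `pattern(it, _, rest)` binds `rest` to the last part only.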

You can still get the matches you need with the regex \b[a-zA-Z]+(?::[a-zA-Z]+)*\b and then split each match on the colon:

val text = "it:is:very:great just:because:it is"
val regex = """\b[a-zA-Z]+(?::[a-zA-Z]+)*\b""".r
val results = regex.findAllIn(text).map(_ split ':').toList
results.foreach { x => println(x.mkString(", ")) }
// => it, is, very, great
//    just, because, it
//    is

Regex details:

  • \b - word boundary
  • [a-zA-Z]+ - one or more ASCII letters
  • (?::[a-zA-Z]+)* - zero or more repetitions of
    • : - a colon
    • [a-zA-Z]+ - one or more ASCII letters
  • \b - word boundary
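If a case expression over an unknown number of parts is still wanted, one option is to convert each split match to a List and use sequence patterns. This is a sketch rather than the only way to do it; `parts` and `describe` are hypothetical names introduced for illustration, and the output uses the " : " separator requested in the question.

```scala
val text = "it:is:very:great just:because:it is"
val regex = """\b[a-zA-Z]+(?::[a-zA-Z]+)*\b""".r

// Each match becomes a List of its colon-separated parts.
val parts: List[List[String]] =
  regex.findAllIn(text).map(_.split(':').toList).toList

// List patterns cope with any number of parts.
def describe(ps: List[String]): String = ps match {
  case single :: Nil => single                                // one part only
  case head :: tail  => s"$head : ${tail.mkString(" : ")}"    // two or more parts
  case Nil           => "match failure"                       // not expected for this regex
}

parts.map(describe).foreach(println)
// => it : is : very : great
//    just : because : it
//    is
```

When no per-part logic is needed, `ps.mkString(" : ")` alone gives the same result; the pattern match is useful when the first part has to be treated differently from the rest.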


 