
Scala - how to extract a path after an s3:/ prefix with a Regex?

I want to extract the path that follows the s3:/ prefix, starting from /bucketName. So far I have only managed to extract the s3:/ prefix itself.

import scala.util.matching.Regex

val s3Path = "s3://bucketName/dataDir"
val pattern = new Regex("(s3-|s3\\.)?(.*)\\:/")

val pathString: String = (pattern findFirstIn s3Path).getOrElse("")

// prints s3:/
println(pathString) 

How could I get /bucketName/dataDir instead?

You are missing a trailing .*:

 val pattern = new Regex("(s3-|s3\\.)?(.*)\\:/.*")

You may use

val pattern = "(?<=s3:/).+".r
val str = "s3://bucketName/data"
println(pattern.findFirstIn(str).getOrElse(""))


Details

  • (?<=s3:/).+ - a positive lookbehind that matches a location immediately preceded by s3:/, followed by one or more characters other than line break characters
  • pattern.findFirstIn(str) - finds the first occurrence of the pattern in the string.
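
As a minimal sketch of the lookbehind approach (script or REPL style, like the snippets above), the result can be kept as an Option instead of falling back to an empty string. The helper name pathAfterPrefix is hypothetical and only used for illustration.

```scala
import scala.util.matching.Regex

// Lookbehind: match everything that follows "s3:/" without consuming the prefix.
val afterPrefix: Regex = "(?<=s3:/).+".r

// Hypothetical helper: None when the input contains no "s3:/" prefix.
def pathAfterPrefix(uri: String): Option[String] =
  afterPrefix.findFirstIn(uri)

println(pathAfterPrefix("s3://bucketName/data")) // Some(/bucketName/data)
println(pathAfterPrefix("/local/dir"))           // None
```

Returning an Option lets the caller decide how to handle input that is not an S3 path.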

Alternatively, you can use pattern matching. This way you do not need a lookbehind, just a capturing group around .+ :

val pattern = "s3:/(.+)".r
val str = "s3://bucketName/data"
val m = str match {
    case pattern(url) => url
    case _ => ""
}
println(s"URL: ${m}") // => URL: /bucketName/data


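One caveat: a Regex used as an extractor in a match expression must match the entire string, so input with any text before s3:/ falls through to the default case.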
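
A small sketch of that caveat, assuming an input that has extra text before the prefix: Regex#unanchored returns a view of the same pattern that may match anywhere in the string. The helper extract and the sample input are hypothetical.

```scala
import scala.util.matching.Regex

val anchored: Regex = "s3:/(.+)".r
// Same pattern, but as an extractor it may match any part of the input.
val relaxed: Regex = anchored.unanchored

// Hypothetical helper: apply a regex as an extractor, "" when it does not match.
def extract(r: Regex, s: String): String = s match {
  case r(path) => path
  case _       => ""
}

val input = "copy s3://bucketName/data"
println(extract(anchored, input)) // prints an empty line: no full-string match
println(extract(relaxed, input))  // /bucketName/data
```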

You could also match s3:/ and capture any char except a newline (.+) in a group:

s3:/(.+)


val s3Path = "s3://bucketName/dataDir"
val pattern = "s3:/(.+)".r
// Print the first capturing group of the first match, if there is one.
pattern.findFirstMatchIn(s3Path).foreach(m => println(m.group(1)))

Result

/bucketName/dataDir

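Since Scala 2.13 you can also pattern match with the s string interpolator. Note that the pattern below strips the whole s3:// prefix, so it yields bucketName/dataDir without the leading slash; match on s"s3:/$dir" instead if you want to keep it: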

val s3Path = "s3://bucketName/dataDir" match {
  case s"s3://$dir" => dir
  case _ => "invalid"
}

println(s3Path)
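
A variant of the interpolator approach, sketched under the assumption that the leading slash should be kept and that non-S3 input should be reported as None rather than a sentinel string. The helper name pathOf is hypothetical; like any s"..." pattern it requires Scala 2.13 or later.

```scala
// Hypothetical helper: keep everything after "s3:/", including the leading slash.
def pathOf(uri: String): Option[String] = uri match {
  case s"s3:/$rest" => Some(rest) // the pattern must cover the whole string
  case _            => None
}

println(pathOf("s3://bucketName/dataDir")) // Some(/bucketName/dataDir)
println(pathOf("/local/dir"))              // None
```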
