Scala - String matches RegEx

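This is on Scala 2.11.8.

I am trying to read and parse a text file in Scala, and I am seeing behavior that is unexpected (to me) when calling string.matches.

Say I have a file.txt with the following contents: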

#############
# HEADING 1
#############

- The zeroth line item, if there can be one
- First Line item
- Second Line item
- Here is the third
    and this one has some details
- A fourth one followed by empty line

- Fifth line item

I read the file and then parse the contents like so:

val source = scala.io.Source.fromFile("file.txt")
val lines = try source.getLines.filterNot(_.matches("#.*")).mkString("\n") finally source.close
val items = lines.split("""(\n-|^-)\s""").filter(_.nonEmpty)

Now, looking at the individual line items in the result:

// print the first few items
scala> items(0)
res0: String = The zeroth line item, if there can be one

scala> items(1)
res1: String = First Line item

scala> items(3)
res2: String =
Here is the third
    and this one has some details

scala> items(4)
res3: String =
"A fourth one followed by empty line
"

scala> items(5)
res4: String =
"Fifth line item

"

Now for some matching:

// Matching the items with RegEx
scala> items(0).matches("The.*")
res5: Boolean = true

scala> items(1).matches("First.*")
res6: Boolean = true

scala> items(3).matches("Here is.*")
res7: Boolean = false                    // ??

scala> items(4).matches("A fourth.*")
res8: Boolean = false                    // ??


// But startsWith seems to recognize it just fine!
scala> items(3).startsWith("Here is")
res9: Boolean = true

scala> items(4).startsWith("A fourth")
res10: Boolean = true

// Even this doesn't match
scala> items(4).matches(".*A fourth.*")
res11: Boolean = false                    // ?

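My observation is that this happens only when the item does not consist of a single line, i.e. when the item spans multiple lines (including the case where the next line is empty).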
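The observation can be reproduced without the file. The following is a minimal sketch that can be pasted into the REPL; the value names (singleLine, multiLine, trailingNewline) are made up for illustration, and the strings are copied from the items shown above.

```scala
// Hypothetical stand-ins for items(1), items(3) and items(4)
val singleLine = "First Line item"
val multiLine = "Here is the third\n    and this one has some details"
val trailingNewline = "A fourth one followed by empty line\n"

// String.matches must consume the whole string, and by default
// '.' does not match line terminators such as '\n'.
val r1 = singleLine.matches("First.*")         // true
val r2 = multiLine.matches("Here is.*")        // false: '.' stops at the embedded '\n'
val r3 = trailingNewline.matches("A fourth.*") // false: the trailing '\n' is left over

// startsWith only inspects the prefix, so newlines later on do not matter.
val r4 = trailingNewline.startsWith("A fourth") // true

println(Seq(r1, r2, r3, r4).mkString(", "))
```

So the difference between the items is whether they contain a newline anywhere, not how they start.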

Is this expected behavior? How can I match consistently using RegEx?

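Consider activating DOTALL mode with the (?s) flag at the beginning of the regex. By default, . does not match line terminators, and String.matches succeeds only if the pattern matches the entire string, so an item that spans multiple lines (or merely ends with a newline) does not match. Example: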

val text = 
  """|- The zeroth line item, if there can be one
     |- First Line item
     |- Second Line item
     |- Here is the third
     |    and this one has some details
     |- A fourth one followed by empty line
     |
     |- Fifth line item
     |
     |""".stripMargin


val items = text.split("""(\n-|^-)\s""").filter(_.nonEmpty)

def describeMatch(str: String, regex: String): Unit = {
  println("-" * 60)
  println("The string\n>>>%s<<<\n%s".format(
    str,
    (if (str.matches(regex)) "Matches" else "Doesn't match") + s" >>>$regex<<<"
  ))
}

describeMatch(items(0), "The.*")
describeMatch(items(1), "First.*")
describeMatch(items(3), "Here is.*")
describeMatch(items(3), "(?s)Here is.*")
describeMatch(items(4), "A fourth.*")
describeMatch(items(4), "(?s)A fourth.*")
describeMatch(items(4), ".*A fourth.*$")
describeMatch(items(4), "(?s)^A fourth.*$")

The output should speak for itself:

------------------------------------------------------------
The string
>>>The zeroth line item, if there can be one<<<
Matches >>>The.*<<<
------------------------------------------------------------
The string
>>>First Line item<<<
Matches >>>First.*<<<
------------------------------------------------------------
The string
>>>Here is the third
    and this one has some details<<<
Doesn't match >>>Here is.*<<<
------------------------------------------------------------
The string
>>>Here is the third
    and this one has some details<<<
Matches >>>(?s)Here is.*<<<
------------------------------------------------------------
The string
>>>A fourth one followed by empty line
<<<
Doesn't match >>>A fourth.*<<<
------------------------------------------------------------
The string
>>>A fourth one followed by empty line
<<<
Matches >>>(?s)A fourth.*<<<
------------------------------------------------------------
The string
>>>A fourth one followed by empty line
<<<
Doesn't match >>>.*A fourth.*$<<<
------------------------------------------------------------
The string
>>>A fourth one followed by empty line
<<<
Matches >>>(?s)^A fourth.*$<<<
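Depending on what "match" should mean, a few alternatives to the inline (?s) flag may also be worth considering. The sketch below is only illustrative: the value names are made up, and item is a copy of items(4) from above. It uses java.util.regex.Pattern and scala.util.matching.Regex, both available on Scala 2.11.

```scala
import java.util.regex.Pattern

// Hypothetical stand-in for items(4): one line followed by a newline
val item = "A fourth one followed by empty line\n"

// 1. Same effect as (?s), but passed as a compile flag
val viaFlag = Pattern.compile("A fourth.*", Pattern.DOTALL).matcher(item).matches()

// 2. lookingAt anchors the match at the start only, not at the end
val viaLookingAt = Pattern.compile("A fourth").matcher(item).lookingAt()

// 3. Scala's Regex offers the same prefix semantics via findPrefixOf
val viaPrefix = "A fourth".r.findPrefixOf(item).isDefined

// 4. Unanchored search anywhere in the string
val viaFind = "fourth".r.findFirstIn(item).isDefined

// 5. Trimming removes the trailing newline, which is enough for this item,
//    but would not help an item with a newline in the middle such as items(3)
val viaTrim = item.trim.matches("A fourth.*")

println(Seq(viaFlag, viaLookingAt, viaPrefix, viaFind, viaTrim).mkString(", "))
```

If only the beginning of each item matters, the prefix variants (2 and 3) avoid having to think about newlines at all; if the whole item must match a pattern, DOTALL is the more direct choice.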
