Scala - String matches RegEx

This is on Scala 2.11.8.

I'm trying to read and parse a text file in Scala, and I'm seeing behavior I did not expect when calling string.matches.

Say I have a file.txt with the contents below:

#############
# HEADING 1
#############

- The zeroth line item, if there can be one
- First Line item
- Second Line item
- Here is the third
    and this one has some details
- A fourth one followed by empty line

- Fifth line item

Read the file and parse the contents like this:

val source = scala.io.Source.fromFile("file.txt")
val lines = try source.getLines.filterNot(_.matches("#.*")).mkString("\n") finally source.close
val items = lines.split("""(\n-|^-)\s""").filter(_.nonEmpty)

Now, looking at the individual line items and their values:

// print the first few items
scala> items(0)
res0: String = The zeroth line item, if there can be one

scala> items(1)
res1: String = First Line item

scala> items(3)
res2: String =
Here is the third
    and this one has some details

scala> items(4)
res3: String =
"A fourth one followed by empty line
"

scala> items(5)
res4: String =
"Fifth line item

"

Now for some matching:

// Matching the items with RegEx
scala> items(0).matches("The.*")
res5: Boolean = true

scala> items(1).matches("First.*")
res6: Boolean = true

scala> items(3).matches("Here is.*")
res7: Boolean = false                    // ??

scala> items(4).matches("A fourth.*")
res8: Boolean = false                    // ??


// But startsWith seems to recognize it just fine!
scala> items(3).startsWith("Here is")
res9: Boolean = true

scala> items(4).startsWith("A fourth")
res10: Boolean = true

// Even this doesn't match
scala> items(4).matches(".*A fourth.*")
res11: Boolean = false                    // ?

My observation is that this happens only when the item is anything other than a single line, i.e. when the item spans multiple lines (including when it is followed by an empty line).

Is this behavior expected? How can I match consistently using RegEx?

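By default, the dot in a regex does not match line terminators, and String.matches requires the regex to match the entire string, so a pattern such as "Here is.*" fails on any item that contains a newline. Consider activating DOTALL mode by putting the (?s) flag at the beginning of the regex. Example: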

val text = 
  """|- The zeroth line item, if there can be one
     |- First Line item
     |- Second Line item
     |- Here is the third
     |    and this one has some details
     |- A fourth one followed by empty line
     |
     |- Fifth line item
     |
     |""".stripMargin


val items = text.split("""(\n-|^-)\s""").filter(_.nonEmpty)

def describeMatch(str: String, regex: String): Unit = {
  println("-" * 60)
  println("The string\n>>>%s<<<\n%s".format(
    str,
    (if (str.matches(regex)) "Matches" else "Doesn't match") + s" >>>$regex<<<"
  ))
}

describeMatch(items(0), "The.*")
describeMatch(items(1), "First.*")
describeMatch(items(3), "Here is.*")
describeMatch(items(3), "(?s)Here is.*")
describeMatch(items(4), "A fourth.*")
describeMatch(items(4), "(?s)A fourth.*")
describeMatch(items(4), ".*A fourth.*$")
describeMatch(items(4), "(?s)^A fourth.*$")

The output should speak for itself:

------------------------------------------------------------
The string
>>>The zeroth line item, if there can be one<<<
Matches >>>The.*<<<
------------------------------------------------------------
The string
>>>First Line item<<<
Matches >>>First.*<<<
------------------------------------------------------------
The string
>>>Here is the third
    and this one has some details<<<
Doesn't match >>>Here is.*<<<
------------------------------------------------------------
The string
>>>Here is the third
    and this one has some details<<<
Matches >>>(?s)Here is.*<<<
------------------------------------------------------------
The string
>>>A fourth one followed by empty line
<<<
Doesn't match >>>A fourth.*<<<
------------------------------------------------------------
The string
>>>A fourth one followed by empty line
<<<
Matches >>>(?s)A fourth.*<<<
------------------------------------------------------------
The string
>>>A fourth one followed by empty line
<<<
Doesn't match >>>.*A fourth.*$<<<
------------------------------------------------------------
The string
>>>A fourth one followed by empty line
<<<
Matches >>>(?s)^A fourth.*$<<<
