Scala - String matches RegEx

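This is on Scala 2.11.8.

I am trying to read and parse a text file in Scala, and I am seeing behavior that is unexpected (to me) when calling string.matches.

Say I have a file.txt with the following contents: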

#############
# HEADING 1
#############

- The zeroth line item, if there can be one
- First Line item
- Second Line item
- Here is the third
    and this one has some details
- A fourth one followed by empty line

- Fifth line item

I read the file and then parse the contents like so:

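val source = scala.io.Source.fromFile("file.txt")
// Drop the '#' heading lines, then join the remaining lines back into one string
val lines = try source.getLines().filterNot(_.matches("#.*")).mkString("\n") finally source.close()
// Split on a "- " marker at the very start of the text or right after a newline
val items = lines.split("""(\n-|^-)\s""").filter(_.nonEmpty)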

Now I try to parse the individual line items from the result:

// print the first few items
scala> items(0)
res0: String = The zeroth line item, if there can be one

scala> items(1)
res1: String = First Line item

scala> items(3)
res2: String =
Here is the third
    and this one has some details

scala> items(4)
res3: String =
"A fourth one followed by empty line
"

scala> items(5)
res4: String =
"Fifth line item

"

Now for some matching:

// Matching the items with RegEx
scala> items(0).matches("The.*")
res5: Boolean = true

scala> items(1).matches("First.*")
res6: Boolean = true

scala> items(3).matches("Here is.*")
res7: Boolean = false                    // ??

scala> items(4).matches("A fourth.*")
res8: Boolean = false                    // ??


// But startsWith seems to recognize it just fine!
scala> items(3).startsWith("Here is")
res9: Boolean = true

scala> items(4).startsWith("A fourth")
res10: Boolean = true

// Even this doesn't match
scala> items(4).matches(".*A fourth.*")
res11: Boolean = false                    // ?

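My observation is that this happens only when the item is not a single line, i.e. when the item spans multiple lines (including the case where the next line is empty).

Is this expected behavior? How can I match consistently with RegEx?

Consider activating DOTALL mode by putting the (?s) flag at the start of the regular expression. By default, . does not match line terminators, and String.matches succeeds only if the pattern matches the entire string, so a pattern ending in .* fails as soon as the item contains a newline. Example: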

val text = 
  """|- The zeroth line item, if there can be one
     |- First Line item
     |- Second Line item
     |- Here is the third
     |    and this one has some details
     |- A fourth one followed by empty line
     |
     |- Fifth line item
     |
     |""".stripMargin


val items = text.split("""(\n-|^-)\s""").filter(_.nonEmpty)

def describeMatch(str: String, regex: String): Unit = {
  println("-" * 60)
  println("The string\n>>>%s<<<\n%s".format(
    str,
    (if (str.matches(regex)) "Matches" else "Doesn't match") + s" >>>$regex<<<"
  ))
}

describeMatch(items(0), "The.*")
describeMatch(items(1), "First.*")
describeMatch(items(3), "Here is.*")
describeMatch(items(3), "(?s)Here is.*")
describeMatch(items(4), "A fourth.*")
describeMatch(items(4), "(?s)A fourth.*")
describeMatch(items(4), ".*A fourth.*$")
describeMatch(items(4), "(?s)^A fourth.*$")

The output should speak for itself:

------------------------------------------------------------
The string
>>>The zeroth line item, if there can be one<<<
Matches >>>The.*<<<
------------------------------------------------------------
The string
>>>First Line item<<<
Matches >>>First.*<<<
------------------------------------------------------------
The string
>>>Here is the third
    and this one has some details<<<
Doesn't match >>>Here is.*<<<
------------------------------------------------------------
The string
>>>Here is the third
    and this one has some details<<<
Matches >>>(?s)Here is.*<<<
------------------------------------------------------------
The string
>>>A fourth one followed by empty line
<<<
Doesn't match >>>A fourth.*<<<
------------------------------------------------------------
The string
>>>A fourth one followed by empty line
<<<
Matches >>>(?s)A fourth.*<<<
------------------------------------------------------------
The string
>>>A fourth one followed by empty line
<<<
Doesn't match >>>.*A fourth.*$<<<
------------------------------------------------------------
The string
>>>A fourth one followed by empty line
<<<
Matches >>>(?s)^A fourth.*$<<<
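
For comparison, here is a minimal sketch of a few equivalent ways to handle the same situation. It assumes nothing beyond the standard library. The object name DotAllSketch and the two sample strings are made up for illustration and only mirror items(3) and items(4) above. It shows the inline (?s) flag, the same flag passed as Pattern.DOTALL to a precompiled pattern, a prefix-only check with Regex.findPrefixOf, and trimming a trailing newline before matching.

```scala
import java.util.regex.Pattern

object DotAllSketch {
  // Hypothetical samples shaped like items(3) and items(4) from the question.
  val multiLine = "Here is the third\n    and this one has some details"
  val trailingNewline = "A fourth one followed by empty line\n"

  // String.matches must consume the whole input, and '.' stops at '\n' by default.
  val plain: Boolean = multiLine.matches("Here is.*")

  // The inline flag (?s) enables DOTALL, so '.' also matches line terminators.
  val inlineFlag: Boolean = multiLine.matches("(?s)Here is.*")

  // The same flag passed explicitly to a pattern that is compiled once and reused.
  val compiled: Pattern = Pattern.compile("Here is.*", Pattern.DOTALL)
  val viaCompiled: Boolean = compiled.matcher(multiLine).matches()

  // If only the prefix matters, findPrefixOf does not require a whole-input match.
  val prefixOnly: Boolean = "Here is".r.findPrefixOf(multiLine).isDefined

  // A trailing newline alone is enough to break a whole-input match;
  // trimming the item first is one way around it.
  val untrimmed: Boolean = trailingNewline.matches("A fourth.*")
  val trimmed: Boolean = trailingNewline.trim.matches("A fourth.*")

  def main(args: Array[String]): Unit = {
    println(s"plain=$plain inlineFlag=$inlineFlag viaCompiled=$viaCompiled")
    println(s"prefixOnly=$prefixOnly untrimmed=$untrimmed trimmed=$trimmed")
  }
}
```

Which variant fits best depends on intent: (?s) or Pattern.DOTALL when the pattern really should span lines, findPrefixOf when only the beginning of the item matters, and trim when the only problem is stray trailing whitespace.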
