Scala - String matches RegEx
This is on Scala 2.11.8.
I am trying to read and parse a text file in Scala, and I am seeing behavior I did not expect when calling string.matches.
Say I have a file.txt with the following content:
#############
# HEADING 1
#############
- The zeroth line item, if there can be one
- First Line item
- Second Line item
- Here is the third
and this one has some details
- A fourth one followed by empty line

- Fifth line item

I read the file and then parse the contents like so:
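val source = scala.io.Source.fromFile("file.txt")
// Drop the "#" header lines and join the remaining lines back into one string
val lines = try source.getLines.filterNot(_.matches("#.*")).mkString("\n") finally source.close
// Split on a "-" at the start of the text or right after a newline, plus the following whitespace character
val items = lines.split("""(\n-|^-)\s""").filter(_.nonEmpty)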
Now, trying to parse the individual line items from the result:
// print the first few items
scala> items(0)
res0: String = The zeroth line item, if there can be one
scala> items(1)
res1: String = First Line item
scala> items(3)
res2: String =
Here is the third
and this one has some details
scala> items(4)
res3: String =
"A fourth one followed by empty line
"
scala> items(5)
res4: String =
"Fifth line item
"
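To reproduce this without file.txt, the sketch below rebuilds the filtered text as a string literal, applies the same split, and prints each item with its newlines escaped. It assumes the file has one blank line after the fourth item and ends with a blank line, which is what the REPL output above implies; the helper name visible is hypothetical.

```scala
// Assumption: this is what the header-filtered, newline-joined file looks like
// (a blank line after the fourth item, and a trailing newline at the end).
val text =
  """- The zeroth line item, if there can be one
    |- First Line item
    |- Second Line item
    |- Here is the third
    |and this one has some details
    |- A fourth one followed by empty line
    |
    |- Fifth line item
    |""".stripMargin

// Same split as in the question: a "-" at the very start or after a newline,
// followed by one whitespace character, begins a new item.
val items = text.split("""(\n-|^-)\s""").filter(_.nonEmpty)

// Hypothetical helper: render each newline as the two characters '\' and 'n'
// so that a trailing line break is visible when printed.
def visible(s: String): String = s.replace("\n", "\\n")

items.zipWithIndex.foreach { case (item, i) =>
  println(s"$i: ${visible(item)}")
}
```

Item 3 has a newline in the middle and items 4 and 5 end with one; items 3 and 4 are exactly the ones whose matches calls return false below.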
Now for some matching:
// Matching the items with RegEx
scala> items(0).matches("The.*")
res5: Boolean = true
scala> items(1).matches("First.*")
res6: Boolean = true
scala> items(3).matches("Here is.*")
res7: Boolean = false // ??
scala> items(4).matches("A fourth.*")
res8: Boolean = false // ??
// But startsWith seems to recognize it just fine!
scala> items(3).startsWith("Here is")
res9: Boolean = true
scala> items(4).startsWith("A fourth")
res10: Boolean = true
// Even this doesn't match
scala> items(4).matches(".*A fourth.*")
res11: Boolean = false // ?
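The gap between matches and startsWith can be reproduced on a single two-line string. This is a minimal sketch using only String, scala.util.matching.Regex and java.util.regex.Pattern; the value names are illustrative, not from the question.

```scala
import java.util.regex.Pattern

// A two-line string, shaped like items(3) in the question.
val item = "Here is the third\nand this one has some details"

// String.matches must match the WHOLE string, and by default "." does not
// match a line terminator, so ".*" stops at the first "\n".
val wholeDefault = item.matches("Here is.*")                           // false

// startsWith only compares a prefix, so the newline never comes into play.
val prefix = item.startsWith("Here is")                                // true

// Regex.findPrefixOf is the regex analogue of startsWith: it only needs a
// match at the beginning of the input.
val regexPrefix = "Here is.*".r.findPrefixOf(item)                     // Some("Here is the third")

// Matcher.lookingAt is the java.util.regex form of the same prefix match.
val lookingAt = Pattern.compile("Here is.*").matcher(item).lookingAt() // true

println(s"matches: $wholeDefault, startsWith: $prefix, findPrefixOf: $regexPrefix, lookingAt: $lookingAt")
```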
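My observation is that this happens only when an item is not confined to a single line, i.e. when it spans multiple lines (including the case where the next line is empty, which leaves a trailing newline in the item).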
Is this the expected behavior? How can I match consistently with a regex?
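Consider activating DOTALL mode with the (?s) flag at the beginning of the regex. By default "." matches any character except a line terminator, and String.matches succeeds only if the pattern matches the entire string, so ".*" cannot get past the first "\n". With (?s), "." matches newlines as well. Example: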
val text =
"""|- The zeroth line item, if there can be one
|- First Line item
|- Second Line item
|- Here is the third
| and this one has some details
|- A fourth one followed by empty line
|
|- Fifth line item
|
|""".stripMargin
val items = text.split("""(\n-|^-)\s""").filter(_.nonEmpty)
def describeMatch(str: String, regex: String): Unit = {
println("-" * 60)
println("The string\n>>>%s<<<\n%s".format(
str,
(if (str.matches(regex)) "Matches" else "Doesn't match") + s" >>>$regex<<<"
))
}
describeMatch(items(0), "The.*")
describeMatch(items(1), "First.*")
describeMatch(items(3), "Here is.*")
describeMatch(items(3), "(?s)Here is.*")
describeMatch(items(4), "A fourth.*")
describeMatch(items(4), "(?s)A fourth.*")
describeMatch(items(4), ".*A fourth.*$")
describeMatch(items(4), "(?s)^A fourth.*$")
The output speaks for itself:
------------------------------------------------------------
The string
>>>The zeroth line item, if there can be one<<<
Matches >>>The.*<<<
------------------------------------------------------------
The string
>>>First Line item<<<
Matches >>>First.*<<<
------------------------------------------------------------
The string
>>>Here is the third
and this one has some details<<<
Doesn't match >>>Here is.*<<<
------------------------------------------------------------
The string
>>>Here is the third
and this one has some details<<<
Matches >>>(?s)Here is.*<<<
------------------------------------------------------------
The string
>>>A fourth one followed by empty line
<<<
Doesn't match >>>A fourth.*<<<
------------------------------------------------------------
The string
>>>A fourth one followed by empty line
<<<
Matches >>>(?s)A fourth.*<<<
------------------------------------------------------------
The string
>>>A fourth one followed by empty line
<<<
Doesn't match >>>.*A fourth.*$<<<
------------------------------------------------------------
The string
>>>A fourth one followed by empty line
<<<
Matches >>>(?s)^A fourth.*$<<<
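The inline (?s) flag is one of several ways to switch on DOTALL. The sketch below compares it with two equivalents, assuming the pattern is reused often enough to be worth compiling once: the Pattern.DOTALL constant from java.util.regex, and a precompiled scala.util.matching.Regex used as an extractor. Names such as FourthItem are illustrative.

```scala
import java.util.regex.Pattern
import scala.util.matching.Regex

val item = "A fourth one followed by empty line\n"

// Option 1: the inline flag from the answer; "(?s)" turns on DOTALL.
val withInlineFlag = item.matches("(?s)A fourth.*")

// Option 2: the same mode via the Pattern.DOTALL constant instead of an inline flag.
val dotAll = Pattern.compile("A fourth.*", Pattern.DOTALL).matcher(item).matches()

// Option 3: compile once as a Regex and use it as an extractor. A Regex used
// as a pattern requires the whole string to match, just like String.matches.
val FourthItem: Regex = "(?s)A fourth.*".r
val viaExtractor = item match {
  case FourthItem() => true
  case _            => false
}

println(s"inline: $withInlineFlag, Pattern.DOTALL: $dotAll, extractor: $viaExtractor")
```

String.matches compiles its pattern on every call, so options 2 and 3 are worth considering when the same regex is tested against many items.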