Scala - String matches RegEx
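This is on Scala 2.11.8.
I'm trying to read and parse a text file in Scala, and I'm seeing behavior I didn't expect when calling string.matches.
Say I have a file.txt with the following contents: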
#############
# HEADING 1
#############
- The zeroth line item, if there can be one
- First Line item
- Second Line item
- Here is the third
and this one has some details
- A fourth one followed by empty line

- Fifth line item

I read the file and then parse the contents like this:
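val source = scala.io.Source.fromFile("file.txt")
// Drop the "#" heading lines and re-join the remaining lines with "\n"
val lines = try source.getLines.filterNot(_.matches("#.*")).mkString("\n") finally source.close
// Split on "- " at the start of the text or right after a newline; drop the empty leading piece
val items = lines.split("""(\n-|^-)\s""").filter(_.nonEmpty)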
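To see where the embedded newlines come from, the same pipeline can be run on an in-memory copy of file.txt. This is a minimal sketch that assumes the file has an empty line after the fourth item and another after the fifth (which is what the REPL output below implies), and it swaps Source.fromFile for Source.fromString so it runs without a file; ParseDemo and fileText are hypothetical names.

```scala
import scala.io.Source

object ParseDemo {
  // Assumed contents of file.txt: blank lines after the fourth and fifth items.
  val fileText: String =
    "#############\n# HEADING 1\n#############\n" +
      "- The zeroth line item, if there can be one\n" +
      "- First Line item\n" +
      "- Second Line item\n" +
      "- Here is the third\n" +
      "and this one has some details\n" +
      "- A fourth one followed by empty line\n" +
      "\n" +
      "- Fifth line item\n" +
      "\n"

  // Same steps as in the question, reading from a string instead of a file.
  val source = Source.fromString(fileText)
  val lines: String =
    try source.getLines().filterNot(_.matches("#.*")).mkString("\n") finally source.close()
  val items: Array[String] = lines.split("""(\n-|^-)\s""").filter(_.nonEmpty)

  def main(args: Array[String]): Unit =
    // Report which items carry an embedded or trailing newline.
    items.zipWithIndex.foreach { case (item, i) =>
      println(s"items($i) contains newline: ${item.contains("\n")}")
    }
}
```

Running main reports a newline only in items(3), items(4) and items(5): the multi-line third item, plus the fourth and fifth items, which each keep the "\n" that came from the following blank line.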
Now, trying to work with the individual line items in the result:
// print the first few items
scala> items(0)
res0: String = The zeroth line item, if there can be one
scala> items(1)
res1: String = First Line item
scala> items(3)
res2: String =
Here is the third
and this one has some details
scala> items(4)
res3: String =
"A fourth one followed by empty line
"
scala> items(5)
res4: String =
"Fifth line item
"
Now for some matching:
// Matching the items with RegEx
scala> items(0).matches("The.*")
res5: Boolean = true
scala> items(1).matches("First.*")
res6: Boolean = true
scala> items(3).matches("Here is.*")
res7: Boolean = false // ??
scala> items(4).matches("A fourth.*")
res8: Boolean = false // ??
// But startsWith seems to recognize it just fine!
scala> items(3).startsWith("Here is")
res9: Boolean = true
scala> items(4).startsWith("A fourth")
res10: Boolean = true
// Even this doesn't match
scala> items(4).matches(".*A fourth.*")
res11: Boolean = false // ?
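From what I can tell, this only happens when the item spans more than one line, i.e. when its text continues onto the next line (including the case where that next line is empty).
Is this the expected behavior? How can I match consistently with RegEx?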
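Consider turning on DOTALL mode by putting the (?s) flag at the start of the regex. By default "." does not match line terminators such as "\n", and String.matches succeeds only if the pattern matches the entire string, so "Here is.*" stops at the first newline and the whole match fails. The same applies to the fourth item: its only newline is the trailing one, and without DOTALL nothing in ".*A fourth.*" can consume it. With (?s), "." matches any character, including newlines. Example: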
val text =
"""|- The zeroth line item, if there can be one
|- First Line item
|- Second Line item
|- Here is the third
| and this one has some details
|- A fourth one followed by empty line
|
|- Fifth line item
|
|""".stripMargin
val items = text.split("""(\n-|^-)\s""").filter(_.nonEmpty)
def describeMatch(str: String, regex: String): Unit = {
println("-" * 60)
println("The string\n>>>%s<<<\n%s".format(
str,
(if (str.matches(regex)) "Matches" else "Doesn't match") + s" >>>$regex<<<"
))
}
describeMatch(items(0), "The.*")
describeMatch(items(1), "First.*")
describeMatch(items(3), "Here is.*")
describeMatch(items(3), "(?s)Here is.*")
describeMatch(items(4), "A fourth.*")
describeMatch(items(4), "(?s)A fourth.*")
describeMatch(items(4), ".*A fourth.*$")
describeMatch(items(4), "(?s)^A fourth.*$")
The output explains it all:
------------------------------------------------------------
The string
>>>The zeroth line item, if there can be one<<<
Matches >>>The.*<<<
------------------------------------------------------------
The string
>>>First Line item<<<
Matches >>>First.*<<<
------------------------------------------------------------
The string
>>>Here is the third
and this one has some details<<<
Doesn't match >>>Here is.*<<<
------------------------------------------------------------
The string
>>>Here is the third
and this one has some details<<<
Matches >>>(?s)Here is.*<<<
------------------------------------------------------------
The string
>>>A fourth one followed by empty line
<<<
Doesn't match >>>A fourth.*<<<
------------------------------------------------------------
The string
>>>A fourth one followed by empty line
<<<
Matches >>>(?s)A fourth.*<<<
------------------------------------------------------------
The string
>>>A fourth one followed by empty line
<<<
Doesn't match >>>.*A fourth.*$<<<
------------------------------------------------------------
The string
>>>A fourth one followed by empty line
<<<
Matches >>>(?s)^A fourth.*$<<<
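The options can also be compared side by side in a compiled program. This sketch shows DOTALL through the inline (?s) flag and through Pattern.DOTALL, a flag-free [\s\S] character class, and, for the case where the real question is only "does the item start with X?" (which is what startsWith answered in the question), prefix and unanchored matching via scala.util.matching.Regex (findPrefixOf, unanchored). The object name MatchAlternatives, the helpers startsWithRegex and label, and the sample strings are illustrative assumptions, not part of the original answer.

```scala
import java.util.regex.Pattern
import scala.util.matching.Regex

object MatchAlternatives {
  // Two items shaped like the ones produced by the split in the question:
  // one spans two lines, the other ends with a trailing newline.
  val third  = "Here is the third\nand this one has some details"
  val fourth = "A fourth one followed by empty line\n"

  // Whole-string match in default mode: '.' cannot consume the trailing '\n'.
  val plain: Boolean = fourth.matches("A fourth.*")

  // Whole-string match with DOTALL, via the inline flag or the compile flag.
  val inlineFlag: Boolean = third.matches("(?s)Here is.*")
  val compiledFlag: Boolean =
    Pattern.compile("Here is.*", Pattern.DOTALL).matcher(third).matches()

  // Whole-string match without any flag: [\s\S] also matches newlines.
  val charClass: Boolean = third.matches("""Here is[\s\S]*""")

  // Prefix match: findPrefixOf uses Matcher.lookingAt, so the pattern must
  // start at index 0 but does not have to reach the end of the string.
  def startsWithRegex(s: String, r: Regex): Boolean = r.findPrefixOf(s).isDefined

  // Unanchored extractor: the case succeeds if the pattern occurs anywhere.
  val Fourth = "A fourth".r.unanchored
  def label(s: String): String = s match {
    case Fourth() => "fourth item"
    case _        => "something else"
  }

  def main(args: Array[String]): Unit = {
    println(s"plain=$plain inline=$inlineFlag compiled=$compiledFlag charClass=$charClass")
    println(s"third starts with 'Here is': ${startsWithRegex(third, "Here is.*".r)}")
    println(s"label(fourth) = ${label(fourth)}")
  }
}
```

DOTALL keeps the whole-string semantics of matches, which fits when the pattern is meant to describe the entire item. findPrefixOf and unanchored drop that requirement, so they behave more like startsWith or a substring search.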