This is on Scala 2.11.8.
I'm trying to read and parse a text file in Scala, and I'm seeing behavior I did not expect when calling string.matches.
Say I have a file.txt with the contents below:
#############
# HEADING 1
#############
- The zeroth line item, if there can be one
- First Line item
- Second Line item
- Here is the third
and this one has some details
- A fourth one followed by empty line
- Fifth line item
I read the file and parse the contents like this:
val source = scala.io.Source.fromFile("file.txt")
val lines = try source.getLines.filterNot(_.matches("#.*")).mkString("\n") finally source.close
val items = lines.split("""(\n-|^-)\s""").filter(_.nonEmpty)
Now, inspecting individual items and their values:
// print the first few items
scala> items(0)
res0: String = The zeroth line item, if there can be one
scala> items(1)
res1: String = First Line item
scala> items(3)
res2: String =
Here is the third
and this one has some details
scala> items(4)
res3: String =
"A fourth one followed by empty line
"
scala> items(5)
res4: String =
"Fifth line item
"
Now for some matching
// Matching the items with RegEx
scala> items(0).matches("The.*")
res5: Boolean = true
scala> items(1).matches("First.*")
res6: Boolean = true
scala> items(3).matches("Here is.*")
res7: Boolean = false // ??
scala> items(4).matches("A fourth.*")
res8: Boolean = false // ??
// But startsWith seems to recognize it just fine!
scala> items(3).startsWith("Here is")
res9: Boolean = true
scala> items(4).startsWith("A fourth")
res10: Boolean = true
// Even this doesn't match
scala> items(4).matches(".*A fourth.*")
res11: Boolean = false // ?
My observation is that this happens only when the item is not a single line, i.e. when the item spans multiple lines (including the case where it is followed by an empty line).
Is this behavior expected? How can I match consistently using a regex?
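By default, . does not match line terminators, and String.matches has to match the entire string, so an item that contains a newline cannot be matched by a pattern ending in .* alone. Consider activating DOTALL mode by putting the (?s) flag at the beginning of the regex. Example: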
val text =
"""|- The zeroth line item, if there can be one
|- First Line item
|- Second Line item
|- Here is the third
| and this one has some details
|- A fourth one followed by empty line
|
|- Fifth line item
|
|""".stripMargin
val items = text.split("""(\n-|^-)\s""").filter(_.nonEmpty)
def describeMatch(str: String, regex: String): Unit = {
println("-" * 60)
println("The string\n>>>%s<<<\n%s".format(
str,
(if (str.matches(regex)) "Matches" else "Doesn't match") + s" >>>$regex<<<"
))
}
describeMatch(items(0), "The.*")
describeMatch(items(1), "First.*")
describeMatch(items(3), "Here is.*")
describeMatch(items(3), "(?s)Here is.*")
describeMatch(items(4), "A fourth.*")
describeMatch(items(4), "(?s)A fourth.*")
describeMatch(items(4), ".*A fourth.*$")
describeMatch(items(4), "(?s)^A fourth.*$")
The output should speak for itself:
------------------------------------------------------------
The string
>>>The zeroth line item, if there can be one<<<
Matches >>>The.*<<<
------------------------------------------------------------
The string
>>>First Line item<<<
Matches >>>First.*<<<
------------------------------------------------------------
The string
>>>Here is the third
and this one has some details<<<
Doesn't match >>>Here is.*<<<
------------------------------------------------------------
The string
>>>Here is the third
and this one has some details<<<
Matches >>>(?s)Here is.*<<<
------------------------------------------------------------
The string
>>>A fourth one followed by empty line
<<<
Doesn't match >>>A fourth.*<<<
------------------------------------------------------------
The string
>>>A fourth one followed by empty line
<<<
Matches >>>(?s)A fourth.*<<<
------------------------------------------------------------
The string
>>>A fourth one followed by empty line
<<<
Doesn't match >>>.*A fourth.*$<<<
------------------------------------------------------------
The string
>>>A fourth one followed by empty line
<<<
Matches >>>(?s)^A fourth.*$<<<
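If you would rather not embed (?s) in every pattern, here is a minimal sketch of two alternatives, assuming only the standard library: passing Pattern.DOTALL explicitly, or checking just the prefix with Regex.findPrefixOf, which does not require the regex to consume the whole string. The object name, the helper names matchesDotAll and startsWithRegex, and the sample strings are hypothetical and only mirror items(3) and items(4) from the question.

```scala
import java.util.regex.Pattern
import scala.util.matching.Regex

object MultiLineMatchSketch {
  // Hypothetical sample items, mirroring items(3) and items(4) from the question.
  val third: String  = "Here is the third\nand this one has some details"
  val fourth: String = "A fourth one followed by empty line\n"

  // Equivalent to prefixing the regex with (?s), but the flag is passed explicitly.
  def matchesDotAll(str: String, regex: String): Boolean =
    Pattern.compile(regex, Pattern.DOTALL).matcher(str).matches()

  // Prefix-only check: the regex must match at the start of the string,
  // but it does not have to consume the rest of it.
  def startsWithRegex(str: String, regex: Regex): Boolean =
    regex.findPrefixOf(str).isDefined

  def main(args: Array[String]): Unit = {
    println(third.matches("Here is.*"))               // false: '.' stops at the newline
    println(matchesDotAll(third, "Here is.*"))        // true
    println(startsWithRegex(fourth, "A fourth.*".r))  // true
  }
}
```

Which one fits depends on intent: use the DOTALL variant when the pattern should describe the whole item, and the prefix check when only the beginning of the item matters.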