I have a file that has sections like this:
flags...id, description, used, color
AB, "Abandoned", 0, 13168840
DM, "Demolished", 0, 15780518
OP, "Operational", 0, 15780518...
where ... represents a series of control characters, e.g. ETX and STX. I am trying to grab multiple lines from the file.
I am using the following code:
f = File.open(somePath)
r = f.grep(/flags.+id, description, used, color(?<data>(?:.|\s)*?)[\x00-\x08]/)
This code does not work, and I do not understand why. The documentation for grep appears to imply that the file is processed line by line. I suspect this is why the regular expression isn't returning any results.
Would it be better to use file.each_line to capture the data?
String#scan comes to the rescue:
File.read('/path/to/file').scan(
/flags.+id, description, used, color(?<data>(?:.|\s)*?)[\x00-\x08]/m
)
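To see what scan returns here, the following is a minimal sketch run against an in-memory string rather than a file. The sample string is hypothetical: it assumes the unspecified control characters are STX ("\x02") and ETX ("\x03"), which the question only gives as examples.

```ruby
# Hypothetical sample: "\x02" (STX) and "\x03" (ETX) stand in for the
# unspecified control characters shown as "..." in the question.
sample = "\x02flags\x03\x02id, description, used, color\n" \
         "AB, \"Abandoned\", 0, 13168840\n" \
         "DM, \"Demolished\", 0, 15780518\n" \
         "OP, \"Operational\", 0, 15780518\x03"

pattern = /flags.+id, description, used, color(?<data>(?:.|\s)*?)[\x00-\x08]/m

# With a single capture group, scan returns one one-element array per match.
matches = sample.scan(pattern)
data = matches.first.first

# Split the captured block into individual rows.
rows = data.strip.split("\n")
puts rows
```

Note that with /m the greedy .+ after flags can also run across newlines, so if a file holds several such sections, a lazy .+? may be worth considering.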
You need to enable multiline mode: the dot (.) doesn't match newlines by default.
From the documentation (https://ruby-doc.org/core-2.1.1/Regexp.html):
/./ - Any character except a newline.
/./m - Any character (the m modifier enables multiline mode)
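The difference is easy to check on a small string. This is just an illustrative sketch; the text variable and the shortened pattern are made up for the demonstration.

```ruby
# A header and a data line separated by a newline.
text = "flags\nAB\n"

# Without /m the dot cannot cross the newline, so the pattern fails.
without_m = text.match?(/flags.+AB/)

# With /m the dot also matches "\n", so the pattern succeeds.
with_m = text.match?(/flags.+AB/m)

puts without_m # false
puts with_m    # true
```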
Am I correct that grep uses line-by-line parsing?
Yes. Try this on your file:
r = File.open(somePath) do |f|
  f.grep(/[A-Z]{2},/)
end
puts r
# => AB, "Abandoned", 0, 13168840
# DM, "Demolished", 0, 15780518
# OP, "Operational", 0, 15780518
puts r.inspect
# => ["AB, \"Abandoned\", 0, 13168840\n", "DM, \"Demolished\", 0, 15780518\n", "OP, \"Operational\", 0, 15780518\n"]
Is this why my regex isn't working as intended?
Not only that. What are you searching for with [\x00-\x08]? An ASCII or a hexadecimal character?
Would it be better to use file.each_line to capture the data?
File#grep sounds good.
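For comparison, here is a rough sketch of what an each_line approach might look like. It reads from a StringIO so it runs on its own, and it assumes (hypothetically) that the section starts on the line containing the column header and ends at the first character in the \x00-\x08 range; the sample data and the STX/ETX bytes are stand-ins, not taken from the real file.

```ruby
require 'stringio'

# Hypothetical stand-in for the file; "\x02" and "\x03" replace the "...".
io = StringIO.new(
  "\x02flags\x03\x02id, description, used, color\n" \
  "AB, \"Abandoned\", 0, 13168840\n" \
  "DM, \"Demolished\", 0, 15780518\n" \
  "OP, \"Operational\", 0, 15780518\x03\n" \
  "unrelated line\n"
)

rows = []
in_section = false

io.each_line do |line|
  # The header line opens the section; its own content is not data.
  if line.include?('id, description, used, color')
    in_section = true
    next
  end
  next unless in_section

  # A control character in \x00-\x08 marks the end of the section.
  ended = line.match?(/[\x00-\x08]/)
  row = line.sub(/[\x00-\x08].*/m, '').chomp
  rows << row unless row.empty?
  break if ended
end

puts rows
```

With a real file, File.foreach(path) or File.open(path) { |f| f.each_line { ... } } could take the place of the StringIO.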