
Multiline file grep

I have a file that has sections like this:

flags...id, description, used, color
AB, "Abandoned", 0, 13168840
DM, "Demolished", 0, 15780518
OP, "Operational", 0, 15780518...

where ... represents a series of control characters, e.g. ETX and STX. I am trying to grab multiple lines from the file.

I am using the following code:

f = File.open(somePath)
r = f.grep(/flags.+id, description, used, color(?<data>(?:.|\s)*?)[\x00-\x08]/)

This code does not work, and I do not understand why. The documentation of grep appears to imply that the file is parsed line by line. I suspect this is why the regular expression isn't returning any results.

  1. Am I correct that grep uses line-by-line parsing? Is this why my regex isn't working as intended?
  2. Would it be better to use file.each_line to capture the data?
  3. Are there better/cleaner alternatives to all of the above?

String#scan comes to the rescue:

File.read('/path/to/file').scan(
  /flags.+id, description, used, color(?<data>(?:.|\s)*?)[\x00-\x08]/m
)
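As a minimal sketch of what this returns, the pattern can be tried on an in-memory string. The sample text below is an assumption: STX (\x02) and ETX (\x03) stand in for the unspecified control characters, and the variable names are made up for illustration.

```ruby
# Hypothetical sample: \x02 (STX) and \x03 (ETX) stand in for the
# unspecified control characters of the real file.
sample = "flags\x02\x03id, description, used, color\n" \
         "AB, \"Abandoned\", 0, 13168840\n" \
         "DM, \"Demolished\", 0, 15780518\n" \
         "OP, \"Operational\", 0, 15780518\x03\x02"

pattern = /flags.+id, description, used, color(?<data>(?:.|\s)*?)[\x00-\x08]/m

# When the pattern contains a group, scan returns one array of captures per match.
sections = sample.scan(pattern)

# Split the captured block of the first section into individual rows.
rows = sections.first.first.strip.lines.map(&:chomp)
puts rows
```

One caveat: with the m modifier the greedy .+ can also cross newlines, so if the file holds several sections it may run from the first flags to the last header line. A lazy .+? is probably safer in that case.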

You need to enable multiline mode with the m modifier. By default, the . metacharacter doesn't match newlines.

From the documentation at https://ruby-doc.org/core-2.1.1/Regexp.html:

/./ - Any character except a newline.
/./m - Any character (the m modifier enables multiline mode)
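A quick way to see the difference, using a made-up two-line string:

```ruby
text = "color\nAB"

# Without m, the dot refuses to match the newline between the two words.
without_m = text.match?(/color.AB/)

# With m, the dot matches any character, including the newline.
with_m = text.match?(/color.AB/m)

puts "without m: #{without_m}, with m: #{with_m}"
```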

Am I correct that grep uses line-by-line parsing?

Yes. Try on your file:

r = File.open(somePath) do |f|
  f.grep(/[A-Z]{2},/)
end

puts r
# => AB, "Abandoned", 0, 13168840
#    DM, "Demolished", 0, 15780518
#    OP, "Operational", 0, 15780518

puts r.inspect
# => ["AB, \"Abandoned\", 0, 13168840\n", "DM, \"Demolished\", 0, 15780518\n", "OP, \"Operational\", 0, 15780518\n"]
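The line-by-line behaviour can also be checked without the real file. The sketch below uses StringIO as a stand-in for the file (an assumption made purely so the example is self-contained). It yields lines through each in the same way, so grep tests every line separately.

```ruby
require "stringio"

# StringIO stands in for the real file here.
io = StringIO.new(
  "AB, \"Abandoned\", 0, 13168840\n" \
  "DM, \"Demolished\", 0, 15780518\n"
)

# This pattern needs text from two consecutive lines, so no single line matches.
spanning = io.grep(/13168840\s+DM/)

# A pattern that fits inside one line matches each row on its own.
io.rewind
per_line = io.grep(/\A[A-Z]{2},/)

puts spanning.inspect
puts per_line.inspect
```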

Is this why my regex isn't working as intended?

Not the only reason. What are you searching for with [\x00-\x08]? An ASCII or a hexadecimal character?

Would it be better to use file.each_line to capture the data?

File#grep sounds good.
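For comparison, the each_line approach from the question could look roughly like the sketch below. The helper name section_rows and the sample data are hypothetical. It assumes the header sits on one line and that the section ends at the first control character in the \x00-\x08 range, and it only collects the first section.

```ruby
require "stringio"

# Hypothetical helper: collects the rows that follow the header line and
# stops at the first line containing a control character in \x00-\x08.
def section_rows(io)
  rows = []
  in_section = false
  io.each_line do |line|
    unless in_section
      # Skip everything up to and including the header line.
      in_section = line.include?("id, description, used, color")
      next
    end
    # Split once at the first control character, if the line has one.
    text, terminator = line.chomp.split(/[\x00-\x08]/, 2)
    rows << text unless text.nil? || text.empty?
    break if terminator
  end
  rows
end

# Made-up sample; \x02 and \x03 stand in for the real control characters.
sample = StringIO.new(
  "flags\x02\x03id, description, used, color\n" \
  "AB, \"Abandoned\", 0, 13168840\n" \
  "DM, \"Demolished\", 0, 15780518\n" \
  "OP, \"Operational\", 0, 15780518\x03\x02"
)
rows = section_rows(sample)
puts rows
```

This avoids loading the whole file into memory, at the cost of more bookkeeping than a single scan call.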
