
How do I split a file by a grep with ruby?

I have a file that contains many bits of code, and I'd like to refactor all of them into their own files. The file in question has some 30k lines, so I don't want to do it by hand.

Each section starts with:

module MyModule

(I've changed that name)


Is there a method that splits the file at each of these markers? When I use File.readlines, I can't find a clean way to split the resulting array.

I don't mind how the new files are named.

I refactored your code.

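File.read('lib/odin.rb').split(/module Odin/).each do |mod|
  # Name the file after the first class in the chunk; skip chunks with
  # no class, such as the empty string before the first "module Odin"
  class_name = mod[/class (\w+)/, 1]
  next unless class_name

  # split removed the delimiter, so write "module Odin" back first
  File.write("#{class_name}.rb", "module Odin" + mod)
end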
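A possible variation on the refactored answer, assuming every section begins with `module Odin` at the very start of a line: splitting on a zero-width lookahead keeps the delimiter inside each chunk, so there's nothing to write back and no empty leading chunk when the file starts with a module. The sample text is made up for illustration; with the real file you'd use `File.read('lib/odin.rb')` instead.

```ruby
# Made-up sample input standing in for lib/odin.rb
source = <<~RUBY
  module Odin
    class Foo
    end
  end
  module Odin
    class Bar
    end
  end
RUBY

# (?=^module Odin) matches the empty position just before each
# "module Odin" at the start of a line, so split cuts there
# without consuming the delimiter.
chunks = source.split(/(?=^module Odin)/)

chunks.each do |chunk|
  # Print the file name each chunk would get (first class it defines)
  puts "#{chunk[/class (\w+)/, 1]}.rb"
end
```

Each chunk already starts with `module Odin`, so it could be passed straight to `File.write`.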

I found an answer while writing out the question in detail.

I'm posting it as an answer, but I'll accept someone else's if they have a better solution:

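# String#underscore comes from ActiveSupport (Rails), not core Ruby
require 'active_support/core_ext/string/inflections'

big_file = File.readlines('lib/odin.rb')
big_file.
  join.                                   # lines keep their "\n", so join with ""
  split(/^module Odin/).
  reject { |chunk| chunk.strip.empty? }.  # drop text before the first module
  map { |chunk| "module Odin#{chunk}" }.  # split drops the delimiter; restore it
  each do |mod|
    class_name = mod[/class ([a-zA-Z]+)/, 1]
    next unless class_name
    File.write("#{class_name.underscore}.rb", mod)
  end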
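The self-answer relies on `String#underscore`, which comes from ActiveSupport rather than core Ruby. For a plain Ruby script, a hypothetical `snake_case` helper like the one below could stand in for it. It's a rough approximation that only handles simple CamelCase names, not everything `underscore` does; `file_name_for` is likewise a made-up name for this sketch.

```ruby
# Hypothetical stand-in for ActiveSupport's String#underscore.
# Handles plain CamelCase names only: "MyWidget" -> "my_widget".
def snake_case(name)
  # "HTTPClient" -> "HTTP_Client", then "MyWidget" -> "My_Widget", then downcase
  name.gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2').gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Hypothetical helper: build an output file name from the first class
# in a chunk, returning nil when the chunk defines no class.
def file_name_for(chunk)
  class_name = chunk[/class ([A-Za-z]+)/, 1]
  class_name && "#{snake_case(class_name)}.rb"
end

puts file_name_for("module Odin\n  class HTTPClient\n  end\nend\n")
```

Returning `nil` for chunks without a class lets the caller skip them with `next unless name`.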

I also came up with a way to name the output files based on their content, but I don't mind how you'd name them.

Ruby's Enumerable module has a great method for this, called slice_before:

require 'pp'

modules = DATA.readlines.map(&:chomp).slice_before(/^module MyModule/).map{ |a| a.join("\n") }
pp modules

__END__
module MyModule
  # 1 stuff
end

module MyModule
  # 2 stuff
end

module MyModule
  # 3 stuff
end

This is the output showing what modules contains:

["module MyModule\n  # 1 stuff\nend\n",
 "module MyModule\n  # 2 stuff\nend\n",
 "module MyModule\n  # 3 stuff\nend"]

DATA is a bit of Ruby sleight-of-hand inherited from Perl. Everything after __END__ in the main script file is treated as a "data" block, which the interpreter makes available to the running code through the DATA constant, a File object that behaves like a data file. That means we can use IO methods on it, such as readlines, just as we'd use IO.readlines on a real file. I'm using __END__ and DATA here because they're convenient for simple tests and short scripts.
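Because DATA is an ordinary File object, the same pipeline can be wrapped in a method that accepts any IO-like object. The method name `split_modules` and its `marker` parameter are made up for this sketch; passing a `StringIO` makes it easy to try without `__END__`, and passing `File.open('lib/odin.rb')` should work the same way for a real file.

```ruby
require 'stringio'

# Hypothetical helper: read all lines from any IO (DATA, a File,
# a StringIO, ...) and group them into one string per module,
# starting a new string at every line that matches `marker`.
def split_modules(io, marker = /^module MyModule/)
  io.readlines.map(&:chomp).slice_before(marker).map { |lines| lines.join("\n") }
end

sample = StringIO.new(<<~TEXT)
  module MyModule
    # 1 stuff
  end

  module MyModule
    # 2 stuff
  end
TEXT

modules = split_modules(sample)
p modules
```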

readlines doesn't remove the trailing line ending from each line it reads; that's what map(&:chomp) does. DATA.read.split("\n") would have accomplished the same thing here.
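Since Ruby 2.4, `IO.readlines` and `IO#readlines` also accept a `chomp: true` keyword that strips line endings while reading. The comparison below, using a throwaway `Tempfile`, suggests all three approaches agree on ordinary input, but `split("\n")` drops trailing empty strings, so it can differ when the input ends with blank lines.

```ruby
require 'tempfile'

# Throwaway file whose content ends with a blank line
file = Tempfile.new('sample')
file.write("module MyModule\nend\n\n")
file.flush

with_chomp  = File.readlines(file.path).map(&:chomp)   # chomp after reading
chomp_kw    = File.readlines(file.path, chomp: true)   # chomp while reading
split_lines = File.read(file.path).split("\n")         # trailing "" are dropped

file.close! # close and delete the temporary file

p with_chomp   # => ["module MyModule", "end", ""]
p chomp_kw     # => ["module MyModule", "end", ""]
p split_lines  # => ["module MyModule", "end"]
```

For splitting on module markers the trailing blank line rarely matters, but it's worth knowing when the output has to match the input byte for byte.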

slice_before is the magic that makes this work. It iterates over the array and starts a new sub-array each time an element matches the pattern, so every sub-array begins with a module MyModule line. After that it's just a case of joining each sub-array back into a single string before writing it to a file.
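To see slice_before on its own, here it is on a small made-up array of lines. It returns an Enumerator rather than an Array, and any lines before the first match (a file header, say) end up in a chunk of their own, which you may want to filter out before writing files.

```ruby
lines = [
  "# file header",        # preamble before the first module
  "module MyModule",
  "  # 1 stuff",
  "end",
  "module MyModule",
  "  # 2 stuff",
  "end"
]

# slice_before starts a new chunk at every element matching the pattern
# (it compares with ===, so a Regexp is tested against each string)
chunks = lines.slice_before(/^module MyModule/)
p chunks.class   # => Enumerator

# chunks.to_a is:
#   [["# file header"],
#    ["module MyModule", "  # 1 stuff", "end"],
#    ["module MyModule", "  # 2 stuff", "end"]]
p chunks.to_a

# Keep only the chunks that really start with the marker
modules = chunks.select { |chunk| chunk.first.match?(/^module MyModule/) }
p modules.size   # => 2
```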

After that you just have to loop over modules, saving each one to its own file:

modules.each.with_index(1) do |m, i|
  File.write("module_#{ i }.rb", m)
end

with_index is a handy Enumerator method for when we need to know the position of the item we're processing. It's similar to each_with_index, except that we can specify the starting offset, 1 in this case.
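A quick side-by-side on a made-up array shows the difference in starting offset. Because with_index is defined on Enumerator, it also chains onto other enumerators such as `map`.

```ruby
files = %w[alpha beta gamma]

# each_with_index always counts from 0
zero_based = files.each_with_index.map { |name, i| "#{i}:#{name}" }

# with_index(1) lets you choose the starting offset
one_based = files.each.with_index(1).map { |name, i| "#{i}:#{name}" }

# map.with_index(1) gives the same result in one step
mapped = files.map.with_index(1) { |name, i| "#{i}:#{name}" }

p zero_based  # => ["0:alpha", "1:beta", "2:gamma"]
p one_based   # => ["1:alpha", "2:beta", "3:gamma"]
p mapped      # => ["1:alpha", "2:beta", "3:gamma"]
```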

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
