How do I split a file by a grep with Ruby?

I have a file that contains many bits of code, and I'd like to refactor each of them into its own file. The file in question has some 30k lines, so I don't want to do it by hand.

Each of the sections starts with:

module MyModule

(I've changed that name)


Is there a function to split the file at each marker? When I use File.readlines I can't find a nice way to split the array.

I don't care how you'd think to name them.

I refactored your code.

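# Split on the marker, skipping the empty piece that precedes the first marker.
File.read('lib/odin.rb').split(/module Odin/).each do |mod|
    next if mod.strip.empty?
    File.open("#{mod[/class (\w+)/, 1]}.rb", "w") do |f|
        f.write("module Odin")  # split consumed the marker, so put it back
        f.write(mod)
    end
end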
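
As a variant of the split above, here is a minimal sketch that splits on a zero-width lookahead, so the marker stays in each chunk and doesn't have to be written back. It assumes every marker sits at the start of a line; the sample text and the `files` hash are made up for illustration and stand in for the real file.

```ruby
# Hypothetical sample text standing in for lib/odin.rb.
source = "module Odin\n  class Thor\n  end\nend\n" \
         "module Odin\n  class Loki\n  end\nend\n"

# (?=...) is a zero-width lookahead: split cuts in front of each marker
# without consuming it, so every chunk still begins with "module Odin".
chunks = source.split(/^(?=module Odin)/)

# Pair each chunk with a file name taken from its first class name.
files = chunks.to_h { |chunk| ["#{chunk[/class (\w+)/, 1]}.rb", chunk] }

files.each_key { |name| puts name }
```

Because nothing is consumed by the split, joining the chunks gives back the original text unchanged.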

I found the answer while writing out the question in detail.

I'm posting it as an answer, but I'll award the answer to someone else who has a better solution:

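# String#underscore comes from ActiveSupport, not core Ruby.
require 'active_support/core_ext/string/inflections'

# readlines keeps each line ending, so join needs no separator.
# reject drops the empty piece that precedes the first marker.
big_file = File.readlines 'lib/odin.rb'
big_file.
  join.
  split(/module Odin/).
  reject { |w| w.strip.empty? }.
  map { |w| "module Odin" + w }.
  each do |f|
    name = "#{f.match(/class ([a-zA-Z]+)/)[1].underscore}.rb"
    File.open(name, "w") do |n|
      n.write(f)
    end
  end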

I also thought of a nice way to name the output files based on their content, but I don't care how you'd think to name them.
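
If ActiveSupport isn't available, the content-based naming can be approximated in plain Ruby. This is only a sketch: `snake_case` and `filename_for` are hypothetical helpers, not part of Ruby or of the code above, and the fallback name is an assumption.

```ruby
# Hypothetical stand-in for ActiveSupport's String#underscore.
def snake_case(name)
  name.gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2') # split acronym from next word
      .gsub(/([a-z\d])([A-Z])/, '\1_\2')     # split lower/digit from upper
      .downcase
end

# Hypothetical helper: name a chunk after the first class it defines,
# falling back to a caller-supplied name when no class is found.
def filename_for(chunk, fallback)
  class_name = chunk[/class (\w+)/, 1]
  "#{class_name ? snake_case(class_name) : fallback}.rb"
end

puts filename_for("module Odin\n  class ThunderGod\n  end\nend\n", "chunk_1")
puts filename_for("module Odin\nend\n", "chunk_2")
```

The fallback avoids the nil error you would get from calling a method on a failed match when a chunk contains no class.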

Ruby has a great method that is part of Enumerable called slice_before:

require 'pp'

modules = DATA.readlines.map(&:chomp).slice_before(/^module MyModule/).map{ |a| a.join("\n") }
pp modules

__END__
module MyModule
  # 1 stuff
end

module MyModule
  # 2 stuff
end

module MyModule
  # 3 stuff
end

This is the output, showing what modules contains:

["module MyModule\n  # 1 stuff\nend\n",
 "module MyModule\n  # 2 stuff\nend\n",
 "module MyModule\n  # 3 stuff\nend"]

DATA is Ruby sleight-of-hand inherited from Perl. Everything in the source file after __END__ is considered part of a "data" block, which the interpreter makes available to the running code through the DATA file handle, and which acts like a data file. That means we can use IO methods on it, such as readlines, similarly to how we'd use IO.readlines. I'm using __END__ and DATA here because they're convenient for simple tests and short scripts.

readlines keeps the trailing line ending on each line it reads; map(&:chomp) is what strips it. DATA.read.split("\n") would have accomplished the same thing.
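
A small sketch of the difference, using String#lines on a literal string as a stand-in for readlines (an assumption made only to keep the example self-contained; both keep the line endings):

```ruby
text = "module MyModule\n  # 1 stuff\nend\n"

with_endings = text.lines              # every element still ends in "\n"
chomped      = text.lines.map(&:chomp) # line endings stripped
via_split    = text.split("\n")        # also yields lines without endings

p with_endings
p chomped
p via_split

# One difference to be aware of: split drops trailing empty strings,
# so blank lines at the very end of the input are lost.
p "a\n\n".lines.map(&:chomp)
p "a\n\n".split("\n")
```

For the sample data in this answer the two approaches give the same array; they differ only when the input ends in blank lines.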

slice_before is the magic that makes this work. It iterates over an array (or any Enumerable) and starts a new sub-array each time the pattern matches an element. After that, it's just a case of joining the contents of each sub-array back into a single string before writing it to a file.
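
To make the grouping concrete, here is a minimal sketch on a tiny hand-written array (the sample lines are invented). It also shows that anything before the first match ends up in its own leading chunk:

```ruby
lines = ["# header", "module MyModule", "  # 1 stuff", "end",
         "module MyModule", "  # 2 stuff", "end"]

# slice_before returns an Enumerator; to_a materialises the sub-arrays.
chunks = lines.slice_before(/^module MyModule/).to_a

chunks.each { |chunk| p chunk }

# A block works as well as a pattern: a new chunk starts whenever
# the block returns a truthy value for an element.
same = lines.slice_before { |line| line.start_with?("module MyModule") }.to_a
```

If the file has a preamble you don't want written out, drop the first chunk when it doesn't start with the marker.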

After that, you just have to loop over modules, saving each one to a different file:

modules.each.with_index(1) do |m, i|
  File.write("module_#{ i }.rb", m)
end

with_index is a nice little method in Enumerator that is useful when we need to know which item in an array we're processing. It's similar to each_with_index, except we can specify the starting offset value, 1 in this case.
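
Putting the pieces together against a real file rather than DATA, here is a rough end-to-end sketch. It assumes the marker starts a line, writes into a temporary directory so nothing in your project is touched, and uses a short made-up sample in place of the 30k-line file; the names `source`, `input` and `written` are illustrative only.

```ruby
require 'tmpdir'

# Hypothetical sample standing in for the real file.
source = <<~RUBY
  module MyModule
    # 1 stuff
  end

  module MyModule
    # 2 stuff
  end
RUBY

written = []
Dir.mktmpdir do |dir|
  input = File.join(dir, 'big_file.rb')
  File.write(input, source)

  # File.readlines keeps line endings, so a plain join restores the text.
  chunks = File.readlines(input)
               .slice_before(/^module MyModule/)
               .map(&:join)

  # with_index(1) numbers the output files starting from 1.
  chunks.each.with_index(1) do |chunk, i|
    path = File.join(dir, "module_#{i}.rb")
    File.write(path, chunk)
    written << [File.basename(path), File.read(path)]
  end
end

written.each { |name, body| puts "#{name}: #{body.lines.size} lines" }
```

Skipping the chomp step here is a deliberate choice: keeping the line endings means the written files concatenate back to exactly the original input.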
