
Append a line every n lines to a file - ruby

I need to add a line to a text file after every 1000 lines. The main issue is that the file is about 3 GB, so I can't load the whole file into a string or array. To work with big files I usually use File.foreach, but I can't find any information about working with a line index in this case. Is there any other way to solve this without loading the whole file into memory?

Here are a couple of options.

Option 1:

File.open('output.txt', 'w') do |outfile|
    File.foreach('input.txt').each_with_index do |line, i|
        outfile.puts(line)
        outfile.puts '--- 1000 ---' if (i + 1) % 1000 == 0 && i != 0
    end
end

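This inserts the line '--- 1000 ---' after every 1000 lines of the original file. Its main drawback is that it evaluates the modulo condition for every single line. Note that the i != 0 guard is redundant, because (i + 1) % 1000 == 0 can never be true when i is 0, so it can safely be dropped. Still, it works, and it handles large files without hogging memory, because File.foreach reads one line at a time.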
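For a variation on Option 1, here is a minimal sketch that numbers lines from 1 with Enumerator#with_index, which avoids the i + 1 arithmetic. The helper name insert_every, its parameters, and the temporary files are hypothetical and exist only for this illustration.

```ruby
require 'tempfile'

# Hypothetical helper: copies input_path to output_path and writes `marker`
# after every n-th line. File.foreach without a block returns an Enumerator
# that reads one line at a time, so memory use stays flat.
def insert_every(input_path, output_path, n, marker)
  File.open(output_path, 'w') do |outfile|
    # with_index(1) numbers lines starting at 1.
    File.foreach(input_path).with_index(1) do |line, lineno|
      outfile.puts(line)
      outfile.puts(marker) if (lineno % n).zero?
    end
  end
end

# Tiny demonstration on a 5-line temporary file with n = 2.
input = Tempfile.new('input')
input.write("a\nb\nc\nd\ne\n")
input.close
output = Tempfile.new('output')
output.close

insert_every(input.path, output.path, 2, '--- 2 ---')
result = File.read(output.path)
puts result
```

For the original question this would be called as insert_every('input.txt', 'output.txt', 1000, '--- 1000 ---').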

Option 2:

File.open('output.txt', 'w') do |outfile|
    File.foreach('input.txt').each_slice(1000) do |lines|
        outfile.puts(lines)
        outfile.puts '--- 1000 ---'
    end
end

This code does almost exactly the same thing using Enumerable's each_slice method. It yields an array of 1000 lines at a time, writes them out using puts (which accepts an Array and writes each element), then writes our marker line after them, and repeats for the next 1000 lines. The difference shows up when the file's line count isn't a multiple of 1000: the last call to the block yields an array of fewer than 1000 lines, and our code still appends the marker after it.
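The short final batch is easy to see on a small range. This sketch uses a slice size of 3 on seven numbers instead of 1000 lines; the variable names are only for illustration.

```ruby
# Seven items in batches of three: the last batch holds the single leftover.
slices = (1..7).each_slice(3).to_a

# Only batches of exactly three items count as "full".
full_slices = slices.select { |batch| batch.size == 3 }

p slices
p full_slices
```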

We can fix this by testing the array's length and only writing out our line if the array holds exactly 1000 lines, which is true for every batch except the last one (given a file whose line count is not a multiple of 1000).

Option 2a:

File.open('output.txt', 'w') do |outfile|
    File.foreach('input.txt').each_slice(1000) do |lines|
        outfile.puts(lines)
        outfile.puts '--- 1000 ---' unless lines.size < 1000
    end
end

This extra check is only needed if appending that line to the end of the file is a problem for you. Otherwise you can leave it out for a small performance boost.
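As a quick sanity check that the index-based and slice-based approaches agree, here is a sketch that runs both on small in-memory inputs. The method names mark_by_index and mark_by_slice, the StringIO outputs, and the batch size of 3 are assumptions made for the demonstration.

```ruby
require 'stringio'

# Option 1 style: test the index on every line.
def mark_by_index(lines, out, n, marker)
  lines.each_with_index do |line, i|
    out.puts(line)
    out.puts(marker) if (i + 1) % n == 0
  end
end

# Option 2a style: batches of n, marker only after a full batch.
def mark_by_slice(lines, out, n, marker)
  lines.each_slice(n) do |batch|
    out.puts(batch)
    out.puts(marker) unless batch.size < n
  end
end

# Compare both on a line count that is a multiple of 3 and one that is not.
same = [6, 7].all? do |count|
  lines = (1..count).map { |k| "line #{k}\n" }
  by_index = StringIO.new
  by_slice = StringIO.new
  mark_by_index(lines, by_index, 3, '--- 3 ---')
  mark_by_slice(lines, by_slice, 3, '--- 3 ---')
  by_index.string == by_slice.string
end

puts same
```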

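Speaking of performance, here is how each option performed on a 335.5 MB file containing 1,000,000 paragraphs of Lorem Ipsum. Each benchmark is the total time to process the entire file 100 times. The figures are in seconds and appear in the layout printed by Ruby's Benchmark module: user CPU, system CPU, total CPU, and elapsed real time in parentheses.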

Option 1:

103.859825  44.646519 148.506344 (152.286349)
[Finished in 152.6s]

Option 2:

 96.249542  43.780160 140.029702 (145.210728)
[Finished in 145.7s]

Option 2a:

 98.041073  45.788944 143.830017 (149.769698)
[Finished in 150.2s]

As you can see, option 2 is the fastest, though only by about 5% in elapsed time compared with option 1. Keep in mind that options 2 and 2a will in theory use more memory, since they hold 1000 lines at a time, but that is capped at a very small amount, so handling enormous files shouldn't be a problem. Since they are all so close, I would recommend going with whichever option reads best or makes the most sense to you.
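If you want to repeat the comparison on your own data, a measurement of this kind can be set up roughly as follows. This is only a sketch: it assumes Ruby's standard Benchmark module, times Option 2 alone, and uses a small generated stand-in file (5,000 short lines, 10 repetitions) rather than the 335.5 MB file from the figures above.

```ruby
require 'benchmark'
require 'tempfile'

# Small stand-in input (assumption: 5,000 short lines, not the original data).
input = Tempfile.new('bench_input')
5_000.times { |k| input.puts("line #{k}") }
input.close
output = Tempfile.new('bench_output')
output.close

# Time 10 full passes of Option 2 over the stand-in file.
timing = Benchmark.measure do
  10.times do
    File.open(output.path, 'w') do |outfile|
      File.foreach(input.path).each_slice(1000) do |lines|
        outfile.puts(lines)
        outfile.puts '--- 1000 ---'
      end
    end
  end
end

# Prints user, system, total and (real) times in seconds.
puts timing
```

Swapping the inner block for Option 1 or Option 2a and running each variant against the same file gives comparable numbers.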

Hope this helped.
