
How do I remove lines of data in the middle of a text file with Ruby

I know how to write to a file, and read from a file, but I don't know how to modify a file besides reading the entire file into memory, manipulating it, and rewriting the entire file. For large files this isn't very productive.

I don't really know the difference between append and write.

Eg

If I have a file containing:

Person1,will,23
Person2,Richard,32
Person3,Mike,44

How would I delete just the line containing Person2?

You can delete a line in several ways:

  • Simulate deletion. That is, overwrite the line's content with spaces. Later, when you read and process the file, ignore such blank lines.

    Pros: this is easy and fast. Cons: it is not a real deletion of data (the file doesn't shrink), and you need to do more work when reading and processing the file.

    Code:

     f = File.new(filename, 'r+')
     f.each do |line|
       # should_be_deleted is your own predicate, e.g. line.start_with?('Person2')
       if should_be_deleted(line)
         # seek back to the beginning of the line (seek counts bytes, hence bytesize)
         f.seek(-line.bytesize, IO::SEEK_CUR)

         # overwrite line with spaces and add a newline char
         f.write(' ' * (line.bytesize - 1))
         f.write("\n")
       end
     end
     f.close

     File.new(filename).each { |line| p line }

     # >> "Person1,will,23\n"
     # >> "                  \n"
     # >> "Person3,Mike,44\n"

  • Do a real deletion. This means the line will no longer exist, so you have to read the next line and overwrite the current line with it, then repeat this for all following lines until the end of the file is reached. This is an error-prone task (lines of different lengths, etc.), so here's a safer alternative: open a temp file, write to it the lines up to (but not including) the line you want to delete, skip the line you want to delete, then write the rest to the temp file. Delete the original file and rename the temporary one to use its name. Done.

    While this is technically a total rewrite of the file, it does differ from what you described. The file doesn't need to be loaded fully into memory; you need only one line at a time. Ruby provides a method for this: IO#each_line.

    Pros: no assumptions. Lines really get deleted. The reading code does not need to be altered. Cons: a lot more work when deleting a line (not only the code, but also IO/CPU time).

    There is a snippet that illustrates this approach in @azgult's answer.
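For the first approach (simulated deletion), the reading side has to skip the blanked-out lines. A minimal sketch of that, assuming a blanked line contains nothing but whitespace; the helper name live_lines and the temp file are made up for illustration:

```ruby
require 'tempfile'

# Hypothetical helper: returns the lines of the file that were not blanked out.
# Assumes a "deleted" line consists only of spaces plus the newline.
def live_lines(path)
  File.foreach(path).reject { |line| line.strip.empty? }
end

# Demo on a throwaway file whose second line has already been overwritten with spaces.
demo = Tempfile.new('people')
demo.write("Person1,will,23\n" + (' ' * 18) + "\n" + "Person3,Mike,44\n")
demo.close

remaining = live_lines(demo.path)
remaining.each { |line| puts line }
```

Note that this also drops lines that were blank to begin with, so it only fits files where empty lines carry no meaning.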

As files are saved essentially as a continuous block of data to the disk, removing any part of it necessitates rewriting at least what comes after it. This does in essence mean that - as you say - it isn't particularly efficient for large files. It is therefore generally a good idea to limit file sizes so that such problems don't occur.

A "compromise" solution is to copy the file line by line to a second file and then move that file over the first. This avoids loading the whole file into memory but does not avoid the disk access:

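require 'fileutils'

# Copy every line except the unwanted one to a temporary file...
File.open('file.txt', 'r') do |f|
  File.open('file.txt.tmp', 'w') do |f2|
    f.each_line do |line|
      f2.write(line) unless line.start_with?('Person2')
    end
  end
end
# ...then replace the original with the filtered copy.
FileUtils.mv 'file.txt.tmp', 'file.txt'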

Even more efficient would be to open the file for reading and writing, skip ahead to the position you want to delete, and then shift the rest of the data back. That makes for some quite ugly code, though (and I can't be bothered to do that now).
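For what it's worth, the shift-back idea could look roughly like the sketch below. It is not from the original answer: the helper name delete_line_in_place, the chunk size, and the temp file are assumptions for illustration. It removes only the first matching line, copies everything after it backwards in chunks, and then truncates the file.

```ruby
require 'tempfile'

# Hypothetical helper: deletes the first line for which the block returns true
# by shifting the following bytes back and truncating the file.
def delete_line_in_place(path, chunk_size = 64 * 1024)
  File.open(path, 'r+b') do |f|
    write_pos = nil
    read_pos = nil

    # Find the byte range of the first matching line.
    until f.eof?
      start = f.pos
      line = f.gets
      next unless yield(line)

      write_pos = start
      read_pos = f.pos
      break
    end
    return false if write_pos.nil?

    # Copy the tail of the file over the deleted line, one chunk at a time.
    loop do
      f.seek(read_pos)
      chunk = f.read(chunk_size)
      break if chunk.nil? # nil means end of file

      read_pos += chunk.bytesize
      f.seek(write_pos)
      f.write(chunk)
      write_pos += chunk.bytesize
    end

    # Cut off the leftover bytes at the end.
    f.flush
    f.truncate(write_pos)
  end
  true
end

demo = Tempfile.new('people')
demo.write("Person1,will,23\nPerson2,Richard,32\nPerson3,Mike,44\n")
demo.close

deleted = delete_line_in_place(demo.path) { |line| line.start_with?('Person2') }
result = File.read(demo.path)
puts result
```

Unlike the temp-file approach, an interruption halfway through leaves the file in a corrupted state, so this trades safety for less disk usage.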

You could open the file and read it line by line, appending lines you want to keep to a new file. This allows you the most control over which lines are kept, without destroying the original file.

File.open('output_file_path', 'w') do |output| # 'w' for a new file, 'a' to append to an existing one
  File.open('input_file_path', 'r') do |input|
    input.each_line do |line| # visit every line, not just the first
      # keep_line is your own logic that decides whether the line should be kept
      output.write(line) if keep_line(line)
    end
  end
end
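On the difference between the two modes mentioned in the question: 'w' truncates the file before writing, while 'a' keeps the existing content and adds to the end. A small sketch on a throwaway temp file (the file and variable names are only for illustration):

```ruby
require 'tempfile'

# Throwaway file so the demo does not touch real data.
demo = Tempfile.new('modes')
demo.close
path = demo.path

File.open(path, 'w') { |f| f.puts 'Person1,will,23' }    # write: starts from an empty file
File.open(path, 'a') { |f| f.puts 'Person2,Richard,32' } # append: adds after existing content
after_append = File.read(path)

File.open(path, 'w') { |f| f.puts 'Person3,Mike,44' }    # write again: old content is discarded
after_write = File.read(path)

puts after_append
puts after_write
```

Neither mode can remove something from the middle of a file; they only control what happens to the existing content when the file is opened.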

If you know the position of the beginning and end of the chunk you want to remove, you can open the file, read to the start, then seek to the end and continue reading.

Look up parameters to the read method, and read about seeking here:

http://ruby-doc.org/core-2.0/IO.html#method-i-read
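A rough sketch of that idea, assuming you already know the byte offsets of the chunk. The helper name copy_without_range and the temp files are made up for illustration, and the result goes to a second file rather than back into the original:

```ruby
require 'tempfile'

# Hypothetical helper: copies src to dst, leaving out the bytes from
# skip_from up to (but not including) skip_to.
def copy_without_range(src, dst, skip_from, skip_to)
  File.open(src, 'rb') do |input|
    File.open(dst, 'wb') do |output|
      output.write(input.read(skip_from)) # everything before the chunk
      input.seek(skip_to)                 # jump past the chunk
      IO.copy_stream(input, output)       # stream the remainder
    end
  end
end

content = "Person1,will,23\nPerson2,Richard,32\nPerson3,Mike,44\n"
source = Tempfile.new('input')
source.write(content)
source.close
target = Tempfile.new('output')
target.close

# Offsets of the line to drop, computed here only for the demo.
skip_from = content.index('Person2')
skip_to = skip_from + "Person2,Richard,32\n".bytesize

copy_without_range(source.path, target.path, skip_from, skip_to)
copied = File.read(target.path)
puts copied
```

Reading the leading part with a single read call loads it into memory at once; for a very large prefix, IO.copy_stream with a length argument would be the more careful choice.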

Read here :

File.open('output.txt', 'w') do |out_file|
  File.foreach('input.txt') do |line|
    # skip every line that mentions Person2; all other lines are copied unchanged
    out_file.print line unless line.include?('Person2')
  end
end
