
How can I modify file contents as a string whilst still being able to apply methods to individual lines within it?

When editing the contents of a file I have been using the approach of:

  1. Open the file in read mode
  2. Read the file contents into a string with the .read() method and assign it to a variable
  3. Close the file
  4. Do things to the string
  5. Open the original file in write mode
  6. Write the string to file
  7. Close the file

For example:

fo = open('file.html', 'r')
fo_as_string = fo.read()
fo.close()
#  # #
# do stuff to fo_as_string here
#  # #
fo = open('file.html', 'w')
fo.write(fo_as_string)
fo.close()

However, I now need to remove any whitespace at the beginning of lines. Because I have converted the file object to a string, I think there is no way to target this whitespace at a 'line' level with string methods like lstrip and rstrip.

So I am after advice on the logic: how can I keep the flexibility of having the file contents as a string for manipulation, while also being able to target individual lines within the string when required, as in the example above?
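One more option, not taken from the answers below, is a minimal sketch using Python's re module. With re.MULTILINE, ^ matches at the start of every line, so leading whitespace can be removed while the contents stay a single string. The function name strip_leading_whitespace is hypothetical. The pattern [ \t]+ is an assumption that only spaces and tabs count as indentation, which keeps line breaks and blank lines intact.

```python
import re

def strip_leading_whitespace(text):
    # re.MULTILINE makes ^ match at the start of every line, not only at the
    # start of the string; [ \t]+ removes spaces/tabs but never a line break.
    return re.sub(r'^[ \t]+', '', text, flags=re.MULTILINE)

fo_as_string = "<html>\n    <body>\n\t<p>hi</p>\n    </body>\n</html>\n"
cleaned = strip_leading_whitespace(fo_as_string)
print(repr(cleaned))  # '<html>\n<body>\n<p>hi</p>\n</body>\n</html>\n'
```

Other whitespace characters (form feeds, non-breaking spaces) are deliberately left alone here. Widen the character class if your files contain them.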

Use a for-loop: iterating over a file object returns one line at a time.

# Use a `with` statement for handling files; it closes them automatically.
with open('file.html') as fo, open('file1.html', 'w') as fo1:
    for line in fo:                 # reads one line at a time, memory efficient
        line = line.strip()         # do something with line, e.g. strip whitespace (this also removes the '\n')
        fo1.write(line + '\n')      # write the line to fo1, restoring the newline
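This short sketch shows why the newline handling matters. Each line yielded by the loop still ends with '\n'. The sketch uses io.StringIO as a stand-in for a real file, which is an assumption made only so the example runs without touching disk. Iteration behaves the same way for a file opened in text mode.

```python
import io

# io.StringIO stands in for an open text file; iterating it yields lines.
fo = io.StringIO("  first\n\tsecond\n")
lines = [line for line in fo]
print(lines)    # ['  first\n', '\tsecond\n']

# strip() removes the trailing '\n' as well, so it is added back explicitly.
cleaned = [line.strip() + '\n' for line in lines]
print(cleaned)  # ['first\n', 'second\n']
```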

If you're trying to modify the same file, use the fileinput module:

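import fileinput

# With inplace=True, anything printed is written back into 'file.html'.
for line in fileinput.input('file.html', inplace=True):
    # do something with line
    print(line, end='')  # line already ends with '\n', so suppress print's own newline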
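Here is a minimal, self-contained sketch of the in-place approach. It adds fileinput's backup parameter so the original is kept as file.html.bak before being overwritten. The temporary directory and the sample HTML are invented for the example. Removing only spaces and tabs with lstrip(' \t') is an assumption that keeps blank lines from disappearing.

```python
import fileinput
import os
import tempfile

# Hypothetical sample file in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), 'file.html')
with open(path, 'w') as f:
    f.write("  <p>one</p>\n\t<p>two</p>\n")

# backup='.bak' saves the original as file.html.bak; while inplace=True,
# print() output goes into the file instead of the console.
with fileinput.input(path, inplace=True, backup='.bak') as lines:
    for line in lines:
        print(line.lstrip(' \t'), end='')

with open(path) as f:
    result = f.read()
print(repr(result))  # '<p>one</p>\n<p>two</p>\n'
```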

You can also get individual lines from the string returned by file.read() by splitting it:

fo_as_string = fo.read()
lines = fo_as_string.splitlines()
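If you want to split, edit and then rebuild the string without losing the original line endings, str.splitlines(keepends=True) keeps each ending on its line, so ''.join() restores the layout. A minimal sketch on an in-memory string follows. The '\r\n' endings are an assumption chosen to show that they survive. Note that a file opened in text mode with the default newline setting already translates '\r\n' to '\n' on read.

```python
text = "  <ul>\r\n    <li>a</li>\r\n  </ul>\r\n"

# keepends=True leaves each line's own ending attached.
lines = text.splitlines(keepends=True)

# Edit line by line, then join with '' since the endings are still present.
stripped = ''.join(line.lstrip(' \t') for line in lines)
print(repr(stripped))  # '<ul>\r\n<li>a</li>\r\n</ul>\r\n'
```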

But file.read() loads the whole file into memory, so it is not memory efficient for large files.

Other alternatives are f.readlines() and list(f), both of which return a list of all lines from the file object.
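A quick check that the two alternatives give the same result, with each line keeping its trailing '\n', looks like this. io.StringIO again stands in for a real file (an assumption for the sake of a runnable example). A fresh object is used for each call because both approaches consume the stream.

```python
import io

data = "line one\nline two\n"

via_readlines = io.StringIO(data).readlines()
via_list = list(io.StringIO(data))

print(via_readlines)              # ['line one\n', 'line two\n']
print(via_readlines == via_list)  # True
```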

Depending on the size of the file, and the processes you want to do to each line, there are a couple of answers that might work for you.

First, if you're intent on keeping the entire file in memory while you process it, you could save it as a list of lines, process some or all of the lines, and rejoin them with your standard line delimiter when you wish to write them to disk:

linesep = '\n'
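with open('file.html', 'r') as fin:
    # splitlines() drops each line ending; readlines() would keep every '\n',
    # and joining those lines with linesep below would double them
    input_lines = fin.read().splitlines()

# Do your per-line transformation
modified_lines = [line.lstrip() for line in input_lines]

# Join the lines into one string to do whole-string processing
# (the final linesep restores the trailing newline that splitlines() dropped)
whole_string = linesep.join(modified_lines) + linesep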

# whatever full-string processing you're looking for, do here

# Write to disk
with open('file1.html', 'w') as output_file:
    output_file.write(whole_string)
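The same two-phase idea can be kept separate from the file handling in a pure function, which makes it easy to try out on a plain string. This is a sketch only. The name process_text and the '<br>' to '<br/>' replacement are hypothetical placeholders for your own per-line and whole-string steps. The function always ends its output with linesep, even if the input did not end with a newline.

```python
def process_text(text, linesep='\n'):
    # Per-line pass: work on the lines without their endings.
    lines = [line.lstrip() for line in text.splitlines()]
    # Whole-string pass: rejoin, then apply string-wide edits.
    whole_string = linesep.join(lines)
    whole_string = whole_string.replace('<br>', '<br/>')  # hypothetical example edit
    # Restore the final newline that splitlines() dropped.
    return whole_string + linesep

result = process_text("  <p>a<br>b</p>\n    <p>c</p>\n")
print(repr(result))  # '<p>a<br/>b</p>\n<p>c</p>\n'
```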

Or you could specify your own line separator and do the input parsing by hand:

linesep = '\n'
with open('file.html', 'r') as fin:
    input_lines_by_hand = fin.read().split(linesep)
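Splitting by hand behaves slightly differently from splitlines(). str.split() treats the separator literally, so a trailing separator produces a final empty string. splitlines() drops that empty string and also recognises other line boundaries such as '\r\n'. The small sample string below is invented for illustration.

```python
linesep = '\n'
text = "a\nb\n"

by_split = text.split(linesep)
by_splitlines = text.splitlines()

print(by_split)       # ['a', 'b', '']
print(by_splitlines)  # ['a', 'b']

# Because split() keeps that trailing '', joining with the same separator
# reproduces the original string exactly, final newline included.
print(linesep.join(by_split) == text)  # True
```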
