How to modify a text file?

I'm using Python and would like to insert a string into a text file without deleting or copying the file. How can I do that?

Unfortunately, there is no way to insert into the middle of a file without rewriting it. As previous posters have indicated, you can append to a file, or overwrite part of it using seek, but if you want to add something at the beginning or in the middle, you'll have to rewrite it.

This is an operating system thing, not a Python thing: the standard file APIs let you overwrite bytes or extend a file at the end, but offer no general way to insert bytes in the middle. It is the same in all languages.
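To see why seek alone cannot insert, here is a minimal sketch (overwrite_at is a hypothetical helper, not a library function): after a seek, write() replaces the bytes that are already there instead of pushing them to the right.

```python
def overwrite_at(path, offset, data):
    """Hypothetical helper: write the bytes `data` at byte `offset` of `path`."""
    with open(path, "r+b") as f:  # read/write mode, the file is not truncated
        f.seek(offset)
        f.write(data)  # replaces existing bytes; nothing is shifted right
```

For a file containing b"Hello world", overwrite_at(path, 6, b"there") leaves b"Hello there": the bytes of "world" are replaced, and the file length does not change.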

What I usually do is read from the file, make the modifications, and write the result out to a new file called myfile.txt.tmp or something like that. This is better than reading the whole file into memory, because the file may be too large for that. Once the temporary file is complete, I rename it to the original file's name.

This is a good, safe way to do it, because if the file write crashes or aborts for any reason, you still have your untouched original file.
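A minimal sketch of that approach, assuming you want to insert one line after a given 0-based line index (insert_line_after is a hypothetical name). It streams the original line by line into a temporary file in the same directory, then swaps it in with os.replace(), so the original is untouched until the very last step.

```python
import os
import shutil
import tempfile

def insert_line_after(path, line_index, new_line):
    """Hypothetical helper: copy `path` into a temporary file, inserting
    `new_line` after the 0-based line `line_index`, then swap the files."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so the final rename
    # does not cross filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as dst, open(path) as src:
            for i, line in enumerate(src):
                dst.write(line)  # stream one line at a time
                if i == line_index:
                    if not line.endswith("\n"):
                        dst.write("\n")  # last line had no trailing newline
                    dst.write(new_line + "\n")
        shutil.copymode(path, tmp_path)  # keep the original permission bits
        os.replace(tmp_path, path)  # the original is untouched until here
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # clean up the partial temp file
        raise
```

Using tempfile.mkstemp() in the same directory, rather than a fixed name like myfile.txt.tmp, avoids clobbering an existing file with that name and keeps the final rename on one filesystem, where os.replace() may otherwise fail.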

It depends on what you want to do. To append, open the file in "a" mode:

with open("foo.txt", "a") as f:
    f.write("new line\n")

If you want to prepend something, you have to read the file first:

with open("foo.txt", "r+") as f:
    old = f.read()  # read everything in the file
    f.seek(0)  # rewind
    f.write("new line\n" + old)  # write the new line, then the old content after it

The fileinput module of the Python standard library will rewrite a file in place if you use the inplace=1 parameter; while you iterate over the lines, anything written to standard output goes into the file:

import sys
import fileinput

# replace all occurrences of 'sit' with 'SIT' and insert a line after the 5th
for i, line in enumerate(fileinput.input('lorem_ipsum.txt', inplace=1)):
    sys.stdout.write(line.replace('sit', 'SIT'))  # replace 'sit' and write
    if i == 4: sys.stdout.write('\n')  # write a blank line after the 5th line

Rewriting a file in place is often done by saving the old copy under a modified name. Unix folks add a ~ to mark the old one. Windows folks do all kinds of things: add .bak or .old, rename the file entirely, or put the ~ at the front of the name.

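import shutil

aFile = "myfile.txt"  # example name: the file you want to modify

# keep the original under a backup name (Unix style), then rebuild aFile from it
shutil.move(aFile, aFile + "~")

with open(aFile + "~", "r") as source, open(aFile, "w") as destination:
    for line in source:
        destination.write(line)
        # hypothetical condition and extra line; substitute your own
        if line.startswith("#"):
            destination.write("an additional line" + "\n")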

Instead of shutil, you can use os.rename:

import os
# on Windows, os.rename raises FileExistsError if aFile + "~" already exists; os.replace overwrites it
os.rename(aFile, aFile + "~")
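The fileinput module shown earlier can also do this backup-and-rewrite for you through its backup parameter: with inplace=True, the original is moved to the backup name and standard output is redirected into a new file with the original name. A minimal sketch; insert_after_match and its arguments are hypothetical, and "~" follows the Unix convention above.

```python
import fileinput

def insert_after_match(path, target, new_line, backup_ext="~"):
    """Hypothetical helper: insert `new_line` after every line equal to
    `target`, keeping the untouched original as path + backup_ext."""
    # With inplace=True, print() output goes into the file being rewritten.
    with fileinput.input(path, inplace=True, backup=backup_ext) as f:
        for line in f:
            print(line, end="")  # copy the original line unchanged
            if line.rstrip("\n") == target:
                if not line.endswith("\n"):
                    print()  # last line had no trailing newline
                print(new_line)
```

Because a backup extension is given, fileinput leaves the backup file in place after the rewrite instead of deleting it.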

Python's mmap module lets you insert into a file in place by growing it and shifting the tail. The following sample shows how it can be done on Unix systems that provide mremap(), such as Linux (mmap resizing may behave differently on Windows). Note that this does not handle all error conditions, and you might corrupt or lose the original file. Also, it works on bytes rather than Unicode strings, so encode text before inserting it.

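import os
from mmap import mmap

def insert(filename, data, pos):
    # data must be bytes, e.g. "text".encode("utf-8")
    if len(data) < 1:
        # nothing to insert
        return

    with open(filename, 'r+b') as f:
        # note: mmap cannot map an empty (0-byte) file
        with mmap(f.fileno(), os.path.getsize(filename)) as m:
            origSize = m.size()

            # or this could be an error
            if pos > origSize:
                pos = origSize
            elif pos < 0:
                pos = 0

            # grow the file and the mapping, then shift the tail right
            m.resize(origSize + len(data))
            m.move(pos + len(data), pos, origSize - pos)
            # fill the gap with the new bytes
            m[pos:pos + len(data)] = data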

It is also possible to do this without mmap, with the file opened in 'r+' mode, but it is less convenient and less efficient, because you have to read and temporarily store the contents of the file from the insertion position to EOF, which might be huge.
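A sketch of that 'r+' alternative, opened in binary mode so that offsets are byte positions (insert_bytes is a hypothetical name). Note that tail holds everything from the insertion point to EOF in memory, which is exactly the cost described above.

```python
import os

def insert_bytes(filename, data, pos):
    """Hypothetical helper: insert `data` at byte offset `pos` using plain
    'r+b' file I/O instead of mmap."""
    with open(filename, "r+b") as f:
        size = f.seek(0, os.SEEK_END)  # seek() returns the new position
        pos = max(0, min(pos, size))   # clamp, like the mmap version
        f.seek(pos)
        tail = f.read()                # everything from pos to EOF, held in memory
        f.seek(pos)
        f.write(data)
        f.write(tail)                  # the file grows by len(data)
```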

As Adam mentioned, you have to take your system's limitations into consideration before deciding on an approach: do you have enough memory to read the whole file into memory, replace parts of it, and rewrite it?

If you're dealing with a small file or have no memory issues, this might help:

Option 1) Read the entire file into memory, do a regex substitution that matches all or part of the target line, and replace it with that line plus the extra line. You will need to make sure that the 'middle line' is unique in the file (re.sub replaces every match); if each line has a timestamp, this should be pretty reliable.

import re

# open file with r+ (read and write, text mode so re.sub works on str)
with open("file.log", "r+") as f:
    # read entire content of file into memory
    f_content = f.read()
    # match the middle line and replace it with itself plus the extra line
    f_content = re.sub(r'(middle line)', r'\1\nnew line', f_content)
    # return pointer to top of file so we can re-write the content with replaced string
    f.seek(0)
    # clear file content
    f.truncate()
    # re-write the content with the updated content (file is closed by "with")
    f.write(f_content)

Option 2) Find the middle line and replace it with that line plus the extra line.

# open file with r+ (read and write, text mode)
with open("file.log", "r+") as f:
    # get list of lines (each keeps its trailing "\n")
    f_content = f.readlines()
    # get index of middle line (integer division, so it is a valid list index)
    middle_line = len(f_content) // 2
    # insert the extra line right after the middle line
    f_content.insert(middle_line + 1, "new line\n")
    # return pointer to top of file so we can re-write the content
    f.seek(0)
    # clear file content
    f.truncate()
    # re-write the content with the updated content (file is closed by "with")
    f.write(''.join(f_content))

I wrote a small class to do this cleanly.

import tempfile

class FileModifierError(Exception):
    pass

class FileModifier(object):

    def __init__(self, fname):
        self.__write_dict = {}
        self.__filename = fname
        # text-mode scratch copy of the original, rewritten into fname on exit
        self.__tempfile = tempfile.TemporaryFile(mode='w+')
        with open(fname, 'r') as fp:
            for line in fp:
                self.__tempfile.write(line)
        self.__tempfile.seek(0)

    def write(self, s, line_number='END'):
        # queue s to be written before 0-based line `line_number` (or at the end)
        if line_number != 'END' and not isinstance(line_number, (int, float)):
            raise FileModifierError("Line number %s is not a valid number" % line_number)
        try:
            self.__write_dict[line_number].append(s)
        except KeyError:
            self.__write_dict[line_number] = [s]

    def writeline(self, s, line_number='END'):
        self.write('%s\n' % s, line_number)

    def writelines(self, s, line_number='END'):
        for ln in s:
            self.writeline(ln, line_number)

    def __popline(self, index, fp):
        # write (and forget) every string queued for this line index
        try:
            ilines = self.__write_dict.pop(index)
            for line in ilines:
                fp.write(line)
        except KeyError:
            pass

    def close(self):
        self.__exit__(None, None, None)

    def __enter__(self):
        return self

    def __exit__(self, type, value, traceback):
        with open(self.__filename, 'w') as fp:
            for index, line in enumerate(self.__tempfile.readlines()):
                self.__popline(index, fp)
                fp.write(line)
            # line numbers past the end of the file, in ascending order
            # ('END' is handled separately: sorting str with int fails in Python 3)
            for index in sorted(k for k in self.__write_dict if k != 'END'):
                for line in self.__write_dict[index]:
                    fp.write(line)
            for line in self.__write_dict.get('END', []):
                fp.write(line)
        self.__tempfile.close()

Then you can use it like this (line numbers are 0-based; each string is written before the original line with that index):

with FileModifier(filename) as fp:
    fp.writeline("String 1", 0)
    fp.writeline("String 2", 20)
    fp.writeline("String 3")  # To write at the end of the file

If you know some Unix, you could try the following:

Note: $ denotes the command prompt.

Say you have a file my_data.txt with the following content:

$ cat my_data.txt
This is a data file
with all of my data in it.

Then, using the os module, you can run the usual sed commands:

import os

# Identifiers used are:
my_data_file = "my_data.txt"
# GNU sed syntax; BSD/macOS sed needs `sed -i '' ...` instead
command = "sed -i 's/all/none/' " + my_data_file

# Execute the command
os.system(command)

If you aren't familiar with sed, check it out; it is extremely useful.
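If you would rather not shell out, the same edit can be done in plain Python. This is a rough sketch, not an exact sed clone: it assumes the file fits in memory, uses Python's re syntax rather than sed's, and, like s/all/none/ without the g flag, replaces only the first match on each line (sed_like_substitute is a hypothetical name).

```python
import re

def sed_like_substitute(path, pattern, replacement):
    """Rough Python analogue of: sed -i 's/PATTERN/REPLACEMENT/' path"""
    with open(path) as f:
        lines = f.readlines()  # whole file in memory
    with open(path, "w") as f:
        for line in lines:
            # count=1: only the first match on each line, like sed without /g
            f.write(re.sub(pattern, replacement, line, count=1))
```

On the my_data.txt above, sed_like_substitute("my_data.txt", "all", "none") turns the second line into "with none of my data in it."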

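Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site or the original URL. For any questions, contact: yoyou2525@163.com.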

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM