How do I find and replace multiple lines in a text file?

Question

I am running Python 2.7.

I have three text files: data.txt, find.txt, and replace.txt. Now, find.txt contains several lines that I want to search for in data.txt, and I want to replace that section with the content of replace.txt. Here is a simple example:

data.txt

pumpkin
apple
banana
cherry
himalaya
skeleton
apple
banana
cherry
watermelon
fruit

find.txt

apple
banana
cherry

replace.txt

1
2
3

So, in the above example, I want to search for all occurrences of apple, banana, and cherry in the data and replace those lines with 1, 2, 3.

I am having some trouble finding the right approach to this. My data.txt is about 1MB, so I want to be as efficient as possible. One dumb way is to concatenate everything into one long string and use replace, and then output to a new text file so all the line breaks will be restored.

import re

data = open("data.txt", 'r')
find = open("find.txt", 'r')
replace = open("replace.txt", 'r')

data_str = ""
find_str = ""
replace_str = "" 

for line in data: # concatenate it into one long string
    data_str += line

for line in find: # concatenate it into one long string
    find_str += line

for line in replace: 
    replace_str += line


# pass the concatenated strings, not the file objects
new_data = data_str.replace(find_str, replace_str)
new_file = open("new_data.txt", "w")
new_file.write(new_data)
new_file.close()

But this seems so convoluted and inefficient for a large data file like mine. Also, the replace function appears to be deprecated, so that's not good.

Another way is to step through the lines and keep track of the line on which you found a match.

Something like this:

location = 0

LOOP1: 
for find_line in find:
    for i, data_line in enumerate(data).startingAtLine(location):
        if find_line == data_line:
            location = i # found possibility

for idx in range(NUMBER_LINES_IN_FIND):
    if find_line[idx] != data_line[idx+location]  # compare line by line
        #if the subsequent lines don't match, then go back and search again
        goto LOOP1

Not fully formed code, I know. I don't even know if it's possible to search through a file starting from a certain line or between certain lines, but again, I'm just a bit confused by the logic of it all. What is the best way to do this?

Thanks!

Answer 1

If the file is large, you want to read and write one line at a time, so the whole thing isn't loaded into memory at once.

# create a dict of find keys and replace values
findlines = open('find.txt').read().split('\n')
replacelines = open('replace.txt').read().split('\n')
find_replace = dict(zip(findlines, replacelines))

with open('data.txt') as data:
    with open('new_data.txt', 'w') as new_data:
        for line in data:
            for key in find_replace:
                if key in line:
                    line = line.replace(key, find_replace[key])
            new_data.write(line)

Edit: I changed the code to use read().split('\n') instead of readlines(), so \n isn't included in the find and replace strings.
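
Note that the loop above substitutes each find line on its own, wherever it appears. If the lines of find.txt should only be replaced when they occur together as a consecutive block, one possible sketch keeps a sliding window of the most recent lines while still streaming the data. The function name replace_blocks and the sample lists are hypothetical, chosen for illustration, and lines are assumed to have their trailing newline already stripped.

```python
from collections import deque


def replace_blocks(lines, find_lines, replace_lines):
    """Yield lines, swapping each consecutive run equal to find_lines
    for replace_lines. (Hypothetical helper, not from the original post.)"""
    if not find_lines:
        # nothing to search for: pass the input through unchanged
        for line in lines:
            yield line
        return
    n = len(find_lines)
    window = deque()  # holds at most n pending lines
    for line in lines:
        window.append(line)
        if len(window) < n:
            continue
        if list(window) == find_lines:
            # the whole block matched: emit the replacement instead
            for repl in replace_lines:
                yield repl
            window.clear()
        else:
            # oldest pending line cannot start a match: emit it
            yield window.popleft()
    # flush whatever is left at end of input
    for line in window:
        yield line


# sample data from the question
data_lines = ["pumpkin", "apple", "banana", "cherry", "himalaya", "skeleton",
              "apple", "banana", "cherry", "watermelon", "fruit"]
find_lines = ["apple", "banana", "cherry"]
replace_lines = ["1", "2", "3"]
result = list(replace_blocks(data_lines, find_lines, replace_lines))
```

With real files, strip the newline as each line is read (line.rstrip('\n')) and write line + '\n' for every yielded line. Only as many lines as find.txt contains are held in memory at any time.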

Answer 2

A couple of things here:

replace is not deprecated; see this discussion for details: Python 2.7: replace method of string object deprecated

If you are worried about reading data.txt into memory all at once, you should be able to just iterate over data.txt one line at a time:

data = open("data.txt", 'r')
for line in data:
    # fix the line

So all that's left is coming up with a whole bunch of find/replace pairs and fixing each line. Check out the zip function for a handy way to do that:

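# splitlines() drops the trailing newline from each find/replace token
find = open("find.txt", 'r').read().splitlines()
replace = open("replace.txt", 'r').read().splitlines()
new_data = open("new_data.txt", 'w')
for line in open("data.txt", 'r'):
    # apply every find/replace pair to the current line
    for find_token, replace_token in zip(find, replace):
        line = line.replace(find_token, replace_token)
    new_data.write(line)  # line still carries its own newline
new_data.close()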
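
Since str.replace is not deprecated and a file of about 1MB fits comfortably in memory, the whole-file approach from the question is also workable when the find lines must match as one block. This is a minimal sketch, assuming a trailing newline at the end of find.txt or replace.txt should be ignored; replace_block and replace_in_files are hypothetical helper names. Because it is a plain substring match, a line such as "pineapple" followed by banana and cherry would also match.

```python
def replace_block(data_str, find_str, replace_str):
    """Replace every occurrence of the multi-line find_str in data_str.
    (Hypothetical helper, not from the original post.)"""
    # ignore a final newline so both files may or may not end with one
    find_str = find_str.rstrip("\n")
    replace_str = replace_str.rstrip("\n")
    if not find_str:
        return data_str  # nothing to search for
    return data_str.replace(find_str, replace_str)


def replace_in_files(data_path, find_path, replace_path, out_path):
    """Read the three input files whole and write the replaced text."""
    with open(data_path, "r") as f:
        data_str = f.read()
    with open(find_path, "r") as f:
        find_str = f.read()
    with open(replace_path, "r") as f:
        replace_str = f.read()
    with open(out_path, "w") as f:
        f.write(replace_block(data_str, find_str, replace_str))


# sample data from the question, held in memory for illustration
sample = ("pumpkin\napple\nbanana\ncherry\nhimalaya\nskeleton\n"
          "apple\nbanana\ncherry\nwatermelon\nfruit\n")
result = replace_block(sample, "apple\nbanana\ncherry\n", "1\n2\n3\n")
```

For the files in the question, the call would be replace_in_files("data.txt", "find.txt", "replace.txt", "new_data.txt").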
