How to find and replace multiple lines in a text file?

I am running Python 2.7.

I have three text files: data.txt, find.txt, and replace.txt. find.txt contains several lines that I want to search for in data.txt, replacing each matching section with the content of replace.txt. Here is a simple example:

data.txt

pumpkin
apple
banana
cherry
himalaya
skeleton
apple
banana
cherry
watermelon
fruit

find.txt

apple
banana
cherry

replace.txt

1
2
3

So, in the above example, I want to search for all occurrences of apple, banana, and cherry in the data and replace those lines with 1, 2, 3.

I am having some trouble finding the right approach, as my data.txt is about 1 MB and I want to be as efficient as possible. One dumb way is to concatenate everything into one long string, use replace, and then write the result to a new text file so that all the line breaks are restored.

data = open("data.txt", 'r')
find = open("find.txt", 'r')
replace = open("replace.txt", 'r')

data_str = ""
find_str = ""
replace_str = ""

for line in data: # concatenate it into one long string
    data_str += line

for line in find: # concatenate it into one long string
    find_str += line

for line in replace: # concatenate it into one long string
    replace_str += line

# replace using the strings built above, not the file objects
new_data = data_str.replace(find_str, replace_str)
new_file = open("new_data.txt", "w")
new_file.write(new_data)
new_file.close()

But this seems so convoluted and inefficient for a large data file like mine. Also, the replace function appears to be deprecated, so that's not good.

Another way is to step through the lines and keep track of the line on which you found a match.

Something like this:

location = 0

LOOP1: 
for find_line in find:
    for i, data_line in enumerate(data).startingAtLine(location):
        if find_line == data_line:
            location = i # found possibility

for idx in range(NUMBER_LINES_IN_FIND):
    if find_line[idx] != data_line[idx+location]  # compare line by line
        #if the subsequent lines don't match, then go back and search again
        goto LOOP1

Not fully formed code, I know. I don't even know if it's possible to search through a file starting from a certain line, or between certain lines, but again, I'm just a bit confused by the logic of it all. What is the best way to do this?

Thanks!
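For reference, the line-by-line idea sketched in the question can be written without goto by comparing a slice of the data against the whole find block. This is only a minimal sketch, assuming the three files have already been read into lists of lines with the newlines stripped; the helper name replace_block is hypothetical and not part of any library.

```python
def replace_block(data_lines, find_lines, replace_lines):
    """Return a new list in which every run of lines equal to
    find_lines is replaced by replace_lines (all arguments are lists)."""
    if not find_lines:
        return list(data_lines)  # nothing to search for
    result = []
    n = len(find_lines)
    i = 0
    while i < len(data_lines):
        # compare the next n lines against the whole find block at once
        if data_lines[i:i + n] == find_lines:
            result.extend(replace_lines)
            i += n  # jump past the matched block
        else:
            result.append(data_lines[i])
            i += 1
    return result


# sample data taken from the question
data = ["pumpkin", "apple", "banana", "cherry", "himalaya", "skeleton",
        "apple", "banana", "cherry", "watermelon", "fruit"]
new_data = replace_block(data, ["apple", "banana", "cherry"], ["1", "2", "3"])
```

Because the comparison is on whole lines, a line such as pineapple would not count as the start of a match.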

If the file is large, you want to read and write one line at a time, so the whole thing isn't loaded into memory at once.

# create a dict of find keys and replace values
findlines = open('find.txt').read().split('\n')
replacelines = open('replace.txt').read().split('\n')
find_replace = dict(zip(findlines, replacelines))

with open('data.txt') as data:
    with open('new_data.txt', 'w') as new_data:
        for line in data:
            for key in find_replace:
                # skip the empty key that split('\n') yields for a blank or trailing line
                if key and key in line:
                    line = line.replace(key, find_replace[key])
            new_data.write(line)

Edit: I changed the code to use read().split('\n') instead of readlines(), so \n isn't included in the find and replace strings.
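Note that the loop above replaces each find line independently, so a lone apple would also become 1. If only the consecutive block apple, banana, cherry should be replaced, one possible streaming variant keeps a small window of pending lines. This is a sketch under assumptions: stream_replace is a hypothetical helper, and the lines in all three inputs must use the same newline convention (all stripped, or all ending in \n) for the comparison to succeed.

```python
from collections import deque


def stream_replace(lines, find_lines, replace_lines):
    """Yield lines from any iterable, swapping each run equal to
    find_lines for replace_lines. At most len(find_lines) lines are buffered."""
    target = list(find_lines)
    n = len(target)
    if n == 0:
        for line in lines:  # nothing to search for: pass everything through
            yield line
        return
    window = deque()
    for line in lines:
        window.append(line)
        if len(window) == n:
            if list(window) == target:
                for out in replace_lines:  # emit the replacement block
                    yield out
                window.clear()
            else:
                yield window.popleft()  # oldest line cannot start a match
    for line in window:  # flush the lines still buffered at the end
        yield line


# sample data taken from the question
data = ["pumpkin", "apple", "banana", "cherry", "himalaya", "skeleton",
        "apple", "banana", "cherry", "watermelon", "fruit"]
result = list(stream_replace(iter(data), ["apple", "banana", "cherry"],
                             ["1", "2", "3"]))
```

With real files, the open data.txt file object could be passed as lines and each yielded line written to new_data.txt, provided find.txt and replace.txt are read with readlines() so their lines keep the trailing \n as well.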

A couple of things here:

The replace method of string objects is not deprecated (only the older string.replace function from the string module is); see this discussion for details: Python 2.7: replace method of string object deprecated

If you are worried about reading data.txt into memory all at once, you should be able to just iterate over data.txt one line at a time:

data = open("data.txt", 'r')
for line in data:
    pass  # fix the line here

So all that's left is coming up with a whole bunch of find/replace pairs and fixing each line. Check out the zip function for a handy way to do that:

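# read the find/replace tokens without their trailing newlines
with open("find.txt", 'r') as f:
    find = f.read().splitlines()
with open("replace.txt", 'r') as f:
    replace = f.read().splitlines()

with open("data.txt", 'r') as data, open("new_data.txt", 'w') as new_data:
    for line in data:
        # apply every find/replace pair, then write the line once
        for find_token, replace_token in zip(find, replace):
            line = line.replace(find_token, replace_token)
        new_data.write(line)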
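On the first point above, here is a quick illustration that the replace method of a string handles a multi-line substring in one call. It is a minimal sketch using inline strings based on the question's sample data rather than the real files; the variable names are chosen only for this example.

```python
# stand-ins for the contents of data.txt, find.txt and replace.txt
data_str = "pumpkin\napple\nbanana\ncherry\nhimalaya\n"
find_str = "apple\nbanana\ncherry\n"
replace_str = "1\n2\n3\n"

# replace is called as a method on the string object itself
new_data = data_str.replace(find_str, replace_str)
```

One caveat with this approach: the match is on raw characters rather than whole lines, so data containing pineapple followed by banana and cherry would also match, because "apple\nbanana\ncherry\n" occurs inside it.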


 