How to find and replace multiple lines in a text file?
I am running Python 2.7.

I have three text files: data.txt, find.txt, and replace.txt. find.txt contains several lines that I want to search for in data.txt, and I want to replace that section with the contents of replace.txt. Here is a simple example:
data.txt:
pumpkin
apple
banana
cherry
himalaya
skeleton
apple
banana
cherry
watermelon
fruit
find.txt:
apple
banana
cherry
replace.txt:
1
2
3
So, in the above example, I want to search for all occurrences of apple, banana, and cherry in the data and replace those lines with 1, 2, 3.
I am having some trouble finding the right approach, because my data.txt is about 1 MB and I want to be as efficient as possible. One dumb way is to concatenate everything into one long string, use replace, and then write the result to a new text file so that all the line breaks are restored:
# read each file into one long string
with open("data.txt", 'r') as data:
    data_str = data.read()
with open("find.txt", 'r') as find:
    find_str = find.read()
with open("replace.txt", 'r') as replace:
    replace_str = replace.read()

# replace the whole find block (a string, not the file object) with the replace block
new_data = data_str.replace(find_str, replace_str)

with open("new_data.txt", "w") as new_file:
    new_file.write(new_data)
But this seems so convoluted and inefficient for a large data file like mine. Also, the replace function appears to be deprecated, so that's not good.
Another way is to step through the lines and keep track of the line where a match was found. Something like this:
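with open("data.txt") as f:
    data = f.read().splitlines()
with open("find.txt") as f:
    find = f.read().splitlines()

location = 0
matches = []  # line numbers where the whole find block starts
while find and location <= len(data) - len(find):
    # compare the lines starting at `location` with find.txt, line by line
    if data[location:location + len(find)] == find:
        matches.append(location)  # found a full match
        location += len(find)  # keep searching after the match
    else:
        location += 1  # the following lines didn't match; search again from the next line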
That's the rough idea. I don't even know if it's possible to search through a file starting from a certain line, or only between certain lines, but again, I'm just a bit confused about the logic of it all. What is the best way to do this?

Thanks!
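The line-by-line idea above can be completed into a streaming replacement. The following is a minimal sketch under stated assumptions: replace_block and replace_block_in_file are hypothetical names, lines are compared exactly after removing the trailing newline, and a collections.deque serves as a sliding window so that at most as many lines as find.txt contains are held in memory at once.

```python
from collections import deque


def replace_block(data_lines, find_lines, replace_lines):
    """Yield data_lines, swapping each consecutive run equal to find_lines.

    Lines are compared exactly, and at most len(find_lines) of them are
    buffered at any time.
    """
    n = len(find_lines)
    window = deque()
    for line in data_lines:
        window.append(line)
        if len(window) < n:
            continue  # not enough lines buffered to compare yet
        if list(window) == find_lines:
            # the whole block matched: emit the replacement instead
            for new_line in replace_lines:
                yield new_line
            window.clear()
        else:
            # the oldest buffered line can no longer start a match
            yield window.popleft()
    # pass through whatever is still buffered at the end of the input
    for line in window:
        yield line


def replace_block_in_file(data_path, find_path, replace_path, out_path):
    """Hypothetical helper: stream data_path to out_path one line at a time."""
    with open(find_path) as f:
        find_lines = f.read().splitlines()
    with open(replace_path) as f:
        replace_lines = f.read().splitlines()
    with open(data_path) as data, open(out_path, "w") as out:
        stripped = (line.rstrip("\n") for line in data)
        for line in replace_block(stripped, find_lines, replace_lines):
            out.write(line + "\n")
```

With the example files, replace_block_in_file("data.txt", "find.txt", "replace.txt", "new_data.txt") should write pumpkin, 1, 2, 3, himalaya, skeleton, 1, 2, 3, watermelon, fruit. A mismatch only releases the oldest buffered line, so a block that begins partway through a failed comparison (for example apple, apple, banana, cherry) is still found.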
If the file is large, you want to read and write one line at a time, so the whole thing isn't loaded into memory at once.
# create a dict of find keys and replace values
findlines = open('find.txt').read().split('\n')
replacelines = open('replace.txt').read().split('\n')
find_replace = dict(zip(findlines, replacelines))

with open('data.txt') as data:
    with open('new_data.txt', 'w') as new_data:
        for line in data:
            # substitute every find key that appears anywhere in this line
            for key in find_replace:
                if key in line:
                    line = line.replace(key, find_replace[key])
            # write each line once, after all substitutions
            new_data.write(line)
Edit: I changed the code to use read().split('\n') instead of readlines(), so '\n' isn't included in the find and replace strings.
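The key in line test in this answer matches substrings, so a hypothetical line pineapple would become pine1, and apple, banana and cherry are each replaced wherever they appear, not only as a consecutive block. If whole-line matching is wanted, a minimal variant looks up the line text in the same find_replace dict. replace_whole_lines is a hypothetical name, and the sketch is still per line, not block-aware.

```python
def replace_whole_lines(lines, find_replace):
    """Yield each line, replaced only if its text equals a find key exactly."""
    for line in lines:
        # split off the line ending so the comparison covers the whole line
        text = line.rstrip("\n")
        ending = line[len(text):]
        # unmatched lines fall back to their original text
        yield find_replace.get(text, text) + ending
```

Inside the answer's with blocks, the inner loops could then be replaced by new_data.writelines(replace_whole_lines(data, find_replace)).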
A couple of things here:

replace is not deprecated; see this discussion for details: Python 2.7: replace method of string object deprecated. (What is deprecated is the string.replace function in the string module, not the str.replace method.)

If you are worried about reading data.txt into memory all at once, you can iterate over data.txt one line at a time:
with open("data.txt", 'r') as data:
    for line in data:
        # fix the line
        pass
So all that's left is coming up with a whole bunch of find/replace pairs and fixing each line. Check out the zip function for a handy way to do that:
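# splitlines() drops the trailing newlines that readlines() would keep
find = open("find.txt", 'r').read().splitlines()
replace = open("replace.txt", 'r').read().splitlines()

with open("data.txt", 'r') as data, open("new_data.txt", 'w') as new_data:
    for line in data:
        # apply every find/replace pair to this line in turn
        for find_token, replace_token in zip(find, replace):
            line = line.replace(find_token, replace_token)
        # the line still ends with its own newline, so write it as-is
        new_data.write(line)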
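Both answers replace apple, banana and cherry independently on each line. If only the consecutive block should be replaced, and the whole file is read into one string as in the question (about 1 MB fits easily in memory), one option not taken from either answer is a line-anchored regular expression built with re.escape. A minimal sketch, where replace_block_regex is a hypothetical name:

```python
import re


def replace_block_regex(data_text, find_text, replace_text):
    """Replace each occurrence of find_text that spans complete lines."""
    block = find_text.rstrip("\n")
    replacement = replace_text.rstrip("\n")
    if not block:
        return data_text  # an empty find block would match every empty line
    # (?m) lets ^ and $ match at every line start/end, so the block must
    # begin at the start of a line and finish at the end of a line
    pattern = re.compile(r"(?m)^" + re.escape(block) + r"$")
    # a function as the replacement inserts the text literally,
    # without processing backslashes that replace.txt might contain
    return pattern.sub(lambda match: replacement, data_text)
```

Passing the .read() contents of data.txt, find.txt and replace.txt should give pumpkin, 1, 2, 3, himalaya, skeleton, 1, 2, 3, watermelon, fruit for the example; a line such as pineapple is left alone because a match must start at a line boundary.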