python regex - How do I replace a group from a txt file with another group from another txt file?
So, I have the following txt files:
test1.txt (it's all on one line):
(hello)(bye)
test2.txt (it's on two different lines):
(This actually works)
(Amazing!)
And I have the following regex pattern:
\((.*?)\)
Which obviously selects all the words that are inside the parentheses.
What I want to do is replace the words inside the () in test1.txt with the words inside the () in test2.txt, leaving test1.txt like:
(This actually works)(Amazing!)
I tried the following code, but it doesn't seem to work. What did I do wrong?
import re
pattern = re.compile("\((.*?)\)")
for line in enumerate(open("test1.txt")):
    match = re.finditer(pattern, line)
for line in enumerate(open("test2.txt")):
    pattern.sub(match, line)
I think I made a very big error; it's one of my first programs in Python.
Okay, there are several problems:
The finditer method returns an iterator of match objects, not strings. findall returns a list of the matched string groups.
line was not a line but a (line_number, line_string_content) pair, because the file was wrapped in enumerate. enumerate is put to use in the later code blocks.
So you can try to first catch the content:
import re
pattern = re.compile(r"\((.*?)\)")  # raw string, so the backslashes reach the regex engine untouched
for line in open("test2.txt"):
    match = pattern.findall(line)
# match contains the list ['Amazing!'] from the last line of test2.txt; your variable match is overwritten on each line of the file...
Note: if you compile your pattern, you can call the re methods directly on the compiled object.
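To make the points above concrete, here is a minimal sketch on an in-memory string rather than on the files. The sample text and the variable names are only illustrative; the calls themselves are standard re and built-in functions.

```python
import re

pattern = re.compile(r"\((.*?)\)")
text = "(hello)(bye)"  # illustrative stand-in for the content of test1.txt

# findall returns the captured group of every match as plain strings
found = pattern.findall(text)

# finditer yields match objects; call .group(1) to get the captured text
from_iter = [m.group(1) for m in pattern.finditer(text)]

# the module-level function accepts the compiled pattern and gives the same result
via_module = re.findall(pattern, text)

# enumerate yields (index, item) tuples, not the bare item
pairs = list(enumerate(["first line\n", "second line\n"]))

print(found)
print(pairs[0])
```

This is why passing the loop variable from enumerate(open(...)) straight to a regex call fails: the regex functions expect a string, not a tuple.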
If you want to do it line by line (big file?), collect the matches as you go. Another option would be to load the entire file and create a multiline regex.
matches = []
for line in open("test2.txt"):
    matches.extend(pattern.findall(line))
#matches contains the list ['This actually works','Amazing!']
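The whole-file alternative mentioned above could be sketched roughly as follows. The helper name read_groups is hypothetical, and using re.DOTALL is an assumption about what "multiline" should mean here (it lets a group span a line break). The demo writes a throwaway file in place of the real test2.txt.

```python
import os
import re
import tempfile

# DOTALL lets "." cross line breaks, so a group split over two lines is still found
pattern = re.compile(r"\((.*?)\)", re.DOTALL)

def read_groups(path):
    """Hypothetical helper: return every parenthesised group in the file at path."""
    with open(path) as f:
        return pattern.findall(f.read())

# demo with a temporary file standing in for test2.txt
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test2.txt")
    with open(path, "w") as f:
        f.write("(This actually works)\n(Amazing!)\n")
    matches = read_groups(path)

print(matches)
```

Reading the whole file at once is simpler, but it keeps the full content in memory, which is the trade-off against the line-by-line loop.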
Then replace the content of the parentheses with the items in matches:
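for line in open("test1.txt"):
    for i, match in enumerate(pattern.findall(line)):
        # re.sub returns a new string, so keep the result; escape the match so it is treated literally
        line = re.sub(re.escape(match), matches[i], line, count=1)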
Note: doing this will raise an exception if there are more (string in parenthesis) groups in test1.txt than in test2.txt...
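A small sketch of that failure and one possible way around it, using in-memory stand-ins for the two files. Pairing the groups with zip and replacing with str.replace is a swapped-in technique, not what the answer's code does.

```python
import re

pattern = re.compile(r"\((.*?)\)")
matches = ["This actually works", "Amazing!"]  # stands in for the groups from test2.txt
line = "(hello)(bye)(pie)"                     # one group more than test2.txt provides

# Indexing by position fails as soon as the replacements run out.
try:
    for i, match in enumerate(pattern.findall(line)):
        replacement = matches[i]
    raised = False
except IndexError:
    raised = True

# zip stops at the shorter sequence, so surplus groups are simply left alone.
result = line
for old, new in zip(pattern.findall(line), matches):
    result = result.replace("(" + old + ")", "(" + new + ")", 1)

print(raised)
print(result)
```

Including the parentheses in the search string keeps the replacement from touching the same word where it appears outside a group.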
If you want to write an output file, you should do:
with open('fileout.txt', 'w') as outfile:
    for line in open("test1.txt"):
        # apply each replacement in turn so that every group on the line is updated
        for i, match in enumerate(pattern.findall(line)):
            line = re.sub(re.escape(match), matches[i], line, count=1)
        outfile.write(line)
You can use the feature of re.sub() that allows a callable as the replacement, and create an on-the-spot lambda function that goes through your matches from test2.txt to achieve your result, e.g.:
import re
# slightly changed to use lookahead and lookbehind groups for a proper match/substitution
pattern = re.compile(r"(?<=\()(.*?)(?=\))")
# you can also use r"(\(.*?\))" if you're preserving the brackets
with open("test2.txt", "r") as f:  # open test2.txt for reading
    words = pattern.findall(f.read())  # grabs all the found words in test2.txt
with open("test1.txt", "r+") as f:  # open test1.txt for reading and writing
    # read the content of test1.txt and replace each match with the next `words` list value
    content = pattern.sub(lambda x: words.pop(0) if words else x.group(), f.read())
    f.seek(0)  # rewind the file to the beginning
    f.write(content)  # write the new, 'updated' content
    f.truncate()  # truncate the rest of the file (if any)
For test1.txt containing:
(hello)(bye)
and test2.txt containing:
(This actually works) (Amazing!)
executing the above script will change test1.txt to:
(This actually works)(Amazing!)
It will also account for mismatches between the files by replacing only as many groups as were found in test2.txt (e.g. if your test1.txt contained (hello)(bye)(pie), it will be changed to (This actually works)(Amazing!)(pie)).
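The same idea can be tried without touching any files. The sketch below is a variation on the script above, not the script itself: replace_groups is a hypothetical helper, and it walks an iterator with next() instead of calling words.pop(0).

```python
import re

pattern = re.compile(r"(?<=\()(.*?)(?=\))")

def replace_groups(target, source):
    """Hypothetical helper: swap the groups in target for the groups in source, in order."""
    words = iter(pattern.findall(source))
    # next(words, default) falls back to the match's own text once the replacements run out
    return pattern.sub(lambda m: next(words, m.group()), target)

source = "(This actually works)\n(Amazing!)"
same_count = replace_groups("(hello)(bye)", source)
extra_group = replace_groups("(hello)(bye)(pie)", source)

print(same_count)
print(extra_group)
```

Keeping the replacement logic in a function that takes strings makes it easy to check the behaviour before pointing it at real files.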