python regex - How to replace groups in a txt file with groups from another txt file?

Question

So, I have the following txt files:

test1.txt (It's all on the same line.)

(hello)(bye)

test2.txt (It's on two different lines.)

(This actually works)
(Amazing!)

And I have the following regex pattern:

\((.*?)\)

Which obviously selects all the words that are inside the parentheses.

What I want to do is replace the words inside the () in test1.txt with the words inside the () in test2.txt, leaving test1.txt like:

(This actually works)(Amazing!)

I tried the following code, but it doesn't seem to work. What did I do wrong?

import re

pattern = re.compile("\((.*?)\)")

for line in enumerate(open("test1.txt")):
    match = re.finditer(pattern, line)

for line in enumerate(open("test2.txt")):
    pattern.sub(match, line)

I think I made a very big mistake; this is one of my first programs in Python.

Answer 1

Okay, there are several problems:

finditer returns an iterator of match objects, not strings. findall returns a list of the matched group strings.
Your code does the opposite of what you described. You want to replace the data in test1 with the data from test2, don't you?
enumerate yields tuples, so your variable line was not a line but a (line_number, line_string_content) pair. I use it in the last code block.
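
To see the first and third points in isolation, here is a small sketch that works on in-memory strings rather than the files. The sample text is the content of test1.txt from the question; the variable names are only illustrative.

```python
import re

pattern = re.compile(r"\((.*?)\)")
text = "(hello)(bye)"

# finditer yields match objects; call .group(1) to get the captured text
found_iter = [m.group(1) for m in pattern.finditer(text)]

# findall returns the captured strings directly
found_all = pattern.findall(text)

# enumerate yields (index, item) tuples, so unpack them in the loop header
numbered = [(number, line) for number, line in enumerate(["first\n", "second\n"])]
```

Both found_iter and found_all end up holding the same strings; the difference is only in what each call hands back.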

So you can try to first catch the content:

import re

# raw string so the backslashes reach the regex engine unchanged
pattern = re.compile(r"\((.*?)\)")
for line in open("test2.txt"):
    match = pattern.findall(line)
# match contains the list ['Amazing!'] from the last line of test2; your variable match is overwritten on each line of the file...

note: If you compile your pattern, you can use it as an object and call the re methods on it directly.

If you want to do it line by line (for a big file, say), collect the matches from every line as in the code below.
Another option would be to load the entire file and match against its whole content at once.

matches = []
for line in open("test2.txt"):
    matches.extend(pattern.findall(line))
#matches contains the list ['This actually works','Amazing!']

Then replace the content of the parentheses with the items from matches:

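for line in open("test1.txt"):
    for i, match in enumerate(pattern.findall(line)):
        # strings are immutable, so keep the returned string; the count of 1 replaces only the first occurrence
        line = line.replace("(" + match + ")", "(" + matches[i] + ")", 1)
    print(line, end="")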

note: doing this will raise an exception (IndexError) if there are more parenthesized strings in test1.txt than in test2.txt...
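
One way to avoid that exception is sketched below, assuming that groups without a counterpart in test2.txt should simply be left unchanged. The name replace_groups is a hypothetical helper introduced for illustration; it is not part of the code above.

```python
import re

pattern = re.compile(r"\((.*?)\)")


def replace_groups(line, replacements):
    """Replace each parenthesized group in line with the next replacement."""
    remaining = iter(replacements)

    def substitute(m):
        # next() with a default keeps the original text once replacements run out
        new_text = next(remaining, None)
        return m.group(0) if new_text is None else "(" + new_text + ")"

    return pattern.sub(substitute, line)
```

Called as replace_groups("(hello)(bye)(pie)", matches), it would replace the first two groups and leave (pie) as it is.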

If you want to write an output file, you should do:

with open('fileout.txt', 'w') as outfile:
    for line in open("test1.txt"):
        # apply each replacement in turn, carrying the result forward
        for i, match in enumerate(pattern.findall(line)):
            line = line.replace("(" + match + ")", "(" + matches[i] + ")", 1)
        outfile.write(line)

Answer 2

You can use the fact that re.sub() accepts a callable as the replacement and create an on-the-spot lambda function that goes through your matches from test2.txt to achieve your result, e.g.:

import re

# slightly changed to use lookahead and lookbehind groups for a proper match/substitution
pattern = re.compile(r"(?<=\()(.*?)(?=\))")
# you can also use r"(\(.*?\))" if you're preserving the brackets

with open("test2.txt", "r") as f:  # open test2.txt for reading
    words = pattern.findall(f.read())  # grabs all the found words in test2.txt

with open("test1.txt", "r+") as f:  # open test1.txt for reading and writing
    # read the content of test1.txt and replace each match with the next `words` list value
    content = pattern.sub(lambda x: words.pop(0) if words else x.group(), f.read())
    f.seek(0)  # rewind the file to the beginning
    f.write(content)  # write the new, 'updated' content
    f.truncate()  # truncate the rest of the file (if any)

For test1.txt containing:

(hello)(bye)

and test2.txt containing:

(This actually works)
(Amazing!)

executing the above script will change test1.txt to:

(This actually works)(Amazing!)

It will also account for mismatches between the files by replacing only as many matches as were found in test2.txt (e.g. if your test1.txt contained (hello)(bye)(pie), it will be changed to (This actually works)(Amazing!)(pie)).
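
As a side note on why the pattern was switched to lookbehind and lookahead: with the question's original pattern the parentheses are part of the match, so a substitution would swallow them. A small sketch of the difference, using a placeholder replacement "X" that is purely illustrative:

```python
import re

text = "(hello)(bye)"

# Plain capture group: the whole match, parentheses included, gets replaced
with_group = re.sub(r"\((.*?)\)", "X", text)

# Lookbehind/lookahead: the parentheses are asserted but not consumed, so they survive
with_lookaround = re.sub(r"(?<=\()(.*?)(?=\))", "X", text)
```

Here with_group comes out as "XX", while with_lookaround comes out as "(X)(X)".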
