python regex - How do I replace a group from a txt file to another group from another txt file?

So, I have the following txt files:

test1.txt (It's all on the same line.)

(hello)(bye)

test2.txt (It's on two different lines.)

(This actually works)
(Amazing!)

And I have the following regex pattern:

\((.*?)\)

Which obviously selects all the words that are inside the parentheses.
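As a quick check of what the pattern captures, here is a minimal sketch. The sample string is an assumption that simply mirrors the contents of test1.txt.

```python
import re

pattern = re.compile(r"\((.*?)\)")

# Sample text mirroring the contents of test1.txt (an assumption for illustration)
sample = "(hello)(bye)"

# findall returns only the captured group, i.e. the text between the parentheses
groups = pattern.findall(sample)
print(groups)  # ['hello', 'bye']
```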

What I want to do is replace the words inside the () in test1.txt with the words inside the () in test2.txt, leaving test1.txt like:

(This actually works)(Amazing!)

I tried the following code, but it doesn't seem to work. What did I do wrong?

import re

pattern = re.compile("\((.*?)\)")

for line in enumerate(open("test1.txt")):
    match = re.finditer(pattern, line)

for line in enumerate(open("test2.txt")):
    pattern.sub(match, line)

I think I made a very big error; it's one of my first programs in Python.

Okay, there are several problems:

  1. finditer returns an iterator of match objects, not strings. findall returns a list of the matched group strings.
  2. You do the opposite of what you described. You want to replace the data in test1 with data from test2, don't you?
  3. enumerate yields tuples, so your variable line was not a line but a (line_number, line_string_content) pair. I use it in the last code block.
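The first and third points can be seen in a small sketch. The sample strings below are assumptions used in place of the real files.

```python
import re

pattern = re.compile(r"\((.*?)\)")
line = "(hello)(bye)"

# finditer yields match objects; call .group(1) to get the captured text
match_objects = list(pattern.finditer(line))
captured = [m.group(1) for m in match_objects]

# findall returns the captured strings directly
found = pattern.findall(line)

# enumerate yields (index, item) pairs, so unpack them in the loop header
pairs = list(enumerate(["first line\n", "second line\n"]))
print(captured, found, pairs[0])
```

In a loop this means writing `for number, line in enumerate(f):` rather than `for line in enumerate(f):`.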

So you can try to first catch the content:

import re

pattern = re.compile(r"\((.*?)\)")
for line in open("test2.txt"):
    match = pattern.findall(line)
# match contains the list ['Amazing!'] from the last line of test2; the variable match is overwritten on each line of the file...

note: If you compile your pattern, you can call the re methods directly on the compiled pattern object.
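A minimal sketch of that note, assuming a sample string in place of a line read from test2.txt: the method on the compiled object and the module-level function give the same result.

```python
import re

text = "(This actually works)"
compiled = re.compile(r"\((.*?)\)")

# Method called on the compiled pattern object
via_object = compiled.findall(text)

# Module-level function that takes the pattern string as its first argument
via_module = re.findall(r"\((.*?)\)", text)

print(via_object == via_module)  # True
```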

You can do it line by line (useful for a big file), as in the following code.
Another option would be to load the entire file and create a multiline regex.

matches = []
for line in open("test2.txt"):
    matches.extend(pattern.findall(line))
#matches contains the list ['This actually works','Amazing!']
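The whole-file alternative mentioned above could look roughly like this. The string `content` is an assumption standing in for `open("test2.txt").read()`, so that the sketch runs without the file.

```python
import re

pattern = re.compile(r"\((.*?)\)")

# Stand-in for open("test2.txt").read(); the string holds both lines at once
content = "(This actually works)\n(Amazing!)\n"

# One findall call over the whole content collects every group in order
matches = pattern.findall(content)
print(matches)  # ['This actually works', 'Amazing!']
```

By default `.` does not match a newline, so a group that spans two lines would only be found if the pattern were compiled with `re.DOTALL`.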

Then replace the content of the parentheses with the items in matches:

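for line in open("test1.txt"):
    parts = pattern.split(line)  # [text, group, text, group, ...]
    for i, match in enumerate(parts[1::2]):
        parts[2 * i + 1] = "(" + matches[i] + ")"  # swap in the item from test2
    newline = "".join(parts)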

note: doing this will raise an exception if there are more strings in parentheses in test1.txt than in test2.txt...
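One way to avoid that exception, sketched under the assumption that groups without a counterpart should be left unchanged. The helper name `replace_groups` is hypothetical and not part of the original answer.

```python
import re

pattern = re.compile(r"\((.*?)\)")


def replace_groups(line, replacements):
    """Replace each parenthesized group in line with the next replacement, if any."""
    # split() on a pattern with one group returns [text, group, text, group, ...]
    parts = pattern.split(line)
    for i in range(1, len(parts), 2):
        k = i // 2
        # fall back to the original group once replacements has run out
        new_text = replacements[k] if k < len(replacements) else parts[i]
        parts[i] = "(" + new_text + ")"
    return "".join(parts)


result = replace_groups("(hello)(bye)(pie)", ["This actually works", "Amazing!"])
print(result)  # (This actually works)(Amazing!)(pie)
```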

If you want to write an output file, you should do:

with open('fileout.txt', 'w') as outfile:
    for line in open("test1.txt"):
        # another way to write the same task (in one line!)
        # split() returns [text, group, text, group, ...]; odd indexes hold the old groups
        newline = "".join("(" + matches[i // 2] + ")" if i % 2 else part for i, part in enumerate(pattern.split(line)))
        outfile.write(newline)

You can use the fact that re.sub() accepts a callable as the replacement, and create an on-the-spot lambda function that goes through your matches from test2.txt to achieve your result, e.g.:

import re

# slightly changed to use lookahead and lookbehind groups for a proper match/substitution
pattern = re.compile(r"(?<=\()(.*?)(?=\))")
# you can also use r"(\(.*?\))" if you're preserving the brackets

with open("test2.txt", "r") as f:  # open test2.txt for reading
    words = pattern.findall(f.read())  # grabs all the found words in test2.txt

with open("test1.txt", "r+") as f:  # open test1.txt for reading and writing
    # read the content of test1.txt and replace each match with the next `words` list value
    content = pattern.sub(lambda x: words.pop(0) if words else x.group(), f.read())
    f.seek(0)  # rewind the file to the beginning
    f.write(content)  # write the new, 'updated' content
    f.truncate()  # truncate the rest of the file (if any)

For test1.txt containing:

(hello)(bye)

and test2.txt containing:

(This actually works)
(Amazing!)

executing the above script will change test1.txt to:

(This actually works)(Amazing!)

It will also account for mismatches between the files by replacing only as many matches as were found in test2.txt (e.g. if your test1.txt contained (hello)(bye)(pie), it will be changed to (This actually works)(Amazing!)(pie)).
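The same callable-replacement idea can be tried on plain strings, without touching any files. This is only a sketch: the function name `swap_groups` is hypothetical, and it uses an iterator with `next(words, default)` in place of `words.pop(0)`.

```python
import re

# Lookbehind/lookahead keep the parentheses out of the match itself
pattern = re.compile(r"(?<=\()(.*?)(?=\))")


def swap_groups(target, source):
    """Replace the groups in target with the groups found in source, in order."""
    words = iter(pattern.findall(source))
    # next(words, default) returns the original match once the words run out
    return pattern.sub(lambda m: next(words, m.group()), target)


print(swap_groups("(hello)(bye)(pie)", "(This actually works)\n(Amazing!)"))
```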
