Python - replace multiple matches in a string with different replacements

I have two text files and want to replace the XXX placeholders in the first file with the actual matches from the second file, in the order given in the second file.

The first file contains multiple lines of text, with multiple placeholders per line.

The European Union consists of the following states XXX, XXX, XXX, XXX, XXX, .... The three biggest nations within the European Union are XXX, XXX, XXX.

The second file is a list with one match per line:

Poland
Netherlands
Denmark
Spain
Italy
Germany
France

I'd like to have it replaced as follows:

The European Union consists of the following states Poland, Netherlands, Denmark, Spain, Italy, .... The three biggest nations within the European Union are Germany, France, XXX.

So far I have this code:

import re
file1 = open("text.txt")

file2 = open("countries.txt") 
output = open("output.txt", "w")
countrylist = []

for line in file2:
    countrylist.append(line.strip())

j=0
for line in file1:
    if "XXX" in line:
        line = re.sub("XXX", countrylist[j], line)
        j=j+1
    output.write(line)
    output.flush()
output.close()

My problem is that the regular expression replacement is applied not only to the first occurrence but to every match in the line. So my output looks like this right now:

The European Union consists of the following states Poland, Poland, Poland, Poland, Poland, .... The three biggest nations within the European Union are Netherlands, Netherlands, Netherlands.

How can I match every single occurrence of XXX to one line of my country list?

Thanks for any help!

The .sub(replacement, string[, count=0]) method in the re module takes a count argument: count=1 should replace only the first occurrence.
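A minimal sketch of the count-based approach, assuming the country names are already in a list (the shortened sample line here is made up for illustration). Each call to re.sub with count=1 consumes exactly one placeholder, so looping over the list fills them in order:

```python
import re

countries = ["Poland", "Netherlands", "Denmark"]
line = "States: XXX, XXX, XXX"

# count=1 limits each re.sub call to the first remaining match,
# so every country fills exactly one placeholder, left to right.
for country in countries:
    line = re.sub("XXX", country, line, count=1)

print(line)
```

Note that this rescans the line from the start for every replacement, and it would misbehave if a replacement value itself contained XXX.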

You can pass a function to sub; it is called once for each match that sub finds:

import re

countries = ['Poland', 'Netherlands', 'Denmark', 'Spain', 'Italy']

# The default argument is evaluated once, so every call shares one iterator.
def f(match, countriesIter=iter(countries)):
    return next(countriesIter)

line = "The European Union consists of the following states XXX, XXX, XXX, XXX, XXX"

print(re.compile('XXX').sub(f, line))

This will print:

The European Union consists of the following states Poland, Netherlands, Denmark, Spain, Italy

Depending on your experience, it might be easier to use a global counter to step through the list of country names:

count = 0
def f(match):
  global count
  result = countries[count]
  count += 1
  return result

This is less elegant, but much easier to understand if you have no deeper experience with Python internals, generators, and so on.
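As a rough sketch of how the function-replacement idea could cover the whole text from the question, the version below (Python 3) wraps the iterator in a helper. The name fill_placeholders and its parameters are hypothetical, and the text is given inline instead of being read from text.txt and countries.txt. Once the list runs out, the remaining placeholders are left as XXX, which matches the desired output in the question:

```python
import re

def fill_placeholders(text, replacements, placeholder="XXX"):
    """Replace each placeholder with the next replacement, in order.

    Placeholders left over after the replacements run out stay unchanged.
    """
    remaining = iter(replacements)

    def next_replacement(match):
        # Fall back to the matched text itself once the iterator is exhausted.
        return next(remaining, match.group(0))

    return re.sub(re.escape(placeholder), next_replacement, text)


text = (
    "The European Union consists of the following states XXX, XXX, XXX, XXX, XXX.\n"
    "The three biggest nations within the European Union are XXX, XXX, XXX.\n"
)
countries = ["Poland", "Netherlands", "Denmark", "Spain", "Italy", "Germany", "France"]

print(fill_placeholders(text, countries))
```

Running the substitution over the whole text in one call, instead of line by line, keeps a single position in the country list across line boundaries.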
