Python search and replace using a list

Question

I have several lines of a file that I'm looping through and have stored as strings. I want to perform a simple search and replace on each line, using either the built-in string method str.replace() or the regular-expression function re.sub(), but with a list as the argument for the old substring. I know the format usually goes as follows:

string.replace('oldsubstring','newsubstring')

However, if I have a list of strings, ['word1', 'word2', 'word3'], is it possible to use this as the oldsubstring argument, so that if any element of the list is found in string, that element is replaced with newsubstring? I know this is possible using a double nested for loop that loops through all of my lines and my list of strings, but I'm looking for a more efficient algorithm to accomplish this.

Follow-up question:

Another problem I have found is that there are times when my list of strings will look like:

['word1', 'word1_suffix', 'word2', 'word3']

NOTE: The order of these elements is not guaranteed to be the same on each run.

When using the double nested for loop method, suppose word1_suffix appears in the line I'm currently looking at. If word1 happens to come first in my list of strings, the result will be newsubstring_suffix, rather than the entire substring word1_suffix being replaced with newsubstring.
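The ordering problem described above can be reproduced with a small sketch. The helper name replace_naive and the sample line are illustrative assumptions, not part of the original question; the inner loop is the list half of the double nested loop, applied to a single line.

```python
def replace_naive(line, old_words, new):
    # Hypothetical helper: apply str.replace() once per list element, in list order.
    for old in old_words:
        line = line.replace(old, new)
    return line


line = "hello word1_suffix world word3"

# 'word1' comes first, so it consumes the front of 'word1_suffix'.
prefix_first = replace_naive(line, ['word1', 'word1_suffix', 'word2', 'word3'], 'newsubstring')
print(prefix_first)  # hello newsubstring_suffix world newsubstring

# 'word1_suffix' comes first, so the whole substring is replaced.
suffix_first = replace_naive(line, ['word1_suffix', 'word1', 'word2', 'word3'], 'newsubstring')
print(suffix_first)  # hello newsubstring world newsubstring
```

The result depends only on which of the two overlapping entries is tried first, which is why a list with unstable ordering gives unstable output.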

NOTE: I know that with a regular expression I can require word1_suffix to be its own full word surrounded by spaces, but there are times when I do want a part of my line of the form word1_miscellaneous to be replaced with newsubstring_miscellaneous, so that method will not entirely solve my problem.

Answer 1

With re.sub you can rely on the greedy matching of regex quantifiers to make sure word1_suffix isn't replaced by newsubstring_suffix: \w+ always consumes the whole word, and only then is the word looked up in the set.

import re

your_string = "hello word1_suffix world word3"

word_list = ['word1', 'word1_suffix', 'word2', 'word3']
word_set = set(word_list)

# pattern to match all 'words' (succession of letters, digits and _):
word_pattern = re.compile(r'\w+')
print(re.sub(word_pattern, lambda x: "newsubstring" if x.group() in word_set else x.group(), your_string))

The lambda function checks whether the matched group is in word_set and, if so, replaces it with newsubstring; otherwise the match is left unchanged.

Output:

hello newsubstring world newsubstring
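Note that the \w+ approach above only replaces whole words, so word1_miscellaneous would be left untouched because that exact word is not in word_set. A different technique that may fit the follow-up requirement is to build one alternation pattern from the list, sorted longest first, so that word1_suffix is tried before word1 regardless of the list's original order. This is a minimal sketch, not the answerer's method; replace_any is a hypothetical helper name and the sample line is made up for illustration.

```python
import re


def replace_any(line, old_words, new):
    # Hypothetical helper: replace any element of old_words found in line.
    # Sort longest first so 'word1_suffix' is tried before its prefix 'word1'.
    alternatives = sorted(set(old_words), key=len, reverse=True)
    if not alternatives:
        return line  # an empty pattern would match at every position
    # re.escape() keeps regex metacharacters in the words literal.
    pattern = re.compile("|".join(re.escape(w) for w in alternatives))
    # A function replacement avoids backslash processing in `new`.
    return pattern.sub(lambda match: new, line)


words = ['word1', 'word1_suffix', 'word2', 'word3']
result = replace_any("hello word1_suffix world word3 word1_miscellaneous", words, "newsubstring")
print(result)  # hello newsubstring world newsubstring newsubstring_miscellaneous
```

When many lines are processed, compiling the pattern once outside the loop over lines and reusing it means each line is scanned in a single pass instead of once per list element.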

Solution 1
0 2021-12-09 07:22:25
