
Python Search and Replace Using a List for Search

I am looping through several lines of a file, each stored as a string, and I want to perform a simple search and replace on each line, using either the built-in string method str.replace() or the regular-expression function re.sub(), but with a list as the argument for the old substring. I know the usual form is:

string.replace('oldsubstring','newsubstring')

However, if I have a list of strings, ['word1', 'word2', 'word3'], is it possible to use it as the oldsubstring argument, so that any element of the list found in string is replaced with newsubstring? I know this can be done with a nested for loop over all of my lines and my list of strings, but I'm looking for a more efficient way to accomplish it.

Follow-up question:

Another problem I have found is that my list of strings sometimes looks like this:

['word1', 'word1_suffix', 'word2', 'word3']

NOTE: The order of these elements is not guaranteed to be the same on each run.

With the nested for loop method, if word1_suffix appears in the current line and word1 happens to come before it in my list of strings, the result is newsubstring_suffix instead of the entire substring word1_suffix being replaced with newsubstring.
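To make the ordering problem concrete, here is a minimal reproduction. The helper name naive_replace and the sample line are illustrative assumptions, not code from the original post; the loop simply calls str.replace() once per search word, in list order.

```python
def naive_replace(line, words, new):
    # Replace each search word in list order, one str.replace() call per word.
    for word in words:
        line = line.replace(word, new)
    return line

line = "hello word1_suffix world word3"

# 'word1' comes first, so it consumes the prefix of 'word1_suffix'.
print(naive_replace(line, ['word1', 'word1_suffix', 'word2', 'word3'], 'newsubstring'))
# hello newsubstring_suffix world newsubstring

# 'word1_suffix' comes first, so the whole token is replaced.
print(naive_replace(line, ['word1_suffix', 'word1', 'word2', 'word3'], 'newsubstring'))
# hello newsubstring world newsubstring
```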

NOTE: I know that a regular expression can ensure that word1_suffix is matched only as its own full word surrounded by spaces, but there are times when I do want part of a line of the form word1_miscellaneous to be replaced with newsubstring_miscellaneous, so that method will not entirely solve my problem.

With re.sub you can rely on the greedy matching of regex quantifiers to make sure word1_suffix isn't replaced by newsubstring_suffix:

import re

your_string = "hello word1_suffix world word3"

word_list = ['word1', 'word1_suffix', 'word2', 'word3']
word_set = set(word_list)  # a set gives fast membership tests

# pattern to match all 'words' (runs of letters, digits and _):
word_pattern = re.compile(r'\w+')
print(word_pattern.sub(lambda x: "newsubstring" if x.group() in word_set else x.group(), your_string))

The lambda function checks whether the matched text is in word_set and, if so, replaces it with newsubstring; otherwise the match is left unchanged.

Output:

hello newsubstring world newsubstring
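Because \w+ matches whole tokens, the approach above leaves word1_miscellaneous untouched. For the case in the question where that prefix should still be replaced, one possible alternative is a single alternation pattern with the search strings sorted longest first, so that word1_suffix is tried before word1. This is only a sketch: the function name replace_any is hypothetical, and it assumes plain substring matching is what is wanted.

```python
import re

def replace_any(line, words, new):
    # An empty alternation would match at every position, so bail out early.
    if not words:
        return line
    # Sort longest first so 'word1_suffix' is tried before 'word1';
    # regex alternation takes the first alternative that matches.
    ordered = sorted(words, key=len, reverse=True)
    # re.escape() makes each search string match literally.
    pattern = re.compile("|".join(re.escape(w) for w in ordered))
    # A function replacement inserts `new` verbatim (no backslash processing).
    return pattern.sub(lambda m: new, line)

words = ['word1', 'word1_suffix', 'word2', 'word3']
print(replace_any("hello word1_suffix world word3", words, "newsubstring"))
# hello newsubstring world newsubstring
print(replace_any("see word1_miscellaneous here", words, "newsubstring"))
# see newsubstring_miscellaneous here
```

When processing many lines, it would likely be better to build and compile the pattern once outside the loop and reuse it for every line, rather than rebuilding it per call as this sketch does.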
