Python Search and Replace Using a List for Search
I have several lines of a file that I'm looping through and have stored as strings, and I'm looking to perform a simple search and replace in each line using either the method built into Python strings, str.replace(), or regular expressions with re.sub(), but using a list as the argument for the old substring. I know the format usually goes as follows:
string.replace('oldsubstring','newsubstring')
However, if I have a list of strings, ['word1', 'word2', 'word3'], is it possible to use this as the oldsubstring argument so that if any of the elements in the list are found in string, that element is replaced with newsubstring? I know this is possible using a double nested for loop that loops through all of my lines and my list of strings, but I'm looking for a more efficient algorithm to accomplish this.
Follow Up Question:
Another problem I have found is that there are times where my list of strings will look like:
['word1', 'word1_suffix', 'word2', 'word3']
NOTE: Order of these elements is not guaranteed to be the same each run.
When using the double nested for loop method, if word1_suffix appears in the current line I'm looking at, and I then loop through my list of strings, if word1 happens to appear in my list of strings first, the replacement will be newsubstring_suffix rather than replacing the entire substring word1_suffix with newsubstring.
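The order dependence described above can be reproduced with a minimal sketch. The helper name replace_in_order is hypothetical and only stands in for the inner loop of the nested-loop approach:

```python
def replace_in_order(line, words, new):
    # Naive approach: one str.replace() call per search word, in list order.
    for word in words:
        line = line.replace(word, new)
    return line

line = "hello word1_suffix world"
short_first = replace_in_order(line, ['word1', 'word1_suffix'], 'newsubstring')
long_first = replace_in_order(line, ['word1_suffix', 'word1'], 'newsubstring')
print(short_first)  # hello newsubstring_suffix world
print(long_first)   # hello newsubstring world
```

The two calls differ only in list order, yet they produce different results, which is why the unordered list is a problem for this method.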
NOTE: I know that using a regular expression I can ensure that word1_suffix is its own full word surrounded by spaces, but there are times where I do want a part of my line that follows the format word1_miscellaneous to be replaced with newsubstring_miscellaneous, so that method will not entirely solve my problem.
With re.sub you can use the greedy matching of regex to make sure word1_suffix isn't replaced by newsubstring_suffix:
import re

your_string = "hello word1_suffix world word3"
word_list = ['word1', 'word1_suffix', 'word2', 'word3']
# a set gives constant-time membership tests
word_set = set(word_list)
# pattern to match all 'words' (succession of letters, digits and _):
word_pattern = re.compile(r'\w+')
print(word_pattern.sub(lambda x: "newsubstring" if x.group() in word_set else x.group(), your_string))
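The lambda function checks whether the matched group is in word_set and, if so, replaces it with newsubstring; otherwise the match is left unchanged. Because \w+ is greedy, word1_suffix is matched as a single token, so word1 never gets a chance to match inside it.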
Output:
hello newsubstring world newsubstring
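Note that the whole-word lookup leaves a token such as word1_miscellaneous untouched, because that token is not itself in the set. For the prefix case raised in the question, one possible variant (a sketch, not part of the answer above) is to build a single alternation pattern with the longest search strings first. The function name replace_any is hypothetical:

```python
import re

def replace_any(line, words, new):
    if not words:
        # An empty alternation would match everywhere, so bail out early.
        return line
    # Longest words first, so 'word1_suffix' is tried before 'word1'.
    ordered = sorted(set(words), key=len, reverse=True)
    # re.escape() keeps any regex metacharacters in the words literal.
    pattern = re.compile("|".join(re.escape(w) for w in ordered))
    return pattern.sub(new, line)

words = ['word1', 'word1_suffix', 'word2', 'word3']
result = replace_any("word1_suffix word1_miscellaneous word3", words, "newsubstring")
print(result)  # newsubstring newsubstring_miscellaneous newsubstring
```

Sorting by length makes the result independent of the original list order, and each line is scanned once rather than once per search word. In a loop over many lines, the pattern would be better compiled once outside the function.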