Matching and replacing using regular expressions

Question

There is a list of strings A, each of which partially matches an entry in another list of strings B. I want to replace each string in A with its matching string from B using a regular expression. However, I am not getting the correct result.

The result should be A == ["Yogesh","Numita","Hero","Yogesh"].

import re

A = ["yogeshgovindan","TNumita","Herohonda","Yogeshkumar"]
B=["Yogesh","Numita","Hero"]

for i in A:
    for j in B:
        replaced=re.sub('i','j',i)
        
print(replaced)

Answer 1

This one works for me:

lst=[]
for a in A:
    lst.append([b for b in B if b.lower() in a.lower()][0])

For each element of A, this returns the element of B that is contained in it. The comparison has to be done on lowercased strings. The [0] is added to get a string rather than the list produced by the list comprehension.
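Note that the [0] raises an IndexError when no entry of B is found in an item of A. A minimal sketch of one way to handle that, assuming None is an acceptable placeholder for unmatched items (the helper name first_match and the extra name "hrithikroshan" are only illustrative):

```python
A = ["yogeshgovindan", "TNumita", "Herohonda", "Yogeshkumar", "hrithikroshan"]
B = ["Yogesh", "Numita", "Hero"]

def first_match(a, terms):
    """Return the first term contained in `a` (case-insensitive), else None."""
    a_lower = a.lower()
    # next() with a default avoids the IndexError that [0] would raise
    return next((t for t in terms if t.lower() in a_lower), None)

lst = [first_match(a, B) for a in A]
print(lst)
```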

Answer 2

If looping over B, you don't need a regular expression; you can simply use membership testing.

A regex might result in better performance, as membership testing will scan each string in A for every string in B, resulting in O(len(A) * len(B)) performance.

As long as the individual terms don't contain any metacharacters and can appear in any context, the simplest way to form the regex is to join the entries of B with the alternation operator:

reTerms = re.compile('|'.join(B), re.I)

However, to be safe, the entries should first be escaped, in case any of them contains a metacharacter:

# map-based
reTerms = re.compile('|'.join(map(re.escape, B)), re.I)
# comprehension-based
reTerms = re.compile('|'.join([re.escape(b) for b in B]), re.I)
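To illustrate why the escaping matters, here is a small sketch using a made-up term "a.b" (not part of the original lists). Unescaped, the dot matches any character, so the pattern also matches text that does not contain the literal term:

```python
import re

term = "a.b"  # hypothetical term containing the metacharacter '.'

unescaped = re.compile(term, re.I)
escaped = re.compile(re.escape(term), re.I)

# Unescaped, '.' matches any character, so "axb" is (wrongly) accepted
print(bool(unescaped.search("axb")))  # True
# Escaped, only a literal dot is accepted
print(bool(escaped.search("axb")))    # False
print(bool(escaped.search("a.b")))    # True
```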

If there are any restrictions on the context the terms appear in, sub-patterns for those restrictions need to be prepended and appended to the pattern. For example, if the terms must appear as full words:

# raw f-string, so that \b is a word boundary rather than a backspace character
reTerms = re.compile(rf"\b(?:{'|'.join(map(re.escape, B))})\b", re.I)
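As a quick check of what the word-boundary restriction does, the sketch below applies a whole-word pattern to one name from the question and to a made-up sentence (the sentence is only an illustration). The pattern is written as a raw f-string so that \b is read as a word boundary; with this restriction, the names from the question no longer match, because the terms appear there as prefixes rather than as full words:

```python
import re

B = ["Yogesh", "Numita", "Hero"]
whole_word = re.compile(rf"\b(?:{'|'.join(map(re.escape, B))})\b", re.I)

# "yogesh" is followed by another word character, so there is no boundary
print(whole_word.search("yogeshgovindan"))          # None
# here "hero" stands alone as a word
print(whole_word.search("my hero honda").group(0))  # hero
```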

This regex can be applied to each item of A to get the matching text:

replaced = [reTerms.search(name).group(0) for name in A]
# result: ['yogesh', 'Numita', 'Hero', 'Yogesh']

Since the terms in the regex are straight string matches, the content will be correct, but the case may not be. This can be corrected with a normalization step, passing the matched text through a dict:

normed = {term.lower():term for term in B}

replaced = [normed[reTerms.search(name).group(0).lower()] for name in A]
# result: ['Yogesh', 'Numita', 'Hero', 'Yogesh']

One issue remains: what if an item of A doesn't match? Then reTerms.search returns None, which doesn't have a group attribute. If None-propagating attribute access is ever added to Python (such as suggested by PEP 505), this could easily be addressed like so:

names = ["yogeshgovindan","TNumita","Herohonda","Yogeshkumar", "hrithikroshan"]
normed[None] = None
# hypothetical `?.` syntax from PEP 505; not valid in current Python
replaced = [normed[reTerms.search(name)?.group(0).lower()] for name in names]

In the absence of such a feature, there are various approaches, such as using a ternary expression and walrus assignment. In the sample below, a list is used as a stand-in to provide a default value for the match:

import re

names = ["yogeshgovindan","TNumita","Herohonda","Yogeshkumar", "hrithikroshan"]
terms = ["Yogesh","Numita","Hero"]
normed = {term.lower():term for term in terms}
normed[''] = None

reTerms = re.compile('|'.join(map(re.escape, terms)), re.I)

# index may need to be changed if `reTerms` includes any context
replaced = [normed[(reTerms.search(name) or [''])[0].lower()] for name in names]
# result: ['Yogesh', 'Numita', 'Hero', 'Yogesh', None]
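The ternary-plus-walrus approach mentioned above could look roughly like the following. This is a sketch that assumes Python 3.8 or later (for the := operator) and that None is the desired value for unmatched names:

```python
import re

names = ["yogeshgovindan", "TNumita", "Herohonda", "Yogeshkumar", "hrithikroshan"]
terms = ["Yogesh", "Numita", "Hero"]
normed = {term.lower(): term for term in terms}

reTerms = re.compile('|'.join(map(re.escape, terms)), re.I)

# The condition is evaluated first, so `m` is bound before it is used
replaced = [
    normed[m.group(0).lower()] if (m := reTerms.search(name)) else None
    for name in names
]
print(replaced)
```

Compared with the list stand-in, this version does not need an extra '' entry in normed.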

2 answers

Answer 1
1 Accepted 2022-05-17 16:55:39

Answer 2
1 2022-09-21 08:40:35
