Remove everything after a particular substring using re.sub
I thought this would have been simple, but after 3 hours of trying multiple different re.sub combinations, the answer is still eluding me.
I have the following string:
a = "99999 Anywhere Dr., Roanoak, VA 88888, ,"
I'd like to remove everything after the 88888 up to the end of the string (note there could be characters other than space and comma there, but there won't be another run of 5 digits after the 88888). I tried many combinations, but the closest I got to what I was trying to accomplish was:
re.sub('(?=>\d{5})(.*)\".*$','',a)
This results in "99999" since it doesn't look from the end of the string but instead deletes everything after the first occurrence of the 5 digits. I want the result to be:
"99999 Anywhere Dr., Roanoak, VA 88888"
Thank you
Rather than re.sub, I'd recommend re.search + reassignment:
m = re.search('.*\d{5}', text)
if m:
text = m.group(0)
print(text)
'99999 Anywhere Dr., Roanoak, VA 88888'
.* # greedy capture
\d{5} # 5 digits
If you want to get inventive, you can reverse your string and then call re.sub, so you look from the start.
text = re.sub('^.*?(?=\d{5})', '', text[::-1])[::-1]
print(text)
'99999 Anywhere Dr., Roanoak, VA 88888'
Reversing the string lets you use a lookahead, which simplifies things.
^ # start of line
.*? # non-greedy capture
(?= # lookahead
\d{5} # 5 digits
)
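If the reversal feels indirect, the same idea also works forward with a single re.sub and a greedy capture group (this variant is my own sketch, not taken from the answer above): the greedy `.*` pushes `\d{5}` onto the *last* run of 5 digits, and the backreference keeps everything up to and including it.

```python
import re

a = "99999 Anywhere Dr., Roanoak, VA 88888, ,"

# Greedy .* forces \d{5} to match the LAST 5-digit run;
# group 1 captures the prefix through that run, and the
# trailing .*$ discards whatever follows it.
result = re.sub(r'^(.*\d{5}).*$', r'\1', a)
print(result)  # 99999 Anywhere Dr., Roanoak, VA 88888
```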
Using re.match:
>>> import re
>>> a = "99999 Anywhere Dr., Roanoak, VA 88888, ,"
>>> re.match(r'^.*\d{5}', a).group(0)
'99999 Anywhere Dr., Roanoak, VA 88888'
or re.search:
>>> re.search(r'^.*\d{5}', a).group(0)
'99999 Anywhere Dr., Roanoak, VA 88888'
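Since the OP noted the tail can contain characters other than spaces and commas, here is a quick check (my own, using the greedy-search idea from the answers above) that the pattern anchors on the last 5-digit run regardless of what follows it:

```python
import re

# Greedy .* makes \d{5} land on the last 5-digit run, so
# arbitrary trailing characters are trimmed, not just ", ,".
for s in ["99999 Anywhere Dr., Roanoak, VA 88888, ,",
          "99999 Anywhere Dr., Roanoak, VA 88888 ; x/y"]:
    m = re.search(r'.*\d{5}', s)
    print(m.group(0) if m else s)
# both lines print: 99999 Anywhere Dr., Roanoak, VA 88888
```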