Remove everything after a particular substring using re.sub
I thought this would have been simple, but after 3 hours of trying multiple different re.sub combinations, the answer is still eluding me.
I have the following string:
a = "99999 Anywhere Dr., Roanoak, VA 88888, ,"
I'd like to remove everything after the 88888 up to the end of the string (note there could be characters other than space and comma there, but there won't be another run of 5 digits after the 88888). I tried many combinations, but the closest I got to what I was trying to accomplish was:
re.sub('(?=>\d{5})(.*)\".*$','',a)
This results in "99999" since it doesn't look from the end of the string but instead deletes everything after the first occurrence of the 5 digits. I want the result to be:
"99999 Anywhere Dr., Roanoak, VA 88888"
Thank you
Rather than re.sub, I'd recommend re.search + reassignment:
m = re.search('.*\d{5}', text)
if m:
text = m.group(0)
print(text)
'99999 Anywhere Dr., Roanoak, VA 88888'
.* # greedy capture
\d{5} # 5 digits
If you want to get inventive, you can reverse your string and then call re.sub, so you look from the start.
text = re.sub('^.*?(?=\d{5})', '', text[::-1])[::-1]
print(text)
'99999 Anywhere Dr., Roanoak, VA 88888'
Reversing the string lets you use a lookahead, which simplifies things.
^ # start of line
.*? # non-greedy capture
(?= # lookahead
\d{5} # 5 digits
)
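If the reversal feels indirect, the same idea also works forward with a single re.sub and a greedy capture group (this variant is my own sketch, not taken from the answer above): the greedy `.*` pushes `\d{5}` onto the *last* run of 5 digits, and the backreference keeps everything up to and including it.

```python
import re

a = "99999 Anywhere Dr., Roanoak, VA 88888, ,"

# Greedy .* forces \d{5} to match the LAST 5-digit run;
# group 1 captures the prefix through that run, and the
# trailing .*$ discards whatever follows it.
result = re.sub(r'^(.*\d{5}).*$', r'\1', a)
print(result)  # 99999 Anywhere Dr., Roanoak, VA 88888
```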
Using re.match:
>>> import re
>>> a = "99999 Anywhere Dr., Roanoak, VA 88888, ,"
>>> re.match(r'^.*\d{5}', a).group(0)
'99999 Anywhere Dr., Roanoak, VA 88888'
or re.search:
>>> re.search(r'^.*\d{5}', a).group(0)
'99999 Anywhere Dr., Roanoak, VA 88888'
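Since the OP noted the tail can contain characters other than spaces and commas, here is a quick check (my own, using the greedy-search idea from the answers above) that the pattern anchors on the last 5-digit run regardless of what follows it:

```python
import re

# Greedy .* makes \d{5} land on the last 5-digit run, so
# arbitrary trailing characters are trimmed, not just ", ,".
for s in ["99999 Anywhere Dr., Roanoak, VA 88888, ,",
          "99999 Anywhere Dr., Roanoak, VA 88888 ; x/y"]:
    m = re.search(r'.*\d{5}', s)
    print(m.group(0) if m else s)
# both lines print: 99999 Anywhere Dr., Roanoak, VA 88888
```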