Regex deleting word and finding word
I would like to find and delete the https part in a sentence. I use re.search("^https://t.co/.*[a-zA-Z]", data) and the result is:
match='https://xx.x/ekGSeJufuH 7 jalan indonesia yang pa
match='https://xx.x/okbymT3g'
But I just want to take match='https://xx.x/ekGSeJufuH' and delete it while keeping the rest of the sentence. Is there something wrong with my regex?
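.* matches any character, including whitespace, and because it is greedy the match continues to the last letter in the string instead of stopping at the end of the URL.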
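A minimal sketch of that behaviour, using one of the sample strings from the question. The variable names are only illustrative, and the pattern uses the anonymized host xx.x from the sample output rather than t.co:

```python
import re

# Sample string from the question (host anonymized as xx.x).
data = "https://xx.x/ekGSeJufuH 7 jalan indonesia yang pa"

# Greedy: .* runs to the end of the line, then backtracks only as far as
# needed for [a-zA-Z] to match, so the match ends at the last letter.
greedy = re.search(r"^https://xx\.x/.*[a-zA-Z]", data)

# \S* stops at the first whitespace character, which is where the URL ends.
url_only = re.search(r"^https://xx\.x/\S*", data)

print(greedy.group(0))
print(url_only.group(0))
```

Here the first pattern swallows the whole sentence, while the second stops right after the URL.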
An easier way is to match only non-whitespace characters after the prefix.
I think it works because a URL doesn't allow any whitespace inside.
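Assuming the idea is to stop at the first whitespace character, one possible way to delete the URL and keep the rest is re.sub with \S+. The helper name strip_url is hypothetical, not something from the question:

```python
import re

def strip_url(text):
    """Remove one leading https:// URL and the whitespace that follows it."""
    # \S+ consumes the URL up to the first whitespace; \s* eats the gap after it.
    return re.sub(r"^https://\S+\s*", "", text, count=1)

print(strip_url("https://xx.x/ekGSeJufuH 7 jalan indonesia yang pa"))
print(repr(strip_url("https://xx.x/okbymT3g")))
```

A string that contains only the URL comes back empty, which may or may not be what you want.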
From what I understand, you just want to exclude the "https://" from the string. If so, this may be the regular expression you're looking for:
r"https://(.*)"
Using the above regular expression with the addresses you provided:
>>> import re
>>> regex = re.compile(r"https://(.*)")
>>> regex.search("https://xx.x/ekGSeJufuH 7 jalan indonesia yang pa").group(1)
'xx.x/ekGSeJufuH 7 jalan indonesia yang pa'
>>> regex.search("https://xx.x/okbymT3g").group(1)
'xx.x/okbymT3g'
If there are more criteria for the regular expression that I missed, just comment on my answer and I'll update the regular expression accordingly.
I tweaked it a little and solved it with:
re.search(r"^https://t.co/\S*", txt)
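For the "delete while keeping the rest" part, a rough sketch of how the match object could be used to split the string. The names url and rest are only illustrative, and the host is the anonymized xx.x from the sample output:

```python
import re

txt = "https://xx.x/ekGSeJufuH 7 jalan indonesia yang pa"

m = re.search(r"^https://xx\.x/\S*", txt)
if m:
    url = m.group(0)               # the matched URL
    rest = txt[m.end():].lstrip()  # everything after the URL
else:
    url, rest = None, txt          # no URL at the start; keep the text as-is

print(url)
print(rest)
```

Slicing at m.end() keeps the remainder untouched, so nothing after the URL is lost.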