Regex to delete a word and find a word

Question

I would like to find and delete the https part of a sentence. I use re.search("^https://t.co/.*[a-zA-Z]", data) and the result is:

match='https://xx.x/ekGSeJufuH 7 jalan indonesia yang pa

match='https://xx.x/okbymT3g'

But I want to take just match='https://xx.x/ekGSeJufuH and delete it while keeping the rest of the text. Is there something wrong with my regex?

Answer 1

.* matches any character (except a newline, by default), including spaces, so the match runs past the end of the URL.
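To make the difference concrete, here is a minimal sketch comparing the pattern from the question with a variant that uses \S* instead. The sample string is made up for illustration (it is not the asker's data), and the dot in t.co is escaped here, which the original pattern did not do.

```python
import re

# Hypothetical sample text; the URL and the words after it are made up.
data = "https://t.co/abc123 some trailing words"

# `.*` is greedy and also matches spaces, so the match extends
# to the last letter in the string.
greedy = re.search(r"^https://t\.co/.*[a-zA-Z]", data)
print(greedy.group(0))  # https://t.co/abc123 some trailing words

# `\S*` matches only non-whitespace, so it stops at the first space.
non_space = re.search(r"^https://t\.co/\S*", data)
print(non_space.group(0))  # https://t.co/abc123
```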

An easier way is to:

find a sentence starting with 'https://',
find the first whitespace character (' ') in the sentence,
delete the substring before that whitespace.

I think this works because a URL cannot contain any whitespace.
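The steps above could be sketched without a regex at all, roughly as follows. The helper name strip_leading_url and the sample strings are hypothetical, and the sketch assumes the URL sits at the very start of the text.

```python
def strip_leading_url(text: str) -> str:
    """Drop a leading https:// URL and return the rest (hypothetical helper)."""
    if not text.startswith("https://"):
        return text  # nothing to remove
    # Split once at the first run of whitespace; the URL is the first piece.
    parts = text.split(maxsplit=1)
    return parts[1] if len(parts) > 1 else ""


print(strip_leading_url("https://t.co/abc123 some trailing words"))
```

If the text consists of the URL alone, the helper returns an empty string; text that does not start with https:// is returned unchanged.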

Answer 2

From what I understand, you just want to exclude the "https://" from the string. If so, this may be the regular expression you're looking for:

r"https://(.*)"

Using the above regular expression with the addresses you provided:

>>> import re
>>> regex = re.compile(r"https://(.*)")
>>> regex.search("https://xx.x/ekGSeJufuH 7 jalan indonesia yang pa").group(1)
'xx.x/ekGSeJufuH 7 jalan indonesia yang pa'
>>> regex.search("https://xx.x/okbymT3g").group(1)
'xx.x/okbymT3g'

If there are more criteria for the regular expression that I missed, just comment on my answer and I'll update it accordingly.

Answer 3

I tweaked it a little and solved it with:

re.search(r"^https://t\.co/\S*", txt)
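Since the question asks to delete the URL and keep the rest, the same idea can be applied with re.sub rather than re.search. This is only a sketch: the name remove_url and the sample strings are assumptions, and the trailing \s* is added here so the space after the URL is removed as well.

```python
import re

# Leading t.co URL plus any whitespace that follows it.
URL_PREFIX = re.compile(r"^https://t\.co/\S*\s*")


def remove_url(txt: str) -> str:
    """Return txt with a leading t.co URL removed (hypothetical helper)."""
    return URL_PREFIX.sub("", txt)


print(remove_url("https://t.co/abc123 some trailing words"))
```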

3 answers

Answer 1
1 2021-03-23 03:11:15

Answer 2
1 2021-03-23 03:15:46

Answer 3
-1 2021-03-23 03:17:17
