re.findall -> RegEx in Python
import regex
frase = "text https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one other text https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr"
x = regex.findall(r"/((http[s]?:\/\/)?(www\.)?(gamivo\.com\S*){1})", frase)
print(x)
Result:
[('www.gamivo.com/product/sea-of-thieves-pc-xbox-one', '', 'www.', 'gamivo.com/product/sea-of-thieves-pc-xbox-one'), ('www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr', '', 'www.', 'gamivo.com/product/fifa-21-origin-eng-pl-cz-tr')]
I want something like:
[('https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one', 'https://gamivo.com/product/fifa-21-origin-eng-pl-cz-tr')]
How can I do this?
You need to:
- Remove the initial / char, which invalidates the match of https:// or http:// since / appears after http
- Remove the unnecessary capturing groups (or make them non-capturing, so that re.findall returns whole matches rather than tuples of groups) and the {1} quantifier
See this Python demo:
import re
frase = "text https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one other text https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr"
print( re.findall(r"(?:https?://)?(?:www\.)?gamivo\.com\S*", frase) )
# => ['https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one', 'https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr']
See the regex demo, too. Also, see the related "re.findall behaves weird" post.
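The effect of capturing versus non-capturing groups on re.findall can be sketched as follows. This is a minimal illustration, not part of the original demo: the variable names are made up here, and the re.finditer variant is an alternative shown for comparison.

```python
import re

text = "text https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one other text https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr"

# With capturing groups, findall returns one tuple of group values per match.
with_groups = re.findall(r"(https?://)?(www\.)?(gamivo\.com\S*)", text)

# With non-capturing groups, findall returns each whole match as a string.
whole_matches = re.findall(r"(?:https?://)?(?:www\.)?gamivo\.com\S*", text)

# Alternative: finditer yields match objects, and group(0) is always the
# full match no matter how many capturing groups the pattern contains.
via_finditer = [
    m.group(0)
    for m in re.finditer(r"(https?://)?(www\.)?(gamivo\.com\S*)", text)
]

print(with_groups)
print(whole_matches)
print(via_finditer)
```

If the individual parts (scheme, www prefix, path) are needed later, keeping the capturing groups and using re.finditer gives access to both the full match and the groups.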
Try this; it matches a string starting from http:// or https:// up to the next whitespace character (such as a space or newline).
import re
frase = "text https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one other text https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr"
x = re.findall(r'https?://\S*', frase)  # raw string, so \S is not treated as an invalid string escape
print(x)
# ['https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one', 'https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr']
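Note that this pattern is not restricted to gamivo.com, so it picks up every URL in the text. A small sketch of the difference, using a made-up sample string (the example.com URL is an assumption for illustration only):

```python
import re

# Hypothetical input containing one unrelated URL and one gamivo.com URL.
sample = "see https://example.com/page and https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr"

# Generic pattern: any http:// or https:// URL up to the next whitespace.
any_url = re.findall(r"https?://\S*", sample)

# Domain-specific pattern: only URLs that point at gamivo.com.
gamivo_only = re.findall(r"(?:https?://)?(?:www\.)?gamivo\.com\S*", sample)

print(any_url)
print(gamivo_only)
```

Which one fits depends on whether other domains can appear in the input.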