
re.findall -> RegEx in Python

import regex
frase = "text https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one other text https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr"
x = regex.findall(r"/((http[s]?:\/\/)?(www\.)?(gamivo\.com\S*){1})", frase) 
print(x)

Result:

[('www.gamivo.com/product/sea-of-thieves-pc-xbox-one', '', 'www.', 'gamivo.com/product/sea-of-thieves-pc-xbox-one'), ('www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr', '', 'www.', 'gamivo.com/product/fifa-21-origin-eng-pl-cz-tr')]

I want something like:

[('https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one', 'https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr')]

How can I do this?

You need to:

  1. Remove the initial / char, which stops http:// or https:// from being matched: the leading / has to consume a slash that comes after http, so the match can only start at www.
  2. Remove the unnecessary outer capturing group and the {1} quantifier.
  3. Convert the optional capturing groups into non-capturing ones, because when a pattern contains capturing groups, findall returns the captured groups instead of the whole match.
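The effect of steps 1 and 3 can be checked directly on the question's input. This is a small sketch using the standard re module; the variable names are illustrative, and the first pattern is a variant of the question's pattern with the groups already made non-capturing, so that only the leading / differs from the fixed version.

```python
import re

frase = "text https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one other text https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr"

# Step 1: the leading "/" must consume one of the slashes after "https:",
# so the optional scheme can never be part of the match.
with_slash = re.findall(r"/(?:https?://)?(?:www\.)?gamivo\.com\S*", frase)
print(with_slash)
# => ['/www.gamivo.com/product/sea-of-thieves-pc-xbox-one', '/www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr']

# Step 3: with capturing groups, findall returns only the captured groups.
grouped = re.findall(r"(https?://)?(www\.)?gamivo\.com\S*", frase)
print(grouped)
# => [('https://', 'www.'), ('https://', 'www.')]

# No capturing groups: findall returns each whole match as a string.
whole = re.findall(r"(?:https?://)?(?:www\.)?gamivo\.com\S*", frase)
print(whole)
# => ['https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one', 'https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr']
```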

See this Python demo:

import re
frase = "text https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one other text https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr"
print( re.findall(r"(?:https?://)?(?:www\.)?gamivo\.com\S*", frase) )
# => ['https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one', 'https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr']

See the regex demo, too. Also, see the related "re.findall behaves weird" post.
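If you would rather keep capturing groups (for example, to look at the scheme separately), one option not mentioned above is re.finditer. It yields match objects, and m.group(0) is always the full match no matter how many groups the pattern has. A minimal sketch, assuming the same input string:

```python
import re

frase = "text https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one other text https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr"

# Capturing groups are kept on purpose; match objects expose both the
# whole match (group 0) and each individual group.
pattern = re.compile(r"(https?://)?(www\.)?gamivo\.com\S*")

urls = []
schemes = []
for m in pattern.finditer(frase):
    urls.append(m.group(0))     # the entire matched URL
    schemes.append(m.group(1))  # the scheme, or None if that group did not match
print(urls)
print(schemes)
```

Unlike findall, which reports a non-participating group as an empty string, a match object returns None for it.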

Try this; it matches everything from http:// or https:// up to the next whitespace character (such as a space or newline).

import re
frase = "text https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one other text https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr"
# Raw string so "\s" is not treated as an (invalid) string escape;
# [^\s]* consumes everything up to the next whitespace character.
x = re.findall(r'https?://[^\s]*', frase)
print(x)
# ['https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one', 'https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr']
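Note that this pattern is not equivalent to the one in the first answer: it matches any http(s) URL, not only gamivo.com links, and it requires the scheme, whereas the first answer's pattern makes the scheme optional. A quick comparison on a made-up input (the example.com URL and the bare www. link are hypothetical, added only to show the difference):

```python
import re

# Hypothetical input: a gamivo URL, an unrelated URL, and a gamivo link without a scheme.
text = "see https://www.gamivo.com/product/x and http://example.com/page and www.gamivo.com/product/y"

# Any http(s) URL up to the next whitespace.
any_url = re.findall(r'https?://[^\s]*', text)
print(any_url)
# => ['https://www.gamivo.com/product/x', 'http://example.com/page']

# Only gamivo.com links, with or without a scheme.
gamivo_only = re.findall(r"(?:https?://)?(?:www\.)?gamivo\.com\S*", text)
print(gamivo_only)
# => ['https://www.gamivo.com/product/x', 'www.gamivo.com/product/y']
```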
