
Return "Error" if no match found by regex

I have a string:

link = "http://www.this_is_my_perfect_url.com/blah_blah/blah_blah?=trololo"

And I have a function which returns the domain name from that URL or, if it is not found, returns '':

def get_domain(url):
    domain_regex = re.compile(r"\:\/\/(.*?)\/|$")
    return re.findall(domain_regex, str(url))[0].replace('www.', '')

get_domain(link)

Returned result:

this_is_my_perfect_url.com

The |$ alternative makes the regex return '' if it matches nothing else.

Is there a way to implement the default value Error inside the regex, so I do not have to do any check inside the function?

So if link = "there_is_no_domain_in_here", then the function returns Error instead of ''.

As mentioned in the comments above, you cannot set anything in the regex to do that for you. You can, however, check whether the output returned by re.findall, after applying the extra formatting, is empty. If it is empty, no match was found, so return Error:

import re
link = "http://www.this_is_my_perfect_url.com/blah_blah/blah_blah?=trololo"

def get_domain(url):
    domain_regex = re.compile(r"\:\/\/(.*?)\/|$")

    #Get regex matches into a list after data massaging
    matches = re.findall(domain_regex, str(url))[0].replace('www.', '')

    #Return the match or Error if output is empty
    return matches or 'Error'

print(get_domain(link))
print(get_domain('there_is_no_domain_in_here'))

The output will be:

this_is_my_perfect_url.com
Error
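
To see why the `or 'Error'` fallback works, it may help to look at what re.findall actually returns in each case. This is only a small sketch: it reuses the pattern from the question (written as a raw string), while the helper name raw_matches and the http://example.com input are made up for illustration.

```python
import re

# Same pattern as in the question, written as a raw string
domain_regex = re.compile(r"\:\/\/(.*?)\/|$")

def raw_matches(url):
    # Hypothetical helper: return the unprocessed findall() result
    return domain_regex.findall(str(url))

print(raw_matches("http://www.this_is_my_perfect_url.com/blah_blah/blah_blah?=trololo"))
print(raw_matches("there_is_no_domain_in_here"))
print(raw_matches("http://example.com"))
```

The first element is the captured domain when a "://.../" section exists and the empty string otherwise. An empty string is falsy, so `matches or 'Error'` falls through to 'Error'. Note that a URL with no slash after the host, such as http://example.com, also produces only the empty match with this pattern.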

Just to put my two cents in: the lazy quantifier ( .*? ) in combination with an alternation ( |$ ) is very inefficient. You can vastly improve your expression to:

://[^/]+
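
As a rough sketch of how this pattern behaves with re.search (the function name find_authority and the http://example.com input are assumptions for illustration, not part of the question):

```python
import re

pattern = re.compile(r"://[^/]+")

def find_authority(text):
    # Hypothetical helper: return the matched "://host" text, or None
    m = pattern.search(text)
    return m.group(0) if m else None

print(find_authority("http://www.this_is_my_perfect_url.com/blah_blah/blah_blah?=trololo"))
print(find_authority("http://example.com"))
print(find_authority("there_is_no_domain_in_here"))
```

The match includes the leading "://", so it has to be sliced off (or the host wrapped in a capture group) before use. Without the |$ alternative, a failed search simply gives None, which is easy to test for.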

Additionally, as of Python 3.8 you could use the walrus operator, as in:

import re

def get_domain(your_string):
    if (m := re.search("://[^/]+", your_string)) is not None:
        return m.group(0)[3:]  # found something; drop the leading "://"
    else:
        return "Error"

And no, with regular expressions alone you cannot get something out of a string that is not there in the first place.

Why not use urlparse to get the domain?

# Python 2: from urlparse import urlparse
# Python 3:
from urllib.parse import urlparse


def get_domain(url):
    parsed_uri = urlparse(url)
    domain = parsed_uri.netloc
    return domain or "ERROR"

url = 'there_is_no_domain_in_here'
print(get_domain(url))
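
A possible variant of the same idea, sketched under a few assumptions: it uses the hostname attribute instead of netloc so that a port is not included, strips a leading "www." as the question's function does, and takes a default parameter that is invented here for illustration. The example.com URL is likewise only a placeholder.

```python
from urllib.parse import urlparse

def get_domain(url, default="Error"):
    # hostname is None when the string has no "//host" part
    host = urlparse(str(url)).hostname
    if not host:
        return default
    # Drop a leading "www." to mirror the question's replace('www.', '')
    return host[4:] if host.startswith("www.") else host

print(get_domain("http://www.this_is_my_perfect_url.com/blah_blah/blah_blah?=trololo"))
print(get_domain("https://example.com:8080/path"))
print(get_domain("there_is_no_domain_in_here"))
```

Keep in mind that urlparse only recognizes a network location when the string contains "//", so an input like "www.example.com/path" without a scheme would also fall back to the default.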
