
Writing a C function using regular expression that can validate URL, IPv4 address, IPv6 address and FQDN

The C function below does a good job of validating any combination of URL/FQDN, but it fails to validate IPv4 addresses, the shorthand notation of IPv6, and certain other IPv6 address formats.

Can the regex below be improved to validate IPv4 addresses and IPv6 addresses?

#include <regex.h>
#include <stdio.h>

int validateURLPhase2(char *url)
{
    int    status;
    regex_t    re;

    char *regexp = "^((ftp|http|https)://)?([a-z0-9]([-a-z0-9]*[a-z0-9])?\\.)|([0-9].[0-9].[0-9].[0-9])|(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))+((a[cdefgilmnoqrstuwxz]|aero|arpa)|(b[abdefghijmnorstvwyz]|biz)|(c[acdfghiklmnorsuvxyz]|cat|com|coop)|d[ejkmoz]|(e[ceghrstu]|edu)|f[ijkmor]|(g[abdefghilmnpqrstuwy]|gov)|h[kmnrtu]|(i[delmnoqrst]|info|int)|(j[emop]|jobs)|k[eghimnprwyz]|l[abcikrstuvy]|(m[acdghklmnopqrstuvwxyz]|mil|mobi|museum)|(n[acefgilopruz]|name|net)|(om|org)|(p[aefghklmnrstwy]|pro)|qa|r[eouw]|s[abcdeghijklmnortvyz]|(t[cdfghjklmnoprtvwz]|travel)|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw])$";

    if ( regcomp(&re, regexp, REG_EXTENDED|REG_NOSUB|REG_ICASE) != 0 )
    {
        printf( "Regex has invalidated FQDN 1\n");
        return -1;
    }
    status = regexec(&re, url, (size_t) 0, NULL, 0);
    regfree(&re);
    if ( status != 0 )
    {
        printf("Regex has invalidated FQDN 2\n");
        return -1;
    }
    return 0;
}

A valid URL that ideally should be accepted but was rejected: http://[2001::1]/abc (output: "Regex has invalidated FQDN 2", validation failed)

An invalid URL that ideally should be rejected but was accepted: http://10.192.1 (output: validation success)

Other cases that passed: http://10.2.1.1/abc and http://www.example.com/abc

The part of your regexp that matches numeric addresses only allows a single digit in each component. It also doesn't escape the dot (.), so each dot matches any character rather than a literal period. It should be:

([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3})
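To see the difference between the two fragments in isolation, here is a minimal sketch that compiles each one with POSIX regcomp and tries it on a sample string. The ^ and $ anchors, the constant names ORIGINAL_IPV4 and CORRECTED_IPV4, and the helper matches_ere are assumptions added for this illustration; they are not part of the question's code.

```c
#include <regex.h>
#include <stddef.h>

/* Fragment from the question, anchored with ^ and $ for this demo only. */
static const char ORIGINAL_IPV4[] = "^([0-9].[0-9].[0-9].[0-9])$";

/* Corrected fragment from the answer, anchored the same way. */
static const char CORRECTED_IPV4[] =
    "^([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3})$";

/*
 * Hypothetical helper: returns 1 if text matches the POSIX extended
 * regular expression in pattern, 0 if it does not, and -1 if the
 * pattern fails to compile.
 */
static int matches_ere(const char *pattern, const char *text)
{
    regex_t re;
    int status;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    status = regexec(&re, text, (size_t) 0, NULL, 0);
    regfree(&re);

    return status == 0 ? 1 : 0;
}
```

With these anchors, matches_ere(ORIGINAL_IPV4, "1a2b3c4") returns 1 because each unescaped dot accepts any character, while matches_ere(ORIGINAL_IPV4, "10.2.1.1") returns 0 because every component is limited to one digit. The corrected fragment accepts "10.2.1.1" and rejects both "1a2b3c4" and the three-component "10.192.1".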

Note that this will allow invalid IPs like 123.456.789.0. It just checks that each number is 1-3 digits, not that it's between 0 and 255.
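If the range check matters, one possible approach is to spell out the 0 to 255 range in the pattern itself. The following is a sketch under stated assumptions, not a definitive implementation: the OCTET macro and the function name is_ipv4_dotted_quad are made up for this example, and rejecting leading zeros (such as 01.2.3.4) is a design choice of the sketch rather than something the question requires.

```c
#include <regex.h>
#include <stddef.h>

/*
 * One decimal octet in the range 0..255, without leading zeros:
 * 250-255, 200-249, 100-199, or 0-99.
 */
#define OCTET "(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"

/*
 * Hypothetical helper: returns 1 if text is a dotted-quad IPv4 address
 * whose four components are each in 0..255, 0 if it is not, and -1 if
 * the pattern fails to compile.
 */
static int is_ipv4_dotted_quad(const char *text)
{
    /* Three "octet." groups followed by a final octet, fully anchored. */
    static const char pattern[] = "^(" OCTET "\\.){3}" OCTET "$";
    regex_t re;
    int status;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    status = regexec(&re, text, (size_t) 0, NULL, 0);
    regfree(&re);

    return status == 0 ? 1 : 0;
}
```

Under these assumptions, is_ipv4_dotted_quad("10.2.1.1") returns 1, while "123.456.789.0" and "10.192.1" both return 0. To use the same idea inside the larger URL pattern, the OCTET alternation would replace each [0-9]{1,3} component.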
