Regex to check for validity of what's after the .com

For years, I have used this regex in JavaScript as well as PHP to check for a valid domain name.

Original Version

/^((http|https):\/{2})([w]{3})([\.]{1})([a-zA-Z0-9-]{2,63})([\.]{1})((a[cdefgilmnoqrstuwxz]|aero|arpa)|(b[abdefghijmnorstvwyz]|biz)|(c[acdfghiklmnorsuvxyz]|cat|co.in|com|coop)|d[ejkmoz]|(e[ceghrstu]|edu)|f[ijkmor]|(g[abdefghilmnpqrstuwy]|gov)|h[kmnrtu]|(i[delmnoqrst]|info|int)|(j[emop]|jobs)|k[eghimnprwyz]|l[abcikrstuvy]|(m[acdghklmnopqrstuvwxyz]|mil|mobi|museum)|(n[acefgilopruz]|name|net)|(om|org)|(p[aefghklmnrstwy]|pro)|qa|r[eouw]|s[abcdeghijklmnortvyz]|(t[cdfghjklmnoprtvwz]|travel)|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw])$/i

Changed broken version

I added the last part so it could accept and validate what comes after the .com. But I found out that it somehow breaks the whole thing and anything gets in. How do I get this correct?

/^((http|https):\/{2})([w]{3})([\.]{1})([a-zA-Z0-9-]{2,63})([\.]{1})((a[cdefgilmnoqrstuwxz]|aero|arpa)|(b[abdefghijmnorstvwyz]|biz)|(c[acdfghiklmnorsuvxyz]|cat|co.in|com|coop)|d[ejkmoz]|(e[ceghrstu]|edu)|f[ijkmor]|(g[abdefghilmnpqrstuwy]|gov)|h[kmnrtu]|(i[delmnoqrst]|info|int)|(j[emop]|jobs)|k[eghimnprwyz]|l[abcikrstuvy]|(m[acdghklmnopqrstuvwxyz]|mil|mobi|museum)|(n[acefgilopruz]|name|net)|(om|org)|(p[aefghklmnrstwy]|pro)|qa|r[eouw]|s[abcdeghijklmnortvyz]|(t[cdfghjklmnoprtvwz]|travel)|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw])([-A-Za-z0-9+&@#\/%=~_|:.]{0,51})$/i

The original regex works fine. Only the last part I added seems to be causing problems: ([-A-Za-z0-9+&@#\/%=~_|:.]{0,51})

What I'm trying to do here is validate the part after the .com. For example, the part after the .com for this question is questions/20217720/regex-to-check-for-validity-of-whats-after-the-com. That's the part I'm trying to validate. But now the TLDs no longer validate.

Example: http://www.example.com should validate to true

http://www.example.com/ should also validate to true

http://www.example.com/mail should validate to true

http://www.example.comxx should validate to false

http://www.example.comxx/mail should validate to false

Does this fit your needs?

(\/[-A-Za-z0-9+&@#\/%=~_|:.]{0,50})?

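The whole group is optional, but if anything appears after the TLD then it requires a / to be the first character (reduced 51 to 50 to compensate). Without the required slash, the character class accepts letters directly after the TLD, so the xx in .comxx is consumed as if it were part of the path and the URL passes.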

The full regex:

/^((http|https):\/{2})([w]{3})([\.]{1})([a-zA-Z0-9-]{2,63})([\.]{1})((a[cdefgilmnoqrstuwxz]|aero|arpa)|(b[abdefghijmnorstvwyz]|biz)|(c[acdfghiklmnorsuvxyz]|cat|co.in|com|coop)|d[ejkmoz]|(e[ceghrstu]|edu)|f[ijkmor]|(g[abdefghilmnpqrstuwy]|gov)|h[kmnrtu]|(i[delmnoqrst]|info|int)|(j[emop]|jobs)|k[eghimnprwyz]|l[abcikrstuvy]|(m[acdghklmnopqrstuvwxyz]|mil|mobi|museum)|(n[acefgilopruz]|name|net)|(om|org)|(p[aefghklmnrstwy]|pro)|qa|r[eouw]|s[abcdeghijklmnortvyz]|(t[cdfghjklmnoprtvwz]|travel)|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw])(\/[-A-Za-z0-9+&@#\/%=~_|:.]{0,50})?$/i
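To see the difference between the two trailing groups, here is a minimal JavaScript sketch run against the example URLs from the question. It assumes a shortened TLD list (com, org, net, co.in) purely to keep the patterns readable; the names broken, fixed and samples are illustrative and not part of the original regex.

```javascript
// Assumption: a shortened TLD list stands in for the full alternation above.
// "broken" uses the trailing group from the question, "fixed" the one from this answer.
const broken = /^https?:\/\/www\.[a-z0-9-]{2,63}\.(?:com|org|net|co\.in)([-A-Za-z0-9+&@#\/%=~_|:.]{0,51})$/i;
const fixed = /^https?:\/\/www\.[a-z0-9-]{2,63}\.(?:com|org|net|co\.in)(\/[-A-Za-z0-9+&@#\/%=~_|:.]{0,50})?$/i;

const samples = [
  "http://www.example.com",
  "http://www.example.com/",
  "http://www.example.com/mail",
  "http://www.example.comxx",
  "http://www.example.comxx/mail",
];

// Print how each pattern treats each sample URL.
for (const url of samples) {
  console.log(url, "broken:", broken.test(url), "fixed:", fixed.test(url));
}
```

With the required leading slash, the last two URLs are rejected, whereas the broken pattern accepts all five.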


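For PHP, you could use parse_url as an alternative.

<?php
    $info = parse_url($url);

    // end() takes its argument by reference, so store the exploded host in a variable first
    $labels = explode('.', $info['host'] ?? '');

    // is .com domain
    if (end($labels) === "com") {
        // 'path' and 'query' are missing from the array when the URL has none
        $behinddotcom = ($info['path'] ?? '') . (isset($info['query']) ? '?' . $info['query'] : '');
    }
?>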

What comes after the TLD is a path/filename. Unless you have any special cases or rules to adhere to, there is no need to validate this.

If you just need to extract it, this is a simple matter. In JavaScript, for example, you would do:

window.location.pathname // returns "/questions/20217720/regex-to-check-for-validity-of-whats-after-the-com"
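When the URL is a plain string rather than the current page, the standard URL constructor can do the same split. The sketch below is one possible approach, not a definitive one; the helper name describeUrl and the shape of its return value are assumptions made for illustration.

```javascript
// Hypothetical helper: split a URL string into host, last host label (TLD), and path.
function describeUrl(urlString) {
  const url = new URL(urlString); // throws a TypeError if the string cannot be parsed
  const labels = url.hostname.split(".");
  return {
    host: url.hostname,
    tld: labels[labels.length - 1],
    path: url.pathname + url.search, // path plus the query string, if any
  };
}

const parts = describeUrl("http://www.example.com/questions/20217720/regex?x=1");
console.log(parts.tld);  // "com"
console.log(parts.path); // "/questions/20217720/regex?x=1"
```

The TLD can then be checked against whatever list you trust, while the path is left alone unless you have specific rules for it.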
