For years I have used this regex in JavaScript as well as PHP to check for a valid domain name.
Original Version
/^((http|https):\/{2})([w]{3})([\.]{1})([a-zA-Z0-9-]{2,63})([\.]{1})((a[cdefgilmnoqrstuwxz]|aero|arpa)|(b[abdefghijmnorstvwyz]|biz)|(c[acdfghiklmnorsuvxyz]|cat|co.in|com|coop)|d[ejkmoz]|(e[ceghrstu]|edu)|f[ijkmor]|(g[abdefghilmnpqrstuwy]|gov)|h[kmnrtu]|(i[delmnoqrst]|info|int)|(j[emop]|jobs)|k[eghimnprwyz]|l[abcikrstuvy]|(m[acdghklmnopqrstuvwxyz]|mil|mobi|museum)|(n[acefgilopruz]|name|net)|(om|org)|(p[aefghklmnrstwy]|pro)|qa|r[eouw]|s[abcdeghijklmnortvyz]|(t[cdfghjklmnoprtvwz]|travel)|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw])$/i
Changed broken version
I added the last part so it could accept and validate what comes after the .com. But I found out that it somehow breaks the whole thing and anything gets in. How do I get this correct?
/^((http|https):\/{2})([w]{3})([\.]{1})([a-zA-Z0-9-]{2,63})([\.]{1})((a[cdefgilmnoqrstuwxz]|aero|arpa)|(b[abdefghijmnorstvwyz]|biz)|(c[acdfghiklmnorsuvxyz]|cat|co.in|com|coop)|d[ejkmoz]|(e[ceghrstu]|edu)|f[ijkmor]|(g[abdefghilmnpqrstuwy]|gov)|h[kmnrtu]|(i[delmnoqrst]|info|int)|(j[emop]|jobs)|k[eghimnprwyz]|l[abcikrstuvy]|(m[acdghklmnopqrstuvwxyz]|mil|mobi|museum)|(n[acefgilopruz]|name|net)|(om|org)|(p[aefghklmnrstwy]|pro)|qa|r[eouw]|s[abcdeghijklmnortvyz]|(t[cdfghjklmnoprtvwz]|travel)|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw])([-A-Za-z0-9+&@#\/%=~_|:.]{0,51})$/i
The original regex works fine. Only the last part I added seems to be causing problems: ([-A-Za-z0-9+&@#\/%=~_|:.]{0,51})
What I'm trying to do here is validate the part after the .com. For example, the part after the .com for this question is questions/20217720/regex-to-check-for-validity-of-whats-after-the-com. That's the part I'm trying to validate. But now the TLDs no longer validate.
Example: http://www.example.com should validate to true
http://www.example.com/ should also validate to true
http://www.example.com/mail should validate to true
http://www.example.comxx should validate to false
http://www.example.comxx/mail should validate to false
Does this fit your needs?
(\/[-A-Za-z0-9+&@#\/%=~_|:.]{0,50})?
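The whole group is optional, but if anything appears after the TLD, a / must be its first character. Your version let the extra characters start with any letter, which is why comxx was accepted. I reduced the quantifier from 51 to 50 to compensate for the added /.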
The full regex:
/^((http|https):\/{2})([w]{3})([\.]{1})([a-zA-Z0-9-]{2,63})([\.]{1})((a[cdefgilmnoqrstuwxz]|aero|arpa)|(b[abdefghijmnorstvwyz]|biz)|(c[acdfghiklmnorsuvxyz]|cat|co.in|com|coop)|d[ejkmoz]|(e[ceghrstu]|edu)|f[ijkmor]|(g[abdefghilmnpqrstuwy]|gov)|h[kmnrtu]|(i[delmnoqrst]|info|int)|(j[emop]|jobs)|k[eghimnprwyz]|l[abcikrstuvy]|(m[acdghklmnopqrstuvwxyz]|mil|mobi|museum)|(n[acefgilopruz]|name|net)|(om|org)|(p[aefghklmnrstwy]|pro)|qa|r[eouw]|s[abcdeghijklmnortvyz]|(t[cdfghjklmnoprtvwz]|travel)|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw])(\/[-A-Za-z0-9+&@#\/%=~_|:.]{0,50})?$/i
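To see the difference between the two trailing groups, here is a minimal sketch that runs the five example URLs from the question against both variants. It assumes a shortened TLD list (com, org, net) purely for readability, and the names prefix, broken, fixed, samples and checkAll are made up for this illustration.

```javascript
// Shortened TLD list (an assumption for readability; the real pattern lists many more).
const tld = "(com|org|net)";
const prefix = "^(https?:\\/{2})(w{3})(\\.)([a-zA-Z0-9-]{2,63})(\\.)" + tld;

// Broken variant: the trailing class may start with any letter,
// so the "xx" in "comxx" is swallowed as if it were part of the path.
const broken = new RegExp(prefix + "([-A-Za-z0-9+&@#\\/%=~_|:.]{0,51})$", "i");

// Fixed variant: the optional group has to begin with "/".
const fixed = new RegExp(prefix + "(\\/[-A-Za-z0-9+&@#\\/%=~_|:.]{0,50})?$", "i");

const samples = [
  "http://www.example.com",
  "http://www.example.com/",
  "http://www.example.com/mail",
  "http://www.example.comxx",
  "http://www.example.comxx/mail",
];

// Returns one boolean per sample URL, in the same order as `samples`.
function checkAll(re) {
  return samples.map((url) => re.test(url));
}

console.log(checkAll(broken)); // [ true, true, true, true, true ]
console.log(checkAll(fixed));  // [ true, true, true, false, false ]
```

The same check should carry over to the full TLD alternation, since only the trailing group differs between the two patterns.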
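For PHP, you could use parse_url (see the PHP documentation) as an alternative.
<?php
$info = parse_url($url);
// end() takes its argument by reference, so store the exploded host
// in a variable instead of passing the result of explode() directly.
$labels = explode('.', $info['host'] ?? '');
// is .com domain
if (end($labels) === 'com') {
    // 'path' and 'query' are missing from the array when the URL has none
    $behinddotcom = $info['path'] ?? '';
    if (isset($info['query'])) {
        $behinddotcom .= '?' . $info['query'];
    }
}
?>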
What comes after the TLD is a path/filename. Unless you have special cases or rules to adhere to, there is no need to validate this.
If you just need to extract it, this is a simple matter. In JavaScript, for example, you would do
window.location.pathname // returns "/questions/20217720/regex-to-check-for-validity-of-whats-after-the-com"
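window.location only covers the page currently loaded. For an arbitrary URL string, a rough sketch using the standard URL class (available in browsers and in Node.js) could look like the following. The helper name afterDotCom is hypothetical, and the .com check simply mirrors the PHP answer above.

```javascript
// Hypothetical helper: returns path + query for .com hosts, otherwise null.
function afterDotCom(urlString) {
  let url;
  try {
    url = new URL(urlString); // throws a TypeError if the string cannot be parsed
  } catch (err) {
    return null;
  }
  // Compare only the last host label, so "comxx" is not mistaken for "com".
  const labels = url.hostname.split(".");
  if (labels[labels.length - 1] !== "com") {
    return null;
  }
  // pathname is at least "/", search is "" or starts with "?".
  return url.pathname + url.search;
}

console.log(afterDotCom("http://www.example.com/mail?x=1")); // "/mail?x=1"
console.log(afterDotCom("http://www.example.comxx/mail"));   // null
```

This only extracts the part after the host. It does not enforce the character or length limits of the regex.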