We have a list of URLs in this format (http://www.xyz.gov.ac.in). Not all of them look like this; some have normal domains. I am confused about how to get the domain name from a URL with three dots. The code we have works fine for domain names with two dots. Here is the code we have:
function get_domain($url)
{
    $pieces = parse_url($url);
    $domain = isset($pieces['host']) ? $pieces['host'] : '';
    if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
        return $regs['domain'];
    }
    return false;
}
echo get_domain($url);
How can we modify the above code to handle domains with three dots as well as the other types?
The echoed result should be in this format: xyz.gov.ac.in
Basically, you can't. At least not without a lookup table that has all "TLDs".
For example, in my country (The Netherlands) we have .nl and .co.nl. But www.gov.nl is a normal website (I'm trying to illustrate that you can't automatically say that gov. isn't a domain). And www.edu.nl doesn't exist.
Any standard regex that would try to parse them would tell you that the domain is www.gov.nl, while the domain is actually gov.nl. Same for edu.nl.
The only way you can accomplish what you want is by getting a list of all TLDs (and sub-TLDs) and using that to parse them.
I believe that Firefox and Chrome have such a list implemented (for coloring the domain name in the URL) and constantly keep it up-to-date. Maybe look in those sources?
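The lookup-table idea can be sketched roughly as follows. This is only an illustration: the function name get_registrable_domain and the KNOWN_SUFFIXES table are made up for this example, and the table is hand-written rather than loaded from a maintained list (the Public Suffix List maintained by Mozilla is one such list). The gov.ac.in entry in particular is an assumption, added only so that the output matches the format asked for in the question.

```php
<?php
// Hypothetical, hand-written suffix table for illustration only.
// A real table would be loaded from a maintained list of TLDs and sub-TLDs.
// 'gov.ac.in' is an assumed entry chosen to match the question's expected output.
const KNOWN_SUFFIXES = ['com', 'nl', 'co.nl', 'in', 'ac.in', 'gov.ac.in'];

function get_registrable_domain(string $url)
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // no host component (or a malformed URL)
    }

    $labels = explode('.', strtolower($host));
    $count = count($labels);

    // Try the longest candidate suffix first, dropping one leading label per step.
    for ($i = 0; $i < $count; $i++) {
        $suffix = implode('.', array_slice($labels, $i));
        if (in_array($suffix, KNOWN_SUFFIXES, true)) {
            if ($i === 0) {
                return false; // the host is itself a suffix, not a domain
            }
            // The domain is the suffix plus the single label in front of it.
            return $labels[$i - 1] . '.' . $suffix;
        }
    }
    return false; // no known suffix matched
}

echo get_registrable_domain('http://www.xyz.gov.ac.in'), "\n"; // xyz.gov.ac.in
echo get_registrable_domain('http://www.gov.nl'), "\n";        // gov.nl
```

Matching the longest suffix first is what lets gov.nl and example.co.nl both come out right: the result depends on what is in the table, not on how many dots the host has.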
Try this:
/(^[\w|-]+\.)(?P<domain>([\w|-]+\.)+(\w+))/i
Hope this helps.
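As a rough illustration of how this pattern behaves, here is a minimal sketch that applies it to the host part of a URL. The wrapper name strip_first_label is made up for this example. Because the pattern is anchored with ^, it has to be run against the host rather than the full URL, and it needs at least three labels to match, so a host like xyz.com yields no match.

```php
<?php
// Hypothetical wrapper (name invented for this example) around the regex above.
function strip_first_label(string $url)
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // no host component found
    }
    // The first group consumes the leading label; "domain" captures the rest.
    if (preg_match('/(^[\w|-]+\.)(?P<domain>([\w|-]+\.)+(\w+))/i', $host, $regs)) {
        return $regs['domain'];
    }
    return false;
}

echo strip_first_label('http://www.xyz.gov.ac.in'), "\n"; // xyz.gov.ac.in
var_dump(strip_first_label('http://xyz.com'));            // bool(false)
```

Note that this simply drops whatever the first label is, so it does not distinguish www from any other subdomain.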
You should be able to use this regex instead:
/(?P<domain>([a-z0-9][a-z0-9\-]{1,63}\.)+[a-z\.]{2,6})$/i