Regex for domains with three dots, e.g. “gov.ac.in”

We have a list of URLs in this format (http://www.xyz.gov.ac.in). Not all of them look like this; some of them have normal domains. I am confused about how to get the domain name from a three-dot URL. The code we have works fine for two-dot domain names. Here is the code we have:

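<?php
// Take the host part of the URL, then match a trailing "label.suffix".
function get_domain($url)
{
  $pieces = parse_url($url);
  $domain = isset($pieces['host']) ? $pieces['host'] : '';
  // One label (2-64 characters), a dot, then 2-6 letters or dots up to the end of the host.
  if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
    return $regs['domain'];
  }
  return false;
}

$url = 'http://www.xyz.gov.ac.in';
echo get_domain($url);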

How can we modify the above code to accommodate three-dot domains as well as the other types?

The echoed result should be in this format: xyz.gov.ac.in

Basically, you can't. At least not without a lookup table that has all "TLDs".

For example, in my country (the Netherlands) we have .nl and .co.nl. But www.gov.nl is a normal website (I'm trying to illustrate that you can't automatically assume that gov. isn't a domain). And www.edu.nl doesn't exist.

Any standard regex that tried to parse them would tell you that the domain is www.gov.nl, while the domain is actually gov.nl. The same goes for edu.nl.
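To make this concrete, here is the question's own get_domain(), copied unchanged, run on two hosts. The pattern keeps the first label that is followed by a dot and at most six letters or dots before the end of the host, so it returns www.gov.nl for the first URL and only gov.ac.in for the second; neither is the wanted result.

```php
<?php
// The question's get_domain(), copied as-is.
function get_domain($url)
{
    $pieces = parse_url($url);
    $domain = isset($pieces['host']) ? $pieces['host'] : '';
    if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
        return $regs['domain'];
    }
    return false;
}

echo get_domain('http://www.gov.nl'), "\n";        // www.gov.nl (wanted: gov.nl)
echo get_domain('http://www.xyz.gov.ac.in'), "\n"; // gov.ac.in (wanted: xyz.gov.ac.in)
```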

The only way you can accomplish what you want is by getting a list of all TLDs (and sub-TLDs) and using that to parse them.
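A minimal sketch of that lookup-table approach, in the question's PHP. The suffix array is a tiny hypothetical sample, not a real list, and the function name get_domain_with_suffixes is made up for illustration. It looks for the longest known suffix that still leaves at least one label in front of it, and returns that one label plus the suffix.

```php
<?php
// Hypothetical sample list; a real implementation needs a complete suffix list.
const KNOWN_SUFFIXES = ['in', 'ac.in', 'gov.ac.in', 'nl', 'co.nl', 'com'];

// Returns "<one label>.<longest known suffix>" or false if nothing matches.
function get_domain_with_suffixes(string $url, array $suffixes = KNOWN_SUFFIXES)
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false;
    }
    $labels = explode('.', strtolower($host));
    $count = count($labels);
    // Start with the longest candidate (everything except the first label)
    // and move towards shorter suffixes.
    for ($i = 1; $i < $count; $i++) {
        $suffix = implode('.', array_slice($labels, $i));
        if (in_array($suffix, $suffixes, true)) {
            // Keep exactly one label in front of the matched suffix.
            return $labels[$i - 1] . '.' . $suffix;
        }
    }
    return false; // no known suffix matched
}

echo get_domain_with_suffixes('http://www.xyz.gov.ac.in'), "\n"; // xyz.gov.ac.in
echo get_domain_with_suffixes('http://www.gov.nl'), "\n";        // gov.nl
```

Because gov.ac.in is in the list, xyz is kept as the registrable label; for www.gov.nl only nl matches, so gov.nl comes back, which is exactly the case a regex alone cannot tell apart.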

I believe that Firefox and Chrome have such a list implemented (for coloring the domain name in the URL) and constantly keep it up-to-date. Maybe look in those sources?
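The list these browsers rely on is the Public Suffix List maintained by Mozilla (publicsuffix.org), published as the text file public_suffix_list.dat. In that format each rule is on its own line, lines starting with // are comments, a line is only read up to the first whitespace, and rules can be wildcards (*.ck) or exceptions (!www.ck). The sketch below, with the hypothetical function name parse_suffix_list, turns such text into a lookup set; it simply skips wildcard and exception rules, which a complete implementation would have to handle.

```php
<?php
// Parse Public Suffix List formatted text into a set: rule => true.
// Wildcard ("*.") and exception ("!") rules are skipped in this sketch.
function parse_suffix_list(string $text): array
{
    $suffixes = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        // Skip blank lines and "//" comment lines.
        if ($line === '' || strpos($line, '//') === 0) {
            continue;
        }
        // A rule is only read up to the first whitespace.
        $rule = preg_split('/\s+/', $line)[0];
        if ($rule[0] === '*' || $rule[0] === '!') {
            continue; // wildcard or exception rule: not handled here
        }
        $suffixes[strtolower($rule)] = true;
    }
    return $suffixes;
}

$sample = "// sample rules\nin\nac.in\ngov.ac.in\n*.ck\n!www.ck\n";
$suffixes = parse_suffix_list($sample);
echo implode(',', array_keys($suffixes)), "\n"; // in,ac.in,gov.ac.in
```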

Try this:

/(^[\w-]+\.)(?P<domain>([\w-]+\.)+(\w+))/i

Hope this helps.
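A sketch of how that regex could be plugged into the question's function; the name get_domain_strip_first is hypothetical. The leading ^[\w-]+\. consumes the first label unconditionally, so it gives the wanted answer only when every host has exactly one throwaway label such as www.

```php
<?php
// Hypothetical wrapper around the regex from this answer.
function get_domain_strip_first(string $url)
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;
    }
    // (^[\w-]+\.) drops the first label; [\w-] means word characters or '-'
    // (a '|' inside [...] would just be a literal pipe character).
    if (preg_match('/(^[\w-]+\.)(?P<domain>([\w-]+\.)+(\w+))/i', $host, $m)) {
        return $m['domain'];
    }
    return false;
}

echo get_domain_strip_first('http://www.xyz.gov.ac.in'), "\n"; // xyz.gov.ac.in
echo get_domain_strip_first('http://xyz.gov.ac.in'), "\n";     // gov.ac.in
```

Without the www. the real label xyz is thrown away instead, and a two-label host such as example.com does not match at all (the function returns false).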

You should be able to use this regex instead:

/(?P<domain>([a-z0-9][a-z0-9\-]{1,63}\.)+[a-z\.]{2,6})$/i
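Because the repeated group lets the match start at the leftmost label, this pattern returns the whole host, www. included. A minimal sketch, assuming that a leading www. is the only prefix you want to drop (that assumption is not part of the answer), with the hypothetical function name get_domain_any_depth:

```php
<?php
// Hypothetical variant of the question's get_domain() using this answer's regex.
function get_domain_any_depth(string $url)
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;
    }
    // Assumption: only a leading "www." should be removed before matching.
    $host = preg_replace('/^www\./i', '', $host);
    // One or more "label." groups followed by a 2-6 character tail at the end.
    if (preg_match('/(?P<domain>([a-z0-9][a-z0-9\-]{1,63}\.)+[a-z\.]{2,6})$/i', $host, $regs)) {
        return $regs['domain'];
    }
    return false;
}

echo get_domain_any_depth('http://www.xyz.gov.ac.in'), "\n"; // xyz.gov.ac.in
echo get_domain_any_depth('http://www.example.com'), "\n";   // example.com
```

Any other subdomain (for example mail.xyz.gov.ac.in) is still returned whole, which is why the first answer points to a suffix list.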
