Regex for domains having three dots, e.g. “gov.ac.in”

Question

We have a list of URLs in this format (http://www.xyz.gov.ac.in). Not all of them look like this; some of them have normal domains. I am confused about how to get the domain name from a 3-dotted URL. The code we have works fine for 2-dotted domain names. Here is the code we have:

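function get_domain($url)
{
  // Take the host part of the URL, e.g. "www.xyz.gov.ac.in".
  $pieces = parse_url($url);
  $domain = isset($pieces['host']) ? $pieces['host'] : '';
  // Match one label plus a suffix of 2 to 6 letters or dots at the end of the host.
  if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
    return $regs['domain'];
  }
  return false;
}

echo get_domain($url);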

How can we modify the above code to handle 3-dotted domains as well as the other types?

The echoed result should be in this format: xyz.gov.ac.in

Answer 1

Basically, you can't. At least not without a lookup table that has all "TLDs".

For example, in my country (the Netherlands) we have .nl and .co.nl. But www.gov.nl is a normal website (I'm trying to illustrate that you can't automatically assume that gov. isn't a domain), and www.edu.nl doesn't exist.

Any standard regex that tried to parse them would tell you that the domain is www.gov.nl, while the domain is actually gov.nl. The same goes for edu.nl.

The only way you can accomplish what you want is by getting a list of all TLDs (and sub-TLDs) and using that to parse them.

I believe that Firefox and Chrome have such a list implemented (for coloring the domain name in the URL) and constantly keep it up-to-date. Maybe look in those sources?
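To make the lookup-table idea concrete, here is a minimal sketch. The list browsers use for this is, as far as I know, the Public Suffix List maintained by Mozilla. The $suffixes array below is a tiny hand-written sample for illustration only (it is not real list data, and treating "gov.ac.in" as a suffix is an assumption taken from the question's desired output), and get_registrable_domain() is a hypothetical name.

```php
<?php
// Hypothetical sample of suffixes; a real table would be loaded from a
// maintained source rather than typed in by hand.
$suffixes = ['com', 'nl', 'in', 'ac.in', 'gov.ac.in'];

function get_registrable_domain(string $url, array $suffixes)
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false;
    }

    $labels = explode('.', strtolower($host));
    $known  = array_flip($suffixes); // suffix => index, for fast isset() lookups

    // Start at index 1 so at least one label stays in front of the suffix.
    // Longer candidates are tried first, so "gov.ac.in" wins over "ac.in".
    for ($i = 1; $i < count($labels); $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (isset($known[$candidate])) {
            return $labels[$i - 1] . '.' . $candidate;
        }
    }
    return false; // no known suffix, or nothing in front of it
}

echo get_registrable_domain('http://www.xyz.gov.ac.in', $suffixes), "\n"; // xyz.gov.ac.in
echo get_registrable_domain('http://www.gov.nl', $suffixes), "\n";        // gov.nl
```

With this approach the regex disappears entirely; the quality of the result depends only on how complete and current the suffix table is.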

Answer 2

Try this:

/(^[\w-]+\.)(?P<domain>([\w-]+\.)+(\w+))/i

Hope this helps.
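A quick way to see what this pattern does is to run it against a few host names. The sketch below is only an illustration: strip_first_label() is a hypothetical name, the pattern is applied to the host (not the full URL), and the character class is written as [\w-] because a | inside a character class is a literal pipe rather than alternation.

```php
<?php
// Hypothetical helper: applies the pattern to a host name and returns
// everything after the first label, or false when there is no match.
function strip_first_label(string $host)
{
    $pattern = '/(^[\w-]+\.)(?P<domain>([\w-]+\.)+(\w+))/i';
    return preg_match($pattern, $host, $m) ? $m['domain'] : false;
}

var_dump(strip_first_label('www.xyz.gov.ac.in')); // string "xyz.gov.ac.in"
var_dump(strip_first_label('www.example.com'));   // string "example.com"
var_dump(strip_first_label('xyz.gov.ac.in'));     // string "gov.ac.in"
var_dump(strip_first_label('example.com'));       // bool(false)
```

So the pattern always drops the first label, whatever it is. That gives the wanted result when every host starts with "www.", but it cuts into the domain when the prefix is missing and fails to match hosts with only two labels.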

Answer 3

You should be able to use this regex instead:

/(?P<domain>([a-z0-9][a-z0-9\-]{1,63}\.)+[a-z\.]{2,6})$/i
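For reference, this is roughly how that pattern behaves when dropped into the question's function. It is a sketch, and get_domain_multi() is a hypothetical name used to keep it apart from the original get_domain().

```php
<?php
// Same structure as the question's get_domain(), with the label group
// allowed to repeat one or more times.
function get_domain_multi(string $url)
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;
    }
    $pattern = '/(?P<domain>([a-z0-9][a-z0-9\-]{1,63}\.)+[a-z\.]{2,6})$/i';
    return preg_match($pattern, $host, $regs) ? $regs['domain'] : false;
}

echo get_domain_multi('http://www.xyz.gov.ac.in'), "\n"; // www.xyz.gov.ac.in
echo get_domain_multi('http://xyz.gov.ac.in'), "\n";     // xyz.gov.ac.in
```

Because the match starts at the leftmost possible position and the repeated group takes every label it can, a leading "www." is kept in the result. If that is not wanted, one option (an assumption, not part of the answer) is to strip a leading "www." from the host before matching.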

solution1
1 2012-04-23 13:07:52

solution2
0 ACCEPTED 2012-04-23 12:03:04

solution3
0 2012-04-23 12:07:12
