How to get the subdomain of a URL using PHP?

I have some URLs like these:

1.  https://www.example.com/classname/method/arg      // {nothing}
2.  http://www.example.com/classname/method/arg       // {nothing}
3.  https://example.com/classname/method/arg          // {nothing}
4.  http://example.com/classname/method/arg           // {nothing}
5.  www.example.com/classname/method/arg              // {nothing}
6.  example.com/classname/method/arg                  // {nothing}
7.  sub.example.com/classname/method/arg              // sub
8.  www.sub.example.com/classname/method/arg          // sub
9.  http://sub.example.com/classname/method/arg       // sub
10. https://sub.example.com/classname/method/arg      // sub
11. http://www.sub.example.com/classname/method/arg   // sub
12. https://www.sub.example.com/classname/method/arg  // sub

// $url ^                                             // What I want ^

Now, as you can see, I want to get the subdomain of those URLs. How?


I have two approaches, but neither of them works for all of the URLs:

First (this only works for case 7):

echo array_shift((explode(".",$url)));

Second (a bit better):

$parsedUrl = parse_url($url);
$host = explode('.', $parsedUrl['host']);
echo $host[0];

Use parse_url():

$url = 'http://sub.example.com/classname/method/arg';

$parsedUrl = parse_url($url);

$host = explode('.', $parsedUrl['host']);

$subdomain = $host[0];
echo $subdomain;

For multiple subdomains, do it like this:

$url = 'http://en.sub.example.com/classname/method/arg';

$parsedUrl = parse_url($url);

$host = explode('.', $parsedUrl['host']);

$subdomains = array_slice($host, 0, count($host) - 2 );
print_r($subdomains);
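Neither snippet above handles the scheme-less inputs from the question (cases 5 to 8), because parse_url() only populates 'host' when a scheme is present, and the first snippet reports www as the subdomain. Below is a minimal sketch that covers the twelve sample URLs. It assumes the registrable domain is always two labels long (example.com), and the function name getSubdomain is hypothetical:

```php
<?php
// Hypothetical helper: returns the subdomain of $url, or '' if there is none.
// Assumes the registrable domain is exactly two labels (example.com), so it
// gives wrong answers for suffixes such as co.uk.
function getSubdomain(string $url): string
{
    // parse_url() only fills in 'host' when a scheme is present, so add one if missing.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return '';
    }

    $labels = explode('.', strtolower($host));

    // Treat a leading "www" as noise, as the question does.
    if ($labels[0] === 'www') {
        array_shift($labels);
    }

    // Everything before the last two labels is the subdomain.
    return implode('.', array_slice($labels, 0, -2));
}

echo getSubdomain('https://www.example.com/classname/method/arg'), PHP_EOL;   // prints an empty line
echo getSubdomain('www.sub.example.com/classname/method/arg'), PHP_EOL;       // sub
echo getSubdomain('http://en.sub.example.com/classname/method/arg'), PHP_EOL; // en.sub
```

Stripping www is a design choice taken from the question's expected output; drop that step if www should count as a subdomain.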

You're on the right track using explode(), but you should probably also use the parse_url() function to get the domain from the URL (see the PHP manual for docs). TL;DR: Give it a URL and get back an array of all the parts of the URL, broken up individually.

That being said, a bigger problem is how you distinguish between subdomain.somesite.com and somesite.co.uk: the first one clearly has a subdomain, but the second one does not. I'm afraid I have no smart solutions to offer for that other than comparing against a list of top-level domains.
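The comparison against a list could look roughly like the sketch below. The three-entry $suffixes default is purely illustrative and splitHost is a hypothetical name; a usable list would need to contain every multi-label suffix, not just co.uk.

```php
<?php
// Hypothetical helper: splits a host into [subdomain, registrable domain]
// by matching the longest known suffix. The default $suffixes is a tiny
// illustrative sample, not a complete list.
function splitHost(string $host, array $suffixes = ['com', 'uk', 'co.uk']): array
{
    $labels = explode('.', strtolower($host));
    $count  = count($labels);

    // Try the longest candidate suffix first so "co.uk" wins over "uk".
    for ($i = 1; $i < $count; $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (in_array($candidate, $suffixes, true)) {
            // The label just before the suffix belongs to the registrable domain.
            $domain    = $labels[$i - 1] . '.' . $candidate;
            $subdomain = implode('.', array_slice($labels, 0, $i - 1));
            return [$subdomain, $domain];
        }
    }

    // No known suffix matched: report no subdomain and return the host unchanged.
    return ['', implode('.', $labels)];
}

print_r(splitHost('subdomain.somesite.com')); // subdomain "subdomain", domain "somesite.com"
print_r(splitHost('somesite.co.uk'));         // subdomain "",          domain "somesite.co.uk"
```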

I am going to leave this here...

Using @TwoStraws' idea, I created a function which returns the subdomain, base domain, and TLD parts of a given URL, using data.iana.org's up-to-date TLD list.

function GetDomainParts($URL,$TLDs_List = 'http://data.iana.org/TLD/tlds-alpha-by-domain.txt') {
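    // Get a list of all top level domains; split on any line ending, because PHP_EOL is platform-dependent
    $TLDs = preg_split('/\R/', trim(file_get_contents($TLDs_List)));
    // Drop the header line and re-index (array_values() returns a new array, so assign it back)
    unset($TLDs[0]); $TLDs = array_values($TLDs);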

    // And since that list has all the country codes too, lets assume all 2 letter domains are country codes, and get that list too
    $CC_TLDs = [];
    foreach($TLDs as $TLD) {
        if(strlen($TLD) == 2) {
            $CC_TLDs[] = $TLD;
        }
    }

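    // Now let's take our URL and remove some things
    $ParsedUrl = parse_url($URL);
    // parse_url() only sets 'host' when the URL has a scheme (or starts with "//"), so fall back to an empty host
    $Host = explode('.', $ParsedUrl['host'] ?? '');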

    // If we cant find it, we return false...
    $BaseDomain = false;
    $TLDDomain = false;

    // And look at the last 2 items in the Host array, these will be our TLD's (possibly)
    $N_Minus_1 = strtoupper($Host[count($Host)-1] ?? '');
    $N_Minus_2 = strtoupper($Host[count($Host)-2] ?? '');

    // This has a potential of being our base domain, but may not be there
    $N_Minus_3 = strtoupper($Host[count($Host)-3] ?? '');


    // We first check our N Minus 1 against our list of Country Code TLDs
    if(in_array($N_Minus_1,$CC_TLDs)) {
        // If N Minus 1 is in the Country Code, We can check our N Minus 2 and see if it is in the TLDs array
        if(in_array($N_Minus_2,$TLDs)) {
            // If N Minus 2 is in the list of TLDs, we make the assumption that this is part of the TLD, making N Minus 3 our Base Domain
            $BaseDomain = $N_Minus_3;
            $TLDDomain = $N_Minus_2.'.'.$N_Minus_1;

            // We unset the parts that are used, the rest is our sub domain
            unset($Host[count($Host)-1]);
            unset($Host[count($Host)-1]);
            unset($Host[count($Host)-1]);
            $SubDomain = implode('.',$Host);
        } else {
            // If N Minus 2 is NOT in the list of TLDs, we make the assumption that this is our Base Domain
            $BaseDomain = $N_Minus_2;
            $TLDDomain = $N_Minus_1;

            // We unset the parts that are used, the rest is our sub domain
            unset($Host[count($Host)-1]);
            unset($Host[count($Host)-1]);
            $SubDomain = implode('.',$Host);
        }
    } else {
        // If N Minus 1 is NOT in the Country Codes, we can assume it is the TLD, lets check it against the TLDs to make sure
        if(in_array($N_Minus_1,$TLDs)) {
            // If N Minus 1 Is in our List of TLDs, we can assume we found our TLD, so N Minus 2 must be our Base Domain
            $BaseDomain = $N_Minus_2;
            $TLDDomain = $N_Minus_1;

            // We unset the parts that are used, the rest is our sub domain
            unset($Host[count($Host)-1]);
            unset($Host[count($Host)-1]);
            $SubDomain = implode('.',$Host);
        } else {
            // If N Minus 1 is NOT in our list of TLDs it is either a new TLD unheard of by iana.org, or does not exist, lets make the assumption that it is the tld
            $BaseDomain = $N_Minus_2;
            $TLDDomain = $N_Minus_1;

            // We unset the parts that are used, the rest is our sub domain
            unset($Host[count($Host)-1]);
            unset($Host[count($Host)-1]);
            $SubDomain = implode('.',$Host);

            // Not sure if it is needed, but at this point we could swap the checks, checking minus 2 as the country code and minus 1 as the TLD,
            // but I am not sure this is ever a real-world scenario, and am unable to find any proof to support this theory
        }

    }

    // Return our URL Parts ( DISCLAIMER: Note that this will not solve every URL, such as WWW.AFAMILYCOMPANY.CO, 
    // because both AFAMILYCOMPANY and CO are TLDs one being a TLD and the other being a Country Code, Leaving "WWW" as the Base Domain.
    // I use this functionality to auto-populate a user changeable setting, just in case my assumption is wrong the user can fix it.
    // One should not assume this will work 100% of the time! )
    return [strtolower($SubDomain),strtolower($BaseDomain),strtolower($TLDDomain)];
}

PLEASE READ THE DISCLAIMER...

Note that this will not solve every URL, such as WWW.AFAMILYCOMPANY.CO: both AFAMILYCOMPANY and CO are in the list, one as a TLD and the other as a country code, which leaves "WWW" as the base domain. I use this functionality to auto-populate a user-changeable setting, so if my assumption is wrong the user can fix it. One should not assume this will work 100% of the time!

Further note that afamilycompany.co is listed at http://whois.domaintools.com/afamilycompany.co as a "Restricted and Reserved Names" domain. If the internet is doing things right, these types of domains should never be in production anyway, and therefore this function is safe.

A simple way to check whether this function will work on your domain(s) is to go to http://data.iana.org/TLD/tlds-alpha-by-domain.txt, press Ctrl+F, and check whether the domain name is in the list. If it is, this function will fail; if it is not, this function should work. I realize this is only a step in the right direction, so if anyone else can add onto the idea, let me know.
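The manual Ctrl+F check can also be scripted. A rough sketch, assuming the list has already been loaded into an array of upper-case entries; the short $tlds array below is a stand-in rather than the real IANA data, and the function name is hypothetical:

```php
<?php
// Hypothetical helper: returns true when the label that should be the base
// domain is itself listed as a TLD, which is the case the function above
// misreads (e.g. AFAMILYCOMPANY in www.afamilycompany.co).
function baseLabelCollidesWithTld(string $baseLabel, array $tlds): bool
{
    // The list is upper-case, so normalise the label before comparing.
    return in_array(strtoupper($baseLabel), $tlds, true);
}

// Stand-in sample; in practice use the lines of tlds-alpha-by-domain.txt.
$tlds = ['AFAMILYCOMPANY', 'CO', 'COM', 'UK'];

var_dump(baseLabelCollidesWithTld('afamilycompany', $tlds)); // bool(true)
var_dump(baseLabelCollidesWithTld('example', $tlds));        // bool(false)
```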
