Regex / PHP code to check if a URL is a short URL

I am attempting to create a PHP function which will check if the passed URL is a short URL. Something like this:

/**
 * Check if a URL is a short URL
 *
 * @param string $url
 * @return bool
 */
function _is_short_url($url){
    // Code goes here
}

I know that a simpler and surefire way would be to check for a 301 redirect, but this function aims to save an external request made just for checking. Nor should the function check against a list of URL shorteners, as that would be a less scalable approach.

Here are a few possible checks I was thinking of:

  1. Overall URL length - maybe a max of 30 characters
  2. URL length after the last '/' - maybe a max of 10 characters
  3. Number of '/' after the protocol (http://) - max 2
  4. Max length of host

Any thoughts on a possible approach or a more exhaustive checklist for this?

EDIT: This function is just an attempt to save an external request, so it's OK for it to return true for a URL that is not a shortened URL (but is genuinely short). After passing URLs through this function, I would expand all short URLs anyway by checking 301 redirects. This is just to eliminate the obviously non-short ones.
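To make the intended two-stage flow concrete, here is a rough sketch of the pre-filter step only. The helper name `partition_by_heuristic` and the stand-in length check are hypothetical and used purely for illustration; the real heuristic and the redirect check would be plugged in separately.

```php
<?php

/**
 * Hypothetical helper: split URLs into those worth a network check and those to skip.
 * $looksShort is any cheap, offline heuristic that returns a bool.
 */
function partition_by_heuristic(array $urls, callable $looksShort): array
{
    $candidates = [];
    $skipped = [];
    foreach ($urls as $url) {
        if ($looksShort($url)) {
            $candidates[] = $url; // false positives are fine; the redirect check settles them
        } else {
            $skipped[] = $url;    // obviously long URLs never trigger a request
        }
    }
    return ['candidates' => $candidates, 'skipped' => $skipped];
}

// Stand-in heuristic for illustration only: overall length of at most 30 characters.
$result = partition_by_heuristic(
    ['http://bit.ly/foo', 'http://example.com/articles/2012/05/some-long-title.html'],
    function ($url) { return strlen($url) <= 30; }
);
print_r($result['candidates']);
```

Only the URLs in `candidates` would then go on to the 301 check, so the cost of a wrong "true" is one extra request rather than a wrong answer.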

I would not recommend using a regex, as it would be too complex and difficult to understand. Here is some PHP code that checks all your constraints:

function _is_short_url($url){
        // 1. Overall URL length - max of 30 characters
        if (strlen($url) > 30) return false;

        $parts = parse_url($url);
        // parse_url() returns false for seriously malformed URLs
        if ($parts === false) return false;

        // No query string & no fragment (!empty() avoids undefined-index notices)
        if (!empty($parts["query"]) || !empty($parts["fragment"])) return false;

        // The path is absent for URLs such as "http://bit.ly"
        $path = isset($parts["path"]) ? $parts["path"] : "";
        $pathParts = explode("/", $path);

        // 3. Number of '/' after protocol (http://) - Max 2
        if (count($pathParts) > 2) return false;

        // 2. URL length after last '/' - max of 10 characters
        $lastPath = array_pop($pathParts);
        if (strlen($lastPath) > 10) return false;

        // 4. Max length of host (10 here)
        if (isset($parts["host"]) && strlen($parts["host"]) > 10) return false;

        return true;
}

Here is a small function that checks all your requirements. I was able to do it without a complex regex, using only preg_split. You should be able to adapt it easily.

<?php

var_dump(_isShortUrl('http://bit.ly/foo'));

function _isShortUrl($url)
{
    // Check for max URL length (30)
    if (strlen($url) > 30) {
        return false;
    }

    // Check whether there are more than two URL parts/slashes (5 split values)
    $parts = preg_split('/\//', $url);
    if (count($parts) > 5) {
        return false;
    }

    // Check for max host length (10); the host is missing if the URL has no "scheme://" prefix
    $host = isset($parts[2]) ? $parts[2] : '';
    if (strlen($host) > 10) {
        return false;
    }

    // Check for max length of last URL part (after last slash)
    $lastPart = array_pop($parts);
    if (strlen($lastPart) > 10) {
        return false;
    }

    return true;
}

Why not check whether the host matches a known URL shortener? You could get a list of the most common URL shorteners, for example here.
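A minimal sketch of that lookup, assuming a small hand-maintained list. The function name `is_known_shortener_host` is hypothetical, and the default hosts are only an illustrative sample, not an exhaustive list:

```php
<?php

/**
 * Hypothetical helper: true when the URL's host appears in a list of known shorteners.
 * The default list is a tiny illustrative sample and would need to be maintained.
 */
function is_known_shortener_host(
    string $url,
    array $knownHosts = ['bit.ly', 'goo.gl', 'tinyurl.com', 't.co']
): bool {
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // malformed URL, or no host (e.g. the scheme is missing)
    }

    $host = strtolower($host);
    // Treat "www.bit.ly" the same as "bit.ly"
    if (strpos($host, 'www.') === 0) {
        $host = substr($host, 4);
    }

    return in_array($host, $knownHosts, true);
}

var_dump(is_known_shortener_host('http://bit.ly/1GoNYa'));         // bool(true)
var_dump(is_known_shortener_host('http://example.com/some/page')); // bool(false)
```

This needs no network request, but it only recognises the hosts in the list, which is the scalability concern raised in the question.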

If I were you, I would test whether the URL returns a 301 redirect, and then test whether the redirect points to another website:

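function _is_short_url($url) {
   $options = array('http' => array('method' => 'HEAD'));
   stream_context_set_default($options); # don't fetch the full page
   $headers = get_headers($url,1);
   if ( isset($headers[0]) ) {
     if (strpos($headers[0],'301')!==false && isset($headers['Location'])) {
       $location = $headers['Location'];
       # with several redirects in a chain, get_headers() returns an array of Location values
       if (is_array($location)) $location = $location[0];
       $url = parse_url($url);
       $location = parse_url($location);
       # a relative Location has no host, so it stays on the same site
       if (isset($url['host'], $location['host']) && $url['host'] != $location['host'])
         return true;
     }
   }

   return false;
}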

echo (int)_is_short_url('http://bit.ly/1GoNYa');
