简体   繁体   中英

StackOverflow Style A Href Auto Linking in Regex

I am using the below function to search for text links and convert them to a hyperlink. First of all is it correct? It appears to work but do you know of a (perhaps malformed) url that would break this function?

My question is whether it is possible to get this to support port numbers as well, for example stackoverflow.com:80/index will not be converted as the port is not seen as a valid part of the url.

So in summary I am looking for Stackoverflow style url recognition, which I believe is a custom addition to Markdown.

  /**
   * Search for and create links from urls
   */
  static public function autoLink($text) {
    $pattern = "/(((http[s]?:\/\/)|(www\.))(([a-z][-a-z0-9]+\.)?[a-z][-a-z0-9]+\.[a-z]+(\.[a-z]{2,2})?)\/?[a-z0-9._\/~#&=;%+?-]+[a-z0-9\/#=?]{1,1})/is";
    $text = preg_replace($pattern, " <a href='$1'>$1</a>", $text);
    // fix URLs without protocols
    $text = preg_replace("/href='www/", "href='http://www", $text);

    return $text;
  } 

Thanks for your time,

You should also look at the answers to this question: How to mimic StackOverflow Auto-Link Behavior


I have ended up combining the answers I have got both at stack overflow and talking to colleagues. The below code is the best we could come up with.

/**
   * Search for and create links from urls
   */
  static public function autoLink($text) {
    $pattern = "/\b((?P<protocol>(https?)|(ftp)):\/\/)?(?P<domain>[-A-Z0-9\\.]+)[.][A-Z]{2,7}(([:])?([0-9]+)?)(?P<file>\/[-A-Z0-9+&@#\/%=~_|!:,\\.;]*)?(?P<parameters>\?[A-Z0-9+&@#\/%=~_|!:,\\.;]*)?/ise";
$text = preg_replace($pattern, "' <a href=\"'.htmlspecialchars('$0').'\">$0</a>'", $text);

    // fix URLs without protocols
    $text = preg_replace("#href='www#i", "href='http://www", $text);
    $text = preg_replace("#href=['\"](?!(https?|ftp)://)#i", "href='http://", $text);

    return $text;
  } 

Rather than writing your own autolinking routine, which is essentially the beginning of a custom markup engine, you might want to use an open source markup engine, as it is less likely to be vulnerable to cross-site scripting attacks. One example of an open source markup engine for PHP is PHP Markdown , which has the ability to autolink URLs and essentially uses the same Markdown syntax that is in use at Stack Overflow.

One note: you should always escape HTML special characters using htmlspecialchars() before sticking the text into attributes or in the inner text of elements.

$pattern = "/\b(?P<protocol>https?|ftp):\/\/(?P<domain>[-A-Z0-9.]+)(([:])?([0-9]+)?)(?P<file>\/[-A-Z0-9+&@#\/%=~_|!:,.;]*)?(?P<parameters>\?[A-Z0-9+&@#\/%=~_|!:,.;]*)?/i";

will match:

http://www.scroogle.org/index.html

http://www.scroogle.org:80/index.html?source=library

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM