
RegEx to remove http://www. if it exists in PHP and JS

Could someone please help me with a regular expression (I need it in PHP and in JS) that removes http:// and www. from the beginning of a URL string and removes the trailing / if it's there?

For example:

  • http://www.google.com/ would be google.com
  • https://yahoo.com?page=1 would be yahoo.com?page=1
  • fancysite.com/articles/2012/ would be fancysite.com/articles/2012

Here's the code I'm using for the JS side:

row.page_href.replace(/^(https?|ftp):\/\//, '')

And here's the code I'm using for the PHP side:

$urlString = rtrim($urlString, '/');
$urlString = preg_replace('~^(?:https?://)?(?:www[.])?~i', '', $urlString);

As you can see, the JS regex currently only removes the scheme (http://, https:// or ftp://), and the PHP version needs two steps to do everything.
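One way to fold all three steps into a single JS call is an alternation: one branch anchored at the start for the scheme and www. prefix, one anchored at the end for the trailing slash. This is only a minimal sketch. The function name stripUrl is made up for illustration, and it assumes the input is a plain string with no surrounding whitespace.

```javascript
// Hypothetical helper (not from the original post): strips a leading
// scheme (http, https, ftp), an optional "www." prefix, and any
// trailing slashes in one replace() call.
function stripUrl(url) {
  // First branch is anchored at the start, second at the end;
  // the g flag lets both branches fire during the same call.
  return url.replace(/^(?:(?:https?|ftp):\/\/)?(?:www\.)?|\/+$/gi, '');
}

console.log(stripUrl('http://www.google.com/'));
console.log(stripUrl('https://yahoo.com?page=1'));
console.log(stripUrl('fancysite.com/articles/2012/'));
```

The dot after www is escaped so that only a literal "www." is removed, and the scheme group is optional so that inputs without a scheme still lose their www. prefix.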

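function cleanUrl($url)
{
  // parse_url() only fills in 'host' when a scheme is present, so add one if it is missing
  $full = preg_match('~^[a-z][a-z0-9+.-]*://~i', $url) ? $url : 'http://' . $url;
  $d = parse_url($full);
  if ($d !== false && isset($d['host'])) // parseable url with a host
  {
    return sprintf('%s%s%s',
      // ltrim($host, 'www.') would strip any leading 'w' or '.' characters, not the prefix
      preg_replace('~^www\.~i', '', $d['host']),
      // second argument of rtrim() is the character mask: strip trailing slashes
      rtrim(isset($d['path']) ? $d['path'] : '', '/'),
      !empty($d['query']) ? '?'.$d['query'] : '');
  }
  return $url;
}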

I would take advantage of parse_url(): it returns false for seriously malformed URLs, so you get a basic sanity check along with the "cleaning".
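The same parse-first idea can be sketched for the JS side with the standard URL constructor, which is available in browsers and in Node.js. The function name cleanUrlJs is hypothetical, and prepending http:// to scheme-less input is an assumption made here so that strings such as fancysite.com/articles/2012/ can be parsed at all.

```javascript
// Hypothetical JS counterpart to a parse-based cleaner.
function cleanUrlJs(input) {
  // new URL() needs a scheme, so add one for inputs like "fancysite.com/x".
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(input);
  let parsed;
  try {
    parsed = new URL(hasScheme ? input : 'http://' + input);
  } catch (e) {
    return input; // not parseable: hand the string back unchanged
  }
  const host = parsed.hostname.replace(/^www\./i, ''); // drop a literal "www." prefix
  const path = parsed.pathname.replace(/\/+$/, '');    // drop trailing slashes
  return host + path + parsed.search;                  // search includes the leading "?"
}

console.log(cleanUrlJs('http://www.google.com/'));
console.log(cleanUrlJs('fancysite.com/articles/2012/'));
```

Note that this sketch keeps only host, path and query string, so a port or a #fragment in the input is dropped, and the URL parser lowercases the host name.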

#(https?(://))?(www.?)?(.*)#i

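This worked fine for me. Note that the dot in www.? is unescaped, so it matches any character; write www\. if you only want to match a literal dot. You could also replace the final (.*) with a stricter pattern that follows the RFC rules for URLs.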

Outputs:

david@david-desktop ~ $ php -a
Interactive shell

php > $str = preg_replace('#(https?(://))?(www.?)?(.*)#i', '$4', 'https://www.google.ca');
php > echo $str . PHP_EOL;
google.ca
php > $str = preg_replace('#(https?(://))?(www.?)?(.*)#i', '$4', 'https://google.ca');
php > echo $str . PHP_EOL;
google.ca
php > $str = preg_replace('#(https?(://))?(www.?)?(.*)#i', '$4', 'http://google.ca');
php > echo $str . PHP_EOL;
google.ca
php > 
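The three inputs in the session above do not exercise the unescaped dot or the missing ^ anchor in that pattern. The sketch below ports the pattern to a JS regex literal and compares it with an anchored variant that uses www\. instead. The names loose, strict, stripLoose and stripStrict, and the sample host wwwidgets.com, are made up for illustration.

```javascript
// Port of the pattern from the answer: the "." after www matches any character.
const loose = /(https?(:\/\/))?(www.?)?(.*)/i;
// Anchored variant with a literal dot after www.
const strict = /^(?:https?:\/\/)?(?:www\.)?/i;

function stripLoose(url) { return url.replace(loose, '$4'); }
function stripStrict(url) { return url.replace(strict, ''); }

console.log(stripLoose('https://www.google.ca'));  // same result as the PHP session
console.log(stripLoose('wwwidgets.com'));          // "www" plus one extra character is eaten
console.log(stripStrict('wwwidgets.com'));         // left untouched
```

With the loose pattern, a host that merely starts with the letters "www" loses its first four characters, while the strict pattern only strips an actual "www." prefix.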


 