Could someone please help me with a regular expression (I need it in PHP and in JS) to remove http:// (or https://) and www. from the beginning of a URL string, and to remove the trailing / if it's there?
For example:
http://www.google.com/
would be google.com
https://yahoo.com?page=1
would be yahoo.com?page=1
fancysite.com/articles/2012/
would be fancysite.com/articles/2012
Here's the code I'm using for the JS side:
row.page_href.replace(/^(https?|ftp):\/\//, '')
And here's the code I'm using for the PHP side:
$urlString = rtrim($urlString, '/');
$urlString = preg_replace('~^(?:https?://)?(?:www[.])?~i', '', $urlString);
As you can see, the JS regex currently removes only the scheme (http://, https:// or ftp://), and the PHP version needs two steps to do everything.
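function cleanUrl($url)
{
    $d = parse_url($url);
    if ($d !== false && isset($d['host'])) // parsed, and a host was found
    {
        return sprintf('%s%s%s',
            // ltrim($host, 'www.') would treat 'www.' as a character mask and turn "web.com" into "eb.com"
            preg_replace('~^www\.~i', '', $d['host']),
            isset($d['path']) ? rtrim($d['path'], '/') : '',
            !empty($d['query']) ? '?'.$d['query'] : '');
    }
    return $url; // no host found (e.g. the string has no scheme): returned unchanged
}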
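I would take advantage of parse_url, which splits the URL into its components so each part can be cleaned separately. (It returns false only for seriously malformed URLs, so treat it as a parser rather than a strict validator.)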
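For the JS side, a rough equivalent of the parse_url approach can be sketched with the standard URL class. This is only a sketch under a few assumptions: cleanUrlParsed is a made-up name, the port and fragment are deliberately dropped, and a string without a scheme (which the URL constructor rejects) is returned unchanged.

```javascript
// Hypothetical helper: parse first, then clean each component separately.
function cleanUrlParsed(url) {
  let parsed;
  try {
    parsed = new URL(url); // throws a TypeError if url is not an absolute URL
  } catch (e) {
    return url; // e.g. no scheme: returned unchanged
  }
  const host = parsed.hostname.replace(/^www\./i, ''); // only a literal leading "www."
  const path = parsed.pathname.replace(/\/+$/, '');    // drop trailing slash(es)
  return host + path + parsed.search;                  // search includes the "?" when present
}

console.log(cleanUrlParsed('http://www.google.com/'));   // google.com
console.log(cleanUrlParsed('https://yahoo.com?page=1')); // yahoo.com?page=1
```

Note that the URL class normalizes the hostname to lower case, so the result may differ in case from the input.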
#(https?(://))?(www.?)?(.*)#i
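This worked fine for me on the inputs below. Note that the dot in www.? is unescaped and optional, so escape it (www\.) if you only want a literal leading www. removed. You could also change the last (.*) to match the RFC standard for URLs.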
Outputs:
david@david-desktop ~ $ php -a
Interactive shell
php > $str = preg_replace('#(https?(://))?(www.?)?(.*)#i', '$4', 'https://www.google.ca');
php > echo $str . PHP_EOL;
google.ca
php > $str = preg_replace('#(https?(://))?(www.?)?(.*)#i', '$4', 'https://google.ca');
php > echo $str . PHP_EOL;
google.ca
php > $str = preg_replace('#(https?(://))?(www.?)?(.*)#i', '$4', 'http://google.ca');
php > echo $str . PHP_EOL;
google.ca
php >
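For the JS side of the question, the scheme, the www. prefix and the trailing slash can be handled in one replace() call by combining two alternatives under the g flag. This is a minimal sketch, not a full URL parser; stripUrl is a made-up name.

```javascript
// Hypothetical helper: one regex, two alternatives.
//   ^(?:https?:\/\/)?(?:www\.)?   optional scheme and optional "www." at the start
//   \/$                           a single trailing slash at the very end
function stripUrl(url) {
  return url.replace(/^(?:https?:\/\/)?(?:www\.)?|\/$/gi, '');
}

console.log(stripUrl('http://www.google.com/'));       // google.com
console.log(stripUrl('https://yahoo.com?page=1'));     // yahoo.com?page=1
console.log(stripUrl('fancysite.com/articles/2012/')); // fancysite.com/articles/2012
```

One limitation of this sketch: only a slash at the very end of the string is removed, so a slash that sits before a query string (as in google.com/?q=1) is kept.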