
How to avoid url with mailto:

I'm working in PHP and have written a function that collects the links from a submitted URL. The code works, but it also picks up links that do not point to real pages, such as mailto:, #, and javascript:void(0). How can I skip a tags whose href looks like href="mailto:", href="tel:", or href="javascript:"? Thank you in advance.

function check_all_links($url) {
$doc = new DOMDocument();
@$doc->loadHTML(file_get_contents($url));
$linklist = $doc->getElementsByTagName("a");
$title = $doc->getElementsByTagName("title");
$href = array();
$page_url = $full_url = $new_url = "";
$full_url = goodUrl($url);
$scheme = parse_url($url, PHP_URL_SCHEME);

$slash = '/';
$links = array();
$linkNo = array();

if ($scheme == "http") {
    
    foreach ($linklist as $link) {
        $href = strtolower($link->getAttribute('href'));
        $page_url = parse_url($href, PHP_URL_PATH);
        $new_url = $scheme."://".$full_url.'/'.ltrim($page_url, '/');
        //check if href has mailto: or # or javascript: or tel:
          
        if (strpos($page_url, "tel:") === True) { 
            continue;
        } 
    

        if(!in_array($new_url, $linkNo)) {
           echo $new_url."<br>" ;
           array_push($linkNo, $new_url);
           $links[]  = array('Links' => $new_url );
        }
    }
}else if ($scheme == "https") {
    
    foreach ($linklist as $link) {
        $href = strtolower($link->getAttribute('href'));
        $page_url = parse_url($href, PHP_URL_PATH);
        $new_url = $scheme."://".$full_url.'/'.ltrim($page_url, '/'); 

        if (strpos($page_url, "tel:") === True) { 
            continue; 
        } 
        
        if(!in_array($new_url, $linkNo)) {
           echo $new_url."<br>" ;
           array_push($linkNo, $new_url);
           $links[]  = array('Links' => $new_url );
        }
    }
}
}

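You can use the scheme field from the result of the parse_url function. The existing check never fires for two reasons: strpos returns an integer offset or false, never true, and $page_url holds only the path component, which no longer contains the scheme. Instead of: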

if (strpos($page_url, "tel:") === True) { 
        continue;
} 

you can parse the whole href and test its scheme:

$parts = parse_url($href);
if (isset($parts["scheme"]) && in_array($parts["scheme"], ["mailto", "tel", "javascript"])) {
    continue;
}
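
If you would rather not depend on how parse_url splits unusual inputs, a plain prefix comparison is another option. The following is only a minimal sketch, not the asker's function: the helper name is_skippable_href and the sample markup are made up for illustration, and it also skips empty and "#" hrefs, which the scheme check alone does not cover.

```php
<?php
// Hypothetical helper (not part of the original post): returns true when an
// href is empty, a bare in-page fragment, or uses a non-navigable scheme.
function is_skippable_href(string $href): bool
{
    $href = strtolower(trim($href));

    // Empty hrefs and in-page anchors such as "#top" are not real links.
    if ($href === '' || $href[0] === '#') {
        return true;
    }

    // strpos() returns an int offset or false, never true, so compare
    // against 0 to test whether the href starts with the prefix.
    foreach (['mailto:', 'tel:', 'javascript:'] as $prefix) {
        if (strpos($href, $prefix) === 0) {
            return true;
        }
    }

    return false;
}

// Assumed sample markup, used only to demonstrate the helper.
$html = '<a href="/about">About</a>'
      . '<a href="mailto:me@example.com">Mail</a>'
      . '<a href="tel:12345">Call</a>'
      . '<a href="javascript:void(0)">Menu</a>'
      . '<a href="#top">Top</a>'
      . '<a href="https://example.com/contact">Contact</a>';

$doc = new DOMDocument();
@$doc->loadHTML($html); // @ suppresses parser warnings, as in the question

$kept = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    $href = $link->getAttribute('href');
    if (is_skippable_href($href)) {
        continue; // skip mailto:, tel:, javascript:, "#" and empty hrefs
    }
    $kept[] = $href;
}

print_r($kept);
```

With this sample markup, only /about and https://example.com/contact remain in $kept. In the question's loop, the same check would go right after the getAttribute('href') call, before $new_url is built.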
