
How to get meta tags in PHP?

I am trying to extract the meta tags from the following URL, but it does not work and gives this warning: get_meta_tags(https://www.washingtonpost.com/politics/white-house-reels-as-fbi-director-contradicts-official-claims-about-alleged-abuser/2018/02/13/f010f256-10d9-11e8-9570-29c9830535e5_story.html?tid=pm_pop): failed to open stream: Redirection limit reached, aborting. Any ideas about this?

To start, you need to request the site's home page first so that its cookies get set; otherwise the request for the article is not going to work.

// First request: load the home page so the site can set its cookies.
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, FALSE);
// Skips TLS certificate verification; acceptable for a quick test only.
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_URL, "https://www.washingtonpost.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13");
// Name the cookie file after the visitor's PHP session ID, if there is one.
$cookieName = "";
if (isset($_COOKIE['PHPSESSID'])) {
    $cookieName = $_COOKIE['PHPSESSID'];
}
$cookieFile = $_SERVER['DOCUMENT_ROOT'].'/logs/'.$cookieName.'.txt';
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // cookies are saved here when the handle is closed
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // cookies are read from here
curl_exec($ch);
curl_close($ch);

Then make a second call to fetch the actual page:

// Second request: fetch the article, sending the cookies saved by the first call.
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_URL, "https://www.washingtonpost.com/politics/white-house-reels-as-fbi-director-contradicts-official-claims-about-alleged-abuser/2018/02/13/f010f256-10d9-11e8-9570-29c9830535e5_story.html?tid=pm_pop");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13");
$cookieName = "";
if (isset($_COOKIE['PHPSESSID'])) {
    $cookieName = $_COOKIE['PHPSESSID'];
}
// Must be the same cookie file as in the first call, or the cookies are not sent.
$cookieFile = $_SERVER['DOCUMENT_ROOT'].'/logs/'.$cookieName.'.txt';
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
$page = curl_exec($ch); // HTML of the page as a string, or false on failure
curl_close($ch);

Finally, parse the DOM tree with DOMDocument:

// Collect parser warnings internally instead of printing them for imperfect HTML.
libxml_use_internal_errors(true);
$siteData = new DOMDocument();
$siteData->loadHTML($page);

$metaElements = $siteData->getElementsByTagName("meta");
if ($metaElements->item(0) == null) {
    echo "ERROR"; // no <meta> element was found
}

// For each <meta> element, store every attribute as a [name, value] pair.
$meta = array();
for ($i = 0; $i < $metaElements->length; $i++) {
    $meta[$i] = array();
    $attributes = $metaElements->item($i)->attributes;
    for ($j = 0; $j < $attributes->length; $j++) {
        $meta[$i][$j] = array($attributes->item($j)->name, $attributes->item($j)->value);
    }
}
print_r($meta);

The meta tags are now stored in the $meta array.
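If a flat map similar to what get_meta_tags() returns is more convenient than a list of attribute pairs, the parsing step could be sketched roughly as below. The function name metaTagsToMap and the sample HTML are made up for illustration and are not part of the original answer; falling back to the "property" attribute (for Open Graph style tags) is likewise an assumption about what you want to capture.

```php
<?php
// Hypothetical helper: collapse <meta> tags into a key => content map.
// The key is the tag's "name" attribute or, if that is missing, "property".
function metaTagsToMap(string $html): array
{
    if ($html === '') {
        return []; // nothing to parse
    }

    libxml_use_internal_errors(true); // keep warnings about imperfect HTML quiet
    $doc = new DOMDocument();
    if (!$doc->loadHTML($html)) {
        return [];
    }
    libxml_clear_errors();

    $map = [];
    foreach ($doc->getElementsByTagName('meta') as $tag) {
        $key = $tag->getAttribute('name') ?: $tag->getAttribute('property');
        if ($key !== '' && $tag->hasAttribute('content')) {
            // Lower-case the key so "Description" and "description" match.
            $map[strtolower($key)] = $tag->getAttribute('content');
        }
    }
    return $map;
}

// Usage with a small inline document instead of a live page.
$sample = '<html><head><meta name="Description" content="Demo page">'
        . '<meta property="og:title" content="Demo"></head><body></body></html>';
print_r(metaTagsToMap($sample));
```

In the flow above you would pass $page to this function instead of $sample. Tags without a content attribute (for example a charset declaration) are skipped, and a later tag with the same key overwrites an earlier one.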

You can tidy this code up by moving the cURL calls into a function.
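One possible shape for such a function is sketched below; treat it as a starting point under stated assumptions, not as the answer's own code. The name fetchWithCookies, its parameters, and the timeout value are hypothetical. Enabling CURLOPT_FOLLOWLOCATION is also an addition that the snippets above do not use, and certificate verification is left at cURL's default here.

```php
<?php
// Hypothetical helper: perform one GET request with cURL, reading and
// writing cookies through the given cookie file so that consecutive
// calls share the same cookies.
function fetchWithCookies(string $url, string $cookieFile): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_HEADER         => false,
        CURLOPT_FOLLOWLOCATION => true,  // assumption: follow redirects ...
        CURLOPT_MAXREDIRS      => 10,    // ... but give up after ten hops
        CURLOPT_CONNECTTIMEOUT => 10,    // seconds; arbitrary illustrative value
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13',
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are saved here when the handle is released
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies are read from here
    ]);

    $body  = curl_exec($ch);
    $error = curl_error($ch);
    unset($ch); // release the handle so the cookie jar is flushed to disk

    if ($body === false) {
        throw new RuntimeException("cURL request failed: $error");
    }
    return $body;
}

// Intended usage (left commented out because it needs network access):
// $cookieFile = sys_get_temp_dir() . '/wapo_cookies.txt';
// fetchWithCookies('https://www.washingtonpost.com', $cookieFile); // sets cookies
// $page = fetchWithCookies($articleUrl, $cookieFile);              // $articleUrl: the article address
```

Throwing an exception on failure keeps the caller from passing false to DOMDocument::loadHTML(), which the step-by-step version above does not guard against.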

