Fastest way to retrieve a <title> in PHP

Question

I'm building a bookmarking system and looking for the fastest (easiest) way to retrieve a page's title with PHP.

Something like $title = page_title($url) would be nice.

Answer 1

<?php
    function page_title($url) {
        $fp = file_get_contents($url);
        if (!$fp) 
            return null;

        $res = preg_match("/<title>(.*)<\/title>/siU", $fp, $title_matches);
        if (!$res) 
            return null; 

        // Clean up title: remove EOL's and excessive whitespace.
        $title = preg_replace('/\s+/', ' ', $title_matches[1]);
        $title = trim($title);
        return $title;
    }
?>

Given the following input:

print page_title("http://www.google.com/");

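Outputs: Google

Hopefully that is general enough for your needs. If you need something more robust, it wouldn't hurt to spend a bit of time researching HTML parsers.

EDIT: Added a bit of error checking. I kind of rushed the first version out, sorry.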
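
As a rough sketch of how the regex approach could be hardened a little, the version below works on an HTML string that has already been downloaded. The name extract_title is hypothetical and not part of the answer above; it assumes UTF-8 input, tolerates attributes on the title tag, and decodes HTML entities before collapsing whitespace.

```php
<?php
// Hypothetical helper (not from the answer above): pulls the title out of an HTML string.
function extract_title($html) {
    // \b[^>]* tolerates attributes on the tag; "s" lets . span newlines; "i" ignores case.
    if (!preg_match('/<title\b[^>]*>(.*?)<\/title>/is', $html, $m)) {
        return null;
    }
    // Decode entities such as &amp; (assumes UTF-8), then collapse runs of whitespace.
    $title = html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return trim(preg_replace('/\s+/', ' ', $title));
}

echo extract_title("<TITLE id=\"t\">\n  Tom &amp; Jerry\n</TITLE>"); // Tom & Jerry
```

Keeping the download separate from the parsing makes the function easy to test without a network connection.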

Answer 2

You can get it without regular expressions:

$title = '';
$dom = new DOMDocument();

if($dom->loadHTMLFile($urlpage)) {
    $list = $dom->getElementsByTagName("title");
    if ($list->length > 0) {
        $title = $list->item(0)->textContent;
    }
}
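
Real-world pages are often malformed, and DOMDocument reports every problem as a PHP warning. A minimal sketch of the same idea that silences those warnings is shown below; it assumes the HTML is already in a string and that the DOM extension is available. The name dom_title is hypothetical.

```php
<?php
// Hypothetical helper: parses an HTML string that has already been downloaded.
function dom_title($html) {
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }
    // Collect libxml parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    if (!$loaded) {
        return null;
    }
    $list = $dom->getElementsByTagName('title');
    return $list->length > 0 ? trim($list->item(0)->textContent) : null;
}

echo dom_title('<html><head><title> Example Page </title></head><body><p>unclosed</body></html>');
```

Note that loadHTML() guesses the character encoding when the page does not declare one, so non-ASCII titles may need extra handling.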

Answer 3

Or, to make this simple function slightly more bulletproof:

function page_title($url) {

    $page = file_get_contents($url);

    if (!$page) return null;

    $matches = array();

    if (preg_match('/<title>(.*?)<\/title>/', $page, $matches)) {
        return $matches[1];
    } else {
        return null;
    }
}


echo page_title('http://google.com');

Answer 4

Regex?

Use cURL to get the page contents into the $htmlSource variable.

preg_match('/<title>(.*)<\/title>/iU', $htmlSource, $titleMatches);

print_r($titleMatches);

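See what you have in that array.

Most people say that for HTML traversal you should use a parser, though, since regexes can be unreliable.

The other answers provide more detail :)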
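
Since the cURL step is only described in words, here is a minimal sketch of how $htmlSource might be filled. The name fetch_html, the redirect limit, and the timeout values are assumptions, not part of this answer; it needs the cURL extension.

```php
<?php
// Hypothetical helper: returns the response body as a string, or null on failure.
function fetch_html($url, $timeoutSeconds = 10) {
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,               // assumed limit, adjust as needed
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ));
    $body = curl_exec($ch); // false on failure
    return is_string($body) ? $body : null;
}

// Example call (performs a real request, so it is left commented out):
// $htmlSource = fetch_html('http://www.google.com/');
```

The result can then be passed to the preg_match call above, after checking that it is not null.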

Answer 5

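I'm also building a bookmarking system and found that since PHP 5 you can use stream_get_line to load the remote page only up to the closing title tag (instead of loading the whole file), then use explode (instead of a regex) to get rid of everything before the opening title tag.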

function page_title($url) {
  $title = false;
  if ($handle = fopen($url, "r"))  {
    $string = stream_get_line($handle, 0, "</title>");
    fclose($handle);
    $string = (explode("<title", $string))[1];
    if (!empty($string)) {
      $title = trim((explode(">", $string))[1]);
    }
  }
  return $title;
}

The last explode is thanks to PlugTrade's answer, which reminded me that title tags can have attributes.
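
One thing to watch for with this approach: if the page has no opening title tag, indexing the result of explode raises an undefined-offset notice. A hedged variant that checks each step is sketched below. The name title_from_stream and the 65536-byte cap are assumptions; it takes an already opened stream so it can be tried on php://memory as well as on a handle from fopen($url, "r").

```php
<?php
// Hypothetical helper: reads from an already opened stream.
function title_from_stream($handle, $maxBytes = 65536) {
    // Read up to the first "</title>" (the delimiter is case-sensitive) or $maxBytes.
    // If the delimiter never appears, everything that was read is returned instead.
    $chunk = stream_get_line($handle, $maxBytes, "</title>");
    if ($chunk === false) {
        return false;
    }
    // Find the opening tag case-insensitively, then the ">" that ends it.
    $start = stripos($chunk, "<title");
    if ($start === false) {
        return false;
    }
    $end = strpos($chunk, ">", $start);
    if ($end === false) {
        return false;
    }
    return trim(substr($chunk, $end + 1));
}

$handle = fopen('php://memory', 'r+');
fwrite($handle, '<html><head><TITLE lang="en"> Hello World </title></head></html>');
rewind($handle);
echo title_from_stream($handle); // Hello World
fclose($handle);
```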

Answer 6

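I like using SimpleXml together with regexes. This comes from a solution I use to grab multiple link headers from a page in an OpenID library I created. I've adapted it to work with the title (even though there is usually only one).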

function getTitle($sFile)
{
    $sData = file_get_contents($sFile);

    if(preg_match('/<head.[^>]*>.*<\/head>/is', $sData, $aHead))
    {   
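        // Lowercase every tag so SimpleXML can address head and title by name.
        // (strtolower('<$1>') on the replacement string had no effect.)
        $sDataHtml = preg_replace_callback('/<(.[^>]*)>/', function ($aTag) { return strtolower($aTag[0]); }, $aHead[0]);
        // loadHTML() is an instance method; calling it statically fails on PHP 8.
        $oDom = new DOMDocument();
        @$oDom->loadHTML($sDataHtml);
        $xTitle = simplexml_import_dom($oDom);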

        return (string)$xTitle->head->title;
    }
    return null;
}

echo getTitle('http://stackoverflow.com/questions/399332/fastest-way-to-retrieve-a-title-in-php');

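Ironically, this page has a "title tag" in its title tag, which is the kind of thing that sometimes causes problems for pure regex solutions.

This solution isn't perfect: it lowercases the tags, which can cause problems for nested tags where formatting or case matters (such as XML), but there are ways around that.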
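
One possible way around the lowercasing step is to let the HTML parser normalise tag names and select the title with XPath. This is only a sketch under the assumption that the DOM extension is available; the name head_title is hypothetical and this is not the code from the OpenID library mentioned above.

```php
<?php
// Hypothetical helper: returns the text of /html/head/title from an HTML string.
function head_title($html) {
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }
    $previous = libxml_use_internal_errors(true); // keep parse warnings quiet
    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }
    // The parser lowercases element names, so no manual strtolower() is needed.
    $xpath = new DOMXPath($dom);
    $node = $xpath->query('/html/head/title')->item(0);
    return $node ? trim($node->textContent) : null;
}

// A title that itself mentions a title tag, like the title of this page.
echo head_title('<HTML><HEAD><TITLE>Fastest way to retrieve a &lt;title&gt; in PHP</TITLE></HEAD><BODY></BODY></HTML>');
```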

Answer 7

A function that handles title tags with attributes added to them:

function get_title($html)
{
    preg_match("/<title(.+)<\/title>/siU", $html, $matches);
    if( !empty( $matches[1] ) ) 
    {
        $title = $matches[1];

        if( strstr($title, '>') )
        {
            $title = explode( '>', $title, 2 );
            $title = $title[1];

            return trim($title);
        }   
    }
}

$html = '<tiTle class="aunt">jemima</tiTLE>';
$title = get_title($html);
echo $title;

7 answers

Answer 1: score 46, accepted, 2008-12-30 02:15:34
Answer 2: score 15, 2015-05-29 07:25:26
Answer 3: score 9, 2008-12-30 02:23:51
Answer 4: score 5, 2008-12-30 02:07:04
Answer 5: score 4, 2019-02-08 15:14:43
Answer 6: score 1, 2008-12-31 08:09:28
Answer 7: score 1, 2018-03-24 22:05:14
