preg_match_all: output all h tags with their type

Question

For SEO reasons, I want to build a table that lists all the h tags found on a particular page.

        $str = file_get_contents($Url);
        if(strlen($str)>0){
            preg_match_all(" /<(h\d*)>(\w[^<]*)/i",$str,$headings);

            foreach ($headings as $val) {
                echo "type: " . $val[1] . "\n";
                echo "content: " . $val[2] . "\n";
            }
        }

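At the moment I am just echoing them out, and I get strange results. This is my first time using regular expressions, so I suspect something is wrong with the pattern.

Also, if anyone knows a good tutorial on working with the arrays that preg_match_all returns, that would be great.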

Answer 1

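Use this method to return an associative array of the heading tags, keyed by heading type, with every occurrence of each type:

// Assumes this method belongs to a class whose $html property holds the page source.
public function getHeadingTags()
{
    // Group 1: heading level (1-6); group 2: inner content.
    // \1 forces the closing tag to match the opening level; the s flag lets . span newlines.
    preg_match_all( '#<h([1-6])[^>]*>(.*?)</h\1>#is', 
                    $this->html, 
                    $matches,
                    PREG_PATTERN_ORDER
                  );
    $headings = array();
    foreach ($matches[1] as $key => $heading_key) {
        $headings["h$heading_key"][] = $matches[2][$key];
    }

    ksort($headings);
    return $headings;
}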
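
To see what such a method returns, here is a minimal self-contained sketch. The class name HeadingExtractor, its constructor, and the sample markup are hypothetical and only there to make the example runnable. The pattern requires the closing tag to carry the same level as the opening tag.

```php
<?php
// Hypothetical wrapper class, for illustration only.
class HeadingExtractor
{
    private $html;

    public function __construct($html)
    {
        $this->html = $html;
    }

    public function getHeadingTags()
    {
        // Group 1 is the heading level, group 2 the inner content.
        preg_match_all('#<h([1-6])[^>]*>(.*?)</h\1>#is', $this->html, $matches);
        $headings = array();
        foreach ($matches[1] as $key => $level) {
            $headings["h$level"][] = $matches[2][$key];
        }
        ksort($headings); // sort by key so h1 comes before h2, and so on
        return $headings;
    }
}

// Hypothetical sample input.
$sample = '<h2 class="x">Intro</h2><p>text</p><h1>Title</h1><h2>Details</h2>';
$extractor = new HeadingExtractor($sample);
$result = $extractor->getHeadingTags();
print_r($result);
```

The result is grouped by heading type rather than kept in page order, which suits a summary table but loses the original sequence of the headings.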

Answer 2

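Your regular expression already works for plain headings without attributes. However, preg_match_all by default returns the result array grouped by capture group (PREG_PATTERN_ORDER). You can pass the PREG_SET_ORDER flag as the fourth argument to preg_match_all, which produces the layout your foreach expects: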

preg_match_all("/<(h\d*)>(\w[^<]*)/i",$str,$headings, PREG_SET_ORDER);

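By the way, if we can assume you are processing your own application's output in order to add a table of headings, this is a perfectly legitimate use of regular expressions (and unlikely to fail).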
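
As a small sketch of the two array layouts preg_match_all can produce, the following runs the question's pattern against a hypothetical two-heading string (the sample input and the variable names $byGroup and $byMatch are assumptions made for illustration):

```php
<?php
// Hypothetical sample input.
$str = '<h1>Title</h1><h2>Section</h2>';

// Default layout (PREG_PATTERN_ORDER): one sub-array per capture group.
// $byGroup[0] = full matches, $byGroup[1] = tag names, $byGroup[2] = contents.
preg_match_all('/<(h\d*)>(\w[^<]*)/i', $str, $byGroup);

// PREG_SET_ORDER: one sub-array per match.
// $byMatch[0] = array(full match, tag name, content) of the first heading, and so on.
preg_match_all('/<(h\d*)>(\w[^<]*)/i', $str, $byMatch, PREG_SET_ORDER);

foreach ($byMatch as $val) {
    echo "type: " . $val[1] . "\n";
    echo "content: " . $val[2] . "\n";
}
```

With the default layout, the original foreach iterates over capture groups instead of matches, so $val[1] and $val[2] are the second and third entries of each group, which explains the strange output.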

Answer 3

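You can also use this approach to get all the H tags. I tested it and it works, because I needed it myself.

$str = file_get_contents($Url);
// $matches_h_tag[0] holds each full match, from the opening to the closing heading tag.
preg_match_all("|<h[1-6](.*?)</h[1-6]>|is", $str, $matches_h_tag);
$h_tags = "";
// Use < rather than <= so the loop does not read one index past the end of the array.
for ($i = 0; $i < count($matches_h_tag[0]); $i++) {
    $h_tags .= $matches_h_tag[0][$i];
}
echo $h_tags;

A simple and quick way to get all the h tags.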

Answer 4

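If you want to learn more about regular expressions, your best bet is to buy a good book, or just google for a good tutorial. Personally, I like regular-expressions.info.

Everything about the preg_match_all function can be found in the official documentation. The PHP community often shares useful code on the manual pages, and I am sure you can find whatever you need there.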

php > $ch = curl_init('http://stackoverflow.com/questions/7883392/preg-match-all-output-all-h-tags-with-type');                                              
php > curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); $data = curl_exec($ch);
php > preg_match_all("!<h(\d)[^>]*>(.*?)</h\\1>!ism",$data,$headings);
php > var_export($headings);
array (                     
  0 =>                      
....  
2 =>
  array (
    0 => '<a href="/questions/7883392/preg-match-all-output-all-h-tags-with-type" class="question-hyperlink">preg_match_all output all h tags with type</a>',
    1 => '',
    2 => '
            Know someone who can answer?
            Share a <a href="/q/7883392">link</a> to this question via
            <a href="mailto:?subject=Stack%20Overflow%20Question&amp;body=preg_match_all%20output%20all%20h%20tags%20with%20type%0Ahttp%3a%2f%2fstackoverflow.com%2fq%2f7883392">email</a>,
            <a href="http://twitter.com/share?url=http%3a%2f%2fstackoverflow.com%2fq%2f7883392&amp;text=preg_match_all%20output%20all%20h%20tags%20with%20type">twitter</a>, or
            <a href="http://www.facebook.com/sharer.php?u=http%3a%2f%2fstackoverflow.com%2fq%2f7883392&amp;t=preg_match_all%20output%20all%20h%20tags%20with%20type">facebook</a>.
        ',
    3 => 'Your Answer',
    4 => '
            Browse other questions tagged <a href="/questions/tagged/php" class="post-tag" title="show questions tagged \'php\'" rel="tag">php</a> <a href="/questions/tagged/preg-match-all" class="post-tag" title="show questions tagged \'preg-match-all\'" rel="tag">preg-match-all</a>
                or <a href="/questions/ask">ask your own question</a>.
        ',
    5 => 'Hello World!',
    6 => 'Related',
  ),
)
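
As the dump above shows, the captured heading content can contain nested markup, surrounding whitespace, or nothing at all. A minimal sketch of one way to reduce such values to plain text, using hypothetical sample strings rather than the actual page data:

```php
<?php
// Hypothetical captured contents, modelled on the kinds of values in the dump above.
$contents = array(
    '<a href="/questions/ask" class="question-hyperlink">Ask a question</a>',
    "\n            Related\n        ",
    '',
);

$texts = array();
foreach ($contents as $content) {
    // strip_tags removes nested markup; trim drops the surrounding whitespace.
    $text = trim(strip_tags($content));
    if ($text !== '') {
        $texts[] = $text; // keep only headings that still have visible text
    }
}

print_r($texts);
```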

Answer 5

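If you are going to parse the entire HTML content of a page, I suggest you try PHP's DOMDocument:

$str = file_get_contents($Url);

$dom = new DOMDocument();
// Real-world HTML is often malformed; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($str);

$hs = array();
// Heading levels run from h1 to h6, so the upper bound must include 6.
for($type=1; $type<=6; $type++)
{
  $h_es = $dom->getElementsByTagName('h'.$type);
  foreach($h_es as $h)
  {
    $hs[] = array('type'=>$type, 'content'=>$h->textContent);
  }
}

print_r($hs);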
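
The loop above collects headings level by level, so the result is grouped by type. If the order in which headings appear on the page matters, one possible variant (a sketch, not part of the original answer) is a single DOMXPath union query, which in practice yields the nodes in document order. The sample markup is hypothetical.

```php
<?php
// Hypothetical sample markup; in practice $str would be the fetched page.
$str = '<html><body><h2>Intro</h2><h1>Title</h1><h6>Footnote</h6></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($str);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$hs = array();
// One union expression selects every heading level in a single pass.
foreach ($xpath->query('//h1 | //h2 | //h3 | //h4 | //h5 | //h6') as $h) {
    $hs[] = array(
        'type'    => (int) substr($h->nodeName, 1), // "h2" becomes 2
        'content' => trim($h->textContent),
    );
}

print_r($hs);
```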

5 answers

Answer 1: 2 votes, 2012-12-26 21:51:15
Answer 2: 2 votes, accepted, 2011-10-25 00:56:57
Answer 3: 0 votes, 2020-02-14 03:33:06
Answer 4: 0 votes, 2011-10-25 00:43:51
Answer 5: 0 votes, 2011-10-25 00:55:00
