RegEX to match all between tags with PHP

I'm writing a script that should capture everything between two div tags. My current approach seemed to work, but I noticed it isn't matching everything, and I'm not sure whether that's because of line breaks or some other issue. I want literally everything (including other HTML tags) matched.

     $aPost = preg_match_all('#<div class="posttext">(.*?)</div>#', $rThread, $aPosts);

It appears to match only content written on a single line with no line breaks; if a div doesn't meet that criterion, it is ignored entirely.

To fix your regex, use the dotall modifier (s), which forces the dot (.) to match newline characters as well:

preg_match_all('~<div class="posttext">(.*?)</div>~si', $rThread, $aPosts);
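A quick way to see the effect of the modifier is to run the pattern with and without it against a small multi-line sample. The sketch below assumes a made-up $rThread string (not the asker's real data); the variable names $withoutS, $withS, $plain and $dotall are only illustrative.

```php
<?php
// Hypothetical sample input; the real $rThread would be the fetched thread HTML.
$rThread = "<div class=\"posttext\">one line</div>\n"
         . "<div class=\"posttext\">first line\n<b>second</b> line</div>";

// Without the s modifier, . stops at "\n", so the multi-line div is skipped.
$withoutS = preg_match_all('~<div class="posttext">(.*?)</div>~', $rThread, $plain);

// With the s modifier, . also matches "\n", so both divs are captured.
$withS = preg_match_all('~<div class="posttext">(.*?)</div>~s', $rThread, $dotall);

echo $withoutS, " match(es) without s\n";
echo $withS, " match(es) with s\n";
print_r($dotall[1]); // captured inner content of each div
```

Note that the lazy (.*?) still stops at the first closing </div> it meets, so a post that itself contains nested div elements would be cut short.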

However, I would avoid regex here and make use of DOM and XPath to do this for you.

$doc = new DOMDocument;
@$doc->loadHTML($html); // load the HTML data ($rThread in the question); @ hides warnings from malformed markup

$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[@class="posttext"]');

foreach ($nodes as $node) {
   echo $node->nodeValue, "\n";
}
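One caveat: nodeValue returns only the text content of each div, with nested tags stripped. Since the question asks for the inner markup as well, a possible variation is to serialize each child node with DOMDocument::saveHTML(). This is a minimal sketch under assumptions: the $html sample string and the $posts array are made up for illustration, and libxml_use_internal_errors() is swapped in for the @ operator.

```php
<?php
// Hypothetical sample input standing in for the fetched thread HTML.
$html = '<html><body><div class="posttext">Hello <b>world</b><br>second line</div>'
      . '<div class="other">skip me</div></body></html>';

$doc = new DOMDocument;
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$posts = [];

foreach ($xpath->query('//div[@class="posttext"]') as $node) {
    $inner = '';
    // Serialize every child node so nested tags are kept, not just the text.
    foreach ($node->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    $posts[] = $inner;
}

print_r($posts);
```

The XPath test @class="posttext" matches only when the class attribute is exactly that string; a div carrying several classes would need a contains()-style expression instead.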
