
How to grab an HTML tag's content with preg_match_all

I have some HTML that contains this:

<table class="qprintable2" width="100%" cellpadding="4" cellspacing="0" border="0">
content goes here !
</table>

I have this function to match the content inside the tag:

function getTextBetweenTags($string, $tagname)
{
  $pattern = "/<table class=\"class1\" width=\"100%\" cellpadding=\"4\" cellspacing=\"0\" border=\"0\">(.*?)<\/$tagname>/"; 
  preg_match_all($pattern, $string, $matches);
  return $matches[1];
}

But it doesn't match anything, so I would highly appreciate it if you could give me a good pattern for this :(

You should avoid this, but you can use a regex like:

preg_match('#<table[^>]+>(.+?)</table>#ims', $str, $matches);  // $matches[1] holds the captured content

The various tricks here are:

  • the ims modifiers: s makes "." also match newlines, i makes the match case-insensitive, and m makes ^ and $ match at line boundaries (harmless here, since the pattern uses neither)
  • using # instead of / to enclose the regex, so you don't have to escape the slash in HTML closing tags
  • using [^>]+ to keep the pattern unspecific and avoid listing individual HTML attributes (more reliable)
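
To tie these points back to the question: the original pattern hardcodes class="class1" while the sample markup uses class="qprintable2", and without the s modifier "." stops at the newline after the opening tag, so nothing can match. Below is a minimal sketch of a corrected helper. The name getTableContents and its $class parameter are made up for illustration, and it assumes the class attribute is written with double quotes and holds exactly one class name.

```php
<?php
// Hypothetical helper (not from the original post): returns the inner content
// of every <table> whose class attribute is exactly $class.
function getTableContents(string $html, string $class): array
{
    // '#' is the delimiter, so the '/' in </table> needs no escaping.
    // preg_quote() escapes any regex metacharacters in $class.
    // 's' lets '.' cross newlines; 'i' ignores case.
    $pattern = '#<table\b[^>]*\bclass="' . preg_quote($class, '#') . '"[^>]*>(.*?)</table>#is';
    preg_match_all($pattern, $html, $matches);
    return $matches[1]; // one entry per matching table
}

$html = <<<HTML
<table class="qprintable2" width="100%" cellpadding="4" cellspacing="0" border="0">
content goes here !
</table>
HTML;

$contents = getTableContents($html, 'qprintable2');
echo trim($contents[0]);
```

This still breaks on nested tables, because the lazy (.*?) stops at the first </table> it sees, which is one reason a parser is the safer choice.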

While this is a case where regexes would work okayish, the general consensus is that you should use QueryPath or phpQuery (or the like) to extract HTML. It's also much simpler:

qp($html)->find("table")->text();  //would return just the text content
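
QueryPath and phpQuery are third-party libraries that have to be installed first. If that is not an option, roughly the same thing can be sketched with PHP's bundled DOM extension (DOMDocument and DOMXPath). This swaps in a different tool from the one named above. The function name getTableText is hypothetical, the sample markup wraps the text in a row and cell, and the code assumes $class contains no double quotes.

```php
<?php
// Hypothetical helper using PHP's DOM extension instead of QueryPath.
function getTableText(string $html, string $class): array
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $texts = [];
    // Assumption: $class has no double quotes, so it can be inlined in the XPath.
    foreach ($xpath->query('//table[@class="' . $class . '"]') as $table) {
        $texts[] = trim($table->textContent); // text only, tags dropped
    }
    return $texts;
}

$sample = '<table class="qprintable2"><tr><td>content goes here !</td></tr></table>';
$texts = getTableText($sample, 'qprintable2');
echo $texts[0];
```

Unlike the regex, this copes with nested tables, attributes in any order, and either quote style.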
