Search HTML string using regular expression and store in array in PHP

I need to search for a string that can look like this:

<div class="icon_star">&nbsp;</div>

or

<div class="icon_star"></div>

or

<div class="icon_star"> </div>

I need to find the strings above in HTML that could look like this:

<h1 class="redword" tag="h1">
   <span class="BASE">good</span>
</h1>
<span class="headword-definition">&#160;-&#160;definition</span>
</span>
<div class="icon_star"></div>
<!-- End of DIV icon_star-->

<div class="icon_star"></div>
<!-- End of DIV icon_star-->

<div class="icon_star"></div>
<!-- End of DIV icon_star-->

</div><!-- End of DIV -->

<div class="headbar">
   <div id="helplinks-box" class="responsive_hide_on_smartphone">  

The string we are trying to find and store in an array can occur multiple times.
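A minimal sketch of the requirement as stated (any of the three variants, possibly repeated, collected into an array). It assumes the text between the tags is only whitespace and/or &nbsp; entities; the pattern and variable names are illustrative, not taken from the question.

```php
<?php
// Illustrative pattern: an icon_star div whose body is only whitespace
// and/or &nbsp; entities. \s+ also tolerates extra spaces before "class".
$pattern = '~<div\s+class="icon_star">(?:\s|&nbsp;)*</div>~i';

$variants = [
    '<div class="icon_star">&nbsp;</div>',
    '<div class="icon_star"></div>',
    '<div class="icon_star"> </div>',
];

// Join the variants with the comments seen in the question's HTML.
$html = implode("\n<!-- End of DIV icon_star-->\n", $variants);

// preg_match_all() returns the number of full matches;
// $matches[0] holds the matched strings themselves.
$count = preg_match_all($pattern, $html, $matches);

echo $count, "\n"; // 3
print_r($matches[0]);
```

Using ~ as the delimiter avoids having to escape the slash in </div>.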

I have tried using the following regex:

preg_match_all ('/<div(\s)+class="icon_star">(.*?)<\/div>/i', $html1, $result_array1);

The regex above does not work when the HTML to be searched is:

<div id="headword">
    <div id="headwordright">
        <div style="display: none;" id="showmore"><a class="button" onmousedown="foldingSet(false)"><span class="label">Show more</span></a>
        </div><!-- End of DIV -->
        <div id="showless"><a class="button" onmousedown="foldingSet(true)"><span class="label">Show less</span></a>
        </div><!-- End of DIV -->
    </div><!-- End of DIV -->
    <span class="BASE-FORM">
        <h1 tag="h1" class="redword"><span class="BASE">scenario</span></h1>
        <span class="headword-definition">&nbsp;-&nbsp;definition</span>
    </span>
    <div class="icon_star">&nbsp;</div><!-- End of DIV icon_star-->
</div>

Update

It seems that you are reading your regex results the wrong way. Executing

preg_match_all('/<div(\s)+class="icon_star">.*?<\/div>/i', $html, $result_array1);

// escape every captured string so the markup shows up as text in the browser
for ($x = 0; $x < count($result_array1); $x++) {
    $result_array1[$x] = array_map('htmlentities', $result_array1[$x]);
}

echo '<pre>' . print_r($result_array1, true);

prints out

   Array
   (
       [0] => Array
       (
           [0] => <div class="icon_star">&nbsp;</div>
       )

       [1] => Array
       (
           [0] =>  
       )

   )   

so you should be checking count($result_array1[0]) instead of count($result_array1).
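To make the pitfall concrete: with the default PREG_PATTERN_ORDER, the result array has one entry for the full matches plus one per capture group, so its own count never reflects how many divs were found. A minimal sketch using the answer's pattern on a made-up $html string:

```php
<?php
$html = '<div class="icon_star">&nbsp;</div>'
      . '<div class="icon_star"></div>'
      . '<div class="icon_star"> </div>';

// The answer's pattern: one capture group, (\s)+
$pattern = '/<div(\s)+class="icon_star">.*?<\/div>/i';
$found = preg_match_all($pattern, $html, $result_array1);

// Index 0 (full matches) + index 1 (the (\s)+ group): always 2 here.
echo count($result_array1), "\n";    // 2

// Either of these gives the number of divs actually matched.
echo count($result_array1[0]), "\n"; // 3
echo $found, "\n";                   // 3

// Even with no match, the outer array still holds 2 (empty) entries.
$none = preg_match_all($pattern, '<p>no stars here</p>', $empty);
echo $none, ' ', count($empty), "\n"; // 0 2
```

Using the return value of preg_match_all() directly is the simplest way to get the match count.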

Side note

Instead of parsing HTML with regex, you could use the DOMDocument class built into PHP, if you can.
The following code extracts the three divs.

Be aware that this method is most reliable with valid HTML; malformed markup is repaired by the parser, which may not produce the tree you expect.

  // your HTML, with tags added to make it valid
  $html = '<div>
     <h1 class="redword" tag="h1">
        <span class="BASE">good</span>
     </h1>
     <span class="headword-definition"><span>&#160;-&#160;definition</span></span>
     <div class="icon_star"></div>
     <div class="icon_star"></div>
     <div class="icon_star"></div>
  </div>
  <div class="headbar">
     <div id="helplinks-box" class="responsive_hide_on_smartphone">
     </div>
  </div>';

  $dom = new DOMDocument();
  // @ suppresses the warnings libxml emits for markup it has to repair
  @$dom->loadHTML($html);
  $x = new DOMXPath($dom);

  //this xpath query looks for all nodes whose "class" attribute contains the substring "icon_star"
  $nodes = $x->query("//*[contains(@class, 'icon_star')]");

  $res = '';
  foreach($nodes as $node) {
     /**
      * @var $node DOMElement
      */
     $res .= $dom->saveHTML($node);
  }

  echo htmlentities($res);
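Since the question asks to store the matches in an array, here is a minimal sketch building on the DOMDocument approach above. Assumptions: the helper name findByClass is hypothetical; it uses libxml_use_internal_errors() instead of @ so parser warnings are collected rather than suppressed, and it matches icon_star as a whole class token rather than the substring test that contains(@class, 'icon_star') performs. The class name is interpolated straight into the XPath expression, so it must not contain a single quote.

```php
<?php
// Hypothetical helper: returns the outer HTML of every element whose
// class list contains the exact token $class (not just a substring).
function findByClass(string $html, string $class): array
{
    $dom = new DOMDocument();

    // Collect libxml parse warnings instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Padding with spaces makes the class match as a whole token,
    // so class="icon_star_big" is not picked up.
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    $result = [];
    foreach ($xpath->query($query) as $node) {
        $result[] = $dom->saveHTML($node); // one string per element
    }
    return $result;
}

$html = '<div>
  <div class="icon_star"></div>
  <div class="icon_star big">&nbsp;</div>
  <div class="icon_star_big"></div>
</div>';

$stars = findByClass($html, 'icon_star');
echo count($stars), "\n"; // 2
```

Collecting each element into $result, rather than concatenating into one string as $res does above, keeps every match separate.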

You may find these Stack Overflow questions useful:
How do you parse and process HTML/XML in PHP?
Getting DOM elements by classname

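Note: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.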

 
© 2020-2024 STACKOOM.COM