Search HTML string using regular expression and store in array in PHP
I need to search for a string, which can be something like this:
<div class="icon_star"> </div>
or
<div class="icon_star"></div>
or
<div class="icon_star"> </div>
I need to search for the above strings in HTML, which could look something like this:
<h1 class="redword" tag="h1">
<span class="BASE">good</span>
</h1>
<span class="headword-definition"> - definition</span>
</span>
<div class="icon_star"></div>
<!-- End of DIV icon_star-->
<div class="icon_star"></div>
<!-- End of DIV icon_star-->
<div class="icon_star"></div>
<!-- End of DIV icon_star-->
</div><!-- End of DIV -->
<div class="headbar">
<div id="helplinks-box" class="responsive_hide_on_smartphone">
The string we are trying to find and store in an array can occur multiple times.
I have tried using the following regex:
preg_match_all ('/<div(\s)+class="icon_star">(.*?)<\/div>/i', $html1, $result_array1);
The above regex does not work when the HTML to be searched is:
<div id="headword">
<div id="headwordright">
<div style="display: none;" id="showmore"><a class="button" onmousedown="foldingSet(false)"><span class="label">Show more</span></a>
</div><!-- End of DIV -->
<div id="showless"><a class="button" onmousedown="foldingSet(true)"><span class="label">Show less</span></a>
</div><!-- End of DIV -->
</div><!-- End of DIV -->
<span class="BASE-FORM">
<h1 tag="h1" class="redword"><span class="BASE">scenario</span></h1>
<span class="headword-definition"> - definition</span>
</span>
<div class="icon_star"> </div><!-- End of DIV icon_star-->
</div>
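A minimal sketch of a whitespace-tolerant pattern, assuming the icon_star div only ever contains whitespace (as in the three variants listed at the top). Note that `.` does not match a newline unless the `s` modifier is set, so `\s*` is used for the div content instead of `.*?`. The function name `findIconStars` and the third sample string (a newline inside the div) are hypothetical, added for illustration.

```php
<?php
// Hypothetical helper: find icon_star divs whose content is empty or whitespace only.
// \s* also matches newlines, unlike .*? without the "s" modifier.
function findIconStars(string $html): array
{
    $pattern = '/<div\s+class="icon_star">\s*<\/div>/i';
    preg_match_all($pattern, $html, $matches);
    // The pattern has no capture groups, so $matches[0] holds every full match.
    return $matches[0];
}

// Sample input: a space, nothing, and (hypothetically) a newline inside the div.
$html = '<div class="icon_star"> </div><!-- End of DIV icon_star-->'
      . "\n<div class=\"icon_star\"></div>"
      . "\n<div class=\"icon_star\">\n</div>";

$found = findIconStars($html);
echo count($found), "\n"; // 3
```

If the div may contain other text, `'/<div\s+class="icon_star">.*?<\/div>/is'` (lazy `.*?` plus the `s` modifier) is the closer equivalent of the original pattern.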
Update
It seems that you are reading your regex results the wrong way. Executing
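// Match every icon_star div; (\s)+ is capture group 1
preg_match_all('/<div(\s)+class="icon_star">.*?<\/div>/i', $html, $result_array1);
// Escape each match so the tags are visible when printed in a browser
for ($x = 0; $x < count($result_array1); $x++) {
    $result_array1[$x] = array_map('htmlentities', $result_array1[$x]);
}
echo '<pre>' . print_r($result_array1, true) . '</pre>';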
prints out:
Array
(
[0] => Array
(
[0] => <div class="icon_star"> </div>
)
[1] => Array
(
[0] =>
)
)
so you should be checking the count of $result_array1[0] instead of $result_array1. Index 0 holds the full matches, while index 1 holds what the (\s) capture group matched.
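To illustrate the point, here is a small sketch using the asker's pattern against a trimmed-down version of the second HTML sample (the `$html` string is abbreviated for the example). Because of the `(\s)+` capture group, `count($result)` is 2 whether or not anything matched, while `count($result[0])` and the return value of `preg_match_all()` give the number of divs found.

```php
<?php
// The asker's pattern: (\s)+ is capture group 1, so preg_match_all (default
// PREG_PATTERN_ORDER) fills $result[0] with full matches and $result[1] with group 1.
$html = "<div id=\"headword\">\n"
      . "<div class=\"icon_star\"> </div><!-- End of DIV icon_star-->\n"
      . "</div>";

$numMatches = preg_match_all('/<div(\s)+class="icon_star">.*?<\/div>/i', $html, $result);

echo count($result), "\n";    // 2: one slot for the full match, one for group 1
echo count($result[0]), "\n"; // 1: the number of icon_star divs found
echo $numMatches, "\n";       // 1: preg_match_all also returns the match count
```

Writing `\s+` instead of `(\s)+` drops the unused capture group, leaving only `$result[0]`.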
Side note
Instead of parsing HTML with regex, you could use the DOMDocument class built into PHP, if you can.
The following code extracts the three divs.
Be aware that you need valid HTML for this method to work.
//your HTML, with tags added to make it valid
$html = '<div>
<h1 class="redword" tag="h1">
<span class="BASE">good</span>
</h1>
<span class="headword-definition"><span> - definition</span></span>
<div class="icon_star"></div>
<div class="icon_star"></div>
<div class="icon_star"></div>
</div>
<div class="headbar">
<div id="helplinks-box" class="responsive_hide_on_smartphone">
</div>
</div>';
$dom = new DOMDocument();
@$dom->loadHTML($html); // @ suppresses libxml warnings about malformed HTML
$x = new DOMXPath($dom);
//this xpath query looks for all nodes whose "class" attribute contains the substring "icon_star"
$nodes = $x->query("//*[contains(@class, 'icon_star')]");
$res = '';
foreach($nodes as $node) {
/**
* @var $node DOMElement
*/
$res .= $dom->saveHTML($node);
}
echo htmlentities($res);
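A variation on the DOMDocument approach, sketched under two assumptions: only `div` elements are wanted, and `icon_star` should match as a whole class name. The query `contains(@class, 'icon_star')` is a substring test, so it would also select something like `class="icon_star_big"`; the `concat(' ', normalize-space(@class), ' ')` idiom below checks for the whole token instead. It also swaps the `@` operator for `libxml_use_internal_errors()`. The function name `extractIconStars` and the sample classes `icon_star big` / `icon_star_big` are hypothetical.

```php
<?php
// Hypothetical helper: collect icon_star divs with DOMDocument + DOMXPath.
function extractIconStars(string $html): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of suppressing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Match "icon_star" as a whole class token, so class="icon_star_big"
    // is not picked up the way a plain contains() would.
    $query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' icon_star ')]";

    $result = [];
    foreach ($xpath->query($query) as $node) {
        $result[] = $dom->saveHTML($node); // outer HTML of each matching div
    }
    return $result;
}

$html = '<div><div class="icon_star"></div><div class="icon_star big"> </div>'
      . '<div class="icon_star_big"></div></div>';

$divs = extractIconStars($html);
echo count($divs), "\n"; // 2
```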
You could read the following useful questions on Stack Overflow:
How do you parse and process HTML/XML in PHP?
Getting DOM elements by classname