
Using preg_match_all to get items from HTML

I have a number of items in a table, formatted like this:

<td class="product highlighted">
Item Name
</td>

and I am using the following PHP code:

$regex_pattern = "/<td class=\"product highlighted\">(.*)<\/td>/";
preg_match_all($regex_pattern,$buffer,$matches);
print_r($matches);

I am not getting any matches, yet I can see the items in the HTML.

Is there something wrong with my regexp?

Apart from using a regex to parse HTML, yes, there is something wrong: by default the dot doesn't match newlines, and in your markup the opening tag is followed by a line break.

So you need to use

$regex_pattern = "/<td class=\"product highlighted\">(.*?)<\/td>/s";

The /s modifier allows the dot to match any character, including newlines. Note the reluctant quantifier .*?, which avoids matching more than one tag at once.
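To make the difference concrete, here is a minimal sketch comparing the original pattern with the corrected one. The contents of $buffer are a made-up sample (the question does not show the real HTML), and the trim step is an extra suggestion, since the captured text includes the line breaks around each item name.

```php
<?php
// Hypothetical sample input standing in for $buffer from the question.
$buffer = "<table><tr>\n"
        . "<td class=\"product highlighted\">\nItem One\n</td>\n"
        . "<td class=\"product highlighted\">\nItem Two\n</td>\n"
        . "</tr></table>";

// Original pattern: without /s the dot cannot cross the newline
// that follows the opening tag, so nothing matches.
$without_s = preg_match_all(
    "/<td class=\"product highlighted\">(.*)<\/td>/",
    $buffer,
    $m1
);

// Corrected pattern: /s lets the dot match newlines,
// and .*? stops at the first closing </td>.
$with_s = preg_match_all(
    "/<td class=\"product highlighted\">(.*?)<\/td>/s",
    $buffer,
    $m2
);

// The captures still contain the surrounding newlines, so trim them.
$items = array_map('trim', $m2[1]);

print_r($items);
```

With this sample, the first call finds no matches and the second finds both cells.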

In order to match your example, you will need to add the dot-all flag, s, so that the . will match newlines.

Try the following:

$regex_pattern = "/<td class=\"product highlighted\">(.*?)<\/td>/s";

Also note that I changed the capture to non-greedy, (.*?). It's best to do so when matching open-ended text.
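As a rough illustration of why the non-greedy form matters, the sketch below runs both quantifiers over two cells placed on one line. The $html string is an invented example, not the asker's data.

```php
<?php
// Hypothetical input: two matching cells back to back.
$html = "<td class=\"product highlighted\">A</td>"
      . "<td class=\"product highlighted\">B</td>";

// Greedy: .* runs to the LAST </td>, swallowing both cells in one match.
preg_match_all("/<td class=\"product highlighted\">(.*)<\/td>/s", $html, $greedy);

// Non-greedy: .*? stops at the FIRST </td>, giving one match per cell.
preg_match_all("/<td class=\"product highlighted\">(.*?)<\/td>/s", $html, $lazy);

print_r($greedy[1]);
print_r($lazy[1]);
```

Here the greedy pattern yields a single capture that spans from A through B, markup included, while the non-greedy pattern yields A and B separately.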

It's worth noting that regular expressions are not the right tool for HTML parsing; you should look into DOMDocument. However, for such a simple match you can get away with regular expressions, provided your HTML is well-formed.
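For comparison, a DOMDocument version might look like the sketch below. It assumes the dom extension is available, uses a made-up $buffer, and assumes the class attribute is exactly "product highlighted"; cells with extra or reordered classes would need a different XPath expression.

```php
<?php
// Hypothetical sample input standing in for $buffer from the question.
$buffer = "<table><tr>"
        . "<td class=\"product highlighted\">\nItem One\n</td>"
        . "<td class=\"other\">Skip</td>"
        . "<td class=\"product highlighted\">\nItem Two\n</td>"
        . "</tr></table>";

// Collect parser warnings internally instead of printing them;
// real-world HTML is often not perfectly valid.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($buffer);
libxml_clear_errors();

// Select every <td> whose class attribute is exactly "product highlighted".
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//td[@class="product highlighted"]');

$items = [];
foreach ($nodes as $node) {
    // textContent holds the cell text; trim the surrounding line breaks.
    $items[] = trim($node->textContent);
}

print_r($items);
```

This approach does not care whether the cell contents span several lines, which is the problem the regex needed the /s modifier for.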

