Using preg_match_all to get items from HTML

Question

I have a number of items in a table, formatted like this:

<td class="product highlighted">
Item Name
</td>

and I am using the following PHP code:

$regex_pattern = "/<td class=\"product highlighted\">(.*)<\/td>/";
preg_match_all($regex_pattern,$buffer,$matches);
print_r($matches);

I am not getting any output, yet I can see the items in the HTML.

Is there something wrong with my regexp?

Answer 1

Apart from your using regex to parse HTML, yes, there is something wrong: the dot doesn't match newlines.

So you need to use

$regex_pattern = "/<td class=\"product highlighted\">(.*?)<\/td>/s";

The /s modifier allows the dot to match any character, including newlines. Note the reluctant quantifier .*? which avoids matching more than one tag at once.
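As a rough illustration of the difference, the sketch below runs both the original pattern and the corrected one against a small sample. The contents of $buffer here are made up for the example (the question does not show how $buffer is filled), and the variable names $original, $fixed and $items are likewise only illustrative.

```php
<?php
// Hypothetical sample input; in the question, $buffer holds the page's HTML.
$buffer = <<<HTML
<table><tr>
<td class="product highlighted">
Item One
</td>
<td class="product highlighted">
Item Two
</td>
</tr></table>
HTML;

// Original pattern: without /s the dot cannot cross the newline after the
// opening tag, so nothing matches.
$original = "/<td class=\"product highlighted\">(.*)<\/td>/";
$original_count = preg_match_all($original, $buffer, $unused);

// Fixed pattern: /s lets the dot match newlines, .*? stops at the nearest </td>.
$fixed = "/<td class=\"product highlighted\">(.*?)<\/td>/s";
$fixed_count = preg_match_all($fixed, $buffer, $matches);

// $matches[1] holds the captured cell contents, including surrounding newlines,
// so trim each one.
$items = array_map('trim', $matches[1]);

echo "original: $original_count, fixed: $fixed_count\n";
print_r($items);
```

Because each cell's text sits on its own line, the captured groups carry leading and trailing newlines; trimming them is a choice made for this sketch, not something the answer prescribes.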

Answer 2

To match your example, you will need to add the dot-all flag, s, so that the . will match newlines.

Try the following.

$regex_pattern = "/<td class=\"product highlighted\">(.*?)<\/td>/s";

Also note that I changed the capture to non-greedy, (.*?). It's best to do so when matching open-ended text.
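To see why the non-greedy capture matters, here is a minimal sketch that keeps the s flag on both patterns and changes only the quantifier. The one-line $html string is a made-up input, chosen so that greediness is the only difference between the two runs.

```php
<?php
// Hypothetical two-cell input on a single line, so only greediness differs.
$html = '<td class="product highlighted">A</td><td class="product highlighted">B</td>';

// Greedy: .* runs to the last </td>, swallowing both cells into one match.
preg_match_all('/<td class="product highlighted">(.*)<\/td>/s', $html, $greedy);

// Non-greedy: .*? stops at the first </td>, giving one match per cell.
preg_match_all('/<td class="product highlighted">(.*?)<\/td>/s', $html, $lazy);

print_r($greedy[1]); // one element containing markup from both cells
print_r($lazy[1]);   // two elements, one per cell
```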

It's worth noting that regular expressions are not the right tool for HTML parsing; you should look into DOMDocument. However, for such a simple match you can get away with regular expressions, provided your HTML is well-formed.
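For comparison, a DOMDocument-based version might look roughly like the sketch below. The answer only names DOMDocument; the use of DOMXPath, the exact-match XPath expression, and the sample $buffer are assumptions made for this example.

```php
<?php
// Hypothetical sample input standing in for the question's $buffer.
$buffer = '<table><tr><td class="product highlighted">
Item One
</td><td class="other">Skip me</td><td class="product highlighted">
Item Two
</td></tr></table>';

$doc = new DOMDocument();
// Collect parser warnings about imperfect markup internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($buffer);
libxml_clear_errors();

// Select td elements whose class attribute is exactly "product highlighted".
$xpath = new DOMXPath($doc);
$items = [];
foreach ($xpath->query('//td[@class="product highlighted"]') as $cell) {
    // textContent includes the newlines around the item name, so trim it.
    $items[] = trim($cell->textContent);
}
print_r($items);
```

The XPath test compares the whole class attribute, so a cell with the classes in a different order or with extra classes would be skipped; a looser contains() test is possible if that matters for your markup.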


Answer 1
6 Accepted 2011-09-12 20:47:55

Answer 2
3 2011-09-12 20:48:35
