How to write a regular expression to get the following pattern in PHP?

Question

There is a website, and I would like to get every string matching the pattern <td> (any content) </td>.

So I wrote this:

preg_match("/<td>.*</td>/", $web , $matches);
die(var_dump($matches));

That returns null. How can I fix the problem? Thanks for helping.

Answer 1

OK.

I guess the only problem is that you are not escaping properly: the / in </td> collides with the / delimiter. Also use a group to capture the cell content.

<td>(.*)<\/td>

should do. You can try this regex on your given text here. Don't forget the global flag if you are matching ALL tds (preg_match_all in PHP).
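To make the escaping point concrete, here is a minimal sketch. The sample string in $web is a hypothetical stand-in for the fetched page, with one cell per line so that the greedy (.*) stays within a single cell. It shows the unescaped pattern failing to compile and the escaped pattern with a capture group returning the cell contents.

```php
<?php
// Hypothetical sample input standing in for the fetched page.
$web = "<tr>\n<td>Alice</td>\n<td>42</td>\n</tr>";

// Unescaped "/" inside a "/"-delimited pattern: the pattern is taken to end
// at "<td>.*<" and "td>/" is read as modifiers, so compilation fails.
// The @ only silences the warning for this demonstration.
$result = @preg_match("/<td>.*</td>/", $web, $broken);
var_dump($result); // false: the pattern did not compile

// Escaped delimiter plus a capture group, as suggested in this answer.
$count = preg_match_all("/<td>(.*)<\/td>/", $web, $matches);

// $matches[0] holds the full matches, $matches[1] the captured contents.
print_r($matches[1]);
```

With preg_match_all, index 0 of $matches holds the whole <td>...</td> strings and index 1 holds only what the group captured.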

Parsing HTML with regex is usually not a good idea; try a DOM parser instead. Example -> http://simplehtmldom.sourceforge.net/

Test the above regex with:

$web = file_get_contents('http://www.w3schools.com/html/html_tables.asp');
preg_match_all("/<td>(.*)<\/td>/", $web, $matches);
print_r($matches);
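For the DOM-parser route mentioned above, a rough sketch might look like the following. Note that it uses PHP's built-in DOMDocument class rather than the Simple HTML DOM library linked in this answer, and the markup in $web is a hypothetical sample, not the real page.

```php
<?php
// Hypothetical sample markup; in practice this would be the fetched page.
$web = '<table><tr><td>Alice</td><td class="n">42</td></tr></table>';

// Real-world pages are rarely valid HTML, so collect parser warnings
// internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($web);
libxml_clear_errors();

// Collect the text of every <td> element, with or without attributes.
$cells = [];
foreach ($dom->getElementsByTagName('td') as $td) {
    $cells[] = trim($td->textContent);
}

print_r($cells);
```

Unlike the literal <td> in the regex, this also picks up cells that carry attributes, such as <td class="n">.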

Answer 2

Lazy Quantifier, Different Delimiter

You need .*? rather than .* , otherwise you can overshoot the closing </td>. Also, your / delimiter needed to be escaped when it appeared in </td>. We can replace it with another delimiter that doesn't need escaping.

Do this:

$regex = '~<td>.*?</td>~';
preg_match_all($regex, $web, $matches);
print_r($matches[0]);

Explanation

The ~ is just an aesthetic tweak: you can use any delimiter you like around your regex pattern, and in general ~ is more versatile than / , which needs to be escaped more often, for instance in </td>.
The star quantifier in .*? is made "lazy" by the ? so that the dot only matches as many characters as needed to allow the next token to match (shortest match). Without the ? , the .* first matches as far as it can (to the end of the line, since the dot does not match newlines by default), then backtracks only as far as needed to allow the next token to match (longest match).
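The difference between the two quantifiers is easiest to see on a single line containing two cells. This is a small sketch in which the $web string is a made-up sample:

```php
<?php
// Hypothetical sample: two cells on one line.
$web = '<td>a</td><td>b</td>';

// Greedy: .* runs to the end of the line, then backtracks to the LAST </td>.
preg_match_all('~<td>.*</td>~', $web, $greedy);

// Lazy: .*? stops at the FIRST </td> that lets the pattern match.
preg_match_all('~<td>.*?</td>~', $web, $lazy);

print_r($greedy[0]); // one match spanning both cells
print_r($lazy[0]);   // two matches, one per cell
```

The greedy version appears to work on pages where every cell sits on its own line, which is why the overshoot can go unnoticed until two cells share a line.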


2 answers

Answer 1
2 accepted 2014-07-28 10:41:48

Answer 2
1 2014-07-28 10:56:36