
How to write a regex to get the following pattern in PHP?

There is a website, and I would like to get every string matching the pattern <td> (any content) </td>.

So I wrote this:

preg_match("/<td>.*</td>/", $web , $matches);
die(var_dump($matches));

That returns null. How do I fix the problem? Thanks for helping.

OK.

I guess you are just not escaping properly. Also, use groups to capture your matches properly.

<td>(.*)<\/td>

should do. You can try this regex on your given text here. If you are matching ALL td's, remember that PHP has no global flag; use preg_match_all instead of preg_match.
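A minimal sketch of what the capture group and preg_match_all give you. The input string is hypothetical sample data in which each <td> sits on its own line; because . does not match newlines by default, the greedy .* stops at the end of each line here. On a single line holding several cells, the greedy version overshoots, which is what the Lazy Quantifier answer addresses.

```php
<?php
// Hypothetical sample input: one cell per line.
$html = "<tr>\n<td>first</td>\n<td>second</td>\n</tr>";

// preg_match stops after the first match.
// $one[0] is the whole match, $one[1] is what capture group 1 matched.
preg_match('/<td>(.*)<\/td>/', $html, $one);
echo $one[1], "\n"; // first

// preg_match_all collects every match. With the default PREG_PATTERN_ORDER,
// $all[0] holds the full matches and $all[1] holds the group-1 contents.
preg_match_all('/<td>(.*)<\/td>/', $html, $all);
print_r($all[1]); // Array ( [0] => first [1] => second )
```

Using $all[1] rather than $all[0] is what lets you get the cell contents without the surrounding tags.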

Parsing HTML with regex is usually not a good idea; try a DOM parser instead. Example: http://simplehtmldom.sourceforge.net/
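The linked Simple HTML DOM is a third-party library. As a swapped-in alternative, here is a minimal sketch using PHP's built-in DOM extension (DOMDocument), assuming ext-dom is available, as it is in most PHP builds. The sample markup is hypothetical; in practice you would pass the fetched page ($web).

```php
<?php
// Hypothetical sample markup; one cell has an attribute, one has nested tags.
$html = '<table><tr><td class="x">a</td><td>b <b>bold</b></td></tr></table>';

$doc = new DOMDocument();
// Real-world HTML is often not well-formed: collect parser warnings
// internally instead of printing them, then discard them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$cells = [];
foreach ($doc->getElementsByTagName('td') as $td) {
    // textContent returns the cell's text with nested tags stripped.
    $cells[] = $td->textContent;
}
print_r($cells); // Array ( [0] => a [1] => b bold )
```

Unlike the <td>...</td> regex, this also finds cells that carry attributes, such as <td class="x">, and cells whose content spans several lines.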

Test the above regex with:

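// Fetch a sample page that contains HTML tables.
$web = file_get_contents('http://www.w3schools.com/html/html_tables.asp');
// Escaped / in </td>, group 1 captures the cell content, all matches collected.
preg_match_all("/<td>(.*)<\/td>/", $web, $matches);
print_r($matches);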

Lazy Quantifier, Different Delimiter

You need .*? rather than .*, otherwise you can overshoot the closing </td>. Also, your / delimiter needed to be escaped when it appeared in </td>. We can replace it with another delimiter that doesn't need escaping.

Do this:

$regex = '~<td>.*?</td>~';
preg_match_all($regex, $web, $matches);
print_r($matches[0]);

Explanation

  • The ~ is just an aesthetic tweak: you can use any delimiter you like around your regex pattern, and in general ~ is more versatile than /, which needs to be escaped more often, for instance in </td>.
  • The star quantifier in .*? is made "lazy" by the ?, so that the dot only matches as many characters as needed to allow the next token to match (shortest match). Without the ?, the .* first matches as far as it can (to the end of the line, since the dot does not match newlines unless the s modifier is set), then backtracks only as far as needed to allow the next token to match (longest match).
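A short sketch contrasting the two quantifiers and the two delimiters on a hypothetical single-line input with two cells:

```php
<?php
// Hypothetical single-line input with two cells.
$html = '<tr><td>a</td><td>b</td></tr>';

// Greedy: .* runs to the end of the line, then backtracks only to the LAST </td>.
preg_match_all('/<td>.*<\/td>/', $html, $greedy);
print_r($greedy[0]); // one match: <td>a</td><td>b</td>

// Lazy: .*? stops at the FIRST </td> that lets the rest of the pattern match.
preg_match_all('~<td>.*?</td>~', $html, $lazy);
print_r($lazy[0]); // two matches: <td>a</td> and <td>b</td>

// Same lazy pattern with / as the delimiter: the slash in </td> must be escaped.
preg_match_all('/<td>.*?<\/td>/', $html, $lazySlash);
var_dump($lazy[0] === $lazySlash[0]); // bool(true)
```

The delimiter only changes what needs escaping; both lazy patterns return the same matches.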
