How do I write a regular expression to match the following pattern in PHP?
There is a website, and I would like to get every string that matches the pattern <td> (any content) </td>.
So I wrote this:
preg_match("/<td>.*</td>/", $web , $matches);
die(var_dump($matches));
That returns null. How can I fix the problem? Thanks for helping.
OK. I guess you are just not escaping the / properly. Also use a capturing group so the cell contents are captured properly.
<td>(.*)<\/td>
should do. You can try this regex on your given text here. Don't forget the global flag if you are matching ALL the td's (preg_match_all in PHP).
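As a rough illustration of the two points above (escaping the / and using preg_match_all), here is a minimal sketch. The $web string is a made-up stand-in for the downloaded page, with one cell per line, and the variable names are only for illustration.

```php
<?php
// Hypothetical sample input standing in for the downloaded page.
$web = "<tr>\n<td>Alice</td>\n<td>42</td>\n</tr>";

// Unescaped "/" inside the pattern: the regex ends at "</", the rest is read
// as modifiers, so compilation fails and preg_match returns false.
// The @ only hides the warning for this demonstration.
$broken = @preg_match('/<td>.*</td>/', $web, $ignored);
var_dump($broken); // bool(false)

// Escaped "/" plus a capturing group; preg_match stops at the first cell.
preg_match('/<td>(.*)<\/td>/', $web, $first);
echo $first[1], "\n"; // Alice

// preg_match_all collects every cell; index 1 holds the captured contents.
preg_match_all('/<td>(.*)<\/td>/', $web, $all);
print_r($all[1]); // Alice and 42
```

Index 0 of the result holds the full matches including the tags, and index 1 holds only what the group captured.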
Parsing HTML with regex is usually not a good idea; try to use a DOM parser instead. Example -> http://simplehtmldom.sourceforge.net/
To test the <td>(.*)<\/td> regex above, run:
$web = file_get_contents('http://www.w3schools.com/html/html_tables.asp' );
preg_match_all("/<td>(.*)<\/td>/", $web , $matches);
print_r( $matches);
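For comparison, the DOM-parser route mentioned above could look roughly like this. Note that this sketch swaps in PHP's built-in DOMDocument rather than the Simple HTML DOM library linked in the answer, and the $html sample and $cells variable are assumptions made up for the example.

```php
<?php
// Hypothetical sample markup; in practice this would be the fetched page.
$html = '<table><tr><td class="name">Alice</td><td>42</td></tr></table>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them,
// since real-world HTML is rarely perfectly valid.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Gather the text of every <td>, whether or not the tag has attributes.
$cells = [];
foreach ($doc->getElementsByTagName('td') as $td) {
    $cells[] = trim($td->textContent);
}
print_r($cells); // Alice and 42
```

Unlike the <td>(.*)<\/td> regex, this also finds cells that carry attributes or span several lines.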
Lazy Quantifier, Different Delimiter
You need .*? rather than .*, otherwise you can overshoot the closing </td>. Also, your / delimiter needed to be escaped when it appeared in </td>. We can replace it with another delimiter that doesn't need escaping.
Do this:
$regex = '~<td>.*?</td>~';
preg_match_all($regex, $web, $matches);
print_r($matches[0]);
Explanation
The ~ is just an esthetic tweak: you can use any delimiter you like around your regex pattern, and in general ~ is more versatile than /, which needs to be escaped more often, for instance in </td>.
The star quantifier in .*? is made "lazy" by the ? so that the dot only matches as many characters as needed to allow the next token to match (shortest match). Without the ?, the .* first grabs as many characters as it can (up to the end of the line, since by default the dot does not match newlines), then backtracks only as far as needed to allow the next token to match (longest match).
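To see the difference between the greedy and lazy versions, here is a small sketch. The $row string is a made-up example with two cells on one line, which is exactly the case where .* overshoots.

```php
<?php
// Hypothetical input: two cells on the same line.
$row = '<td>a</td><td>b</td>';

// Greedy: .* runs to the end of the line, then backtracks to the LAST </td>.
preg_match_all('~<td>.*</td>~', $row, $greedy);

// Lazy: .*? expands one character at a time and stops at the FIRST </td>.
preg_match_all('~<td>.*?</td>~', $row, $lazy);

print_r($greedy[0]); // one match covering both cells
print_r($lazy[0]);   // two matches, one per cell
```

Because ~ is the delimiter here, the / in </td> needs no backslash.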