preg_match for nested HTML tags

I would like to capture all "dev" tags and their respective content with PHP's preg_match_all(), but I can't get the nested ones.

Data:

<dev>aaa</dev> <dev>bbb</dev> <dev> ccc <dev>ddd</dev> </dev>

My expression so far:

|<dev>(.*)</dev>|Uis

Thanks for your help, b.

Don't use regular expressions for parsing. Use a real parser like DOMDocument or SimpleXML:

$xml = simplexml_load_string('<root>'.$str.'</root>');
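As a minimal sketch of where that line leads, assuming the input is well-formed once wrapped in a root element, the parsed tree can be queried with XPath to collect every "dev" element at any depth. The variable `$contents` and the error handling are illustrative additions, not part of the original answer.

```php
<?php
// Sample data from the question.
$str = '<dev>aaa</dev> <dev>bbb</dev> <dev> ccc <dev>ddd</dev> </dev>';

// Wrap the fragment in a single root element so it parses as XML.
$xml = simplexml_load_string('<root>' . $str . '</root>');
if ($xml === false) {
    throw new RuntimeException('Input is not well-formed XML');
}

// '//dev' selects every <dev> element at any depth, in document order.
$contents = [];
foreach ($xml->xpath('//dev') as $dev) {
    // asXML() returns the element with its tags and any nested markup.
    $contents[] = $dev->asXML();
}

print_r($contents);
```

Because the parser tracks nesting itself, the inner `<dev>ddd</dev>` shows up as its own entry as well as inside its parent.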

You need a recursive matching pattern:

/<dev>((?:[^<]++|<(?!\/?dev>)|(?R))*)<\/dev>/i

That will just swallow any nested elements, so if you then want to parse those, you will have to run the function again on $matches[1].
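A sketch of that "run it again on $matches[1]" idea follows. The helper name `collectDev` is hypothetical, and the pattern is written so that the plain-text alternative cannot run across a `<dev>` or `</dev>` tag, which keeps each match balanced; treat it as one possible formulation rather than the only one.

```php
<?php
// Hypothetical helper: returns the inner content of every <dev> element,
// at every nesting depth, parents before their children.
function collectDev(string $html): array
{
    // [^<]++        text that contains no '<'
    // <(?!\/?dev>)  a '<' that does not begin a <dev> or </dev> tag
    // (?R)          a complete nested <dev>...</dev> (recurse into the whole pattern)
    $pattern = '/<dev>((?:[^<]++|<(?!\/?dev>)|(?R))*)<\/dev>/i';

    $found = [];
    if (preg_match_all($pattern, $html, $matches)) {
        foreach ($matches[1] as $inner) {
            $found[] = $inner;
            // Run the same function on the captured content to reach nested tags.
            foreach (collectDev($inner) as $nested) {
                $found[] = $nested;
            }
        }
    }
    return $found;
}

$str = '<dev>aaa</dev> <dev>bbb</dev> <dev> ccc <dev>ddd</dev> </dev>';
$result = collectDev($str);
print_r($result);
```

The top-level call returns three matches; the recursive call on the third one is what surfaces the nested "ddd".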

The * quantifier is greedy: it consumes as many characters as possible. Use the non-greedy version, *?, to find the smallest possible matches. Regexes may not be the best tool for this, though.
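To make the difference concrete, here is a small comparison on the data from the question. Note that the U modifier in the question's pattern already inverts greediness, so that pattern behaves like the lazy variant below. The variable names `$greedy` and `$lazy` are illustrative.

```php
<?php
$str = '<dev>aaa</dev> <dev>bbb</dev> <dev> ccc <dev>ddd</dev> </dev>';

// Greedy: .* runs to the last </dev> in the input, giving one large match.
preg_match_all('|<dev>(.*)</dev>|is', $str, $greedy);

// Lazy: .*? stops at the first </dev> it can reach.
preg_match_all('|<dev>(.*?)</dev>|is', $str, $lazy);

print_r($greedy[1]);
print_r($lazy[1]);
```

The lazy version separates "aaa" and "bbb" correctly, but for the nested element it stops at the inner closing tag and captures " ccc <dev>ddd". Neither quantifier counts opening and closing tags, which is why a parser or a recursive pattern is needed for nested input.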
