Regex - Match everything except HTML tags

I've searched for this but couldn't find a solution that worked for me. I need a regex pattern that will match all text except HTML tags, so I can convert it to Cyrillic (doing that to the whole string would obviously ruin the HTML =))

So, for example:

<p>text1</p>
<p>text2 <span class="theClass">text3</span></p>

I need to match text1, text2, and text3, so something like

preg_match_all("/pattern/", $text, $matches)

and then I would just iterate over the matches. Or, if it can be done with preg_replace (replacing text1/2/3 with textA/B/C), that would be even better.

As you probably know, regex is not a great choice for this (the general advice here will be to use a DOM parser).

However, if you need a quick regex solution, you can use this (see demo):

<[^>]*>(*SKIP)(*F)|[^<]+

How this works: on the left, <[^>]*> matches a complete <tag>. Then (*SKIP)(*F) forces that match attempt to fail and makes the engine resume at the position just after the last character of the matched tag, so no match can ever start inside a tag. On the right, [^<]+ matches each run of text between tags.
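A minimal PHP sketch of the pattern in use, assuming the question's sample markup sits in a single string with no newlines between the tags. The ~ delimiters and the variable names are just choices for this illustration.

```php
<?php
// Left alternative: a whole tag, which (*SKIP)(*F) discards.
// Right alternative: a run of characters that are not '<', i.e. text.
$pattern = '~<[^>]*>(*SKIP)(*F)|[^<]+~';

$html = '<p>text1</p><p>text2 <span class="theClass">text3</span></p>';

preg_match_all($pattern, $html, $matches);

// $matches[0] holds the full-pattern matches: the text between tags.
$texts = $matches[0];
var_dump($texts);
```

Note that the second match keeps its trailing space ("text2 "), because [^<]+ only stops at the next '<'.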

This is an application of a general technique for excluding patterns from matches (read the linked question for more details).
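The same pattern also works with preg_replace_callback, which covers the replacement case the question asks about. This is a sketch only: replaceTextOutsideTags() is a hypothetical helper, strtoupper() stands in for the real transformation, and the tiny strtr() map is an illustrative, incomplete Latin-to-Cyrillic table, not a standard transliteration.

```php
<?php
// Hypothetical helper: applies $transform to every run of text outside tags.
function replaceTextOutsideTags(string $html, callable $transform): string
{
    $result = preg_replace_callback(
        '~<[^>]*>(*SKIP)(*F)|[^<]+~',
        // $m[0] is the matched text run; tags never reach the callback.
        fn(array $m): string => $transform($m[0]),
        $html
    );
    // preg_replace_callback() returns null on a regex error.
    if ($result === null) {
        throw new RuntimeException('preg_replace_callback failed');
    }
    return $result;
}

$html = '<p>text1</p><p>text2 <span class="theClass">text3</span></p>';

// Stand-in transform: tags and attributes (class="theClass") stay untouched.
$upper = replaceTextOutsideTags($html, 'strtoupper');

// Illustrative, incomplete Latin-to-Cyrillic map (an assumption, not a standard).
$cyr = replaceTextOutsideTags($html, fn(string $s): string => strtr($s, [
    't' => 'т', 'e' => 'е', 'x' => 'х',
]));

echo $upper, "\n", $cyr, "\n";
```

A real transliteration would need a full character map, but the callback structure stays the same.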

If you don't want matches to span several lines, add \r\n to the negated character class that does the matching, like so:

<[^>]*>(*SKIP)(*F)|[^<\r\n]+
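To see the difference, here is a small PHP comparison on a made-up multi-line snippet. With the first pattern, a text run can contain a line break, and the bare newline between two tags is itself a match; the variant only yields single-line matches.

```php
<?php
$html = "<p>line one\nline two</p>\n<p>text</p>";

// Original pattern: text runs may contain newlines.
preg_match_all('~<[^>]*>(*SKIP)(*F)|[^<]+~', $html, $m1);

// Variant: \r and \n are excluded, so each match stays on one line.
preg_match_all('~<[^>]*>(*SKIP)(*F)|[^<\r\n]+~', $html, $m2);

var_dump($m1[0], $m2[0]);
```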

How about this regex:

/(?<=>)[\w\s]+(?=<)/g
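A PHP sketch of this regex. The /g flag is JavaScript-style syntax; PHP's preg_* functions reject it as an unknown modifier, and preg_match_all already returns every match, so it is dropped here. The second call illustrates a limitation worth knowing: only word characters and whitespace may appear between '>' and '<', so text containing punctuation is not matched at all.

```php
<?php
// No 'g' modifier in PHP; preg_match_all finds all matches by itself.
$pattern = '/(?<=>)[\w\s]+(?=<)/';

$html = '<p>text1</p><p>text2 <span class="theClass">text3</span></p>';
preg_match_all($pattern, $html, $m);
var_dump($m[0]); // runs of word characters/whitespace between '>' and '<'

// Limitation: ',' and '!' are not in [\w\s], so nothing matches here.
$count = preg_match_all($pattern, '<p>Hello, world!</p>', $m2);
var_dump($count);
```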

Online Demo

Please use PHP's DOMDocument class to parse the HTML content:

PHP Doc
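A minimal sketch of the DOM approach, assuming a hypothetical helper transformTextNodes() and again using strtoupper() as a stand-in transform. It visits every text node through DOMXPath, so tag names and attribute values are never touched. Non-ASCII output such as Cyrillic may need explicit UTF-8 handling when loading and saving, which this sketch does not cover.

```php
<?php
// Hypothetical helper: applies $transform to every text node via the DOM.
function transformTextNodes(string $html, callable $transform): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings quietly
    $doc->loadHTML($html);            // wraps the fragment in <html><body>
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//text()') as $textNode) {
        // DOMText::$data is the raw text; tags and attributes are not text nodes.
        $textNode->data = $transform($textNode->data);
    }

    // Serialize only the children of <body>, i.e. the original fragment.
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$html = '<p>text1</p><p>text2 <span class="theClass">text3</span></p>';
$result = transformTextNodes($html, 'strtoupper');
echo $result, "\n";
```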
