
How do I strip attributes (except the style attribute) from HTML using regular expressions?

Original code:

<div style="height:100px;" id="main" >
<a href="133"></a>
<blockquote color="123">

After replacement:

<div style="height:100px;" >
<a></a>
<blockquote>

I tried this regex, but it doesn't work:

preg_replace('#<(div|span|a|img|ul|li|blockquote).*( style=".*")?(.*)>#Us', '<$1$2>', $content);
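To see why this attempt loses the style attribute, it helps to transliterate it into JavaScript. The sketch below assumes the usual reading of PHP's `U` modifier, which makes every quantifier lazy by default, so `.*` is written as `.*?` and the optional group `( style=".*")?` as `( style=".*?")??`. The names `askerPattern` and `askerReplace` are hypothetical. Because the lazy `??` tries skipping the group first, and `(.*?)>` can always reach the next `>`, the style group never participates and `$2` is always empty.

```javascript
// Editorial transliteration of the question's PHP pattern (a sketch, not the asker's code).
// Assumption: PHP's U modifier makes every quantifier lazy, so each one is written lazy here:
//   .*  ->  .*?        ( style=".*")?  ->  ( style=".*?")??
// The s flag mirrors PHP's s modifier (dot also matches newlines).
const askerPattern = /<(div|span|a|img|ul|li|blockquote).*?( style=".*?")??(.*?)>/gs;

function askerReplace(content) {
  // The lazy ?? skips group 2 first, and (.*?)> then always reaches the next ">",
  // so group 2 never matches and "$2" is replaced by an empty string.
  return content.replace(askerPattern, "<$1$2>");
}

const questionHtml = [
  '<div style="height:100px;" id="main" >',
  '<a href="133"></a>',
  '<blockquote color="123">',
].join("\n");

console.log(askerReplace(questionHtml));
// <div>
// <a></a>
// <blockquote>
```

Under that reading of `U`, every matched tag comes out bare, including the div whose style should have been kept.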

Can anyone help me solve this problem? Thank you!

I don't recommend regex for this, but this will probably work.

Edit: fixed the optional group, which was in the wrong place.

Test case here: http://ideone.com/vRk1u

$content = preg_replace('~
( < (?:div|span|a|img|ul|li|blockquote) (?=\s) )         # 1
   (?=
     (?:
        (?:[^>"\']|"[^"]*"|\'[^\']*\')*?
        (                                                      # 2
          \s  style \s*=
          (?: (?>  \s* ([\'"]) \s* (?:(?!\g{-1}) .)* \s* \g{-1} )  #3
            | (?>  (?!\s*[\'"]) \s* [^\s>]* (?=\s|>) )
          )
        )
     )?
   )
  \s* (?:".*?"|\'.*?\'|[^>]*?)+
( /?> )                                                  # 4
~xs', '$1$2$4', $content);
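JavaScript has no free-spacing `x` flag, no atomic groups `(?>...)` and no relative back-references like `\g{-1}`, so the PCRE pattern above cannot be pasted into a JavaScript regex as-is. What follows is a simplified adaptation of the same idea, not a faithful port: a lookahead captures the style attribute without consuming anything, and the rest of the tag is then consumed lazily while quoted values are treated as single units. The names `styleLookahead` and `stripAttributesExceptStyle` are hypothetical. Like the original, it only touches the tags listed in the question, and only when the tag has at least one attribute.

```javascript
// Simplified JavaScript adaptation of the lookahead idea (an editorial sketch, not a faithful port).
// Pieces, in order:
//   group 1:        "<tag", only when whitespace follows the tag name (the tag has attributes)
//   lookahead:      search for a style attribute without consuming text, skipping quoted
//                   values as whole units; group 2 captures the attribute if found
//   lazy consumer:  the rest of the tag, again treating quoted values as units
//   group 3:        the closing ">" or "/>"
const styleLookahead =
  /(<(?:div|span|a|img|ul|li|blockquote)(?=\s))(?=(?:(?:[^>"']|"[^"]*"|'[^']*')*?(\sstyle\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)))?)(?:"[^"]*"|'[^']*'|[^>"'])*?(\/?>)/g;

function stripAttributesExceptStyle(html) {
  // An unmatched group 2 (no style attribute) is replaced by an empty string.
  return html.replace(styleLookahead, "$1$2$3");
}

const markup = [
  '<div style="height:100px;" id="main" >',
  '<a href="133"></a>',
  '<blockquote color="123">',
  '<img src="x.png" style=\'border:0\' />',
].join("\n");

console.log(stripAttributesExceptStyle(markup));
// <div style="height:100px;">
// <a></a>
// <blockquote>
// <img style='border:0'/>
```

Tags without attributes, such as a bare `<div>`, never satisfy the `(?=\s)` check and are left untouched.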

I don't have PHP available at the moment, so I'll write the regex in JavaScript; you can port it easily. (I'll use the RegExp constructor with a string pattern, so the regex is already quoted for you.)

'<div style="height:100px;" id="main" >'.replace(new RegExp('<([a-zA-Z0-9]*)(.*([ \t\r\n]style[ \t\r\n]*=[ \t\r\n]*(("[^"]*")|(\'[^\']*\'))))*[^>]*>'), '<$1$3>')
 == <div style="height:100px;">

'<div style=\'height:100px;\' id="main" >'.replace(new RegExp('<([a-zA-Z0-9]*)(.*([ \t\r\n]style[ \t\r\n]*=[ \t\r\n]*(("[^"]*")|(\'[^\']*\'))))*[^>]*>'), '<$1$3>')
 == <div style='height:100px;'>

'<div style="height:100px;">'.replace(new RegExp('<([a-zA-Z0-9]*)(.*([ \t\r\n]style[ \t\r\n]*=[ \t\r\n]*(("[^"]*")|(\'[^\']*\'))))*[^>]*>'), '<$1$3>')
 == <div style="height:100px;">

'<div dfg dfg fdg>'.replace(new RegExp('<([a-zA-Z0-9]*)(.*([ \t\r\n]style[ \t\r\n]*=[ \t\r\n]*(("[^"]*")|(\'[^\']*\'))))*[^>]*>'), '<$1$3>')
 == <div>

'<div>'.replace(new RegExp('<([a-zA-Z0-9]*)(.*([ \t\r\n]style[ \t\r\n]*=[ \t\r\n]*(("[^"]*")|(\'[^\']*\'))))*[^>]*>'), '<$1$3>')
 == <div>

So it's a single regex that covers the likely cases shown above: double- or single-quoted style values, tags with no style attribute, and tags with no attributes at all.

Does this answer your question?

(Btw, you can replace those [ \t\r\n] classes with the whitespace shorthand \s. PHP's PCRE engine supports it, and \s matches newlines regardless of multiline mode.)
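The same pattern can be tightened a little. The sketch below is an editorial variant, not the answer's exact regex, and `keepStyleOnly` and `keepStyle` are hypothetical names. It uses `\s` as suggested above and requires the tag name to start with a letter: with `[a-zA-Z0-9]*` an empty name is allowed, so a closing tag such as `</a>` would be rewritten to `<>`. It searches for `style` with `[^>]*?` instead of `.*`, so the search cannot run past the end of the current tag, and it adds the `g` flag so every tag in the string is rewritten rather than only the first. Like the original, it does not parse quoted attribute values, so a `>` or a `style=` inside another attribute's value can still confuse it.

```javascript
// Editorial variant of the answer's pattern (a sketch, not the original regex):
//   [a-zA-Z][a-zA-Z0-9]*  the tag name must start with a letter, so "</a>" is left alone
//   [^>]*?                the search for style cannot run past the end of the current tag
//   \s                    shorthand for the [ \t\r\n] classes
//   g flag                rewrite every tag in the string, not only the first
const keepStyleOnly = /<([a-zA-Z][a-zA-Z0-9]*)(?:[^>]*?(\sstyle\s*=\s*(?:"[^"]*"|'[^']*')))?[^>]*>/g;

function keepStyle(html) {
  // When a tag has no style attribute, "$2" is replaced by an empty string.
  return html.replace(keepStyleOnly, "<$1$2>");
}

const nested = '<div id="x"><span style="color:red">hi</span></div>';
console.log(keepStyle(nested)); // <div><span style="color:red">hi</span></div>
```

With the answer's original `.*`, the same input comes out as `<div style="color:red">hi</span></div>`: the style search runs on into the `<span>` tag, and the span's opening tag is swallowed into the div.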
