
Need regular expression to remove /> between two HTML markup tags except img tag

I need some help crafting a regular expression that removes a stray /> appearing in the text between two HTML tags.

<!-- The line could look like this -->
<td align=right valign=bottom nowrap><div>January 24, 2013 /></div></td>

<!-- Or this -->
<div>Is this system supported? /></div>

<!-- Even this -->
<span>This is a span tag /></div>

<!-- It could look like any of these but I do not want /> removed -->
<img src="example.com/example.jpg"/></img>
<img src="example.com/example.jpg"/>
<img src="example.com/example.jpg"/></img>
<div id="example"><img src="example.com/example.jpg"/></div>

(Yes, I realize the img tag has no closing tag associated with it. I am dynamically editing a myriad of pages I have not created; it's not my markup.)

Here's the regex I came up with (using Perl):

s|(<.*?>(?!<img).*?)(\s*/>)(?!</img>)(</.*?>)|$1$3|gi;
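To check what this substitution does, a small harness can apply it to each sample line separately. This assumes the pages are processed one line at a time, which the question implies but does not state. The last input, a <p> with an inline image, is a hypothetical addition and not from the question. Because .*? can run across tag boundaries, the (?!<img) lookahead only protects an <img> that directly follows the first matched tag, so on that line the image's own /> gets removed.

```perl
use strict;
use warnings;

# Sample lines from the question, plus one hypothetical line (the last one).
my @lines = (
    '<td align=right valign=bottom nowrap><div>January 24, 2013 /></div></td>',
    '<div>Is this system supported? /></div>',
    '<img src="example.com/example.jpg"/></img>',
    '<div id="example"><img src="example.com/example.jpg"/></div>',
    '<p>Caption <img src="example.com/example.jpg"/></p>',
);

my @results;
for my $line (@lines) {
    # The question's substitution, unchanged: keep groups 1 and 3, drop "\s*/>" (group 2).
    (my $out = $line) =~ s|(<.*?>(?!<img).*?)(\s*/>)(?!</img>)(</.*?>)|$1$3|gi;
    push @results, $out;
    print "$out\n";
}
```

The first two lines come out as expected and the two img samples are untouched, but the hypothetical last line becomes <p>Caption <img src="example.com/example.jpg"</p>, i.e. the image tag loses its closing />.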

Is there a better regex that's more efficient or faster?

After the regex is applied to the above examples, here are the results:

<!-- The line could look like this -->
<td align=right valign=bottom nowrap><div>January 24, 2013</div></td>

<!-- Or this -->
<div>Is this system supported?</div>

<!-- Even this -->
<span>This is a span tag</div>

<!-- It could look like any of these but I do not want /> removed -->
<img src="example.com/example.jpg"/></img>
<img src="example.com/example.jpg"/>
<img src="example.com/example.jpg"/></img>
<div id="example"><img src="example.com/example.jpg"/></div>

A shorter solution would be:

s/(<[^>]*>[^<]*)\/>/$1/g

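It captures a tag together with any text that follows it, stopping before the next < (which would start another tag). Then it looks for />; if one is found, the substitution replaces the match with the captured group, which removes the />. Because [^<]* cannot run past a <, a /> that closes a tag such as <img .../> is never matched (as long as no attribute value contains a >), so no special case for img is needed.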
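A minimal sketch that runs this greedy pattern over some of the question's sample lines, again assuming one line at a time. The <p> line is a hypothetical input, not from the question, included to show that an <img .../> sitting next to text is left alone. Note that the space before the removed /> survives.

```perl
use strict;
use warnings;

my @lines = (
    '<div>Is this system supported? /></div>',
    '<span>This is a span tag /></div>',
    '<div id="example"><img src="example.com/example.jpg"/></div>',
    '<p>Caption <img src="example.com/example.jpg"/></p>',   # hypothetical extra input
);

my @results;
for my $line (@lines) {
    # [^<]* cannot cross a '<', so only a /> lying in text right after a tag is removed.
    (my $out = $line) =~ s/(<[^>]*>[^<]*)\/>/$1/g;
    push @results, $out;
    print "$out\n";
}
```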

Update: The question was extended to also remove any whitespace before the />. This can be done by making the [^<]* part lazy ([^<]*?), so that it stops before the whitespace and leaves it for \s* to match:

s/(<[^>]*>[^<]*?)\s*\/>/$1/g
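The difference between the greedy and lazy versions shows up in what group 1 captures. This sketch prints the capture for one of the question's sample lines under each version, then applies the lazy substitution.

```perl
use strict;
use warnings;

my $line = '<div>Is this system supported? /></div>';

# Greedy: [^<]* keeps the space, so \s* matches nothing and the space stays in $1.
my ($greedy_capture) = $line =~ /(<[^>]*>[^<]*)\s*\/>/;
# Lazy: [^<]*? stops as early as it can, leaving the space for \s*.
my ($lazy_capture)   = $line =~ /(<[^>]*>[^<]*?)\s*\/>/;

print "greedy: [$greedy_capture]\n";   # greedy: [<div>Is this system supported? ]
print "lazy:   [$lazy_capture]\n";     # lazy:   [<div>Is this system supported?]

(my $cleaned = $line) =~ s/(<[^>]*>[^<]*?)\s*\/>/$1/g;
print "$cleaned\n";                    # <div>Is this system supported?</div>
```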

See for yourself on regex101.
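The question also asks which pattern is faster. The answer does not measure this, and relative speed will depend on the perl build and on the real pages, so here is a rough sketch of how one might compare them with Perl's core Benchmark module. It uses the question's sample lines as stand-in input, which is an assumption; real pages would give more meaningful numbers. It first checks that both patterns agree on these lines, since timing two substitutions with different outputs would be meaningless.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# The question's sample lines; assumed (not known) to resemble the real pages.
my @lines = (
    '<td align=right valign=bottom nowrap><div>January 24, 2013 /></div></td>',
    '<div>Is this system supported? /></div>',
    '<span>This is a span tag /></div>',
    '<img src="example.com/example.jpg"/></img>',
    '<img src="example.com/example.jpg"/>',
    '<div id="example"><img src="example.com/example.jpg"/></div>',
);

# The regex from the question.
sub strip_question {
    my ($s) = @_;
    $s =~ s|(<.*?>(?!<img).*?)(\s*/>)(?!</img>)(</.*?>)|$1$3|gi;
    return $s;
}

# The lazy regex from the answer.
sub strip_lazy {
    my ($s) = @_;
    $s =~ s/(<[^>]*>[^<]*?)\s*\/>/$1/g;
    return $s;
}

# Only compare speed if both patterns give the same output on these lines.
my @question_out = map { strip_question($_) } @lines;
my @lazy_out     = map { strip_lazy($_) } @lines;
die "patterns disagree on the sample lines\n"
    unless join("\n", @question_out) eq join("\n", @lazy_out);

# Run each for at least 1 CPU second and print a table of relative rates.
cmpthese(-1, {
    question => sub { strip_question($_) for @lines },
    lazy     => sub { strip_lazy($_)     for @lines },
});
```

cmpthese prints iterations per second and percentage differences; the figures will vary from machine to machine, so they are only useful for comparing the two patterns against each other.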

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.