
How do I use ColdFusion to replace text in HTML without replacing HTML tags?

I have HTML source in a string variable, and a word in another variable that should be highlighted in that HTML.

I need a regular expression that does not highlight the tags, only the text within them.

For example, I have HTML source like:

<cfset html =  "<span>Text goes here, forr example it container also **span** </span>" />
<cfset wordToReplace = "span" />
<cfset html = ReReplace(html ,"[^(<#wordToReplace#\b[^>]*>)]","replaced","ALL")>

and what I want to get is

<span>Text goes here, forr example it container also **replaced** </span>

But I get an error. Any tips?

I need a regular expression that does not highlight the tags, only the text within them.

You won't find one. Not one that is fully reliable against all legal/wild HTML.

The simple reason is that regular expressions match regular languages, and HTML is not even remotely a regular language.

Even if you're very careful, you run the risk of replacing stuff you didn't want to, and not replacing stuff you did want to, simply due to how complicated HTML syntax can be.
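To make both failure modes concrete, here is a minimal sketch. The pattern is a hypothetical tag-skipping attempt (not something from the question), and the two input strings are made up for illustration.

```cfml
<!--- Hypothetical pattern: match "span" unless the next angle bracket after it is ">" --->
<cfset pattern = "span(?![^<>]*>)" />

<!--- Unwanted replacement: script content is not inside a tag, so it gets rewritten --->
<cfset a = REReplace("<script>var span = 1;</script>", pattern, "replaced", "ALL") />
<!--- a is now: <script>var replaced = 1;</script> --->

<!--- Missed replacement: a literal ">" in text makes the word look like it sits inside a tag --->
<cfset b = REReplace("<p>span > div</p>", pattern, "replaced", "ALL") />
<!--- b is unchanged: <p>span > div</p> --->
```

Both inputs are legal HTML, and the pattern gets both of them wrong.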


The correct way to parse HTML is using a purpose-built HTML DOM parser.

Annoyingly, CF doesn't have one built in, though if your HTML is XHTML, you can use XmlParse and XmlSearch to run an XPath search for only the text (not tags) that matches your word. Something like //*[contains(text(), 'span')] should do.

If you've not got XHTML, then you'll need to look at using an HTML DOM parser for Java. Google turns up plenty (I've not tried any yet, so can't give any specific recommendations).

What you have to do is use a lookahead to make sure that your text isn't contained within a tag. Granted, this could probably be written better, but it will get you the results you want. It will even handle tags that have attributes.

<cfset html =  "<span class='me'>Text goes here, forr example it container also **span** </span>" />
<cfset wordToReplace = "span" />
<cfset html = ReReplace(html ,"(#wordToReplace#)(?![^<>]*>)","replaced","ALL")>
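If you need this in more than one place, the lookahead idea can be wrapped in a small function. This is only a sketch: the name replaceOutsideTags is made up, it uses a [^<>] variation of the lookahead plus \b word boundaries, and it assumes the word contains no regex metacharacters.

```cfml
<cffunction name="replaceOutsideTags" returntype="string" output="false">
    <cfargument name="html" type="string" required="true" />
    <cfargument name="word" type="string" required="true" />
    <cfargument name="replacement" type="string" required="true" />
    <!--- Assumes "word" contains no regex metacharacters.
          \b limits the match to whole words; the negative lookahead rejects
          a match when the next angle bracket after it is ">", i.e. when the
          word appears to be inside a tag. --->
    <cfreturn REReplace(arguments.html, "\b#arguments.word#\b(?![^<>]*>)", arguments.replacement, "ALL") />
</cffunction>

<cfset html = "<span class='me'>Text goes here, for example it also contains span </span>" />
<cfset html = replaceOutsideTags(html, "span", "replaced") />
<!--- html is now: <span class='me'>Text goes here, for example it also contains replaced </span> --->
```

It is still a regex working on HTML, so it will trip over things like a literal ">" in text or the contents of script blocks.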


 