
Using Javascript and Regex to replace HTML Characters

Thanks in advance for your help.

I have a need within an application to remove all HTML Characters and replace them with their HTML number equivalent.

For example:

‡, •, -, ‰, € and ™

Become:

&#8225;, &#8226;, -, &#8240;, &#8364; and &#8482;

There are lots of questions currently out there, but these do it the other way round.

I have all of the chars I want to convert in a JSON object (this is just a snapshot of a much larger list, just to prove my JSON is good):

{"ch":"‘","sub":"&#8216;"},
{"ch":"’","sub":"&#8217;"},
{"ch":"‚","sub":"&#8218;"},
{"ch":"“","sub":"&#8220;"},
{"ch":"”","sub":"&#8221;"},
{"ch":"„","sub":"&#8222;"},
{"ch":"†","sub":"&#8224;"},
{"ch":"‡","sub":"&#8225;"},
{"ch":"•","sub":"&#8226;"},
...

And I currently loop through (using Prototype here) and attempt to replace them:

oJSONItems.each(function(o){
    var oRG = new RegExp(o.ch,'g');
    oText = oText.replace(oRG,o.sub);
});

Some are being replaced, but some are not...

‡
•
-
‰
€
™

More than anything I need to know why chars like these are failing to be converted.

Thanks.
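
The question does not show enough to say for certain why some characters are skipped. One way to investigate, offered only as a diagnostic sketch, is to list the code points that are actually present in the text and compare them with the "ch" values in the list. A character that displays as € may be stored as a different code point than expected (for example U+0080 if Windows-1252 bytes were decoded as Latin-1), in which case a pattern built from U+20AC will never match it. The helper name describeNonAscii is hypothetical.

```javascript
// Hypothetical diagnostic helper (not from the original post):
// returns the code point of every non-ASCII character in a string.
function describeNonAscii(str) {
    // Array.from iterates by code point, so surrogate pairs stay together.
    return Array.from(str)
        .filter(function (ch) { return ch.codePointAt(0) > 127; })
        .map(function (ch) {
            var hex = ch.codePointAt(0).toString(16).toUpperCase();
            return 'U+' + hex.padStart(4, '0');
        });
}

// A real euro sign next to the C1 control character U+0080.
console.log(describeNonAscii('€ and \u0080').join(' '));
// → U+20AC U+0080
```

If the code points in the text differ from those in the list, the problem is in how the text was decoded rather than in the replace loop.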

Rather than code for specific entities, how about one that replaces anything outside the original 7-bit ASCII range:

str = str.replace(/[^\011\012\015\040-\177]/g, function(x) {
    return '&#' + x.charCodeAt(0) + ';';
});

(The regexp matches anything that's not white space or a "normal" ASCII character)
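
As a minimal sketch of the same idea wrapped in a function, the version below uses hex escapes instead of octal ones and adds the "u" flag with codePointAt, so that characters outside the Basic Multilingual Plane are encoded as one entity rather than two surrogate halves. The name encodeNonAscii is hypothetical, and the "u" flag assumes an ES2015-capable engine.

```javascript
// Hypothetical helper name; a variation on the snippet above, not the
// answer's exact code.
function encodeNonAscii(str) {
    // Keep tab, LF, CR and the range space..DEL; encode everything else.
    // With the "u" flag the class matches whole code points.
    return str.replace(/[^\t\n\r\x20-\x7F]/gu, function (ch) {
        return '&#' + ch.codePointAt(0) + ';';
    });
}

console.log(encodeNonAscii('‡, •, ‰, € and ™'));
// → &#8225;, &#8226;, &#8240;, &#8364; and &#8482;
```

Plain ASCII text, including tabs and line breaks, passes through unchanged.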

Alternatively, write your map so that the keys are the characters you want to replace, and the values are the entities:

var map = { '£' : '&#163;' };

str = str.replace(/./g, function(x) {
    return (x in map) ? map[x] : x;
});

Note that both versions only make the regexp call once, rather than once for each possible entity in your set. This should make the code somewhat faster than your loop-based method.
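
To connect the map approach with the array of {ch, sub} objects from the question, the lookup can be built from that array once and then used in a single replace call. This is only a sketch: the names items, entityMap and replaceMapped are hypothetical, the three entries stand in for the much longer real list, and the pattern assumes every "ch" value is a single non-ASCII character.

```javascript
// Sample entries in the question's {ch, sub} shape; the real list is longer.
var items = [
    { ch: '‘', sub: '&#8216;' },
    { ch: '’', sub: '&#8217;' },
    { ch: '•', sub: '&#8226;' }
];

// Build the lookup once. Object.create(null) gives an object with no
// inherited keys, so the "in" test below only sees our own entries.
var entityMap = Object.create(null);
items.forEach(function (item) {
    entityMap[item.ch] = item.sub;
});

// Hypothetical helper: one pass over the string, visiting only
// non-ASCII characters and leaving unmapped ones as they are.
function replaceMapped(str) {
    return str.replace(/[^\x00-\x7F]/g, function (ch) {
        return (ch in entityMap) ? entityMap[ch] : ch;
    });
}

console.log(replaceMapped('‘quoted’ • item'));
// → &#8216;quoted&#8217; &#8226; item
```

Because the characters are used as object keys rather than as regular expression sources, nothing in the list needs escaping.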
