
Remove HTML Encoded characters

I'm getting some data from the database and it has HTML-encoded characters (such as &nbsp;). What options are there for removing these?

I don't want these rendered at all...I want them stripped from the data.

At the moment I'm not worried about the HTML tags...just the encoded characters.

EDIT: If it's relevant these chars are causing some errors in JSON validation.

If you want to get rid of them, obtain a list of such characters or a RegExp matching them all (something like &[a-z]+; ) and do a search-and-replace.
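A minimal sketch of that search-and-replace, assuming Python since the question does not name a language. The helper name strip_named_entities is made up for illustration, and the pattern is slightly wider than the one above because some entity names contain digits or capitals (for example &frac12; or &Eacute;). It deliberately leaves numeric references such as &#160; alone.

```python
import re

# Assumed pattern: "&", a letter, then letters or digits, then ";".
# It matches named entities only, not numeric ones like &#160;.
NAMED_ENTITY = re.compile(r"&[A-Za-z][A-Za-z0-9]*;")


def strip_named_entities(text: str) -> str:
    """Delete every named HTML entity from text (hypothetical helper)."""
    return NAMED_ENTITY.sub("", text)


print(strip_named_entities("Fish&nbsp;&amp;&nbsp;chips"))  # prints "Fishchips"
```

Note that the entities are deleted rather than replaced, so neighbouring words run together; substituting a space instead of an empty string may be preferable for &nbsp;.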

However, if you only want them gone because of errors in JSON validation, you should generate/encode your JSON correctly to avoid the errors. (That said, I don't really understand how they can cause invalid JSON.)
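As a rough illustration of that point, again assuming Python: a JSON serializer escapes whatever needs escaping, and an entity such as &nbsp; is just ordinary characters inside a JSON string. The record below is invented sample data, not the asker's.

```python
import json

# Invented sample row; the entity text is ordinary characters to JSON.
record = {"title": "Fish&nbsp;&amp;&nbsp;chips", "note": 'He said "hi"\n'}

# json.dumps escapes the quotes and the newline; the entities pass through.
encoded = json.dumps(record)

# Round-tripping gives back exactly the same data.
decoded = json.loads(encoded)
print(decoded == record)  # prints True
```

If validation still fails with output produced this way, the cause is more likely hand-built JSON strings or unescaped quotes than the entities themselves.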

Simply stripping with a regexp should not be an option here. For example, &nbsp; can also be written as &#160;, but a regex such as &#[0-9]+; would lead to data loss, since almost any character can be encoded that way (e.g. <p>&#72;&#69;&#76;&#76;&#79;</p>).
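One way around that data loss, sketched here in Python as an assumption rather than anything the answer prescribes, is to decode every entity first and then remove only the characters that are actually unwanted. html.unescape is part of the standard library; the helper name and the UNWANTED set are hypothetical and would need adjusting to the real data.

```python
import html

# Assumed set of characters to drop after decoding; here only the
# non-breaking space (U+00A0) that &nbsp; and &#160; both decode to.
UNWANTED = {"\u00a0"}


def decode_and_strip(text: str) -> str:
    """Decode HTML entities, then drop only the unwanted characters."""
    decoded = html.unescape(text)
    return "".join(ch for ch in decoded if ch not in UNWANTED)


print(decode_and_strip("<p>&#72;&#69;&#76;&#76;&#79;</p>"))  # prints "<p>HELLO</p>"
print(decode_and_strip("a&nbsp;b&#160;c"))  # prints "abc"
```

Decoding also turns &lt; and &amp; back into < and &, so the result should be escaped again if it is later written into HTML.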
