I obtain some text from the Internet. The text sometimes contains sequences like "&amp;", "&quot;", etc.
I guessed they were some kind of Unicode characters in HTML. They are in fact HTML-encoded strings; thanks to jason for pointing that out.
How should I filter all of these out of the text? I don't want any HTML-related character codes. By the way, I am not talking about the HTML tags in the text, only these encoded entities.
Thanks.
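For illustration, here is a minimal sketch of two ways this could be handled. The question does not name a language, so Python is an assumption, and the function names `decode_entities` and `strip_entities` are hypothetical. The first function turns each entity back into the character it stands for using the standard library's `html.unescape`; the second deletes the entities entirely with a regular expression, which may be closer to "filtering them out".

```python
import html
import re

# Matches named (&amp;), decimal (&#39;) and hexadecimal (&#x27;) references.
ENTITY_RE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def decode_entities(text: str) -> str:
    """Replace each HTML entity with the character it represents."""
    return html.unescape(text)


def strip_entities(text: str) -> str:
    """Delete HTML entities outright instead of decoding them."""
    return ENTITY_RE.sub("", text)


sample = "Tom &amp; Jerry said &quot;hi&quot;"
print(decode_entities(sample))  # Tom & Jerry said "hi"
print(strip_entities(sample))   # Tom  Jerry said hi
```

Decoding is usually the safer choice, since stripping throws away real characters such as `&` and quotation marks. Neither function touches HTML tags.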
This was answered here: