
Issues decoding strings from Xml

I have been given a large number of XML files from which I need to pull out parts of the text elements and reuse them for other purposes. (I am using XDocument to read the XML data.)

But how do I decode the text contained in the elements? What encoding is even being used here? A few examples:

"What is the meaning of this&amp;#174; asks Sonny."
"The big centre cost 1&amp;#190; million pounds"
"... lost it. &amp;#174; The next ..."

I have tried HttpUtility.HtmlDecode, but that did not do the trick. If I decode twice, the "&amp;#174;" turns into a ®, which is obviously not right.

Looks like some of the &amp;#174; are line breaks. Others are probably question marks. The 190 one, I don't even know. Perhaps a dot or comma?

Any ideas would be welcome.

It does appear that the strings you show have been HTML encoded, and then XML encoded (or HTML again).

It is correct that &amp;#174; -> &#174; -> ® (the registered trademark symbol), per the ISO Latin-1 entities. &reg; should behave the same way.

Similarly, &amp;#190; would turn into ¾, the fraction representing three quarters.
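
The double decoding described above can be sketched roughly as follows. This is only an illustration under assumptions: the element name `text` and the sample string are made up, and `WebUtility.HtmlDecode` (from System.Net) is used in place of `HttpUtility.HtmlDecode` simply to avoid a System.Web reference; it should handle these numeric entities the same way. The helper `DecodeFully` is hypothetical, not part of any library.

```csharp
using System;
using System.Net;
using System.Xml.Linq;

public class Program
{
    // Hypothetical helper: HTML-decode repeatedly until the string stops
    // changing, with a small cap on passes so odd input cannot loop forever.
    public static string DecodeFully(string text, int maxPasses = 5)
    {
        for (int i = 0; i < maxPasses; i++)
        {
            string decoded = WebUtility.HtmlDecode(text);
            if (decoded == text)
            {
                break; // nothing left to decode
            }
            text = decoded;
        }
        return text;
    }

    public static void Main()
    {
        // Assumed sample: the raw XML carries one extra level of escaping,
        // so XDocument hands back "1&amp;#190;" as the element value.
        string xml = "<text>The big centre cost 1&amp;amp;#190; million pounds</text>";
        string value = XDocument.Parse(xml).Root?.Value ?? string.Empty;

        // First pass gives "&#190;", second pass gives the actual character.
        Console.WriteLine(DecodeFully(value));
    }
}
```

Decoding until the string is stable is a blunt instrument: if some text legitimately contains a literal "&amp;", it would be decoded as well, so a fixed number of passes may be safer once you know how many layers of encoding the files really have. Note that this only recovers the characters that were encoded (® and ¾); whether ® was meant to stand for something else in the source data is a separate question.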
