Issues decoding strings from Xml

Question

I have been given a large number of XML files from which I need to pull out parts of the text elements and reuse them for other purposes (I am using XDocument to read the XML data).

How do I decode the text contained in the elements? What encoding is even being used here? A few examples:

"What is the meaning of this&amp;reg; asks Sonny."
"The big centre cost 1&amp;#190; million pounds"
"... lost it. &amp;#174; The next ..."

I have tried HttpUtility.HtmlDecode, but that did not do the trick. If I decode twice, "&amp;reg;" turns into "®", which is obviously not right.

The "&#174;" ones look like line breaks, and the "&reg;" ones are probably question marks. As for the 190 one, I don't even know; perhaps a dot or a comma?

Any ideas would be welcome.

Answer 1

It does appear that the strings you show have been HTML-encoded and then XML-encoded (or HTML-encoded a second time).
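A minimal sketch of peeling off those two layers, using Python's standard library (xml.etree.ElementTree and html.unescape) as a stand-in for XDocument and HttpUtility.HtmlDecode. The XML fragment and the decode_items helper are hypothetical, built only from the three samples in the question: the XML parser removes the outer layer ("&amp;reg;" becomes "&reg;"), and a single HTML unescape removes the inner one.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical XML built from the question's samples: the text was first
# HTML-encoded ("®" -> "&reg;") and then XML-encoded ("&" -> "&amp;").
RAW_XML = """<items>
  <item>What is the meaning of this&amp;reg; asks Sonny.</item>
  <item>The big centre cost 1&amp;#190; million pounds</item>
  <item>... lost it. &amp;#174; The next ...</item>
</items>"""


def decode_items(xml_text):
    """Return the text of every <item> with both encoding layers removed."""
    root = ET.fromstring(xml_text)
    decoded = []
    for item in root.iter("item"):
        # Layer 1: the XML parser has already turned "&amp;reg;" into "&reg;".
        xml_decoded = item.text
        # Layer 2: one HTML unescape turns "&reg;" into "®", "&#190;" into "¾".
        decoded.append(html.unescape(xml_decoded))
    return decoded


for line in decode_items(RAW_XML):
    print(line)
```

In C#, XElement.Value likewise returns the text with the XML layer already removed, so a single HttpUtility.HtmlDecode (or System.Net.WebUtility.HtmlDecode) call on that value should play the role that html.unescape plays here.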

It is correct that &amp;reg; -> &reg; -> ® (the registered trademark symbol), per the ISO Latin-1 entities. &#174; should behave the same way, since 174 is the code point of ®.

Similarly, &amp;#190; turns into &#190; and then into ¾ (the fraction three quarters), so the second example reads "1¾ million pounds".
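To check the entity chains described above, here is a small Python sketch (standard library only, standing in for the C# APIs; the decode_steps helper is hypothetical) that unescapes each raw fragment one layer at a time and reports the final character's Unicode code point and name.

```python
import html
import unicodedata

# The three encoded fragments from the question, as they appear in the raw XML.
RAW_FRAGMENTS = ["&amp;reg;", "&amp;#190;", "&amp;#174;"]


def decode_steps(raw):
    """Unescape repeatedly until the string stops changing; return every stage."""
    stages = [raw]
    while True:
        nxt = html.unescape(stages[-1])
        if nxt == stages[-1]:
            return stages
        stages.append(nxt)


for raw in RAW_FRAGMENTS:
    stages = decode_steps(raw)
    final = stages[-1]
    # Print each intermediate form plus the final code point and Unicode name.
    print(" -> ".join(stages), f"(U+{ord(final):04X} {unicodedata.name(final)})")
```

The decode-until-stable loop is only meant for inspecting the data. For actual processing, removing a fixed number of layers (the XML parser plus one HTML decode) is safer, because looping would also over-decode any text that legitimately contains an escaped ampersand.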

Accepted answer (score 0), 2012-04-06 10:20:39
