
Decoding extended characters in XML

Original post 2010-01-07 18:24:12 3 2 .net/ xml/ encoding

I know this is probably simple and has probably been asked before, but I'm having trouble coming up with a solution.

I am parsing some RSS feeds which include HTML as CDATA blocks. One example is here: http://g.msn.com/1ewenus50/news2

The feed changes a lot, but there are almost always some extended characters in it. For example, if I make a simple console app, use WebClient.DownloadString, and look at the result, I see things like:

"learned of the alleged attempted Flight 253 bomberâ€™s extremist links while he was mid-flight on Christmas Day. NBCâ€™s Savannah Guthrie reports.Â (Today Show)"

However, those weird characters should be apostrophes, quotation marks, em dashes, etc.

What is the trick for getting these to decode correctly?

If it wasn't clear, I'm using C# / .NET for this. In the end this content will be rendered in Silverlight, but I'm seeing the issue in the full .NET 3.5 runtime as well.

2 answers

Download it in binary form and parse it as XML. That should get it right: the XML document should be self-describing in terms of its encoding, but I wouldn't put it past some web servers to advertise it (in headers) as having a different encoding, which would confuse DownloadString.

In general, when XML is involved it's worth doing as much as possible within an XML API rather than with the raw data.
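As a rough sketch of that suggestion (not necessarily what the answerer had in mind): fetch the feed as bytes, then hand the bytes to an XML reader so the encoding comes from the byte order mark or the XML declaration rather than from an HTTP header. The `FeedXml.Parse` helper and the sample document are made up for illustration; `WebClient.DownloadData` is the byte-oriented counterpart of `DownloadString` and appears only in a comment so the sample runs without network access.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

static class FeedXml
{
    // Hypothetical helper: parse raw bytes and let XmlReader pick the
    // encoding from the BOM or the XML declaration.
    public static XDocument Parse(byte[] raw)
    {
        using (var stream = new MemoryStream(raw))
        using (var reader = XmlReader.Create(stream))
        {
            return XDocument.Load(reader);
        }
    }
}

static class Program
{
    static void Main()
    {
        // In the real app the bytes would come from something like:
        //   byte[] raw = new WebClient().DownloadData("http://g.msn.com/1ewenus50/news2");
        // A small hand-made UTF-8 sample stands in for the feed here.
        string sample =
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>" +
            "<rss><channel><item><description>" +
            "<![CDATA[bomber\u2019s extremist links]]>" +
            "</description></item></channel></rss>";
        byte[] raw = Encoding.UTF8.GetBytes(sample);

        XDocument doc = FeedXml.Parse(raw);
        string text = doc.Descendants("description").First().Value;

        // The apostrophe survives as a single U+2019 character.
        Console.WriteLine(text == "bomber\u2019s extremist links");
    }
}
```

`XDocument.Load(XmlReader)` is used instead of a stream overload because the question targets .NET 3.5.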

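You're probably using the wrong text encoding... I'm not sure which one you're using or which one is correct, but this might put you on the right track.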
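To make the encoding point concrete: the sequence "â€™" in the question is what the three UTF-8 bytes of a right single quotation mark (U+2019: E2 80 99) look like when read as Windows-1252, and "Â" followed by a space is a UTF-8 non-breaking space (C2 A0) read the same way. Assuming the feed really is UTF-8 and the response does not declare a charset, one possible fix is to set `WebClient.Encoding` before calling `DownloadString`. The `Utf8Download.CreateClient` helper is a made-up name, and the download itself is left as a comment so the sketch runs offline.

```csharp
using System;
using System.Net;
using System.Text;

static class Utf8Download
{
    // Hypothetical helper: a WebClient whose string downloads are
    // decoded as UTF-8 instead of the default encoding.
    public static WebClient CreateClient()
    {
        var client = new WebClient();
        client.Encoding = Encoding.UTF8;
        return client;
    }
}

static class Program
{
    static void Main()
    {
        // U+2019 (right single quotation mark) is three bytes in UTF-8.
        byte[] raw = Encoding.UTF8.GetBytes("\u2019");
        Console.WriteLine(BitConverter.ToString(raw)); // E2-80-99

        // Decoding the same bytes as UTF-8 restores the single character.
        Console.WriteLine(Encoding.UTF8.GetString(raw) == "\u2019");

        using (WebClient client = Utf8Download.CreateClient())
        {
            // Assumption: the feed is UTF-8.
            // string xml = client.DownloadString("http://g.msn.com/1ewenus50/news2");
            Console.WriteLine(client.Encoding.WebName); // utf-8
        }
    }
}
```

This only helps when the guess about the encoding is right; parsing the raw bytes with an XML API avoids having to guess.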

Related questions

Decoding foreign language characters in url

Issues decoding strings from Xml

HttpUtility.ParseQueryString without decoding special characters

MIME Attachment Names containing Extended Characters Fails

How to replace extended ASCII characters in C#?

c# - Replacing extended ascii characters

Decoding 7Bit content-transfer-encoding messages with special characters

Extended ASCII characters such as euro symbol being converted to its unicode equivalent

Invalid XML characters

XML Serialization of Class Which Has a Member Extended From an Abstract Type





solution1
0 ACCEPTED 2010-01-07 18:27:27

solution2
0 2010-01-07 18:28:25
