C#: shield XmlTextReader from an occasional Unicode character

In C#, I have an XmlTextReader created directly from an HTTP response (I have no control over the XML content of the response).

HttpWebResponse response = (HttpWebResponse)request.GetResponse();
XmlTextReader reader = new XmlTextReader(response.GetResponseStream());

It works, but sometimes one of the XML element nodes will contain a Unicode character (e.g. "é") which trips the reader. I've tried to use a StreamReader with a declared encoding, but now the XmlTextReader quits on the very first line with "Data invalid. Line 1, position 1":

StreamReader sReader = new StreamReader(response.GetResponseStream(), System.Text.Encoding.Unicode);
XmlTextReader reader = new XmlTextReader(sReader);

Is there a way to fix this? Alternatively, is there a way to prevent the XmlTextReader from parsing an element (I know its name) with a potentially offending character? I don't care about that particular element, I just don't want it to trip the reader.

EDIT: Quick fix: read the response into a StringBuilder ("sb"):

sb.Replace("é", "e");
StringReader strReader = new StringReader(sb.ToString());
XmlTextReader reader = new XmlTextReader(strReader);

The problem is not that it is a Unicode character; it is an invalid character (not correctly encoded for the encoding the document declares).

There is no way to shield an XmlTextReader from invalid XML. You need to either

  • Fix the server side to properly encode characters
  • Pre-process the text to do it yourself

In UTF-8, every non-ASCII character is encoded as two to four bytes; "é" (U+00E9) is the two bytes 0xC3 0xA9. You can use a hex editor to verify what the server actually sends.
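If a hex editor is not at hand, the same check can be done in code. The following is a minimal sketch, not part of the original answer: the class name EncodingCheck and the method IsValidUtf8 are made up for illustration, and ISO-8859-1 is only an assumed example of a single-byte encoding a server might be using by mistake. It shows the bytes "é" produces under each encoding and uses a strict UTF-8 decoder to detect bytes that are not well-formed UTF-8.

```csharp
using System;
using System.Text;

static class EncodingCheck
{
    // Returns true when the bytes are well-formed UTF-8.
    public static bool IsValidUtf8(byte[] bytes)
    {
        // throwOnInvalidBytes: true makes the decoder throw on malformed input
        // instead of silently substituting U+FFFD.
        var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                                      throwOnInvalidBytes: true);
        try
        {
            strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    public static void Main()
    {
        string eAcute = "\u00E9"; // "é", written as an escape so the source file encoding does not matter

        byte[] utf8 = Encoding.UTF8.GetBytes(eAcute);
        // Assumption: ISO-8859-1 stands in for whatever single-byte encoding the server really uses.
        byte[] latin1 = Encoding.GetEncoding("ISO-8859-1").GetBytes(eAcute);

        Console.WriteLine(BitConverter.ToString(utf8));   // C3-A9
        Console.WriteLine(BitConverter.ToString(latin1)); // E9

        Console.WriteLine(IsValidUtf8(utf8));   // True
        Console.WriteLine(IsValidUtf8(latin1)); // False: a lone 0xE9 is not valid UTF-8
    }
}
```

Running IsValidUtf8 over the raw response bytes is one way to confirm whether the server's output matches a UTF-8 declaration before blaming the parser.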

What do you mean by "trips the reader"? Your first snippet of code should be fine - if the XML is genuinely in the encoding it declares (please look at the XML declaration) then it should be absolutely fine.

If the XML is genuinely broken, I would suggest performing some sort of filtering before XML parsing (e.g. loading the XML into a string with the right encoding, then fixing the declared encoding to match)... but we'll need to work out what's wrong with it first.
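The filtering idea can be sketched roughly as follows, under a loud assumption: that the server declares UTF-8 but actually sends ISO-8859-1. The real encoding has to be established first, and the names XmlPreprocess and CreateReader are hypothetical. The response bytes are decoded into a string with the encoding actually in use, and the reader is then built over that text rather than over the raw stream.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class XmlPreprocess
{
    // Decodes the raw stream with the encoding the server really used and
    // returns a reader over the decoded text instead of the raw bytes.
    public static XmlTextReader CreateReader(Stream raw, Encoding actualEncoding)
    {
        string text;
        using (var streamReader = new StreamReader(raw, actualEncoding))
        {
            text = streamReader.ReadToEnd();
        }
        return new XmlTextReader(new StringReader(text));
    }

    public static void Main()
    {
        // Simulated response: declares UTF-8 but is really ISO-8859-1 (assumed scenario).
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        byte[] body = latin1.GetBytes(
            "<?xml version=\"1.0\" encoding=\"utf-8\"?><name>caf\u00E9</name>");

        using (XmlTextReader reader = CreateReader(new MemoryStream(body), latin1))
        {
            reader.MoveToContent();                        // positions on <name>
            string value = reader.ReadElementContentAsString();
            Console.WriteLine(value == "caf\u00E9");       // True
        }
    }
}
```

In real use the MemoryStream would be replaced by response.GetResponseStream(). Note that Encoding.Unicode in .NET means UTF-16, so passing it to a StreamReader over a single-byte or UTF-8 response garbles the text from the first character, which is consistent with the "Line 1, position 1" error in the question.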
