
System.Xml.XmlException: Invalid character in the given encoding

I am using XmlDocument.Load to load the contents of an XML file that contains some Thai characters. The application fails with the following exception.

System.Xml.XmlException: Invalid character in the given encoding. Line 2, position 82.
   at System.Xml.XmlTextReaderImpl.Throw(Exception e)
   at System.Xml.XmlTextReaderImpl.InvalidCharRecovery(Int32& bytesCount, Int32& charsCount)
   at System.Xml.XmlTextReaderImpl.GetChars(Int32 maxCharsCount)
   at System.Xml.XmlTextReaderImpl.ReadData()
   at System.Xml.XmlTextReaderImpl.ParseText(Int32& startPos, Int32& endPos, Int32& outOrChars)
   at System.Xml.XmlTextReaderImpl.FinishPartialValue()
   at System.Xml.XmlTextReaderImpl.get_Value()
   at System.Xml.XmlLoader.LoadNode(Boolean skipOverWhitespace)
   at System.Xml.XmlLoader.LoadDocSequence(XmlDocument parentDoc)
   at System.Xml.XmlDocument.Load(XmlReader reader)

The XML file begins with this content: [image]

Notice the strange character before the closing tag. This content comes from a third party, and I don't have access to the file or its content.

My questions are:

  1. Why is the strange character appearing in the content sent to me by the third-party provider?
  2. Is there any way to process the file successfully (load it into the XmlDocument), given that I cannot modify its content before processing it?

The data supplied by the third party is not valid XML. I think there are only two solutions: get the third party to supply valid XML, or strip the invalid characters from the XML and process what you can. You could do something like this:

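// Requires: using System.IO; using System.Linq; using System.Xml;
string invalidXml = File.ReadAllText(path);
// Where(...) yields chars, so rebuild a string before comparing
string validXml = new string(invalidXml.Where(ch => XmlConvert.IsXmlChar(ch)).ToArray());
if (validXml != invalidXml)
{
    // log that invalid characters were removed
}

// process (what you can in) the validXml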
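
The strip-and-load idea above can be sketched end to end as follows. This is only an illustration, not a definitive implementation: the class name `XmlSanitizer` and the helpers `StripInvalidXmlChars` and `LoadLenient` are hypothetical names introduced here, and the sketch assumes the file can be read as text with `File.ReadAllText` in the first place.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;

// Hypothetical helper class, introduced only for illustration.
static class XmlSanitizer
{
    // Returns the input with every char rejected by XmlConvert.IsXmlChar removed.
    public static string StripInvalidXmlChars(string text)
    {
        return new string(text.Where(ch => XmlConvert.IsXmlChar(ch)).ToArray());
    }

    // Reads the file as text, strips rejected chars, and parses what is left.
    public static XmlDocument LoadLenient(string path)
    {
        string raw = File.ReadAllText(path);
        string cleaned = StripInvalidXmlChars(raw);

        if (cleaned.Length != raw.Length)
        {
            // Record that data was dropped so the loss is not silent.
            Console.Error.WriteLine(
                "Removed {0} invalid character(s) from {1}",
                raw.Length - cleaned.Length, path);
        }

        var doc = new XmlDocument();
        doc.LoadXml(cleaned); // still throws XmlException if the markup itself is broken
        return doc;
    }
}
```

Calling `XmlSanitizer.LoadLenient(path)` in place of `xmlDoc.Load(path)` would then parse whatever survives the filter. Stripping loses data, so logging the removal count gives you something concrete to report back to the provider.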

If you are sure that the characters are Thai, try specifying the correct encoding when you load the document.

A standard character encoding for Thai is ISO 8859-11.

So try loading the document like this:

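 // Requires: using System.IO; using System.Text;
 // File.Open has no one-argument overload, so pass the path to StreamReader directly
 using (var reader = new StreamReader("YourXMLFile.xml", Encoding.GetEncoding("iso-8859-11")))
     xmlDoc.Load(reader);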
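
If it is unclear whether a given file is really in the Thai encoding, one possible variation is to try strict UTF-8 first and fall back to the other encoding only when decoding fails. The sketch below is an assumption-laden illustration: `ThaiXmlLoader`, `DecodeWithFallback`, and `Load` are hypothetical names, and the fallback encoding is passed in by the caller rather than hard-coded.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper class, introduced only for illustration.
static class ThaiXmlLoader
{
    // Decodes bytes as strict UTF-8; if any byte sequence is invalid,
    // decodes the whole buffer with the fallback encoding instead.
    public static string DecodeWithFallback(byte[] bytes, Encoding fallback)
    {
        var strictUtf8 = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);
        try
        {
            return strictUtf8.GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            return fallback.GetString(bytes);
        }
    }

    public static XmlDocument Load(string path, Encoding fallback)
    {
        string text = DecodeWithFallback(File.ReadAllBytes(path), fallback);

        // GetString keeps a leading byte order mark as U+FEFF; drop it before parsing.
        text = text.TrimStart('\uFEFF');

        var doc = new XmlDocument();
        doc.LoadXml(text);
        return doc;
    }
}
```

A call might look like `ThaiXmlLoader.Load("YourXMLFile.xml", Encoding.GetEncoding("iso-8859-11"))`. On .NET Core and later, legacy code pages may first need `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`; whether that applies depends on your runtime.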

As for your first question, you may need to talk to the third party and ask them to look into their source code to find out why those unwanted characters appear in the generated XML.
