
Java XMLReader getting SAXParseException on special characters in XML

I have a problem parsing an XML file that contains special characters such as ", <, > or & in the attributes of an element. At the moment I use XMLReader with my own ContentHandler. Unfortunately, changing the XML is not an option, since I receive a large number of files. Any idea what I could do?

Best!

You have to change the XML in order to make it well-formed. The five magic characters (&, <, >, " and ') must be encoded properly OR wrapped in a CDATA section to tell the parser to allow them to pass. (CDATA sections work only in element content, not inside attribute values, so for attributes encoding is the only route.)
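To make "encoded properly" concrete, here is a minimal sketch of how whoever produces the files could escape the five characters before writing an attribute value. The class and method names (`XmlEscape`, `escape`) are made up for illustration and do not come from any library.

```java
// Hypothetical helper: maps each of the five reserved characters
// to its predefined XML entity and leaves everything else untouched.
final class XmlEscape {
    private XmlEscape() {}

    static String escape(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

A producer would then write something like `"<node attr=\"" + XmlEscape.escape(value) + "\"/>"`, and the parser hands the original characters back to the ContentHandler.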

If the five magic characters are not encoded properly, you aren't receiving well-formed XML. That ought to be the foundation of your contract with users.

Do a one-shot change.

It's not XML. Don't call it XML, because you are misleading yourself. You're dealing with a proprietary data syntax, and you are missing out on all the benefits of using XML for data interchange. You can't use any of the wonderful tools that exist for processing XML, because your data is not XML. You're in the dark ages of data interchange that existed before XML was invented, where everyone had to write their own parsers and port them to multiple platforms, at vast cost. It may be expensive to switch from this mess to the modern world of open standards, but the investment will pay off quickly. Just don't let any of the stakeholders delude themselves into thinking that because your syntax is "almost XML", you are almost there in terms of reaping the benefits. XML is all or nothing.

It's not best practice, but you could use a regex to transform your almost-XML into proper XML before you open it with XMLReader. Something along these lines (just using JavaScript for a quick proof of concept):

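// Almost-XML: the attribute value contains raw <, ", & and >.
var xml = '<root><node attr="bad attr chars...<"&>..."/></root>';
// No /g flag, so each call fixes only the first match; real input would need a loop.
xml = xml.replace(/("[^"]*)&([^"]*")/, '$1&amp;$2');
xml = xml.replace(/("[^"]*)<([^"]*")/, '$1&lt;$2');
xml = xml.replace(/("[^"]*)>([^"]*")/, '$1&gt;$2');
xml = xml.replace(/("[^"]*)"([^"]*")/, '$1&quot;$2');
alert(xml);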
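Since the question is about Java, the same idea could be sketched there roughly as follows. This is not the answerer's code. It is a heuristic that assumes every attribute value is double-quoted and ends at the first `"` that is followed by the end of the tag or by another `name="` pair. It will misjudge values such as `a="x">y"`, it ignores single-quoted attributes, and it will also touch anything in element text that happens to look like `="..."`. The names `AlmostXmlRepair`, `repair` and `parse` are invented for illustration.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

final class AlmostXmlRepair {
    // Assumption: a value runs from =" to the first " followed by
    // another attribute, "/>", ">" or "?>". This is a guess, not a parser.
    private static final Pattern ATTR_VALUE = Pattern.compile(
            "=\"(.*?)\"(?=\\s+[\\w:.-]+\\s*=\\s*\"|\\s*/?>|\\s*\\?>)",
            Pattern.DOTALL);

    // An ampersand that does not already start an entity or character reference.
    private static final Pattern BARE_AMP = Pattern.compile(
            "&(?!(?:amp|lt|gt|quot|apos|#\\d+|#x[0-9a-fA-F]+);)");

    private AlmostXmlRepair() {}

    /** Escapes &, <, > and " inside every attribute value the heuristic finds. */
    static String repair(String almostXml) {
        Matcher m = ATTR_VALUE.matcher(almostXml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Ampersands first, so the entities added below are not re-escaped.
            String value = BARE_AMP.matcher(m.group(1)).replaceAll("&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;")
                    .replace("\"", "&quot;");
            m.appendReplacement(out,
                    Matcher.quoteReplacement("=\"" + value + "\""));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Feeds the repaired text to an XMLReader with the caller's ContentHandler. */
    static void parse(String almostXml, ContentHandler handler) throws Exception {
        XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(repair(almostXml))));
    }
}
```

For the sample string above, `repair` returns `<root><node attr="bad attr chars...&lt;&quot;&amp;&gt;..."/></root>`, which XMLReader accepts. Treat this as a stopgap for cleaning up the existing files once, not as a substitute for getting well-formed XML from the source.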
