
Best way to validate non-printable ASCII characters in XML

Our application needs to validate incoming XML messages for non-printable ASCII characters. We currently know of two ways to do this.

  1. Change the XSD to include the restriction.

  2. Validate the input XML string in the Java application using a regular expression.

Which approach performs better, given that our application has to return its response within a few seconds? Is there any other option for doing this?

It's mainly a matter of opinion, but if you have an XSD, that seems the natural place to put the validation. The one thing to consider is that XSD validation can only pass or fail, whereas with ad-hoc Java validation you can ignore non-printable characters, replace them, or take some other action without rejecting the input completely.
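
As a rough illustration of the ad-hoc Java route, here is a minimal sketch. It assumes "non-printable ASCII" means the control range U+0000 to U+001F plus DEL (U+007F); the class and method names are made up for this example. Adjust the character class if tab, CR, and LF should be tolerated.

```java
import java.util.regex.Pattern;

public class NonPrintableCheck {
    // Assumption: "non-printable ASCII" = U+0000..U+001F and U+007F (DEL).
    // Compiled once, because Pattern instances are immutable and reusable.
    private static final Pattern NON_PRINTABLE = Pattern.compile("[\\x00-\\x1F\\x7F]");

    /** Returns true if the text contains at least one non-printable ASCII character. */
    public static boolean containsNonPrintable(String text) {
        return NON_PRINTABLE.matcher(text).find();
    }

    /** Returns the text with every non-printable ASCII character removed. */
    public static String stripNonPrintable(String text) {
        return NON_PRINTABLE.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "<msg>hello\u0007world</msg>"; // contains BEL (U+0007)
        System.out.println(containsNonPrintable(input)); // prints true
        System.out.println(stripNonPrintable(input));    // prints <msg>helloworld</msg>
    }
}
```

Unlike a schema check, this lets the caller choose between rejecting the message and cleaning it.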

The only characters that are (a) ASCII, (b) non-printable, and (c) allowed in XML 1.0 documents are CR, LF, and TAB (strictly speaking DEL, U+007F, is also permitted, though its use is discouraged). I find it hard to see why excluding those three characters is especially important, but if you already have an XSD schema, then it makes sense to add the restriction there.
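
One practical consequence is that an ordinary XML 1.0 parser already rejects control characters such as BEL (U+0007) before any schema or regex is involved. A small sketch using the JDK's built-in DOM parser shows this; the class and method names are hypothetical, and hardening against external entities is left out for brevity.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class WellFormedCheck {
    /** Returns true if the string parses as a well-formed XML document. */
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Raised for any well-formedness error, including illegal characters.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<msg>a\tb</msg>"));     // TAB is a legal XML 1.0 character
        System.out.println(isWellFormed("<msg>a\u0007b</msg>")); // BEL is not
    }
}
```

So if the messages are parsed anyway, a separate check only matters for TAB, CR, and LF.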

The usual approach is not to make these three characters invalid, but to treat them as equivalent to space characters, which you can do by using a data type whose whiteSpace facet has the value "replace" or "collapse" (for example, xs:normalizedString or xs:token).
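
To make the two XSD choices concrete, the sketch below validates against a small made-up schema using the standard javax.xml.validation API. The element names `lenient` and `strict` are assumptions for illustration only: `lenient` uses xs:token, so TAB, CR, and LF are collapsed like spaces, while `strict` restricts xs:string with a pattern that forbids them.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class XsdWhitespaceDemo {
    // Hypothetical schema, not taken from the question.
    private static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "  <xs:element name='lenient' type='xs:token'/>"
        + "  <xs:element name='strict'>"
        + "    <xs:simpleType>"
        + "      <xs:restriction base='xs:string'>"
        + "        <xs:pattern value='[^\\t\\n\\r]*'/>"
        + "      </xs:restriction>"
        + "    </xs:simpleType>"
        + "  </xs:element>"
        + "</xs:schema>";

    // Schema objects are thread-safe, so compile once and reuse for every message.
    private static final Schema SCHEMA = compileSchema();

    private static Schema compileSchema() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** Returns true if the document is valid against the sketch schema. */
    public static boolean isValid(String xml) {
        try {
            // A Validator is not thread-safe, so create one per call.
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, or violates the schema
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<lenient>a\tb</lenient>")); // TAB treated as whitespace
        System.out.println(isValid("<strict>a\tb</strict>"));   // TAB violates the pattern
    }
}
```

On the performance question, compiling the schema once at startup is probably the main thing to get right; per-message validation then reuses the cached `Schema`.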


 