
Automatically correct invalid XML?

I am currently using SSIS on a project where I need to verify that XML files have the correct structure. In particular, I have to check whether any tag is missing from the XML file and, if so, retrieve the line that has no tag. Here is an example to make this clearer.

<?xml version="1.0"?>
<catalog>
<DATA>0000000061E82D821590010000409525CD</DATA> 
<DATA>0000000061E82D8C163001000140AD0DF6</DATA> 
<DATA>0000000061E82D9616E301000240776CAB</DATA>
<DATA> 0000000061E82DA0178001000340C56B6</DATA> 
<DATA>0000000061E82DAA188001000440C0C7CB</DATA>
 0000000061E82DDAEA4001000540BB9A276
</catalog>

For example, in the above XML there is a <DATA> tag missing. I have no influence on the creation of the XML. How can I detect that a <DATA> tag is missing (the number of data lines is not fixed), and then retrieve the line that has no tag?

The solution can be a set of SSIS components or a C# script.

It is impossible to automatically correct invalid XML in the general case.

Terminology correction

"For example in the above XML there is a <DATA> tag missing."

There is not a <DATA> tag missing. You probably mean that there are supposed to be start and end DATA tags surrounding 0000000061E82DDAEA4001000540BB9A276. The difference is significant: if only a single tag were missing, the "XML" would not be well-formed. With both tags missing, the document is still well-formed, but if a schema says that a catalog element may only have DATA children, then the XML is not valid.
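The distinction can be checked with any XML parser. Below is a minimal sketch in Python (chosen only for brevity; the question asks for SSIS or C#). The two payloads and the helper name is_well_formed are made up for illustration, shortened from the sample in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical payloads: both lack markup around the last value,
# but they differ in how much of it is missing.
BOTH_TAGS_MISSING = """<?xml version="1.0"?>
<catalog>
<DATA>0000000061E82DAA188001000440C0C7CB</DATA>
 0000000061E82DDAEA4001000540BB9A276
</catalog>"""

ONLY_START_TAG_MISSING = """<?xml version="1.0"?>
<catalog>
<DATA>0000000061E82DAA188001000440C0C7CB</DATA>
 0000000061E82DDAEA4001000540BB9A276</DATA>
</catalog>"""


def is_well_formed(text):
    """Return True if the XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(BOTH_TAGS_MISSING))       # True: stray text, but tags balance
print(is_well_formed(ONLY_START_TAG_MISSING))  # False: </DATA> has no matching start tag
```

So a parser alone will not flag the document in the question. Catching it takes a validity rule on top of well-formedness.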

Don't try to automatically correct invalid XML

Best practice is to reject the input and force the sender/creator to fix the document. The entire raison d'être for a schema is to express the invariants that can be relied upon to process the data. Violating those invariants means all bets are off.
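Detecting the violation so that the file can be rejected is straightforward. The sketch below is a hand-written stand-in for real schema validation, written in Python for brevity. It assumes the rule "a catalog element may contain only DATA children and no character data of its own"; that rule, the shortened SAMPLE, and the function name find_violations are assumptions for illustration, not something the question confirms.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample from the question.
SAMPLE = """<?xml version="1.0"?>
<catalog>
<DATA>0000000061E82D821590010000409525CD</DATA>
<DATA>0000000061E82DAA188001000440C0C7CB</DATA>
 0000000061E82DDAEA4001000540BB9A276
</catalog>"""


def find_violations(text):
    """List the ways a catalog breaks the assumed rule; an empty list means acceptable."""
    root = ET.fromstring(text)  # raises ET.ParseError if the text is not well-formed
    problems = []
    # Character data before the first child is stored in root.text.
    if root.text and root.text.strip():
        problems.append(f"stray text: {root.text.strip()!r}")
    for child in root:
        if child.tag != "DATA":
            problems.append(f"unexpected element: <{child.tag}>")
        # Character data following a child's end tag is stored in child.tail.
        if child.tail and child.tail.strip():
            problems.append(f"stray text: {child.tail.strip()!r}")
    return problems


print(find_violations(SAMPLE))
# ["stray text: '0000000061E82DDAEA4001000540BB9A276'"]
```

A non-empty result is a reason to reject the file and report the offending values back to the producer, not a cue to wrap them in DATA tags.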

Don't be seduced by the superficial simplicity of peep-hole repair ideas

Every repair idea implies an assumption about the data that is not expressed in the schema, which is a problem because:

  • There should be a clearly and explicitly expressed definition of validity, and
  • The assumptions
    • will likely not be expressed unambiguously.
    • may not be expressed at all.
    • may be incomplete or entirely incorrect.
    • will probably go unconfirmed, because an errant producer that cannot or will not fix validity against a schema is unlikely to be able to assess whether an assumption holds for all the data it is sending, or could ever send.
