
Invisible differences in XML files - breaking self-made XML parser

I have two programs that interact with a well-defined XML file. The first program (Model) reads it in, parses it, and uses content from the file to direct the running of a model. The second program (Controller) opens and rewrites the XML file, allowing different settings to be run in the Model.

Model is written in C++, worked with in VS2010 and VS2012, has no GUI, and uses a home-made (is this the correct term?) XML parser that has worked for many years without fail. I just checked the SVN for revisions to the files that make it up: nothing since 2013. Controller is written in C#, in VS2012, with a GUI that has drop-downs that set the content of the XML file, and uses the XmlDocument class to read in, edit, and write out the XML file.

Suddenly, the Controller no longer produces XML files that can be read by Model. When Model tries to read the XML file, it reads the first character it encounters as '-17'. As far as I have been able to tell, this means that it doesn't recognize it as a UTF-8 character. This causes Model to print the error with cout and then crash. An older XML file (which looks identical to the ones written by Controller) reads in fine.

Below are examples of the files. Please ignore the content inside the elements. Some of you may say that the content might be causing the problem, but I've checked it again and again, and it is correct. And if the content mattered, why would the parser in Model fail at the very first character ('<' = '-17') when reading in the file created by XmlDocument?

Older file:

<?xml version="1.0" encoding="UTF-8" ?> 
<Config>
<Mode value="false" Id="Modeflag" />
<Timestep OutputTimestep="Hourly"  CalibrationTimestep="Houry" />
<InitialInput SubCatchmentNumber="1" ModelCalibration="true" SnowSimulation="false" VegSimulation="Method 1" CatchmentNumber="1" FractionalCatchmentArea="1" />
<InputResource Name="All" Location="C:\AutoRun_Newest\AutoRun" Id="Directory" />
<SimulationScheme SchemeForCatchmentNo="8" Infiltration="true" ChannelRouting="false" Saturation="true" TopographicIndex="true" KDecayWithSoilDepthExp="false" SoilTopoIndex="false" KDecayInPower="true" />
<SnowInput InputCatchmentNumber="1" TempIndexMethod_Hourly="false" RadiationTempIndex_With_SnowInterception="true" EnergyBudgetMethod_With_SnowInterception="false" />
<SnowInputResource Name="All" Location="C:\AutoRun_Newest\AutoRun" Id="SnowDirectory" />
<OutputDirectory Location="C:\AutoRun_Newest\Inputs\Output_Timestamp_07012015215112" Name="Toronto_Output" />
</Config>

Newer file:

<?xml version="1.0" encoding="UTF-8" ?>
<Config>
  <Mode value="false" Id="Modeflag" />
  <Timestep OutputTimestep="Hourly" CalibrationTimestep="Hourly" />
  <InitialInput SubCatchmentNumber="1" ModelCalibration="true" SnowSimulation="false" VegSimulation="Method 1" CatchmentNumber="1" FractionalCatchmentArea="1" />
  <InputResource Name="All" Location="C:\AutoRun_Newest\AutoRun" Id="Directory" />
  <SimulationScheme SchemeForCatchmentNo="8" Infiltration="true" ChannelRouting="false" Saturation="true" TopographicIndex="true" KDecayWithSoilDepthExp="false" SoilTopoIndex="false" KDecayInPower="true" />
  <SnowInput InputCatchmentNumber="1" TempIndexMethod_Hourly="false" RadiationTempIndex_With_SnowInterception="true" EnergyBudgetMethod_With_SnowInterception="false" />
  <SnowInputResource Name="All" Location="C:\AutoRun_Newest\AutoRun" Id="SnowDirectory" />
  <OutputDirectory Location="C:\AutoRun_Newest\Inputs\Output_Timestamp_07012015215112" Name="Toronto_Output" />
</Config>

Adding or taking away the indentation (proper formatting by the XmlDocument class in C#) changes nothing about the behavior of Model.

These files are visually identical, and I can see no odd characters or spacing. What invisible objects, forces, characters, or other settings could be causing this new bug?

Is there some background encoding that the XmlDocument class enforces that is new to my home-made parser?

You have a byte order mark (BOM) at the start of the file. https://en.wikipedia.org/wiki/Byte_order_mark

The BOM is Unicode character U+FEFF, or in UTF-8 the bytes 0xEF, 0xBB, 0xBF. 0xEF is -17 if you reinterpret it as a signed byte, so your parser is seeing the first byte of the BOM rather than the '<' you expect. The BOM is invisible in most editors, which is why the two files look identical. Many Windows tools in particular will put a BOM at the start of the file when you save it.
