
Two identical XML files behave differently when read in Ruby on Rails

I have to parse XML files produced by two different programs. One of the files fails during parsing. While debugging, I got to the point where I copied the content of the "good file" and pasted it into the "bad file", but the error persisted! I also pasted the "bad file" content into the good one, and everything worked!

I suspect this is an encoding problem.

If an XML file has no encoding declared, is there some metadata that I could be missing?

This is the output when I read each file in Ruby:

File.read(Rails.root.join('bad-file.xml'))

\xFF\xFE<\u0000f\u0000i\u0000l\u0000e\u0000>\u0000\r\u0000<\u0000A\u0000L\u0000L\u0000_\u0000I\u0000N\u0000S\u0000T\u0000A\u0000N\u0000C\u0000E\u0000S\u0000>\u0000\r\u0000\r\u0000<\u0000i\u0000n\u0000s\u0000t\u0000a\u0000n\u0000c\u0000e\u0000>\u0000\r\u0000<\u0000I\u0000D\u0000>\u00009\u00005\u00003\u0000<\u0000/\u0000I\u0000D\u0000>\u0000\r\u0000<\u0000s\u0000t\u0000a\u0000r\u0000t\u0000>\u00005\u00000\u00005\u00009\u0000.\u00002\u00006\u00002\u00002\u00000\u00001....

File.read(Rails.root.join('good-file.xml'))

<file>\r\n<ALL_INSTANCES>\r\n\r\n<instance>\r\n<ID>953</ID>\r\n<start>5059.2622016567</start>\r\n<end>5060.2622016567</end>\r\n<code>timer-1sec</code>\r\n<label>\r\n<group>result</group>\r\n<text>Dabang Eindringen SK</text>\r\n</label>\r\n</instance>\r\n</ALL_INSTANCES>\r\n\r\n<ROWS>\r\n<row>\r\n<code>timer-1sec</code>\r\n<R>0</R>\r\n<G>0</G>\r\n<B>0</B>\r\n</row>\r\n</ROWS>\r\n</file>

Those first two bytes, \xFF\xFE, are a Unicode byte order mark (BOM). They signify that the rest of the data is UTF-16 in little-endian order, which is why every ASCII character in the output is followed by a \u0000 byte.
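Checking for a BOM is straightforward to do in code before deciding how to read a file. A minimal sketch, assuming you only need to recognise the common Unicode BOMs; BOMS and bom_encoding are hypothetical names, not part of Ruby or Rails:

```ruby
# Byte order marks and the encodings they announce (hypothetical table).
# UTF-32LE is listed before UTF-16LE because its BOM (FF FE 00 00)
# begins with the UTF-16LE BOM (FF FE); the first match wins.
BOMS = {
  "\xFF\xFE\x00\x00".b => Encoding::UTF_32LE,
  "\x00\x00\xFE\xFF".b => Encoding::UTF_32BE,
  "\xEF\xBB\xBF".b     => Encoding::UTF_8,
  "\xFF\xFE".b         => Encoding::UTF_16LE,
  "\xFE\xFF".b         => Encoding::UTF_16BE
}.freeze

# Returns the Encoding implied by a leading BOM in +head+, or nil if none.
def bom_encoding(head)
  head = head.b # compare raw bytes, whatever encoding the caller's string has
  BOMS.each { |bom, encoding| return encoding if head.start_with?(bom) }
  nil
end

# The first bytes of bad-file.xml as shown in the question:
p bom_encoding("\xFF\xFE<\x00f\x00".b) # => #<Encoding:UTF-16LE>

# For a real file, read only the first four bytes:
#   bom_encoding(File.binread(path, 4).to_s)
```

Passing bytes rather than a path keeps the check independent of how the file is opened; a file without a BOM (like good-file.xml) returns nil, and you would then fall back to the XML declaration or a default such as UTF-8.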

If you do

File.read(path, mode: 'r:UTF-16LE')

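Then Ruby uses UTF-16LE as the file's external encoding, i.e. it decodes the bytes on disk as UTF-16LE. If a default internal encoding is set (a Rails app sets Encoding.default_internal to UTF-8 through config.encoding), the data is transcoded to it before being returned. To transcode to UTF-8 explicitly, regardless of that setting, do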

File.read(path, mode: 'r:UTF-16LE:UTF-8')
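A slightly extended sketch of the same idea, assuming the file always starts with a UTF-16LE BOM like bad-file.xml does. The helper name read_utf16le_xml is hypothetical; the relevant part is the mode string, which adds binary mode and Ruby's "BOM|" prefix so the byte order mark is consumed instead of returned:

```ruby
require "tmpdir"

# Hypothetical helper; the mode string is what matters:
#   "rb"        binary mode, so no newline conversion touches the data
#   "BOM|"      consume the leading byte order mark instead of returning it
#   "UTF-16LE"  external encoding: how the bytes on disk are decoded
#   ":UTF-8"    internal encoding: what the returned String is transcoded to
def read_utf16le_xml(path)
  File.read(path, mode: "rb:BOM|UTF-16LE:UTF-8")
end

# Usage: write a small BOM-prefixed UTF-16LE file shaped like bad-file.xml,
# then read it back as UTF-8.
Dir.mktmpdir do |dir|
  path = File.join(dir, "bad-file.xml")
  File.binwrite(path, "\xFF\xFE".b + "<file>\r\n</file>".encode("UTF-16LE").b)

  xml = read_utf16le_xml(path)
  p xml          # => "<file>\r\n</file>"
  p xml.encoding # => #<Encoding:UTF-8>
end
```

Without the BOM| prefix, 'r:UTF-16LE:UTF-8' still decodes the text correctly, but the returned string starts with the character U+FEFF, which can confuse code that expects the document to begin with "<". Note that this mode would misread a file without a BOM, such as good-file.xml, so it only suits files you know are UTF-16LE.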


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM