Ruby 1.8.7: why does .to_yaml convert some Strings to non-readable bytes?

While parsing some web pages with Nokogiri, I ran into issues when cleaning some Strings and saving them with YAML. The following IRB session reproduces the problem:

irb(main):001:0> require 'yaml'
=> true
irb(main):002:0> "1,000 €".to_yaml
=> "--- !binary |\nMSwwMDAg4oKs\n\n"
irb(main):003:0> "1,0000 €".to_yaml
=> "--- \"1,0000 \\xE2\\x82\\xAC\"\n"
irb(main):004:0> "1,00 €".to_yaml
=> "--- !binary |\nMSwwMCDigqw=\n\n"
irb(main):005:0> "1 €".to_yaml
=> "--- !binary |\nMSDigqw=\n\n"
irb(main):006:0> "23 €".to_yaml
=> "--- !binary |\nMjMg4oKs\n\n"
irb(main):007:0> "12000 €".to_yaml
=> "--- !binary |\nMTIwMDAg4oKs\n\n"
irb(main):008:0> "1200000 €".to_yaml
=> "--- \"1200000 \\xE2\\x82\\xAC\"\n"
irb(main):009:0> "120000 €".to_yaml
=> "--- \"120000 \\xE2\\x82\\xAC\"\n"
irb(main):010:0> "12000 €".to_yaml
=> "--- !binary |\nMTIwMDAg4oKs\n\n"

To sum up, .to_yaml sometimes produces readable output and sometimes does not. The most intriguing part is that the strings are very similar.

How can I avoid those !binary ... outputs?

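Whether YAML dumps a string as text or as binary depends on the ratio of ASCII to non-ASCII content in the string: the larger the non-ASCII share, the more likely the string is dumped as !binary. That is why "12000 €" comes out as !binary in your session while "120000 €", with one more ASCII digit, comes out as text.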
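A rough model of that heuristic is sketched below. The helper names and the 0.3 cutoff are assumptions: the cutoff is inferred from the IRB session in the question (3 non-ASCII bytes out of 9 gives !binary, 3 out of 10 gives text), not taken from the YAML library's source, so treat it as an illustration only.

```ruby
# Hypothetical helper (not part of the YAML library): fraction of bytes
# that fall outside printable ASCII (space through tilde).
def non_printable_ratio(str)
  return 0.0 if str.empty?
  count = 0
  str.each_byte { |b| count += 1 unless (0x20..0x7E).include?(b) }
  count.to_f / str.bytesize
end

# Assumed cutoff, inferred from the session in the question.
ASSUMED_THRESHOLD = 0.3

# Guess whether a string would be dumped as !binary under this model.
def likely_binary?(str)
  non_printable_ratio(str) > ASSUMED_THRESHOLD
end

# "\xE2\x82\xAC" is the UTF-8 byte sequence of the euro sign.
puts likely_binary?("12000 \xE2\x82\xAC")   # 3 of 9 bytes are non-ASCII
puts likely_binary?("120000 \xE2\x82\xAC")  # 3 of 10 bytes are non-ASCII
```

Under this model, padding a short string with ASCII characters would be enough to flip the result, which matches what the session shows.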

If you want to avoid !binary as much as possible, use the ya2yaml gem. It tries hard to dump strings as ASCII plus escaped UTF-8.
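For values that have already been written as !binary, note that the payload is just the Base64 encoding of the original bytes, so nothing is lost. A minimal sketch for inspecting one by hand, using only String#unpack (the helper name is hypothetical):

```ruby
# Hypothetical helper: decode the Base64 payload of a !binary scalar.
def decode_binary_payload(payload)
  payload.unpack("m").first
end

# Payload copied from the "1,000 €" line of the IRB session in the question.
decoded = decode_binary_payload("MSwwMDAg4oKs")

# Print the raw byte values; the last three are the UTF-8 bytes of the euro sign.
p decoded.unpack("C*")
```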
