Ruby Character Encoding Confusion When Reading the Same File in Different Environments
I have a Rails application that accepts CSV file uploads. While developing the feature locally on my Mac, I got an "invalid byte sequence in UTF-8" error when parsing the uploaded file with Ruby's standard library CSV.
After some research and reading answers to similar questions on StackOverflow, I tried using a gem (CharDet) to sniff out the character encoding, and then specified that encoding when opening the file via the CSV library. This solved all my problems, and life was good.
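# Read the uploaded file and let CharDet guess its encoding.
content = File.read(fullpath)
self.file_encoding = CharDet.detect(content)['encoding']
# Parse the CSV with the detected encoding and collect the header names.
CSV.table(fullpath, :encoding => file_encoding, :header_converters => :downcase).headers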
But then I deployed this code to the production Linux environment and got the "invalid byte sequence in UTF-8" errors again. What a mystery (to me, anyway)! After quite some time trying to resolve the error, I tried removing the code that specified the encoding when opening the file. Miraculously, that fixed the problem in production, but now local Mac development is broken.
Keep in mind that in both cases I'm uploading the same file using the same browser. Does anyone have any insight into what is going on here?
By the way, the Ruby versions are close but not the same: the Mac runs ruby 1.9.3-p0 and the Linux server runs 1.9.2-p180. The app is Rails 3.2.6.
A few thoughts:
I'm not aware of any differences in encoding behavior between 1.9.2 and 1.9.3, but I haven't specifically researched it either. It could also be a difference in the configuration of the MRI build.
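One way to check whether the two environments really differ is to print the encoding defaults each Ruby process starts with, since on Unix-like systems Ruby derives Encoding.default_external from the process locale (for example LANG). The sketch below is only an illustration, not the asker's code: encoding_report and read_as_utf8 are hypothetical helper names, and ISO-8859-1 is an assumed source encoding for the demo file. It also shows reading the file as raw bytes and transcoding with an explicitly named encoding, so the result should not depend on the environment default.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: collect the settings Ruby falls back on when no
# encoding is given explicitly. Run it on both machines and compare.
def encoding_report
  {
    :default_external => Encoding.default_external.name,
    :default_internal => Encoding.default_internal && Encoding.default_internal.name,
    :lang             => ENV['LANG'],
    :lc_all           => ENV['LC_ALL']
  }
end

# Hypothetical helper: read raw bytes, label them with the given source
# encoding, and transcode to UTF-8. Bytes that cannot be converted are
# replaced instead of raising.
def read_as_utf8(path, source_encoding)
  bytes = File.open(path, 'rb') { |f| f.read }
  bytes.force_encoding(source_encoding)
       .encode('UTF-8', :invalid => :replace, :undef => :replace)
end

# Demo: write a small Latin-1 encoded CSV (assumed encoding), then parse it.
sample = Tempfile.new(['sample', '.csv'])
sample.binmode
sample.write("Name,City\nJos\xE9,M\xE1laga\n".dup.force_encoding('BINARY'))
sample.close

text = read_as_utf8(sample.path, 'ISO-8859-1')
rows = CSV.parse(text, :headers => true)

p encoding_report
puts rows.first['Name']
```

If the two reports differ, that would be consistent with the same file parsing on one machine and failing on the other. In the application, the encoding name reported by CharDet could be passed where the demo uses 'ISO-8859-1'.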