
How do I encode files to UTF-8 for Rails 3?

I've been working on Outlook imports (LinkedIn exports in Outlook format), but I'm having trouble with encoding. The Outlook-format CSV I get from exporting my LinkedIn contacts is not in UTF-8. Letters like ñ cause an exception in the mongoid_search gem when it calls str.to_s.mb_chars.normalize. I think encoding is the issue, because the exception is raised when I call mb_chars (see first code example). I am not sure if this is a bug in the gem, but I was advised to sanitize the data nonetheless.

From File Picker, I tried using their new, community-supported gem to upload CSV data. I tried three encoding detectors and transcoders:

  1. Ruby port of the Python library chardet
    • Didn't work as expected
    • The port still contained Python code, preventing it from running in my app
  2. rchardet19 gem
    • Detected iso-8859 with .8/1 confidence
    • Tried to transcode with Iconv, but it crashed on "illegal characters" at ñ
  3. Charlock_Holmes gem
    • Detected windows-1252 with 33/100 confidence
    • I assume that's the actual encoding, and rchardet got iso-8859 because windows-1252 is based on it.
    • This gem uses ICU and has a maintained branch "bundle-icu" which supports Heroku. When I try to transcode using charlock, I get the error U_FILE_ACCESS_ERROR, an ICU error code meaning "could not open file".

Does anybody know what to do here?

Ruby 1.9 has encoding support built in. Have you tried:

s.force_encoding 'utf-8'

mb_chars is a wrapper for Ruby 1.8, so you shouldn't need it.
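One caveat worth illustrating: force_encoding only changes the encoding label on a string and leaves its bytes alone. If the data really is windows-1252, as Charlock_Holmes guessed in the question, the bytes also need to be transcoded with String#encode. The following is a minimal sketch under that assumption; the helper name to_utf8 and the sample string are made up for illustration, and String#b needs Ruby 2.0 or later.

```ruby
# Hypothetical helper (not from any gem): relabel the raw bytes with their
# real encoding, then transcode them to UTF-8.
# 'Windows-1252' is an assumption taken from the detector output in the question.
def to_utf8(bytes, source_encoding = 'Windows-1252')
  bytes.dup.force_encoding(source_encoding).encode('UTF-8')
end

# Made-up sample: "España" as Windows-1252 bytes, where ñ is the single byte 0xF1.
raw = "Espa\xF1a".b

# force_encoding alone keeps the bytes and only changes the label,
# so the result is not valid UTF-8.
puts raw.dup.force_encoding('UTF-8').valid_encoding?   # false

fixed = to_utf8(raw)
puts fixed.valid_encoding?   # true
puts fixed.bytesize          # 7, because ñ is now two bytes (0xC3 0xB1)
```

If the source contains bytes that are undefined in the assumed encoding, encode raises Encoding::UndefinedConversionError; passing invalid: :replace, undef: :replace to encode substitutes a replacement character instead.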

See duplicate:

How to convert character encoding with Ruby 1.9
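Since the question is about a file rather than a single string, the conversion can also happen while reading. This is a rough sketch, again assuming the export is windows-1252; read_as_utf8 is a hypothetical helper and the CSV row is invented sample data. It relies on Ruby's "r:EXTERNAL:INTERNAL" open mode, which transcodes from the external to the internal encoding on read.

```ruby
require 'tempfile'

# Hypothetical helper: read a file stored in source_encoding and return UTF-8 text.
# 'Windows-1252' is an assumption based on the detector output in the question.
def read_as_utf8(path, source_encoding = 'Windows-1252')
  # "r:EXTERNAL:INTERNAL" makes Ruby transcode while reading.
  File.open(path, "r:#{source_encoding}:UTF-8") { |f| f.read }
end

# Demo with made-up data: one CSV row containing ñ as the single byte 0xF1.
Tempfile.create('contacts') do |tmp|
  tmp.binmode
  tmp.write("Name,Country\nI\xF1aki,Spain\n".b)
  tmp.flush

  text = read_as_utf8(tmp.path)
  puts text.encoding
  puts text.valid_encoding?
end
```

The resulting string can then be handed to a CSV parser or to the search gem as ordinary UTF-8 text.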
