Character encoding, how do I tell the difference?

Characters coming out of my database are encoded differently than the same characters written directly in the source. For example, the word Permissões produces a different result when the string is written directly in the HTML than when the string is output from a db record.

# From the source
Addressable::URI.encode("Permissões.pdf") #=> "Permiss%C3%B5es.pdf"

# From the db
Addressable::URI.encode("Permissões.pdf") #=> "Permisso%CC%83es.pdf"

The encodings are different. But my database is set to UTF-8, and I am using HTML5. What could be causing this?

I am unable to download files I upload to S3 because of this issue. I tried to force the encoding with attachment.path.encode("UTF-8"), but that makes no difference.
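One way to tell the two strings apart is to look at their code points rather than their encoding. The sketch below is only an illustration: `codepoint_labels` is a hypothetical helper, and the two literals are assumed reconstructions of the "source" and "db" strings, inferred from the percent-encoded output above (%C3%B5 is U+00F5, %CC%83 is U+0303).

```ruby
# Hypothetical helper: render each code point of a string as "U+XXXX".
def codepoint_labels(str)
  str.codepoints.map { |cp| format("U+%04X", cp) }
end

# Assumed reconstructions of the two strings from the question.
composed   = "Permiss\u00F5es.pdf"   # õ as a single precomposed code point
decomposed = "Permisso\u0303es.pdf"  # o followed by a combining tilde

puts composed.encoding    # UTF-8
puts decomposed.encoding  # UTF-8, so forcing the encoding changes nothing

puts composed == decomposed  # false
puts composed.length         # 14
puts decomposed.length       # 15

# Show only the code points that the two strings do not share.
p codepoint_labels(composed) - codepoint_labels(decomposed)   # ["U+00F5"]
p codepoint_labels(decomposed) - codepoint_labels(composed)   # ["U+006F", "U+0303"]

# String#unicode_normalized? (Ruby stdlib) checks against NFC by default.
p composed.unicode_normalized?    # true
p decomposed.unicode_normalized?  # false
```

Both strings report UTF-8, which would explain why attachment.path.encode("UTF-8") has no visible effect: the difference is in the code point sequence, not in the byte encoding.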

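This turned out to be a Unicode normalization difference rather than an encoding difference. Both strings are valid UTF-8, but the string in the source contains õ as a single precomposed code point (U+00F5, encoded as %C3%B5), while the string from the database contains a plain o followed by a combining tilde (U+0303, encoded as %CC%83). To solve this, since I am using Rails, I used ActiveSupport::Multibyte::Unicode to normalize any Unicode characters before they get inserted into the database.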

before_save do
  self.path = ActiveSupport::Multibyte::Unicode.normalize(path)
end
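Outside of ActiveSupport, plain Ruby (2.2 and later) offers String#unicode_normalize, and as far as I know newer Rails versions deprecate the ActiveSupport helper in favor of it. The following is a minimal sketch under a few assumptions: `normalize_path` is a hypothetical helper name, NFC is assumed to be the desired form, and URI.encode_www_form_component from the standard library stands in for Addressable::URI.encode so that the example has no gem dependencies.

```ruby
require "uri"

# Hypothetical helper: normalize a path to NFC (precomposed characters)
# before it is stored or percent-encoded.
def normalize_path(path)
  path.unicode_normalize(:nfc)
end

# Assumed reconstruction of the decomposed string coming from the db.
from_db    = "Permisso\u0303es.pdf"
normalized = normalize_path(from_db)

puts URI.encode_www_form_component(from_db)     # Permisso%CC%83es.pdf
puts URI.encode_www_form_component(normalized)  # Permiss%C3%B5es.pdf
```

In a model this could be used as self.path = normalize_path(path) inside the before_save block. Note that the ActiveSupport call above and :nfc are not necessarily the same form, so it is worth choosing one form and applying it consistently on both write and lookup.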
