
Ruby on Rails: UTF-8 encoding string that has %F1 in content

I'm struggling to find the right method in Rails to convert percent-encoded character codes to their displayable values.

In my case, that means converting user input like "John%20Da%F1e" to "John Dañe", if possible.

Currently, I have the following:

unescaped_name = CGI::unescape(params[:name]) # this turns "John%20Da%F1e" into "John Da\xF1e"
@q = I18n.transliterate(unescaped_name) # this raises an 'invalid byte sequence in UTF-8' error

In essence, I'm trying to go from "John%20Da%F1e" (already encoded in UTF-8) to "John Dañe".

One thing I've tried was

.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')

but that drops the byte that came from %F1 (shown as \xF1 after unescaping) and leaves "John Dae".
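A minimal sketch of why that attempt loses the character, assuming the unescaped value is the byte string shown in the comment above (written here as a literal instead of being read from params):

```ruby
# The unescaped input from the question, written as a literal.
# Ruby tags it as UTF-8, but a lone 0xF1 byte is not valid UTF-8.
raw = "John Da\xF1e"

puts raw.encoding          # UTF-8
puts raw.valid_encoding?   # false

# Declaring the source as 'binary' makes every byte above 0x7F undefined
# in the conversion, so replace: '' silently deletes the 0xF1 byte.
stripped = raw.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
puts stripped              # John Dae
```

The byte is removed because Ruby is never told which character 0xF1 stands for, not because the character cannot be represented in UTF-8.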

You need to tell Ruby what the encoding of the parsed string is. It looks like your input starts out in Latin-1 ('ISO-8859-1'). There are a few different options. If you want to limit this decision to just the string you are processing, you can use .force_encoding like this:

require 'cgi'
unescaped_name = CGI::unescape( "John%20Da%F1e" ).force_encoding('ISO-8859-1')
#  => "John Da\xF1e"
unescaped_name.encode('UTF-8')
#  => "John Dañe"

Note that once the encoding is set correctly, the string already contains the correct characters, but you won't necessarily see that until you convert it to an encoding that you can display. Where I show "John Da\xF1e", that's only because my terminal is set to display UTF-8; \xF1 is the byte for ñ in Latin-1.
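If the input may arrive in either encoding, one possible approach is to try UTF-8 first and fall back to Latin-1. The sketch below is only an illustration: decode_name is a hypothetical helper (not part of Rails or CGI), and the ISO-8859-1 fallback is an assumption about where the non-UTF-8 input comes from.

```ruby
require 'cgi'

# Hypothetical helper: unescape a percent-encoded value, keep it if the
# bytes are valid UTF-8, otherwise assume they are ISO-8859-1.
def decode_name(escaped, fallback: Encoding::ISO_8859_1)
  bytes = CGI.unescape(escaped).b                    # binary copy of the raw bytes
  utf8  = bytes.dup.force_encoding(Encoding::UTF_8)  # relabel only, no conversion
  return utf8 if utf8.valid_encoding?

  # Assumption: anything that is not valid UTF-8 was sent as Latin-1.
  bytes.force_encoding(fallback).encode(Encoding::UTF_8)
end

latin1_result = decode_name("John%20Da%F1e")     # Latin-1 bytes in the input
utf8_result   = decode_name("John%20Da%C3%B1e")  # UTF-8 bytes in the input
puts latin1_result
puts utf8_result
```

The validity check is a heuristic: some Latin-1 byte sequences also happen to be valid UTF-8, so a known encoding (for example, one declared by the client) is preferable when it is available.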


As far as I can tell, the URI encoding of the same string's UTF-8 bytes looks like this, and it unescapes in a single step:

"John%20Da%C3%B1e"
CGI::unescape( "John%20Da%C3%B1e" )
#  => "John Dañe"
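To see where the two forms come from, the following sketch percent-encodes the bytes of ñ in each encoding. The percent_bytes helper is hypothetical and exists only for this illustration; it encodes every byte, whereas real URI escaping leaves unreserved ASCII characters untouched.

```ruby
# Percent-encode every byte of a string (illustration only).
def percent_bytes(str)
  str.bytes.map { |b| format('%%%02X', b) }.join
end

utf8_n   = "ñ"                          # UTF-8 source literal: bytes C3 B1
latin1_n = utf8_n.encode('ISO-8859-1')  # same character, single byte F1

puts percent_bytes(utf8_n)    # %C3%B1
puts percent_bytes(latin1_n)  # %F1
```

So %F1 in the original input is the Latin-1 form of the character, while a client that sends UTF-8 would produce %C3%B1.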
