I'm struggling to find the right method in Rails to convert percent-encoded characters to their displayable values.
In my case, I want to convert user input like "John%20Da%F1e" to "John Dañe" if possible.
Currently, I have the following:
unescaped_name = CGI::unescape(params[:name]) # this turns "John%20Da%F1e" into "John Da\xF1e"
@q = I18n.transliterate(unescaped_name) # this raises an 'invalid byte sequence in UTF-8' error
In essence, I'm trying to go from "John%20Da%F1e" (already encoded in UTF-8) to "John Dañe".
One thing I've tried was
.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
but that drops the \xF1 byte entirely and leaves "John Dae".
You need to tell Ruby what encoding the unescaped bytes are actually in. It looks like your input is Latin-1 ('ISO-8859-1') to start with, not UTF-8. There are a few different options. If you want to limit this decision to just the string you are processing, you can use .force_encoding, like this:
require 'cgi'
unescaped_name = CGI::unescape( "John%20Da%F1e" ).force_encoding('ISO-8859-1')
# => "John Da\xF1e"
unescaped_name.encode('UTF-8')
# => "John Dañe"
Note that once the encoding is set up correctly, the string already contains the correct characters, but you won't necessarily see that until you convert it to an encoding that you can display. So where I show "John Da\xF1e", that's only because my terminal is set to display UTF-8: \xF1 is the byte for ñ in Latin-1 encoding.
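To make the byte-level difference visible, here is a small sketch that inspects the same string before and after conversion. The variable names (raw, latin1, utf8) are only illustrative, and the explicit force_encoding to UTF-8 on the first line is an assumption made so that the result doesn't depend on the script's default encoding.

```ruby
require 'cgi'

# Treat the unescaped bytes as UTF-8 first, which is what causes the error.
raw = CGI.unescape("John%20Da%F1e").force_encoding(Encoding::UTF_8)
puts raw.valid_encoding?        # false: a lone 0xF1 byte is not valid UTF-8

# Relabel the same bytes as Latin-1; no bytes change here.
latin1 = raw.dup.force_encoding(Encoding::ISO_8859_1)
puts latin1.valid_encoding?     # true
p latin1.bytes.last(2)          # [241, 101] -> 0xF1 ("ñ" in Latin-1), "e"

# Transcode to UTF-8; this is where the bytes actually change.
utf8 = latin1.encode(Encoding::UTF_8)
p utf8.bytes.last(3)            # [195, 177, 101] -> 0xC3 0xB1 ("ñ" in UTF-8), "e"
```

force_encoding only changes the label on the bytes, while encode rewrites the bytes themselves, which is why both steps are needed.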
As far as I can tell, the same string with its UTF-8 bytes percent-encoded looks like this, and it unescapes correctly in a single step:
"John%20Da%C3%B1e"
CGI::unescape( "John%20Da%C3%B1e" )
# => "John Dañe"
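If you can't be sure which of the two forms a client will send, one possible approach is to try UTF-8 first and fall back to Latin-1 only when the bytes aren't valid UTF-8. This is a rough sketch, not a definitive solution: unescape_to_utf8 is a hypothetical helper name, and the Latin-1 fallback is an assumption about your input that won't hold for other legacy encodings.

```ruby
require 'cgi'

# Hypothetical helper: unescape a percent-encoded string and return UTF-8.
# Assumes that anything which isn't valid UTF-8 is ISO-8859-1 (Latin-1).
def unescape_to_utf8(escaped)
  bytes = CGI.unescape(escaped).b                  # binary copy of the raw bytes
  utf8  = bytes.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?              # already UTF-8, nothing to convert

  # Fallback: relabel as Latin-1, then transcode to UTF-8.
  bytes.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

puts unescape_to_utf8("John%20Da%F1e")     # Latin-1 input
puts unescape_to_utf8("John%20Da%C3%B1e")  # UTF-8 input
```

Both calls should give the same UTF-8 string, which you can then pass on to I18n.transliterate.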