Ruby on Rails: UTF-8 encoding string that has %F1 in content

Question

I'm struggling to find the right method in Rails to convert UTF-8 codes into their displayable characters.

In my case, I want to convert user input such as "John%20Da%F1e" into "John Dañe", if possible.

Currently, I have the following:

unescaped_name = CGI::unescape(params[:name]) # turns "John%20Da%F1e" into "John Da\xF1e"
@q = I18n.transliterate(unescaped_name) # raises an 'invalid byte sequence in UTF-8' error

In essence, I'm trying to get from "John%20Da%F1e" (already encoded in UTF-8) to "John Dañe".

One thing I've tried was:

.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')

but that just drops the non-ASCII byte (the \xF1 that %F1 unescapes to), leaving "John Dae".
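A small plain-Ruby sketch (outside Rails) that reproduces both symptoms, assuming the parameter arrives as the literal string "John%20Da%F1e". It shows that the unescaped string is labeled UTF-8 even though 0xF1 on its own is not valid UTF-8, and why encoding from 'binary' throws that byte away.

```ruby
require 'cgi'

raw = CGI.unescape("John%20Da%F1e")

# CGI.unescape labels its result UTF-8 by default, but the lone byte 0xF1
# is not a valid UTF-8 sequence, which is what trips up I18n.transliterate.
puts raw.encoding          # UTF-8
puts raw.valid_encoding?   # false
p raw.bytes.last(3)        # [97, 241, 101]

# Converting *from* 'binary' (ASCII-8BIT) gives every byte >= 0x80 no defined
# mapping, so undef: :replace with replace: '' simply deletes 0xF1.
stripped = raw.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
p stripped                 # "John Dae"
```

In other words, the data was never lost; it was just labeled with the wrong encoding and then discarded by the replacement options.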

Answer 1

You need to tell Ruby what encoding the unescaped string is in. It looks like your input starts out as Latin-1 ('ISO-8859-1'). There are a few different options; if you want to limit the decision to just the string you are processing, you can use .force_encoding, like this:

require 'cgi'
unescaped_name = CGI::unescape( "John%20Da%F1e" ).force_encoding('ISO-8859-1')
#  => "John Da\xF1e"
unescaped_name.encode('UTF-8')
#  => "John Dañe"

Note that once the encoding is set correctly, the string already contains the correct characters, but you won't necessarily see them until you convert it to an encoding your terminal can display. Where I show "John Da\xF1e", that's only because my terminal is set to display UTF-8; \xF1 is the byte for ñ in Latin-1.
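To make the difference between relabeling and transcoding concrete, here is a plain-Ruby sketch: force_encoding keeps the bytes and only changes the label, while encode rewrites the bytes. The decode_param helper at the end is hypothetical (it is not a Rails or CGI method) and assumes the input is either UTF-8 or Latin-1; bytes in any other single-byte encoding would be misread as Latin-1.

```ruby
require 'cgi'

raw = CGI.unescape("John%20Da%F1e")

# force_encoding only relabels the string; the bytes stay exactly the same.
latin1 = raw.dup.force_encoding(Encoding::ISO_8859_1)
p latin1.bytes == raw.bytes   # true
p latin1.valid_encoding?      # true (every byte is a valid ISO-8859-1 character)

# encode transcodes: the single byte 0xF1 becomes the two UTF-8 bytes 0xC3 0xB1.
utf8 = latin1.encode(Encoding::UTF_8)
puts utf8                     # John Dañe
p utf8.bytes.last(4)          # [97, 195, 177, 101]

# Hypothetical helper: keep the value if it is already valid UTF-8,
# otherwise assume the bytes are Latin-1 and transcode them to UTF-8.
# The Latin-1 fallback is a guess about the client, not something the
# bytes themselves can prove.
def decode_param(value)
  decoded = CGI.unescape(value).force_encoding(Encoding::UTF_8)
  return decoded if decoded.valid_encoding?

  decoded.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

puts decode_param("John%20Da%F1e")      # John Dañe (Latin-1 input)
puts decode_param("John%20Da%C3%B1e")   # John Dañe (UTF-8 input)
```

The result of decode_param is valid UTF-8 either way, so it can be passed to I18n.transliterate without the invalid byte sequence error.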

As far as I can tell, if the same string had been percent-encoded from its UTF-8 bytes, it would look like this, and it unescapes correctly in a single step:

"John%20Da%C3%B1e"
CGI::unescape( "John%20Da%C3%B1e" )
#  => "John Dañe"
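Why the two forms differ: ñ is one byte (0xF1) in Latin-1 but two bytes (0xC3 0xB1) in UTF-8, and percent-encoding simply writes out whatever bytes the string holds. A short sketch with CGI.escape shows this; note that CGI.escape writes a space as "+" rather than "%20", which CGI.unescape accepts as well.

```ruby
require 'cgi'

# The same character has different byte sequences in the two encodings.
p "\u00F1".encode(Encoding::ISO_8859_1).bytes   # [241]      -> %F1
p "\u00F1".bytes                                # [195, 177] -> %C3%B1

# CGI.escape percent-encodes the string's own bytes, so a UTF-8 string
# produces the two-byte UTF-8 form.
escaped = CGI.escape("John Da\u00F1e")
puts escaped                                    # John+Da%C3%B1e

# Escaping a Latin-1 copy gives the single-byte form from the question.
escaped_latin1 = CGI.escape("John Da\u00F1e".encode(Encoding::ISO_8859_1))
puts escaped_latin1                             # John+Da%F1e

# Round trip: the UTF-8 form unescapes without any encoding fix-up.
puts CGI.unescape(escaped) == "John Da\u00F1e"  # true
```

So if you control the client (for example a form or JavaScript that builds the URL), sending UTF-8 percent-encoding avoids the conversion step entirely.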

Accepted answer (score 2), 2014-02-11 23:00:17
