How to replace multibyte characters in ruby using gsub?

I have a problem with saving records in MongoDB using Mongoid when they contain multibyte characters. This is the string:

a="Chris \xA5\xEB\xAE\xDFe\xA5"

I first convert it to BINARY and I then gsub it like this:

a.force_encoding("BINARY").gsub(0xA5.chr,"oo")

...which works fine:

=> "Chris oo\xEB\xAE\xDFeoo"

But it seems that I cannot call the chr method when I use a Regexp:

a.force_encoding("BINARY").gsub(/0x....?/.chr,"")
NoMethodError: undefined method `chr' for /0x....?/:Regexp

Anybody with the same issue?

Thanks a lot...

You can do that by interpolating the byte into the regular expression:

a.force_encoding("BINARY").gsub(/#{0xA5.chr}/,"") 

gives

"Chris \xEB\xAE\xDFe"
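
If you want to match a whole class of bytes rather than one specific byte, a byte-range character class may be simpler than interpolating each byte. This is a minimal sketch, assuming Ruby 2.0 or later (for String#b) and assuming that every byte from 0x80 to 0xFF is unwanted; the variable names are only illustrative.

```ruby
a = "Chris \xA5\xEB\xAE\xDFe\xA5"

# String#b returns a copy tagged as ASCII-8BIT (binary) and leaves `a` untouched.
bytes = a.b

# The /n flag makes the regexp binary as well, so \xA5 matches that single byte.
replaced = bytes.gsub(/\xA5/n, "oo")

# A byte range covers every non-ASCII byte in one pattern.
stripped = bytes.gsub(/[\x80-\xFF]/n, "")

puts replaced.inspect
puts stripped.inspect  # "Chris e"
```

Note that the e survives because it is an ordinary ASCII character sitting after \xDF, not part of an escape.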

EDIT: based on the comments, here is a version that converts the binary-encoded string to its ASCII (inspect-style) representation and runs a regex on that string:

a.unpack('A*').to_s.gsub(/\\x[A-F0-9]{2}/,"")[2..-3] #=> "Chris e"

The [2..-3] at the end gets rid of the leading [" and the trailing "].

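NOTE: if you just want to get rid of the special characters and a is already BINARY (force_encoding changes the string in place), you could also use the line below. Be aware that \W removes the space as well and keeps the trailing e:

a.gsub(/\W/,"") #=> "Chrise"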

The actual string does not contain the literal characters \xA5. That is just how Ruby displays bytes that would otherwise be unprintable, in the same way that it shows \n when a string contains a newline.
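
To make that concrete, here is a small sketch that compares the bytes actually stored in the string with what inspect prints. The variable names are only illustrative.

```ruby
a = "Chris \xA5\xEB\xAE\xDFe\xA5"

# Each \xNN escape in the literal is one byte: 6 ("Chris ") + 4 + 1 ("e") + 1.
byte_count = a.bytesize

# No backslash byte (0x5C) is stored anywhere in the string.
has_backslash = a.bytes.include?("\\".ord)

# inspect is what irb prints; only this display form contains a backslash.
shown = a.inspect

puts byte_count
puts has_backslash
puts shown
```

This is why a pattern such as /\\x[A-F0-9]{2}/ only matches the inspect output and never the string itself.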

If you want to replace everything that is not ASCII, you could do this:

a="Chris \xA5\xEB\xAE\xDFe\xA5"
a.force_encoding('BINARY').encode('ASCII', :invalid => :replace, :undef => :replace, :replace => 'oo')

This starts by forcing the string to the binary encoding. You always want to start from a string whose bytes are valid for its encoding, and binary is always valid because it can hold arbitrary bytes. It then converts the string to ASCII. Normally that would raise an error, since there are bytes it does not know what to do with, but the extra options tell encode to replace invalid or undefined sequences with the string 'oo'.
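
As a rough illustration of the steps just described, the sketch below runs the same encode call twice, once with 'oo' and once with an empty replacement to drop the bytes entirely. The dup and the variable names are additions for the example, used so that the original string is not modified.

```ruby
a = "Chris \xA5\xEB\xAE\xDFe\xA5"

# force_encoding changes the receiver in place, so work on a copy.
binary = a.dup.force_encoding("BINARY")

# Every byte above 0x7F has no ASCII equivalent and is replaced one by one.
replaced = binary.encode("ASCII", invalid: :replace, undef: :replace, replace: "oo")

# An empty replacement string removes those bytes instead.
removed = binary.encode("ASCII", invalid: :replace, undef: :replace, replace: "")

puts replaced  # five high bytes, each turned into "oo"
puts removed
```

The result is tagged US-ASCII and contains only valid bytes, which is usually what a database driver expects.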
