Ruby regexp handling of nbsp

In Ruby 1.9.3, the regex engine doesn't treat non-breaking spaces (&nbsp;) as whitespace (\s). This is often a nuisance for me.

So my question is: will this change in 2.0? If not, is there a way to monkey patch a solution?

Use Unicode properties (you need to declare a matching source code encoding for this to work):

# encoding: utf-8
if subject =~ /\p{Z}/
    # subject contains whitespace or other separators
end

or use POSIX character classes:

if subject =~ /[[:space:]]/
    # subject contains whitespace, including non-breaking spaces
end

According to the docs, \s will only match [ \t\r\n\f] now and in the future.
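As a quick check of the three options discussed so far, the sketch below tests a single non-breaking space (U+00A0) against each pattern. The `results` hash and its keys are illustrative names only, and the snippet assumes a UTF-8 source file (the default since Ruby 2.0).

```ruby
# encoding: utf-8

# A single non-breaking space, written as a Unicode escape.
nbsp = "\u00A0"

# Record whether each pattern matches the non-breaking space.
results = {
  backslash_s: !(nbsp =~ /\s/).nil?,           # ASCII-only whitespace class
  posix_space: !(nbsp =~ /[[:space:]]/).nil?,  # POSIX bracket expression
  unicode_z:   !(nbsp =~ /\p{Z}/).nil?         # Unicode "separator" property
}

results.each { |name, matched| puts "#{name}: #{matched}" }
```

Only the first pattern should report `false`, which is the behaviour the question complains about.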

In Ruby, I recommend using the Unicode character class of "Space separators", \p{Zs}:

/\p{Zs}/u  =~  "\xC2\xA0"
/\p{Zs}/u  =~  "\u00A0"
/\p{Zs}/u  =~  HTMLEntities.new.decode('&nbsp;')

See the Ruby documentation for more Unicode character properties.

Note: Make sure that your input string is valid UTF-8. There are non-breaking spaces in other encodings too, e.g. "\xA0" in ISO-8859-1 (Latin-1). More info on the "non-breaking space".
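If the input arrives in ISO-8859-1, one way to handle it is to transcode to UTF-8 before matching. The following is a minimal sketch under that assumption; the variable names and the sample bytes are made up for illustration.

```ruby
# encoding: utf-8

# Raw bytes as they might arrive from a Latin-1 source: "a", 0xA0 (nbsp), "b".
# String#b returns a binary copy, so the literal itself is left untouched.
latin1 = "a\xA0b".b.force_encoding(Encoding::ISO_8859_1)

# Transcode to UTF-8; the single byte 0xA0 becomes the two bytes 0xC2 0xA0.
utf8 = latin1.encode(Encoding::UTF_8)

valid       = utf8.valid_encoding?  # the transcoded string is well-formed UTF-8
match_index = utf8 =~ /\p{Zs}/      # character index of the first space separator

puts "valid UTF-8: #{valid}, nbsp found at index #{match_index.inspect}"
```

Note that `force_encoding` only relabels the bytes, while `encode` actually converts them, so both steps are needed here.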

FYI: In most regex flavors and programming languages that support Unicode, the character class \s usually includes all characters from the Unicode "separator" property \p{Z} (as mentioned by Tim Pietcker). However, Java and Ruby are popular exceptions here, and \s only matches [ \t\r\n\f].
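Since \s stays ASCII-only in Ruby, a workaround that avoids monkey patching is to normalize the input first, so that existing \s-based code keeps working. This is only a sketch: `normalize_spaces` is a hypothetical helper name, not part of Ruby or any library.

```ruby
# encoding: utf-8

# Hypothetical helper: replace every Unicode space separator (\p{Zs}),
# including the non-breaking space, with a plain ASCII space.
def normalize_spaces(text)
  text.gsub(/\p{Zs}/, " ")
end

input = "foo\u00A0bar  baz"

# After normalization, ordinary \s-based splitting works as expected.
words = normalize_spaces(input).split(/\s+/)

puts words.inspect
```

Tabs and newlines are not in \p{Zs}, so they pass through unchanged and are still matched by \s afterwards.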
