How to specify a Regexp for Unicode Cyrillic characters in Ruby 1.9
#coding: utf-8
str2 = "asdfМикимаус"
p str2.encoding #<Encoding:UTF-8>
p str2.scan(/\p{Cyrillic}/) # finds all Cyrillic characters
str2.gsub!(/\w/u,'') # removes only the Latin characters
puts str2
The question is: why does \w ignore Cyrillic characters?
I have installed the latest Ruby package from http://rubyinstaller.org/ . Here is the output of ruby -v:
ruby 1.9.1p378 (2010-01-10 revision 26273) [i386-mingw32]
As far as I know, the Oniguruma regular expression library in Ruby 1.9 has full support for Unicode characters.
This is as specified in the Ruby documentation: \w is equivalent to [a-zA-Z0-9_] and thus doesn't match any character outside that ASCII set, Cyrillic letters included.
You probably want to use [[:alnum:]] instead, which includes all Unicode alphabetic and numeric characters. Also check [[:word:]] and [[:alpha:]].
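To make the difference concrete, here is a minimal sketch that runs the string from the question through each character class. The helper name remove_matches is made up for this illustration and is not part of Ruby or of the original post; it only wraps gsub. It assumes a UTF-8 source file, as in the question.

```ruby
# coding: utf-8

# Hypothetical helper (not from the original post): delete every match of pattern.
def remove_matches(text, pattern)
  text.gsub(pattern, '')
end

sample = "asdfМикимаус"

# \w is ASCII-only here, so the Cyrillic letters are left in place.
p remove_matches(sample, /\w/)

# The POSIX bracket classes are Unicode-aware, so Latin and Cyrillic
# letters are both removed.
p remove_matches(sample, /[[:alnum:]]/)
p remove_matches(sample, /[[:word:]]/)

# \p{Cyrillic} targets the Cyrillic script only, so the Latin letters remain.
p remove_matches(sample, /\p{Cyrillic}/)
```

If the goal is to match Cyrillic letters specifically rather than any Unicode letter, \p{Cyrillic} (already used in the question) is the narrower choice; [[:alnum:]] and [[:word:]] are the closer replacements for \w.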