How to specify a Regexp for Unicode Cyrillic characters in Ruby 1.9

Question

# coding: utf-8
str2 = "asdfМикимаус"
p str2.encoding              # => #<Encoding:UTF-8>
p str2.scan(/\p{Cyrillic}/)  # finds all Cyrillic characters
str2.gsub!(/\w/u, '')        # removes only the Latin characters
puts str2

The question is: why does \w ignore Cyrillic characters?

I installed the latest Ruby package from http://rubyinstaller.org/ . Here is the output of ruby -v:

ruby 1.9.1p378 (2010-01-10 revision 26273) [i386-mingw32]

As far as I know, the Oniguruma regular expression library in Ruby 1.9 has full support for Unicode characters.

Answer 1

This is as specified in the Ruby documentation: \w is equivalent to [a-zA-Z0-9_] and therefore does not match any non-ASCII character, Cyrillic letters included.

You probably want to use [[:alnum:]] instead, which matches all Unicode alphabetic and numeric characters. See also [[:word:]] and [[:alpha:]].
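A minimal sketch of the difference, reusing the string from the question. The variable names (sample, ascii_stripped, and so on) are made up for this illustration, and the behaviour shown assumes a UTF-8 source file and UTF-8 strings:

```ruby
# coding: utf-8
# Illustrative only: the variable names below are hypothetical.
sample = "asdfМикимаус"

# \w is ASCII-only ([a-zA-Z0-9_]), so only the Latin prefix is removed.
ascii_stripped = sample.gsub(/\w/, '')

# POSIX bracket classes are Unicode-aware on UTF-8 strings.
alnum_stripped = sample.gsub(/[[:alnum:]]/, '')
word_chars     = sample.scan(/[[:word:]]/).join
alpha_chars    = sample.scan(/[[:alpha:]]/).join

# \p{Cyrillic} selects by script and leaves the Latin letters alone.
cyrillic_stripped = sample.gsub(/\p{Cyrillic}/, '')

puts ascii_stripped         # Микимаус
puts alnum_stripped.empty?  # true
puts word_chars == sample   # true
puts cyrillic_stripped      # asdf
```

If the goal is to strip every letter and digit regardless of script, [[:alnum:]] or [[:word:]] is the closer replacement for \w; \p{Cyrillic} is the better fit when only Cyrillic text should be targeted.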

11 | Accepted | 2010-04-27 17:26:30
