How to match a non-Unicode string with a regexp in Ruby?

I want to match a string that contains \xa0, like:

"\xa0" =~ /\xa0/

But this throws an error:

SyntaxError: (eval):2: invalid multibyte escape: /\xa0/

I tried using a Unicode escape to match instead:

"\xa0" =~ /\u00a0/

That throws an error too:

ArgumentError: invalid byte sequence in UTF-8

So, how can I match \xa0 in Ruby?

Not every byte sequence is a valid Unicode string (or, more specifically, a valid UTF-8 string).

Your single-byte string, for example, is not:

str = "\xa0"

str.encoding        #=> #<Encoding:UTF-8>
str.valid_encoding? #=> false
str.codepoints      #   ArgumentError (invalid byte sequence in UTF-8)
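
To make the difference concrete, here is a small sketch comparing the raw byte with the UTF-8 encoding of U+00A0 (no-break space). The variable names raw and nbsp are only illustrative. It suggests why /\u00a0/ cannot be used here: that pattern describes the two-byte sequence C2 A0, not the lone byte A0.

```ruby
raw  = "\xa0"    # a single byte 0xA0, tagged as UTF-8 by default
nbsp = "\u00a0"  # the code point U+00A0, stored as two UTF-8 bytes

raw.bytes             #=> [160]
nbsp.bytes            #=> [194, 160]

raw.valid_encoding?   #=> false
nbsp.valid_encoding?  #=> true
```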

To work with an arbitrary byte string, you have to set its encoding to binary (ASCII-8BIT):

str = "\xa0".b      # <-- note the .b

str.encoding        #=> #<Encoding:ASCII-8BIT>
str.valid_encoding? #=> true
str.codepoints      #=> [160]

and also set the regexp encoding to ASCII-8BIT (via the n modifier):

str =~ /\xa0/n
#=> 0
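
Putting the two pieces together, a minimal sketch might look like the following. The helper name first_a0_byte_index is hypothetical and not part of Ruby. Because the match runs on a binary copy of the string, the result is a byte offset rather than a character offset. The last line shows a possible alternative that only applies under the assumption that the byte came from ISO-8859-1 text, which the question does not state.

```ruby
# Hypothetical helper: byte offset of the first 0xA0 byte, or nil if absent.
def first_a0_byte_index(str)
  # String#b returns a binary (ASCII-8BIT) copy, so the receiver is not modified.
  str.b =~ /\xa0/n
end

first_a0_byte_index("\xa0")     #=> 0
first_a0_byte_index("abc\xa0")  #=> 3
first_a0_byte_index("abc")      #=> nil
first_a0_byte_index("\u00a0")   #=> 1 (the UTF-8 bytes are C2 A0)

# Assumption: the data is really ISO-8859-1. Transcoding to UTF-8 first
# turns the byte into U+00A0, which a normal Unicode regexp can match.
"\xa0".encode("UTF-8", "ISO-8859-1") =~ /\u00a0/  #=> 0
```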
