How to match a non-Unicode string with a regexp in Ruby?
I want to match a string that contains \xa0, like:
"\xa0" =~ /\xa0/
But this raises an error:
SyntaxError: (eval):2: invalid multibyte escape: /\xa0/
I tried to match it with a Unicode escape instead:
"\xa0" =~ /\u00a0/
That raises an error too:
ArgumentError: invalid byte sequence in UTF-8
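The second attempt fails because the string's bytes are not valid UTF-8. The /\u00a0/ pattern itself is fine. The sketch below assumes the byte actually comes from ISO-8859-1 (Latin-1) text, where 0xA0 is U+00A0 NO-BREAK SPACE. That encoding is an assumption and is not stated in the question. Under it, you can transcode the data to UTF-8 first, and then the Unicode pattern matches.

```ruby
# Assumption: the lone byte 0xA0 is Latin-1 (ISO-8859-1) text,
# where it encodes U+00A0 NO-BREAK SPACE.
raw = "\xa0".b                            # the raw byte, tagged as binary
raw.force_encoding(Encoding::ISO_8859_1)  # reinterpret the same byte as Latin-1
utf8 = raw.encode(Encoding::UTF_8)        # transcode to a valid UTF-8 string

p utf8.valid_encoding?  # => true
p utf8.bytes            # => [194, 160]  (UTF-8 encoding of U+00A0)
p utf8 =~ /\u00a0/      # => 0
```

This only helps if you know the real source encoding. For arbitrary bytes, treat the data as binary instead.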
So, how do I match \xa0 in Ruby?
Not every byte sequence is a valid Unicode (or, more specifically, UTF-8) string.
Your single-byte string, for example, is not:
str = "\xa0"
str.encoding #=> #<Encoding:UTF-8>
str.valid_encoding? #=> false
str.codepoints # ArgumentError (invalid byte sequence in UTF-8)
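The check above can be reproduced in a small self-contained script. In a UTF-8 source file, the literal "\xa0" is tagged UTF-8 automatically. Here the sketch tags it explicitly with String.new(..., encoding:), so the result does not depend on the script's source encoding. It also shows that String#bytes still works on the invalid string, because it reads raw bytes without decoding characters.

```ruby
# Tag the lone byte 0xA0 as UTF-8 explicitly, mirroring a UTF-8 source file.
str = String.new("\xa0", encoding: Encoding::UTF_8)

p str.encoding         # => #<Encoding:UTF-8>
p str.valid_encoding?  # => false
p str.bytes            # => [160]  (raw bytes; no decoding, so no error)

begin
  str.codepoints       # decoding characters fails on the invalid byte
rescue ArgumentError => e
  puts "codepoints failed: #{e.message}"
end
```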
To work with an arbitrary byte string, you have to set its encoding to binary (ASCII-8BIT):
str = "\xa0".b # <-- note the .b
str.encoding #=> #<Encoding:ASCII-8BIT>
str.valid_encoding? #=> true
str.codepoints #=> [160]
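There are two ways to get a binary-tagged string: String#b and String#force_encoding. The sketch below compares them. String#b returns a new copy tagged ASCII-8BIT, while force_encoding retags the receiver in place. Neither one changes the bytes. Encoding::BINARY is Ruby's alias for Encoding::ASCII_8BIT. The input string is a hypothetical stand-in for the question's data.

```ruby
# Hypothetical input: a UTF-8-tagged string holding the invalid byte 0xA0.
original = String.new("\xa0", encoding: Encoding::UTF_8)

# String#b: returns a binary copy; the receiver keeps its UTF-8 tag.
copy = original.b
p copy.encoding == Encoding::BINARY     # => true
p original.encoding == Encoding::UTF_8  # => true

# String#force_encoding: retags the receiver itself (it mutates),
# so it is called on a dup we own rather than on the original.
retagged = original.dup.force_encoding(Encoding::BINARY)
p retagged.encoding == Encoding::BINARY  # => true

# Only the tag differs; the bytes are identical.
p copy.bytes == original.bytes  # => true
p copy.valid_encoding?          # => true (any byte sequence is valid binary)
```

Prefer .b when you don't want to mutate the input.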
and also set the regexp's encoding to binary (ASCII-8BIT) via the n modifier:
str =~ /\xa0/n
#=> 0
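The same binary-string-plus-/n-regexp pairing also works with other matching methods on longer data. The sketch below uses a hypothetical input in which 0xA0 separates ASCII words, as a Latin-1 no-break space read as raw bytes might. It finds the first occurrence, counts all of them, and replaces each one with an ordinary space. Offsets are byte offsets, because the string is binary.

```ruby
# Hypothetical input: raw bytes with 0xA0 between ASCII words.
data = "price:\xa0100\xa0EUR".b

# Binary regexp (n modifier) matched against a binary string.
p data =~ /\xa0/n           # => 6  (byte offset of the first 0xA0)
p data.scan(/\xa0/n).size   # => 2

# Replace every 0xA0 byte with an ASCII space.
cleaned = data.gsub(/\xa0/n, " ")
p cleaned                   # => "price: 100 EUR"
```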