How to remove whitespace but not UTF-8 characters in Ruby
I want to prevent users from posting an empty comment (whitespace, etc.), so I apply the following:
var.gsub(/^\s+|\s+\z|\s* \s*/, '')
However, a clever user then found a loophole by using the \302 or \240 Unicode characters, so I filtered those characters out too.
Then I ran into a problem when I added support for several languages: a word like Déjà vu now triggers an error, because part of the à character contains \240. Is there any way to remove the whitespace but leave the Latin characters untouched?
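The clash described above can be checked by looking at the raw bytes. This is only a minimal sketch, assuming Ruby 1.9 or later; the variable names are illustrative and not from the original post.

```ruby
# Octal \240 is byte 160 (0xA0). It appears as the second byte of several
# two-byte UTF-8 sequences, so deleting it on its own corrupts those characters.
a_grave = "\u00E0"   # the letter a with grave accent
nbsp    = "\u00A0"   # the non-breaking space

a_grave_bytes = a_grave.bytes.to_a   # [195, 160], i.e. \303\240
nbsp_bytes    = nbsp.bytes.to_a      # [194, 160], i.e. \302\240

puts a_grave_bytes.inspect
puts nbsp_bytes.inspect
```

Both characters share the trailing byte \240, which is why a byte-level filter cannot tell them apart.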
One way around this is to use iconv to discard invalid UTF-8 byte sequences (such as \230 on its own) before using your regexp to remove the whitespace:
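require 'iconv'

var1 = "Déjà vu"   # valid UTF-8
var2 = "\240"      # a lone continuation byte, invalid on its own

# //IGNORE makes Iconv silently drop byte sequences it cannot convert
ic = Iconv.new('UTF-8//IGNORE', 'UTF-8')
valid1 = ic.iconv(var1) # => "D\303\251j\303\240 vu"
valid2 = ic.iconv(var2) # => ""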
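Iconv was removed from Ruby's standard library in 2.0, so on current Rubies the snippet above needs the separate iconv gem. As a rough alternative sketch, assuming Ruby 2.1 or later and UTF-8 input, String#scrub can drop the invalid bytes, and the POSIX bracket class [[:space:]] matches Unicode whitespace such as the non-breaking space, which \s does not. The helper names strip_unicode_space and blank_comment? are made up for illustration and are not part of the original answer.

```ruby
# Hypothetical helpers, not part of the original answer.

# Drop invalid byte sequences, then trim Unicode-aware whitespace
# (including U+00A0) from both ends of the string.
def strip_unicode_space(str)
  clean = str.dup.force_encoding(Encoding::UTF_8).scrub('')
  clean.gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')
end

# A comment counts as empty when nothing is left after trimming.
def blank_comment?(str)
  strip_unicode_space(str).empty?
end

puts strip_unicode_space("\u00A0 D\u00E9j\u00E0 vu \u00A0")
puts blank_comment?(" \u00A0\t")
puts blank_comment?("\xA0".b)
```

Because the regexp works on characters rather than bytes, the \240 byte inside à is never touched, while a comment made only of spaces, tabs, non-breaking spaces, or stray invalid bytes is reported as blank.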