Removing accents/diacritics from string while preserving other special chars (tried mb_chars.normalize and iconv)

There is a very similar question already. One of the solutions uses code like this one:

string.mb_chars.normalize(:kd).gsub(/[^x00-\x7F]/n, '').to_s

Which works wonders, until you notice it also removes spaces, dots, dashes, and who knows what else.

I'm not really sure how the first code works, but could it be made to strip only accents? Or, at the very least, could it be given a list of chars to preserve? My knowledge of regexps is limited, but I tried (to no avail):

/[^\-x00-\x7F]/n # So it would leave the dash alone

I'm about to do something like this:

string.mb_chars.normalize(:kd).gsub('-', '__DASH__')
  .gsub(/[^x00-\x7F]/n, '').gsub('__DASH__', '-').to_s

Atrocious? Yes...

I've also tried:

iconv = Iconv.new('UTF-8', 'US-ASCII//TRANSLIT') # Also tried ISO-8859-1
iconv.iconv 'Café' # Throws an error: Iconv::IllegalSequence: "é"

Help please?

it also removes spaces, dots, dashes, and who knows what else.

It shouldn't.

string.mb_chars.normalize(:kd).gsub(/[^x00-\x7F]/n, '').to_s

You've mistyped: there should be a backslash before the x00, to refer to the NUL character.
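To see why the missing backslash matters, here is a minimal sketch. It assumes Ruby 2.2+ and uses the standard String#unicode_normalize(:nfkd) as a stand-in for ActiveSupport's mb_chars.normalize(:kd); the /n flag is left out for simplicity, and the constant and method names are made up for illustration. Without the backslash, the class reads as "x, 0, and everything from 0 up to \x7F", so any character below '0' in ASCII (space, dot, dash) gets stripped too.

```ruby
# Illustrative names only; none of these come from the original post.
MISTYPED  = /[^x00-\x7F]/   # keeps 'x', '0' and the range '0'..\x7F
CORRECTED = /[^\x00-\x7F]/  # keeps the whole ASCII range NUL..\x7F

# Decompose accented characters (NFKD), then delete whatever the
# negated character class matches.
def strip_non_matching(text, pattern)
  text.unicode_normalize(:nfkd).gsub(pattern, '')
end

sample = 'Café - crème brûlée.'
puts strip_non_matching(sample, MISTYPED)   # Cafecremebrulee
puts strip_non_matching(sample, CORRECTED)  # Cafe - creme brulee.
```

With the corrected range, only the combining accent marks left behind by the decomposition fall outside ASCII, so the punctuation survives.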

/[^\-x00-\x7F]/n # So it would leave the dash alone

You've put the '-' between the '\' and the 'x', which will break the reference to the null character, and thus break the range.
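As a rough illustration of the point above, and of the "list of chars to preserve" idea from the question, the sketch below compares the misplaced-dash pattern with a class that keeps all of ASCII plus one extra character. It again assumes Ruby 2.2+ with String#unicode_normalize in place of mb_chars.normalize(:kd); the names and the choice of '€' as the preserved character are assumptions for the example only.

```ruby
# Illustrative names only; none of these come from the original post.
# '\-' is a literal dash here, so the \x00 reference is still broken:
# the class keeps '-', 'x', '0' and the range '0'..\x7F.
MISPLACED_DASH = /[^\-x00-\x7F]/

# Extra characters to preserve go after the intact ASCII range.
KEEP_ASCII_AND_EURO = /[^\x00-\x7F€]/

def remove_outside(text, pattern)
  text.unicode_normalize(:nfkd).gsub(pattern, '')
end

sample = 'Café - 5€.'
puts remove_outside(sample, MISPLACED_DASH)       # Cafe-5
puts remove_outside(sample, KEEP_ASCII_AND_EURO)  # Cafe - 5€.
```

Since the dash is itself an ASCII character, the corrected range already preserves it; an explicit list is only needed for characters outside ASCII.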

It's not as neat as Iconv, but does what I think you want:

http://snippets.dzone.com/posts/show/2384
