Smarter character replacement using Ruby gsub and regexp

Question

I'm trying to create permalink-like behavior for some article titles, and I don't want to add a new db field for the permalink. So I decided to write a helper that will convert my article title from:

"O "focoasă" a pornit cruciada, împotriva bărbaţilor zgârciţi" to "o-focoasa-a-pornit-cruciada-impotriva-barbatilor-zgarciti".

While I figured out how to replace spaces with hyphens and remove other special characters (other than -) using:

title.gsub(/\s/, "-").gsub(/[^\w-]/, '').downcase

I am wondering if there is a way to replace each character with a specific other character in a single .gsub call, so I won't have to chain title.gsub("ă", "a") calls for all the UTF-8 special characters of my localization.

I was thinking of building a hash with all the special characters and their counterparts, but I haven't yet figured out how to use variables with regexps.

What I am looking for is something like:

title.gsub(/\s/, "-").gsub(*replace character goes here*).gsub(/[^\w-]/, '').downcase

Thanks!

Answer 1

I solved this in my application by using the Unidecoder gem:

require 'unidecode'

def uninternationalize(str)
  Unidecoder.decode(str).gsub("[?]", "").gsub(/`/, "'").strip
end

Answer 2

If you only want to transliterate from one character to another, you can use the String#tr method, which does exactly the same thing as the Unix tr command: it replaces every character in the first list with the character in the same position in the second list:

'Ünicöde'.tr('ÄäÖöÜüß', 'AaOoUus') # => "Unicode"
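Applied to the Romanian title from the question, the same idea might look like the sketch below. The helper name permalink and the exact character lists are assumptions made for illustration; the lists cover only the Romanian letters (in both the cedilla and comma-below forms) and would need extending for other languages.

```ruby
# Hypothetical helper: builds a permalink-style slug from a title.
# The two tr lists must stay the same length and in the same order.
def permalink(title)
  title.strip
       .tr('ăâîşșţțĂÂÎŞȘŢȚ', 'aaissttAAISSTT') # one-to-one character mapping
       .gsub(/\s+/, '-')                        # runs of whitespace become one hyphen
       .gsub(/[^\w-]/, '')                      # drop everything but word chars and hyphens
       .downcase
end

puts permalink(' O "focoasă" a pornit cruciada, împotriva bărbaţilor zgârciţi ')
# prints: o-focoasa-a-pornit-cruciada-impotriva-barbatilor-zgarciti
```

The tr call runs before the second gsub because \w only matches ASCII word characters here, so untranslated accented letters would otherwise be removed.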

However, I agree with @Daniel Vandersluis: it would probably be a good idea to use a more specialized library. Stuff like this can get really tedious, really fast. Also, a lot of those characters actually have standardized transliterations (ä → ae, ö → oe, ..., ß → ss), and users may expect the transliterations to be correct. (I certainly don't like being called Jorg; if you really must, you may call me Joerg, but I very much prefer Jörg.) If you have a library that provides those transliterations, why not use them? Note that many transliterations are not single characters and thus can't be used with String#tr anyway.
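For the multi-character case, one option that stays within a single gsub call is to pass a hash as the replacement, which Ruby 1.9 and later support. This is also roughly the hash idea from the question. The sketch below is only illustrative: the constant and method names are made up, and the mapping covers just the German characters mentioned above.

```ruby
# Illustrative mapping only; extend it with the characters you need.
GERMAN_MAP = {
  'Ä' => 'Ae', 'ä' => 'ae',
  'Ö' => 'Oe', 'ö' => 'oe',
  'Ü' => 'Ue', 'ü' => 'ue',
  'ß' => 'ss'
}.freeze

# Regexp.union builds one pattern that matches any of the hash keys.
GERMAN_PATTERN = Regexp.union(GERMAN_MAP.keys)

# Hypothetical helper: each match is looked up in the hash and replaced
# with the corresponding value.
def transliterate_german(str)
  str.gsub(GERMAN_PATTERN, GERMAN_MAP)
end

puts transliterate_german('Jörg')   # prints: Joerg
puts transliterate_german('Straße') # prints: Strasse
```

Building the pattern from the hash keys keeps the mapping in one place, so adding a character means adding one hash entry.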

Answer 1
5 accepted 2010-04-20 19:55:09

Answer 2
4 2010-04-20 21:43:48
