
Smarter character replacement using Ruby gsub and regexp

I'm trying to create permalink-like behavior for some article titles, and I don't want to add a new DB field for the permalink. So I decided to write a helper that will convert my article title from:

“O "focoasă" a pornit cruciada, împotriva bărbaţilor zgârciţi” to “o-focoasa-a-pornit-cruciada-impotriva-barbatilor-zgarciti”.

I figured out how to replace spaces with hyphens and remove all other special characters (except -) using:

title.gsub(/\s/, "-").gsub(/[^\w-]/, '').downcase
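A quick check of why this chain alone isn't enough: in Ruby regexps \w matches only ASCII word characters ([a-zA-Z0-9_]), so accented letters are deleted rather than transliterated. The snippet below runs the chain above on the example title; the title and slug variable names are just for illustration.

```ruby
# The chain from the question, applied to the example title
# (the leading and trailing spaces are kept on purpose).
title = ' O "focoasă" a pornit cruciada, împotriva bărbaţilor zgârciţi '

# \s -> "-", then drop everything that is not an ASCII word character or "-".
slug = title.gsub(/\s/, "-").gsub(/[^\w-]/, '').downcase

puts slug
# prints: -o-focoas-a-pornit-cruciada-mpotriva-brbailor-zgrcii-
```

The accented letters simply vanish ("focoas", "brbailor") and the surrounding spaces turn into stray hyphens, so some per-character replacement has to happen before the final cleanup.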

I am wondering whether there is a way to replace each character with a specific other character in a single .gsub call, so that I don't have to chain title.gsub("ă", "a") calls for every UTF-8 special character in my locale.

I was thinking of building a hash of all the special characters and their counterparts, but I haven't yet figured out how to use variables in regexps.
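One way to do what the hash idea describes, assuming Ruby 1.9 or later: Regexp.union builds a single pattern from the hash keys (escaping each one), and String#gsub accepts a Hash as its replacement argument, substituting the value stored for each matched key. The REPLACEMENTS table below is a hypothetical, incomplete Romanian mapping.

```ruby
# Hypothetical replacement table for a few lowercase Romanian letters;
# both the comma-below (ș, ț) and cedilla (ş, ţ) forms are listed.
# Extend it with uppercase forms as needed.
REPLACEMENTS = {
  "ă" => "a", "â" => "a", "î" => "i",
  "ș" => "s", "ş" => "s", "ț" => "t", "ţ" => "t"
}.freeze

# Regexp.union escapes each key and joins them into one alternation,
# so the pattern is built from data instead of being hard-coded.
PATTERN = Regexp.union(REPLACEMENTS.keys)

# Equivalent hand-built version using interpolation; Regexp.escape
# protects any key that happens to contain regexp metacharacters.
PATTERN_INTERPOLATED = /#{REPLACEMENTS.keys.map { |k| Regexp.escape(k) }.join("|")}/

# With a Hash as the replacement, gsub looks up each match in the hash
# and substitutes the corresponding value.
puts "bărbaţilor zgârciţi".gsub(PATTERN, REPLACEMENTS)
# prints: barbatilor zgarciti
```

Plugged into the desired chain, this would read title.gsub(/\s/, "-").gsub(PATTERN, REPLACEMENTS).gsub(/[^\w-]/, '').downcase; the transliteration has to run before the [^\w-] filter, otherwise the accented letters are already gone.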

What I was looking for is something like:

title.gsub(/\s/, "-").gsub(*replace character goes here*).gsub(/[^\w-]/, '').downcase

Thanks!

I solved this in my application by using the Unidecoder gem:

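require 'unidecode'   # from the unidecode gem

# Transliterate to ASCII, remove the "[?]" markers left for characters that
# could not be transliterated, turn backticks into apostrophes, and trim.
def uninternationalize(str)
  Unidecoder.decode(str).gsub("[?]", "").gsub(/`/, "'").strip
end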
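If you'd rather not add a gem dependency, a rougher stdlib-only alternative (a swapped-in technique, not what this answer uses) is to decompose the string with String#unicode_normalize(:nfd), available since Ruby 2.2, and delete the combining marks matched by \p{Mn}. It only handles letters that Unicode decomposes into a base letter plus a mark, so ß, æ or ø are not transliterated and end up dropped by the final filter. The to_slug name is hypothetical.

```ruby
# Minimal stdlib-only sketch (Ruby >= 2.2 for String#unicode_normalize).
# Not equivalent to the unidecode gem: it only strips accents that Unicode
# can decompose into base letter + combining mark.
def to_slug(title)
  title
    .unicode_normalize(:nfd)   # e.g. "ă" becomes "a" + U+0306 COMBINING BREVE
    .gsub(/\p{Mn}/, "")        # drop the combining (non-spacing) marks
    .downcase
    .strip
    .gsub(/\s+/, "-")          # each run of whitespace becomes one hyphen
    .gsub(/[^a-z0-9-]/, "")    # drop anything that is not ASCII alnum or "-"
    .squeeze("-")              # collapse repeated hyphens
end

puts to_slug(' O "focoasă" a pornit cruciada, împotriva bărbaţilor zgârciţi ')
# prints: o-focoasa-a-pornit-cruciada-impotriva-barbatilor-zgarciti
```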

If you only want to transliterate one character to another, you can use the String#tr method, which does exactly the same thing as the Unix tr command: it replaces every character in the first list with the character in the same position in the second list:

'Ünicöde'.tr('ÄäÖöÜüß', 'AaOoUus') # => "Unicode"
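Applied to the question, a tr-based helper might look like the sketch below. The slugify name and the RO_FROM/RO_TO tables are hypothetical; the two strings must have the same length so that each letter lines up with its replacement, and they list both the comma-below (ș, ț) and cedilla (ş, ţ) forms because Romanian text uses both.

```ruby
# Hypothetical one-to-one mapping for Romanian letters (lowercase, then uppercase).
# Position n in RO_FROM is replaced by position n in RO_TO.
RO_FROM = "ăâîșşțţĂÂÎȘŞȚŢ"
RO_TO   = "aaissttAAISSTT"

def slugify(title)
  title
    .tr(RO_FROM, RO_TO)   # transliterate first, while the letters still exist
    .strip
    .gsub(/\s+/, "-")     # whitespace runs become single hyphens
    .gsub(/[^\w-]/, "")   # then drop quotes, commas and other leftovers
    .downcase
end

puts slugify(' O "focoasă" a pornit cruciada, împotriva bărbaţilor zgârciţi ')
# prints: o-focoasa-a-pornit-cruciada-impotriva-barbatilor-zgarciti
```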

However, I agree with @Daniel Vandersluis: it would probably be a good idea to use a more specialized library. Stuff like this can get really tedious, really fast. Also, many of those characters actually have standardized transliterations (ä → ae, ö → oe, ..., ß → ss), and users may expect the transliterations to be correct (I certainly don't like being called Jorg; if you really must, you may call me Joerg, but I very much prefer Jörg). If you have a library that provides those transliterations, why not use it? Note that many transliterations are not single characters and thus can't be done with String#tr anyway.
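To illustrate the last point: multi-character transliterations like the ones above can be expressed with gsub and a Hash replacement instead of tr. The GERMAN table and transliterate_german name are hypothetical, and mapping capitals to title case (Ä → "Ae") is a simplification that gives odd results for all-caps words.

```ruby
# Hypothetical table of the standard German transliterations mentioned above.
# Values may be longer than one character, which String#tr cannot express.
GERMAN = {
  "ä" => "ae", "ö" => "oe", "ü" => "ue",
  "Ä" => "Ae", "Ö" => "Oe", "Ü" => "Ue",
  "ß" => "ss"
}.freeze

# Match any key in one pass and replace it with its mapped value.
def transliterate_german(str)
  str.gsub(Regexp.union(GERMAN.keys), GERMAN)
end

puts transliterate_german("Jörg")     # prints: Joerg
puts transliterate_german("Straße")   # prints: Strasse
```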
