Change non-Latin characters in a string to Latin characters

Question

I'm trying to use a regex in Ruby or in JavaScript to match a string that contains non-English characters.

So is there a way to replace the string "täglichen" with the string "taglichen"? I know that I can account for non-English characters with alternatives like:

/(?i)t[aä]glichen/

But for this I need a dictionary of possible characters, and I have to put all of them into the searched word. Maybe there is a more efficient way to do this?

Answer 1

There is a legitimate solution for modern Ruby (2.2 or later), using String#unicode_normalize:

"täglichen".unicode_normalize(:nfd).
            codepoints.
            reject(&128.method(:<)).
            pack('U*')
#⇒ "taglichen"
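Note that the snippet above drops every codepoint above 128 after decomposition, so a letter with no decomposition (such as "ß") disappears entirely. A minimal variant, assuming you only want to remove the combining marks and keep everything else, could look like the sketch below. The helper name `strip_diacritics` is made up for illustration and is not part of the original answer.

```ruby
# Hypothetical helper: remove combining marks, keep all other characters.
def strip_diacritics(text)
  # NFD splits a precomposed letter such as U+00E4 into "a" + U+0308.
  decomposed = text.unicode_normalize(:nfd)
  # \p{Mn} matches nonspacing marks (the combining diacritics).
  decomposed.gsub(/\p{Mn}/, "")
end

puts strip_diacritics("t\u00E4glichen")   # precomposed input
puts strip_diacritics("ta\u0308glichen")  # already decomposed input
```

Both calls print the same ASCII word, since the input is decomposed before the marks are removed.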

To match:

"täglichen".unicode_normalize(:nfc) =~ /t[aä]glichen/i
#⇒ 0

The normalization is needed because the umlaut may be encoded either as a single codepoint, 228, or as a base letter followed by a combining diacritic, [97, 776]. Check this (try copy-pasting it into your REPL):

"ä" == "ä"
#⇒ false
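Because copy-pasting can silently convert one form into the other, the same check can be written with explicit escapes. This is only a sketch to make the two encodings visible; the variable names are illustrative.

```ruby
composed   = "\u00E4"    # single codepoint 228
decomposed = "a\u0308"   # codepoint 97 followed by combining diaeresis 776

p composed.codepoints                            # [228]
p decomposed.codepoints                          # [97, 776]
p composed == decomposed                         # false
p composed == decomposed.unicode_normalize(:nfc) # true

word = "t#{decomposed}glichen"
# Without normalization the character class sees "a" and then a stray mark.
raw_index = word =~ /t[a\u00E4]glichen/i
# After NFC the umlaut is one codepoint, so the class can match it.
match_index = word.unicode_normalize(:nfc) =~ /t[a\u00E4]glichen/i
p raw_index    # nil
p match_index  # 0
```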

Answer 2

One thing you can do is slugify your strings before matching (https://www.npmjs.com/package/slugify):

Input: "Ich heiße Fred"
Output: "ich-heisse-fred"

If you don't like the - characters as separators, you can change that, as stated in the docs.
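For the Ruby side of the question, a rough approximation of the same idea might look like this. It is a sketch only, not the slugify package or a port of it: the helper name `rough_slug` is hypothetical, and the "ß" to "ss" mapping is hard-coded here because that letter has no Unicode decomposition.

```ruby
# Hypothetical helper; approximates slug-style folding for simple Latin text.
def rough_slug(text, separator = "-")
  text.gsub("\u00DF", "ss")            # map ß by hand (assumption: German-style folding)
      .unicode_normalize(:nfd)         # split accented letters into base + mark
      .gsub(/\p{Mn}/, "")              # drop the combining marks
      .downcase
      .gsub(/[^a-z0-9]+/, separator)   # collapse everything else into the separator
      .delete_prefix(separator)
      .delete_suffix(separator)
end

puts rough_slug("Ich hei\u00DFe Fred")       # ich-heisse-fred
puts rough_slug("Ich hei\u00DFe Fred", "_")  # ich_heisse_fred
```

Characters outside the cases handled here (for example "ø" or non-Latin scripts) are simply replaced by the separator, so a real transliteration table is still needed for broader input.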
