Transliterate Cyrillic characters in a string into Latin in Ruby?

How do I transliterate Cyrillic characters in a string into Latin ones in Ruby? I can't find any docs on that. I thought there would be some standard function for it.

You can use the translit gem:

require 'translit'

str = "Кириллица"
Translit.convert(str, :english)
#=> "Kirillica"

The most mature gem for working with Cyrillic/Russian text is https://github.com/yaroslav/russian/

It also supports transliteration, along with many other features:

require 'russian'
# => true
Russian.translit('Транслит, english letters untouched')
# => "Translit, english letters untouched"

It also provides pluralization, date formatting, Rails i18n integration, and many other goodies.

Disclaimer: I'm not affiliated with the gem in any way, just a happy user.

There's a gem for that. I haven't tried it, but it sounds promising:

https://github.com/dalibor/cyrillizer

Without a gem, a hand-rolled method also works. This one additionally turns the result into an identifier-safe string (lowercase, with runs of non-alphanumeric characters replaced by underscores):

# Lowercase Cyrillic letter => Latin replacement
RU_TO_LATIN = {
  'а' => 'a', 'б' => 'b', 'в' => 'v', 'г' => 'g', 'д' => 'd',
  'е' => 'e', 'ё' => 'e', 'ж' => 'j', 'з' => 'z', 'и' => 'i',
  'к' => 'k', 'л' => 'l', 'м' => 'm', 'н' => 'n', 'о' => 'o',
  'п' => 'p', 'р' => 'r', 'с' => 's', 'т' => 't', 'у' => 'u',
  'ф' => 'f', 'х' => 'h', 'ц' => 'c', 'ч' => 'ch', 'ш' => 'sh',
  'щ' => 'shch', 'ы' => 'y', 'э' => 'e', 'ю' => 'u', 'я' => 'ya',
  'й' => 'i', 'ъ' => '', 'ь' => ''
}.freeze

def transliterate(cyrillic_string)
  # String#downcase handles Cyrillic since Ruby 2.4; characters without a mapping are kept as-is
  identifier = cyrillic_string.downcase.each_char.map { |char| RU_TO_LATIN.fetch(char, char) }.join

  identifier = identifier.gsub(/[^a-z0-9_]+/, '_') # replace each run of other characters with one underscore
  identifier.gsub(/\A_+|_+\z/, '')                 # strip leading and trailing underscores
end

I didn't want to add a dependency, just something simple for a script, so I did this:

transmap = [["кс", "x"], ["Кс", "X"], ["а", "a"], ["А", "A"], ["б", "b"], ["Б", "B"], ["в", "v"], ["В", "V"], ["г", "g"], ["Г", "G"], ["д", "d"], ["Д", "D"], ["е", "e"], ["Е", "E"], ["ё", "yo"], ["Ё", "Yo"], ["ё", "jo"], ["Ё", "Jo"], ["ё", "ö"], ["Ё", "Ö"], ["ж", "zh"], ["Ж", "Zh"], ["з", "z"], ["З", "Z"], ["и", "i"], ["И", "I"], ["й", "j"], ["Й", "J"], ["к", "k"], ["К", "K"], ["л", "l"], ["Л", "L"], ["м", "m"], ["М", "M"], ["н", "n"], ["Н", "N"], ["о", "o"], ["О", "O"], ["п", "p"], ["П", "P"], ["р", "r"], ["Р", "R"], ["с", "s"], ["С", "S"], ["т", "t"], ["Т", "T"], ["у", "u"], ["У", "U"], ["ф", "f"], ["Ф", "F"], ["х", "h"], ["Х", "H"], ["ц", "ts"], ["Ц", "Ts"], ["ч", "ch"], ["Ч", "Ch"], ["ш", "sh"], ["Ш", "Sh"], ["в", "w"], ["В", "W"], ["щ", "shch"], ["Щ", "Shch"], ["щ", "sch"], ["Щ", "Sch"], ["ъ", "#"], ["Ъ", "#"], ["ы", "y"], ["Ы", "Y"], ["ь", ""], ["Ь", ""], ["э", "je"], ["Э", "Je"], ["э", "ä"], ["Э", "Ä"], ["ю", "yu"], ["Ю", "Yu"], ["ю", "ju"], ["Ю", "Ju"], ["ю", "ü"], ["Ю", "Ü"], ["я", "ya"], ["Я", "Ya"], ["я", "ja"], ["Я", "Ja"], ["я", "q"], ["Я", "Q"]]
translit = ->(string) { transmap.inject(string) { |s, (k, v)| s.gsub(k, v) } }

translit.call("Пoo")  # "Poo"

Note that Translit maps the same Cyrillic letter to several Latin strings, e.g. "я" to "ya", "ja", and "q", so this code (like Translit) will of course pick just one of them: whichever pair comes first in transmap, because that gsub leaves no "я" for the later pairs to match.
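To see which spelling wins, here is a minimal sketch on a hypothetical three-entry excerpt of transmap (only the "я" pairs). Applying the pairs in order, as the lambda above does, keeps the first spelling. Converting the same pairs with Array#to_h would instead keep the last one, so `uniq(&:first)` is one way to make the "first pair wins" choice explicit if you ever turn the list into a Hash.

```ruby
# Hypothetical excerpt of transmap: three competing spellings for "я".
transmap = [["я", "ya"], ["я", "ja"], ["я", "q"]]
translit = ->(string) { transmap.inject(string) { |s, (k, v)| s.gsub(k, v) } }

# Sequential gsub: the first pair replaces every "я", so later pairs never match.
first_wins = translit.call("я")

# Array#to_h keeps the LAST pair for a duplicated key...
last_wins = transmap.to_h["я"]

# ...while uniq(&:first) keeps only the first pair per Cyrillic key.
deduped = transmap.uniq(&:first).to_h

puts first_wins   # prints "ya"
puts last_wins    # prints "q"
puts deduped["я"] # prints "ya"
```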

That's it; details below.


I generated transmap from the translit_map hash in https://github.com/tjbladez/translit/blob/master/lib/translit.rb with this snippet:

transmap = translit_map.flat_map { |k, (up, down)| [[down, k], [up, k.capitalize]] }
                       .each_with_index.sort_by { |(cyr, _), i| [-cyr.length, i] }.map(&:first) # index tiebreak, since sort_by alone is not stable

It needs to be sorted longest-first so that кс => x is applied before the one-letter transliterations.
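The longest-first ordering can also be handled in a single pass. The sketch below is a hypothetical alternative, not the code above: it joins the keys into one Regexp.union pattern, sorted longest-first so that "кс" is tried before "к", and passes a Hash to String#gsub, which replaces each match with the Hash's value for it. The short map is a hypothetical excerpt of transmap; with the full list you would build `table` and `pattern` the same way.

```ruby
# Hypothetical excerpt of transmap, including one two-letter rule
# and a duplicated key ("я") to show that the first spelling is kept.
transmap = [["кс", "x"], ["Кс", "X"], ["к", "k"], ["К", "K"], ["с", "s"], ["С", "S"],
            ["а", "a"], ["А", "A"], ["я", "ya"], ["я", "ja"]]

# One Latin spelling per Cyrillic key: uniq keeps the first pair for each key.
table = transmap.uniq(&:first).to_h

# Regexp.union escapes each key; an alternation takes the first branch that
# matches at a given position, so longer keys must come first ("кс" before "к").
pattern = Regexp.union(table.keys.sort_by { |k| -k.length })

# String#gsub(pattern, hash) replaces each match with hash[match];
# text the pattern does not match (e.g. Latin letters) is left untouched.
translit = ->(string) { string.gsub(pattern, table) }

puts translit.call("Ксакс") # prints "Xax"
puts translit.call("Кася")  # prints "Kasya"
```

Compared with one gsub per pair, this scans the string once, and text that has already been replaced is never re-examined by later pairs.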
