Convert Unicode codepoint to string character in Ruby
I have these values from a Unicode database, but I'm not sure how to translate them into human-readable form. What are these even called?
Here they are:
U+2B71F
U+2A52D
U+2A68F
U+2A690
U+2B72F
U+2B4F7
U+2B72B
How can I convert these to their readable symbols?
How about:
# Using pack
puts ["2B71F".hex].pack("U")
# Using chr
puts (0x2B71F).chr(Encoding::UTF_8)
In Ruby 1.9+ you can also do:
puts "\u{2B71F}"
I.e. the \u{} escape sequence can be used to write a character by its Unicode codepoint.
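The \u{} escape only works for codepoints known when the source is written. For values read at runtime, such as the list in the question, the pack approach applies to each entry. A minimal sketch, assuming every entry is in U+XXXX notation (the variable names are illustrative):

```ruby
# Codepoints from the question, written in U+XXXX notation.
codepoints = %w[U+2B71F U+2A52D U+2A68F U+2A690 U+2B72F U+2B4F7 U+2B72B]

# Drop the "U+" prefix, parse the remainder as hex, and pack each
# integer into a UTF-8 encoded string.
characters = codepoints.map { |cp| [cp.sub(/\AU\+/, "").hex].pack("U") }

puts characters.join(" ")
```

Whether the characters display properly depends on the terminal and on a font that covers these CJK extension blocks.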
Values like U+2B71F are referred to as codepoints.
The Unicode system defines a unique codepoint for each character in a multitude of world languages, scientific symbols, currencies, etc. This character set is steadily growing.
For example, U+221E is infinity.
Codepoints are conventionally written as hexadecimal numbers. There is always exactly one number defined per character.
There are many ways to lay these numbers out in memory. Each such layout is known as an encoding, of which the common ones are UTF-8 and UTF-16. Conversion between encodings is well defined.
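To make the difference between a codepoint and its encodings concrete, here is a small sketch that builds one of the characters from the question and inspects its bytes in two encodings. The variable names are only for illustration:

```ruby
char = [0x2B71F].pack("U")   # one character, built from its codepoint

utf8_bytes  = char.bytes                             # bytes as stored in UTF-8
utf16_bytes = char.encode(Encoding::UTF_16BE).bytes  # same character, UTF-16 (big-endian)

puts char.ord.to_s(16)                                      # 2b71f
puts utf8_bytes.map { |b| format("%02X", b) }.join(" ")     # F0 AB 9C 9F
puts utf16_bytes.map { |b| format("%02X", b) }.join(" ")    # D8 6D DF 1F

# Converting to UTF-16 and back gives the original string again.
round_trip = char.encode(Encoding::UTF_16BE).encode(Encoding::UTF_8)
puts round_trip == char                                     # true
```

The codepoint stays 2B71F throughout; only the byte sequence changes with the encoding.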
Here you are most probably looking to convert the Unicode codepoint to a UTF-8 character.
codepoint = "U+2B71F"
You need to extract the hex part that comes after U+ and keep only 2B71F. This will be the first capture group.
codepoint.to_s =~ /U\+([0-9a-fA-F]{4,5}|10[0-9a-fA-F]{4})$/
And your UTF-8 character will be:
utf_8_character = [$1.hex].pack("U")
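Putting the two steps together, a hedged sketch of a small helper. The method name codepoint_to_char is hypothetical, and it uses MatchData instead of the $1 global and anchors the pattern at both ends, which is a slightly stricter variation on the regex above:

```ruby
# Hypothetical helper: turns "U+2B71F" into the corresponding UTF-8 string.
# Returns nil when the input is not in U+XXXX notation.
def codepoint_to_char(notation)
  match = notation.to_s.match(/\AU\+([0-9a-fA-F]{4,5}|10[0-9a-fA-F]{4})\z/)
  return nil unless match

  # match[1] is the first capture group: the hex digits after "U+".
  [match[1].hex].pack("U")
end

puts codepoint_to_char("U+221E")    # the infinity sign
puts codepoint_to_char("U+2B71F")
p codepoint_to_char("2B71F")        # nil, because the "U+" prefix is missing
```

This sketch does not reject surrogate values such as U+D800, which are not valid characters on their own; add a range check if your data may contain them.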