String conversion and Unicode in Golang

Question

I am reading Go Essentials:

String in Go is an immutable sequence of bytes (8-bit byte values). This is different than languages like Python, C#, Java or Swift where strings are Unicode.

I am playing around with the following code:

s := "日本語"
b := []byte{0xe6, 0x97, 0xa5, 0xe6, 0x9c, 0xac, 0xe8, 0xaa, 0x9e}
fmt.Println(string(b) == s) // true

for i, runeChar := range b {
    fmt.Printf("byte position %d: %#U\n", i, runeChar)
}

//byte position 0: U+00E6 'æ'
//byte position 1: U+0097
//byte position 2: U+00A5 '¥'
//byte position 3: U+00E6 'æ'
//byte position 4: U+009C
//byte position 5: U+00AC '¬'
//byte position 6: U+00E8 'è'
//byte position 7: U+00AA 'ª'
//byte position 8: U+009E

for i, runeChar := range string(b) {
    fmt.Printf("byte position %d: %#U\n", i, runeChar)
}

//byte position 0: U+65E5 '日'
//byte position 3: U+672C '本'
//byte position 6: U+8A9E '語'

Questions:

From where does Go get the Unicode information to interpret a byte array when casting it to a string? How is a rune formed? Does the Go compiler take the Unicode encoding from the text file's encoding during compilation?
What are the advantages and disadvantages of implementing a string as a byte array, instead of as an array of UTF-16 chars like in Java?

Answer 1

You are quoting from a weak, unreliable source: Go Essentials: Strings. Among other things, it makes no mention of Unicode code points or UTF-8 encoding.

For example,

package main

import "fmt"

func main() {
    s := "日本語"
    fmt.Printf("Glyph:             %q\n", s)
    fmt.Printf("UTF-8:             [% x]\n", []byte(s))
    fmt.Printf("Unicode codepoint: %U\n", []rune(s))
}

Playground: https://play.golang.org/p/iaYd80Ocitg

Output:

Glyph:             "日本語"
UTF-8:             [e6 97 a5 e6 9c ac e8 aa 9e]
Unicode codepoint: [U+65E5 U+672C U+8A9E]
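
To make the link between the UTF-8 bytes and the code points above more concrete, here is a minimal sketch that decodes the string by hand with the standard unicode/utf8 package. It is meant to mirror what a `for i, r := range s` loop yields (byte offset plus rune), not to describe how the runtime is implemented. The names `runeAt` and `decodeRunes` are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeAt is a hypothetical helper type for this sketch: a decoded
// code point together with the byte offset where its encoding starts.
type runeAt struct {
	Offset int
	Rune   rune
}

// decodeRunes (hypothetical name) walks the bytes of s and decodes one
// UTF-8 sequence at a time, advancing by the number of bytes consumed.
func decodeRunes(s string) []runeAt {
	var out []runeAt
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:])
		out = append(out, runeAt{Offset: i, Rune: r})
		i += size
	}
	return out
}

func main() {
	s := "日本語"
	for _, ra := range decodeRunes(s) {
		fmt.Printf("byte position %d: %#U\n", ra.Offset, ra.Rune)
	}
	// len counts bytes; RuneCountInString counts decoded code points.
	fmt.Println(len(s), utf8.RuneCountInString(s))
}
```

Each of the three characters occupies three bytes in UTF-8, which is why the offsets step by three and why the byte length (9) differs from the rune count (3). Go source files are defined to be UTF-8, so the string literal already holds these bytes at compile time.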

References:

The Go Blog: Strings, bytes, runes and characters in Go

The Go Programming Language Specification

Unicode FAQ: UTF-8, UTF-16, UTF-32 & BOM

The Unicode Consortium
