
How to get a single Unicode character from a string

How can I get a single Unicode character from a string? For example, if the string is "你好", how can I get the first character "你"?

I found one way elsewhere:

var str = "你好"
runes := []rune(str)
fmt.Println(string(runes[0]))

It does work, but I still have some questions:

  1. Is there another way to do it?

  2. Why does str[0] in Go return a byte rather than a Unicode character?

First, you may want to read https://blog.golang.org/strings, which will answer some of your questions.

A string in Go can contain arbitrary bytes. When you write str[i], the result is a byte, and the index is always a byte offset, not a character position.
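As a small illustration of the byte-oriented indexing described above, the following sketch compares byte and rune counts for the string from the question. The helper name byteAndRuneCounts is made up for this example and is not part of any library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCounts reports the length of s in bytes and in runes.
// (Hypothetical helper, introduced only for this illustration.)
func byteAndRuneCounts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	str := "你好"

	b, r := byteAndRuneCounts(str)
	fmt.Println(b, r) // 6 2 (each of these characters takes 3 bytes in UTF-8)

	// Indexing yields a single byte, not a character.
	fmt.Printf("%T %#x\n", str[0], str[0]) // uint8 0xe4

	// Slicing on byte offsets that fall on rune boundaries yields the character.
	fmt.Println(str[:3]) // 你
}
```

This is why str[0] cannot return "你": the first character occupies bytes 0 through 2, and str[0] is only the first of them.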

Most of the time, though, strings are encoded in UTF-8, and there are several ways to handle UTF-8 encoded text in a string.

For instance, you can use the for...range statement to iterate over a string rune by rune.

var first rune
for _, c := range str {
    first = c
    break
}
// first now contains the first rune of the string

You can also use the unicode/utf8 package. For instance:

r, size := utf8.DecodeRuneInString(str)
// r contains the first rune of the string
// size is the size of the rune in bytes
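Since DecodeRuneInString also returns the encoded size, that size can be used to slice out the first character as a string without converting the whole string to []rune. The sketch below assumes that is what is wanted; firstCharString is a hypothetical name chosen for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstCharString returns the first UTF-8 encoded character of s as a string.
// (Hypothetical helper, introduced only for this illustration.)
// For an empty string, DecodeRuneInString reports size 0, so the result is "".
// For invalid UTF-8 it reports size 1, so the result is the first raw byte.
func firstCharString(s string) string {
	_, size := utf8.DecodeRuneInString(s)
	return s[:size]
}

func main() {
	fmt.Println(firstCharString("你好"))     // 你
	fmt.Println(firstCharString("") == "") // true
}
```

If invalid input matters, the returned rune can additionally be compared against utf8.RuneError.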

If the string is encoded in UTF-8, there is no direct way to access the nth rune of the string, because the size of a rune (in bytes) is not constant. If you need this feature, you can easily write your own helper function to do it (with for...range, or with the unicode/utf8 package).
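One possible shape for such a helper, using for...range, is sketched below. The name nthRune and the (rune, bool) return signature are assumptions made for this example, not something from the standard library.

```go
package main

import "fmt"

// nthRune returns the n-th rune of s (counting from 0) and true,
// or 0 and false if n is negative or s has fewer than n+1 runes.
// (Hypothetical helper, introduced only for this illustration.)
func nthRune(s string, n int) (rune, bool) {
	if n < 0 {
		return 0, false
	}
	i := 0 // rune counter; the index yielded by range is a byte offset, so it is ignored
	for _, r := range s {
		if i == n {
			return r, true
		}
		i++
	}
	return 0, false
}

func main() {
	r, ok := nthRune("你好", 1)
	fmt.Println(string(r), ok) // 好 true

	_, ok = nthRune("你好", 2)
	fmt.Println(ok) // false
}
```

The lookup walks the string from the start, so it takes time proportional to n. Converting once with []rune(str) may be preferable when many positions of the same string are needed, at the cost of an allocation.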

If you want the first rune as a string, you can do:

func firstChar(str string) string {
    if str == "" { // SplitN returns an empty slice here, so indexing it would panic
        return ""
    }
    return strings.SplitN(str, "", 2)[0]
}

But if you want it as a rune, @DidierSpezia's solution is the best:

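func firstRune(str string) (r rune) {
    // The loop body runs at most once: it returns on the first rune.
    for _, r = range str {
        return
    }
    // Empty string: r is still the zero rune.
    return
}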

You can check it in the Go Playground.

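You can do this:

package main

import "fmt"

func main() {
  str := "cat"
  var s rune
  for i, c := range str {
    if i == 2 { // i is a byte offset; it matches the character position only for ASCII text
      s = c
      break
    }
  }
  fmt.Println(string(s)) // prints "t"
}

s is now equal to 't', the rune at byte offset 2. For a string such as "你好", the offsets produced by range are 0 and 3, so comparing i against a character position only works for ASCII text.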

You can use the utf8string package:

package main
import "golang.org/x/exp/utf8string"

func main() {
   s := utf8string.NewString("ÄÅàâäåçèéêëìîïü")
   // example 1
   r := s.At(1)
   println(r == 'Å')
   // example 2
   t := s.Slice(1, 3)
   println(t == "Åà")
}

https://pkg.go.dev/golang.org/x/exp/utf8string
