How can I output Chinese characters (hanzi/kanji/hanja) in R?

How can I output Chinese characters (hanzi/kanji/hanja) in R? Unexpectedly, they are being escaped into their Unicode code points:

> "中文"
[1] "\u4e2d\u6587"
> print("中文")
[1] "\u4e2d\u6587"

This happens both in a terminal R session and in RStudio.

The desired output would be:

> "中文"
[1] "中文"

What settings do I need to change to get this output?

Most other posts with similar problems seem to resolve this by changing the locale to a UTF-8 one, but I am already using one:

> Sys.getlocale()
[1] "de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8"
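
To double-check that the session really treats strings as UTF-8, beyond what the locale name suggests, base R's `l10n_info()` can be consulted. This is only a minimal sketch for a stock R session; the variable names are illustrative:

```r
# Inspect the character-type locale and whether R considers it UTF-8.
ctype <- Sys.getlocale("LC_CTYPE")   # e.g. "de_DE.UTF-8" in the session above
info  <- l10n_info()                 # list with MBCS, UTF-8 and Latin-1 flags

is_utf8 <- isTRUE(info[["UTF-8"]])
cat("LC_CTYPE:", ctype, "\n")
cat("UTF-8 locale:", is_utf8, "\n")
```

If this reports a UTF-8 locale and the characters are still escaped, the locale setting itself is probably not the cause.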

Other Unicode characters print as expected:

> "\U0001f600\U0001f601\U0001f602\U0001f603\U0001f604\U0001f605\U0001f606\U0001f607\U0001f608\U0001f609\U0001f60a\U0001f60b\U0001f60c\U0001f60d\U0001f60e\U0001f60f\U0001f610\U0001f611\U0001f612\U0001f613\U0001f614\U0001f615\U0001f616\U0001f617\U0001f618\U0001f619\U0001f61a\U0001f61b\U0001f61c\U0001f61d\U0001f61e\U0001f61f\U0001f620\U0001f621\U0001f622\U0001f623\U0001f624\U0001f625\U0001f626\U0001f627\U0001f628\U0001f629\U0001f62a\U0001f62b\U0001f62c\U0001f62d\U0001f62e\U0001f62f\U0001f630\U0001f631\U0001f632\U0001f633\U0001f634\U0001f635\U0001f636\U0001f637\U0001f638\U0001f639\U0001f63a\U0001f63b\U0001f63c\U0001f63d\U0001f63e\U0001f63f\U0001f640\U0001f641\U0001f642\U0001f643\U0001f644\U0001f645\U0001f646\U0001f647\U0001f648\U0001f649\U0001f64a\U0001f64b\U0001f64c\U0001f64d\U0001f64e\U0001f64f"
[1] "😀😁😂😃😄😅😆😇😈😉😊😋😌😍😎😏😐😑😒😓😔😕😖😗😘😙😚😛😜😝😞😟😠😡😢😣😤😥😦😧😨😩😪😫😬😭😮😯😰😱😲😳😴😵😶😷😸😹😺😻😼😽😾😿🙀🙁🙂🙃🙄🙅🙆🙇🙈🙉🙊🙋🙌🙍🙎🙏"
> "😀😁😂😃😄😅😆😇😈😉😊😋😌😍😎😏😐😑😒😓😔😕😖😗😘😙😚😛😜😝😞😟😠😡😢😣😤😥😦😧😨😩😪😫😬😭😮😯😰😱😲😳😴😵😶😷😸😹😺😻😼😽😾😿🙀🙁🙂🙃🙄🙅🙆🙇🙈🙉🙊🙋🙌🙍🙎🙏"
[1] "😀😁😂😃😄😅😆😇😈😉😊😋😌😍😎😏😐😑😒😓😔😕😖😗😘😙😚😛😜😝😞😟😠😡😢😣😤😥😦😧😨😩😪😫😬😭😮😯😰😱😲😳😴😵😶😷😸😹😺😻😼😽😾😿🙀🙁🙂🙃🙄🙅🙆🙇🙈🙉🙊🙋🙌🙍🙎🙏"
> print("ひらがな") # Japanese hiragana
[1] "ひらがな"
> print("한글") # Korean
[1] "한글"

Strangely enough, the problem seems to affect only Chinese characters (which includes kanji in Japanese text, so print("源氏物語") does not work either). Other packages are apparently able to output the characters correctly:

> string_zh <- c("中", "文")
> string_zh
[1] "\u4e2d" "\u6587"
> tibble::tibble(string_zh)
# A tibble: 2 x 1
  string_zh
  <chr>
1 中
2 文

The following also works:

> utf8::utf8_print("中文")
[1] "中文"
> cat("中文")
中文
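
Since `cat()` shows the characters correctly, the escaping appears to happen only when `print()` formats the string for display; the stored data seems unaffected. The sketch below checks the code points and defines a small base-R stand-in for `print()`. The helper name `print_unescaped` is hypothetical and not part of any package:

```r
# The characters are written as \u escapes so the snippet does not depend on
# the encoding of the source file.
x <- c("\u4e2d", "\u6587")           # "中", "文"

# The data itself is intact: two one-character strings with the expected
# code points.
nchar(x)                             # 1 1
code_points <- vapply(x, utf8ToInt, integer(1), USE.NAMES = FALSE)
sprintf("U+%04X", code_points)       # "U+4E2D" "U+6587"

# Hypothetical helper (not part of base R): writes a character vector via
# cat(), which outputs the strings as-is instead of escaping them.
print_unescaped <- function(x) {
  quoted <- paste0('"', enc2utf8(x), '"')
  cat("[1]", quoted, "\n")
  invisible(x)
}

print_unescaped(x)
```

Unlike `print()`, this stand-in puts everything on one line and does not add index prefixes for wrapped output, so it is only meant for quick inspection at the console.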

Here is what I am running:

> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.0.4 utf8_1.2.1

Apparently, this is a bug in R 4.0.4 (see the bug report) and should be fixed in the next release.
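
Until an update is installed, a script could apply a workaround only on the affected version. The following is a sketch under the assumption, taken solely from the answer above, that R 4.0.4 is the only affected release; `show_strings` is a hypothetical wrapper, not an existing function:

```r
# Assumption taken from the answer above: only R 4.0.4 is affected.
affected <- getRversion() == "4.0.4"

# Hypothetical wrapper: use cat() on the affected version, print() elsewhere.
show_strings <- function(x, use_workaround = affected) {
  if (use_workaround) {
    cat(paste0('"', x, '"'), sep = "\n")
  } else {
    print(x)
  }
  invisible(x)
}

show_strings("\u4e2d\u6587")
```

Passing `use_workaround = TRUE` forces the `cat()` path on any version, which can help when comparing the two outputs side by side.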
