How can we get validly encoded output for Chinese characters in RStudio on a Mac?
We are cleansing some marketing data in traditional Chinese. We found that R can read UTF-8 traditional Chinese variable names without any problem. However, we cannot get valid UTF-8 output. For example,
If we run:
unique(rframe$性別)
This is what we get:
[1] "\女" "\男"
Here 性別 means "gender," \女 stands for female (女), and \男 stands for male (男).
The most interesting thing is that R on the Linux platform generates valid UTF-8 Chinese output from the same UTF-8 CSV file. Why can the same RStudio that successfully produces UTF-8 Chinese output on Linux not produce valid UTF-8 Chinese output on a Mac?
This very troublesome issue has been around for a long while. In fact, with an older RStudio version, we could get valid UTF-8 output. Can anyone help us?
Much obliged.
Chandler
The error may be in the import of the data. How did you import your data?
I tried importing some data with Chinese characters, specifically using encoding="UTF-8", and I did not have any issues.
So my first suggestion is to try this:
data <- read.csv("mydata.csv", encoding = "UTF-8", stringsAsFactors = FALSE)
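To try the import suggestion without access to the original file, a small round trip like the following may help. It is only a sketch: the temporary file, the two-column layout, and the `id` column are made up for illustration, and the Chinese text is written with `\u` escapes (性別, 女, 男) so the script itself stays ASCII.

```r
# Write a tiny UTF-8 CSV to a temporary file (hypothetical stand-in data).
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeLines(
  c("id,\u6027\u5225", "1,\u5973", "2,\u7537", "3,\u5973"),
  con,
  useBytes = TRUE  # write the UTF-8 bytes as they are, with no re-encoding
)
close(con)

# encoding = "UTF-8" asks R to mark the strings it reads as UTF-8.
rframe <- read.csv(tmp, encoding = "UTF-8",
                   stringsAsFactors = FALSE, check.names = FALSE)

# Look at the second column by position so the check does not depend on
# how the non-ASCII header was handled.
genders <- unique(rframe[[2]])
print(Encoding(genders))   # declared encoding of each value
print(validUTF8(genders))  # whether each value is valid UTF-8
unlink(tmp)
```

If the values read from this file print correctly but the ones from the real file do not, the real file (or how it was saved) is the more likely culprit; if both print as escapes, the problem is more likely in the R build or the locale.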
An additional approach could be to convert your variables to character, according to the following answer, so that you get the Chinese characters instead of the Unicode escapes.
as.character(unique(rframe$性別))
If you provide an excerpt from the data, I can check and possibly confirm this.
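In the meantime, it may help to collect the same few facts on both the Mac and the Linux machine and compare them. The helper below is a rough sketch; `describe_encoding` is a name invented here, not a function from R or from this thread, and it only uses base R calls.

```r
# Hypothetical helper: gather locale and encoding facts for a character vector.
describe_encoding <- function(x) {
  list(
    r_version   = as.character(getRversion()),
    platform    = R.version$platform,
    lc_ctype    = Sys.getlocale("LC_CTYPE"),     # locale used for character handling
    utf8_locale = isTRUE(l10n_info()[["UTF-8"]]), # is the session in a UTF-8 locale?
    declared    = unique(Encoding(x)),            # encoding marks on the strings
    valid_utf8  = all(validUTF8(x))               # are the bytes valid UTF-8?
  )
}

# "\u5973" and "\u7537" are the escapes for 女 and 男 (stand-in values).
gender <- c("\u5973", "\u7537")
info <- describe_encoding(gender)
str(info)
```

With the real data, the call would be something like `describe_encoding(as.character(rframe$性別))`. A difference in `lc_ctype` or `utf8_locale` between the two machines would point at the session locale rather than at the CSV file.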
After some trial and error, we found that this issue probably comes from the process of building the R application on Mac.
We downloaded the R source code from Git and compiled an application from it with Apple clang version 12.0.0 (clang-1200.0.32.29, Target: x86_64-apple-darwin19.6.0). It works fine. Our troublesome issue did not emerge again. We reported our findings to the R project today. We hope to see a response soon.
To: Bug-Report-Request bug-report-request@r-project.org
Hi,
I am more of a system programmer who helps my friend (Chandler) use R to process data. He has had quite some trouble getting Chinese / Unicode output on the terminal. However, that only happens on Mac. I can't reproduce it on Linux.
I think something might be wrong with the Mac version of R. I recompiled R from the source code on GitHub, and with that build I can't reproduce this issue. With the build downloaded from the website, it can be reproduced, with a failure rate of 100%.
The details are at https://www.facebook.com/groups/RnRStudio/permalink/4555694011125386/
I think that's because the toolchain used to compile R for Mac could be out of date.
If you can create a bug on Bugzilla and enable me to comment there, I won't need a Bugzilla account. Or if any of you can sponsor this issue, that's even better.
Otherwise, I'll need a Bugzilla account.
Thank you!
This issue comes from a bug in the R 4.0.4 source code. UTF-8 text could not be displayed correctly on either Windows or Mac. It is fixed in version 4.0.5.
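A quick way to see whether a given installation predates that release is to compare version numbers. This is a minimal sketch that takes the 4.0.5 figure from the answer above at face value; the variable names are made up for illustration.

```r
# Version named in the answer above as containing the fix (assumption taken from the thread).
fixed_in <- package_version("4.0.5")

# getRversion() returns the running R version in a comparable form.
needs_upgrade <- getRversion() < fixed_in

if (needs_upgrade) {
  message("R ", as.character(getRversion()), " predates ",
          as.character(fixed_in), "; upgrading may be worth trying.")
} else {
  message("R ", as.character(getRversion()), " is at or above ",
          as.character(fixed_in), ".")
}
```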