How to get valid UTF-8 output for Chinese characters in RStudio on Mac?

We are cleansing some marketing data in Traditional Chinese. R reads UTF-8 Traditional Chinese variable names without any problem, but we cannot get valid UTF-8 output. For example,

If we run: unique(rframe$性別)

This is what we get: [1] "\女" "\男"

Here 性別 means "gender," \女 stands for female (女), and \男 stands for male (男).

Most interestingly, R on Linux produces valid UTF-8 Chinese output from the same UTF-8 CSV file. Why can the same RStudio that outputs valid UTF-8 Chinese on Linux not do so on Mac?

This troublesome issue has been around for a long while. In fact, with an older RStudio version we could get valid UTF-8 output. Can anyone help us?

Much obliged.

Chandler

The error may be in how the data is imported. How did you import your data?

I tried importing some data with Chinese characters, explicitly passing encoding="UTF-8", and I did not have any issues.

So my first suggestion is to try this:

data <- read.csv("mydata.csv", encoding = "UTF-8", stringsAsFactors = FALSE)
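To see whether the import step is really at fault, a small self-contained check along these lines may help. It is only a sketch: the temporary file and its three rows are made up for illustration (they mirror the column in the question, not the asker's actual data), and the strings are written with \u escapes so the script does not depend on the encoding of the source file. It compares Unicode code points, which do not depend on how the console renders text.

```r
# Hypothetical sample file: header "性別" and the values 女, 男, 女,
# written as raw UTF-8 bytes.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeLines(c("\u6027\u5225", "\u5973", "\u7537", "\u5973"), con, useBytes = TRUE)
close(con)

# encoding = "UTF-8" marks the strings read from the file as UTF-8.
rframe <- read.csv(tmp, encoding = "UTF-8", stringsAsFactors = FALSE,
                   check.names = FALSE)

# Take the first column by position to avoid relying on the header name.
values <- unique(rframe[[1]])

# Declared encoding of each value, and its Unicode code point.
print(Encoding(values))
code_points <- vapply(values, utf8ToInt, integer(1), USE.NAMES = FALSE)
print(code_points)

unlink(tmp)
```

If the code points are the expected ones (U+5973 for 女, U+7537 for 男) but print() still shows escaped output, the data was probably imported correctly and the problem is more likely in how the strings are displayed.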

Another approach, according to another answer, is to convert your variable to character, so that you get the Chinese characters instead of Unicode escapes.

as.character(unique(rframe$性別))

If you provide an excerpt from the data, I can check and possibly confirm this.

After some trial and error, we found that this issue probably comes from how the R application for Mac is built.

We downloaded the R source code from Git and compiled it with Apple clang version 12.0.0 (clang-1200.0.32.29, Target: x86_64-apple-darwin19.6.0). That build works fine, and the issue does not appear. We reported our findings to the R project today and hope for a quick response.

The following message is the report we sent to the R project.

To: Bug-Report-Request bug-report-request@r-project.org

Hi,

I am more of a systems programmer, helping my friend (Chandler) use R to process data. He has had a lot of trouble getting Chinese / Unicode output on the terminal. However, that only happens on Mac; I can't reproduce it on Linux.

I think something might be wrong with the Mac version of R. I recompiled R from the source code on GitHub, and with that build I can't reproduce the issue. With the build downloaded from the website, it reproduces every time (100% failure rate).

The details are at https://www.facebook.com/groups/RnRStudio/permalink/4555694011125386/

I think this is because the toolchain used to compile R for Mac could be out of date.

If you can create a bug on Bugzilla and enable me to comment there, I won't need a Bugzilla account. Or if any of you can sponsor this issue, that's even better.

Otherwise, I'll need a Bugzilla account.

Thank you!

This issue comes from a bug in the R 4.0.4 source code: UTF-8 text could not be displayed correctly on either Windows or Mac. It is fixed in version 4.0.5.
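To check whether a given installation predates the fix, something like the following could be run in the R console. It is a minimal sketch that relies only on the version numbers given above; the variable names are arbitrary, and the locale lines are included only as extra context that is often useful when diagnosing non-ASCII output.

```r
# Version reported above as containing the fix.
fixed_in <- numeric_version("4.0.5")
current <- getRversion()

if (current < fixed_in) {
  message("R ", as.character(current), " predates 4.0.5; upgrading may help.")
} else {
  message("R ", as.character(current), " is 4.0.5 or later.")
}

# Extra context: the character-type locale and whether R considers it UTF-8.
print(Sys.getlocale("LC_CTYPE"))
print(l10n_info()[["UTF-8"]])
```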
