
How to read a .csv file with more than one language in R?

I want to read a .csv file in R.

However, the .csv file contains words (with language-specific letters) from several languages, for example German, Polish, and Greek.

Some rows of the .csv file are the following:

1 Zürich
2 Östra Mellansverige
3 Åland
4 Stredné Slovensko
5 Małopolskie
6 Ελλάδα

The first 4 rows are readable using:

Sys.setlocale(category = "LC_ALL", locale = "german")

The 5th row is readable using:

Sys.setlocale(category = "LC_ALL", locale = "polish")

However, the last row is not readable using:

Sys.setlocale(category = "LC_ALL", locale = "greek")

How can I read all 6 rows, covering all the necessary languages, at once?

Note that I use the following to read the .csv file:

read.csv("file_name.csv",header=TRUE,sep=";",na.strings = "",encoding="UTF-8")

The language of the text does not matter when loading the data with read.csv. There is no semantics or grammar involved, just the encoding of the individual characters. UTF-8 covers characters and symbols from almost all of the world's languages, so a single UTF-8 read handles all six rows, provided the file was written with UTF-8 encoding in the first place. Sys.setlocale mostly affects locale-dependent behaviour such as number and date formatting, sort order, and how characters are displayed. It does not set time zones, and switching it once per language is not needed to read the file.
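As a minimal sketch of the point above, the following writes the six example rows to a temporary UTF-8 file and reads them back with one read.csv call, without touching Sys.setlocale. It assumes the file really is UTF-8 encoded. The column names id and region, the temporary file, and the ";" layout of the sample are made up for illustration and are not taken from the asker's actual file.

```r
# Build the six example names with \u escapes so this script does not
# depend on the encoding of the script file itself.
regions <- c("Z\u00fcrich", "\u00d6stra Mellansverige", "\u00c5land",
             "Stredn\u00e9 Slovensko", "Ma\u0142opolskie",
             "\u0395\u03bb\u03bb\u03ac\u03b4\u03b1")

# Write a small ";"-separated sample file as raw UTF-8 bytes
# (hypothetical stand-in for file_name.csv).
path <- tempfile(fileext = ".csv")
con <- file(path, open = "wb")
writeLines(enc2utf8(c("id;region",
                      paste(seq_along(regions), regions, sep = ";"))),
           con, useBytes = TRUE)
close(con)

# Read every row in one call; encoding = "UTF-8" marks the strings as UTF-8.
df <- read.csv(path, header = TRUE, sep = ";", na.strings = "",
               encoding = "UTF-8", stringsAsFactors = FALSE)

# Check the data rather than trusting what the console displays.
print(Encoding(df$region))   # declared encoding of each string
print(validUTF8(df$region))  # TRUE where the bytes are valid UTF-8
```

If validUTF8 returns FALSE for some rows, the file was probably not saved as UTF-8, and the encoding it was actually written in has to be identified first. Also note that a console running in a non-UTF-8 locale (for example, older R versions on Windows) may still print some characters as escapes such as <U+0141> even though the data were read correctly.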
