Importing .csv files with “special” characters
I'm trying to read a .csv file into R. The .csv file was created in Excel, and it contains "long" dashes, which are the result of Excel "auto-correcting" the sequence space-dash-space.
Sample entries that contain these "long" dashes:
US – California – LA
US – Washington – Seattle
I've experimented with different encodings, including the following three options:
x <- read.csv(filename, encoding="windows-1252") # Motivated by http://www.perlmonks.org/?node_id=551123
x <- read.csv(filename, encoding="latin1")
x <- read.csv(filename, encoding="UTF-8")
But the long dashes either fail to display properly (first and second options) or show up as <U+0096> (third option).
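For reference, a minimal sketch of what is probably happening, assuming Excel saved the file in Windows-1252: in that code page the "long" dash is the single byte 0x96, which Latin-1 and Unicode treat as the control character U+0096. The variable names below are illustrative only.

```r
# Assumption: Excel wrote each "long" dash as the single Windows-1252 byte 0x96.
raw_dash <- rawToChar(as.raw(0x96))

# Re-encoding from Windows-1252 to UTF-8 should yield a real en dash (U+2013).
dash <- iconv(raw_dash, from = "windows-1252", to = "UTF-8")

# Compare the resulting code point with U+2013.
is_en_dash <- utf8ToInt(dash) == 0x2013
print(is_en_dash)
```

If this holds for the actual file, the bytes need to be converted on input rather than merely labelled as Latin-1 or UTF-8.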
I realize that I can store the file in a different format or use different software (see "Excel to CSV with UTF8 encoding"), but that's not the point.
Has anyone figured out what encoding option in R works in such cases?
If you are using RStudio, use Import Dataset.
Once your document is loaded, you can simply remove the columns that now show as '?'. You can see that these are columns 2 and 4. If you have a data frame, mydf, you would delete the second column like this:
# Drop the second column by negative index
mydf_new <- mydf[-2]
You could do the same thing for the other column, which is now column 3.
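As an alternative that keeps the dashes instead of dropping columns, here is a minimal sketch using fileEncoding rather than encoding. In read.csv, encoding only marks strings as Latin-1 or UTF-8, while fileEncoding re-encodes the input as it is read. The sketch assumes the file is in Windows-1252 and that the R session runs in a UTF-8 locale; the temporary file and the column name region are made up for illustration.

```r
# Build a stand-in for the Excel export: a Windows-1252 file in which each
# "long" dash is the single byte 0x96 (hypothetical file and column name).
dash_1252 <- rawToChar(as.raw(0x96))
sample_lines <- c(
  "region",
  paste0("US ", dash_1252, " California ", dash_1252, " LA"),
  paste0("US ", dash_1252, " Washington ", dash_1252, " Seattle")
)
tmp <- tempfile(fileext = ".csv")
writeLines(sample_lines, tmp, useBytes = TRUE)

# fileEncoding converts the bytes while reading; encoding would only label them.
x <- read.csv(tmp, fileEncoding = "windows-1252", stringsAsFactors = FALSE)
print(x$region)
```

With a real export, the same call would take the actual file path in place of tmp. If the dashes still look wrong, the file may not be in Windows-1252 after all.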